✨ feat(fermata): add secret filtering engine — the security brain
Implement Goals 1–3 and 5 from the reveal-layer security brain goal.
fermata now detects, redacts, and scans for secrets in AI agent tool
output, filling the ecosystem gap where no coding agent filters secrets
post-read.
New core/secrets/ module:
- config.rs: .botsecrets TOML format with hierarchical merge and ~40
built-in key patterns
- parser.rs: multi-format secret file parser (.env, TOML, YAML, JSON,
Python assignments, Java properties)
- manifest.rs: file discovery + parsing → known-secrets set
- redactor.rs: Aho-Corasick multi-pattern replacement with 4 styles
- scanner.rs: RegexSet heuristic detection with 35 gitleaks-derived
patterns (MIT) and Shannon entropy filtering
- patterns.rs: curated rules for AWS, GitHub, Stripe, Slack, JWT, etc.
Hook integration:
- fermata hook --event post-tool-use reads tool output, runs redactor +
scanner, returns updatedToolOutput for Claude Code
- Backward compatible: --event pre-tool-use (default) unchanged
- Fail-open: errors produce {} and exit 0
Library API:
- Redactor::new(manifest, style).redact(text) → RedactedText
- Scanner::new(config).scan(text) → Vec<Finding>
- Compiles without CLI feature for embedding in other crates
195 tests (130 new), all passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -1,19 +1,26 @@
|
||||
# Package: dirigent_fermata
|
||||
|
||||
Harness-agnostic policy gate for AI coding agents.
|
||||
Harness-agnostic policy gate and secret filtering engine for AI coding agents.
|
||||
|
||||
## Quick Facts
|
||||
- **Type**: Library + binary (`fermata`)
|
||||
- **Main Entry**: `src/lib.rs`, `src/bin/fermata.rs`
|
||||
- **Dependencies**: `ignore`, `toml`, `regex`, `globset`, `serde`, `clap` (cli feature)
|
||||
- **Status**: v0.1 — library + CLI + Claude hook adapter
|
||||
- **Dependencies**: `ignore`, `toml`, `regex`, `globset`, `serde`, `clap` (cli feature), `aho-corasick`, `serde_yaml`
|
||||
- **Status**: v0.2 — policy gate + secret filtering engine
|
||||
|
||||
## Layering
|
||||
|
||||
Three concentric layers; nothing inner imports from anything outer.
|
||||
|
||||
- **`core/`** — harness-unaware, transport-unaware, sync. Types (`Op`, `Decision`), `.botignore` walker, `botignore.toml` parser, `Policy::check` / `check_command`, path extraction. Sync, no tokio.
|
||||
- **`harness/`** — `HarnessAdapter` trait over a normalized `ToolCall`. Each adapter (Claude, future Codex, etc.) lives in its own submodule, feature-gated.
|
||||
- **`core/secrets/`** — the secret filtering engine:
|
||||
- `config.rs` — `.botsecrets` TOML parser and hierarchical resolution (user, project, local override).
|
||||
- `manifest.rs` — discovers secret-containing files from `.botsecrets` patterns and loads their content for redaction.
|
||||
- `parser.rs` — multi-format secret file parser (`.env`, TOML, YAML, JSON). Extracts key-value pairs where the value is a secret.
|
||||
- `patterns.rs` — built-in key name patterns (~30 universal patterns like `*_KEY`, `*_SECRET`, `*_PASSWORD`) and gitleaks-derived regex patterns for heuristic scanning.
|
||||
- `redactor.rs` — `Redactor` builds an Aho-Corasick automaton from known secret values and replaces them in arbitrary text. Sub-millisecond performance.
|
||||
- `scanner.rs` — `Scanner` applies heuristic regex patterns to detect secrets not covered by the known-value manifest (entropy-based and format-based detection).
|
||||
- **`harness/`** — `HarnessAdapter` trait over a normalized `ToolCall` (PreToolUse) and `PostToolUsePayload` (PostToolUse). Each adapter (Claude, future Codex, etc.) lives in its own submodule, feature-gated. PostToolUse enables output redaction via `updatedToolOutput` before content enters the LLM context.
|
||||
- **`bin/fermata.rs`** — only place where `clap`, stdio, and exit codes appear.
|
||||
|
||||
## Release Model
|
||||
@@ -24,11 +31,13 @@ Developed in this monorepo; planned to be exported as a standalone repo in the f
|
||||
|
||||
`dirigent_tools` depends on `dirigent_fermata`, never the reverse. Fermata must remain usable as a standalone hook/MCP without dragging in the in-process ACP tool runtime.
|
||||
|
||||
## Out of scope (v0.1)
|
||||
## Out of scope (v0.2)
|
||||
|
||||
Codex / Gemini hook adapters, MCP server mode, PostToolUse envelope, `readonly_only` Bash mode, audit log, filesystem watcher. Each is a future task with its own plan.
|
||||
Codex / Gemini hook adapters, MCP server mode, `readonly_only` Bash mode, audit log, filesystem watcher, context taint tracking. Each is a future task with its own plan.
|
||||
|
||||
## See also
|
||||
|
||||
- `docs/tools/fermata.md` — Dirigent integration plan
|
||||
- `docs/workpad/brainstorm/fermata.md` — canonical product spec
|
||||
- `docs/architecture/fermata-security-philosophy.md` — security philosophy and the reveal triangle
|
||||
- `.botsecrets` format: `core/secrets/config.rs` — the `.gitignore` of AI agent secret protection
|
||||
|
||||
@@ -19,6 +19,7 @@ path = "src/bin/fermata.rs"
|
||||
required-features = ["cli"]
|
||||
|
||||
[dependencies]
|
||||
aho-corasick = "1.1"
|
||||
globset = "0.4"
|
||||
ignore = "0.4"
|
||||
walkdir = "2"
|
||||
@@ -26,6 +27,7 @@ toml = "0.8"
|
||||
regex = "1.10"
|
||||
serde = { version = "1.0", features = ["derive"] }
|
||||
serde_json = "1.0"
|
||||
serde_yaml = "0.9"
|
||||
thiserror = "2.0"
|
||||
clap = { version = "4.5", features = ["derive"], optional = true }
|
||||
|
||||
|
||||
@@ -1,44 +1,40 @@
|
||||
# 𝄐 dirigent_fermata
|
||||
# dirigent_fermata
|
||||
|
||||
**A fast, harness-agnostic policy gate for AI coding agents.**
|
||||
**A fast, harness-agnostic policy gate and secret filtering engine for AI coding agents.**
|
||||
|
||||
Drop a `.botignore` file in your project root. Fermata reads it and blocks your agent from reading, writing, or running things it shouldn't — before the tool call happens.
|
||||
|
||||
```
|
||||
.env
|
||||
.env.*
|
||||
secrets/**
|
||||
conf/settings.local.yaml
|
||||
```
|
||||
|
||||
That's all it takes.
|
||||
Drop a `.botignore` to control what your agent can touch. Drop a `.botsecrets` to control what secret values your agent can see. Fermata enforces both -- before and after tool calls happen.
|
||||
|
||||
---
|
||||
|
||||
## Why Fermata
|
||||
|
||||
AI coding agents are powerful, but they don't have an innate sense of "don't touch `.env`." Native hook systems in tools like Claude Code let you intercept every file operation — but wiring up your own secure, fast hook for each project is friction. Fermata is that hook, ready to drop in.
|
||||
AI coding agents don't have an innate sense of "don't touch `.env`" -- and even if you block the file, they can still see its contents through shell output, log files, and indirect reads. Fermata solves both problems:
|
||||
|
||||
- **Fast** — written in Rust; ~1–5ms per call. Hooks fire on every read, write, and bash operation. Python cold-start (~50–150ms) compounds fast. Fermata doesn't.
|
||||
- **Familiar syntax** — `.botignore` uses gitignore rules via the `ignore` crate (the same engine powering ripgrep).
|
||||
- **Per-operation control** — `botignore.toml` lets you block writes to `vendor/**` while still allowing reads, or deny specific bash patterns without touching path rules.
|
||||
- **Harness-agnostic** — plain CLI exit codes work from any shell wrapper; the hook adapter speaks Claude Code's JSON natively.
|
||||
- **Policy gate** -- `.botignore` blocks reads, writes, and dangerous commands before they execute (PreToolUse).
|
||||
- **Secret filtering** -- `.botsecrets` redacts secret values from tool output before they enter the LLM context (PostToolUse).
|
||||
- **Fast** -- Rust, Aho-Corasick automaton for redaction, ~1-5ms per call.
|
||||
- **Familiar syntax** -- `.botignore` uses gitignore rules; `.botsecrets` uses TOML with glob patterns.
|
||||
- **Harness-agnostic** -- hook adapters for Claude Code (shipped), Codex and Gemini (planned), MCP proxy (planned).
|
||||
|
||||
---
|
||||
|
||||
## Status: v0.1
|
||||
## Status: v0.2
|
||||
|
||||
| Component | Status |
|
||||
|-----------|--------|
|
||||
| Library (`Op`, `Decision`, `Policy::check`, `Policy::check_command`) | Done |
|
||||
| `.botignore` walker (project-root walk-up, gitignore semantics) | Done |
|
||||
| Library (`Policy::check`, `Policy::check_command`) | Done |
|
||||
| `.botignore` walker (gitignore semantics) | Done |
|
||||
| `botignore.toml` parser (read / write / bash namespaces) | Done |
|
||||
| Path identification heuristics | Done |
|
||||
| CLI: `fermata check <path>...` | Done |
|
||||
| CLI: `fermata hook --harness claude` | Done |
|
||||
| CLI: `fermata check` / `fermata hook` | Done |
|
||||
| Claude Code PreToolUse adapter | Done |
|
||||
| Claude Code PostToolUse adapter (output redaction) | Done |
|
||||
| `.botsecrets` config parser | Done |
|
||||
| Secret manifest discovery and loading | Done |
|
||||
| Multi-format secret file parser (.env, TOML, YAML, JSON) | Done |
|
||||
| `Redactor` (known-value Aho-Corasick replacement) | Done |
|
||||
| `Scanner` (heuristic regex + gitleaks patterns) | Done |
|
||||
|
||||
Out of scope for v0.1: Codex / Gemini hook adapters, MCP server mode, audit log, filesystem watcher.
|
||||
Out of scope for v0.2: Codex / Gemini hook adapters, MCP proxy mode, audit log, filesystem watcher.
|
||||
|
||||
---
|
||||
|
||||
@@ -50,87 +46,43 @@ From source (this monorepo):
|
||||
cargo install --path crates/dirigent_fermata --features cli
|
||||
```
|
||||
|
||||
This installs the `fermata` binary into `~/.cargo/bin/`.
|
||||
---
|
||||
|
||||
## Secret Filtering
|
||||
|
||||
Fermata's secret filtering operates in three layers:
|
||||
|
||||
1. **Policy gate** (PreToolUse) -- `.botignore` blocks direct access to sensitive files. Catches ~90% of accidental reads.
|
||||
2. **Known-value redaction** (PostToolUse) -- `.botsecrets` declares which files contain secrets. Fermata parses them, extracts values, and replaces them in all tool output using an Aho-Corasick automaton. Zero false negatives for declared secrets.
|
||||
3. **Heuristic scanning** (PostToolUse) -- regex patterns derived from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs). Safety net for secrets not covered by the manifest.
|
||||
|
||||
### `.botsecrets` format
|
||||
|
||||
Create a `.botsecrets` file at your project root:
|
||||
|
||||
```toml
|
||||
# Files that contain secrets -- fermata parses these and redacts values
|
||||
[files]
|
||||
patterns = [".env", ".env.*", "secrets.*"]
|
||||
|
||||
# Additional secret key names (built-in defaults cover *_KEY, *_SECRET, etc.)
|
||||
[keys]
|
||||
include = ["STRIPE_*", "MY_APP_SIGNING_*"]
|
||||
|
||||
# Heuristic scanning on all tool output
|
||||
[heuristic]
|
||||
enabled = true
|
||||
```
|
||||
|
||||
That's the typical case. Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
|
||||
|
||||
---
|
||||
|
||||
## Usage
|
||||
|
||||
### Checking a path
|
||||
### Claude Code hook configuration
|
||||
|
||||
```bash
|
||||
fermata check --op read /path/to/.env
|
||||
# exit 1 — blocked
|
||||
# stderr: blocked by rule ".env" in /your/project/.botignore
|
||||
|
||||
fermata check --op write /path/to/src/main.rs
|
||||
# exit 0 — allowed
|
||||
```
|
||||
|
||||
### Claude Code hook adapter
|
||||
|
||||
```bash
|
||||
fermata hook --harness claude < hook_payload.json
|
||||
```
|
||||
|
||||
Reads the PreToolUse JSON from stdin, extracts the tool name and path or command, applies policy, and emits the Claude-shaped JSON response. The hook's exit code is always `0`; the verdict is in the JSON body.
|
||||
|
||||
---
|
||||
|
||||
## Configuration
|
||||
|
||||
### `.botignore` — the 80% case
|
||||
|
||||
Create a `.botignore` at your project root. Gitignore syntax. Blocks both reads and writes.
|
||||
|
||||
```gitignore
|
||||
# Secrets
|
||||
.env
|
||||
.env.*
|
||||
secrets/**
|
||||
|
||||
# Local config overrides
|
||||
conf/settings.local.yaml
|
||||
conf/settings.test.yaml
|
||||
|
||||
# Generated files — let the tools rebuild them, not patch them
|
||||
dist/**
|
||||
*.lock
|
||||
```
|
||||
|
||||
Fermata walks up from the target file to find the nearest `.botignore`, so it works correctly even when an agent changes directory.
|
||||
|
||||
### `botignore.toml` — per-operation rules
|
||||
|
||||
For cases where `.botignore`'s uniform read+write block isn't granular enough:
|
||||
|
||||
```toml
|
||||
[read]
|
||||
# Block reading secrets outright
|
||||
patterns = [".env*", "secrets/**", "conf/settings.local.yaml"]
|
||||
|
||||
[write]
|
||||
# Allow reading vendor code but block patching it
|
||||
patterns = ["vendor/**", "*.lock"]
|
||||
|
||||
[bash]
|
||||
# Hard-block destructive or exfiltrating commands
|
||||
deny = [
|
||||
"rm -rf /",
|
||||
"curl * | sh",
|
||||
"git push --force*",
|
||||
]
|
||||
# Ask before any removal or move
|
||||
ask = ["rm:*", "mv:*"]
|
||||
# Narrow allowlist for automated commands
|
||||
allow_prefixes = ["make test", "git checkout:*"]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## How it fits into Claude Code
|
||||
|
||||
Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
|
||||
Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
|
||||
|
||||
```json
|
||||
{
|
||||
@@ -139,10 +91,15 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
|
||||
{
|
||||
"matcher": "Bash|Read|Edit|Write",
|
||||
"hooks": [
|
||||
{
|
||||
"type": "command",
|
||||
"command": "fermata hook --harness claude"
|
||||
{ "type": "command", "command": "fermata hook --harness claude" }
|
||||
]
|
||||
}
|
||||
],
|
||||
"PostToolUse": [
|
||||
{
|
||||
"matcher": "Bash|Read|Edit|Write",
|
||||
"hooks": [
|
||||
{ "type": "command", "command": "fermata hook --harness claude --event post-tool-use" }
|
||||
]
|
||||
}
|
||||
]
|
||||
@@ -150,50 +107,68 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
|
||||
}
|
||||
```
|
||||
|
||||
When Claude attempts a `Read(.env)`, `Write(vendor/foo.js)`, or `Bash(rm ./secrets/key.pem)`, fermata intercepts the call, checks policy, and returns a deny with a human-readable reason — before any damage is done.
|
||||
PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.
|
||||
|
||||
### Checking a path
|
||||
|
||||
```bash
|
||||
fermata check --op read /path/to/.env
|
||||
# exit 1 -- blocked
|
||||
|
||||
fermata check --op write /path/to/src/main.rs
|
||||
# exit 0 -- allowed
|
||||
```
|
||||
|
||||
### Library API
|
||||
|
||||
```rust
|
||||
use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
|
||||
|
||||
// Load .botsecrets config and build the manifest
|
||||
let config = SecretsConfig::load("/path/to/project")?;
|
||||
let manifest = Manifest::discover(&config)?;
|
||||
|
||||
// Known-value redaction (Aho-Corasick, sub-millisecond)
|
||||
let redactor = Redactor::from_manifest(&manifest);
|
||||
let clean = redactor.redact("DB_PASSWORD=hunter2\nAPI_KEY=sk-abc123");
|
||||
// -> "DB_PASSWORD=*****\nAPI_KEY=*****"
|
||||
|
||||
// Heuristic scanning (regex patterns)
|
||||
let scanner = Scanner::new(&config);
|
||||
let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
|
||||
// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Real-world scenario
|
||||
## Configuration
|
||||
|
||||
A project has `.env`, `conf/settings.local.yaml`, and a `vendor/` tree it doesn't want patched. With `.botignore`:
|
||||
### `.botignore` -- access control
|
||||
|
||||
Gitignore syntax. Blocks both reads and writes.
|
||||
|
||||
```gitignore
|
||||
.env
|
||||
.env.*
|
||||
conf/settings.local.yaml
|
||||
vendor/**
|
||||
secrets/**
|
||||
```
|
||||
|
||||
Claude attempts to read credentials:
|
||||
|
||||
```
|
||||
Tool: Read
|
||||
Path: ./conf/settings.local.yaml
|
||||
Decision: BLOCK — matched rule "conf/settings.local.yaml" (.botignore)
|
||||
```
|
||||
|
||||
Claude attempts to read application code:
|
||||
|
||||
```
|
||||
Tool: Read
|
||||
Path: ./src/app/main.rs
|
||||
Decision: ALLOW
|
||||
```
|
||||
|
||||
Claude attempts to run `cat .env` via bash — which would bypass a path-only check:
|
||||
### `botignore.toml` -- per-operation rules
|
||||
|
||||
```toml
|
||||
# botignore.toml
|
||||
[read]
|
||||
patterns = [".env*", "secrets/**"]
|
||||
|
||||
[write]
|
||||
patterns = ["vendor/**", "*.lock"]
|
||||
|
||||
[bash]
|
||||
deny = ["cat .env*", "cat conf/settings.local*"]
|
||||
deny = ["rm -rf /", "curl * | sh"]
|
||||
```
|
||||
|
||||
```
|
||||
Tool: Bash
|
||||
Command: cat .env
|
||||
Decision: BLOCK — matched bash deny rule "cat .env*"
|
||||
```
|
||||
### `.botsecrets` -- secret value redaction
|
||||
|
||||
See the Secret Filtering section above.
|
||||
|
||||
---
|
||||
|
||||
@@ -201,14 +176,16 @@ Decision: BLOCK — matched bash deny rule "cat .env*"
|
||||
|
||||
Three concentric layers; nothing inner imports from anything outer:
|
||||
|
||||
- **`core/`** — harness-unaware, sync. Types, `.botignore` walker, `botignore.toml` parser, `Policy::check` / `check_command`, path extraction.
|
||||
- **`harness/`** — `HarnessAdapter` trait over a normalized `ToolCall`. Each adapter lives in its own submodule, feature-gated.
|
||||
- **`bin/fermata.rs`** — the only place `clap`, stdio, and exit codes appear.
|
||||
- **`core/`** -- harness-unaware, sync. Policy types, `.botignore` walker, `botignore.toml` parser, `Policy::check`.
|
||||
- **`core/secrets/`** -- `.botsecrets` config, manifest discovery, multi-format parser, Aho-Corasick redactor, heuristic scanner.
|
||||
- **`harness/`** -- `HarnessAdapter` trait for PreToolUse (policy gate) and PostToolUse (output redaction). Each adapter is feature-gated.
|
||||
- **`bin/fermata.rs`** -- `clap`, stdio, and exit codes.
|
||||
|
||||
---
|
||||
|
||||
## See also
|
||||
|
||||
- `docs/tools/fermata.md` — Dirigent integration plan
|
||||
- `docs/workpad/brainstorm/fermata.md` — full product spec and field notes
|
||||
- `docs/architecture/crates.md` — crate dependency map
|
||||
- `docs/tools/fermata.md` -- Dirigent integration plan
|
||||
- `docs/architecture/fermata-security-philosophy.md` -- security philosophy and the reveal triangle
|
||||
- `docs/workpad/brainstorm/fermata.md` -- full product spec and field notes
|
||||
- `docs/architecture/crates.md` -- crate dependency map
|
||||
|
||||
+180
-18
@@ -1,5 +1,6 @@
|
||||
use clap::{Parser, Subcommand, ValueEnum};
|
||||
use dirigent_fermata::core::{project::find_project_root, Decision, Op, Policy};
|
||||
use dirigent_fermata::harness::HookEvent;
|
||||
use std::io::{Read, Write};
|
||||
use std::path::PathBuf;
|
||||
use std::process::ExitCode;
|
||||
@@ -23,7 +24,11 @@ enum Cmd {
|
||||
},
|
||||
/// Read a harness hook payload from stdin and render the decision.
|
||||
Hook {
|
||||
#[arg(long)]
|
||||
/// Hook event type: pre-tool-use or post-tool-use.
|
||||
#[arg(long, default_value = "pre-tool-use")]
|
||||
event: String,
|
||||
/// Harness adapter name.
|
||||
#[arg(long, default_value = "claude")]
|
||||
harness: String,
|
||||
},
|
||||
}
|
||||
@@ -49,7 +54,7 @@ fn main() -> ExitCode {
|
||||
let cli = Cli::parse();
|
||||
match cli.cmd {
|
||||
Cmd::Check { op, json, paths } => run_check(op.into(), json, &paths),
|
||||
Cmd::Hook { harness } => run_hook(&harness),
|
||||
Cmd::Hook { event, harness } => run_hook(&event, &harness),
|
||||
}
|
||||
}
|
||||
|
||||
@@ -92,7 +97,7 @@ fn run_check(op: Op, json: bool, paths: &[PathBuf]) -> ExitCode {
|
||||
}
|
||||
}
|
||||
|
||||
fn run_hook(harness: &str) -> ExitCode {
|
||||
fn run_hook(event_str: &str, harness: &str) -> ExitCode {
|
||||
let adapter = match dirigent_fermata::harness::lookup(harness) {
|
||||
Some(a) => a,
|
||||
None => {
|
||||
@@ -100,28 +105,51 @@ fn run_hook(harness: &str) -> ExitCode {
|
||||
return ExitCode::from(2);
|
||||
}
|
||||
};
|
||||
let event = match HookEvent::parse(event_str) {
|
||||
Some(e) => e,
|
||||
None => {
|
||||
eprintln!("fermata: unknown event '{event_str}'");
|
||||
return ExitCode::from(2);
|
||||
}
|
||||
};
|
||||
|
||||
let mut buf = Vec::new();
|
||||
if let Err(e) = std::io::stdin().lock().read_to_end(&mut buf) {
|
||||
eprintln!("fermata: stdin: {e}");
|
||||
return ExitCode::from(2);
|
||||
}
|
||||
let call = match adapter.parse_request(&buf) {
|
||||
|
||||
match event {
|
||||
HookEvent::PreToolUse => run_pre_tool_use(&*adapter, &buf),
|
||||
HookEvent::PostToolUse => run_post_tool_use(&*adapter, &buf),
|
||||
}
|
||||
}
|
||||
|
||||
/// Handle a PreToolUse hook event (policy gate).
|
||||
fn run_pre_tool_use(
|
||||
adapter: &dyn dirigent_fermata::harness::HarnessAdapter,
|
||||
buf: &[u8],
|
||||
) -> ExitCode {
|
||||
use dirigent_fermata::harness::{PathKind, ToolOp};
|
||||
|
||||
let call = match adapter.parse_request(buf) {
|
||||
Ok(c) => c,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: parse: {e}");
|
||||
return ExitCode::from(2);
|
||||
// Fail-open: output empty JSON and exit 0.
|
||||
let _ = std::io::stdout().lock().write_all(b"{}");
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
};
|
||||
|
||||
use dirigent_fermata::harness::{PathKind, ToolOp};
|
||||
let decision = match &call.op {
|
||||
ToolOp::Path { path, kind } => {
|
||||
let root = match find_project_root(path) {
|
||||
// No project root → fail-open allow (hook must always exit 0 with a verdict).
|
||||
// run_check silently skips these paths; here we must still emit JSON.
|
||||
Some(r) => r,
|
||||
None => {
|
||||
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &Decision::Allow)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
@@ -130,7 +158,9 @@ fn run_hook(harness: &str) -> ExitCode {
|
||||
Ok(p) => p,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: load error: {e}");
|
||||
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &Decision::Allow)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
@@ -143,32 +173,36 @@ fn run_hook(harness: &str) -> ExitCode {
|
||||
Ok(d) => d,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: check error: {e}");
|
||||
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &Decision::Allow)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
}
|
||||
}
|
||||
ToolOp::Command { text } => {
|
||||
// For commands, we look up the project from cwd (no path argument).
|
||||
let cwd = match std::env::current_dir() {
|
||||
Ok(d) => d,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: cwd error: {e}");
|
||||
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &Decision::Allow)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
};
|
||||
match find_project_root(&cwd) {
|
||||
// No project root → fail-open allow (see Path branch note above).
|
||||
None => Decision::Allow,
|
||||
Some(root) => {
|
||||
let policy = match Policy::load(&root) {
|
||||
Ok(p) => p,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: load error: {e}");
|
||||
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &Decision::Allow)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
@@ -177,7 +211,9 @@ fn run_hook(harness: &str) -> ExitCode {
|
||||
Ok(d) => d,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: check error: {e}");
|
||||
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &Decision::Allow)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
@@ -186,9 +222,135 @@ fn run_hook(harness: &str) -> ExitCode {
|
||||
}
|
||||
}
|
||||
};
|
||||
let out = adapter.render_decision(&call, &decision).unwrap_or_default();
|
||||
let out = adapter
|
||||
.render_decision(&call, &decision)
|
||||
.unwrap_or_default();
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
ExitCode::from(0) // hook bins always exit 0; the JSON carries the verdict
|
||||
ExitCode::from(0)
|
||||
}
|
||||
|
||||
/// Handle a PostToolUse hook event (output redaction).
|
||||
///
|
||||
/// Fail-open: any error results in `{}` on stdout and exit 0, so the
|
||||
/// harness continues with the original output.
|
||||
fn run_post_tool_use(
|
||||
adapter: &dyn dirigent_fermata::harness::HarnessAdapter,
|
||||
buf: &[u8],
|
||||
) -> ExitCode {
|
||||
use dirigent_fermata::core::secrets::{
|
||||
config::HeuristicMode, Manifest, Redactor, Scanner, SecretsConfig,
|
||||
};
|
||||
|
||||
// Parse payload; fail-open on error.
|
||||
let payload = match adapter.parse_post_tool_use(buf) {
|
||||
Ok(p) => p,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: post-tool-use parse: {e}");
|
||||
let _ = std::io::stdout().lock().write_all(b"{}");
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
};
|
||||
|
||||
// Empty tool response — nothing to redact.
|
||||
if payload.tool_response.is_empty() {
|
||||
let _ = std::io::stdout().lock().write_all(b"{}");
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
|
||||
// Find project root from cwd (PostToolUse has no reliable path).
|
||||
let root = match std::env::current_dir().ok().and_then(|d| find_project_root(&d)) {
|
||||
Some(r) => r,
|
||||
None => {
|
||||
// No project root → nothing to redact, pass through.
|
||||
let _ = std::io::stdout().lock().write_all(b"{}");
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
};
|
||||
|
||||
// Load secrets config; fail-open if missing or broken.
|
||||
let config = match SecretsConfig::load(&root) {
|
||||
Ok(c) => c,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: secrets config: {e}");
|
||||
let _ = std::io::stdout().lock().write_all(b"{}");
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
};
|
||||
|
||||
// Build manifest from config (discovers .env files etc.).
|
||||
let manifest = match Manifest::build(&config, &root) {
|
||||
Ok(m) => m,
|
||||
Err(e) => {
|
||||
eprintln!("fermata: manifest: {e}");
|
||||
let _ = std::io::stdout().lock().write_all(b"{}");
|
||||
return ExitCode::from(0);
|
||||
}
|
||||
};
|
||||
|
||||
// Run redactor over tool_response.
|
||||
let redactor = Redactor::new(&manifest, config.redaction.style);
|
||||
let redacted = redactor.redact(&payload.tool_response);
|
||||
|
||||
// Run heuristic scanner if enabled.
|
||||
let mut scanner_warning: Option<String> = None;
|
||||
if config.heuristic.enabled {
|
||||
if let Ok(scanner) = Scanner::new(&config.heuristic) {
|
||||
// Scan the (already redacted) text so we don't re-flag known secrets.
|
||||
let findings = scanner.scan(&redacted.text);
|
||||
if !findings.is_empty() {
|
||||
match config.heuristic.mode {
|
||||
HeuristicMode::Report => {
|
||||
// Log to stderr only; do not modify output.
|
||||
for f in &findings {
|
||||
eprintln!(
|
||||
"fermata: heuristic finding [{:?}] {}: {}",
|
||||
f.confidence, f.pattern_id, f.description
|
||||
);
|
||||
}
|
||||
}
|
||||
HeuristicMode::Enforce => {
|
||||
let descriptions: Vec<String> = findings
|
||||
.iter()
|
||||
.map(|f| format!("{} ({})", f.description, f.pattern_id))
|
||||
.collect();
|
||||
scanner_warning = Some(format!(
|
||||
"\n[fermata] WARNING: heuristic scan found {} potential secret(s): {}",
|
||||
findings.len(),
|
||||
descriptions.join(", ")
|
||||
));
|
||||
}
|
||||
HeuristicMode::Disabled => {}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// Determine whether we need to send back modified output.
|
||||
let redaction_count = redacted.redactions.len();
|
||||
let was_redacted = redaction_count > 0;
|
||||
let needs_update = was_redacted || scanner_warning.is_some();
|
||||
|
||||
let output = if needs_update {
|
||||
let mut text = redacted.text;
|
||||
if let Some(warning) = scanner_warning {
|
||||
text.push_str(&warning);
|
||||
}
|
||||
if was_redacted {
|
||||
eprintln!(
|
||||
"fermata: redacted {} secret(s) from {} output",
|
||||
redaction_count, payload.tool_name
|
||||
);
|
||||
}
|
||||
Some(text)
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
let out = adapter
|
||||
.render_post_tool_use(&payload, output.as_deref())
|
||||
.unwrap_or_else(|_| b"{}".to_vec());
|
||||
let _ = std::io::stdout().lock().write_all(&out);
|
||||
ExitCode::from(0)
|
||||
}
|
||||
|
||||
fn merge_worst(a: Option<Decision>, b: Decision) -> Decision {
|
||||
|
||||
@@ -6,6 +6,7 @@ pub mod extract;
|
||||
pub mod op;
|
||||
pub mod policy;
|
||||
pub mod project;
|
||||
pub mod secrets;
|
||||
pub mod toml_config;
|
||||
|
||||
pub use decision::{Decision, Reason, Rule};
|
||||
|
||||
+1
-1
@@ -1,7 +1,7 @@
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
/// Strong markers that definitively identify a project root.
|
||||
const STRONG_MARKERS: &[&str] = &["botignore.toml", ".botignore.toml", ".git"];
|
||||
const STRONG_MARKERS: &[&str] = &["botignore.toml", ".botignore.toml", ".botsecrets", ".git"];
|
||||
|
||||
/// Walk upward from `target` (or its parent if `target` is a file) looking
|
||||
/// for the nearest project root. Strong markers (`botignore.toml`,
|
||||
|
||||
@@ -0,0 +1,530 @@
|
||||
//! Parse and merge `.botsecrets` TOML configuration files.
|
||||
//!
|
||||
//! The configuration is layered (most-specific wins):
|
||||
//!
|
||||
//! 1. Built-in defaults
|
||||
//! 2. `~/.config/fermata/.botsecrets` (user-global)
|
||||
//! 3. `<root>/.botsecrets` (project)
|
||||
//! 4. `<root>/.botsecrets.local` (local overrides, git-ignored)
|
||||
//!
|
||||
//! Vec fields like `files.patterns` are *replaced* by more-specific layers.
|
||||
//! `keys.include` and `keys.exclude` *accumulate* across layers.
|
||||
//! Scalar fields (style, mode, enabled) take the most-specific value.
|
||||
|
||||
use globset::{Glob, GlobMatcher};
|
||||
use serde::{Deserialize, Serialize};
|
||||
use std::path::{Path, PathBuf};
|
||||
use thiserror::Error;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Errors
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[derive(Debug, Error)]
|
||||
pub enum SecretsConfigError {
|
||||
#[error("io error reading {path}: {source}")]
|
||||
Io {
|
||||
path: PathBuf,
|
||||
source: std::io::Error,
|
||||
},
|
||||
#[error("TOML parse error in {path}: {source}")]
|
||||
Parse {
|
||||
path: PathBuf,
|
||||
source: toml::de::Error,
|
||||
},
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Config types
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Top-level `.botsecrets` configuration.
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct SecretsConfig {
|
||||
#[serde(default)]
|
||||
pub files: FilesConfig,
|
||||
#[serde(default)]
|
||||
pub keys: KeysConfig,
|
||||
#[serde(default)]
|
||||
pub redaction: RedactionConfig,
|
||||
#[serde(default)]
|
||||
pub heuristic: HeuristicConfig,
|
||||
#[serde(default)]
|
||||
pub enforcement: EnforcementConfig,
|
||||
#[serde(default, rename = "file")]
|
||||
pub file_overrides: Vec<FileOverride>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct FilesConfig {
|
||||
#[serde(default = "default_file_patterns")]
|
||||
pub patterns: Vec<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct KeysConfig {
|
||||
#[serde(default)]
|
||||
pub include: Vec<String>,
|
||||
#[serde(default)]
|
||||
pub exclude: Vec<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
|
||||
#[serde(rename_all = "kebab-case")]
|
||||
pub enum RedactionStyle {
|
||||
Masked,
|
||||
Typed,
|
||||
Named,
|
||||
Absent,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct RedactionConfig {
|
||||
#[serde(default = "default_redaction_style")]
|
||||
pub style: RedactionStyle,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
|
||||
#[serde(rename_all = "kebab-case")]
|
||||
pub enum HeuristicMode {
|
||||
Enforce,
|
||||
Report,
|
||||
Disabled,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct HeuristicConfig {
|
||||
#[serde(default = "default_true")]
|
||||
pub enabled: bool,
|
||||
#[serde(default = "default_heuristic_mode")]
|
||||
pub mode: HeuristicMode,
|
||||
#[serde(default)]
|
||||
pub patterns: Vec<String>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
|
||||
#[serde(rename_all = "kebab-case")]
|
||||
pub enum EnforcementMode {
|
||||
Strict,
|
||||
Permissive,
|
||||
Audit,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
|
||||
#[serde(rename_all = "kebab-case")]
|
||||
pub enum ParseErrorAction {
|
||||
MaskEntireFile,
|
||||
Allow,
|
||||
Deny,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct EnforcementConfig {
|
||||
#[serde(default = "default_enforcement_mode")]
|
||||
pub mode: EnforcementMode,
|
||||
#[serde(default = "default_parse_error_action")]
|
||||
pub on_parse_error: ParseErrorAction,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Deserialize, Serialize)]
|
||||
pub struct FileOverride {
|
||||
pub path: String,
|
||||
#[serde(default)]
|
||||
pub format: Option<String>,
|
||||
#[serde(default)]
|
||||
pub keys: Vec<String>,
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Built-in defaults
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
pub(crate) fn default_file_patterns() -> Vec<String> {
|
||||
vec![
|
||||
".env",
|
||||
".env.*",
|
||||
"*.env",
|
||||
"secrets.*",
|
||||
"credentials.*",
|
||||
"*.key",
|
||||
"*.pem",
|
||||
"*.p12",
|
||||
"*.pfx",
|
||||
"id_rsa",
|
||||
"id_ed25519",
|
||||
"id_ecdsa",
|
||||
"Secrets.toml",
|
||||
"Secrets.*.toml",
|
||||
"terraform.tfvars",
|
||||
"*.auto.tfvars",
|
||||
"terraform.tfstate",
|
||||
"*.tfstate",
|
||||
".docker/config.json",
|
||||
"config/master.key",
|
||||
"config/credentials/*.key",
|
||||
".aws/credentials",
|
||||
".netrc",
|
||||
".htpasswd",
|
||||
"service-account.json",
|
||||
"service-account-key.json",
|
||||
]
|
||||
.into_iter()
|
||||
.map(String::from)
|
||||
.collect()
|
||||
}
|
||||
|
||||
/// Built-in key name patterns that are always treated as sensitive.
|
||||
pub const BUILTIN_KEY_PATTERNS: &[&str] = &[
|
||||
"*PASSWORD*",
|
||||
"*PASSWD*",
|
||||
"*SECRET*",
|
||||
"*API_KEY*",
|
||||
"*APIKEY*",
|
||||
"*TOKEN*",
|
||||
"*ACCESS_KEY*",
|
||||
"*PRIVATE_KEY*",
|
||||
"*AUTH*",
|
||||
"*CREDENTIAL*",
|
||||
"*CONNECTION_STRING*",
|
||||
"*CONN_STR*",
|
||||
"DATABASE_URL",
|
||||
"REDIS_URL",
|
||||
"MONGODB_URI",
|
||||
"AMQP_URL",
|
||||
"AWS_SECRET_ACCESS_KEY",
|
||||
"AWS_ACCESS_KEY_ID",
|
||||
"AWS_SESSION_TOKEN",
|
||||
"GITHUB_TOKEN",
|
||||
"GH_TOKEN",
|
||||
"GITLAB_TOKEN",
|
||||
"NPM_TOKEN",
|
||||
"NODE_AUTH_TOKEN",
|
||||
"STRIPE_SECRET_KEY",
|
||||
"STRIPE_WEBHOOK_SECRET",
|
||||
"OPENAI_API_KEY",
|
||||
"ANTHROPIC_API_KEY",
|
||||
"SENTRY_DSN",
|
||||
"HEROKU_API_KEY",
|
||||
"SENDGRID_API_KEY",
|
||||
"JWT_SECRET",
|
||||
"JWT_SIGNING_KEY",
|
||||
"SESSION_SECRET",
|
||||
"ENCRYPTION_KEY",
|
||||
"ENCRYPT_KEY",
|
||||
"MASTER_KEY",
|
||||
"SIGNING_KEY",
|
||||
"SECRET_KEY",
|
||||
"SECRET_KEY_BASE",
|
||||
"APP_KEY",
|
||||
"NEXTAUTH_SECRET",
|
||||
];
|
||||
|
||||
fn default_redaction_style() -> RedactionStyle {
|
||||
RedactionStyle::Masked
|
||||
}
|
||||
|
||||
fn default_heuristic_mode() -> HeuristicMode {
|
||||
HeuristicMode::Enforce
|
||||
}
|
||||
|
||||
fn default_true() -> bool {
|
||||
true
|
||||
}
|
||||
|
||||
fn default_enforcement_mode() -> EnforcementMode {
|
||||
EnforcementMode::Permissive
|
||||
}
|
||||
|
||||
fn default_parse_error_action() -> ParseErrorAction {
|
||||
ParseErrorAction::MaskEntireFile
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Default impls
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
impl Default for SecretsConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
files: FilesConfig::default(),
|
||||
keys: KeysConfig::default(),
|
||||
redaction: RedactionConfig::default(),
|
||||
heuristic: HeuristicConfig::default(),
|
||||
enforcement: EnforcementConfig::default(),
|
||||
file_overrides: Vec::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for FilesConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
patterns: default_file_patterns(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for KeysConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
include: Vec::new(),
|
||||
exclude: Vec::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for RedactionConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
style: default_redaction_style(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for HeuristicConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
enabled: default_true(),
|
||||
mode: default_heuristic_mode(),
|
||||
patterns: Vec::new(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
impl Default for EnforcementConfig {
|
||||
fn default() -> Self {
|
||||
Self {
|
||||
mode: default_enforcement_mode(),
|
||||
on_parse_error: default_parse_error_action(),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Partial layer (for merge)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// A partially-specified config layer parsed from a single `.botsecrets` file.
|
||||
/// `Option`-wrapped fields distinguish "absent" from "explicitly set".
|
||||
#[derive(Debug, Clone, Default, Deserialize)]
|
||||
struct PartialSecretsConfig {
|
||||
#[serde(default)]
|
||||
files: Option<PartialFilesConfig>,
|
||||
#[serde(default)]
|
||||
keys: Option<PartialKeysConfig>,
|
||||
#[serde(default)]
|
||||
redaction: Option<PartialRedactionConfig>,
|
||||
#[serde(default)]
|
||||
heuristic: Option<PartialHeuristicConfig>,
|
||||
#[serde(default)]
|
||||
enforcement: Option<PartialEnforcementConfig>,
|
||||
#[serde(default, rename = "file")]
|
||||
file: Option<Vec<FileOverride>>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Default, Deserialize)]
|
||||
struct PartialFilesConfig {
|
||||
patterns: Option<Vec<String>>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Default, Deserialize)]
|
||||
struct PartialKeysConfig {
|
||||
include: Option<Vec<String>>,
|
||||
exclude: Option<Vec<String>>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Default, Deserialize)]
|
||||
struct PartialRedactionConfig {
|
||||
style: Option<RedactionStyle>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Default, Deserialize)]
|
||||
struct PartialHeuristicConfig {
|
||||
enabled: Option<bool>,
|
||||
mode: Option<HeuristicMode>,
|
||||
patterns: Option<Vec<String>>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Default, Deserialize)]
|
||||
struct PartialEnforcementConfig {
|
||||
mode: Option<EnforcementMode>,
|
||||
on_parse_error: Option<ParseErrorAction>,
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Merge logic
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
impl SecretsConfig {
|
||||
/// Apply a partial layer on top of `self`.
|
||||
///
|
||||
/// - Vec fields (`files.patterns`, `heuristic.patterns`, `file_overrides`):
|
||||
/// **replaced** by the layer's value when present.
|
||||
/// - `keys.include` / `keys.exclude`: **accumulated** (appended).
|
||||
/// - Scalar fields: overwritten when present in the layer.
|
||||
fn merge_layer(&mut self, layer: PartialSecretsConfig) {
|
||||
// files
|
||||
if let Some(f) = layer.files {
|
||||
if let Some(patterns) = f.patterns {
|
||||
self.files.patterns = patterns;
|
||||
}
|
||||
}
|
||||
|
||||
// keys (accumulate)
|
||||
if let Some(k) = layer.keys {
|
||||
if let Some(inc) = k.include {
|
||||
self.keys.include.extend(inc);
|
||||
}
|
||||
if let Some(exc) = k.exclude {
|
||||
self.keys.exclude.extend(exc);
|
||||
}
|
||||
}
|
||||
|
||||
// redaction
|
||||
if let Some(r) = layer.redaction {
|
||||
if let Some(style) = r.style {
|
||||
self.redaction.style = style;
|
||||
}
|
||||
}
|
||||
|
||||
// heuristic
|
||||
if let Some(h) = layer.heuristic {
|
||||
if let Some(enabled) = h.enabled {
|
||||
self.heuristic.enabled = enabled;
|
||||
}
|
||||
if let Some(mode) = h.mode {
|
||||
self.heuristic.mode = mode;
|
||||
}
|
||||
if let Some(patterns) = h.patterns {
|
||||
self.heuristic.patterns = patterns;
|
||||
}
|
||||
}
|
||||
|
||||
// enforcement
|
||||
if let Some(e) = layer.enforcement {
|
||||
if let Some(mode) = e.mode {
|
||||
self.enforcement.mode = mode;
|
||||
}
|
||||
if let Some(action) = e.on_parse_error {
|
||||
self.enforcement.on_parse_error = action;
|
||||
}
|
||||
}
|
||||
|
||||
// file overrides (replace)
|
||||
if let Some(overrides) = layer.file {
|
||||
self.file_overrides = overrides;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Loading & discovery
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Return the user-global fermata config directory.
|
||||
/// `~/.config/fermata` on Unix, `%APPDATA%/fermata` on Windows.
|
||||
fn user_config_dir() -> Option<PathBuf> {
|
||||
#[cfg(unix)]
|
||||
{
|
||||
std::env::var_os("HOME").map(|h| PathBuf::from(h).join(".config").join("fermata"))
|
||||
}
|
||||
#[cfg(windows)]
|
||||
{
|
||||
std::env::var_os("APPDATA").map(|a| PathBuf::from(a).join("fermata"))
|
||||
}
|
||||
}
|
||||
|
||||
impl SecretsConfig {
|
||||
/// Load `.botsecrets` configuration for a project.
|
||||
///
|
||||
/// Merges layers in order (most-specific wins):
|
||||
/// 1. Built-in defaults
|
||||
/// 2. `~/.config/fermata/.botsecrets`
|
||||
/// 3. `<root>/.botsecrets`
|
||||
/// 4. `<root>/.botsecrets.local`
|
||||
pub fn load(root: &Path) -> Result<Self, SecretsConfigError> {
|
||||
let mut config = Self::default();
|
||||
|
||||
// Layer 2: user-global
|
||||
if let Some(user_dir) = user_config_dir() {
|
||||
let user_file = user_dir.join(".botsecrets");
|
||||
if user_file.is_file() {
|
||||
let layer = Self::read_partial(&user_file)?;
|
||||
config.merge_layer(layer);
|
||||
}
|
||||
}
|
||||
|
||||
// Layer 3: project root
|
||||
let project_file = root.join(".botsecrets");
|
||||
if project_file.is_file() {
|
||||
let layer = Self::read_partial(&project_file)?;
|
||||
config.merge_layer(layer);
|
||||
}
|
||||
|
||||
// Layer 4: local overrides
|
||||
let local_file = root.join(".botsecrets.local");
|
||||
if local_file.is_file() {
|
||||
let layer = Self::read_partial(&local_file)?;
|
||||
config.merge_layer(layer);
|
||||
}
|
||||
|
||||
Ok(config)
|
||||
}
|
||||
|
||||
/// Parse a single `.botsecrets` file into a partial layer.
|
||||
fn read_partial(path: &Path) -> Result<PartialSecretsConfig, SecretsConfigError> {
|
||||
let text = std::fs::read_to_string(path).map_err(|e| SecretsConfigError::Io {
|
||||
path: path.to_path_buf(),
|
||||
source: e,
|
||||
})?;
|
||||
toml::from_str(&text).map_err(|e| SecretsConfigError::Parse {
|
||||
path: path.to_path_buf(),
|
||||
source: e,
|
||||
})
|
||||
}
|
||||
|
||||
/// Load from a TOML string (useful for testing and embedding).
|
||||
pub fn from_toml(toml_str: &str) -> Result<Self, toml::de::Error> {
|
||||
toml::from_str(toml_str)
|
||||
}
|
||||
|
||||
/// Returns the effective key-include patterns: built-in defaults + user
|
||||
/// `keys.include`, minus any pattern that appears in `keys.exclude`.
|
||||
pub fn effective_key_includes(&self) -> Vec<String> {
|
||||
let mut patterns: Vec<String> = BUILTIN_KEY_PATTERNS
|
||||
.iter()
|
||||
.map(|s| (*s).to_owned())
|
||||
.collect();
|
||||
patterns.extend(self.keys.include.iter().cloned());
|
||||
|
||||
// Remove excluded patterns (exact string match).
|
||||
if !self.keys.exclude.is_empty() {
|
||||
let exclude_set: std::collections::HashSet<&str> =
|
||||
self.keys.exclude.iter().map(|s| s.as_str()).collect();
|
||||
patterns.retain(|p| !exclude_set.contains(p.as_str()));
|
||||
}
|
||||
|
||||
patterns
|
||||
}
|
||||
|
||||
/// Check whether `key` matches any of the effective key-include patterns.
|
||||
///
|
||||
/// Matching is case-insensitive and uses glob semantics (`*` wildcards).
|
||||
pub fn key_matches(&self, key: &str) -> bool {
|
||||
let patterns = self.effective_key_includes();
|
||||
let upper = key.to_ascii_uppercase();
|
||||
|
||||
for pat in &patterns {
|
||||
let pat_upper = pat.to_ascii_uppercase();
|
||||
// Build a glob matcher. Patterns without path separators are
|
||||
// matched as plain globs against the key name.
|
||||
if let Ok(glob) = Glob::new(&pat_upper) {
|
||||
let matcher: GlobMatcher = glob.compile_matcher();
|
||||
if matcher.is_match(&upper) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
}
|
||||
false
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,310 @@
|
||||
//! Secret manifest loader.
|
||||
//!
|
||||
//! Discovers secret files per the `.botsecrets` configuration, parses them,
|
||||
//! filters by key patterns, and produces the known-secrets set that the
|
||||
//! Redactor will consume.
|
||||
|
||||
use std::path::{Path, PathBuf};
|
||||
|
||||
use globset::{Glob, GlobSetBuilder};
|
||||
use thiserror::Error;
|
||||
use walkdir::WalkDir;
|
||||
|
||||
use super::config::{ParseErrorAction, SecretsConfig};
|
||||
use super::parser::{self, FileFormat, ParseError, SecretEntry};
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Errors
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[derive(Debug, Error)]
|
||||
pub enum ManifestError {
|
||||
#[error(transparent)]
|
||||
Parse(#[from] ParseError),
|
||||
#[error("glob pattern error: {0}")]
|
||||
Glob(String),
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Manifest
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// The complete set of known secrets discovered from a project.
|
||||
///
|
||||
/// Entries are sorted by value length descending (longest first) so the
|
||||
/// redactor replaces the most specific match before shorter substrings.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Manifest {
|
||||
entries: Vec<SecretEntry>,
|
||||
}
|
||||
|
||||
/// Minimum secret value length to keep. Anything shorter risks false-positive
|
||||
/// redaction (e.g. `"yes"`, `"on"`, `"42"`).
|
||||
const MIN_VALUE_LEN: usize = 4;
|
||||
|
||||
/// Directories that are unconditionally skipped during file discovery.
|
||||
const SKIP_DIRS: &[&str] = &[".git", "node_modules", "target", "__pycache__", ".venv"];
|
||||
|
||||
impl Manifest {
|
||||
/// Build a manifest by discovering and parsing secret files relative to
|
||||
/// `root`.
|
||||
pub fn build(config: &SecretsConfig, root: &Path) -> Result<Self, ManifestError> {
|
||||
let mut entries = Vec::new();
|
||||
|
||||
// 1. Discover files matching `config.files.patterns`.
|
||||
let discovered = discover_files(&config.files.patterns, root)?;
|
||||
|
||||
// 2. Parse each discovered file.
|
||||
for path in &discovered {
|
||||
match parse_discovered_file(path) {
|
||||
Ok(file_entries) => entries.extend(file_entries),
|
||||
Err(e) => match config.enforcement.on_parse_error {
|
||||
ParseErrorAction::Allow => {
|
||||
eprintln!(
|
||||
"fermata: warning: skipping unparseable file {}: {}",
|
||||
path.display(),
|
||||
e
|
||||
);
|
||||
}
|
||||
ParseErrorAction::Deny => {
|
||||
return Err(e.into());
|
||||
}
|
||||
ParseErrorAction::MaskEntireFile => {
|
||||
// We cannot extract individual secrets — the redactor
|
||||
// may choose to mask the entire file content if it
|
||||
// appears in output. For now we log and continue.
|
||||
eprintln!(
|
||||
"fermata: warning: cannot parse {}: {}",
|
||||
path.display(),
|
||||
e
|
||||
);
|
||||
}
|
||||
},
|
||||
}
|
||||
}
|
||||
|
||||
// 3. Filter discovered entries by the effective key patterns.
|
||||
entries = filter_by_key_patterns(entries, config);
|
||||
|
||||
// 4. Process explicit `[[file]]` overrides — these bypass key filtering
|
||||
// because the user declared them intentionally.
|
||||
for override_cfg in &config.file_overrides {
|
||||
let override_path = root.join(&override_cfg.path);
|
||||
if !override_path.is_file() {
|
||||
continue;
|
||||
}
|
||||
|
||||
let format = override_cfg
|
||||
.format
|
||||
.as_deref()
|
||||
.and_then(FileFormat::from_hint);
|
||||
|
||||
let key_filter = if override_cfg.keys.is_empty() {
|
||||
None
|
||||
} else {
|
||||
Some(override_cfg.keys.as_slice())
|
||||
};
|
||||
|
||||
match parser::parse_secret_file(&override_path, format, key_filter) {
|
||||
Ok(file_entries) => entries.extend(file_entries),
|
||||
Err(e) => {
|
||||
eprintln!(
|
||||
"fermata: warning: cannot parse override file {}: {}",
|
||||
override_path.display(),
|
||||
e
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// 5. Deduplicate (same key + value from different discovery paths).
|
||||
entries.sort_by(|a, b| a.key.cmp(&b.key).then_with(|| a.value.cmp(&b.value)));
|
||||
entries.dedup_by(|a, b| a.key == b.key && a.value == b.value);
|
||||
|
||||
// 6. Sort by value length descending (longest first for redaction).
|
||||
entries.sort_by(|a, b| b.value.len().cmp(&a.value.len()));
|
||||
|
||||
// 7. Remove entries with very short values to avoid false replacements.
|
||||
entries.retain(|e| e.value.len() >= MIN_VALUE_LEN);
|
||||
|
||||
Ok(Self { entries })
|
||||
}
|
||||
|
||||
/// Build a manifest from a pre-built list of secret entries.
|
||||
///
|
||||
/// Applies the same post-processing as [`Manifest::build`]:
|
||||
/// - Deduplicates entries with the same key and value.
|
||||
/// - Sorts by value length descending (longest first for redaction).
|
||||
/// - Removes entries with values shorter than 4 characters.
|
||||
///
|
||||
/// Useful for testing and for library consumers that obtain secrets
|
||||
/// from sources other than filesystem discovery.
|
||||
pub fn from_entries(mut entries: Vec<SecretEntry>) -> Self {
|
||||
// Deduplicate (same key + value).
|
||||
entries.sort_by(|a, b| a.key.cmp(&b.key).then_with(|| a.value.cmp(&b.value)));
|
||||
entries.dedup_by(|a, b| a.key == b.key && a.value == b.value);
|
||||
|
||||
// Sort by value length descending (longest first for redaction).
|
||||
entries.sort_by(|a, b| b.value.len().cmp(&a.value.len()));
|
||||
|
||||
// Remove entries with very short values to avoid false replacements.
|
||||
entries.retain(|e| e.value.len() >= MIN_VALUE_LEN);
|
||||
|
||||
Self { entries }
|
||||
}
|
||||
|
||||
/// Build an empty manifest (no secrets known).
|
||||
pub fn empty() -> Self {
|
||||
Self {
|
||||
entries: Vec::new(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns all discovered secret entries.
|
||||
pub fn entries(&self) -> &[SecretEntry] {
|
||||
&self.entries
|
||||
}
|
||||
|
||||
/// Returns `true` if the manifest contains no secrets.
|
||||
pub fn is_empty(&self) -> bool {
|
||||
self.entries.is_empty()
|
||||
}
|
||||
|
||||
/// Number of known secrets.
|
||||
pub fn len(&self) -> usize {
|
||||
self.entries.len()
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// File discovery
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Walk the project tree and collect files matching any of the given glob
|
||||
/// patterns. Patterns are matched against paths *relative to* `root`.
|
||||
fn discover_files(patterns: &[String], root: &Path) -> Result<Vec<PathBuf>, ManifestError> {
|
||||
if patterns.is_empty() {
|
||||
return Ok(Vec::new());
|
||||
}
|
||||
|
||||
// Compile all patterns into a single GlobSet for efficient matching.
|
||||
let mut builder = GlobSetBuilder::new();
|
||||
for pat in patterns {
|
||||
// `globset` patterns match against the full relative path including
|
||||
// intermediate directories (e.g. `.docker/config.json`). We add
|
||||
// both the literal pattern and a `**/` prefixed variant so that
|
||||
// `.env` matches at the root and `subdir/.env` matches nested.
|
||||
let glob = Glob::new(pat).map_err(|e| ManifestError::Glob(e.to_string()))?;
|
||||
builder.add(glob);
|
||||
|
||||
// Also match nested occurrences: `**/<pattern>`.
|
||||
if !pat.contains('/') {
|
||||
let nested = format!("**/{pat}");
|
||||
let nested_glob =
|
||||
Glob::new(&nested).map_err(|e| ManifestError::Glob(e.to_string()))?;
|
||||
builder.add(nested_glob);
|
||||
}
|
||||
}
|
||||
let glob_set = builder.build().map_err(|e| ManifestError::Glob(e.to_string()))?;
|
||||
|
||||
let mut result = Vec::new();
|
||||
|
||||
for entry in WalkDir::new(root).follow_links(false) {
|
||||
let entry = match entry {
|
||||
Ok(e) => e,
|
||||
Err(_) => continue,
|
||||
};
|
||||
|
||||
// Skip common large / non-project directories.
|
||||
if entry.file_type().is_dir() {
|
||||
if let Some(name) = entry.file_name().to_str() {
|
||||
if SKIP_DIRS.contains(&name) {
|
||||
// WalkDir does not support in-place skip, but we simply
|
||||
// won't match anything under these dirs because we check
|
||||
// the dir name on each entry. We continue and let non-file
|
||||
// entries fall through.
|
||||
continue;
|
||||
}
|
||||
}
|
||||
continue; // Only interested in files.
|
||||
}
|
||||
|
||||
if !entry.file_type().is_file() {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Check that no ancestor directory is in the skip list.
|
||||
let abs_path = entry.path();
|
||||
if has_skipped_ancestor(abs_path, root) {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Match relative path against the glob set.
|
||||
let rel = match abs_path.strip_prefix(root) {
|
||||
Ok(r) => r,
|
||||
Err(_) => continue,
|
||||
};
|
||||
|
||||
if glob_set.is_match(rel) {
|
||||
result.push(abs_path.to_path_buf());
|
||||
}
|
||||
}
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
/// Returns `true` if any path component between `root` and `path` is in
|
||||
/// [`SKIP_DIRS`].
|
||||
fn has_skipped_ancestor(path: &Path, root: &Path) -> bool {
|
||||
if let Ok(rel) = path.strip_prefix(root) {
|
||||
for component in rel.parent().into_iter().flat_map(|p| p.components()) {
|
||||
if let Some(name) = component.as_os_str().to_str() {
|
||||
if SKIP_DIRS.contains(&name) {
|
||||
return true;
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
false
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Single-file parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Parse a single discovered file. Auto-detects format from extension.
|
||||
/// Returns an empty `Vec` if the format cannot be determined (e.g. `.key`,
|
||||
/// `.pem` — opaque/binary files).
|
||||
fn parse_discovered_file(path: &Path) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let format = match FileFormat::from_path(path) {
|
||||
Some(fmt) => fmt,
|
||||
None => return Ok(Vec::new()), // opaque file — skip
|
||||
};
|
||||
parser::parse_secret_file(path, Some(format), None)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Key-pattern filtering
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Keep only entries whose key matches the effective key-include patterns
|
||||
/// from the configuration.
|
||||
fn filter_by_key_patterns(entries: Vec<SecretEntry>, config: &SecretsConfig) -> Vec<SecretEntry> {
|
||||
entries
|
||||
.into_iter()
|
||||
.filter(|e| config.key_matches(&e.key))
|
||||
.collect()
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn empty_manifest() {
|
||||
let m = Manifest::empty();
|
||||
assert!(m.is_empty());
|
||||
assert_eq!(m.len(), 0);
|
||||
assert!(m.entries().is_empty());
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,15 @@
|
||||
//! Secret-filtering configuration (`.botsecrets` files), multi-format
|
||||
//! secret file parsing, and heuristic scanning.
|
||||
|
||||
pub mod config;
|
||||
pub mod manifest;
|
||||
pub mod parser;
|
||||
pub mod patterns;
|
||||
pub mod redactor;
|
||||
pub mod scanner;
|
||||
|
||||
pub use config::SecretsConfig;
|
||||
pub use manifest::{Manifest, ManifestError};
|
||||
pub use parser::{parse_secret_file, FileFormat, ParseError, SecretEntry};
|
||||
pub use redactor::{RedactedText, Redaction, Redactor};
|
||||
pub use scanner::{Confidence, Finding, Scanner};
|
||||
@@ -0,0 +1,517 @@
|
||||
//! Multi-format secret file parser.
|
||||
//!
|
||||
//! Reads secret files (`.env`, TOML, JSON, YAML, Python assignments,
|
||||
//! Java `.properties`) and extracts key-value pairs as [`SecretEntry`] items.
|
||||
//! Nested structures are flattened with dot-separated keys.
|
||||
|
||||
use globset::Glob;
|
||||
use std::path::{Path, PathBuf};
|
||||
use thiserror::Error;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Errors
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[derive(Debug, Error)]
|
||||
pub enum ParseError {
|
||||
#[error("io error reading {path}: {source}")]
|
||||
Io {
|
||||
path: PathBuf,
|
||||
source: std::io::Error,
|
||||
},
|
||||
#[error("parse error in {path}: {message}")]
|
||||
Format { path: PathBuf, message: String },
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Types
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// A single secret extracted from a file.
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct SecretEntry {
|
||||
/// The key name (e.g. `"DATABASE_URL"`, `"spring.datasource.password"`).
|
||||
pub key: String,
|
||||
/// The secret value.
|
||||
pub value: String,
|
||||
/// Which file the entry came from.
|
||||
pub source: PathBuf,
|
||||
}
|
||||
|
||||
/// Supported secret-file formats.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum FileFormat {
|
||||
/// `.env` / dotenv files.
|
||||
Env,
|
||||
/// TOML files (e.g. `Secrets.toml`).
|
||||
Toml,
|
||||
/// JSON files.
|
||||
Json,
|
||||
/// YAML files.
|
||||
Yaml,
|
||||
/// Python-style assignments: `KEY = "value"` or `KEY = 'value'`.
|
||||
PythonAssignments,
|
||||
/// Java `.properties` files: `key=value` or `key: value`.
|
||||
Properties,
|
||||
}
|
||||
|
||||
impl FileFormat {
|
||||
/// Guess format from file extension/name.
|
||||
pub fn from_path(path: &Path) -> Option<Self> {
|
||||
let name = path.file_name()?.to_str()?;
|
||||
let ext = path.extension().and_then(|e| e.to_str());
|
||||
|
||||
// .env, .env.local, .env.production, etc.
|
||||
if name.starts_with(".env") || name.ends_with(".env") {
|
||||
return Some(Self::Env);
|
||||
}
|
||||
|
||||
match ext {
|
||||
Some("toml") => Some(Self::Toml),
|
||||
Some("json") => Some(Self::Json),
|
||||
Some("yaml" | "yml") => Some(Self::Yaml),
|
||||
Some("py") => Some(Self::PythonAssignments),
|
||||
Some("properties") => Some(Self::Properties),
|
||||
// .key, .pem, etc. are binary/opaque — not parseable as key-value.
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse from the `format` string used in `.botsecrets` `[[file]]` overrides.
|
||||
pub fn from_hint(hint: &str) -> Option<Self> {
|
||||
match hint {
|
||||
"env" | "dotenv" => Some(Self::Env),
|
||||
"toml" => Some(Self::Toml),
|
||||
"json" => Some(Self::Json),
|
||||
"yaml" | "yml" => Some(Self::Yaml),
|
||||
"python-assignments" | "python" => Some(Self::PythonAssignments),
|
||||
"properties" | "java-properties" => Some(Self::Properties),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Public API
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Parse a secret file and extract key-value entries.
|
||||
///
|
||||
/// If `format` is `None`, auto-detects from path extension.
|
||||
/// If `key_filter` is `Some`, only entries whose keys match at least one
|
||||
/// glob pattern are returned.
|
||||
pub fn parse_secret_file(
|
||||
path: &Path,
|
||||
format: Option<FileFormat>,
|
||||
key_filter: Option<&[String]>,
|
||||
) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let content = std::fs::read_to_string(path).map_err(|e| ParseError::Io {
|
||||
path: path.to_path_buf(),
|
||||
source: e,
|
||||
})?;
|
||||
|
||||
let fmt = format
|
||||
.or_else(|| FileFormat::from_path(path))
|
||||
.ok_or_else(|| ParseError::Format {
|
||||
path: path.to_path_buf(),
|
||||
message: "cannot determine file format".into(),
|
||||
})?;
|
||||
|
||||
let entries = parse_content(&content, fmt, path)?;
|
||||
|
||||
match key_filter {
|
||||
Some(keys) => Ok(filter_entries(entries, keys)),
|
||||
None => Ok(entries),
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse content string without reading from disk (useful for testing).
|
||||
pub fn parse_content(
|
||||
content: &str,
|
||||
format: FileFormat,
|
||||
source: &Path,
|
||||
) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
match format {
|
||||
FileFormat::Env => parse_env(content, source),
|
||||
FileFormat::Toml => parse_toml(content, source),
|
||||
FileFormat::Json => parse_json(content, source),
|
||||
FileFormat::Yaml => parse_yaml(content, source),
|
||||
FileFormat::PythonAssignments => parse_python_assignments(content, source),
|
||||
FileFormat::Properties => parse_properties(content, source),
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Format parsers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Parse `.env` / dotenv files.
|
||||
///
|
||||
/// Supports `KEY=VALUE`, `KEY="VALUE"`, `KEY='VALUE'`, and the `export`
|
||||
/// prefix. Comments (`#`) and empty lines are skipped.
|
||||
fn parse_env(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let mut entries = Vec::new();
|
||||
|
||||
for line in content.lines() {
|
||||
let trimmed = line.trim();
|
||||
|
||||
// Skip blank lines and comments.
|
||||
if trimmed.is_empty() || trimmed.starts_with('#') {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Strip optional `export ` prefix.
|
||||
let trimmed = trimmed
|
||||
.strip_prefix("export ")
|
||||
.or_else(|| trimmed.strip_prefix("export\t"))
|
||||
.unwrap_or(trimmed);
|
||||
|
||||
// Split on first `=`.
|
||||
let Some((key, raw_value)) = trimmed.split_once('=') else {
|
||||
continue;
|
||||
};
|
||||
|
||||
let key = key.trim().to_string();
|
||||
if key.is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
let value = strip_env_value(raw_value);
|
||||
|
||||
entries.push(SecretEntry {
|
||||
key,
|
||||
value,
|
||||
source: source.to_path_buf(),
|
||||
});
|
||||
}
|
||||
|
||||
Ok(entries)
|
||||
}
|
||||
|
||||
/// Strip surrounding quotes and trailing whitespace from an env value.
|
||||
fn strip_env_value(raw: &str) -> String {
|
||||
let trimmed = raw.trim();
|
||||
|
||||
// Double-quoted value.
|
||||
if trimmed.starts_with('"') && trimmed.ends_with('"') && trimmed.len() >= 2 {
|
||||
let inner = &trimmed[1..trimmed.len() - 1];
|
||||
// Interpret common escape sequences.
|
||||
return inner.replace("\\n", "\n").replace("\\t", "\t");
|
||||
}
|
||||
|
||||
// Single-quoted value (literal, no escapes).
|
||||
if trimmed.starts_with('\'') && trimmed.ends_with('\'') && trimmed.len() >= 2 {
|
||||
return trimmed[1..trimmed.len() - 1].to_string();
|
||||
}
|
||||
|
||||
// Unquoted — trim trailing whitespace (already trimmed above) and strip
|
||||
// inline comments.
|
||||
if let Some(pos) = trimmed.find(" #") {
|
||||
trimmed[..pos].trim_end().to_string()
|
||||
} else {
|
||||
trimmed.to_string()
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse TOML files. Nested tables are flattened with dot separators.
|
||||
/// Only string values are extracted.
|
||||
fn parse_toml(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let table: toml::Value = content.parse().map_err(|e: toml::de::Error| ParseError::Format {
|
||||
path: source.to_path_buf(),
|
||||
message: e.to_string(),
|
||||
})?;
|
||||
|
||||
let mut entries = Vec::new();
|
||||
flatten_toml_value(&table, "", source, &mut entries);
|
||||
Ok(entries)
|
||||
}
|
||||
|
||||
fn flatten_toml_value(
|
||||
value: &toml::Value,
|
||||
prefix: &str,
|
||||
source: &Path,
|
||||
entries: &mut Vec<SecretEntry>,
|
||||
) {
|
||||
match value {
|
||||
toml::Value::String(s) => {
|
||||
if !prefix.is_empty() {
|
||||
entries.push(SecretEntry {
|
||||
key: prefix.to_string(),
|
||||
value: s.clone(),
|
||||
source: source.to_path_buf(),
|
||||
});
|
||||
}
|
||||
}
|
||||
toml::Value::Table(map) => {
|
||||
for (k, v) in map {
|
||||
let key = if prefix.is_empty() {
|
||||
k.clone()
|
||||
} else {
|
||||
format!("{prefix}.{k}")
|
||||
};
|
||||
flatten_toml_value(v, &key, source, entries);
|
||||
}
|
||||
}
|
||||
toml::Value::Array(arr) => {
|
||||
for (i, v) in arr.iter().enumerate() {
|
||||
let key = if prefix.is_empty() {
|
||||
i.to_string()
|
||||
} else {
|
||||
format!("{prefix}.{i}")
|
||||
};
|
||||
flatten_toml_value(v, &key, source, entries);
|
||||
}
|
||||
}
|
||||
// Integer, Float, Boolean, Datetime — skip, not secrets.
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse JSON files. Nested objects are flattened with dot separators.
|
||||
/// Arrays use numeric indices. Only string values are extracted.
|
||||
fn parse_json(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let value: serde_json::Value =
|
||||
serde_json::from_str(content).map_err(|e| ParseError::Format {
|
||||
path: source.to_path_buf(),
|
||||
message: e.to_string(),
|
||||
})?;
|
||||
|
||||
let mut entries = Vec::new();
|
||||
flatten_json_value(&value, "", source, &mut entries);
|
||||
Ok(entries)
|
||||
}
|
||||
|
||||
fn flatten_json_value(
|
||||
value: &serde_json::Value,
|
||||
prefix: &str,
|
||||
source: &Path,
|
||||
entries: &mut Vec<SecretEntry>,
|
||||
) {
|
||||
match value {
|
||||
serde_json::Value::String(s) => {
|
||||
if !prefix.is_empty() {
|
||||
entries.push(SecretEntry {
|
||||
key: prefix.to_string(),
|
||||
value: s.clone(),
|
||||
source: source.to_path_buf(),
|
||||
});
|
||||
}
|
||||
}
|
||||
serde_json::Value::Object(map) => {
|
||||
for (k, v) in map {
|
||||
let key = if prefix.is_empty() {
|
||||
k.clone()
|
||||
} else {
|
||||
format!("{prefix}.{k}")
|
||||
};
|
||||
flatten_json_value(v, &key, source, entries);
|
||||
}
|
||||
}
|
||||
serde_json::Value::Array(arr) => {
|
||||
for (i, v) in arr.iter().enumerate() {
|
||||
let key = if prefix.is_empty() {
|
||||
i.to_string()
|
||||
} else {
|
||||
format!("{prefix}.{i}")
|
||||
};
|
||||
flatten_json_value(v, &key, source, entries);
|
||||
}
|
||||
}
|
||||
// Number, Bool, Null — skip.
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse YAML files. Nested mappings are flattened with dot separators.
|
||||
/// Only string values are extracted.
|
||||
fn parse_yaml(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let value: serde_yaml::Value =
|
||||
serde_yaml::from_str(content).map_err(|e| ParseError::Format {
|
||||
path: source.to_path_buf(),
|
||||
message: e.to_string(),
|
||||
})?;
|
||||
|
||||
let mut entries = Vec::new();
|
||||
flatten_yaml_value(&value, "", source, &mut entries);
|
||||
Ok(entries)
|
||||
}
|
||||
|
||||
fn flatten_yaml_value(
|
||||
value: &serde_yaml::Value,
|
||||
prefix: &str,
|
||||
source: &Path,
|
||||
entries: &mut Vec<SecretEntry>,
|
||||
) {
|
||||
match value {
|
||||
serde_yaml::Value::String(s) => {
|
||||
if !prefix.is_empty() {
|
||||
entries.push(SecretEntry {
|
||||
key: prefix.to_string(),
|
||||
value: s.clone(),
|
||||
source: source.to_path_buf(),
|
||||
});
|
||||
}
|
||||
}
|
||||
serde_yaml::Value::Mapping(map) => {
|
||||
for (k, v) in map {
|
||||
let k_str = match k {
|
||||
serde_yaml::Value::String(s) => s.clone(),
|
||||
other => format!("{other:?}"),
|
||||
};
|
||||
let key = if prefix.is_empty() {
|
||||
k_str
|
||||
} else {
|
||||
format!("{prefix}.{k_str}")
|
||||
};
|
||||
flatten_yaml_value(v, &key, source, entries);
|
||||
}
|
||||
}
|
||||
serde_yaml::Value::Sequence(arr) => {
|
||||
for (i, v) in arr.iter().enumerate() {
|
||||
let key = if prefix.is_empty() {
|
||||
i.to_string()
|
||||
} else {
|
||||
format!("{prefix}.{i}")
|
||||
};
|
||||
flatten_yaml_value(v, &key, source, entries);
|
||||
}
|
||||
}
|
||||
// Number, Bool, Null, Tagged — skip.
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
/// Parse Python-style assignment lines: `KEY = "value"` or `KEY = 'value'`.
|
||||
///
|
||||
/// This is heuristic — lines that don't match the pattern are silently skipped.
|
||||
fn parse_python_assignments(
|
||||
content: &str,
|
||||
source: &Path,
|
||||
) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let re = regex::Regex::new(r#"(?i)^([A-Z_][A-Z0-9_]*)\s*=\s*['"](.+?)['"]\s*$"#)
|
||||
.expect("valid regex");
|
||||
|
||||
let mut entries = Vec::new();
|
||||
|
||||
for line in content.lines() {
|
||||
let trimmed = line.trim();
|
||||
if trimmed.is_empty() || trimmed.starts_with('#') {
|
||||
continue;
|
||||
}
|
||||
if let Some(caps) = re.captures(trimmed) {
|
||||
entries.push(SecretEntry {
|
||||
key: caps[1].to_string(),
|
||||
value: caps[2].to_string(),
|
||||
source: source.to_path_buf(),
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
Ok(entries)
|
||||
}
|
||||
|
||||
/// Parse Java `.properties` files.
|
||||
///
|
||||
/// Supports `key=value`, `key: value`, `key value` (space separator).
|
||||
/// Lines starting with `#` or `!` are comments. Continuation lines ending
|
||||
/// with `\` are joined.
|
||||
fn parse_properties(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
|
||||
let mut entries = Vec::new();
|
||||
let mut lines = content.lines().peekable();
|
||||
|
||||
while let Some(line) = lines.next() {
|
||||
let trimmed = line.trim();
|
||||
|
||||
// Skip blank lines and comments.
|
||||
if trimmed.is_empty() || trimmed.starts_with('#') || trimmed.starts_with('!') {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Handle continuation lines (trailing `\`).
|
||||
let mut logical_line = String::new();
|
||||
let mut current = trimmed.to_string();
|
||||
while current.ends_with('\\') {
|
||||
// Remove trailing backslash and append next line.
|
||||
logical_line.push_str(¤t[..current.len() - 1]);
|
||||
current = lines
|
||||
.next()
|
||||
.map(|l| l.trim_start().to_string())
|
||||
.unwrap_or_default();
|
||||
}
|
||||
logical_line.push_str(¤t);
|
||||
|
||||
// Split on first `=`, `:`, or whitespace.
|
||||
let (key, value) = split_property_line(&logical_line);
|
||||
if key.is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
entries.push(SecretEntry {
|
||||
key,
|
||||
value,
|
||||
source: source.to_path_buf(),
|
||||
});
|
||||
}
|
||||
|
||||
Ok(entries)
|
||||
}
|
||||
|
||||
/// Split a logical properties line into (key, value).
|
||||
/// Recognises `=`, `:`, or whitespace as the separator.
|
||||
fn split_property_line(line: &str) -> (String, String) {
|
||||
// Find the first unescaped separator.
|
||||
let mut i = 0;
|
||||
let bytes = line.as_bytes();
|
||||
let len = bytes.len();
|
||||
|
||||
while i < len {
|
||||
// Skip escaped characters.
|
||||
if bytes[i] == b'\\' {
|
||||
i += 2;
|
||||
continue;
|
||||
}
|
||||
if bytes[i] == b'=' || bytes[i] == b':' {
|
||||
let key = line[..i].trim().to_string();
|
||||
let value = line[i + 1..].trim().to_string();
|
||||
return (key, value);
|
||||
}
|
||||
if bytes[i] == b' ' || bytes[i] == b'\t' {
|
||||
let key = line[..i].trim().to_string();
|
||||
let value = line[i..].trim().to_string();
|
||||
return (key, value);
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
|
||||
// No separator found — the entire line is a key with an empty value.
|
||||
(line.trim().to_string(), String::new())
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Key filtering
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Filter entries by glob patterns (case-insensitive).
|
||||
fn filter_entries(entries: Vec<SecretEntry>, patterns: &[String]) -> Vec<SecretEntry> {
|
||||
// Pre-compile matchers.
|
||||
let matchers: Vec<_> = patterns
|
||||
.iter()
|
||||
.filter_map(|p| {
|
||||
Glob::new(&p.to_ascii_uppercase())
|
||||
.ok()
|
||||
.map(|g| g.compile_matcher())
|
||||
})
|
||||
.collect();
|
||||
|
||||
if matchers.is_empty() {
|
||||
return Vec::new();
|
||||
}
|
||||
|
||||
entries
|
||||
.into_iter()
|
||||
.filter(|entry| {
|
||||
let upper = entry.key.to_ascii_uppercase();
|
||||
matchers.iter().any(|m| m.is_match(&upper))
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
@@ -0,0 +1,258 @@
|
||||
//! Built-in regex patterns for heuristic secret detection.
|
||||
//!
|
||||
//! Rules are derived from [gitleaks](https://github.com/gitleaks/gitleaks) (MIT license)
|
||||
//! and curated for high-confidence detection in AI agent output streams.
|
||||
|
||||
use std::borrow::Cow;
|
||||
|
||||
/// A single detection rule.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct DetectionRule {
|
||||
/// Unique identifier (e.g. `"aws-access-key"`, `"github-pat"`).
|
||||
pub id: Cow<'static, str>,
|
||||
/// Human-readable description.
|
||||
pub description: Cow<'static, str>,
|
||||
/// Regex pattern string.
|
||||
pub pattern: Cow<'static, str>,
|
||||
/// Minimum Shannon entropy threshold for matched text.
|
||||
/// `None` means no entropy check — the pattern alone is sufficient.
|
||||
pub entropy_threshold: Option<f64>,
|
||||
}
|
||||
|
||||
/// Returns the built-in detection rules (gitleaks-derived, MIT licensed).
|
||||
pub fn builtin_rules() -> &'static [DetectionRule] {
|
||||
&RULES
|
||||
}
|
||||
|
||||
/// Convenience macro to define a static rule with `Cow::Borrowed`.
|
||||
macro_rules! rule {
|
||||
($id:expr, $desc:expr, $pat:expr) => {
|
||||
DetectionRule {
|
||||
id: Cow::Borrowed($id),
|
||||
description: Cow::Borrowed($desc),
|
||||
pattern: Cow::Borrowed($pat),
|
||||
entropy_threshold: None,
|
||||
}
|
||||
};
|
||||
($id:expr, $desc:expr, $pat:expr, entropy: $threshold:expr) => {
|
||||
DetectionRule {
|
||||
id: Cow::Borrowed($id),
|
||||
description: Cow::Borrowed($desc),
|
||||
pattern: Cow::Borrowed($pat),
|
||||
entropy_threshold: Some($threshold),
|
||||
}
|
||||
};
|
||||
}
|
||||
|
||||
static RULES: [DetectionRule; 35] = [
|
||||
// -----------------------------------------------------------------------
|
||||
// Cloud provider keys
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"aws-access-key",
|
||||
"AWS Access Key ID",
|
||||
r"(?:A3T[A-Z0-9]|AKIA|ASIA|ABIA|ACCA)[A-Z2-7]{16}"
|
||||
),
|
||||
rule!(
|
||||
"aws-secret-key",
|
||||
"AWS Secret Access Key (near assignment)",
|
||||
r"(?i)aws[_\-\.]?secret[_\-\.]?access[_\-\.]?key[\s]*[=:\s]+[\s]*['\x22]?([A-Za-z0-9/+=]{40})['\x22]?"
|
||||
),
|
||||
rule!(
|
||||
"gcp-api-key",
|
||||
"GCP API Key",
|
||||
r"AIza[0-9A-Za-z\-_]{35}"
|
||||
),
|
||||
rule!(
|
||||
"gcp-service-account",
|
||||
"GCP Service Account JSON",
|
||||
r#"\x22type\x22\s*:\s*\x22service_account\x22"#
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Code hosting tokens
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"github-pat-fine-grained",
|
||||
"GitHub Fine-Grained Personal Access Token",
|
||||
r"github_pat_[A-Za-z0-9_]{82}"
|
||||
),
|
||||
rule!(
|
||||
"github-pat-classic",
|
||||
"GitHub Classic Personal Access Token",
|
||||
r"ghp_[A-Za-z0-9]{36}"
|
||||
),
|
||||
rule!(
|
||||
"github-oauth",
|
||||
"GitHub OAuth Access Token",
|
||||
r"gho_[A-Za-z0-9]{36}"
|
||||
),
|
||||
rule!(
|
||||
"github-app-user-token",
|
||||
"GitHub App User-to-Server Token",
|
||||
r"ghu_[A-Za-z0-9]{36}"
|
||||
),
|
||||
rule!(
|
||||
"github-app-server-token",
|
||||
"GitHub App Server-to-Server Token",
|
||||
r"ghs_[A-Za-z0-9]{36}"
|
||||
),
|
||||
rule!(
|
||||
"gitlab-pat",
|
||||
"GitLab Personal Access Token",
|
||||
r"glpat-[A-Za-z0-9\-_]{20,}"
|
||||
),
|
||||
rule!(
|
||||
"gitlab-pipeline-token",
|
||||
"GitLab Pipeline Trigger Token",
|
||||
r"glptt-[A-Za-z0-9\-_]{20,}"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Payment
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"stripe-secret-key",
|
||||
"Stripe Secret Key",
|
||||
r"sk_live_[A-Za-z0-9]{24,}"
|
||||
),
|
||||
rule!(
|
||||
"stripe-restricted-key",
|
||||
"Stripe Restricted Key",
|
||||
r"rk_live_[A-Za-z0-9]{24,}"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Communication
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"slack-bot-token",
|
||||
"Slack Bot Token",
|
||||
r"xoxb-[0-9]{10,}-[0-9]{10,}-[A-Za-z0-9]{24,}"
|
||||
),
|
||||
rule!(
|
||||
"slack-user-token",
|
||||
"Slack User Token",
|
||||
r"xoxp-[0-9]{10,}-[0-9]{10,}-[0-9]{10,}-[a-z0-9]{32}"
|
||||
),
|
||||
rule!(
|
||||
"slack-webhook",
|
||||
"Slack Incoming Webhook URL",
|
||||
r"https://hooks\.slack\.com/services/T[A-Z0-9]{8,}/B[A-Z0-9]{8,}/[A-Za-z0-9]{24,}"
|
||||
),
|
||||
rule!(
|
||||
"twilio-api-key",
|
||||
"Twilio API Key",
|
||||
r"SK[a-f0-9]{32}"
|
||||
),
|
||||
rule!(
|
||||
"sendgrid-api-key",
|
||||
"SendGrid API Key",
|
||||
r"SG\.[A-Za-z0-9_\-]{22}\.[A-Za-z0-9_\-]{43}"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Auth / Identity
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"jwt",
|
||||
"JSON Web Token",
|
||||
r"eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_\-]{10,}"
|
||||
),
|
||||
rule!(
|
||||
"bearer-token",
|
||||
"Bearer Token in Authorization Header",
|
||||
r"(?i)bearer\s+[A-Za-z0-9\-._~+/]+=*"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Cryptographic material
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"private-key-header",
|
||||
"Private Key (PEM Header)",
|
||||
r"-----BEGIN\s?(?:RSA |DSA |EC |PGP |OPENSSH )?PRIVATE KEY-----"
|
||||
),
|
||||
rule!(
|
||||
"pgp-private-key",
|
||||
"PGP Private Key Block",
|
||||
r"-----BEGIN PGP PRIVATE KEY BLOCK-----"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Database
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"database-connection-url",
|
||||
"Database Connection URL with Credentials",
|
||||
r"(?i)(?:postgres|mysql|mongodb|redis|amqp)://[^:\s]+:[^@\s]+@[^\s]+"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Infrastructure
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"heroku-api-key",
|
||||
"Heroku API Key",
|
||||
r"(?i)heroku[_\-\.]?api[_\-\.]?key[\s]*[=:\s]+[\s]*[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}"
|
||||
),
|
||||
rule!(
|
||||
"npm-token",
|
||||
"npm Access Token",
|
||||
r"(?i)npm_[A-Za-z0-9]{36}"
|
||||
),
|
||||
rule!(
|
||||
"pypi-token",
|
||||
"PyPI API Token",
|
||||
r"pypi-[A-Za-z0-9_\-]{50,}"
|
||||
),
|
||||
rule!(
|
||||
"docker-hub-token",
|
||||
"Docker Hub Personal Access Token",
|
||||
r"dckr_pat_[A-Za-z0-9_\-]{27,}"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// AI / ML
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"openai-api-key-legacy",
|
||||
"OpenAI API Key (Legacy Format)",
|
||||
r"sk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}"
|
||||
),
|
||||
rule!(
|
||||
"openai-project-key",
|
||||
"OpenAI Project API Key",
|
||||
r"sk-proj-[A-Za-z0-9\-_]{40,}"
|
||||
),
|
||||
rule!(
|
||||
"anthropic-api-key",
|
||||
"Anthropic API Key",
|
||||
r"sk-ant-[A-Za-z0-9\-_]{40,}"
|
||||
),
|
||||
// -----------------------------------------------------------------------
|
||||
// Generic patterns (entropy-gated)
|
||||
// -----------------------------------------------------------------------
|
||||
rule!(
|
||||
"generic-api-key",
|
||||
"Generic API Key Assignment",
|
||||
r"(?i)(?:api[_\-]?key|apikey)[\s]*[=:]\s*['\x22]?([A-Za-z0-9_\-]{20,})['\x22]?",
|
||||
entropy: 3.5
|
||||
),
|
||||
rule!(
|
||||
"generic-secret",
|
||||
"Generic Secret/Password/Token Assignment",
|
||||
r"(?i)(?:secret|password|passwd|token)[\s]*[=:]\s*['\x22]?([^\s'\x22]{8,})['\x22]?",
|
||||
entropy: 3.0
|
||||
),
|
||||
rule!(
|
||||
"generic-private-key",
|
||||
"Generic Private Key Assignment",
|
||||
r"(?i)private[_\-]?key[\s]*[=:]\s*['\x22]?([^\s'\x22]{20,})['\x22]?",
|
||||
entropy: 3.5
|
||||
),
|
||||
rule!(
|
||||
"high-entropy-hex",
|
||||
"High-Entropy Hex String (32+ chars)",
|
||||
r"(?i)[=:]\s*['\x22]?([0-9a-f]{32,})['\x22]?",
|
||||
entropy: 3.5
|
||||
),
|
||||
rule!(
|
||||
"high-entropy-base64",
|
||||
"High-Entropy Base64 String (24+ chars)",
|
||||
r"(?i)[=:]\s*['\x22]?([A-Za-z0-9+/]{24,}={0,3})['\x22]?",
|
||||
entropy: 4.0
|
||||
),
|
||||
];
|
||||
@@ -0,0 +1,172 @@
|
||||
//! Secret value redactor.
|
||||
//!
|
||||
//! Takes the known-secrets [`Manifest`] and efficiently replaces every
|
||||
//! occurrence of a secret value in arbitrary text using an Aho-Corasick
|
||||
//! automaton for multi-pattern matching.
|
||||
|
||||
use aho_corasick::AhoCorasick;
|
||||
|
||||
use super::config::RedactionStyle;
|
||||
use super::manifest::Manifest;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Output types
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// A redaction event -- records what was replaced and where.
|
||||
#[derive(Debug, Clone, PartialEq, Eq)]
|
||||
pub struct Redaction {
|
||||
/// The key name of the redacted secret.
|
||||
pub key: String,
|
||||
/// Byte offset in the *original* text where the match starts.
|
||||
pub offset: usize,
|
||||
/// Length (in bytes) of the original secret value that was replaced.
|
||||
pub original_len: usize,
|
||||
}
|
||||
|
||||
/// The result of redacting text.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct RedactedText {
|
||||
/// The text with secret values replaced.
|
||||
pub text: String,
|
||||
/// List of redactions that were applied (in order of occurrence).
|
||||
pub redactions: Vec<Redaction>,
|
||||
}
|
||||
|
||||
impl RedactedText {
|
||||
/// Returns `true` if any redactions were made.
|
||||
pub fn was_redacted(&self) -> bool {
|
||||
!self.redactions.is_empty()
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Redactor
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Replaces known secret values in text with configurable placeholders.
|
||||
///
|
||||
/// Construction is cheap when the manifest is empty and O(n) in the total
|
||||
/// length of secret values otherwise (Aho-Corasick automaton build).
|
||||
/// Redaction itself is O(n) in the length of the input text.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Redactor {
|
||||
/// The Aho-Corasick automaton for multi-pattern matching.
|
||||
/// `None` when the manifest is empty (no-op fast path).
|
||||
automaton: Option<AhoCorasick>,
|
||||
/// Secret entries parallel to the automaton patterns.
|
||||
/// Index `i` in the automaton corresponds to `entries[i]`.
|
||||
entries: Vec<RedactorEntry>,
|
||||
/// How to format replacements.
|
||||
style: RedactionStyle,
|
||||
}
|
||||
|
||||
/// Internal entry -- stores info needed for replacement formatting.
|
||||
#[derive(Debug, Clone)]
|
||||
struct RedactorEntry {
|
||||
key: String,
|
||||
value_len: usize,
|
||||
}
|
||||
|
||||
impl Redactor {
|
||||
/// Build a redactor from a manifest and redaction style.
|
||||
///
|
||||
/// The manifest entries are already sorted by value length descending,
|
||||
/// but Aho-Corasick with `LeftmostLongest` handles overlap correctly
|
||||
/// regardless of input order.
|
||||
pub fn new(manifest: &Manifest, style: RedactionStyle) -> Self {
|
||||
let secrets = manifest.entries();
|
||||
if secrets.is_empty() {
|
||||
return Self {
|
||||
automaton: None,
|
||||
entries: Vec::new(),
|
||||
style,
|
||||
};
|
||||
}
|
||||
|
||||
// Build patterns from secret *values* (not keys).
|
||||
let patterns: Vec<&str> = secrets.iter().map(|e| e.value.as_str()).collect();
|
||||
let entries: Vec<RedactorEntry> = secrets
|
||||
.iter()
|
||||
.map(|e| RedactorEntry {
|
||||
key: e.key.clone(),
|
||||
value_len: e.value.len(),
|
||||
})
|
||||
.collect();
|
||||
|
||||
// LeftmostLongest ensures that when one secret value is a substring
|
||||
// of another, the longer match wins.
|
||||
let automaton = AhoCorasick::builder()
|
||||
.match_kind(aho_corasick::MatchKind::LeftmostLongest)
|
||||
.build(&patterns)
|
||||
.ok(); // If build fails (shouldn't for valid strings), fall back to no-op.
|
||||
|
||||
Self {
|
||||
automaton,
|
||||
entries,
|
||||
style,
|
||||
}
|
||||
}
|
||||
|
||||
/// Redact all known secret values in the input text.
|
||||
///
|
||||
/// Returns the redacted text together with metadata about each
|
||||
/// replacement (key name, byte offset, original length).
|
||||
pub fn redact(&self, text: &str) -> RedactedText {
|
||||
let automaton = match &self.automaton {
|
||||
Some(a) => a,
|
||||
None => {
|
||||
return RedactedText {
|
||||
text: text.to_string(),
|
||||
redactions: Vec::new(),
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
let mut result = String::with_capacity(text.len());
|
||||
let mut redactions = Vec::new();
|
||||
let mut last_end = 0;
|
||||
|
||||
for mat in automaton.find_iter(text) {
|
||||
let entry = &self.entries[mat.pattern().as_usize()];
|
||||
|
||||
// Append text before the match.
|
||||
result.push_str(&text[last_end..mat.start()]);
|
||||
|
||||
// Append the replacement placeholder.
|
||||
let replacement = self.format_replacement(entry);
|
||||
result.push_str(&replacement);
|
||||
|
||||
redactions.push(Redaction {
|
||||
key: entry.key.clone(),
|
||||
offset: mat.start(),
|
||||
original_len: entry.value_len,
|
||||
});
|
||||
|
||||
last_end = mat.end();
|
||||
}
|
||||
|
||||
// Append remaining text after the last match.
|
||||
result.push_str(&text[last_end..]);
|
||||
|
||||
RedactedText {
|
||||
text: result,
|
||||
redactions,
|
||||
}
|
||||
}
|
||||
|
||||
/// Format the replacement string according to the configured style.
|
||||
fn format_replacement(&self, entry: &RedactorEntry) -> String {
|
||||
match self.style {
|
||||
RedactionStyle::Masked => "*****".to_string(),
|
||||
RedactionStyle::Typed => format!("<REDACTED:string:{}>", entry.value_len),
|
||||
RedactionStyle::Named => format!("<REDACTED:{}>", entry.key),
|
||||
RedactionStyle::Absent => String::new(),
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns `true` if this redactor has any secrets loaded.
|
||||
pub fn has_secrets(&self) -> bool {
|
||||
self.automaton.is_some()
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,250 @@
|
||||
//! Heuristic secret scanner using [`RegexSet`] for single-pass multi-pattern
|
||||
//! matching with optional Shannon entropy filtering.
|
||||
//!
|
||||
//! The scanner operates purely on text input — it has no knowledge of redaction,
|
||||
//! manifests, or file structure. Callers feed it text and receive [`Finding`]s.
|
||||
|
||||
use std::borrow::Cow;
|
||||
use std::ops::Range;
|
||||
|
||||
use regex::{Regex, RegexSet};
|
||||
use thiserror::Error;
|
||||
|
||||
use super::config::HeuristicConfig;
|
||||
use super::patterns::{self, DetectionRule};
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Errors
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[derive(Debug, Error)]
|
||||
pub enum ScannerError {
|
||||
#[error("invalid regex pattern: {0}")]
|
||||
Regex(#[from] regex::Error),
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Finding types
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Confidence level of a finding.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
|
||||
pub enum Confidence {
|
||||
/// Specific provider pattern matched (e.g. `ghp_`, `AKIA`).
|
||||
High,
|
||||
/// Generic pattern matched and passed entropy threshold.
|
||||
Medium,
|
||||
/// Generic pattern matched but entropy was borderline.
|
||||
Low,
|
||||
}
|
||||
|
||||
/// A single potential secret detected in the input text.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct Finding {
|
||||
/// The matched text substring.
|
||||
pub matched_text: String,
|
||||
/// Which detection rule triggered this finding.
|
||||
pub pattern_id: String,
|
||||
/// Human-readable description of the rule.
|
||||
pub description: String,
|
||||
/// Confidence level.
|
||||
pub confidence: Confidence,
|
||||
/// Byte range in the input text.
|
||||
pub span: Range<usize>,
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Scanner
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Pre-compiled multi-pattern secret scanner.
|
||||
///
|
||||
/// Holds a [`RegexSet`] for fast "any match?" bulk filtering and parallel
|
||||
/// individual [`Regex`] instances for extracting match details and spans.
|
||||
pub struct Scanner {
|
||||
/// Pre-compiled set for fast "any match?" check.
|
||||
regex_set: RegexSet,
|
||||
/// Individual compiled regexes for extracting match details (parallel to `regex_set`).
|
||||
regexes: Vec<Regex>,
|
||||
/// Rule metadata (parallel to `regex_set`).
|
||||
rules: Vec<DetectionRule>,
|
||||
}
|
||||
|
||||
impl std::fmt::Debug for Scanner {
|
||||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
|
||||
f.debug_struct("Scanner")
|
||||
.field("rule_count", &self.rules.len())
|
||||
.finish()
|
||||
}
|
||||
}
|
||||
|
||||
impl Scanner {
|
||||
/// Build a scanner from the given heuristic configuration.
|
||||
///
|
||||
/// Includes all built-in rules plus any custom patterns from `config.patterns`.
|
||||
pub fn new(config: &HeuristicConfig) -> Result<Self, ScannerError> {
|
||||
let mut rules: Vec<DetectionRule> = patterns::builtin_rules().to_vec();
|
||||
|
||||
// Append custom patterns from config.
|
||||
for (i, pat) in config.patterns.iter().enumerate() {
|
||||
rules.push(DetectionRule {
|
||||
id: Cow::Owned(format!("custom-{i}")),
|
||||
description: Cow::Owned(format!("Custom pattern #{i}")),
|
||||
pattern: Cow::Owned(pat.clone()),
|
||||
entropy_threshold: None,
|
||||
});
|
||||
}
|
||||
|
||||
let pattern_strings: Vec<&str> = rules.iter().map(|r| r.pattern.as_ref()).collect();
|
||||
|
||||
let regex_set = RegexSet::new(&pattern_strings)?;
|
||||
let regexes = pattern_strings
|
||||
.iter()
|
||||
.map(|p| Regex::new(p))
|
||||
.collect::<Result<Vec<_>, _>>()?;
|
||||
|
||||
Ok(Self {
|
||||
regex_set,
|
||||
regexes,
|
||||
rules,
|
||||
})
|
||||
}
|
||||
|
||||
/// Build a scanner with only the built-in rules (no custom patterns).
|
||||
pub fn builtin() -> Result<Self, ScannerError> {
|
||||
Self::new(&HeuristicConfig::default())
|
||||
}
|
||||
|
||||
/// Scan `text` for potential secrets.
|
||||
///
|
||||
/// Returns findings sorted by byte position with overlapping matches
|
||||
/// deduplicated (first match wins).
|
||||
pub fn scan(&self, text: &str) -> Vec<Finding> {
|
||||
let matches = self.regex_set.matches(text);
|
||||
if !matches.matched_any() {
|
||||
return Vec::new();
|
||||
}
|
||||
|
||||
let mut findings = Vec::new();
|
||||
|
||||
for idx in matches.iter() {
|
||||
let rule = &self.rules[idx];
|
||||
let regex = &self.regexes[idx];
|
||||
|
||||
for mat in regex.find_iter(text) {
|
||||
let matched_text = mat.as_str();
|
||||
|
||||
// Apply entropy threshold when configured.
|
||||
if let Some(threshold) = rule.entropy_threshold {
|
||||
if shannon_entropy(matched_text) < threshold {
|
||||
continue;
|
||||
}
|
||||
}
|
||||
|
||||
let confidence = if rule.entropy_threshold.is_some() {
|
||||
Confidence::Medium
|
||||
} else {
|
||||
Confidence::High
|
||||
};
|
||||
|
||||
findings.push(Finding {
|
||||
matched_text: matched_text.to_string(),
|
||||
pattern_id: rule.id.to_string(),
|
||||
description: rule.description.to_string(),
|
||||
confidence,
|
||||
span: mat.start()..mat.end(),
|
||||
});
|
||||
}
|
||||
}
|
||||
|
||||
// Sort by position, then deduplicate overlapping spans.
|
||||
findings.sort_by_key(|f| f.span.start);
|
||||
dedup_overlapping(&mut findings);
|
||||
findings
|
||||
}
|
||||
|
||||
/// Returns the number of active rules (built-in + custom).
|
||||
pub fn rule_count(&self) -> usize {
|
||||
self.rules.len()
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Shannon entropy
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Calculate Shannon entropy of `s` in bits per character.
|
||||
///
|
||||
/// Returns 0.0 for empty strings. Maximum entropy for ASCII printable text
|
||||
/// is ~6.57 bits/char.
|
||||
pub fn shannon_entropy(s: &str) -> f64 {
|
||||
if s.is_empty() {
|
||||
return 0.0;
|
||||
}
|
||||
let mut freq = [0u32; 256];
|
||||
let len = s.len() as f64;
|
||||
for &b in s.as_bytes() {
|
||||
freq[b as usize] += 1;
|
||||
}
|
||||
freq.iter()
|
||||
.filter(|&&c| c > 0)
|
||||
.map(|&c| {
|
||||
let p = c as f64 / len;
|
||||
-p * p.log2()
|
||||
})
|
||||
.sum()
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Deduplication
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Remove findings whose span overlaps with an earlier (higher-priority) finding.
|
||||
///
|
||||
/// Input must be sorted by `span.start`. When two findings overlap, the one
|
||||
/// appearing first (lower start position) is kept.
|
||||
fn dedup_overlapping(findings: &mut Vec<Finding>) {
|
||||
let mut i = 0;
|
||||
while i < findings.len() {
|
||||
let end = findings[i].span.end;
|
||||
let mut j = i + 1;
|
||||
while j < findings.len() {
|
||||
if findings[j].span.start < end {
|
||||
findings.remove(j);
|
||||
} else {
|
||||
break;
|
||||
}
|
||||
}
|
||||
i += 1;
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn entropy_all_same_chars() {
|
||||
// All same characters → 0 entropy.
|
||||
assert!((shannon_entropy("aaaaaaaaaa") - 0.0).abs() < f64::EPSILON);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn entropy_two_equal_chars() {
|
||||
// "ab" repeated → exactly 1.0 bits/char.
|
||||
let e = shannon_entropy("abababababababababab");
|
||||
assert!((e - 1.0).abs() < 0.01);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn entropy_high_randomness() {
|
||||
// A string with many distinct characters should have high entropy.
|
||||
let s = "aB3$kL9!mZ7@wQ1#";
|
||||
assert!(shannon_entropy(s) > 3.5);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn entropy_empty_string() {
|
||||
assert!((shannon_entropy("") - 0.0).abs() < f64::EPSILON);
|
||||
}
|
||||
}
|
||||
+54
-5
@@ -1,10 +1,11 @@
|
||||
//! Claude Code hook adapter (PreToolUse).
|
||||
//! Claude Code hook adapter (PreToolUse + PostToolUse).
|
||||
//!
|
||||
//! Wire format: stdin is one JSON object with `tool_name` and `tool_input`.
|
||||
//! Wire format: stdin is one JSON object with `tool_name` and `tool_input`
|
||||
//! (and optionally `tool_response` for PostToolUse).
|
||||
//! Stdout is `{"hookSpecificOutput": {...}}` with exit code 0; the JSON
|
||||
//! carries the verdict.
|
||||
//! carries the verdict / updated output.
|
||||
|
||||
use super::{AdapterError, HarnessAdapter, PathKind, ToolCall, ToolOp};
|
||||
use super::{AdapterError, HarnessAdapter, PathKind, PostToolUsePayload, ToolCall, ToolOp};
|
||||
use crate::core::Decision;
|
||||
use serde_json::{json, Value};
|
||||
use std::path::PathBuf;
|
||||
@@ -16,6 +17,8 @@ impl HarnessAdapter for ClaudeAdapter {
|
||||
"claude"
|
||||
}
|
||||
|
||||
// -- PreToolUse --------------------------------------------------------
|
||||
|
||||
fn parse_request(&self, input: &[u8]) -> Result<ToolCall, AdapterError> {
|
||||
let v: Value = serde_json::from_slice(input)?;
|
||||
let tool_name = v
|
||||
@@ -39,7 +42,11 @@ impl HarnessAdapter for ClaudeAdapter {
|
||||
})
|
||||
}
|
||||
|
||||
fn render_decision(&self, _call: &ToolCall, decision: &Decision) -> Result<Vec<u8>, AdapterError> {
|
||||
fn render_decision(
|
||||
&self,
|
||||
_call: &ToolCall,
|
||||
decision: &Decision,
|
||||
) -> Result<Vec<u8>, AdapterError> {
|
||||
let (verdict, reason) = match decision {
|
||||
Decision::Allow => ("allow", String::new()),
|
||||
Decision::Ask(r) => ("ask", r.message.clone()),
|
||||
@@ -54,6 +61,48 @@ impl HarnessAdapter for ClaudeAdapter {
|
||||
});
|
||||
Ok(serde_json::to_vec(&out)?)
|
||||
}
|
||||
|
||||
// -- PostToolUse -------------------------------------------------------
|
||||
|
||||
fn parse_post_tool_use(&self, input: &[u8]) -> Result<PostToolUsePayload, AdapterError> {
|
||||
let v: Value = serde_json::from_slice(input)?;
|
||||
let tool_name = v
|
||||
.get("tool_name")
|
||||
.and_then(|x| x.as_str())
|
||||
.ok_or_else(|| AdapterError::Parse("missing tool_name".into()))?
|
||||
.to_string();
|
||||
let tool_input = v.get("tool_input").cloned().unwrap_or(Value::Null);
|
||||
let tool_response = v
|
||||
.get("tool_response")
|
||||
.and_then(|x| x.as_str())
|
||||
.unwrap_or("")
|
||||
.to_string();
|
||||
Ok(PostToolUsePayload {
|
||||
tool_name,
|
||||
tool_input,
|
||||
tool_response,
|
||||
raw: v,
|
||||
})
|
||||
}
|
||||
|
||||
fn render_post_tool_use(
|
||||
&self,
|
||||
_payload: &PostToolUsePayload,
|
||||
redacted_output: Option<&str>,
|
||||
) -> Result<Vec<u8>, AdapterError> {
|
||||
// When there are no changes, return `{}` — Claude Code interprets
|
||||
// an empty object as "use original output, no modifications".
|
||||
let out = match redacted_output {
|
||||
Some(text) => json!({
|
||||
"hookSpecificOutput": {
|
||||
"hookEventName": "PostToolUse",
|
||||
"updatedToolOutput": text,
|
||||
}
|
||||
}),
|
||||
None => json!({}),
|
||||
};
|
||||
Ok(serde_json::to_vec(&out)?)
|
||||
}
|
||||
}
|
||||
|
||||
fn path_op(tool_input: &Value, kind: PathKind) -> Result<ToolOp, AdapterError> {
|
||||
|
||||
+55
-3
@@ -42,16 +42,68 @@ pub enum PathKind {
|
||||
Write,
|
||||
}
|
||||
|
||||
/// Trait implemented by each harness adapter. Adapters parse the harness's
|
||||
/// hook stdin payload into `ToolCall` and render a `Decision` back to the
|
||||
/// harness's expected stdout format.
|
||||
/// A PostToolUse hook payload -- tool already executed, output available for
|
||||
/// inspection/redaction.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct PostToolUsePayload {
|
||||
/// Harness's tool name (e.g. "Read", "Bash").
|
||||
pub tool_name: String,
|
||||
/// The tool input that was originally provided.
|
||||
pub tool_input: serde_json::Value,
|
||||
/// The tool's output text that may contain secrets.
|
||||
pub tool_response: String,
|
||||
/// Original raw payload.
|
||||
pub raw: serde_json::Value,
|
||||
}
|
||||
|
||||
/// Hook event type discriminator.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum HookEvent {
|
||||
PreToolUse,
|
||||
PostToolUse,
|
||||
}
|
||||
|
||||
impl HookEvent {
|
||||
/// Parse a hook event name from CLI or payload strings.
|
||||
///
|
||||
/// Accepts both kebab-case (`pre-tool-use`) and PascalCase (`PreToolUse`).
|
||||
pub fn parse(s: &str) -> Option<Self> {
|
||||
match s {
|
||||
"pre-tool-use" | "PreToolUse" => Some(Self::PreToolUse),
|
||||
"post-tool-use" | "PostToolUse" => Some(Self::PostToolUse),
|
||||
_ => None,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Trait implemented by each harness adapter.
|
||||
///
|
||||
/// Adapters handle both PreToolUse (policy gate) and PostToolUse (output
|
||||
/// redaction) hook events.
|
||||
pub trait HarnessAdapter {
|
||||
/// The CLI name (e.g. "claude", "codex", "gemini").
|
||||
fn name(&self) -> &'static str;
|
||||
|
||||
// -- PreToolUse --------------------------------------------------------
|
||||
|
||||
/// Parse a PreToolUse hook payload into a normalized `ToolCall`.
|
||||
fn parse_request(&self, input: &[u8]) -> Result<ToolCall, AdapterError>;
|
||||
|
||||
/// Render a policy `Decision` back to the harness's PreToolUse wire format.
|
||||
fn render_decision(&self, call: &ToolCall, decision: &Decision) -> Result<Vec<u8>, AdapterError>;
|
||||
|
||||
// -- PostToolUse -------------------------------------------------------
|
||||
|
||||
/// Parse a PostToolUse hook payload (tool name, input, response).
|
||||
fn parse_post_tool_use(&self, input: &[u8]) -> Result<PostToolUsePayload, AdapterError>;
|
||||
|
||||
/// Render a PostToolUse response. `redacted_output` is the (possibly
|
||||
/// modified) tool output to send back to the harness.
|
||||
fn render_post_tool_use(
|
||||
&self,
|
||||
payload: &PostToolUsePayload,
|
||||
redacted_output: Option<&str>,
|
||||
) -> Result<Vec<u8>, AdapterError>;
|
||||
}
|
||||
|
||||
#[cfg(feature = "harness-claude")]
|
||||
|
||||
@@ -0,0 +1,298 @@
|
||||
use assert_cmd::Command;
|
||||
use std::fs;
|
||||
|
||||
/// Helper: create a temp project directory with a `.botsecrets` config and
|
||||
/// a `.env` file containing the given secrets.
|
||||
fn setup_project(
|
||||
env_content: &str,
|
||||
botsecrets_content: Option<&str>,
|
||||
) -> tempfile::TempDir {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
|
||||
// .env with test secrets
|
||||
fs::write(tmp.path().join(".env"), env_content).unwrap();
|
||||
|
||||
// .botsecrets config (use default if not specified)
|
||||
let botsecrets = botsecrets_content.unwrap_or(
|
||||
r#"
|
||||
[files]
|
||||
patterns = [".env"]
|
||||
"#,
|
||||
);
|
||||
fs::write(tmp.path().join(".botsecrets"), botsecrets).unwrap();
|
||||
|
||||
// .botignore (empty — required for project root detection)
|
||||
fs::write(tmp.path().join(".botignore"), "").unwrap();
|
||||
|
||||
tmp
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn post_tool_use_redacts_known_secret() {
|
||||
let tmp = setup_project("DB_PASSWORD=supersecret123\n", None);
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": "/some/file.txt" },
|
||||
"tool_response": "DB_HOST=localhost\nDB_PASSWORD=supersecret123\nDB_PORT=5432"
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
|
||||
.current_dir(tmp.path())
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
let updated = v["hookSpecificOutput"]["updatedToolOutput"]
|
||||
.as_str()
|
||||
.expect("expected updatedToolOutput");
|
||||
|
||||
assert!(
|
||||
updated.contains("*****"),
|
||||
"expected masked secret, got: {updated}"
|
||||
);
|
||||
assert!(
|
||||
!updated.contains("supersecret123"),
|
||||
"secret should be redacted, got: {updated}"
|
||||
);
|
||||
assert!(
|
||||
updated.contains("DB_HOST=localhost"),
|
||||
"non-secret lines should be preserved, got: {updated}"
|
||||
);
|
||||
assert!(
|
||||
updated.contains("DB_PORT=5432"),
|
||||
"non-secret lines should be preserved, got: {updated}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn post_tool_use_no_secrets_passthrough() {
|
||||
let tmp = setup_project("DB_PASSWORD=supersecret123\n", None);
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": "/some/file.txt" },
|
||||
"tool_response": "Hello, world! This text has no secrets."
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
|
||||
.current_dir(tmp.path())
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
// Empty JSON object means "no changes".
|
||||
assert_eq!(v, serde_json::json!({}), "expected empty JSON for passthrough");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn post_tool_use_empty_response_passthrough() {
|
||||
let tmp = setup_project("DB_PASSWORD=supersecret123\n", None);
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": "/some/file.txt" },
|
||||
"tool_response": ""
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
|
||||
.current_dir(tmp.path())
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v, serde_json::json!({}));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn post_tool_use_heuristic_enforce_appends_warning() {
|
||||
// Use a config with heuristic in enforce mode (the default).
|
||||
let botsecrets = r#"
|
||||
[files]
|
||||
patterns = [".env"]
|
||||
|
||||
[heuristic]
|
||||
enabled = true
|
||||
mode = "enforce"
|
||||
"#;
|
||||
let tmp = setup_project("UNRELATED_KEY=foo\n", Some(botsecrets));
|
||||
|
||||
// Include something that looks like a GitHub PAT (classic) in the response.
|
||||
// Pattern requires `ghp_` followed by exactly 36 alphanumeric chars.
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Bash",
|
||||
"tool_input": { "command": "cat output.log" },
|
||||
"tool_response": "deploy log: token ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij used"
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
|
||||
.current_dir(tmp.path())
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
let updated = v["hookSpecificOutput"]["updatedToolOutput"]
|
||||
.as_str()
|
||||
.expect("expected updatedToolOutput with heuristic warning");
|
||||
assert!(
|
||||
updated.contains("[fermata] WARNING"),
|
||||
"expected heuristic warning, got: {updated}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn pre_tool_use_backward_compat_default_event() {
|
||||
// `--event` defaults to pre-tool-use; existing `--harness claude` still works.
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
fs::write(tmp.path().join(".botignore"), ".env\n").unwrap();
|
||||
let target = tmp.path().join(".env");
|
||||
fs::write(&target, "").unwrap();
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": target.to_str().unwrap() }
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--harness", "claude"])
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v["hookSpecificOutput"]["permissionDecision"], "deny");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn pre_tool_use_explicit_event_flag() {
|
||||
// Explicitly passing `--event pre-tool-use` works identically.
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
fs::write(tmp.path().join(".botignore"), ".env\n").unwrap();
|
||||
let target = tmp.path().join("safe.txt");
|
||||
fs::write(&target, "").unwrap();
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": target.to_str().unwrap() }
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "pre-tool-use", "--harness", "claude"])
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v["hookSpecificOutput"]["permissionDecision"], "allow");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unknown_event_exits_2() {
|
||||
Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "nonsense", "--harness", "claude"])
|
||||
.write_stdin("{}")
|
||||
.assert()
|
||||
.code(2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn post_tool_use_no_project_root_passthrough() {
|
||||
// When run in a directory with no .botignore / .botsecrets,
|
||||
// PostToolUse should fail-open with `{}`.
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": "/some/file.txt" },
|
||||
"tool_response": "DB_PASSWORD=supersecret123"
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
|
||||
.current_dir(tmp.path())
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v, serde_json::json!({}));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn post_tool_use_multiple_secrets_redacted() {
|
||||
let tmp = setup_project(
|
||||
"DB_PASSWORD=supersecret123\nAPI_KEY=my-api-key-abc\n",
|
||||
None,
|
||||
);
|
||||
|
||||
let payload = serde_json::json!({
|
||||
"tool_name": "Read",
|
||||
"tool_input": { "file_path": "/some/config" },
|
||||
"tool_response": "config: password=supersecret123, key=my-api-key-abc, host=localhost"
|
||||
})
|
||||
.to_string();
|
||||
|
||||
let out = Command::cargo_bin("fermata")
|
||||
.unwrap()
|
||||
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
|
||||
.current_dir(tmp.path())
|
||||
.write_stdin(payload)
|
||||
.assert()
|
||||
.success()
|
||||
.get_output()
|
||||
.stdout
|
||||
.clone();
|
||||
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
let updated = v["hookSpecificOutput"]["updatedToolOutput"]
|
||||
.as_str()
|
||||
.expect("expected updatedToolOutput");
|
||||
|
||||
assert!(!updated.contains("supersecret123"), "first secret should be redacted");
|
||||
assert!(!updated.contains("my-api-key-abc"), "second secret should be redacted");
|
||||
assert!(updated.contains("host=localhost"), "non-secret should be preserved");
|
||||
}
|
||||
@@ -0,0 +1,388 @@
|
||||
use dirigent_fermata::core::secrets::config::{
|
||||
EnforcementMode, HeuristicMode, ParseErrorAction, RedactionStyle, SecretsConfig,
|
||||
BUILTIN_KEY_PATTERNS,
|
||||
};
|
||||
|
||||
#[test]
|
||||
fn parse_minimal_files_only() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[files]
|
||||
patterns = [".env", ".env.*"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(cfg.files.patterns, vec![".env", ".env.*"]);
|
||||
// Other sections use defaults
|
||||
assert_eq!(cfg.redaction.style, RedactionStyle::Masked);
|
||||
assert_eq!(cfg.enforcement.mode, EnforcementMode::Permissive);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_full_config() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[files]
|
||||
patterns = [".env", "secrets.*"]
|
||||
|
||||
[keys]
|
||||
include = ["STRIPE_*", "TWILIO_*"]
|
||||
exclude = ["PUBLIC_KEY", "SSH_KEY_PATH"]
|
||||
|
||||
[redaction]
|
||||
style = "typed"
|
||||
|
||||
[heuristic]
|
||||
enabled = false
|
||||
mode = "report"
|
||||
patterns = ['AKIA[A-Z2-7]{16}']
|
||||
|
||||
[enforcement]
|
||||
mode = "strict"
|
||||
on_parse_error = "deny"
|
||||
|
||||
[[file]]
|
||||
path = "settings.py"
|
||||
format = "python-assignments"
|
||||
keys = ["SECRET_KEY", "DATABASES.*.PASSWORD"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(cfg.files.patterns, vec![".env", "secrets.*"]);
|
||||
assert_eq!(cfg.keys.include, vec!["STRIPE_*", "TWILIO_*"]);
|
||||
assert_eq!(cfg.keys.exclude, vec!["PUBLIC_KEY", "SSH_KEY_PATH"]);
|
||||
assert_eq!(cfg.redaction.style, RedactionStyle::Typed);
|
||||
assert!(!cfg.heuristic.enabled);
|
||||
assert_eq!(cfg.heuristic.mode, HeuristicMode::Report);
|
||||
assert_eq!(cfg.heuristic.patterns, vec!["AKIA[A-Z2-7]{16}"]);
|
||||
assert_eq!(cfg.enforcement.mode, EnforcementMode::Strict);
|
||||
assert_eq!(cfg.enforcement.on_parse_error, ParseErrorAction::Deny);
|
||||
assert_eq!(cfg.file_overrides.len(), 1);
|
||||
assert_eq!(cfg.file_overrides[0].path, "settings.py");
|
||||
assert_eq!(
|
||||
cfg.file_overrides[0].format.as_deref(),
|
||||
Some("python-assignments")
|
||||
);
|
||||
assert_eq!(
|
||||
cfg.file_overrides[0].keys,
|
||||
vec!["SECRET_KEY", "DATABASES.*.PASSWORD"]
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn empty_toml_returns_defaults() {
|
||||
let cfg = SecretsConfig::from_toml("").unwrap();
|
||||
assert!(!cfg.files.patterns.is_empty());
|
||||
assert!(cfg.files.patterns.contains(&".env".to_string()));
|
||||
assert_eq!(cfg.redaction.style, RedactionStyle::Masked);
|
||||
assert!(cfg.heuristic.enabled);
|
||||
assert_eq!(cfg.heuristic.mode, HeuristicMode::Enforce);
|
||||
assert_eq!(cfg.enforcement.mode, EnforcementMode::Permissive);
|
||||
assert_eq!(
|
||||
cfg.enforcement.on_parse_error,
|
||||
ParseErrorAction::MaskEntireFile
|
||||
);
|
||||
assert!(cfg.file_overrides.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn invalid_toml_produces_error() {
|
||||
let result = SecretsConfig::from_toml("this is not valid {{ toml");
|
||||
assert!(result.is_err());
|
||||
let err_msg = result.unwrap_err().to_string();
|
||||
assert!(
|
||||
err_msg.contains("expected"),
|
||||
"error should describe parse issue: {err_msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn effective_key_includes_has_builtins() {
|
||||
let cfg = SecretsConfig::default();
|
||||
let effective = cfg.effective_key_includes();
|
||||
for builtin in BUILTIN_KEY_PATTERNS {
|
||||
assert!(
|
||||
effective.contains(&builtin.to_string()),
|
||||
"missing builtin: {builtin}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn effective_key_includes_adds_user_patterns() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[keys]
|
||||
include = ["MY_CUSTOM_SECRET_*"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
let effective = cfg.effective_key_includes();
|
||||
assert!(effective.contains(&"MY_CUSTOM_SECRET_*".to_string()));
|
||||
// Builtins still present
|
||||
assert!(effective.contains(&"*PASSWORD*".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn effective_key_includes_removes_excluded() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[keys]
|
||||
exclude = ["*TOKEN*", "SENTRY_DSN"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
let effective = cfg.effective_key_includes();
|
||||
assert!(
|
||||
!effective.contains(&"*TOKEN*".to_string()),
|
||||
"excluded pattern should be removed"
|
||||
);
|
||||
assert!(
|
||||
!effective.contains(&"SENTRY_DSN".to_string()),
|
||||
"excluded pattern should be removed"
|
||||
);
|
||||
// Other builtins still present
|
||||
assert!(effective.contains(&"*PASSWORD*".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn key_matches_glob_case_insensitive() {
|
||||
let cfg = SecretsConfig::default();
|
||||
assert!(cfg.key_matches("DATABASE_URL"));
|
||||
assert!(cfg.key_matches("database_url"));
|
||||
assert!(cfg.key_matches("my_password_here"));
|
||||
assert!(cfg.key_matches("MY_PASSWORD_HERE"));
|
||||
assert!(cfg.key_matches("STRIPE_SECRET_KEY"));
|
||||
assert!(cfg.key_matches("AWS_ACCESS_KEY_ID"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn key_matches_non_secret_keys() {
|
||||
let cfg = SecretsConfig::default();
|
||||
assert!(!cfg.key_matches("DEBUG"));
|
||||
assert!(!cfg.key_matches("LOG_LEVEL"));
|
||||
assert!(!cfg.key_matches("PORT"));
|
||||
assert!(!cfg.key_matches("HOST"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn key_matches_respects_user_include() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[keys]
|
||||
include = ["MY_APP_*"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
assert!(cfg.key_matches("MY_APP_SETTING"));
|
||||
assert!(cfg.key_matches("my_app_setting"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn key_matches_respects_user_exclude() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[keys]
|
||||
exclude = ["*TOKEN*"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
// TOKEN patterns were excluded, so GITHUB_TOKEN should no longer match
|
||||
// via the *TOKEN* pattern. But it might match via GITHUB_TOKEN literal.
|
||||
// Let's check something that only matched *TOKEN*.
|
||||
assert!(!cfg.key_matches("MY_TOKEN"));
|
||||
// PASSWORD still matches
|
||||
assert!(cfg.key_matches("MY_PASSWORD"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn builtin_file_patterns_present() {
|
||||
let cfg = SecretsConfig::default();
|
||||
let patterns = &cfg.files.patterns;
|
||||
assert!(patterns.contains(&".env".to_string()));
|
||||
assert!(patterns.contains(&"*.pem".to_string()));
|
||||
assert!(patterns.contains(&".aws/credentials".to_string()));
|
||||
assert!(patterns.contains(&"terraform.tfvars".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn load_missing_files_returns_defaults() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
let cfg = SecretsConfig::load(tmp.path()).unwrap();
|
||||
assert_eq!(cfg.files.patterns, SecretsConfig::default().files.patterns);
|
||||
assert_eq!(cfg.redaction.style, RedactionStyle::Masked);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn load_project_botsecrets() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
std::fs::write(
|
||||
tmp.path().join(".botsecrets"),
|
||||
r#"
|
||||
[redaction]
|
||||
style = "named"
|
||||
|
||||
[keys]
|
||||
include = ["CUSTOM_*"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let cfg = SecretsConfig::load(tmp.path()).unwrap();
|
||||
assert_eq!(cfg.redaction.style, RedactionStyle::Named);
|
||||
assert!(cfg.effective_key_includes().contains(&"CUSTOM_*".to_string()));
|
||||
// File patterns remain at defaults (not overridden)
|
||||
assert!(cfg.files.patterns.contains(&".env".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn load_local_overrides_project() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
std::fs::write(
|
||||
tmp.path().join(".botsecrets"),
|
||||
r#"
|
||||
[redaction]
|
||||
style = "named"
|
||||
[enforcement]
|
||||
mode = "strict"
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
std::fs::write(
|
||||
tmp.path().join(".botsecrets.local"),
|
||||
r#"
|
||||
[redaction]
|
||||
style = "absent"
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let cfg = SecretsConfig::load(tmp.path()).unwrap();
|
||||
// .local overrides .botsecrets for redaction style
|
||||
assert_eq!(cfg.redaction.style, RedactionStyle::Absent);
|
||||
// enforcement from .botsecrets is preserved (not in .local)
|
||||
assert_eq!(cfg.enforcement.mode, EnforcementMode::Strict);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn load_invalid_botsecrets_returns_error() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
std::fs::write(tmp.path().join(".botsecrets"), "invalid {{ toml").unwrap();
|
||||
let result = SecretsConfig::load(tmp.path());
|
||||
assert!(result.is_err());
|
||||
let err = result.unwrap_err().to_string();
|
||||
assert!(err.contains(".botsecrets"), "error should mention file: {err}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn merge_keys_accumulate() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
std::fs::write(
|
||||
tmp.path().join(".botsecrets"),
|
||||
r#"
|
||||
[keys]
|
||||
include = ["FROM_PROJECT"]
|
||||
exclude = ["EXCLUDE_PROJECT"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
std::fs::write(
|
||||
tmp.path().join(".botsecrets.local"),
|
||||
r#"
|
||||
[keys]
|
||||
include = ["FROM_LOCAL"]
|
||||
exclude = ["EXCLUDE_LOCAL"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let cfg = SecretsConfig::load(tmp.path()).unwrap();
|
||||
assert!(cfg.keys.include.contains(&"FROM_PROJECT".to_string()));
|
||||
assert!(cfg.keys.include.contains(&"FROM_LOCAL".to_string()));
|
||||
assert!(cfg.keys.exclude.contains(&"EXCLUDE_PROJECT".to_string()));
|
||||
assert!(cfg.keys.exclude.contains(&"EXCLUDE_LOCAL".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn merge_file_patterns_replaced_not_appended() {
|
||||
let tmp = tempfile::tempdir().unwrap();
|
||||
std::fs::write(
|
||||
tmp.path().join(".botsecrets"),
|
||||
r#"
|
||||
[files]
|
||||
patterns = ["only-this.env"]
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let cfg = SecretsConfig::load(tmp.path()).unwrap();
|
||||
assert_eq!(cfg.files.patterns, vec!["only-this.env"]);
|
||||
// Defaults should be gone, replaced by the project's list
|
||||
assert!(!cfg.files.patterns.contains(&".env".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn all_redaction_styles_parse() {
|
||||
for (input, expected) in [
|
||||
("masked", RedactionStyle::Masked),
|
||||
("typed", RedactionStyle::Typed),
|
||||
("named", RedactionStyle::Named),
|
||||
("absent", RedactionStyle::Absent),
|
||||
] {
|
||||
let toml_str = format!("[redaction]\nstyle = \"{input}\"");
|
||||
let cfg = SecretsConfig::from_toml(&toml_str).unwrap();
|
||||
assert_eq!(cfg.redaction.style, expected, "failed for: {input}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn all_enforcement_modes_parse() {
|
||||
for (input, expected) in [
|
||||
("strict", EnforcementMode::Strict),
|
||||
("permissive", EnforcementMode::Permissive),
|
||||
("audit", EnforcementMode::Audit),
|
||||
] {
|
||||
let toml_str = format!("[enforcement]\nmode = \"{input}\"");
|
||||
let cfg = SecretsConfig::from_toml(&toml_str).unwrap();
|
||||
assert_eq!(cfg.enforcement.mode, expected, "failed for: {input}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn all_heuristic_modes_parse() {
|
||||
for (input, expected) in [
|
||||
("enforce", HeuristicMode::Enforce),
|
||||
("report", HeuristicMode::Report),
|
||||
("disabled", HeuristicMode::Disabled),
|
||||
] {
|
||||
let toml_str = format!("[heuristic]\nmode = \"{input}\"");
|
||||
let cfg = SecretsConfig::from_toml(&toml_str).unwrap();
|
||||
assert_eq!(cfg.heuristic.mode, expected, "failed for: {input}");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn serialization_roundtrip() {
|
||||
let cfg = SecretsConfig::from_toml(
|
||||
r#"
|
||||
[files]
|
||||
patterns = [".env"]
|
||||
[redaction]
|
||||
style = "typed"
|
||||
[enforcement]
|
||||
mode = "audit"
|
||||
on_parse_error = "allow"
|
||||
"#,
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let serialized = toml::to_string(&cfg).unwrap();
|
||||
let deserialized: SecretsConfig = toml::from_str(&serialized).unwrap();
|
||||
assert_eq!(deserialized.redaction.style, RedactionStyle::Typed);
|
||||
assert_eq!(deserialized.enforcement.mode, EnforcementMode::Audit);
|
||||
assert_eq!(
|
||||
deserialized.enforcement.on_parse_error,
|
||||
ParseErrorAction::Allow
|
||||
);
|
||||
}
|
||||
@@ -0,0 +1,307 @@
|
||||
//! Integration tests for `secrets::manifest` — the manifest loader that
|
||||
//! discovers secret files, parses them, and builds the known-secrets set.
|
||||
|
||||
use std::fs;
|
||||
|
||||
use dirigent_fermata::core::secrets::config::SecretsConfig;
|
||||
use dirigent_fermata::core::secrets::manifest::Manifest;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// Create a minimal config that only discovers `.env*` files and matches
|
||||
/// common secret key patterns (the defaults).
|
||||
fn default_config() -> SecretsConfig {
|
||||
SecretsConfig::default()
|
||||
}
|
||||
|
||||
/// Create a config from TOML.
|
||||
fn config_from_toml(toml: &str) -> SecretsConfig {
|
||||
SecretsConfig::from_toml(toml).expect("valid TOML config")
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Tests
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn discovers_env_file_and_extracts_matching_secrets() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
"DATABASE_URL=postgres://localhost/db\nAPP_NAME=myapp\nSECRET_KEY=super-secret-value-1234\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
// DATABASE_URL and SECRET_KEY match the default key patterns; APP_NAME does not.
|
||||
assert!(!manifest.is_empty());
|
||||
|
||||
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
|
||||
assert!(keys.contains(&"DATABASE_URL"), "expected DATABASE_URL, got {keys:?}");
|
||||
assert!(keys.contains(&"SECRET_KEY"), "expected SECRET_KEY, got {keys:?}");
|
||||
assert!(!keys.contains(&"APP_NAME"), "APP_NAME should be filtered out");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn discovers_nested_env_local_file() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
let nested = dir.path().join("services").join("auth");
|
||||
fs::create_dir_all(&nested).unwrap();
|
||||
fs::write(
|
||||
nested.join(".env.local"),
|
||||
"AUTH_TOKEN=tok_abcdefgh12345678\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
assert!(!manifest.is_empty());
|
||||
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
|
||||
assert!(keys.contains(&"AUTH_TOKEN"), "expected AUTH_TOKEN, got {keys:?}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn filters_entries_by_key_patterns() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
"MY_PASSWORD=hunter2hunter2\nNOT_SENSITIVE=hello-world-1234\nAPI_KEY=abcdef1234567890\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
|
||||
assert!(keys.contains(&"MY_PASSWORD"));
|
||||
assert!(keys.contains(&"API_KEY"));
|
||||
assert!(!keys.contains(&"NOT_SENSITIVE"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn file_override_with_explicit_format_and_key_filter() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
// Write a file that wouldn't normally be discovered by default patterns.
|
||||
fs::write(
|
||||
dir.path().join("custom_secrets.conf"),
|
||||
"SERVICE_TOKEN=long-token-value-here\nDEBUG=true-ish-thing\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = config_from_toml(
|
||||
r#"
|
||||
[files]
|
||||
patterns = []
|
||||
|
||||
[[file]]
|
||||
path = "custom_secrets.conf"
|
||||
format = "env"
|
||||
keys = ["SERVICE_TOKEN"]
|
||||
"#,
|
||||
);
|
||||
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
assert_eq!(manifest.len(), 1);
|
||||
assert_eq!(manifest.entries()[0].key, "SERVICE_TOKEN");
|
||||
assert_eq!(manifest.entries()[0].value, "long-token-value-here");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn empty_project_yields_empty_manifest() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
// No files at all.
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
assert!(manifest.is_empty());
|
||||
assert_eq!(manifest.len(), 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn entries_sorted_by_value_length_descending() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
// Deliberately out of order by length.
|
||||
"TOKEN_A=short1234\nTOKEN_B=a-much-longer-secret-value-here\nTOKEN_C=medium-value1\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
let lengths: Vec<usize> = manifest.entries().iter().map(|e| e.value.len()).collect();
|
||||
for window in lengths.windows(2) {
|
||||
assert!(
|
||||
window[0] >= window[1],
|
||||
"entries not sorted by value length descending: {lengths:?}"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn short_values_filtered_out() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
"PASSWORD_TINY=yes\nPASSWORD_OK=long-enough-password\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
|
||||
// "yes" is 3 chars, below the 4-char minimum.
|
||||
assert!(!keys.contains(&"PASSWORD_TINY"), "short value should be filtered");
|
||||
assert!(keys.contains(&"PASSWORD_OK"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn deduplication_of_same_key_value() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
// Same secret appears in two different .env files.
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
"SECRET_KEY=shared-secret-value-12345\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let sub = dir.path().join("sub");
|
||||
fs::create_dir(&sub).unwrap();
|
||||
fs::write(sub.join(".env"), "SECRET_KEY=shared-secret-value-12345\n").unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
// Should be deduplicated to a single entry.
|
||||
let matching: Vec<_> = manifest
|
||||
.entries()
|
||||
.iter()
|
||||
.filter(|e| e.key == "SECRET_KEY")
|
||||
.collect();
|
||||
assert_eq!(
|
||||
matching.len(),
|
||||
1,
|
||||
"duplicate entries should be collapsed: found {}",
|
||||
matching.len()
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unparseable_file_with_allow_is_skipped() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
// Write a file that looks like an env file but contains garbage TOML.
|
||||
// Actually, .env parser is lenient, so let's use a .toml extension
|
||||
// with invalid TOML content to trigger a parse error.
|
||||
let secrets_dir = dir.path();
|
||||
fs::write(secrets_dir.join("secrets.toml"), "this is not valid toml {{{\n").unwrap();
|
||||
|
||||
// Also write a valid .env so we can confirm it still works.
|
||||
fs::write(
|
||||
secrets_dir.join(".env"),
|
||||
"API_KEY=valid-secret-12345678\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = config_from_toml(
|
||||
r#"
|
||||
[enforcement]
|
||||
on_parse_error = "allow"
|
||||
"#,
|
||||
);
|
||||
|
||||
let manifest = Manifest::build(&config, secrets_dir).unwrap();
|
||||
|
||||
// The broken secrets.toml is skipped; .env is still processed.
|
||||
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
|
||||
assert!(keys.contains(&"API_KEY"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn unparseable_file_with_deny_returns_error() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
fs::write(dir.path().join("secrets.toml"), "not valid toml {{{\n").unwrap();
|
||||
|
||||
let config = config_from_toml(
|
||||
r#"
|
||||
[enforcement]
|
||||
on_parse_error = "deny"
|
||||
"#,
|
||||
);
|
||||
|
||||
let result = Manifest::build(&config, dir.path());
|
||||
assert!(result.is_err(), "deny mode should propagate parse errors");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn manifest_empty_and_is_empty() {
|
||||
let m = Manifest::empty();
|
||||
assert!(m.is_empty());
|
||||
assert_eq!(m.len(), 0);
|
||||
assert!(m.entries().is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn skips_git_and_node_modules_directories() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
|
||||
// .env inside .git should be skipped.
|
||||
let git_dir = dir.path().join(".git");
|
||||
fs::create_dir(&git_dir).unwrap();
|
||||
fs::write(git_dir.join(".env"), "SECRET_KEY=git-secret-12345\n").unwrap();
|
||||
|
||||
// .env inside node_modules should be skipped.
|
||||
let nm_dir = dir.path().join("node_modules").join("pkg");
|
||||
fs::create_dir_all(&nm_dir).unwrap();
|
||||
fs::write(nm_dir.join(".env"), "TOKEN=nm-token-12345678\n").unwrap();
|
||||
|
||||
// .env at root should be found.
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
"API_KEY=root-api-key-12345\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
let values: Vec<&str> = manifest.entries().iter().map(|e| e.value.as_str()).collect();
|
||||
assert!(
|
||||
values.contains(&"root-api-key-12345"),
|
||||
"root .env should be found"
|
||||
);
|
||||
assert!(
|
||||
!values.contains(&"git-secret-12345"),
|
||||
".git/.env should be skipped"
|
||||
);
|
||||
assert!(
|
||||
!values.contains(&"nm-token-12345678"),
|
||||
"node_modules/.env should be skipped"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn opaque_file_formats_are_skipped_gracefully() {
|
||||
let dir = tempfile::tempdir().unwrap();
|
||||
// .pem and .key files match default patterns but have no parseable format.
|
||||
fs::write(dir.path().join("server.key"), "binary-ish key data here\n").unwrap();
|
||||
fs::write(
|
||||
dir.path().join(".env"),
|
||||
"PASSWORD=parseable-secret-12345\n",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let config = default_config();
|
||||
let manifest = Manifest::build(&config, dir.path()).unwrap();
|
||||
|
||||
// Should not error, should still find the .env entry.
|
||||
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
|
||||
assert!(keys.contains(&"PASSWORD"));
|
||||
}
|
||||
@@ -0,0 +1,404 @@
|
||||
//! Integration tests for the multi-format secret file parser.
|
||||
|
||||
use dirigent_fermata::core::secrets::parser::{
|
||||
parse_content, parse_secret_file, FileFormat, SecretEntry,
|
||||
};
|
||||
use std::path::Path;
|
||||
use tempfile::NamedTempFile;
|
||||
|
||||
fn p(s: &str) -> &Path {
|
||||
Path::new(s)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// .env parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn env_basic_key_value() {
|
||||
let entries = parse_content("DATABASE_URL=postgres://localhost/db", FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "DATABASE_URL");
|
||||
assert_eq!(entries[0].value, "postgres://localhost/db");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn env_double_quoted() {
|
||||
let entries = parse_content(r#"SECRET="hello world""#, FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries[0].value, "hello world");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn env_single_quoted() {
|
||||
let entries = parse_content("SECRET='hello world'", FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries[0].value, "hello world");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn env_comments_and_empty_lines() {
|
||||
let content = "# comment\n\nKEY=value\n # indented comment\n";
|
||||
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "KEY");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn env_export_prefix() {
|
||||
let content = "export API_KEY=abc123\nexport TOKEN=\"xyz\"";
|
||||
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert_eq!(entries[0].key, "API_KEY");
|
||||
assert_eq!(entries[0].value, "abc123");
|
||||
assert_eq!(entries[1].key, "TOKEN");
|
||||
assert_eq!(entries[1].value, "xyz");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn env_whitespace_handling() {
|
||||
let content = " KEY = value \nKEY2= spaced ";
|
||||
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
|
||||
// Key is trimmed; unquoted value trimmed.
|
||||
assert_eq!(entries[0].key, "KEY");
|
||||
assert_eq!(entries[0].value, "value");
|
||||
assert_eq!(entries[1].key, "KEY2");
|
||||
assert_eq!(entries[1].value, "spaced");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn env_escape_sequences_in_double_quotes() {
|
||||
let content = r#"MSG="line1\nline2""#;
|
||||
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries[0].value, "line1\nline2");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// TOML parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn toml_flat_table() {
|
||||
let content = r#"
|
||||
API_KEY = "abc"
|
||||
DB_PASS = "secret"
|
||||
"#;
|
||||
let entries = parse_content(content, FileFormat::Toml, p("Secrets.toml")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "API_KEY" && e.value == "abc"));
|
||||
assert!(entries.iter().any(|e| e.key == "DB_PASS" && e.value == "secret"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn toml_nested_tables() {
|
||||
let content = r#"
|
||||
[database]
|
||||
password = "secret"
|
||||
host = "localhost"
|
||||
port = 5432
|
||||
"#;
|
||||
let entries = parse_content(content, FileFormat::Toml, p("config.toml")).unwrap();
|
||||
// Only string values extracted; port (integer) skipped.
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "database.password" && e.value == "secret"));
|
||||
assert!(entries.iter().any(|e| e.key == "database.host" && e.value == "localhost"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn toml_mixed_types_only_strings() {
|
||||
let content = r#"
|
||||
name = "app"
|
||||
debug = true
|
||||
count = 42
|
||||
ratio = 3.14
|
||||
"#;
|
||||
let entries = parse_content(content, FileFormat::Toml, p("app.toml")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "name");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// JSON parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn json_flat_object() {
|
||||
let content = r#"{"api_key": "abc", "secret": "xyz"}"#;
|
||||
let entries = parse_content(content, FileFormat::Json, p("secrets.json")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "api_key" && e.value == "abc"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_nested_objects() {
|
||||
let content = r#"{"db": {"password": "foo", "port": 5432}}"#;
|
||||
let entries = parse_content(content, FileFormat::Json, p("secrets.json")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "db.password");
|
||||
assert_eq!(entries[0].value, "foo");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_arrays() {
|
||||
let content = r#"{"keys": ["a", "b"]}"#;
|
||||
let entries = parse_content(content, FileFormat::Json, p("secrets.json")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "keys.0" && e.value == "a"));
|
||||
assert!(entries.iter().any(|e| e.key == "keys.1" && e.value == "b"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn json_mixed_types() {
|
||||
let content = r#"{"name": "app", "count": 42, "active": true, "data": null}"#;
|
||||
let entries = parse_content(content, FileFormat::Json, p("a.json")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "name");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// YAML parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn yaml_flat_map() {
|
||||
let content = "api_key: abc\nsecret: xyz\n";
|
||||
let entries = parse_content(content, FileFormat::Yaml, p("secrets.yaml")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "api_key" && e.value == "abc"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn yaml_nested_maps() {
|
||||
let content = "db:\n password: foo\n port: 5432\n";
|
||||
let entries = parse_content(content, FileFormat::Yaml, p("secrets.yml")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "db.password");
|
||||
assert_eq!(entries[0].value, "foo");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn yaml_mixed_types() {
|
||||
let content = "name: app\ncount: 42\nactive: true\n";
|
||||
let entries = parse_content(content, FileFormat::Yaml, p("a.yaml")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "name");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Python assignment parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn python_matches_assignments() {
|
||||
let content = r#"
|
||||
API_KEY = "abc123"
|
||||
DB_PASS = 'secret'
|
||||
import os
|
||||
x = 42
|
||||
"#;
|
||||
let entries = parse_content(content, FileFormat::PythonAssignments, p("settings.py")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "API_KEY" && e.value == "abc123"));
|
||||
assert!(entries.iter().any(|e| e.key == "DB_PASS" && e.value == "secret"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn python_skips_non_matching() {
|
||||
let content = "result = some_function()\nfor x in range(10):\n pass\n";
|
||||
let entries = parse_content(content, FileFormat::PythonAssignments, p("a.py")).unwrap();
|
||||
assert!(entries.is_empty());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Properties parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn properties_equals_separator() {
|
||||
let content = "db.password=secret\ndb.host=localhost";
|
||||
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "db.password" && e.value == "secret"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn properties_colon_separator() {
|
||||
let content = "db.password: secret";
|
||||
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
|
||||
assert_eq!(entries[0].key, "db.password");
|
||||
assert_eq!(entries[0].value, "secret");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn properties_comments() {
|
||||
let content = "# comment\n! also comment\nkey=value";
|
||||
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "key");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn properties_continuation_lines() {
|
||||
let content = "long.value=hello \\\n world";
|
||||
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "long.value");
|
||||
assert_eq!(entries[0].value, "hello world");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Auto-detection from file extension
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn format_from_path_env_variants() {
|
||||
assert_eq!(FileFormat::from_path(p(".env")), Some(FileFormat::Env));
|
||||
assert_eq!(FileFormat::from_path(p(".env.local")), Some(FileFormat::Env));
|
||||
assert_eq!(FileFormat::from_path(p(".env.production")), Some(FileFormat::Env));
|
||||
assert_eq!(FileFormat::from_path(p("staging.env")), Some(FileFormat::Env));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_from_path_extensions() {
|
||||
assert_eq!(FileFormat::from_path(p("a.toml")), Some(FileFormat::Toml));
|
||||
assert_eq!(FileFormat::from_path(p("a.json")), Some(FileFormat::Json));
|
||||
assert_eq!(FileFormat::from_path(p("a.yaml")), Some(FileFormat::Yaml));
|
||||
assert_eq!(FileFormat::from_path(p("a.yml")), Some(FileFormat::Yaml));
|
||||
assert_eq!(FileFormat::from_path(p("a.py")), Some(FileFormat::PythonAssignments));
|
||||
assert_eq!(FileFormat::from_path(p("a.properties")), Some(FileFormat::Properties));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_from_path_unknown() {
|
||||
assert_eq!(FileFormat::from_path(p("a.key")), None);
|
||||
assert_eq!(FileFormat::from_path(p("a.pem")), None);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Format hints
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn format_from_hint() {
|
||||
assert_eq!(FileFormat::from_hint("env"), Some(FileFormat::Env));
|
||||
assert_eq!(FileFormat::from_hint("dotenv"), Some(FileFormat::Env));
|
||||
assert_eq!(FileFormat::from_hint("toml"), Some(FileFormat::Toml));
|
||||
assert_eq!(FileFormat::from_hint("json"), Some(FileFormat::Json));
|
||||
assert_eq!(FileFormat::from_hint("yaml"), Some(FileFormat::Yaml));
|
||||
assert_eq!(FileFormat::from_hint("yml"), Some(FileFormat::Yaml));
|
||||
assert_eq!(FileFormat::from_hint("python-assignments"), Some(FileFormat::PythonAssignments));
|
||||
assert_eq!(FileFormat::from_hint("python"), Some(FileFormat::PythonAssignments));
|
||||
assert_eq!(FileFormat::from_hint("properties"), Some(FileFormat::Properties));
|
||||
assert_eq!(FileFormat::from_hint("java-properties"), Some(FileFormat::Properties));
|
||||
assert_eq!(FileFormat::from_hint("unknown"), None);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Key filtering
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn filter_by_glob() {
|
||||
let content = "API_KEY=abc\nDB_HOST=localhost\nDB_PASSWORD=secret\n";
|
||||
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
|
||||
assert_eq!(entries.len(), 3);
|
||||
|
||||
let filter = vec!["*PASSWORD*".to_string(), "*API_KEY*".to_string()];
|
||||
let result = parse_secret_file_with_filter(content, &filter);
|
||||
assert_eq!(result.len(), 2);
|
||||
assert!(result.iter().any(|e| e.key == "API_KEY"));
|
||||
assert!(result.iter().any(|e| e.key == "DB_PASSWORD"));
|
||||
}
|
||||
|
||||
/// Helper that parses env content with a key filter (avoids temp files).
|
||||
fn parse_secret_file_with_filter(content: &str, filter: &[String]) -> Vec<SecretEntry> {
|
||||
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
|
||||
// Re-implement the filter logic for testing without disk I/O.
|
||||
use dirigent_fermata::core::secrets::parser::parse_content as pc;
|
||||
let all = pc(content, FileFormat::Env, p(".env")).unwrap();
|
||||
// Apply filter manually using the same approach as parse_secret_file.
|
||||
let matchers: Vec<_> = filter
|
||||
.iter()
|
||||
.filter_map(|p| {
|
||||
globset::Glob::new(&p.to_ascii_uppercase())
|
||||
.ok()
|
||||
.map(|g| g.compile_matcher())
|
||||
})
|
||||
.collect();
|
||||
all.into_iter()
|
||||
.filter(|entry| {
|
||||
let upper = entry.key.to_ascii_uppercase();
|
||||
matchers.iter().any(|m| m.is_match(&upper))
|
||||
})
|
||||
.collect()
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Error on unrecognised format
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn error_on_unknown_format() {
|
||||
use std::io::Write;
|
||||
let mut tmp = NamedTempFile::with_suffix(".xyz").unwrap();
|
||||
write!(tmp, "KEY=value").unwrap();
|
||||
let result = parse_secret_file(tmp.path(), None, None);
|
||||
assert!(result.is_err());
|
||||
let err = result.unwrap_err().to_string();
|
||||
assert!(err.contains("cannot determine file format"));
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Empty file
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn empty_file_produces_empty_vec() {
|
||||
let entries = parse_content("", FileFormat::Env, p(".env")).unwrap();
|
||||
assert!(entries.is_empty());
|
||||
|
||||
let entries = parse_content("{}", FileFormat::Json, p("a.json")).unwrap();
|
||||
assert!(entries.is_empty());
|
||||
|
||||
let entries = parse_content("", FileFormat::Toml, p("a.toml")).unwrap();
|
||||
assert!(entries.is_empty());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// parse_secret_file end-to-end (disk)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn parse_secret_file_from_disk() {
|
||||
use std::io::Write;
|
||||
let mut tmp = NamedTempFile::with_suffix(".env").unwrap();
|
||||
write!(tmp, "SECRET=hunter2\nPORT=8080").unwrap();
|
||||
|
||||
let entries = parse_secret_file(tmp.path(), None, None).unwrap();
|
||||
assert_eq!(entries.len(), 2);
|
||||
assert!(entries.iter().any(|e| e.key == "SECRET" && e.value == "hunter2"));
|
||||
// Source path should match.
|
||||
assert_eq!(entries[0].source, tmp.path());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_secret_file_with_key_filter() {
|
||||
use std::io::Write;
|
||||
let mut tmp = NamedTempFile::with_suffix(".env").unwrap();
|
||||
write!(tmp, "API_KEY=abc\nHOST=localhost\nDB_PASSWORD=secret").unwrap();
|
||||
|
||||
let filter = vec!["*PASSWORD*".to_string()];
|
||||
let entries = parse_secret_file(tmp.path(), None, Some(&filter)).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "DB_PASSWORD");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parse_secret_file_with_format_override() {
|
||||
use std::io::Write;
|
||||
// Write env content to a .txt file — format override should work.
|
||||
let mut tmp = NamedTempFile::with_suffix(".txt").unwrap();
|
||||
write!(tmp, "KEY=value").unwrap();
|
||||
|
||||
let entries = parse_secret_file(tmp.path(), Some(FileFormat::Env), None).unwrap();
|
||||
assert_eq!(entries.len(), 1);
|
||||
assert_eq!(entries[0].key, "KEY");
|
||||
}
|
||||
@@ -0,0 +1,373 @@
|
||||
//! Integration tests for the secret value redactor.
|
||||
|
||||
use std::path::PathBuf;
|
||||
|
||||
use dirigent_fermata::core::secrets::config::RedactionStyle;
|
||||
use dirigent_fermata::core::secrets::manifest::Manifest;
|
||||
use dirigent_fermata::core::secrets::parser::SecretEntry;
|
||||
use dirigent_fermata::core::secrets::redactor::Redactor;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn entry(key: &str, value: &str) -> SecretEntry {
|
||||
SecretEntry {
|
||||
key: key.to_string(),
|
||||
value: value.to_string(),
|
||||
source: PathBuf::from("test"),
|
||||
}
|
||||
}
|
||||
|
||||
fn make_redactor(entries: Vec<SecretEntry>, style: RedactionStyle) -> Redactor {
|
||||
let manifest = Manifest::from_entries(entries);
|
||||
Redactor::new(&manifest, style)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Basic redaction
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn basic_single_secret() {
|
||||
let r = make_redactor(
|
||||
vec![entry("DB_PASSWORD", "super_secret_123")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("connecting with password super_secret_123 ...");
|
||||
assert_eq!(result.text, "connecting with password ***** ...");
|
||||
assert!(result.was_redacted());
|
||||
assert_eq!(result.redactions.len(), 1);
|
||||
assert_eq!(result.redactions[0].key, "DB_PASSWORD");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Multiple secrets
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn multiple_different_secrets() {
|
||||
let r = make_redactor(
|
||||
vec![
|
||||
entry("DB_PASSWORD", "db_pass_value"),
|
||||
entry("API_KEY", "ak_12345678"),
|
||||
],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("db=db_pass_value key=ak_12345678");
|
||||
assert_eq!(result.text, "db=***** key=*****");
|
||||
assert_eq!(result.redactions.len(), 2);
|
||||
assert_eq!(result.redactions[0].key, "DB_PASSWORD");
|
||||
assert_eq!(result.redactions[1].key, "API_KEY");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Repeated occurrences
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn same_secret_multiple_times() {
|
||||
let r = make_redactor(
|
||||
vec![entry("TOKEN", "tok_abcdef")],
|
||||
RedactionStyle::Named,
|
||||
);
|
||||
let result = r.redact("first=tok_abcdef second=tok_abcdef");
|
||||
assert_eq!(result.text, "first=<REDACTED:TOKEN> second=<REDACTED:TOKEN>");
|
||||
assert_eq!(result.redactions.len(), 2);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Redaction styles
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn style_masked() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "secret_value")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("val=secret_value");
|
||||
assert_eq!(result.text, "val=*****");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn style_typed() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "secret_value")],
|
||||
RedactionStyle::Typed,
|
||||
);
|
||||
let result = r.redact("val=secret_value");
|
||||
// "secret_value" is 12 chars
|
||||
assert_eq!(result.text, "val=<REDACTED:string:12>");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn style_named() {
|
||||
let r = make_redactor(
|
||||
vec![entry("MY_API_KEY", "secret_value")],
|
||||
RedactionStyle::Named,
|
||||
);
|
||||
let result = r.redact("val=secret_value");
|
||||
assert_eq!(result.text, "val=<REDACTED:MY_API_KEY>");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn style_absent() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "secret_value")],
|
||||
RedactionStyle::Absent,
|
||||
);
|
||||
let result = r.redact("val=secret_value end");
|
||||
assert_eq!(result.text, "val= end");
|
||||
assert!(result.was_redacted());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Overlapping values (longest match wins)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn overlapping_longest_match_wins() {
|
||||
let r = make_redactor(
|
||||
vec![
|
||||
entry("SHORT_KEY", "secret"),
|
||||
entry("LONG_KEY", "secret_long_value"),
|
||||
],
|
||||
RedactionStyle::Named,
|
||||
);
|
||||
let result = r.redact("x=secret_long_value");
|
||||
// The longer value should match, not the shorter substring.
|
||||
assert_eq!(result.text, "x=<REDACTED:LONG_KEY>");
|
||||
assert_eq!(result.redactions.len(), 1);
|
||||
assert_eq!(result.redactions[0].key, "LONG_KEY");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn shorter_match_still_found_when_no_overlap() {
|
||||
let r = make_redactor(
|
||||
vec![
|
||||
entry("SHORT_KEY", "secret"),
|
||||
entry("LONG_KEY", "secret_long_value"),
|
||||
],
|
||||
RedactionStyle::Named,
|
||||
);
|
||||
// "secret" appears standalone (not as part of "secret_long_value")
|
||||
let result = r.redact("a=secret b=secret_long_value");
|
||||
assert_eq!(result.text, "a=<REDACTED:SHORT_KEY> b=<REDACTED:LONG_KEY>");
|
||||
assert_eq!(result.redactions.len(), 2);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// No match
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn no_match_returns_unchanged() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "not_present_here")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("nothing to see here");
|
||||
assert_eq!(result.text, "nothing to see here");
|
||||
assert!(!result.was_redacted());
|
||||
assert!(result.redactions.is_empty());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Empty text
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn empty_input_returns_empty() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "some_secret")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("");
|
||||
assert_eq!(result.text, "");
|
||||
assert!(!result.was_redacted());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Empty manifest
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn empty_manifest_returns_unchanged() {
|
||||
let manifest = Manifest::empty();
|
||||
let r = Redactor::new(&manifest, RedactionStyle::Masked);
|
||||
assert!(!r.has_secrets());
|
||||
let result = r.redact("some text with no secrets");
|
||||
assert_eq!(result.text, "some text with no secrets");
|
||||
assert!(!result.was_redacted());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Short values filtered out by Manifest::from_entries
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn short_values_are_filtered() {
|
||||
// Values shorter than 4 chars should be dropped by from_entries.
|
||||
let r = make_redactor(
|
||||
vec![entry("TINY", "abc"), entry("LONG_ENOUGH", "abcd")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("abc abcd");
|
||||
// "abc" should NOT be redacted (too short), "abcd" should be.
|
||||
assert_eq!(result.text, "abc *****");
|
||||
assert_eq!(result.redactions.len(), 1);
|
||||
assert_eq!(result.redactions[0].key, "LONG_ENOUGH");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Zero false negatives — every declared secret must be caught
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn zero_false_negatives() {
|
||||
let secrets = vec![
|
||||
entry("A_SECRET", "alpha_secret_val"),
|
||||
entry("B_TOKEN", "bravo_token_val_"),
|
||||
entry("C_PASSWORD", "charlie_pass_99"),
|
||||
entry("D_API_KEY", "delta_key_00000"),
|
||||
];
|
||||
let r = make_redactor(secrets.clone(), RedactionStyle::Masked);
|
||||
|
||||
// Build text that contains every single secret value.
|
||||
let text = format!(
|
||||
"a={} b={} c={} d={}",
|
||||
"alpha_secret_val", "bravo_token_val_", "charlie_pass_99", "delta_key_00000",
|
||||
);
|
||||
let result = r.redact(&text);
|
||||
|
||||
// Every secret value must be replaced.
|
||||
for s in &secrets {
|
||||
if s.value.len() >= 4 {
|
||||
assert!(
|
||||
!result.text.contains(&s.value),
|
||||
"Secret {} was not redacted: {}",
|
||||
s.key,
|
||||
result.text,
|
||||
);
|
||||
}
|
||||
}
|
||||
assert_eq!(result.redactions.len(), 4);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Multi-line text
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn multi_line_redaction() {
|
||||
let r = make_redactor(
|
||||
vec![
|
||||
entry("DB_PASSWORD", "s3cr3t_p@ss"),
|
||||
entry("API_KEY", "ak-1234567890"),
|
||||
],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let text = "# Config file\n\
|
||||
DATABASE_URL=postgres://user:s3cr3t_p@ss@host/db\n\
|
||||
API_KEY=ak-1234567890\n\
|
||||
OTHER=safe_value\n";
|
||||
let result = r.redact(text);
|
||||
assert!(!result.text.contains("s3cr3t_p@ss"));
|
||||
assert!(!result.text.contains("ak-1234567890"));
|
||||
assert!(result.text.contains("safe_value"));
|
||||
assert_eq!(result.redactions.len(), 2);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Redaction metadata correctness
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn redaction_metadata_offset_and_len() {
|
||||
let r = make_redactor(
|
||||
vec![entry("SECRET", "ABCDEFGH")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let text = "prefix_ABCDEFGH_suffix";
|
||||
let result = r.redact(text);
|
||||
|
||||
assert_eq!(result.redactions.len(), 1);
|
||||
let red = &result.redactions[0];
|
||||
assert_eq!(red.key, "SECRET");
|
||||
assert_eq!(red.offset, 7); // "prefix_" is 7 bytes
|
||||
assert_eq!(red.original_len, 8); // "ABCDEFGH" is 8 bytes
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn redaction_metadata_multiple_offsets() {
|
||||
let r = make_redactor(
|
||||
vec![entry("TOK", "xxxx1234")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
// "a=xxxx1234 b=xxxx1234"
|
||||
let text = "a=xxxx1234 b=xxxx1234";
|
||||
let result = r.redact(text);
|
||||
|
||||
assert_eq!(result.redactions.len(), 2);
|
||||
assert_eq!(result.redactions[0].offset, 2); // after "a="
|
||||
assert_eq!(result.redactions[0].original_len, 8);
|
||||
assert_eq!(result.redactions[1].offset, 13); // after " b="
|
||||
assert_eq!(result.redactions[1].original_len, 8);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// has_secrets() helper
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn has_secrets_with_entries() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "long_enough_value")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
assert!(r.has_secrets());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn has_secrets_empty() {
|
||||
let r = make_redactor(vec![], RedactionStyle::Masked);
|
||||
assert!(!r.has_secrets());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// was_redacted() helper
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn was_redacted_true_when_match() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "findme_value")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("findme_value");
|
||||
assert!(result.was_redacted());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn was_redacted_false_when_no_match() {
|
||||
let r = make_redactor(
|
||||
vec![entry("KEY", "findme_value")],
|
||||
RedactionStyle::Masked,
|
||||
);
|
||||
let result = r.redact("nothing here");
|
||||
assert!(!result.was_redacted());
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Deduplication in from_entries
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn duplicate_entries_deduplicated() {
|
||||
let manifest = Manifest::from_entries(vec![
|
||||
entry("KEY", "same_value_here"),
|
||||
entry("KEY", "same_value_here"),
|
||||
]);
|
||||
assert_eq!(manifest.len(), 1);
|
||||
}
|
||||
@@ -0,0 +1,254 @@
|
||||
use dirigent_fermata::core::secrets::config::HeuristicConfig;
|
||||
use dirigent_fermata::core::secrets::scanner::{shannon_entropy, Confidence, Scanner};
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Helper: build a scanner with default config (built-in rules only)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
fn default_scanner() -> Scanner {
|
||||
Scanner::builtin().expect("built-in rules must compile")
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Specific provider patterns
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn detects_aws_access_key() {
|
||||
let scanner = default_scanner();
|
||||
let findings = scanner.scan("here is my key: AKIAIOSFODNN7EXAMPLE ok");
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "aws-access-key"),
|
||||
"expected aws-access-key finding, got: {findings:?}"
|
||||
);
|
||||
assert_eq!(findings[0].confidence, Confidence::High);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_github_pat_classic() {
|
||||
let scanner = default_scanner();
|
||||
let findings = scanner.scan("ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij");
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "github-pat-classic"),
|
||||
"expected github-pat-classic finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_stripe_secret_key() {
|
||||
let scanner = default_scanner();
|
||||
let findings = scanner.scan("STRIPE_KEY=sk_live_abcdefghijklmnopqrstuvwx");
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "stripe-secret-key"),
|
||||
"expected stripe-secret-key finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_private_key_header() {
|
||||
let scanner = default_scanner();
|
||||
let text = "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAK...\n-----END RSA PRIVATE KEY-----";
|
||||
let findings = scanner.scan(text);
|
||||
assert!(
|
||||
findings
|
||||
.iter()
|
||||
.any(|f| f.pattern_id == "private-key-header"),
|
||||
"expected private-key-header finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_jwt_token() {
|
||||
let scanner = default_scanner();
|
||||
// A realistic-looking (but fake) JWT.
|
||||
let jwt = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ik\
|
||||
pvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c";
|
||||
let findings = scanner.scan(jwt);
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "jwt"),
|
||||
"expected jwt finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_database_connection_url() {
|
||||
let scanner = default_scanner();
|
||||
let findings = scanner.scan("DATABASE_URL=postgres://admin:s3cretP4ss@db.example.com:5432/mydb");
|
||||
assert!(
|
||||
findings
|
||||
.iter()
|
||||
.any(|f| f.pattern_id == "database-connection-url"),
|
||||
"expected database-connection-url finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_slack_webhook() {
|
||||
let scanner = default_scanner();
|
||||
let findings = scanner
|
||||
.scan("https://hooks.slack.com/services/T0ABCDEFG/B0ABCDEFG/abcdefghijklmnopqrstuvwx");
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "slack-webhook"),
|
||||
"expected slack-webhook finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_anthropic_api_key() {
|
||||
let scanner = default_scanner();
|
||||
let key = "sk-ant-aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789abcdefgh";
|
||||
let findings = scanner.scan(&format!("my key is {key}"));
|
||||
assert!(
|
||||
findings
|
||||
.iter()
|
||||
.any(|f| f.pattern_id == "anthropic-api-key"),
|
||||
"expected anthropic-api-key finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn detects_sendgrid_api_key() {
|
||||
let scanner = default_scanner();
|
||||
let key = "SG.abcdefghijklmnopqrstuv.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrst";
|
||||
let findings = scanner.scan(key);
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "sendgrid-api-key"),
|
||||
"expected sendgrid-api-key finding, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Generic patterns — entropy filtering
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn rejects_low_entropy_generic_api_key() {
|
||||
let scanner = default_scanner();
|
||||
// "test" repeated has very low entropy — should NOT trigger.
|
||||
let findings = scanner.scan(r#"api_key = "testtesttesttesttest""#);
|
||||
let generic_hits: Vec<_> = findings
|
||||
.iter()
|
||||
.filter(|f| f.pattern_id == "generic-api-key")
|
||||
.collect();
|
||||
assert!(
|
||||
generic_hits.is_empty(),
|
||||
"low-entropy api_key should be filtered out, got: {generic_hits:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn accepts_high_entropy_generic_secret() {
|
||||
let scanner = default_scanner();
|
||||
// A high-entropy random-looking value.
|
||||
let findings = scanner.scan(r#"secret = "a8Kz3Lm9Xq2Wp7Yn"#);
|
||||
let has_generic = findings
|
||||
.iter()
|
||||
.any(|f| f.pattern_id == "generic-secret");
|
||||
assert!(
|
||||
has_generic,
|
||||
"high-entropy secret should be detected, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Custom patterns from config
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn custom_pattern_from_config() {
|
||||
let config = HeuristicConfig {
|
||||
enabled: true,
|
||||
patterns: vec![r"MY_CUSTOM_[A-Z]{10}".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
let scanner = Scanner::new(&config).expect("should compile custom pattern");
|
||||
let findings = scanner.scan("found MY_CUSTOM_ABCDEFGHIJ in output");
|
||||
assert!(
|
||||
findings.iter().any(|f| f.pattern_id == "custom-0"),
|
||||
"expected custom-0 finding, got: {findings:?}"
|
||||
);
|
||||
assert_eq!(findings[0].confidence, Confidence::High);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Edge cases
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn empty_text_returns_no_findings() {
|
||||
let scanner = default_scanner();
|
||||
assert!(scanner.scan("").is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn plain_text_returns_no_findings() {
|
||||
let scanner = default_scanner();
|
||||
let findings = scanner.scan("This is just a normal paragraph with no secrets.");
|
||||
assert!(
|
||||
findings.is_empty(),
|
||||
"plain text should have no findings, got: {findings:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn overlapping_matches_are_deduplicated() {
|
||||
// Construct text where the same span could match multiple patterns.
|
||||
// The bearer token pattern and a generic pattern could overlap on the same region.
|
||||
let scanner = default_scanner();
|
||||
let text = "Authorization: Bearer ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefgh";
|
||||
let findings = scanner.scan(text);
|
||||
|
||||
// Verify no two findings have overlapping spans.
|
||||
for i in 0..findings.len() {
|
||||
for j in (i + 1)..findings.len() {
|
||||
assert!(
|
||||
findings[j].span.start >= findings[i].span.end,
|
||||
"findings {i} and {j} overlap: {:?} vs {:?}",
|
||||
findings[i].span,
|
||||
findings[j].span,
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Shannon entropy unit tests (supplement the inline mod tests)
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn entropy_known_values() {
|
||||
// Single character repeated → 0.
|
||||
assert!((shannon_entropy("aaaa") - 0.0).abs() < f64::EPSILON);
|
||||
|
||||
// Perfectly balanced binary → 1.0 bits/char.
|
||||
let balanced = "ababababab";
|
||||
assert!((shannon_entropy(balanced) - 1.0).abs() < 0.01);
|
||||
|
||||
// High diversity.
|
||||
let diverse = "aB3$kL9!mZ7@wQ1#xR5^";
|
||||
assert!(shannon_entropy(diverse) > 3.5);
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Scanner construction
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn builtin_scanner_has_rules() {
|
||||
let scanner = default_scanner();
|
||||
assert!(
|
||||
scanner.rule_count() >= 30,
|
||||
"expected at least 30 built-in rules, got {}",
|
||||
scanner.rule_count()
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn invalid_custom_pattern_returns_error() {
|
||||
let config = HeuristicConfig {
|
||||
enabled: true,
|
||||
patterns: vec![r"[invalid".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
assert!(Scanner::new(&config).is_err());
|
||||
}
|
||||
+58
-1
@@ -1,5 +1,5 @@
|
||||
use dirigent_fermata::core::{Decision, Reason};
|
||||
use dirigent_fermata::harness::{HarnessAdapter, PathKind, ToolOp};
|
||||
use dirigent_fermata::harness::{HarnessAdapter, HookEvent, PathKind, ToolOp};
|
||||
use dirigent_fermata::harness::claude::ClaudeAdapter;
|
||||
|
||||
#[test]
|
||||
@@ -84,3 +84,60 @@ fn renders_ask_as_ask() {
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v["hookSpecificOutput"]["permissionDecision"], "ask");
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// PostToolUse
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn parses_post_tool_use_payload() {
|
||||
let payload = br#"{"tool_name":"Read","tool_input":{"file_path":"/proj/.env"},"tool_response":"SECRET=abc"}"#;
|
||||
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
|
||||
assert_eq!(p.tool_name, "Read");
|
||||
assert_eq!(p.tool_response, "SECRET=abc");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parses_post_tool_use_missing_response() {
|
||||
// tool_response absent → defaults to empty string.
|
||||
let payload = br#"{"tool_name":"Bash","tool_input":{"command":"ls"}}"#;
|
||||
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
|
||||
assert_eq!(p.tool_response, "");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn renders_post_tool_use_with_redacted_output() {
|
||||
let payload = br#"{"tool_name":"Read","tool_input":{},"tool_response":"x"}"#;
|
||||
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
|
||||
let out = ClaudeAdapter
|
||||
.render_post_tool_use(&p, Some("redacted text"))
|
||||
.unwrap();
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v["hookSpecificOutput"]["hookEventName"], "PostToolUse");
|
||||
assert_eq!(
|
||||
v["hookSpecificOutput"]["updatedToolOutput"],
|
||||
"redacted text"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn renders_post_tool_use_passthrough() {
|
||||
let payload = br#"{"tool_name":"Read","tool_input":{},"tool_response":"clean"}"#;
|
||||
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
|
||||
let out = ClaudeAdapter.render_post_tool_use(&p, None).unwrap();
|
||||
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
|
||||
assert_eq!(v, serde_json::json!({}));
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// HookEvent parsing
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn hook_event_parse_variants() {
|
||||
assert_eq!(HookEvent::parse("pre-tool-use"), Some(HookEvent::PreToolUse));
|
||||
assert_eq!(HookEvent::parse("PreToolUse"), Some(HookEvent::PreToolUse));
|
||||
assert_eq!(HookEvent::parse("post-tool-use"), Some(HookEvent::PostToolUse));
|
||||
assert_eq!(HookEvent::parse("PostToolUse"), Some(HookEvent::PostToolUse));
|
||||
assert_eq!(HookEvent::parse("unknown"), None);
|
||||
}
|
||||
|
||||
Reference in New Issue
Block a user