feat(fermata): add secret filtering engine — the security brain

Implement Goals 1–3 and 5 from the reveal-layer security brain goal.
fermata now detects, redacts, and scans for secrets in AI agent tool
output, filling the ecosystem gap where no coding agent filters secrets
post-read.

New core/secrets/ module:
- config.rs: .botsecrets TOML format with hierarchical merge and ~40
  built-in key patterns
- parser.rs: multi-format secret file parser (.env, TOML, YAML, JSON,
  Python assignments, Java properties)
- manifest.rs: file discovery + parsing → known-secrets set
- redactor.rs: Aho-Corasick multi-pattern replacement with 4 styles
- scanner.rs: RegexSet heuristic detection with 35 gitleaks-derived
  patterns (MIT) and Shannon entropy filtering
- patterns.rs: curated rules for AWS, GitHub, Stripe, Slack, JWT, etc.

Hook integration:
- fermata hook --event post-tool-use reads tool output, runs redactor +
  scanner, returns updatedToolOutput for Claude Code
- Backward compatible: --event pre-tool-use (default) unchanged
- Fail-open: errors produce {} and exit 0

Library API:
- Redactor::new(manifest, style).redact(text) → RedactedText
- Scanner::new(config).scan(text) → Vec<Finding>
- Compiles without CLI feature for embedding in other crates

195 tests (130 new), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gabor Körber
2026-05-25 17:29:07 +02:00
parent f77fd73966
commit 087429d275
22 changed files with 4557 additions and 172 deletions
+15 -6
@@ -1,19 +1,26 @@
# Package: dirigent_fermata
Harness-agnostic policy gate for AI coding agents.
Harness-agnostic policy gate and secret filtering engine for AI coding agents.
## Quick Facts
- **Type**: Library + binary (`fermata`)
- **Main Entry**: `src/lib.rs`, `src/bin/fermata.rs`
- **Dependencies**: `ignore`, `toml`, `regex`, `globset`, `serde`, `clap` (cli feature)
- **Status**: v0.1 -- library + CLI + Claude hook adapter
- **Dependencies**: `ignore`, `toml`, `regex`, `globset`, `serde`, `clap` (cli feature), `aho-corasick`, `serde_yaml`
- **Status**: v0.2 -- policy gate + secret filtering engine
## Layering
Three concentric layers; nothing inner imports from anything outer.
- **`core/`** — harness-unaware, transport-unaware, sync. Types (`Op`, `Decision`), `.botignore` walker, `botignore.toml` parser, `Policy::check` / `check_command`, path extraction. Sync, no tokio.
- **`harness/`** — `HarnessAdapter` trait over a normalized `ToolCall`. Each adapter (Claude, future Codex, etc.) lives in its own submodule, feature-gated.
- **`core/secrets/`** — the secret filtering engine:
  - `config.rs` -- `.botsecrets` TOML parser and hierarchical resolution (user, project, local override).
  - `manifest.rs` -- discovers secret-containing files from `.botsecrets` patterns and loads their content for redaction.
  - `parser.rs` -- multi-format secret file parser (`.env`, TOML, YAML, JSON). Extracts key-value pairs where the value is a secret.
  - `patterns.rs` -- built-in key name patterns (~40, from wildcards like `*SECRET*`, `*PASSWORD*`, `*TOKEN*` to exact names like `DATABASE_URL`) and gitleaks-derived regex patterns for heuristic scanning.
  - `redactor.rs` -- `Redactor` builds an Aho-Corasick automaton from known secret values and replaces them in arbitrary text. Sub-millisecond performance.
  - `scanner.rs` -- `Scanner` applies heuristic regex patterns to detect secrets not covered by the known-value manifest (entropy-based and format-based detection).
- **`harness/`** — `HarnessAdapter` trait over a normalized `ToolCall` (PreToolUse) and `PostToolUsePayload` (PostToolUse). Each adapter (Claude, future Codex, etc.) lives in its own submodule, feature-gated. PostToolUse enables output redaction via `updatedToolOutput` before content enters the LLM context.
- **`bin/fermata.rs`** — only place where `clap`, stdio, and exit codes appear.
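
The entropy-based detection mentioned for `scanner.rs` can be illustrated with a small standard-library sketch. The names `shannon_entropy` and `looks_random`, and any threshold passed to them, are assumptions made for illustration; they are not taken from fermata's source.

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per byte (0.0 for the empty string).
/// Hypothetical helper for illustration; not fermata's actual API.
fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    // Count how often each byte value occurs.
    let mut counts: HashMap<u8, usize> = HashMap::new();
    for b in s.bytes() {
        *counts.entry(b).or_insert(0) += 1;
    }
    let len = s.len() as f64;
    // H = -sum(p * log2(p)) over the observed byte frequencies.
    counts
        .values()
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}

/// Treat a candidate as "random-looking" when its entropy reaches `min_bits`.
/// The threshold is an assumed tuning knob, not a value from the source.
fn looks_random(candidate: &str, min_bits: f64) -> bool {
    shannon_entropy(candidate) >= min_bits
}
```

A filter of this kind would typically run after a regex has matched, to discard low-entropy hits such as placeholder values.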
## Release Model
@@ -24,11 +31,13 @@ Developed in this monorepo; planned to be exported as a standalone repo in the f
`dirigent_tools` depends on `dirigent_fermata`, never the reverse. Fermata must remain usable as a standalone hook/MCP without dragging in the in-process ACP tool runtime.
## Out of scope (v0.1)
## Out of scope (v0.2)
Codex / Gemini hook adapters, MCP server mode, PostToolUse envelope, `readonly_only` Bash mode, audit log, filesystem watcher. Each is a future task with its own plan.
Codex / Gemini hook adapters, MCP server mode, `readonly_only` Bash mode, audit log, filesystem watcher, context taint tracking. Each is a future task with its own plan.
## See also
- `docs/tools/fermata.md` — Dirigent integration plan
- `docs/workpad/brainstorm/fermata.md` — canonical product spec
- `docs/architecture/fermata-security-philosophy.md` — security philosophy and the reveal triangle
- `.botsecrets` format: `core/secrets/config.rs` — the `.gitignore` of AI agent secret protection
+2
@@ -19,6 +19,7 @@ path = "src/bin/fermata.rs"
required-features = ["cli"]
[dependencies]
aho-corasick = "1.1"
globset = "0.4"
ignore = "0.4"
walkdir = "2"
@@ -26,6 +27,7 @@ toml = "0.8"
regex = "1.10"
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
serde_yaml = "0.9"
thiserror = "2.0"
clap = { version = "4.5", features = ["derive"], optional = true }
+115 -138
@@ -1,44 +1,40 @@
# 𝄐 dirigent_fermata
# dirigent_fermata
**A fast, harness-agnostic policy gate for AI coding agents.**
**A fast, harness-agnostic policy gate and secret filtering engine for AI coding agents.**
Drop a `.botignore` file in your project root. Fermata reads it and blocks your agent from reading, writing, or running things it shouldn't — before the tool call happens.
```
.env
.env.*
secrets/**
conf/settings.local.yaml
```
That's all it takes.
Drop a `.botignore` to control what your agent can touch. Drop a `.botsecrets` to control what secret values your agent can see. Fermata enforces both -- before and after tool calls happen.
---
## Why Fermata
AI coding agents are powerful, but they don't have an innate sense of "don't touch `.env`." Native hook systems in tools like Claude Code let you intercept every file operation — but wiring up your own secure, fast hook for each project is friction. Fermata is that hook, ready to drop in.
AI coding agents don't have an innate sense of "don't touch `.env`" -- and even if you block the file, they can still see its contents through shell output, log files, and indirect reads. Fermata solves both problems:
- **Fast** -- written in Rust; ~1-5ms per call. Hooks fire on every read, write, and bash operation. Python cold-start (~50-150ms) compounds fast. Fermata doesn't.
- **Familiar syntax** -- `.botignore` uses gitignore rules via the `ignore` crate (the same engine powering ripgrep).
- **Per-operation control** — `botignore.toml` lets you block writes to `vendor/**` while still allowing reads, or deny specific bash patterns without touching path rules.
- **Harness-agnostic** — plain CLI exit codes work from any shell wrapper; the hook adapter speaks Claude Code's JSON natively.
- **Policy gate** -- `.botignore` blocks reads, writes, and dangerous commands before they execute (PreToolUse).
- **Secret filtering** -- `.botsecrets` redacts secret values from tool output before they enter the LLM context (PostToolUse).
- **Fast** -- Rust, Aho-Corasick automaton for redaction, ~1-5ms per call.
- **Familiar syntax** -- `.botignore` uses gitignore rules; `.botsecrets` uses TOML with glob patterns.
- **Harness-agnostic** -- hook adapters for Claude Code (shipped), Codex and Gemini (planned), MCP proxy (planned).
---
## Status: v0.1
## Status: v0.2
| Component | Status |
|-----------|--------|
| Library (`Op`, `Decision`, `Policy::check`, `Policy::check_command`) | Done |
| `.botignore` walker (project-root walk-up, gitignore semantics) | Done |
| Library (`Policy::check`, `Policy::check_command`) | Done |
| `.botignore` walker (gitignore semantics) | Done |
| `botignore.toml` parser (read / write / bash namespaces) | Done |
| Path identification heuristics | Done |
| CLI: `fermata check <path>...` | Done |
| CLI: `fermata hook --harness claude` | Done |
| CLI: `fermata check` / `fermata hook` | Done |
| Claude Code PreToolUse adapter | Done |
| Claude Code PostToolUse adapter (output redaction) | Done |
| `.botsecrets` config parser | Done |
| Secret manifest discovery and loading | Done |
| Multi-format secret file parser (.env, TOML, YAML, JSON) | Done |
| `Redactor` (known-value Aho-Corasick replacement) | Done |
| `Scanner` (heuristic regex + gitleaks patterns) | Done |
Out of scope for v0.1: Codex / Gemini hook adapters, MCP server mode, audit log, filesystem watcher.
Out of scope for v0.2: Codex / Gemini hook adapters, MCP proxy mode, audit log, filesystem watcher.
---
@@ -50,87 +46,43 @@ From source (this monorepo):
cargo install --path crates/dirigent_fermata --features cli
```
This installs the `fermata` binary into `~/.cargo/bin/`.
---
## Secret Filtering
Fermata's secret filtering operates in three layers:
1. **Policy gate** (PreToolUse) -- `.botignore` blocks direct access to sensitive files. Catches ~90% of accidental reads.
2. **Known-value redaction** (PostToolUse) -- `.botsecrets` declares which files contain secrets. Fermata parses them, extracts values, and replaces them in all tool output using an Aho-Corasick automaton. Zero false negatives for verbatim occurrences of declared secrets.
3. **Heuristic scanning** (PostToolUse) -- regex patterns derived from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs). Safety net for secrets not covered by the manifest.
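
The known-value layer can be sketched as follows. This is a minimal illustration, not the crate's `Redactor`: it uses repeated `str::replace` instead of an Aho-Corasick automaton. The function name `redact_known` and the `*****` mask are assumptions.

```rust
/// Replace every occurrence of each known secret value with `mask`.
/// Returns the rewritten text and the number of replacements made.
/// Hypothetical helper; a simple stand-in for multi-pattern matching.
fn redact_known(text: &str, secrets: &[&str], mask: &str) -> (String, usize) {
    // Longest first, so a secret that contains another one is masked whole.
    let mut ordered: Vec<&str> = secrets
        .iter()
        .copied()
        .filter(|s| !s.is_empty())
        .collect();
    ordered.sort_by(|a, b| b.len().cmp(&a.len()));

    let mut out = text.to_string();
    let mut count = 0;
    for secret in ordered {
        let hits = out.matches(secret).count();
        if hits > 0 {
            out = out.replace(secret, mask);
            count += hits;
        }
    }
    (out, count)
}
```

This version rescans the text once per secret, which is fine for a handful of values. An automaton-based matcher finds all values in a single pass, which matters when the hook runs on every tool call.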
### `.botsecrets` format
Create a `.botsecrets` file at your project root:
```toml
# Files that contain secrets -- fermata parses these and redacts values
[files]
patterns = [".env", ".env.*", "secrets.*"]
# Additional secret key names (built-in defaults cover *_KEY, *_SECRET, etc.)
[keys]
include = ["STRIPE_*", "MY_APP_SIGNING_*"]
# Heuristic scanning on all tool output
[heuristic]
enabled = true
```
That's the typical case. Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
---
## Usage
### Checking a path
### Claude Code hook configuration
```bash
fermata check --op read /path/to/.env
# exit 1 — blocked
# stderr: blocked by rule ".env" in /your/project/.botignore
fermata check --op write /path/to/src/main.rs
# exit 0 — allowed
```
### Claude Code hook adapter
```bash
fermata hook --harness claude < hook_payload.json
```
Reads the PreToolUse JSON from stdin, extracts the tool name and path or command, applies policy, and emits the Claude-shaped JSON response. The hook's exit code is always `0`; the verdict is in the JSON body.
---
## Configuration
### `.botignore` — the 80% case
Create a `.botignore` at your project root. Gitignore syntax. Blocks both reads and writes.
```gitignore
# Secrets
.env
.env.*
secrets/**
# Local config overrides
conf/settings.local.yaml
conf/settings.test.yaml
# Generated files — let the tools rebuild them, not patch them
dist/**
*.lock
```
Fermata walks up from the target file to find the nearest `.botignore`, so it works correctly even when an agent changes directory.
### `botignore.toml` — per-operation rules
For cases where `.botignore`'s uniform read+write block isn't granular enough:
```toml
[read]
# Block reading secrets outright
patterns = [".env*", "secrets/**", "conf/settings.local.yaml"]
[write]
# Allow reading vendor code but block patching it
patterns = ["vendor/**", "*.lock"]
[bash]
# Hard-block destructive or exfiltrating commands
deny = [
"rm -rf /",
"curl * | sh",
"git push --force*",
]
# Ask before any removal or move
ask = ["rm:*", "mv:*"]
# Narrow allowlist for automated commands
allow_prefixes = ["make test", "git checkout:*"]
```
---
## How it fits into Claude Code
Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
```json
{
@@ -139,10 +91,15 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
{
"matcher": "Bash|Read|Edit|Write",
"hooks": [
{
"type": "command",
"command": "fermata hook --harness claude"
}
{ "type": "command", "command": "fermata hook --harness claude" }
]
}
],
"PostToolUse": [
{
"matcher": "Bash|Read|Edit|Write",
"hooks": [
{ "type": "command", "command": "fermata hook --harness claude --event post-tool-use" }
]
}
]
@@ -150,50 +107,68 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
}
```
When Claude attempts a `Read(.env)`, `Write(vendor/foo.js)`, or `Bash(rm ./secrets/key.pem)`, fermata intercepts the call, checks policy, and returns a deny with a human-readable reason — before any damage is done.
PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.
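
The PostToolUse side is fail-open: if anything goes wrong, the original output passes through unchanged. A minimal sketch of that decision is below. `updated_output` and the `redact` callback are hypothetical names for illustration; the real hook also renders a harness-specific JSON envelope, which this sketch leaves out.

```rust
/// Decide what a PostToolUse hook should hand back to the harness.
///
/// `redact` stands in for the redaction pipeline and returns the rewritten
/// text plus the number of replacements. `None` means "pass the original
/// output through unchanged".
fn updated_output<F>(tool_output: &str, redact: F) -> Option<String>
where
    F: Fn(&str) -> Result<(String, usize), String>,
{
    if tool_output.is_empty() {
        return None; // nothing to redact
    }
    match redact(tool_output) {
        // Only send replacement text when something actually changed.
        Ok((text, n)) if n > 0 => Some(text),
        Ok(_) => None,
        // Fail-open: log the problem and leave the output untouched.
        Err(err) => {
            eprintln!("redaction failed, passing output through: {err}");
            None
        }
    }
}
```

Failing open keeps a broken config from stalling the agent, at the cost of letting unredacted output through when redaction cannot run.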
### Checking a path
```bash
fermata check --op read /path/to/.env
# exit 1 -- blocked
fermata check --op write /path/to/src/main.rs
# exit 0 -- allowed
```
### Library API
```rust
use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
use std::path::Path;
// Load .botsecrets config and build the manifest
let root = Path::new("/path/to/project");
let config = SecretsConfig::load(root)?;
let manifest = Manifest::build(&config, root)?;
// Known-value redaction (Aho-Corasick, sub-millisecond)
let redactor = Redactor::new(&manifest, config.redaction.style);
let clean = redactor.redact("DB_PASSWORD=hunter2\nAPI_KEY=sk-abc123");
// clean.text -> "DB_PASSWORD=*****\nAPI_KEY=*****" (exact mask depends on the style)
// Heuristic scanning (regex patterns); construction can fail, so handle the Result
let scanner = Scanner::new(&config.heuristic)?;
let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
// -> each Finding carries `pattern_id`, `description`, and `confidence`
```
---
## Real-world scenario
## Configuration
A project has `.env`, `conf/settings.local.yaml`, and a `vendor/` tree it doesn't want patched. With `.botignore`:
### `.botignore` -- access control
Gitignore syntax. Blocks both reads and writes.
```gitignore
.env
.env.*
conf/settings.local.yaml
vendor/**
secrets/**
```
Claude attempts to read credentials:
```
Tool: Read
Path: ./conf/settings.local.yaml
Decision: BLOCK — matched rule "conf/settings.local.yaml" (.botignore)
```
Claude attempts to read application code:
```
Tool: Read
Path: ./src/app/main.rs
Decision: ALLOW
```
Claude attempts to run `cat .env` via bash — which would bypass a path-only check:
### `botignore.toml` -- per-operation rules
```toml
# botignore.toml
[read]
patterns = [".env*", "secrets/**"]
[write]
patterns = ["vendor/**", "*.lock"]
[bash]
deny = ["cat .env*", "cat conf/settings.local*"]
deny = ["rm -rf /", "curl * | sh"]
```
```
Tool: Bash
Command: cat .env
Decision: BLOCK — matched bash deny rule "cat .env*"
```
### `.botsecrets` -- secret value redaction
See the Secret Filtering section above.
---
@@ -201,14 +176,16 @@ Decision: BLOCK — matched bash deny rule "cat .env*"
Three concentric layers; nothing inner imports from anything outer:
- **`core/`** -- harness-unaware, sync. Types, `.botignore` walker, `botignore.toml` parser, `Policy::check` / `check_command`, path extraction.
- **`harness/`** -- `HarnessAdapter` trait over a normalized `ToolCall`. Each adapter lives in its own submodule, feature-gated.
- **`bin/fermata.rs`** — the only place `clap`, stdio, and exit codes appear.
- **`core/`** -- harness-unaware, sync. Policy types, `.botignore` walker, `botignore.toml` parser, `Policy::check`.
- **`core/secrets/`** -- `.botsecrets` config, manifest discovery, multi-format parser, Aho-Corasick redactor, heuristic scanner.
- **`harness/`** -- `HarnessAdapter` trait for PreToolUse (policy gate) and PostToolUse (output redaction). Each adapter is feature-gated.
- **`bin/fermata.rs`** -- `clap`, stdio, and exit codes.
---
## See also
- `docs/tools/fermata.md` -- Dirigent integration plan
- `docs/workpad/brainstorm/fermata.md` — full product spec and field notes
- `docs/architecture/crates.md` — crate dependency map
- `docs/tools/fermata.md` -- Dirigent integration plan
- `docs/architecture/fermata-security-philosophy.md` -- security philosophy and the reveal triangle
- `docs/workpad/brainstorm/fermata.md` -- full product spec and field notes
- `docs/architecture/crates.md` -- crate dependency map
+180 -18
@@ -1,5 +1,6 @@
use clap::{Parser, Subcommand, ValueEnum};
use dirigent_fermata::core::{project::find_project_root, Decision, Op, Policy};
use dirigent_fermata::harness::HookEvent;
use std::io::{Read, Write};
use std::path::PathBuf;
use std::process::ExitCode;
@@ -23,7 +24,11 @@ enum Cmd {
},
/// Read a harness hook payload from stdin and render the decision.
Hook {
#[arg(long)]
/// Hook event type: pre-tool-use or post-tool-use.
#[arg(long, default_value = "pre-tool-use")]
event: String,
/// Harness adapter name.
#[arg(long, default_value = "claude")]
harness: String,
},
}
@@ -49,7 +54,7 @@ fn main() -> ExitCode {
let cli = Cli::parse();
match cli.cmd {
Cmd::Check { op, json, paths } => run_check(op.into(), json, &paths),
Cmd::Hook { harness } => run_hook(&harness),
Cmd::Hook { event, harness } => run_hook(&event, &harness),
}
}
@@ -92,7 +97,7 @@ fn run_check(op: Op, json: bool, paths: &[PathBuf]) -> ExitCode {
}
}
fn run_hook(harness: &str) -> ExitCode {
fn run_hook(event_str: &str, harness: &str) -> ExitCode {
let adapter = match dirigent_fermata::harness::lookup(harness) {
Some(a) => a,
None => {
@@ -100,28 +105,51 @@ fn run_hook(harness: &str) -> ExitCode {
return ExitCode::from(2);
}
};
let event = match HookEvent::parse(event_str) {
Some(e) => e,
None => {
eprintln!("fermata: unknown event '{event_str}'");
return ExitCode::from(2);
}
};
let mut buf = Vec::new();
if let Err(e) = std::io::stdin().lock().read_to_end(&mut buf) {
eprintln!("fermata: stdin: {e}");
return ExitCode::from(2);
}
let call = match adapter.parse_request(&buf) {
match event {
HookEvent::PreToolUse => run_pre_tool_use(&*adapter, &buf),
HookEvent::PostToolUse => run_post_tool_use(&*adapter, &buf),
}
}
/// Handle a PreToolUse hook event (policy gate).
fn run_pre_tool_use(
adapter: &dyn dirigent_fermata::harness::HarnessAdapter,
buf: &[u8],
) -> ExitCode {
use dirigent_fermata::harness::{PathKind, ToolOp};
let call = match adapter.parse_request(buf) {
Ok(c) => c,
Err(e) => {
eprintln!("fermata: parse: {e}");
return ExitCode::from(2);
// Fail-open: output empty JSON and exit 0.
let _ = std::io::stdout().lock().write_all(b"{}");
return ExitCode::from(0);
}
};
use dirigent_fermata::harness::{PathKind, ToolOp};
let decision = match &call.op {
ToolOp::Path { path, kind } => {
let root = match find_project_root(path) {
// No project root → fail-open allow (hook must always exit 0 with a verdict).
// run_check silently skips these paths; here we must still emit JSON.
Some(r) => r,
None => {
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
let out = adapter
.render_decision(&call, &Decision::Allow)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
return ExitCode::from(0);
}
@@ -130,7 +158,9 @@ fn run_hook(harness: &str) -> ExitCode {
Ok(p) => p,
Err(e) => {
eprintln!("fermata: load error: {e}");
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
let out = adapter
.render_decision(&call, &Decision::Allow)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
return ExitCode::from(0);
}
@@ -143,32 +173,36 @@ fn run_hook(harness: &str) -> ExitCode {
Ok(d) => d,
Err(e) => {
eprintln!("fermata: check error: {e}");
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
let out = adapter
.render_decision(&call, &Decision::Allow)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
return ExitCode::from(0);
}
}
}
ToolOp::Command { text } => {
// For commands, we look up the project from cwd (no path argument).
let cwd = match std::env::current_dir() {
Ok(d) => d,
Err(e) => {
eprintln!("fermata: cwd error: {e}");
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
let out = adapter
.render_decision(&call, &Decision::Allow)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
return ExitCode::from(0);
}
};
match find_project_root(&cwd) {
// No project root → fail-open allow (see Path branch note above).
None => Decision::Allow,
Some(root) => {
let policy = match Policy::load(&root) {
Ok(p) => p,
Err(e) => {
eprintln!("fermata: load error: {e}");
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
let out = adapter
.render_decision(&call, &Decision::Allow)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
return ExitCode::from(0);
}
@@ -177,7 +211,9 @@ fn run_hook(harness: &str) -> ExitCode {
Ok(d) => d,
Err(e) => {
eprintln!("fermata: check error: {e}");
let out = adapter.render_decision(&call, &Decision::Allow).unwrap_or_default();
let out = adapter
.render_decision(&call, &Decision::Allow)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
return ExitCode::from(0);
}
@@ -186,9 +222,135 @@ fn run_hook(harness: &str) -> ExitCode {
}
}
};
let out = adapter.render_decision(&call, &decision).unwrap_or_default();
let out = adapter
.render_decision(&call, &decision)
.unwrap_or_default();
let _ = std::io::stdout().lock().write_all(&out);
ExitCode::from(0) // hook bins always exit 0; the JSON carries the verdict
ExitCode::from(0)
}
/// Handle a PostToolUse hook event (output redaction).
///
/// Fail-open: any error results in `{}` on stdout and exit 0, so the
/// harness continues with the original output.
fn run_post_tool_use(
adapter: &dyn dirigent_fermata::harness::HarnessAdapter,
buf: &[u8],
) -> ExitCode {
use dirigent_fermata::core::secrets::{
config::HeuristicMode, Manifest, Redactor, Scanner, SecretsConfig,
};
// Parse payload; fail-open on error.
let payload = match adapter.parse_post_tool_use(buf) {
Ok(p) => p,
Err(e) => {
eprintln!("fermata: post-tool-use parse: {e}");
let _ = std::io::stdout().lock().write_all(b"{}");
return ExitCode::from(0);
}
};
// Empty tool response — nothing to redact.
if payload.tool_response.is_empty() {
let _ = std::io::stdout().lock().write_all(b"{}");
return ExitCode::from(0);
}
// Find project root from cwd (PostToolUse has no reliable path).
let root = match std::env::current_dir().ok().and_then(|d| find_project_root(&d)) {
Some(r) => r,
None => {
// No project root → nothing to redact, pass through.
let _ = std::io::stdout().lock().write_all(b"{}");
return ExitCode::from(0);
}
};
// Load secrets config; fail-open if missing or broken.
let config = match SecretsConfig::load(&root) {
Ok(c) => c,
Err(e) => {
eprintln!("fermata: secrets config: {e}");
let _ = std::io::stdout().lock().write_all(b"{}");
return ExitCode::from(0);
}
};
// Build manifest from config (discovers .env files etc.).
let manifest = match Manifest::build(&config, &root) {
Ok(m) => m,
Err(e) => {
eprintln!("fermata: manifest: {e}");
let _ = std::io::stdout().lock().write_all(b"{}");
return ExitCode::from(0);
}
};
// Run redactor over tool_response.
let redactor = Redactor::new(&manifest, config.redaction.style);
let redacted = redactor.redact(&payload.tool_response);
// Run heuristic scanner if enabled.
let mut scanner_warning: Option<String> = None;
if config.heuristic.enabled {
if let Ok(scanner) = Scanner::new(&config.heuristic) {
// Scan the (already redacted) text so we don't re-flag known secrets.
let findings = scanner.scan(&redacted.text);
if !findings.is_empty() {
match config.heuristic.mode {
HeuristicMode::Report => {
// Log to stderr only; do not modify output.
for f in &findings {
eprintln!(
"fermata: heuristic finding [{:?}] {}: {}",
f.confidence, f.pattern_id, f.description
);
}
}
HeuristicMode::Enforce => {
let descriptions: Vec<String> = findings
.iter()
.map(|f| format!("{} ({})", f.description, f.pattern_id))
.collect();
scanner_warning = Some(format!(
"\n[fermata] WARNING: heuristic scan found {} potential secret(s): {}",
findings.len(),
descriptions.join(", ")
));
}
HeuristicMode::Disabled => {}
}
}
}
}
// Determine whether we need to send back modified output.
let redaction_count = redacted.redactions.len();
let was_redacted = redaction_count > 0;
let needs_update = was_redacted || scanner_warning.is_some();
let output = if needs_update {
let mut text = redacted.text;
if let Some(warning) = scanner_warning {
text.push_str(&warning);
}
if was_redacted {
eprintln!(
"fermata: redacted {} secret(s) from {} output",
redaction_count, payload.tool_name
);
}
Some(text)
} else {
None
};
let out = adapter
.render_post_tool_use(&payload, output.as_deref())
.unwrap_or_else(|_| b"{}".to_vec());
let _ = std::io::stdout().lock().write_all(&out);
ExitCode::from(0)
}
fn merge_worst(a: Option<Decision>, b: Decision) -> Decision {
+1
@@ -6,6 +6,7 @@ pub mod extract;
pub mod op;
pub mod policy;
pub mod project;
pub mod secrets;
pub mod toml_config;
pub use decision::{Decision, Reason, Rule};
+1 -1
@@ -1,7 +1,7 @@
use std::path::{Path, PathBuf};
/// Strong markers that definitively identify a project root.
const STRONG_MARKERS: &[&str] = &["botignore.toml", ".botignore.toml", ".git"];
const STRONG_MARKERS: &[&str] = &["botignore.toml", ".botignore.toml", ".botsecrets", ".git"];
/// Walk upward from `target` (or its parent if `target` is a file) looking
/// for the nearest project root. Strong markers (`botignore.toml`,
+530
@@ -0,0 +1,530 @@
//! Parse and merge `.botsecrets` TOML configuration files.
//!
//! The configuration is layered (most-specific wins):
//!
//! 1. Built-in defaults
//! 2. `~/.config/fermata/.botsecrets` (user-global)
//! 3. `<root>/.botsecrets` (project)
//! 4. `<root>/.botsecrets.local` (local overrides, git-ignored)
//!
//! Vec fields like `files.patterns` are *replaced* by more-specific layers.
//! `keys.include` and `keys.exclude` *accumulate* across layers.
//! Scalar fields (style, mode, enabled) take the most-specific value.
use globset::{Glob, GlobMatcher};
use serde::{Deserialize, Serialize};
use std::path::{Path, PathBuf};
use thiserror::Error;
// ---------------------------------------------------------------------------
// Errors
// ---------------------------------------------------------------------------
#[derive(Debug, Error)]
pub enum SecretsConfigError {
#[error("io error reading {path}: {source}")]
Io {
path: PathBuf,
source: std::io::Error,
},
#[error("TOML parse error in {path}: {source}")]
Parse {
path: PathBuf,
source: toml::de::Error,
},
}
// ---------------------------------------------------------------------------
// Config types
// ---------------------------------------------------------------------------
/// Top-level `.botsecrets` configuration.
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct SecretsConfig {
#[serde(default)]
pub files: FilesConfig,
#[serde(default)]
pub keys: KeysConfig,
#[serde(default)]
pub redaction: RedactionConfig,
#[serde(default)]
pub heuristic: HeuristicConfig,
#[serde(default)]
pub enforcement: EnforcementConfig,
#[serde(default, rename = "file")]
pub file_overrides: Vec<FileOverride>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct FilesConfig {
#[serde(default = "default_file_patterns")]
pub patterns: Vec<String>,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct KeysConfig {
#[serde(default)]
pub include: Vec<String>,
#[serde(default)]
pub exclude: Vec<String>,
}
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum RedactionStyle {
Masked,
Typed,
Named,
Absent,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct RedactionConfig {
#[serde(default = "default_redaction_style")]
pub style: RedactionStyle,
}
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum HeuristicMode {
Enforce,
Report,
Disabled,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct HeuristicConfig {
#[serde(default = "default_true")]
pub enabled: bool,
#[serde(default = "default_heuristic_mode")]
pub mode: HeuristicMode,
#[serde(default)]
pub patterns: Vec<String>,
}
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum EnforcementMode {
Strict,
Permissive,
Audit,
}
#[derive(Debug, Clone, Copy, Deserialize, Serialize, PartialEq, Eq)]
#[serde(rename_all = "kebab-case")]
pub enum ParseErrorAction {
MaskEntireFile,
Allow,
Deny,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct EnforcementConfig {
#[serde(default = "default_enforcement_mode")]
pub mode: EnforcementMode,
#[serde(default = "default_parse_error_action")]
pub on_parse_error: ParseErrorAction,
}
#[derive(Debug, Clone, Deserialize, Serialize)]
pub struct FileOverride {
pub path: String,
#[serde(default)]
pub format: Option<String>,
#[serde(default)]
pub keys: Vec<String>,
}
// ---------------------------------------------------------------------------
// Built-in defaults
// ---------------------------------------------------------------------------
pub(crate) fn default_file_patterns() -> Vec<String> {
vec![
".env",
".env.*",
"*.env",
"secrets.*",
"credentials.*",
"*.key",
"*.pem",
"*.p12",
"*.pfx",
"id_rsa",
"id_ed25519",
"id_ecdsa",
"Secrets.toml",
"Secrets.*.toml",
"terraform.tfvars",
"*.auto.tfvars",
"terraform.tfstate",
"*.tfstate",
".docker/config.json",
"config/master.key",
"config/credentials/*.key",
".aws/credentials",
".netrc",
".htpasswd",
"service-account.json",
"service-account-key.json",
]
.into_iter()
.map(String::from)
.collect()
}
/// Built-in key name patterns that are always treated as sensitive.
pub const BUILTIN_KEY_PATTERNS: &[&str] = &[
"*PASSWORD*",
"*PASSWD*",
"*SECRET*",
"*API_KEY*",
"*APIKEY*",
"*TOKEN*",
"*ACCESS_KEY*",
"*PRIVATE_KEY*",
"*AUTH*",
"*CREDENTIAL*",
"*CONNECTION_STRING*",
"*CONN_STR*",
"DATABASE_URL",
"REDIS_URL",
"MONGODB_URI",
"AMQP_URL",
"AWS_SECRET_ACCESS_KEY",
"AWS_ACCESS_KEY_ID",
"AWS_SESSION_TOKEN",
"GITHUB_TOKEN",
"GH_TOKEN",
"GITLAB_TOKEN",
"NPM_TOKEN",
"NODE_AUTH_TOKEN",
"STRIPE_SECRET_KEY",
"STRIPE_WEBHOOK_SECRET",
"OPENAI_API_KEY",
"ANTHROPIC_API_KEY",
"SENTRY_DSN",
"HEROKU_API_KEY",
"SENDGRID_API_KEY",
"JWT_SECRET",
"JWT_SIGNING_KEY",
"SESSION_SECRET",
"ENCRYPTION_KEY",
"ENCRYPT_KEY",
"MASTER_KEY",
"SIGNING_KEY",
"SECRET_KEY",
"SECRET_KEY_BASE",
"APP_KEY",
"NEXTAUTH_SECRET",
];
fn default_redaction_style() -> RedactionStyle {
RedactionStyle::Masked
}
fn default_heuristic_mode() -> HeuristicMode {
HeuristicMode::Enforce
}
fn default_true() -> bool {
true
}
fn default_enforcement_mode() -> EnforcementMode {
EnforcementMode::Permissive
}
fn default_parse_error_action() -> ParseErrorAction {
ParseErrorAction::MaskEntireFile
}
// ---------------------------------------------------------------------------
// Default impls
// ---------------------------------------------------------------------------
impl Default for SecretsConfig {
fn default() -> Self {
Self {
files: FilesConfig::default(),
keys: KeysConfig::default(),
redaction: RedactionConfig::default(),
heuristic: HeuristicConfig::default(),
enforcement: EnforcementConfig::default(),
file_overrides: Vec::new(),
}
}
}
impl Default for FilesConfig {
fn default() -> Self {
Self {
patterns: default_file_patterns(),
}
}
}
impl Default for KeysConfig {
fn default() -> Self {
Self {
include: Vec::new(),
exclude: Vec::new(),
}
}
}
impl Default for RedactionConfig {
fn default() -> Self {
Self {
style: default_redaction_style(),
}
}
}
impl Default for HeuristicConfig {
fn default() -> Self {
Self {
enabled: default_true(),
mode: default_heuristic_mode(),
patterns: Vec::new(),
}
}
}
impl Default for EnforcementConfig {
fn default() -> Self {
Self {
mode: default_enforcement_mode(),
on_parse_error: default_parse_error_action(),
}
}
}
// ---------------------------------------------------------------------------
// Partial layer (for merge)
// ---------------------------------------------------------------------------
/// A partially-specified config layer parsed from a single `.botsecrets` file.
/// `Option`-wrapped fields distinguish "absent" from "explicitly set".
#[derive(Debug, Clone, Default, Deserialize)]
struct PartialSecretsConfig {
#[serde(default)]
files: Option<PartialFilesConfig>,
#[serde(default)]
keys: Option<PartialKeysConfig>,
#[serde(default)]
redaction: Option<PartialRedactionConfig>,
#[serde(default)]
heuristic: Option<PartialHeuristicConfig>,
#[serde(default)]
enforcement: Option<PartialEnforcementConfig>,
#[serde(default, rename = "file")]
file: Option<Vec<FileOverride>>,
}
#[derive(Debug, Clone, Default, Deserialize)]
struct PartialFilesConfig {
patterns: Option<Vec<String>>,
}
#[derive(Debug, Clone, Default, Deserialize)]
struct PartialKeysConfig {
include: Option<Vec<String>>,
exclude: Option<Vec<String>>,
}
#[derive(Debug, Clone, Default, Deserialize)]
struct PartialRedactionConfig {
style: Option<RedactionStyle>,
}
#[derive(Debug, Clone, Default, Deserialize)]
struct PartialHeuristicConfig {
enabled: Option<bool>,
mode: Option<HeuristicMode>,
patterns: Option<Vec<String>>,
}
#[derive(Debug, Clone, Default, Deserialize)]
struct PartialEnforcementConfig {
mode: Option<EnforcementMode>,
on_parse_error: Option<ParseErrorAction>,
}
// ---------------------------------------------------------------------------
// Merge logic
// ---------------------------------------------------------------------------
impl SecretsConfig {
/// Apply a partial layer on top of `self`.
///
/// - Vec fields (`files.patterns`, `heuristic.patterns`, `file_overrides`):
/// **replaced** by the layer's value when present.
/// - `keys.include` / `keys.exclude`: **accumulated** (appended).
/// - Scalar fields: overwritten when present in the layer.
fn merge_layer(&mut self, layer: PartialSecretsConfig) {
// files
if let Some(f) = layer.files {
if let Some(patterns) = f.patterns {
self.files.patterns = patterns;
}
}
// keys (accumulate)
if let Some(k) = layer.keys {
if let Some(inc) = k.include {
self.keys.include.extend(inc);
}
if let Some(exc) = k.exclude {
self.keys.exclude.extend(exc);
}
}
// redaction
if let Some(r) = layer.redaction {
if let Some(style) = r.style {
self.redaction.style = style;
}
}
// heuristic
if let Some(h) = layer.heuristic {
if let Some(enabled) = h.enabled {
self.heuristic.enabled = enabled;
}
if let Some(mode) = h.mode {
self.heuristic.mode = mode;
}
if let Some(patterns) = h.patterns {
self.heuristic.patterns = patterns;
}
}
// enforcement
if let Some(e) = layer.enforcement {
if let Some(mode) = e.mode {
self.enforcement.mode = mode;
}
if let Some(action) = e.on_parse_error {
self.enforcement.on_parse_error = action;
}
}
// file overrides (replace)
if let Some(overrides) = layer.file {
self.file_overrides = overrides;
}
}
}
// ---------------------------------------------------------------------------
// Loading & discovery
// ---------------------------------------------------------------------------
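/// Return the user-global fermata config directory.
/// `~/.config/fermata` on Unix, `%APPDATA%/fermata` on Windows.
/// Returns `None` when the relevant environment variable is unset, or on
/// any other platform.
fn user_config_dir() -> Option<PathBuf> {
#[cfg(unix)]
{
std::env::var_os("HOME").map(|h| PathBuf::from(h).join(".config").join("fermata"))
}
#[cfg(windows)]
{
std::env::var_os("APPDATA").map(|a| PathBuf::from(a).join("fermata"))
}
// Without this arm the function has no tail expression on targets that are
// neither Unix nor Windows and fails to compile there.
#[cfg(not(any(unix, windows)))]
{
None
}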
}
impl SecretsConfig {
/// Load `.botsecrets` configuration for a project.
///
/// Merges layers in order. Later (more specific) layers override earlier
/// ones, except `keys.include` / `keys.exclude`, which accumulate:
/// 1. Built-in defaults
/// 2. `~/.config/fermata/.botsecrets`
/// 3. `<root>/.botsecrets`
/// 4. `<root>/.botsecrets.local`
pub fn load(root: &Path) -> Result<Self, SecretsConfigError> {
let mut config = Self::default();
// Layer 2: user-global
if let Some(user_dir) = user_config_dir() {
let user_file = user_dir.join(".botsecrets");
if user_file.is_file() {
let layer = Self::read_partial(&user_file)?;
config.merge_layer(layer);
}
}
// Layer 3: project root
let project_file = root.join(".botsecrets");
if project_file.is_file() {
let layer = Self::read_partial(&project_file)?;
config.merge_layer(layer);
}
// Layer 4: local overrides
let local_file = root.join(".botsecrets.local");
if local_file.is_file() {
let layer = Self::read_partial(&local_file)?;
config.merge_layer(layer);
}
Ok(config)
}
/// Parse a single `.botsecrets` file into a partial layer.
fn read_partial(path: &Path) -> Result<PartialSecretsConfig, SecretsConfigError> {
let text = std::fs::read_to_string(path).map_err(|e| SecretsConfigError::Io {
path: path.to_path_buf(),
source: e,
})?;
toml::from_str(&text).map_err(|e| SecretsConfigError::Parse {
path: path.to_path_buf(),
source: e,
})
}
/// Load from a TOML string (useful for testing and embedding).
pub fn from_toml(toml_str: &str) -> Result<Self, toml::de::Error> {
toml::from_str(toml_str)
}
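/// Returns the effective key-include patterns: built-in defaults + user
/// `keys.include`, minus any pattern that appears in `keys.exclude`.
///
/// Exclusion is an exact, case-sensitive comparison against the pattern
/// text itself; it is not a glob match against key names.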
pub fn effective_key_includes(&self) -> Vec<String> {
let mut patterns: Vec<String> = BUILTIN_KEY_PATTERNS
.iter()
.map(|s| (*s).to_owned())
.collect();
patterns.extend(self.keys.include.iter().cloned());
// Remove excluded patterns (exact string match).
if !self.keys.exclude.is_empty() {
let exclude_set: std::collections::HashSet<&str> =
self.keys.exclude.iter().map(|s| s.as_str()).collect();
patterns.retain(|p| !exclude_set.contains(p.as_str()));
}
patterns
}
/// Check whether `key` matches any of the effective key-include patterns.
///
/// Matching is case-insensitive and uses glob semantics (`*` wildcards).
pub fn key_matches(&self, key: &str) -> bool {
let patterns = self.effective_key_includes();
let upper = key.to_ascii_uppercase();
for pat in &patterns {
let pat_upper = pat.to_ascii_uppercase();
// Build a glob matcher. Patterns without path separators are
// matched as plain globs against the key name.
if let Ok(glob) = Glob::new(&pat_upper) {
let matcher: GlobMatcher = glob.compile_matcher();
if matcher.is_match(&upper) {
return true;
}
}
}
false
}
}
+310
@@ -0,0 +1,310 @@
//! Secret manifest loader.
//!
//! Discovers secret files per the `.botsecrets` configuration, parses them,
//! filters by key patterns, and produces the known-secrets set that the
//! Redactor will consume.
use std::path::{Path, PathBuf};
use globset::{Glob, GlobSetBuilder};
use thiserror::Error;
use walkdir::WalkDir;
use super::config::{ParseErrorAction, SecretsConfig};
use super::parser::{self, FileFormat, ParseError, SecretEntry};
// ---------------------------------------------------------------------------
// Errors
// ---------------------------------------------------------------------------
#[derive(Debug, Error)]
pub enum ManifestError {
#[error(transparent)]
Parse(#[from] ParseError),
#[error("glob pattern error: {0}")]
Glob(String),
}
// ---------------------------------------------------------------------------
// Manifest
// ---------------------------------------------------------------------------
/// The complete set of known secrets discovered from a project.
///
/// Entries are sorted by value length descending (longest first) so the
/// redactor replaces the most specific match before shorter substrings.
#[derive(Debug, Clone)]
pub struct Manifest {
entries: Vec<SecretEntry>,
}
/// Minimum secret value length, in bytes, to keep. Anything shorter risks
/// false-positive redaction (e.g. `"yes"`, `"on"`, `"42"`).
const MIN_VALUE_LEN: usize = 4;
/// Directories that are unconditionally skipped during file discovery.
const SKIP_DIRS: &[&str] = &[".git", "node_modules", "target", "__pycache__", ".venv"];
impl Manifest {
/// Build a manifest by discovering and parsing secret files relative to
/// `root`.
pub fn build(config: &SecretsConfig, root: &Path) -> Result<Self, ManifestError> {
let mut entries = Vec::new();
// 1. Discover files matching `config.files.patterns`.
let discovered = discover_files(&config.files.patterns, root)?;
// 2. Parse each discovered file.
for path in &discovered {
match parse_discovered_file(path) {
Ok(file_entries) => entries.extend(file_entries),
Err(e) => match config.enforcement.on_parse_error {
ParseErrorAction::Allow => {
eprintln!(
"fermata: warning: skipping unparseable file {}: {}",
path.display(),
e
);
}
ParseErrorAction::Deny => {
return Err(e.into());
}
ParseErrorAction::MaskEntireFile => {
// We cannot extract individual secrets — the redactor
// may choose to mask the entire file content if it
// appears in output. For now we log and continue.
eprintln!(
"fermata: warning: cannot parse {}: {}",
path.display(),
e
);
}
},
}
}
// 3. Filter discovered entries by the effective key patterns.
entries = filter_by_key_patterns(entries, config);
// 4. Process explicit `[[file]]` overrides — these bypass key filtering
// because the user declared them intentionally.
for override_cfg in &config.file_overrides {
let override_path = root.join(&override_cfg.path);
if !override_path.is_file() {
continue;
}
let format = override_cfg
.format
.as_deref()
.and_then(FileFormat::from_hint);
let key_filter = if override_cfg.keys.is_empty() {
None
} else {
Some(override_cfg.keys.as_slice())
};
match parser::parse_secret_file(&override_path, format, key_filter) {
Ok(file_entries) => entries.extend(file_entries),
Err(e) => {
eprintln!(
"fermata: warning: cannot parse override file {}: {}",
override_path.display(),
e
);
}
}
}
// 5. Deduplicate (same key + value from different discovery paths).
entries.sort_by(|a, b| a.key.cmp(&b.key).then_with(|| a.value.cmp(&b.value)));
entries.dedup_by(|a, b| a.key == b.key && a.value == b.value);
// 6. Sort by value length descending (longest first for redaction).
entries.sort_by(|a, b| b.value.len().cmp(&a.value.len()));
// 7. Remove entries with very short values to avoid false replacements.
entries.retain(|e| e.value.len() >= MIN_VALUE_LEN);
Ok(Self { entries })
}
/// Build a manifest from a pre-built list of secret entries.
///
/// Applies the same post-processing as [`Manifest::build`]:
/// - Deduplicates entries with the same key and value.
/// - Sorts by value length descending (longest first for redaction).
/// - Removes entries with values shorter than `MIN_VALUE_LEN` (4 bytes).
///
/// Useful for testing and for library consumers that obtain secrets
/// from sources other than filesystem discovery.
pub fn from_entries(mut entries: Vec<SecretEntry>) -> Self {
// Deduplicate (same key + value).
entries.sort_by(|a, b| a.key.cmp(&b.key).then_with(|| a.value.cmp(&b.value)));
entries.dedup_by(|a, b| a.key == b.key && a.value == b.value);
// Sort by value length descending (longest first for redaction).
entries.sort_by(|a, b| b.value.len().cmp(&a.value.len()));
// Remove entries with very short values to avoid false replacements.
entries.retain(|e| e.value.len() >= MIN_VALUE_LEN);
Self { entries }
}
/// Build an empty manifest (no secrets known).
pub fn empty() -> Self {
Self {
entries: Vec::new(),
}
}
/// Returns all discovered secret entries.
pub fn entries(&self) -> &[SecretEntry] {
&self.entries
}
/// Returns `true` if the manifest contains no secrets.
pub fn is_empty(&self) -> bool {
self.entries.is_empty()
}
/// Number of known secrets.
pub fn len(&self) -> usize {
self.entries.len()
}
}
// ---------------------------------------------------------------------------
// File discovery
// ---------------------------------------------------------------------------
/// Walk the project tree and collect files matching any of the given glob
/// patterns. Patterns are matched against paths *relative to* `root`.
fn discover_files(patterns: &[String], root: &Path) -> Result<Vec<PathBuf>, ManifestError> {
if patterns.is_empty() {
return Ok(Vec::new());
}
// Compile all patterns into a single GlobSet for efficient matching.
let mut builder = GlobSetBuilder::new();
for pat in patterns {
// `globset` patterns match against the full relative path including
// intermediate directories (e.g. `.docker/config.json`). We add
// both the literal pattern and a `**/` prefixed variant so that
// `.env` matches at the root and `subdir/.env` matches nested.
let glob = Glob::new(pat).map_err(|e| ManifestError::Glob(e.to_string()))?;
builder.add(glob);
// Also match nested occurrences: `**/<pattern>`.
if !pat.contains('/') {
let nested = format!("**/{pat}");
let nested_glob =
Glob::new(&nested).map_err(|e| ManifestError::Glob(e.to_string()))?;
builder.add(nested_glob);
}
}
let glob_set = builder.build().map_err(|e| ManifestError::Glob(e.to_string()))?;
let mut result = Vec::new();
// Prune skipped directories during the walk so their contents are never
// visited. `depth() > 0` keeps `root` itself even if it happens to be
// named like a skipped directory.
let walker = WalkDir::new(root)
.follow_links(false)
.into_iter()
.filter_entry(|e| {
!(e.depth() > 0
&& e.file_type().is_dir()
&& e.file_name().to_str().is_some_and(|n| SKIP_DIRS.contains(&n)))
});
for entry in walker {
let entry = match entry {
Ok(e) => e,
Err(_) => continue,
};
// Only regular files can match; directories and symlinks are skipped.
if !entry.file_type().is_file() {
continue;
}
// Defensive re-check: no ancestor directory may be in the skip list.
let abs_path = entry.path();
if has_skipped_ancestor(abs_path, root) {
continue;
}
// Match relative path against the glob set.
let rel = match abs_path.strip_prefix(root) {
Ok(r) => r,
Err(_) => continue,
};
if glob_set.is_match(rel) {
result.push(abs_path.to_path_buf());
}
}
Ok(result)
}
/// Returns `true` if any path component between `root` and `path` is in
/// [`SKIP_DIRS`].
fn has_skipped_ancestor(path: &Path, root: &Path) -> bool {
if let Ok(rel) = path.strip_prefix(root) {
for component in rel.parent().into_iter().flat_map(|p| p.components()) {
if let Some(name) = component.as_os_str().to_str() {
if SKIP_DIRS.contains(&name) {
return true;
}
}
}
}
false
}
// ---------------------------------------------------------------------------
// Single-file parsing
// ---------------------------------------------------------------------------
/// Parse a single discovered file. Auto-detects format from extension.
/// Returns an empty `Vec` if the format cannot be determined (e.g. `.key`,
/// `.pem` — opaque/binary files).
fn parse_discovered_file(path: &Path) -> Result<Vec<SecretEntry>, ParseError> {
let format = match FileFormat::from_path(path) {
Some(fmt) => fmt,
None => return Ok(Vec::new()), // opaque file — skip
};
parser::parse_secret_file(path, Some(format), None)
}
// ---------------------------------------------------------------------------
// Key-pattern filtering
// ---------------------------------------------------------------------------
/// Keep only entries whose key matches the effective key-include patterns
/// from the configuration.
fn filter_by_key_patterns(entries: Vec<SecretEntry>, config: &SecretsConfig) -> Vec<SecretEntry> {
entries
.into_iter()
.filter(|e| config.key_matches(&e.key))
.collect()
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn empty_manifest() {
let m = Manifest::empty();
assert!(m.is_empty());
assert_eq!(m.len(), 0);
assert!(m.entries().is_empty());
}
}
+15
@@ -0,0 +1,15 @@
//! Secret-filtering configuration (`.botsecrets` files), multi-format
//! secret file parsing, manifest discovery, redaction, and heuristic scanning.
pub mod config;
pub mod manifest;
pub mod parser;
pub mod patterns;
pub mod redactor;
pub mod scanner;
pub use config::SecretsConfig;
pub use manifest::{Manifest, ManifestError};
pub use parser::{parse_secret_file, FileFormat, ParseError, SecretEntry};
pub use redactor::{RedactedText, Redaction, Redactor};
pub use scanner::{Confidence, Finding, Scanner};
+517
@@ -0,0 +1,517 @@
//! Multi-format secret file parser.
//!
//! Reads secret files (`.env`, TOML, JSON, YAML, Python assignments,
//! Java `.properties`) and extracts key-value pairs as [`SecretEntry`] items.
//! Nested structures are flattened with dot-separated keys.
use globset::Glob;
use std::path::{Path, PathBuf};
use thiserror::Error;
// ---------------------------------------------------------------------------
// Errors
// ---------------------------------------------------------------------------
#[derive(Debug, Error)]
pub enum ParseError {
#[error("io error reading {path}: {source}")]
Io {
path: PathBuf,
source: std::io::Error,
},
#[error("parse error in {path}: {message}")]
Format { path: PathBuf, message: String },
}
// ---------------------------------------------------------------------------
// Types
// ---------------------------------------------------------------------------
/// A single secret extracted from a file.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SecretEntry {
/// The key name (e.g. `"DATABASE_URL"`, `"spring.datasource.password"`).
pub key: String,
/// The secret value.
pub value: String,
/// Which file the entry came from.
pub source: PathBuf,
}
/// Supported secret-file formats.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
/// `.env` / dotenv files.
Env,
/// TOML files (e.g. `Secrets.toml`).
Toml,
/// JSON files.
Json,
/// YAML files.
Yaml,
/// Python-style assignments: `KEY = "value"` or `KEY = 'value'`.
PythonAssignments,
/// Java `.properties` files: `key=value` or `key: value`.
Properties,
}
impl FileFormat {
/// Guess format from file extension/name.
pub fn from_path(path: &Path) -> Option<Self> {
let name = path.file_name()?.to_str()?;
let ext = path.extension().and_then(|e| e.to_str());
// .env, .env.local, .env.production, etc.
if name.starts_with(".env") || name.ends_with(".env") {
return Some(Self::Env);
}
match ext {
Some("toml") => Some(Self::Toml),
Some("json") => Some(Self::Json),
Some("yaml" | "yml") => Some(Self::Yaml),
Some("py") => Some(Self::PythonAssignments),
Some("properties") => Some(Self::Properties),
// .key, .pem, etc. are binary/opaque — not parseable as key-value.
_ => None,
}
}
/// Parse from the `format` string used in `.botsecrets` `[[file]]` overrides.
pub fn from_hint(hint: &str) -> Option<Self> {
match hint {
"env" | "dotenv" => Some(Self::Env),
"toml" => Some(Self::Toml),
"json" => Some(Self::Json),
"yaml" | "yml" => Some(Self::Yaml),
"python-assignments" | "python" => Some(Self::PythonAssignments),
"properties" | "java-properties" => Some(Self::Properties),
_ => None,
}
}
}
// ---------------------------------------------------------------------------
// Public API
// ---------------------------------------------------------------------------
/// Parse a secret file and extract key-value entries.
///
/// If `format` is `None`, auto-detects from the file name or extension.
/// If `key_filter` is `Some`, only entries whose keys match at least one
/// glob pattern are returned.
pub fn parse_secret_file(
path: &Path,
format: Option<FileFormat>,
key_filter: Option<&[String]>,
) -> Result<Vec<SecretEntry>, ParseError> {
let content = std::fs::read_to_string(path).map_err(|e| ParseError::Io {
path: path.to_path_buf(),
source: e,
})?;
let fmt = format
.or_else(|| FileFormat::from_path(path))
.ok_or_else(|| ParseError::Format {
path: path.to_path_buf(),
message: "cannot determine file format".into(),
})?;
let entries = parse_content(&content, fmt, path)?;
match key_filter {
Some(keys) => Ok(filter_entries(entries, keys)),
None => Ok(entries),
}
}
/// Parse content string without reading from disk (useful for testing).
pub fn parse_content(
content: &str,
format: FileFormat,
source: &Path,
) -> Result<Vec<SecretEntry>, ParseError> {
match format {
FileFormat::Env => parse_env(content, source),
FileFormat::Toml => parse_toml(content, source),
FileFormat::Json => parse_json(content, source),
FileFormat::Yaml => parse_yaml(content, source),
FileFormat::PythonAssignments => parse_python_assignments(content, source),
FileFormat::Properties => parse_properties(content, source),
}
}
// ---------------------------------------------------------------------------
// Format parsers
// ---------------------------------------------------------------------------
/// Parse `.env` / dotenv files.
///
/// Supports `KEY=VALUE`, `KEY="VALUE"`, `KEY='VALUE'`, and the `export`
/// prefix. Comments (`#`) and empty lines are skipped.
fn parse_env(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
let mut entries = Vec::new();
for line in content.lines() {
let trimmed = line.trim();
// Skip blank lines and comments.
if trimmed.is_empty() || trimmed.starts_with('#') {
continue;
}
// Strip optional `export ` prefix.
let trimmed = trimmed
.strip_prefix("export ")
.or_else(|| trimmed.strip_prefix("export\t"))
.unwrap_or(trimmed);
// Split on first `=`.
let Some((key, raw_value)) = trimmed.split_once('=') else {
continue;
};
let key = key.trim().to_string();
if key.is_empty() {
continue;
}
let value = strip_env_value(raw_value);
entries.push(SecretEntry {
key,
value,
source: source.to_path_buf(),
});
}
Ok(entries)
}
/// Strip surrounding quotes and trailing whitespace from an env value.
fn strip_env_value(raw: &str) -> String {
let trimmed = raw.trim();
// Double-quoted value.
if trimmed.starts_with('"') && trimmed.ends_with('"') && trimmed.len() >= 2 {
let inner = &trimmed[1..trimmed.len() - 1];
// Interpret `\n` and `\t` escapes; any other escape is left as written.
return inner.replace("\\n", "\n").replace("\\t", "\t");
}
// Single-quoted value (literal, no escapes).
if trimmed.starts_with('\'') && trimmed.ends_with('\'') && trimmed.len() >= 2 {
return trimmed[1..trimmed.len() - 1].to_string();
}
// Unquoted — trim trailing whitespace (already trimmed above) and strip
// inline comments.
if let Some(pos) = trimmed.find(" #") {
trimmed[..pos].trim_end().to_string()
} else {
trimmed.to_string()
}
}
/// Parse TOML files. Nested tables are flattened with dot separators.
/// Only string values are extracted.
fn parse_toml(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
let table: toml::Value = content.parse().map_err(|e: toml::de::Error| ParseError::Format {
path: source.to_path_buf(),
message: e.to_string(),
})?;
let mut entries = Vec::new();
flatten_toml_value(&table, "", source, &mut entries);
Ok(entries)
}
fn flatten_toml_value(
value: &toml::Value,
prefix: &str,
source: &Path,
entries: &mut Vec<SecretEntry>,
) {
match value {
toml::Value::String(s) => {
if !prefix.is_empty() {
entries.push(SecretEntry {
key: prefix.to_string(),
value: s.clone(),
source: source.to_path_buf(),
});
}
}
toml::Value::Table(map) => {
for (k, v) in map {
let key = if prefix.is_empty() {
k.clone()
} else {
format!("{prefix}.{k}")
};
flatten_toml_value(v, &key, source, entries);
}
}
toml::Value::Array(arr) => {
for (i, v) in arr.iter().enumerate() {
let key = if prefix.is_empty() {
i.to_string()
} else {
format!("{prefix}.{i}")
};
flatten_toml_value(v, &key, source, entries);
}
}
// Integer, Float, Boolean, Datetime — skip, not secrets.
_ => {}
}
}
/// Parse JSON files. Nested objects are flattened with dot separators.
/// Arrays use numeric indices. Only string values are extracted.
fn parse_json(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
let value: serde_json::Value =
serde_json::from_str(content).map_err(|e| ParseError::Format {
path: source.to_path_buf(),
message: e.to_string(),
})?;
let mut entries = Vec::new();
flatten_json_value(&value, "", source, &mut entries);
Ok(entries)
}
fn flatten_json_value(
value: &serde_json::Value,
prefix: &str,
source: &Path,
entries: &mut Vec<SecretEntry>,
) {
match value {
serde_json::Value::String(s) => {
if !prefix.is_empty() {
entries.push(SecretEntry {
key: prefix.to_string(),
value: s.clone(),
source: source.to_path_buf(),
});
}
}
serde_json::Value::Object(map) => {
for (k, v) in map {
let key = if prefix.is_empty() {
k.clone()
} else {
format!("{prefix}.{k}")
};
flatten_json_value(v, &key, source, entries);
}
}
serde_json::Value::Array(arr) => {
for (i, v) in arr.iter().enumerate() {
let key = if prefix.is_empty() {
i.to_string()
} else {
format!("{prefix}.{i}")
};
flatten_json_value(v, &key, source, entries);
}
}
// Number, Bool, Null — skip.
_ => {}
}
}
/// Parse YAML files. Nested mappings are flattened with dot separators.
/// Only string values are extracted.
fn parse_yaml(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
let value: serde_yaml::Value =
serde_yaml::from_str(content).map_err(|e| ParseError::Format {
path: source.to_path_buf(),
message: e.to_string(),
})?;
let mut entries = Vec::new();
flatten_yaml_value(&value, "", source, &mut entries);
Ok(entries)
}
fn flatten_yaml_value(
value: &serde_yaml::Value,
prefix: &str,
source: &Path,
entries: &mut Vec<SecretEntry>,
) {
match value {
serde_yaml::Value::String(s) => {
if !prefix.is_empty() {
entries.push(SecretEntry {
key: prefix.to_string(),
value: s.clone(),
source: source.to_path_buf(),
});
}
}
serde_yaml::Value::Mapping(map) => {
for (k, v) in map {
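let k_str = match k {
serde_yaml::Value::String(s) => s.clone(),
// Render scalar keys as written in the YAML rather than via `Debug`,
// which would wrap them in the variant name.
serde_yaml::Value::Number(n) => n.to_string(),
serde_yaml::Value::Bool(b) => b.to_string(),
other => format!("{other:?}"),
};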
let key = if prefix.is_empty() {
k_str
} else {
format!("{prefix}.{k_str}")
};
flatten_yaml_value(v, &key, source, entries);
}
}
serde_yaml::Value::Sequence(arr) => {
for (i, v) in arr.iter().enumerate() {
let key = if prefix.is_empty() {
i.to_string()
} else {
format!("{prefix}.{i}")
};
flatten_yaml_value(v, &key, source, entries);
}
}
// Number, Bool, Null, Tagged — skip.
_ => {}
}
}
/// Parse Python-style assignment lines: `KEY = "value"` or `KEY = 'value'`.
///
/// This is heuristic — lines that don't match the pattern are silently skipped.
fn parse_python_assignments(
content: &str,
source: &Path,
) -> Result<Vec<SecretEntry>, ParseError> {
let re = regex::Regex::new(r#"(?i)^([A-Z_][A-Z0-9_]*)\s*=\s*['"](.+?)['"]\s*$"#)
.expect("valid regex");
let mut entries = Vec::new();
for line in content.lines() {
let trimmed = line.trim();
if trimmed.is_empty() || trimmed.starts_with('#') {
continue;
}
if let Some(caps) = re.captures(trimmed) {
entries.push(SecretEntry {
key: caps[1].to_string(),
value: caps[2].to_string(),
source: source.to_path_buf(),
});
}
}
Ok(entries)
}
/// Parse Java `.properties` files.
///
/// Supports `key=value`, `key: value`, `key value` (space separator).
/// Lines starting with `#` or `!` are comments. Continuation lines ending
/// with `\` are joined.
fn parse_properties(content: &str, source: &Path) -> Result<Vec<SecretEntry>, ParseError> {
let mut entries = Vec::new();
let mut lines = content.lines().peekable();
while let Some(line) = lines.next() {
let trimmed = line.trim();
// Skip blank lines and comments.
if trimmed.is_empty() || trimmed.starts_with('#') || trimmed.starts_with('!') {
continue;
}
// Handle continuation lines (trailing `\`).
let mut logical_line = String::new();
let mut current = trimmed.to_string();
while current.ends_with('\\') {
// Remove trailing backslash and append next line.
logical_line.push_str(&current[..current.len() - 1]);
current = lines
.next()
.map(|l| l.trim_start().to_string())
.unwrap_or_default();
}
logical_line.push_str(&current);
// Split on first `=`, `:`, or whitespace.
let (key, value) = split_property_line(&logical_line);
if key.is_empty() {
continue;
}
entries.push(SecretEntry {
key,
value,
source: source.to_path_buf(),
});
}
Ok(entries)
}
/// Split a logical properties line into (key, value).
/// Recognises `=`, `:`, or whitespace as the separator.
fn split_property_line(line: &str) -> (String, String) {
// Find the first unescaped separator.
let mut i = 0;
let bytes = line.as_bytes();
let len = bytes.len();
while i < len {
// Skip escaped characters.
if bytes[i] == b'\\' {
i += 2;
continue;
}
if matches!(bytes[i], b'=' | b':' | b' ' | b'\t') {
let key = line[..i].trim().to_string();
// Whitespace may precede a single `=` or `:` (e.g. `key = value`), so
// skip leading whitespace and then at most one explicit separator.
// Otherwise `key = value` would yield the value `= value`.
let rest = line[i..].trim_start();
let rest = rest
.strip_prefix(|c: char| c == '=' || c == ':')
.unwrap_or(rest);
return (key, rest.trim().to_string());
}
i += 1;
}
// No separator found — the entire line is a key with an empty value.
(line.trim().to_string(), String::new())
}
// ---------------------------------------------------------------------------
// Key filtering
// ---------------------------------------------------------------------------
/// Filter entries by glob patterns (case-insensitive).
fn filter_entries(entries: Vec<SecretEntry>, patterns: &[String]) -> Vec<SecretEntry> {
// Pre-compile matchers.
let matchers: Vec<_> = patterns
.iter()
.filter_map(|p| {
Glob::new(&p.to_ascii_uppercase())
.ok()
.map(|g| g.compile_matcher())
})
.collect();
if matchers.is_empty() {
return Vec::new();
}
entries
.into_iter()
.filter(|entry| {
let upper = entry.key.to_ascii_uppercase();
matchers.iter().any(|m| m.is_match(&upper))
})
.collect()
}
+258
@@ -0,0 +1,258 @@
//! Built-in regex patterns for heuristic secret detection.
//!
//! Rules are derived from [gitleaks](https://github.com/gitleaks/gitleaks) (MIT license)
//! and curated for high-confidence detection in AI agent output streams.
use std::borrow::Cow;
/// A single detection rule.
#[derive(Debug, Clone)]
pub struct DetectionRule {
/// Unique identifier (e.g. `"aws-access-key"`, `"github-pat"`).
pub id: Cow<'static, str>,
/// Human-readable description.
pub description: Cow<'static, str>,
/// Regex pattern string.
pub pattern: Cow<'static, str>,
/// Minimum Shannon entropy threshold for matched text.
/// `None` means no entropy check — the pattern alone is sufficient.
pub entropy_threshold: Option<f64>,
}
/// Returns the built-in detection rules (gitleaks-derived, MIT licensed).
pub fn builtin_rules() -> &'static [DetectionRule] {
&RULES
}
/// Convenience macro to define a static rule with `Cow::Borrowed`.
macro_rules! rule {
($id:expr, $desc:expr, $pat:expr) => {
DetectionRule {
id: Cow::Borrowed($id),
description: Cow::Borrowed($desc),
pattern: Cow::Borrowed($pat),
entropy_threshold: None,
}
};
($id:expr, $desc:expr, $pat:expr, entropy: $threshold:expr) => {
DetectionRule {
id: Cow::Borrowed($id),
description: Cow::Borrowed($desc),
pattern: Cow::Borrowed($pat),
entropy_threshold: Some($threshold),
}
};
}
static RULES: [DetectionRule; 35] = [
// -----------------------------------------------------------------------
// Cloud provider keys
// -----------------------------------------------------------------------
rule!(
"aws-access-key",
"AWS Access Key ID",
r"(?:A3T[A-Z0-9]|AKIA|ASIA|ABIA|ACCA)[A-Z2-7]{16}"
),
rule!(
"aws-secret-key",
"AWS Secret Access Key (near assignment)",
r"(?i)aws[_\-\.]?secret[_\-\.]?access[_\-\.]?key[\s]*[=:\s]+[\s]*['\x22]?([A-Za-z0-9/+=]{40})['\x22]?"
),
rule!(
"gcp-api-key",
"GCP API Key",
r"AIza[0-9A-Za-z\-_]{35}"
),
rule!(
"gcp-service-account",
"GCP Service Account JSON",
r#"\x22type\x22\s*:\s*\x22service_account\x22"#
),
// -----------------------------------------------------------------------
// Code hosting tokens
// -----------------------------------------------------------------------
rule!(
"github-pat-fine-grained",
"GitHub Fine-Grained Personal Access Token",
r"github_pat_[A-Za-z0-9_]{82}"
),
rule!(
"github-pat-classic",
"GitHub Classic Personal Access Token",
r"ghp_[A-Za-z0-9]{36}"
),
rule!(
"github-oauth",
"GitHub OAuth Access Token",
r"gho_[A-Za-z0-9]{36}"
),
rule!(
"github-app-user-token",
"GitHub App User-to-Server Token",
r"ghu_[A-Za-z0-9]{36}"
),
rule!(
"github-app-server-token",
"GitHub App Server-to-Server Token",
r"ghs_[A-Za-z0-9]{36}"
),
rule!(
"gitlab-pat",
"GitLab Personal Access Token",
r"glpat-[A-Za-z0-9\-_]{20,}"
),
rule!(
"gitlab-pipeline-token",
"GitLab Pipeline Trigger Token",
r"glptt-[A-Za-z0-9\-_]{20,}"
),
// -----------------------------------------------------------------------
// Payment
// -----------------------------------------------------------------------
rule!(
"stripe-secret-key",
"Stripe Secret Key",
r"sk_live_[A-Za-z0-9]{24,}"
),
rule!(
"stripe-restricted-key",
"Stripe Restricted Key",
r"rk_live_[A-Za-z0-9]{24,}"
),
// -----------------------------------------------------------------------
// Communication
// -----------------------------------------------------------------------
rule!(
"slack-bot-token",
"Slack Bot Token",
r"xoxb-[0-9]{10,}-[0-9]{10,}-[A-Za-z0-9]{24,}"
),
rule!(
"slack-user-token",
"Slack User Token",
r"xoxp-[0-9]{10,}-[0-9]{10,}-[0-9]{10,}-[a-z0-9]{32}"
),
rule!(
"slack-webhook",
"Slack Incoming Webhook URL",
r"https://hooks\.slack\.com/services/T[A-Z0-9]{8,}/B[A-Z0-9]{8,}/[A-Za-z0-9]{24,}"
),
rule!(
"twilio-api-key",
"Twilio API Key",
r"SK[a-f0-9]{32}"
),
rule!(
"sendgrid-api-key",
"SendGrid API Key",
r"SG\.[A-Za-z0-9_\-]{22}\.[A-Za-z0-9_\-]{43}"
),
// -----------------------------------------------------------------------
// Auth / Identity
// -----------------------------------------------------------------------
rule!(
"jwt",
"JSON Web Token",
r"eyJ[A-Za-z0-9_-]{10,}\.eyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_\-]{10,}"
),
rule!(
"bearer-token",
"Bearer Token in Authorization Header",
r"(?i)bearer\s+[A-Za-z0-9\-._~+/]+=*"
),
// -----------------------------------------------------------------------
// Cryptographic material
// -----------------------------------------------------------------------
rule!(
"private-key-header",
"Private Key (PEM Header)",
r"-----BEGIN\s?(?:RSA |DSA |EC |PGP |OPENSSH )?PRIVATE KEY-----"
),
rule!(
"pgp-private-key",
"PGP Private Key Block",
r"-----BEGIN PGP PRIVATE KEY BLOCK-----"
),
// -----------------------------------------------------------------------
// Database
// -----------------------------------------------------------------------
rule!(
"database-connection-url",
"Database Connection URL with Credentials",
r"(?i)(?:postgres|mysql|mongodb|redis|amqp)://[^:\s]+:[^@\s]+@[^\s]+"
),
// -----------------------------------------------------------------------
// Infrastructure
// -----------------------------------------------------------------------
rule!(
"heroku-api-key",
"Heroku API Key",
r"(?i)heroku[_\-\.]?api[_\-\.]?key[\s]*[=:\s]+[\s]*[A-Fa-f0-9]{8}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{4}-[A-Fa-f0-9]{12}"
),
rule!(
"npm-token",
"npm Access Token",
r"(?i)npm_[A-Za-z0-9]{36}"
),
rule!(
"pypi-token",
"PyPI API Token",
r"pypi-[A-Za-z0-9_\-]{50,}"
),
rule!(
"docker-hub-token",
"Docker Hub Personal Access Token",
r"dckr_pat_[A-Za-z0-9_\-]{27,}"
),
// -----------------------------------------------------------------------
// AI / ML
// -----------------------------------------------------------------------
rule!(
"openai-api-key-legacy",
"OpenAI API Key (Legacy Format)",
r"sk-[A-Za-z0-9]{20}T3BlbkFJ[A-Za-z0-9]{20}"
),
rule!(
"openai-project-key",
"OpenAI Project API Key",
r"sk-proj-[A-Za-z0-9\-_]{40,}"
),
rule!(
"anthropic-api-key",
"Anthropic API Key",
r"sk-ant-[A-Za-z0-9\-_]{40,}"
),
// -----------------------------------------------------------------------
// Generic patterns (entropy-gated)
// -----------------------------------------------------------------------
rule!(
"generic-api-key",
"Generic API Key Assignment",
r"(?i)(?:api[_\-]?key|apikey)[\s]*[=:]\s*['\x22]?([A-Za-z0-9_\-]{20,})['\x22]?",
entropy: 3.5
),
rule!(
"generic-secret",
"Generic Secret/Password/Token Assignment",
r"(?i)(?:secret|password|passwd|token)[\s]*[=:]\s*['\x22]?([^\s'\x22]{8,})['\x22]?",
entropy: 3.0
),
rule!(
"generic-private-key",
"Generic Private Key Assignment",
r"(?i)private[_\-]?key[\s]*[=:]\s*['\x22]?([^\s'\x22]{20,})['\x22]?",
entropy: 3.5
),
rule!(
"high-entropy-hex",
"High-Entropy Hex String (32+ chars)",
r"(?i)[=:]\s*['\x22]?([0-9a-f]{32,})['\x22]?",
entropy: 3.5
),
rule!(
"high-entropy-base64",
"High-Entropy Base64 String (24+ chars)",
r"(?i)[=:]\s*['\x22]?([A-Za-z0-9+/]{24,}={0,3})['\x22]?",
entropy: 4.0
),
];
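Review note: to make the shape of the `github-pat-classic` rule concrete, the sketch below re-implements the same check (`ghp_` followed by 36 ASCII alphanumerics) without any dependencies. The function name is hypothetical; the scanner itself compiles the rule with the `regex` crate rather than matching by hand.

```rust
/// Hypothetical, dependency-free equivalent of the unanchored pattern
/// `ghp_[A-Za-z0-9]{36}`: true if `text` contains `ghp_` immediately
/// followed by at least 36 ASCII alphanumeric bytes.
fn contains_github_classic_pat(text: &str) -> bool {
    const PREFIX: &str = "ghp_";
    text.match_indices(PREFIX).any(|(start, _)| {
        // Look at the 36 bytes after the prefix; all of them must exist
        // and all of them must be ASCII alphanumerics.
        text[start + PREFIX.len()..]
            .bytes()
            .take(36)
            .filter(|b| b.is_ascii_alphanumeric())
            .count()
            == 36
    })
}
```

Because the pattern is unanchored, a longer alphanumeric run after `ghp_` also matches, so "exactly 36" in practice means "at least 36".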
+172
@@ -0,0 +1,172 @@
//! Secret value redactor.
//!
//! Takes the known-secrets [`Manifest`] and efficiently replaces every
//! occurrence of a secret value in arbitrary text using an Aho-Corasick
//! automaton for multi-pattern matching.
use aho_corasick::AhoCorasick;
use super::config::RedactionStyle;
use super::manifest::Manifest;
// ---------------------------------------------------------------------------
// Output types
// ---------------------------------------------------------------------------
/// A redaction event -- records what was replaced and where.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Redaction {
/// The key name of the redacted secret.
pub key: String,
/// Byte offset in the *original* text where the match starts.
pub offset: usize,
/// Length (in bytes) of the original secret value that was replaced.
pub original_len: usize,
}
/// The result of redacting text.
#[derive(Debug, Clone)]
pub struct RedactedText {
/// The text with secret values replaced.
pub text: String,
/// List of redactions that were applied (in order of occurrence).
pub redactions: Vec<Redaction>,
}
impl RedactedText {
/// Returns `true` if any redactions were made.
pub fn was_redacted(&self) -> bool {
!self.redactions.is_empty()
}
}
// ---------------------------------------------------------------------------
// Redactor
// ---------------------------------------------------------------------------
/// Replaces known secret values in text with configurable placeholders.
///
/// Construction is trivial when the manifest is empty; otherwise building
/// the Aho-Corasick automaton takes time proportional to the total length
/// of the secret values. Redaction is linear in the length of the input text.
#[derive(Debug, Clone)]
pub struct Redactor {
/// The Aho-Corasick automaton for multi-pattern matching.
/// `None` when the manifest is empty (no-op fast path).
automaton: Option<AhoCorasick>,
/// Secret entries parallel to the automaton patterns.
/// Index `i` in the automaton corresponds to `entries[i]`.
entries: Vec<RedactorEntry>,
/// How to format replacements.
style: RedactionStyle,
}
/// Internal entry -- stores info needed for replacement formatting.
#[derive(Debug, Clone)]
struct RedactorEntry {
key: String,
value_len: usize,
}
impl Redactor {
/// Build a redactor from a manifest and redaction style.
///
/// The manifest entries are already sorted by value length descending,
/// but Aho-Corasick with `LeftmostLongest` handles overlap correctly
/// regardless of input order.
pub fn new(manifest: &Manifest, style: RedactionStyle) -> Self {
let secrets = manifest.entries();
if secrets.is_empty() {
return Self {
automaton: None,
entries: Vec::new(),
style,
};
}
// Build patterns from secret *values* (not keys).
let patterns: Vec<&str> = secrets.iter().map(|e| e.value.as_str()).collect();
let entries: Vec<RedactorEntry> = secrets
.iter()
.map(|e| RedactorEntry {
key: e.key.clone(),
value_len: e.value.len(),
})
.collect();
// LeftmostLongest ensures that when one secret value is a substring
// of another, the longer match wins.
let automaton = AhoCorasick::builder()
.match_kind(aho_corasick::MatchKind::LeftmostLongest)
.build(&patterns)
.ok(); // If build fails (shouldn't for valid strings), fall back to no-op.
Self {
automaton,
entries,
style,
}
}
/// Redact all known secret values in the input text.
///
/// Returns the redacted text together with metadata about each
/// replacement (key name, byte offset, original length).
pub fn redact(&self, text: &str) -> RedactedText {
let automaton = match &self.automaton {
Some(a) => a,
None => {
return RedactedText {
text: text.to_string(),
redactions: Vec::new(),
}
}
};
let mut result = String::with_capacity(text.len());
let mut redactions = Vec::new();
let mut last_end = 0;
for mat in automaton.find_iter(text) {
let entry = &self.entries[mat.pattern().as_usize()];
// Append text before the match.
result.push_str(&text[last_end..mat.start()]);
// Append the replacement placeholder.
let replacement = self.format_replacement(entry);
result.push_str(&replacement);
redactions.push(Redaction {
key: entry.key.clone(),
offset: mat.start(),
original_len: entry.value_len,
});
last_end = mat.end();
}
// Append remaining text after the last match.
result.push_str(&text[last_end..]);
RedactedText {
text: result,
redactions,
}
}
/// Format the replacement string according to the configured style.
fn format_replacement(&self, entry: &RedactorEntry) -> String {
match self.style {
RedactionStyle::Masked => "*****".to_string(),
RedactionStyle::Typed => format!("<REDACTED:string:{}>", entry.value_len),
RedactionStyle::Named => format!("<REDACTED:{}>", entry.key),
RedactionStyle::Absent => String::new(),
}
}
/// Returns `true` if this redactor has any secrets loaded.
pub fn has_secrets(&self) -> bool {
self.automaton.is_some()
}
}
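Review note: the reason for `MatchKind::LeftmostLongest` is easiest to see with two secrets where one is a prefix of the other. The sketch below is a naive, dependency-free stand-in for the same semantics, not the Aho-Corasick implementation used above; `redact_naive` and its parameters are hypothetical names.

```rust
/// Replace every occurrence of any value in `secrets` with `mask`.
/// Scans left to right; at each position the longest matching secret wins,
/// which mirrors leftmost-longest matching (at quadratic cost).
fn redact_naive(text: &str, secrets: &[&str], mask: &str) -> String {
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < text.len() {
        // Length of the longest non-empty secret starting at byte offset `i`.
        let best = secrets
            .iter()
            .filter(|s| !s.is_empty() && text[i..].starts_with(**s))
            .map(|s| s.len())
            .max();
        match best {
            Some(len) => {
                out.push_str(mask);
                i += len;
            }
            None => {
                // Copy one whole character so a UTF-8 sequence is never split.
                let ch = text[i..].chars().next().unwrap();
                out.push(ch);
                i += ch.len_utf8();
            }
        }
    }
    out
}
```

With secrets `abc` and `abc123`, the input `pw=abc123` becomes `pw=*****`. A shortest-match strategy would leave the `123` tail visible, which is the leak the longest-match rule is meant to prevent.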
+250
@@ -0,0 +1,250 @@
//! Heuristic secret scanner using [`RegexSet`] for single-pass multi-pattern
//! matching with optional Shannon entropy filtering.
//!
//! The scanner operates purely on text input — it has no knowledge of redaction,
//! manifests, or file structure. Callers feed it text and receive [`Finding`]s.
use std::borrow::Cow;
use std::ops::Range;
use regex::{Regex, RegexSet};
use thiserror::Error;
use super::config::HeuristicConfig;
use super::patterns::{self, DetectionRule};
// ---------------------------------------------------------------------------
// Errors
// ---------------------------------------------------------------------------
#[derive(Debug, Error)]
pub enum ScannerError {
#[error("invalid regex pattern: {0}")]
Regex(#[from] regex::Error),
}
// ---------------------------------------------------------------------------
// Finding types
// ---------------------------------------------------------------------------
/// Confidence level of a finding.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Confidence {
/// Specific provider pattern matched (e.g. `ghp_`, `AKIA`).
High,
/// Generic pattern matched and passed entropy threshold.
Medium,
/// Generic pattern matched but entropy was borderline.
/// `Scanner::scan` does not currently emit this level.
Low,
}
/// A single potential secret detected in the input text.
#[derive(Debug, Clone)]
pub struct Finding {
/// The matched text substring.
pub matched_text: String,
/// Which detection rule triggered this finding.
pub pattern_id: String,
/// Human-readable description of the rule.
pub description: String,
/// Confidence level.
pub confidence: Confidence,
/// Byte range in the input text.
pub span: Range<usize>,
}
// ---------------------------------------------------------------------------
// Scanner
// ---------------------------------------------------------------------------
/// Pre-compiled multi-pattern secret scanner.
///
/// Holds a [`RegexSet`] for fast "any match?" bulk filtering and parallel
/// individual [`Regex`] instances for extracting match details and spans.
pub struct Scanner {
/// Pre-compiled set for fast "any match?" check.
regex_set: RegexSet,
/// Individual compiled regexes for extracting match details (parallel to `regex_set`).
regexes: Vec<Regex>,
/// Rule metadata (parallel to `regex_set`).
rules: Vec<DetectionRule>,
}
impl std::fmt::Debug for Scanner {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.debug_struct("Scanner")
.field("rule_count", &self.rules.len())
.finish()
}
}
impl Scanner {
/// Build a scanner from the given heuristic configuration.
///
/// Includes all built-in rules plus any custom patterns from `config.patterns`.
pub fn new(config: &HeuristicConfig) -> Result<Self, ScannerError> {
let mut rules: Vec<DetectionRule> = patterns::builtin_rules().to_vec();
// Append custom patterns from config.
for (i, pat) in config.patterns.iter().enumerate() {
rules.push(DetectionRule {
id: Cow::Owned(format!("custom-{i}")),
description: Cow::Owned(format!("Custom pattern #{i}")),
pattern: Cow::Owned(pat.clone()),
entropy_threshold: None,
});
}
let pattern_strings: Vec<&str> = rules.iter().map(|r| r.pattern.as_ref()).collect();
let regex_set = RegexSet::new(&pattern_strings)?;
let regexes = pattern_strings
.iter()
.map(|p| Regex::new(p))
.collect::<Result<Vec<_>, _>>()?;
Ok(Self {
regex_set,
regexes,
rules,
})
}
/// Build a scanner with only the built-in rules (no custom patterns).
pub fn builtin() -> Result<Self, ScannerError> {
Self::new(&HeuristicConfig::default())
}
/// Scan `text` for potential secrets.
///
/// Returns findings sorted by byte position with overlapping matches
/// deduplicated (first match wins).
pub fn scan(&self, text: &str) -> Vec<Finding> {
let matches = self.regex_set.matches(text);
if !matches.matched_any() {
return Vec::new();
}
let mut findings = Vec::new();
for idx in matches.iter() {
let rule = &self.rules[idx];
let regex = &self.regexes[idx];
for mat in regex.find_iter(text) {
let matched_text = mat.as_str();
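// Apply entropy threshold when configured. Entropy is computed over
// the full match (key name and delimiter included), not only the
// captured value.
if let Some(threshold) = rule.entropy_threshold {
if shannon_entropy(matched_text) < threshold {
continue;
}
}
// Rules without an entropy threshold, including custom patterns,
// are reported with High confidence.
let confidence = if rule.entropy_threshold.is_some() {
Confidence::Medium
} else {
Confidence::High
};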
findings.push(Finding {
matched_text: matched_text.to_string(),
pattern_id: rule.id.to_string(),
description: rule.description.to_string(),
confidence,
span: mat.start()..mat.end(),
});
}
}
// Sort by position, then deduplicate overlapping spans.
findings.sort_by_key(|f| f.span.start);
dedup_overlapping(&mut findings);
findings
}
/// Returns the number of active rules (built-in + custom).
pub fn rule_count(&self) -> usize {
self.rules.len()
}
}
// ---------------------------------------------------------------------------
// Shannon entropy
// ---------------------------------------------------------------------------
/// Calculate Shannon entropy of `s` in bits per byte (identical to bits per
/// character for ASCII input).
///
/// Returns 0.0 for empty strings. The maximum for printable ASCII
/// (95 symbols) is log2(95) ≈ 6.57 bits.
pub fn shannon_entropy(s: &str) -> f64 {
if s.is_empty() {
return 0.0;
}
let mut freq = [0u32; 256];
let len = s.len() as f64;
for &b in s.as_bytes() {
freq[b as usize] += 1;
}
freq.iter()
.filter(|&&c| c > 0)
.map(|&c| {
let p = c as f64 / len;
-p * p.log2()
})
.sum()
}
// ---------------------------------------------------------------------------
// Deduplication
// ---------------------------------------------------------------------------
/// Remove findings whose span overlaps with an earlier (higher-priority) finding.
///
/// Input must be sorted by `span.start`. When two findings overlap, the one
/// appearing first (lower start position) is kept.
fn dedup_overlapping(findings: &mut Vec<Finding>) {
    // End of the most recently kept finding. A later finding that starts
    // before it overlaps and is dropped. Single pass, no element shifting.
    let mut kept_end: Option<usize> = None;
    findings.retain(|f| match kept_end {
        Some(end) if f.span.start < end => false,
        _ => {
            kept_end = Some(f.span.end);
            true
        }
    });
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn entropy_all_same_chars() {
// All same characters → 0 entropy.
assert!((shannon_entropy("aaaaaaaaaa") - 0.0).abs() < f64::EPSILON);
}
#[test]
fn entropy_two_equal_chars() {
// "ab" repeated → exactly 1.0 bits/char.
let e = shannon_entropy("abababababababababab");
assert!((e - 1.0).abs() < 0.01);
}
#[test]
fn entropy_high_randomness() {
// A string with many distinct characters should have high entropy.
let s = "aB3$kL9!mZ7@wQ1#";
assert!(shannon_entropy(s) > 3.5);
}
#[test]
fn entropy_empty_string() {
assert!((shannon_entropy("") - 0.0).abs() < f64::EPSILON);
}
}
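Review note: `scan` passes the whole regex match (`mat.as_str()`) to `shannon_entropy`, so for the generic `key=value` rules the key name and delimiter contribute to the score. The sketch below restates the entropy function so it compiles on its own and compares the two measurements. `value_vs_full_match` is a hypothetical helper, and its split on the first `=` or `:` is a simplification of the capture groups in the real patterns.

```rust
/// Shannon entropy over bytes, restated from the scanner above.
fn shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    let mut freq = [0u32; 256];
    for &b in s.as_bytes() {
        freq[b as usize] += 1;
    }
    let len = s.len() as f64;
    freq.iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = f64::from(c) / len;
            -p * p.log2()
        })
        .sum()
}

/// Hypothetical helper: returns (entropy of the value only, entropy of the
/// whole `key=value` match). The value is whatever follows the first `=` or
/// `:`, with surrounding whitespace and quotes removed.
fn value_vs_full_match(full_match: &str) -> (f64, f64) {
    let value = full_match
        .split_once(|c: char| c == '=' || c == ':')
        .map(|(_, v)| v.trim().trim_matches(|c: char| c == '\'' || c == '"'))
        .unwrap_or(full_match);
    (shannon_entropy(value), shannon_entropy(full_match))
}
```

For `password=aaaaaaaa` the value alone has zero entropy while the full match scores higher, because the distinct letters of the key name are counted too. Whether thresholds should be evaluated on the captured value instead is a design decision for the scanner.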
+54 -5
@@ -1,10 +1,11 @@
-//! Claude Code hook adapter (PreToolUse).
//! Claude Code hook adapter (PreToolUse + PostToolUse).
//!
-//! Wire format: stdin is one JSON object with `tool_name` and `tool_input`.
//! Wire format: stdin is one JSON object with `tool_name` and `tool_input`
//! (and optionally `tool_response` for PostToolUse).
//! Stdout is `{"hookSpecificOutput": {...}}` with exit code 0; the JSON
-//! carries the verdict.
//! carries the verdict / updated output.
-use super::{AdapterError, HarnessAdapter, PathKind, ToolCall, ToolOp};
use super::{AdapterError, HarnessAdapter, PathKind, PostToolUsePayload, ToolCall, ToolOp};
use crate::core::Decision;
use serde_json::{json, Value};
use std::path::PathBuf;
@@ -16,6 +17,8 @@ impl HarnessAdapter for ClaudeAdapter {
"claude"
}
// -- PreToolUse --------------------------------------------------------
fn parse_request(&self, input: &[u8]) -> Result<ToolCall, AdapterError> {
let v: Value = serde_json::from_slice(input)?;
let tool_name = v
@@ -39,7 +42,11 @@ impl HarnessAdapter for ClaudeAdapter {
})
}
-fn render_decision(&self, _call: &ToolCall, decision: &Decision) -> Result<Vec<u8>, AdapterError> {
fn render_decision(
&self,
_call: &ToolCall,
decision: &Decision,
) -> Result<Vec<u8>, AdapterError> {
let (verdict, reason) = match decision {
Decision::Allow => ("allow", String::new()),
Decision::Ask(r) => ("ask", r.message.clone()),
@@ -54,6 +61,48 @@ impl HarnessAdapter for ClaudeAdapter {
});
Ok(serde_json::to_vec(&out)?)
}
// -- PostToolUse -------------------------------------------------------
fn parse_post_tool_use(&self, input: &[u8]) -> Result<PostToolUsePayload, AdapterError> {
let v: Value = serde_json::from_slice(input)?;
let tool_name = v
.get("tool_name")
.and_then(|x| x.as_str())
.ok_or_else(|| AdapterError::Parse("missing tool_name".into()))?
.to_string();
let tool_input = v.get("tool_input").cloned().unwrap_or(Value::Null);
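// Only string responses are inspected; a missing or non-string
// `tool_response` (object, array, null) is treated as empty output.
let tool_response = v
.get("tool_response")
.and_then(|x| x.as_str())
.unwrap_or("")
.to_string();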
Ok(PostToolUsePayload {
tool_name,
tool_input,
tool_response,
raw: v,
})
}
fn render_post_tool_use(
&self,
_payload: &PostToolUsePayload,
redacted_output: Option<&str>,
) -> Result<Vec<u8>, AdapterError> {
// When there are no changes, return `{}` — Claude Code interprets
// an empty object as "use original output, no modifications".
let out = match redacted_output {
Some(text) => json!({
"hookSpecificOutput": {
"hookEventName": "PostToolUse",
"updatedToolOutput": text,
}
}),
None => json!({}),
};
Ok(serde_json::to_vec(&out)?)
}
}
fn path_op(tool_input: &Value, kind: PathKind) -> Result<ToolOp, AdapterError> {
+55 -3
@@ -42,16 +42,68 @@ pub enum PathKind {
Write,
}
-/// Trait implemented by each harness adapter. Adapters parse the harness's
-/// hook stdin payload into `ToolCall` and render a `Decision` back to the
-/// harness's expected stdout format.
/// A PostToolUse hook payload -- tool already executed, output available for
/// inspection/redaction.
#[derive(Debug, Clone)]
pub struct PostToolUsePayload {
/// Harness's tool name (e.g. "Read", "Bash").
pub tool_name: String,
/// The tool input that was originally provided.
pub tool_input: serde_json::Value,
/// The tool's output text that may contain secrets.
pub tool_response: String,
/// Original raw payload.
pub raw: serde_json::Value,
}
/// Hook event type discriminator.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
PreToolUse,
PostToolUse,
}
impl HookEvent {
/// Parse a hook event name from CLI or payload strings.
///
/// Accepts both kebab-case (`pre-tool-use`) and PascalCase (`PreToolUse`).
pub fn parse(s: &str) -> Option<Self> {
match s {
"pre-tool-use" | "PreToolUse" => Some(Self::PreToolUse),
"post-tool-use" | "PostToolUse" => Some(Self::PostToolUse),
_ => None,
}
}
}
/// Trait implemented by each harness adapter.
///
/// Adapters handle both PreToolUse (policy gate) and PostToolUse (output
/// redaction) hook events.
pub trait HarnessAdapter {
/// The CLI name (e.g. "claude", "codex", "gemini").
fn name(&self) -> &'static str;
// -- PreToolUse --------------------------------------------------------
/// Parse a PreToolUse hook payload into a normalized `ToolCall`.
fn parse_request(&self, input: &[u8]) -> Result<ToolCall, AdapterError>;
/// Render a policy `Decision` back to the harness's PreToolUse wire format.
fn render_decision(&self, call: &ToolCall, decision: &Decision) -> Result<Vec<u8>, AdapterError>;
// -- PostToolUse -------------------------------------------------------
/// Parse a PostToolUse hook payload (tool name, input, response).
fn parse_post_tool_use(&self, input: &[u8]) -> Result<PostToolUsePayload, AdapterError>;
/// Render a PostToolUse response. `redacted_output` is the (possibly
/// modified) tool output to send back to the harness.
fn render_post_tool_use(
&self,
payload: &PostToolUsePayload,
redacted_output: Option<&str>,
) -> Result<Vec<u8>, AdapterError>;
}
#[cfg(feature = "harness-claude")]
+298
@@ -0,0 +1,298 @@
use assert_cmd::Command;
use std::fs;
/// Helper: create a temp project directory with a `.botsecrets` config and
/// a `.env` file containing the given secrets.
fn setup_project(
env_content: &str,
botsecrets_content: Option<&str>,
) -> tempfile::TempDir {
let tmp = tempfile::tempdir().unwrap();
// .env with test secrets
fs::write(tmp.path().join(".env"), env_content).unwrap();
// .botsecrets config (use default if not specified)
let botsecrets = botsecrets_content.unwrap_or(
r#"
[files]
patterns = [".env"]
"#,
);
fs::write(tmp.path().join(".botsecrets"), botsecrets).unwrap();
// .botignore (empty — required for project root detection)
fs::write(tmp.path().join(".botignore"), "").unwrap();
tmp
}
#[test]
fn post_tool_use_redacts_known_secret() {
let tmp = setup_project("DB_PASSWORD=supersecret123\n", None);
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": "/some/file.txt" },
"tool_response": "DB_HOST=localhost\nDB_PASSWORD=supersecret123\nDB_PORT=5432"
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
.current_dir(tmp.path())
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
let updated = v["hookSpecificOutput"]["updatedToolOutput"]
.as_str()
.expect("expected updatedToolOutput");
assert!(
updated.contains("*****"),
"expected masked secret, got: {updated}"
);
assert!(
!updated.contains("supersecret123"),
"secret should be redacted, got: {updated}"
);
assert!(
updated.contains("DB_HOST=localhost"),
"non-secret lines should be preserved, got: {updated}"
);
assert!(
updated.contains("DB_PORT=5432"),
"non-secret lines should be preserved, got: {updated}"
);
}
#[test]
fn post_tool_use_no_secrets_passthrough() {
let tmp = setup_project("DB_PASSWORD=supersecret123\n", None);
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": "/some/file.txt" },
"tool_response": "Hello, world! This text has no secrets."
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
.current_dir(tmp.path())
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
// Empty JSON object means "no changes".
assert_eq!(v, serde_json::json!({}), "expected empty JSON for passthrough");
}
#[test]
fn post_tool_use_empty_response_passthrough() {
let tmp = setup_project("DB_PASSWORD=supersecret123\n", None);
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": "/some/file.txt" },
"tool_response": ""
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
.current_dir(tmp.path())
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v, serde_json::json!({}));
}
#[test]
fn post_tool_use_heuristic_enforce_appends_warning() {
// Use a config with heuristic in enforce mode (the default).
let botsecrets = r#"
[files]
patterns = [".env"]
[heuristic]
enabled = true
mode = "enforce"
"#;
let tmp = setup_project("UNRELATED_KEY=foo\n", Some(botsecrets));
// Include something that looks like a GitHub PAT (classic) in the response.
// Pattern requires `ghp_` followed by exactly 36 alphanumeric chars.
let payload = serde_json::json!({
"tool_name": "Bash",
"tool_input": { "command": "cat output.log" },
"tool_response": "deploy log: token ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij used"
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
.current_dir(tmp.path())
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
let updated = v["hookSpecificOutput"]["updatedToolOutput"]
.as_str()
.expect("expected updatedToolOutput with heuristic warning");
assert!(
updated.contains("[fermata] WARNING"),
"expected heuristic warning, got: {updated}"
);
}
#[test]
fn pre_tool_use_backward_compat_default_event() {
// `--event` defaults to pre-tool-use; existing `--harness claude` still works.
let tmp = tempfile::tempdir().unwrap();
fs::write(tmp.path().join(".botignore"), ".env\n").unwrap();
let target = tmp.path().join(".env");
fs::write(&target, "").unwrap();
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": target.to_str().unwrap() }
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--harness", "claude"])
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v["hookSpecificOutput"]["permissionDecision"], "deny");
}
#[test]
fn pre_tool_use_explicit_event_flag() {
// Explicitly passing `--event pre-tool-use` works identically.
let tmp = tempfile::tempdir().unwrap();
fs::write(tmp.path().join(".botignore"), ".env\n").unwrap();
let target = tmp.path().join("safe.txt");
fs::write(&target, "").unwrap();
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": target.to_str().unwrap() }
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "pre-tool-use", "--harness", "claude"])
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v["hookSpecificOutput"]["permissionDecision"], "allow");
}
#[test]
fn unknown_event_exits_2() {
Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "nonsense", "--harness", "claude"])
.write_stdin("{}")
.assert()
.code(2);
}
#[test]
fn post_tool_use_no_project_root_passthrough() {
// When run in a directory with no .botignore / .botsecrets,
// PostToolUse should fail-open with `{}`.
let tmp = tempfile::tempdir().unwrap();
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": "/some/file.txt" },
"tool_response": "DB_PASSWORD=supersecret123"
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
.current_dir(tmp.path())
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v, serde_json::json!({}));
}
#[test]
fn post_tool_use_multiple_secrets_redacted() {
let tmp = setup_project(
"DB_PASSWORD=supersecret123\nAPI_KEY=my-api-key-abc\n",
None,
);
let payload = serde_json::json!({
"tool_name": "Read",
"tool_input": { "file_path": "/some/config" },
"tool_response": "config: password=supersecret123, key=my-api-key-abc, host=localhost"
})
.to_string();
let out = Command::cargo_bin("fermata")
.unwrap()
.args(["hook", "--event", "post-tool-use", "--harness", "claude"])
.current_dir(tmp.path())
.write_stdin(payload)
.assert()
.success()
.get_output()
.stdout
.clone();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
let updated = v["hookSpecificOutput"]["updatedToolOutput"]
.as_str()
.expect("expected updatedToolOutput");
assert!(!updated.contains("supersecret123"), "first secret should be redacted");
assert!(!updated.contains("my-api-key-abc"), "second secret should be redacted");
assert!(updated.contains("host=localhost"), "non-secret should be preserved");
}
+388
@@ -0,0 +1,388 @@
use dirigent_fermata::core::secrets::config::{
EnforcementMode, HeuristicMode, ParseErrorAction, RedactionStyle, SecretsConfig,
BUILTIN_KEY_PATTERNS,
};
#[test]
fn parse_minimal_files_only() {
let cfg = SecretsConfig::from_toml(
r#"
[files]
patterns = [".env", ".env.*"]
"#,
)
.unwrap();
assert_eq!(cfg.files.patterns, vec![".env", ".env.*"]);
// Other sections use defaults
assert_eq!(cfg.redaction.style, RedactionStyle::Masked);
assert_eq!(cfg.enforcement.mode, EnforcementMode::Permissive);
}
#[test]
fn parse_full_config() {
let cfg = SecretsConfig::from_toml(
r#"
[files]
patterns = [".env", "secrets.*"]
[keys]
include = ["STRIPE_*", "TWILIO_*"]
exclude = ["PUBLIC_KEY", "SSH_KEY_PATH"]
[redaction]
style = "typed"
[heuristic]
enabled = false
mode = "report"
patterns = ['AKIA[A-Z2-7]{16}']
[enforcement]
mode = "strict"
on_parse_error = "deny"
[[file]]
path = "settings.py"
format = "python-assignments"
keys = ["SECRET_KEY", "DATABASES.*.PASSWORD"]
"#,
)
.unwrap();
assert_eq!(cfg.files.patterns, vec![".env", "secrets.*"]);
assert_eq!(cfg.keys.include, vec!["STRIPE_*", "TWILIO_*"]);
assert_eq!(cfg.keys.exclude, vec!["PUBLIC_KEY", "SSH_KEY_PATH"]);
assert_eq!(cfg.redaction.style, RedactionStyle::Typed);
assert!(!cfg.heuristic.enabled);
assert_eq!(cfg.heuristic.mode, HeuristicMode::Report);
assert_eq!(cfg.heuristic.patterns, vec!["AKIA[A-Z2-7]{16}"]);
assert_eq!(cfg.enforcement.mode, EnforcementMode::Strict);
assert_eq!(cfg.enforcement.on_parse_error, ParseErrorAction::Deny);
assert_eq!(cfg.file_overrides.len(), 1);
assert_eq!(cfg.file_overrides[0].path, "settings.py");
assert_eq!(
cfg.file_overrides[0].format.as_deref(),
Some("python-assignments")
);
assert_eq!(
cfg.file_overrides[0].keys,
vec!["SECRET_KEY", "DATABASES.*.PASSWORD"]
);
}
#[test]
fn empty_toml_returns_defaults() {
let cfg = SecretsConfig::from_toml("").unwrap();
assert!(!cfg.files.patterns.is_empty());
assert!(cfg.files.patterns.contains(&".env".to_string()));
assert_eq!(cfg.redaction.style, RedactionStyle::Masked);
assert!(cfg.heuristic.enabled);
assert_eq!(cfg.heuristic.mode, HeuristicMode::Enforce);
assert_eq!(cfg.enforcement.mode, EnforcementMode::Permissive);
assert_eq!(
cfg.enforcement.on_parse_error,
ParseErrorAction::MaskEntireFile
);
assert!(cfg.file_overrides.is_empty());
}
#[test]
fn invalid_toml_produces_error() {
let result = SecretsConfig::from_toml("this is not valid {{ toml");
assert!(result.is_err());
let err_msg = result.unwrap_err().to_string();
assert!(
err_msg.contains("expected"),
"error should describe parse issue: {err_msg}"
);
}
#[test]
fn effective_key_includes_has_builtins() {
let cfg = SecretsConfig::default();
let effective = cfg.effective_key_includes();
for builtin in BUILTIN_KEY_PATTERNS {
assert!(
effective.contains(&builtin.to_string()),
"missing builtin: {builtin}"
);
}
}
#[test]
fn effective_key_includes_adds_user_patterns() {
let cfg = SecretsConfig::from_toml(
r#"
[keys]
include = ["MY_CUSTOM_SECRET_*"]
"#,
)
.unwrap();
let effective = cfg.effective_key_includes();
assert!(effective.contains(&"MY_CUSTOM_SECRET_*".to_string()));
// Builtins still present
assert!(effective.contains(&"*PASSWORD*".to_string()));
}
#[test]
fn effective_key_includes_removes_excluded() {
let cfg = SecretsConfig::from_toml(
r#"
[keys]
exclude = ["*TOKEN*", "SENTRY_DSN"]
"#,
)
.unwrap();
let effective = cfg.effective_key_includes();
assert!(
!effective.contains(&"*TOKEN*".to_string()),
"excluded pattern should be removed"
);
assert!(
!effective.contains(&"SENTRY_DSN".to_string()),
"excluded pattern should be removed"
);
// Other builtins still present
assert!(effective.contains(&"*PASSWORD*".to_string()));
}
#[test]
fn key_matches_glob_case_insensitive() {
let cfg = SecretsConfig::default();
assert!(cfg.key_matches("DATABASE_URL"));
assert!(cfg.key_matches("database_url"));
assert!(cfg.key_matches("my_password_here"));
assert!(cfg.key_matches("MY_PASSWORD_HERE"));
assert!(cfg.key_matches("STRIPE_SECRET_KEY"));
assert!(cfg.key_matches("AWS_ACCESS_KEY_ID"));
}
#[test]
fn key_matches_non_secret_keys() {
let cfg = SecretsConfig::default();
assert!(!cfg.key_matches("DEBUG"));
assert!(!cfg.key_matches("LOG_LEVEL"));
assert!(!cfg.key_matches("PORT"));
assert!(!cfg.key_matches("HOST"));
}
#[test]
fn key_matches_respects_user_include() {
let cfg = SecretsConfig::from_toml(
r#"
[keys]
include = ["MY_APP_*"]
"#,
)
.unwrap();
assert!(cfg.key_matches("MY_APP_SETTING"));
assert!(cfg.key_matches("my_app_setting"));
}
#[test]
fn key_matches_respects_user_exclude() {
let cfg = SecretsConfig::from_toml(
r#"
[keys]
exclude = ["*TOKEN*"]
"#,
)
.unwrap();
// With `*TOKEN*` excluded, a key that matched only through that glob must
// no longer match. `GITHUB_TOKEN` is avoided here because it may also be
// covered by a literal built-in pattern.
assert!(!cfg.key_matches("MY_TOKEN"));
// PASSWORD still matches
assert!(cfg.key_matches("MY_PASSWORD"));
}
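// Sketch, not part of the crate: one way the include/exclude and
// case-insensitive glob behaviour exercised above could work. The names
// `glob_match`, `effective_patterns` and `key_matches` (free function) are
// hypothetical, only `*` is supported as a wildcard, and exclusion is assumed
// to remove patterns by exact string equality. The real module appears to use
// the `globset` crate instead.

```rust
/// Match `text` against `pattern`, where `*` matches any run of characters
/// (including none). Comparison is ASCII case-insensitive.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.to_ascii_uppercase().chars().collect();
    let t: Vec<char> = text.to_ascii_uppercase().chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Pattern index of the last `*` seen and the text index it was tried at.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any trailing `*` matches the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Effective patterns = builtins plus user includes, minus excluded patterns.
fn effective_patterns(builtins: &[&str], include: &[&str], exclude: &[&str]) -> Vec<String> {
    builtins
        .iter()
        .chain(include.iter())
        .filter(|p| !exclude.contains(*p))
        .map(|p| (*p).to_string())
        .collect()
}

/// A key is treated as secret if any effective pattern matches it.
fn key_matches(patterns: &[String], key: &str) -> bool {
    patterns.iter().any(|p| glob_match(p, key))
}
```

// With builtins ["*PASSWORD*", "*TOKEN*"] and exclude ["*TOKEN*"], the key
// MY_TOKEN no longer matches while MY_PASSWORD still does.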
#[test]
fn builtin_file_patterns_present() {
let cfg = SecretsConfig::default();
let patterns = &cfg.files.patterns;
assert!(patterns.contains(&".env".to_string()));
assert!(patterns.contains(&"*.pem".to_string()));
assert!(patterns.contains(&".aws/credentials".to_string()));
assert!(patterns.contains(&"terraform.tfvars".to_string()));
}
#[test]
fn load_missing_files_returns_defaults() {
let tmp = tempfile::tempdir().unwrap();
let cfg = SecretsConfig::load(tmp.path()).unwrap();
assert_eq!(cfg.files.patterns, SecretsConfig::default().files.patterns);
assert_eq!(cfg.redaction.style, RedactionStyle::Masked);
}
#[test]
fn load_project_botsecrets() {
let tmp = tempfile::tempdir().unwrap();
std::fs::write(
tmp.path().join(".botsecrets"),
r#"
[redaction]
style = "named"
[keys]
include = ["CUSTOM_*"]
"#,
)
.unwrap();
let cfg = SecretsConfig::load(tmp.path()).unwrap();
assert_eq!(cfg.redaction.style, RedactionStyle::Named);
assert!(cfg.effective_key_includes().contains(&"CUSTOM_*".to_string()));
// File patterns remain at defaults (not overridden)
assert!(cfg.files.patterns.contains(&".env".to_string()));
}
#[test]
fn load_local_overrides_project() {
let tmp = tempfile::tempdir().unwrap();
std::fs::write(
tmp.path().join(".botsecrets"),
r#"
[redaction]
style = "named"
[enforcement]
mode = "strict"
"#,
)
.unwrap();
std::fs::write(
tmp.path().join(".botsecrets.local"),
r#"
[redaction]
style = "absent"
"#,
)
.unwrap();
let cfg = SecretsConfig::load(tmp.path()).unwrap();
// .local overrides .botsecrets for redaction style
assert_eq!(cfg.redaction.style, RedactionStyle::Absent);
// enforcement from .botsecrets is preserved (not in .local)
assert_eq!(cfg.enforcement.mode, EnforcementMode::Strict);
}
#[test]
fn load_invalid_botsecrets_returns_error() {
let tmp = tempfile::tempdir().unwrap();
std::fs::write(tmp.path().join(".botsecrets"), "invalid {{ toml").unwrap();
let result = SecretsConfig::load(tmp.path());
assert!(result.is_err());
let err = result.unwrap_err().to_string();
assert!(err.contains(".botsecrets"), "error should mention file: {err}");
}
#[test]
fn merge_keys_accumulate() {
let tmp = tempfile::tempdir().unwrap();
std::fs::write(
tmp.path().join(".botsecrets"),
r#"
[keys]
include = ["FROM_PROJECT"]
exclude = ["EXCLUDE_PROJECT"]
"#,
)
.unwrap();
std::fs::write(
tmp.path().join(".botsecrets.local"),
r#"
[keys]
include = ["FROM_LOCAL"]
exclude = ["EXCLUDE_LOCAL"]
"#,
)
.unwrap();
let cfg = SecretsConfig::load(tmp.path()).unwrap();
assert!(cfg.keys.include.contains(&"FROM_PROJECT".to_string()));
assert!(cfg.keys.include.contains(&"FROM_LOCAL".to_string()));
assert!(cfg.keys.exclude.contains(&"EXCLUDE_PROJECT".to_string()));
assert!(cfg.keys.exclude.contains(&"EXCLUDE_LOCAL".to_string()));
}
#[test]
fn merge_file_patterns_replaced_not_appended() {
let tmp = tempfile::tempdir().unwrap();
std::fs::write(
tmp.path().join(".botsecrets"),
r#"
[files]
patterns = ["only-this.env"]
"#,
)
.unwrap();
let cfg = SecretsConfig::load(tmp.path()).unwrap();
assert_eq!(cfg.files.patterns, vec!["only-this.env"]);
// Defaults should be gone, replaced by the project's list
assert!(!cfg.files.patterns.contains(&".env".to_string()));
}
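// Sketch, not part of the crate: the merge semantics the tests above check
// (scalars overridden by the later layer, key lists accumulated, file
// patterns replaced wholesale). `Layer` and `merge` are hypothetical names,
// and modelling "not set in this file" as `Option::None` is an assumption.

```rust
#[derive(Debug, Clone, Default, PartialEq)]
struct Layer {
    /// Scalar setting: a later layer overrides an earlier one.
    style: Option<String>,
    /// List setting: a later layer replaces the earlier list entirely.
    file_patterns: Option<Vec<String>>,
    /// List settings that accumulate across layers.
    key_include: Vec<String>,
    key_exclude: Vec<String>,
}

/// Merge `over` on top of `base` (for example `.botsecrets.local` over `.botsecrets`).
fn merge(base: Layer, over: Layer) -> Layer {
    let mut key_include = base.key_include;
    key_include.extend(over.key_include);
    let mut key_exclude = base.key_exclude;
    key_exclude.extend(over.key_exclude);
    Layer {
        // `or` keeps the overriding value when present, else falls back to base.
        style: over.style.or(base.style),
        file_patterns: over.file_patterns.or(base.file_patterns),
        key_include,
        key_exclude,
    }
}
```

// Applying `merge(merge(defaults, project), local)` gives the local file the
// last word on scalars while include/exclude lists from every layer survive.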
#[test]
fn all_redaction_styles_parse() {
for (input, expected) in [
("masked", RedactionStyle::Masked),
("typed", RedactionStyle::Typed),
("named", RedactionStyle::Named),
("absent", RedactionStyle::Absent),
] {
let toml_str = format!("[redaction]\nstyle = \"{input}\"");
let cfg = SecretsConfig::from_toml(&toml_str).unwrap();
assert_eq!(cfg.redaction.style, expected, "failed for: {input}");
}
}
#[test]
fn all_enforcement_modes_parse() {
for (input, expected) in [
("strict", EnforcementMode::Strict),
("permissive", EnforcementMode::Permissive),
("audit", EnforcementMode::Audit),
] {
let toml_str = format!("[enforcement]\nmode = \"{input}\"");
let cfg = SecretsConfig::from_toml(&toml_str).unwrap();
assert_eq!(cfg.enforcement.mode, expected, "failed for: {input}");
}
}
#[test]
fn all_heuristic_modes_parse() {
for (input, expected) in [
("enforce", HeuristicMode::Enforce),
("report", HeuristicMode::Report),
("disabled", HeuristicMode::Disabled),
] {
let toml_str = format!("[heuristic]\nmode = \"{input}\"");
let cfg = SecretsConfig::from_toml(&toml_str).unwrap();
assert_eq!(cfg.heuristic.mode, expected, "failed for: {input}");
}
}
#[test]
fn serialization_roundtrip() {
let cfg = SecretsConfig::from_toml(
r#"
[files]
patterns = [".env"]
[redaction]
style = "typed"
[enforcement]
mode = "audit"
on_parse_error = "allow"
"#,
)
.unwrap();
let serialized = toml::to_string(&cfg).unwrap();
let deserialized: SecretsConfig = toml::from_str(&serialized).unwrap();
assert_eq!(deserialized.redaction.style, RedactionStyle::Typed);
assert_eq!(deserialized.enforcement.mode, EnforcementMode::Audit);
assert_eq!(
deserialized.enforcement.on_parse_error,
ParseErrorAction::Allow
);
}
+307
@@ -0,0 +1,307 @@
//! Integration tests for `secrets::manifest` — the manifest loader that
//! discovers secret files, parses them, and builds the known-secrets set.
use std::fs;
use dirigent_fermata::core::secrets::config::SecretsConfig;
use dirigent_fermata::core::secrets::manifest::Manifest;
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
/// Return the default config: the built-in file patterns (`.env`, `*.pem`,
/// and others) and the built-in secret key patterns.
fn default_config() -> SecretsConfig {
SecretsConfig::default()
}
/// Create a config from TOML.
fn config_from_toml(toml: &str) -> SecretsConfig {
SecretsConfig::from_toml(toml).expect("valid TOML config")
}
// ---------------------------------------------------------------------------
// Tests
// ---------------------------------------------------------------------------
#[test]
fn discovers_env_file_and_extracts_matching_secrets() {
let dir = tempfile::tempdir().unwrap();
fs::write(
dir.path().join(".env"),
"DATABASE_URL=postgres://localhost/db\nAPP_NAME=myapp\nSECRET_KEY=super-secret-value-1234\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
// DATABASE_URL and SECRET_KEY match the default key patterns; APP_NAME does not.
assert!(!manifest.is_empty());
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
assert!(keys.contains(&"DATABASE_URL"), "expected DATABASE_URL, got {keys:?}");
assert!(keys.contains(&"SECRET_KEY"), "expected SECRET_KEY, got {keys:?}");
assert!(!keys.contains(&"APP_NAME"), "APP_NAME should be filtered out");
}
#[test]
fn discovers_nested_env_local_file() {
let dir = tempfile::tempdir().unwrap();
let nested = dir.path().join("services").join("auth");
fs::create_dir_all(&nested).unwrap();
fs::write(
nested.join(".env.local"),
"AUTH_TOKEN=tok_abcdefgh12345678\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
assert!(!manifest.is_empty());
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
assert!(keys.contains(&"AUTH_TOKEN"), "expected AUTH_TOKEN, got {keys:?}");
}
#[test]
fn filters_entries_by_key_patterns() {
let dir = tempfile::tempdir().unwrap();
fs::write(
dir.path().join(".env"),
"MY_PASSWORD=hunter2hunter2\nNOT_SENSITIVE=hello-world-1234\nAPI_KEY=abcdef1234567890\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
assert!(keys.contains(&"MY_PASSWORD"));
assert!(keys.contains(&"API_KEY"));
assert!(!keys.contains(&"NOT_SENSITIVE"));
}
#[test]
fn file_override_with_explicit_format_and_key_filter() {
let dir = tempfile::tempdir().unwrap();
// Write a file that wouldn't normally be discovered by default patterns.
fs::write(
dir.path().join("custom_secrets.conf"),
"SERVICE_TOKEN=long-token-value-here\nDEBUG=true-ish-thing\n",
)
.unwrap();
let config = config_from_toml(
r#"
[files]
patterns = []
[[file]]
path = "custom_secrets.conf"
format = "env"
keys = ["SERVICE_TOKEN"]
"#,
);
let manifest = Manifest::build(&config, dir.path()).unwrap();
assert_eq!(manifest.len(), 1);
assert_eq!(manifest.entries()[0].key, "SERVICE_TOKEN");
assert_eq!(manifest.entries()[0].value, "long-token-value-here");
}
#[test]
fn empty_project_yields_empty_manifest() {
let dir = tempfile::tempdir().unwrap();
// No files at all.
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
assert!(manifest.is_empty());
assert_eq!(manifest.len(), 0);
}
#[test]
fn entries_sorted_by_value_length_descending() {
let dir = tempfile::tempdir().unwrap();
fs::write(
dir.path().join(".env"),
// Deliberately out of order by length.
"TOKEN_A=short1234\nTOKEN_B=a-much-longer-secret-value-here\nTOKEN_C=medium-value1\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
let lengths: Vec<usize> = manifest.entries().iter().map(|e| e.value.len()).collect();
for window in lengths.windows(2) {
assert!(
window[0] >= window[1],
"entries not sorted by value length descending: {lengths:?}"
);
}
}
#[test]
fn short_values_filtered_out() {
let dir = tempfile::tempdir().unwrap();
fs::write(
dir.path().join(".env"),
"PASSWORD_TINY=yes\nPASSWORD_OK=long-enough-password\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
// "yes" is 3 chars, below the 4-char minimum.
assert!(!keys.contains(&"PASSWORD_TINY"), "short value should be filtered");
assert!(keys.contains(&"PASSWORD_OK"));
}
#[test]
fn deduplication_of_same_key_value() {
let dir = tempfile::tempdir().unwrap();
// Same secret appears in two different .env files.
fs::write(
dir.path().join(".env"),
"SECRET_KEY=shared-secret-value-12345\n",
)
.unwrap();
let sub = dir.path().join("sub");
fs::create_dir(&sub).unwrap();
fs::write(sub.join(".env"), "SECRET_KEY=shared-secret-value-12345\n").unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
// Should be deduplicated to a single entry.
let matching: Vec<_> = manifest
.entries()
.iter()
.filter(|e| e.key == "SECRET_KEY")
.collect();
assert_eq!(
matching.len(),
1,
"duplicate entries should be collapsed: found {}",
matching.len()
);
}
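// Sketch, not part of the crate: the three normalisation steps the tests
// above rely on (drop short values, collapse duplicate key/value pairs, sort
// longest value first). `Entry`, `normalize` and `MIN_VALUE_LEN` are
// hypothetical names; measuring length in bytes is an assumption.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: String,
    value: String,
}

/// Values shorter than this are assumed too generic to redact safely.
const MIN_VALUE_LEN: usize = 4;

fn normalize(entries: Vec<Entry>) -> Vec<Entry> {
    let mut seen = HashSet::new();
    let mut out: Vec<Entry> = entries
        .into_iter()
        // Drop values below the minimum length.
        .filter(|e| e.value.len() >= MIN_VALUE_LEN)
        // Keep only the first occurrence of each (key, value) pair.
        .filter(|e| seen.insert((e.key.clone(), e.value.clone())))
        .collect();
    // Stable sort, longest value first, so a longer secret is considered
    // before any shorter secret that happens to be its substring.
    out.sort_by(|a, b| b.value.len().cmp(&a.value.len()));
    out
}
```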
#[test]
fn unparseable_file_with_allow_is_skipped() {
let dir = tempfile::tempdir().unwrap();
// The .env parser is lenient, so use a .toml file with invalid TOML
// content to trigger a parse error.
let secrets_dir = dir.path();
fs::write(secrets_dir.join("secrets.toml"), "this is not valid toml {{{\n").unwrap();
// Also write a valid .env so we can confirm it still works.
fs::write(
secrets_dir.join(".env"),
"API_KEY=valid-secret-12345678\n",
)
.unwrap();
let config = config_from_toml(
r#"
[enforcement]
on_parse_error = "allow"
"#,
);
let manifest = Manifest::build(&config, secrets_dir).unwrap();
// The broken secrets.toml is skipped; .env is still processed.
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
assert!(keys.contains(&"API_KEY"));
}
#[test]
fn unparseable_file_with_deny_returns_error() {
let dir = tempfile::tempdir().unwrap();
fs::write(dir.path().join("secrets.toml"), "not valid toml {{{\n").unwrap();
let config = config_from_toml(
r#"
[enforcement]
on_parse_error = "deny"
"#,
);
let result = Manifest::build(&config, dir.path());
assert!(result.is_err(), "deny mode should propagate parse errors");
}
#[test]
fn manifest_empty_and_is_empty() {
let m = Manifest::empty();
assert!(m.is_empty());
assert_eq!(m.len(), 0);
assert!(m.entries().is_empty());
}
#[test]
fn skips_git_and_node_modules_directories() {
let dir = tempfile::tempdir().unwrap();
// .env inside .git should be skipped.
let git_dir = dir.path().join(".git");
fs::create_dir(&git_dir).unwrap();
fs::write(git_dir.join(".env"), "SECRET_KEY=git-secret-12345\n").unwrap();
// .env inside node_modules should be skipped.
let nm_dir = dir.path().join("node_modules").join("pkg");
fs::create_dir_all(&nm_dir).unwrap();
fs::write(nm_dir.join(".env"), "TOKEN=nm-token-12345678\n").unwrap();
// .env at root should be found.
fs::write(
dir.path().join(".env"),
"API_KEY=root-api-key-12345\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
let values: Vec<&str> = manifest.entries().iter().map(|e| e.value.as_str()).collect();
assert!(
values.contains(&"root-api-key-12345"),
"root .env should be found"
);
assert!(
!values.contains(&"git-secret-12345"),
".git/.env should be skipped"
);
assert!(
!values.contains(&"nm-token-12345678"),
"node_modules/.env should be skipped"
);
}
#[test]
fn opaque_file_formats_are_skipped_gracefully() {
let dir = tempfile::tempdir().unwrap();
// .pem and .key files match default patterns but have no parseable format.
fs::write(dir.path().join("server.key"), "binary-ish key data here\n").unwrap();
fs::write(
dir.path().join(".env"),
"PASSWORD=parseable-secret-12345\n",
)
.unwrap();
let config = default_config();
let manifest = Manifest::build(&config, dir.path()).unwrap();
// Should not error, should still find the .env entry.
let keys: Vec<&str> = manifest.entries().iter().map(|e| e.key.as_str()).collect();
assert!(keys.contains(&"PASSWORD"));
}
+404
@@ -0,0 +1,404 @@
//! Integration tests for the multi-format secret file parser.
use dirigent_fermata::core::secrets::parser::{
parse_content, parse_secret_file, FileFormat, SecretEntry,
};
use std::path::Path;
use tempfile::NamedTempFile;
fn p(s: &str) -> &Path {
Path::new(s)
}
// ---------------------------------------------------------------------------
// .env parsing
// ---------------------------------------------------------------------------
#[test]
fn env_basic_key_value() {
let entries = parse_content("DATABASE_URL=postgres://localhost/db", FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "DATABASE_URL");
assert_eq!(entries[0].value, "postgres://localhost/db");
}
#[test]
fn env_double_quoted() {
let entries = parse_content(r#"SECRET="hello world""#, FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries[0].value, "hello world");
}
#[test]
fn env_single_quoted() {
let entries = parse_content("SECRET='hello world'", FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries[0].value, "hello world");
}
#[test]
fn env_comments_and_empty_lines() {
let content = "# comment\n\nKEY=value\n # indented comment\n";
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "KEY");
}
#[test]
fn env_export_prefix() {
let content = "export API_KEY=abc123\nexport TOKEN=\"xyz\"";
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries.len(), 2);
assert_eq!(entries[0].key, "API_KEY");
assert_eq!(entries[0].value, "abc123");
assert_eq!(entries[1].key, "TOKEN");
assert_eq!(entries[1].value, "xyz");
}
#[test]
fn env_whitespace_handling() {
let content = " KEY = value \nKEY2= spaced ";
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
// Key is trimmed; unquoted value trimmed.
assert_eq!(entries[0].key, "KEY");
assert_eq!(entries[0].value, "value");
assert_eq!(entries[1].key, "KEY2");
assert_eq!(entries[1].value, "spaced");
}
#[test]
fn env_escape_sequences_in_double_quotes() {
let content = r#"MSG="line1\nline2""#;
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries[0].value, "line1\nline2");
}
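// Sketch, not part of the crate: a minimal line-based .env parser that
// reproduces the behaviours tested above (comments, `export ` prefix,
// trimming, single and double quotes, escapes inside double quotes).
// `parse_env`, `unquote` and `unescape` are hypothetical names, and the set
// of recognised escapes is an assumption.

```rust
fn parse_env(content: &str) -> Vec<(String, String)> {
    let mut out = Vec::new();
    for raw in content.lines() {
        let line = raw.trim();
        // Skip blank lines and comments (including indented comments).
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Tolerate shell-style `export KEY=value`.
        let line = line.strip_prefix("export ").unwrap_or(line);
        let Some((key, value)) = line.split_once('=') else {
            continue; // lenient: ignore lines without `=`
        };
        let key = key.trim();
        if key.is_empty() {
            continue;
        }
        out.push((key.to_string(), unquote(value.trim())));
    }
    out
}

/// Strip one layer of matching quotes; escapes are only processed in double quotes.
fn unquote(v: &str) -> String {
    if v.len() >= 2 && v.starts_with('"') && v.ends_with('"') {
        unescape(&v[1..v.len() - 1])
    } else if v.len() >= 2 && v.starts_with('\'') && v.ends_with('\'') {
        v[1..v.len() - 1].to_string()
    } else {
        v.to_string()
    }
}

fn unescape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            // Unknown escape: keep it verbatim.
            Some(other) => {
                out.push('\\');
                out.push(other);
            }
            None => out.push('\\'),
        }
    }
    out
}
```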
// ---------------------------------------------------------------------------
// TOML parsing
// ---------------------------------------------------------------------------
#[test]
fn toml_flat_table() {
let content = r#"
API_KEY = "abc"
DB_PASS = "secret"
"#;
let entries = parse_content(content, FileFormat::Toml, p("Secrets.toml")).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "API_KEY" && e.value == "abc"));
assert!(entries.iter().any(|e| e.key == "DB_PASS" && e.value == "secret"));
}
#[test]
fn toml_nested_tables() {
let content = r#"
[database]
password = "secret"
host = "localhost"
port = 5432
"#;
let entries = parse_content(content, FileFormat::Toml, p("config.toml")).unwrap();
// Only string values extracted; port (integer) skipped.
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "database.password" && e.value == "secret"));
assert!(entries.iter().any(|e| e.key == "database.host" && e.value == "localhost"));
}
#[test]
fn toml_mixed_types_only_strings() {
let content = r#"
name = "app"
debug = true
count = 42
ratio = 3.14
"#;
let entries = parse_content(content, FileFormat::Toml, p("app.toml")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "name");
}
// ---------------------------------------------------------------------------
// JSON parsing
// ---------------------------------------------------------------------------
#[test]
fn json_flat_object() {
let content = r#"{"api_key": "abc", "secret": "xyz"}"#;
let entries = parse_content(content, FileFormat::Json, p("secrets.json")).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "api_key" && e.value == "abc"));
}
#[test]
fn json_nested_objects() {
let content = r#"{"db": {"password": "foo", "port": 5432}}"#;
let entries = parse_content(content, FileFormat::Json, p("secrets.json")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "db.password");
assert_eq!(entries[0].value, "foo");
}
#[test]
fn json_arrays() {
let content = r#"{"keys": ["a", "b"]}"#;
let entries = parse_content(content, FileFormat::Json, p("secrets.json")).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "keys.0" && e.value == "a"));
assert!(entries.iter().any(|e| e.key == "keys.1" && e.value == "b"));
}
#[test]
fn json_mixed_types() {
let content = r#"{"name": "app", "count": 42, "active": true, "data": null}"#;
let entries = parse_content(content, FileFormat::Json, p("a.json")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "name");
}
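// Sketch, not part of the crate: how a nested document can be flattened into
// dotted keys (`db.password`, `keys.0`) while skipping non-string leaves, as
// the TOML and JSON tests above expect. The `Value` enum is a hypothetical
// stand-in for a parsed tree such as `serde_json::Value`; `flatten` is a
// hypothetical name.

```rust
enum Value {
    Null,
    Bool(bool),
    Number(f64),
    Str(String),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Collect every string leaf under `value` as (dotted key, value).
fn flatten(prefix: &str, value: &Value, out: &mut Vec<(String, String)>) {
    // Join a path segment onto the current prefix.
    let join = |seg: &str| {
        if prefix.is_empty() {
            seg.to_string()
        } else {
            format!("{prefix}.{seg}")
        }
    };
    match value {
        Value::Str(s) => out.push((prefix.to_string(), s.clone())),
        Value::Array(items) => {
            // Array elements are addressed by index: `keys.0`, `keys.1`, ...
            for (i, item) in items.iter().enumerate() {
                flatten(&join(&i.to_string()), item, out);
            }
        }
        Value::Object(fields) => {
            for (k, v) in fields {
                flatten(&join(k), v, out);
            }
        }
        // Numbers, booleans and nulls are not treated as secrets here.
        Value::Null | Value::Bool(_) | Value::Number(_) => {}
    }
}
```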
// ---------------------------------------------------------------------------
// YAML parsing
// ---------------------------------------------------------------------------
#[test]
fn yaml_flat_map() {
let content = "api_key: abc\nsecret: xyz\n";
let entries = parse_content(content, FileFormat::Yaml, p("secrets.yaml")).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "api_key" && e.value == "abc"));
}
#[test]
fn yaml_nested_maps() {
let content = "db:\n password: foo\n port: 5432\n";
let entries = parse_content(content, FileFormat::Yaml, p("secrets.yml")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "db.password");
assert_eq!(entries[0].value, "foo");
}
#[test]
fn yaml_mixed_types() {
let content = "name: app\ncount: 42\nactive: true\n";
let entries = parse_content(content, FileFormat::Yaml, p("a.yaml")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "name");
}
// ---------------------------------------------------------------------------
// Python assignment parsing
// ---------------------------------------------------------------------------
#[test]
fn python_matches_assignments() {
let content = r#"
API_KEY = "abc123"
DB_PASS = 'secret'
import os
x = 42
"#;
let entries = parse_content(content, FileFormat::PythonAssignments, p("settings.py")).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "API_KEY" && e.value == "abc123"));
assert!(entries.iter().any(|e| e.key == "DB_PASS" && e.value == "secret"));
}
#[test]
fn python_skips_non_matching() {
let content = "result = some_function()\nfor x in range(10):\n pass\n";
let entries = parse_content(content, FileFormat::PythonAssignments, p("a.py")).unwrap();
assert!(entries.is_empty());
}
// ---------------------------------------------------------------------------
// Properties parsing
// ---------------------------------------------------------------------------
#[test]
fn properties_equals_separator() {
let content = "db.password=secret\ndb.host=localhost";
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "db.password" && e.value == "secret"));
}
#[test]
fn properties_colon_separator() {
let content = "db.password: secret";
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
assert_eq!(entries[0].key, "db.password");
assert_eq!(entries[0].value, "secret");
}
#[test]
fn properties_comments() {
let content = "# comment\n! also comment\nkey=value";
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "key");
}
#[test]
fn properties_continuation_lines() {
let content = "long.value=hello \\\n world";
let entries = parse_content(content, FileFormat::Properties, p("app.properties")).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "long.value");
assert_eq!(entries[0].value, "hello world");
}
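// Sketch, not part of the crate: a simplified Java-properties parser covering
// the cases tested above (`=` or `:` separators, `#` and `!` comments,
// backslash continuation lines). `join_continuations` and `parse_properties`
// are hypothetical names. Unicode escapes and escaped separators in keys are
// not handled.

```rust
/// Join physical lines ending in an odd number of backslashes with the next line.
fn join_continuations(content: &str) -> Vec<String> {
    let mut logical = Vec::new();
    let mut current = String::new();
    let mut continuing = false;
    for raw in content.lines() {
        // Leading whitespace on a continuation line is dropped.
        let piece = if continuing { raw.trim_start() } else { raw };
        let trailing = piece.chars().rev().take_while(|&c| c == '\\').count();
        if trailing % 2 == 1 {
            // Remove the continuation backslash and keep reading.
            current.push_str(&piece[..piece.len() - 1]);
            continuing = true;
        } else {
            current.push_str(piece);
            logical.push(std::mem::take(&mut current));
            continuing = false;
        }
    }
    if continuing {
        logical.push(current);
    }
    logical
}

fn parse_properties(content: &str) -> Vec<(String, String)> {
    join_continuations(content)
        .iter()
        .filter_map(|line| {
            let line = line.trim();
            if line.is_empty() || line.starts_with('#') || line.starts_with('!') {
                return None;
            }
            // The first `=` or `:` separates key from value.
            let idx = line.find(|c: char| c == '=' || c == ':')?;
            let key = line[..idx].trim();
            let value = line[idx + 1..].trim();
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```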
// ---------------------------------------------------------------------------
// Auto-detection from file extension
// ---------------------------------------------------------------------------
#[test]
fn format_from_path_env_variants() {
assert_eq!(FileFormat::from_path(p(".env")), Some(FileFormat::Env));
assert_eq!(FileFormat::from_path(p(".env.local")), Some(FileFormat::Env));
assert_eq!(FileFormat::from_path(p(".env.production")), Some(FileFormat::Env));
assert_eq!(FileFormat::from_path(p("staging.env")), Some(FileFormat::Env));
}
#[test]
fn format_from_path_extensions() {
assert_eq!(FileFormat::from_path(p("a.toml")), Some(FileFormat::Toml));
assert_eq!(FileFormat::from_path(p("a.json")), Some(FileFormat::Json));
assert_eq!(FileFormat::from_path(p("a.yaml")), Some(FileFormat::Yaml));
assert_eq!(FileFormat::from_path(p("a.yml")), Some(FileFormat::Yaml));
assert_eq!(FileFormat::from_path(p("a.py")), Some(FileFormat::PythonAssignments));
assert_eq!(FileFormat::from_path(p("a.properties")), Some(FileFormat::Properties));
}
#[test]
fn format_from_path_unknown() {
assert_eq!(FileFormat::from_path(p("a.key")), None);
assert_eq!(FileFormat::from_path(p("a.pem")), None);
}
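// Sketch, not part of the crate: format detection consistent with the
// assertions above. `Format` and `format_from_path` are hypothetical
// stand-ins for `FileFormat` and `FileFormat::from_path`; the real matching
// rules may differ.

```rust
use std::path::Path;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Format {
    Env,
    Toml,
    Json,
    Yaml,
    PythonAssignments,
    Properties,
}

fn format_from_path(path: &Path) -> Option<Format> {
    let name = path.file_name()?.to_str()?;
    // `.env` has no extension as far as `Path` is concerned, and
    // `.env.local` has extension `local`, so check the file name first.
    if name == ".env" || name.starts_with(".env.") {
        return Some(Format::Env);
    }
    match path.extension()?.to_str()? {
        "env" => Some(Format::Env),
        "toml" => Some(Format::Toml),
        "json" => Some(Format::Json),
        "yaml" | "yml" => Some(Format::Yaml),
        "py" => Some(Format::PythonAssignments),
        "properties" => Some(Format::Properties),
        // Opaque files such as `*.pem` or `*.key` have no parseable format.
        _ => None,
    }
}
```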
// ---------------------------------------------------------------------------
// Format hints
// ---------------------------------------------------------------------------
#[test]
fn format_from_hint() {
assert_eq!(FileFormat::from_hint("env"), Some(FileFormat::Env));
assert_eq!(FileFormat::from_hint("dotenv"), Some(FileFormat::Env));
assert_eq!(FileFormat::from_hint("toml"), Some(FileFormat::Toml));
assert_eq!(FileFormat::from_hint("json"), Some(FileFormat::Json));
assert_eq!(FileFormat::from_hint("yaml"), Some(FileFormat::Yaml));
assert_eq!(FileFormat::from_hint("yml"), Some(FileFormat::Yaml));
assert_eq!(FileFormat::from_hint("python-assignments"), Some(FileFormat::PythonAssignments));
assert_eq!(FileFormat::from_hint("python"), Some(FileFormat::PythonAssignments));
assert_eq!(FileFormat::from_hint("properties"), Some(FileFormat::Properties));
assert_eq!(FileFormat::from_hint("java-properties"), Some(FileFormat::Properties));
assert_eq!(FileFormat::from_hint("unknown"), None);
}
// ---------------------------------------------------------------------------
// Key filtering
// ---------------------------------------------------------------------------
#[test]
fn filter_by_glob() {
let content = "API_KEY=abc\nDB_HOST=localhost\nDB_PASSWORD=secret\n";
let entries = parse_content(content, FileFormat::Env, p(".env")).unwrap();
assert_eq!(entries.len(), 3);
let filter = vec!["*PASSWORD*".to_string(), "*API_KEY*".to_string()];
let result = parse_secret_file_with_filter(content, &filter);
assert_eq!(result.len(), 2);
assert!(result.iter().any(|e| e.key == "API_KEY"));
assert!(result.iter().any(|e| e.key == "DB_PASSWORD"));
}
/// Helper that parses env content and applies a key filter in memory
/// (avoids temp files).
fn parse_secret_file_with_filter(content: &str, filter: &[String]) -> Vec<SecretEntry> {
let all = parse_content(content, FileFormat::Env, p(".env")).unwrap();
// Re-implement the filter logic for testing without disk I/O, using the
// same approach as parse_secret_file: upper-case both pattern and key,
// then match with compiled globs.
let matchers: Vec<_> = filter
.iter()
.filter_map(|pattern| {
globset::Glob::new(&pattern.to_ascii_uppercase())
.ok()
.map(|g| g.compile_matcher())
})
.collect();
all.into_iter()
.filter(|entry| {
let upper = entry.key.to_ascii_uppercase();
matchers.iter().any(|m| m.is_match(&upper))
})
.collect()
}
// ---------------------------------------------------------------------------
// Error on unrecognised format
// ---------------------------------------------------------------------------
#[test]
fn error_on_unknown_format() {
use std::io::Write;
let mut tmp = NamedTempFile::with_suffix(".xyz").unwrap();
write!(tmp, "KEY=value").unwrap();
let result = parse_secret_file(tmp.path(), None, None);
assert!(result.is_err());
let err = result.unwrap_err().to_string();
assert!(err.contains("cannot determine file format"));
}
// ---------------------------------------------------------------------------
// Empty file
// ---------------------------------------------------------------------------
#[test]
fn empty_file_produces_empty_vec() {
let entries = parse_content("", FileFormat::Env, p(".env")).unwrap();
assert!(entries.is_empty());
let entries = parse_content("{}", FileFormat::Json, p("a.json")).unwrap();
assert!(entries.is_empty());
let entries = parse_content("", FileFormat::Toml, p("a.toml")).unwrap();
assert!(entries.is_empty());
}
// ---------------------------------------------------------------------------
// parse_secret_file end-to-end (disk)
// ---------------------------------------------------------------------------
#[test]
fn parse_secret_file_from_disk() {
use std::io::Write;
let mut tmp = NamedTempFile::with_suffix(".env").unwrap();
write!(tmp, "SECRET=hunter2\nPORT=8080").unwrap();
let entries = parse_secret_file(tmp.path(), None, None).unwrap();
assert_eq!(entries.len(), 2);
assert!(entries.iter().any(|e| e.key == "SECRET" && e.value == "hunter2"));
// Source path should match.
assert_eq!(entries[0].source, tmp.path());
}
#[test]
fn parse_secret_file_with_key_filter() {
use std::io::Write;
let mut tmp = NamedTempFile::with_suffix(".env").unwrap();
write!(tmp, "API_KEY=abc\nHOST=localhost\nDB_PASSWORD=secret").unwrap();
let filter = vec!["*PASSWORD*".to_string()];
let entries = parse_secret_file(tmp.path(), None, Some(&filter)).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "DB_PASSWORD");
}
#[test]
fn parse_secret_file_with_format_override() {
use std::io::Write;
// Write env content to a .txt file — format override should work.
let mut tmp = NamedTempFile::with_suffix(".txt").unwrap();
write!(tmp, "KEY=value").unwrap();
let entries = parse_secret_file(tmp.path(), Some(FileFormat::Env), None).unwrap();
assert_eq!(entries.len(), 1);
assert_eq!(entries[0].key, "KEY");
}
+373
@@ -0,0 +1,373 @@
//! Integration tests for the secret value redactor.
use std::path::PathBuf;
use dirigent_fermata::core::secrets::config::RedactionStyle;
use dirigent_fermata::core::secrets::manifest::Manifest;
use dirigent_fermata::core::secrets::parser::SecretEntry;
use dirigent_fermata::core::secrets::redactor::Redactor;
// ---------------------------------------------------------------------------
// Helpers
// ---------------------------------------------------------------------------
fn entry(key: &str, value: &str) -> SecretEntry {
SecretEntry {
key: key.to_string(),
value: value.to_string(),
source: PathBuf::from("test"),
}
}
fn make_redactor(entries: Vec<SecretEntry>, style: RedactionStyle) -> Redactor {
let manifest = Manifest::from_entries(entries);
Redactor::new(&manifest, style)
}
// ---------------------------------------------------------------------------
// Basic redaction
// ---------------------------------------------------------------------------
#[test]
fn basic_single_secret() {
let r = make_redactor(
vec![entry("DB_PASSWORD", "super_secret_123")],
RedactionStyle::Masked,
);
let result = r.redact("connecting with password super_secret_123 ...");
assert_eq!(result.text, "connecting with password ***** ...");
assert!(result.was_redacted());
assert_eq!(result.redactions.len(), 1);
assert_eq!(result.redactions[0].key, "DB_PASSWORD");
}
// ---------------------------------------------------------------------------
// Multiple secrets
// ---------------------------------------------------------------------------
#[test]
fn multiple_different_secrets() {
let r = make_redactor(
vec![
entry("DB_PASSWORD", "db_pass_value"),
entry("API_KEY", "ak_12345678"),
],
RedactionStyle::Masked,
);
let result = r.redact("db=db_pass_value key=ak_12345678");
assert_eq!(result.text, "db=***** key=*****");
assert_eq!(result.redactions.len(), 2);
assert_eq!(result.redactions[0].key, "DB_PASSWORD");
assert_eq!(result.redactions[1].key, "API_KEY");
}
// ---------------------------------------------------------------------------
// Repeated occurrences
// ---------------------------------------------------------------------------
#[test]
fn same_secret_multiple_times() {
let r = make_redactor(
vec![entry("TOKEN", "tok_abcdef")],
RedactionStyle::Named,
);
let result = r.redact("first=tok_abcdef second=tok_abcdef");
assert_eq!(result.text, "first=<REDACTED:TOKEN> second=<REDACTED:TOKEN>");
assert_eq!(result.redactions.len(), 2);
}
// ---------------------------------------------------------------------------
// Redaction styles
// ---------------------------------------------------------------------------
#[test]
fn style_masked() {
let r = make_redactor(
vec![entry("KEY", "secret_value")],
RedactionStyle::Masked,
);
let result = r.redact("val=secret_value");
assert_eq!(result.text, "val=*****");
}
#[test]
fn style_typed() {
let r = make_redactor(
vec![entry("KEY", "secret_value")],
RedactionStyle::Typed,
);
let result = r.redact("val=secret_value");
// "secret_value" is 12 chars
assert_eq!(result.text, "val=<REDACTED:string:12>");
}
#[test]
fn style_named() {
let r = make_redactor(
vec![entry("MY_API_KEY", "secret_value")],
RedactionStyle::Named,
);
let result = r.redact("val=secret_value");
assert_eq!(result.text, "val=<REDACTED:MY_API_KEY>");
}
#[test]
fn style_absent() {
let r = make_redactor(
vec![entry("KEY", "secret_value")],
RedactionStyle::Absent,
);
let result = r.redact("val=secret_value end");
assert_eq!(result.text, "val= end");
assert!(result.was_redacted());
}
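// Sketch, not part of the crate: the replacement text each of the four
// styles produces, inferred from the expected strings in the tests above.
// `Style` and `replacement` are hypothetical names. The fixed type tag
// "string" and byte-based length for the typed style are assumptions.

```rust
enum Style {
    Masked,
    Typed,
    Named,
    Absent,
}

/// Text that replaces one occurrence of `value`, which came from `key`.
fn replacement(style: &Style, key: &str, value: &str) -> String {
    match style {
        // Fixed-width mask: reveals neither length nor origin.
        Style::Masked => "*****".to_string(),
        // Reveals the value's length but not its key.
        Style::Typed => format!("<REDACTED:string:{}>", value.len()),
        // Reveals which key the value came from.
        Style::Named => format!("<REDACTED:{key}>"),
        // Removes the value without leaving a marker.
        Style::Absent => String::new(),
    }
}
```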
// ---------------------------------------------------------------------------
// Overlapping values (longest match wins)
// ---------------------------------------------------------------------------
#[test]
fn overlapping_longest_match_wins() {
let r = make_redactor(
vec![
entry("SHORT_KEY", "secret"),
entry("LONG_KEY", "secret_long_value"),
],
RedactionStyle::Named,
);
let result = r.redact("x=secret_long_value");
// The longer value should match, not the shorter substring.
assert_eq!(result.text, "x=<REDACTED:LONG_KEY>");
assert_eq!(result.redactions.len(), 1);
assert_eq!(result.redactions[0].key, "LONG_KEY");
}
#[test]
fn shorter_match_still_found_when_no_overlap() {
let r = make_redactor(
vec![
entry("SHORT_KEY", "secret"),
entry("LONG_KEY", "secret_long_value"),
],
RedactionStyle::Named,
);
// "secret" appears standalone (not as part of "secret_long_value")
let result = r.redact("a=secret b=secret_long_value");
assert_eq!(result.text, "a=<REDACTED:SHORT_KEY> b=<REDACTED:LONG_KEY>");
assert_eq!(result.redactions.len(), 2);
}
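// Illustrative sketch (not the crate's implementation): the longest-match
// behaviour exercised by the two tests above, written as a naive std-only scan
// rather than Aho-Corasick. `naive_redact_named` and its (key, value) input
// are hypothetical and exist only for this example.
fn naive_redact_named(text: &str, secrets: &[(&str, &str)]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        // Among all secrets that start at this position, prefer the longest.
        let best = secrets
            .iter()
            .filter(|s| !s.1.is_empty() && rest.starts_with(s.1))
            .max_by_key(|s| s.1.len());
        match best {
            Some(&(key, value)) => {
                out.push_str("<REDACTED:");
                out.push_str(key);
                out.push('>');
                pos += value.len();
            }
            None => {
                // No secret starts here: copy one whole character and advance.
                let ch = rest.chars().next().expect("rest is non-empty");
                out.push(ch);
                pos += ch.len_utf8();
            }
        }
    }
    out
}
#[test]
fn naive_sketch_prefers_longest_value() {
    let secrets = [("SHORT_KEY", "secret"), ("LONG_KEY", "secret_long_value")];
    assert_eq!(
        naive_redact_named("x=secret_long_value", &secrets),
        "x=<REDACTED:LONG_KEY>"
    );
}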
// ---------------------------------------------------------------------------
// No match
// ---------------------------------------------------------------------------
#[test]
fn no_match_returns_unchanged() {
let r = make_redactor(
vec![entry("KEY", "not_present_here")],
RedactionStyle::Masked,
);
let result = r.redact("nothing to see here");
assert_eq!(result.text, "nothing to see here");
assert!(!result.was_redacted());
assert!(result.redactions.is_empty());
}
// ---------------------------------------------------------------------------
// Empty text
// ---------------------------------------------------------------------------
#[test]
fn empty_input_returns_empty() {
let r = make_redactor(
vec![entry("KEY", "some_secret")],
RedactionStyle::Masked,
);
let result = r.redact("");
assert_eq!(result.text, "");
assert!(!result.was_redacted());
}
// ---------------------------------------------------------------------------
// Empty manifest
// ---------------------------------------------------------------------------
#[test]
fn empty_manifest_returns_unchanged() {
let manifest = Manifest::empty();
let r = Redactor::new(&manifest, RedactionStyle::Masked);
assert!(!r.has_secrets());
let result = r.redact("some text with no secrets");
assert_eq!(result.text, "some text with no secrets");
assert!(!result.was_redacted());
}
// ---------------------------------------------------------------------------
// Short values filtered out by Manifest::from_entries
// ---------------------------------------------------------------------------
#[test]
fn short_values_are_filtered() {
// Values shorter than 4 chars should be dropped by from_entries.
let r = make_redactor(
vec![entry("TINY", "abc"), entry("LONG_ENOUGH", "abcd")],
RedactionStyle::Masked,
);
let result = r.redact("abc abcd");
// "abc" should NOT be redacted (too short), "abcd" should be.
assert_eq!(result.text, "abc *****");
assert_eq!(result.redactions.len(), 1);
assert_eq!(result.redactions[0].key, "LONG_ENOUGH");
}
// ---------------------------------------------------------------------------
// Zero false negatives — every declared secret must be caught
// ---------------------------------------------------------------------------
#[test]
fn zero_false_negatives() {
let secrets = vec![
entry("A_SECRET", "alpha_secret_val"),
entry("B_TOKEN", "bravo_token_val_"),
entry("C_PASSWORD", "charlie_pass_99"),
entry("D_API_KEY", "delta_key_00000"),
];
let r = make_redactor(secrets.clone(), RedactionStyle::Masked);
// Build text that contains every single secret value.
let text = format!(
"a={} b={} c={} d={}",
"alpha_secret_val", "bravo_token_val_", "charlie_pass_99", "delta_key_00000",
);
let result = r.redact(&text);
// Every secret value must be replaced.
for s in &secrets {
if s.value.len() >= 4 {
assert!(
!result.text.contains(&s.value),
"Secret {} was not redacted: {}",
s.key,
result.text,
);
}
}
assert_eq!(result.redactions.len(), 4);
}
// ---------------------------------------------------------------------------
// Multi-line text
// ---------------------------------------------------------------------------
#[test]
fn multi_line_redaction() {
let r = make_redactor(
vec![
entry("DB_PASSWORD", "s3cr3t_p@ss"),
entry("API_KEY", "ak-1234567890"),
],
RedactionStyle::Masked,
);
let text = "# Config file\n\
DATABASE_URL=postgres://user:s3cr3t_p@ss@host/db\n\
API_KEY=ak-1234567890\n\
OTHER=safe_value\n";
let result = r.redact(text);
assert!(!result.text.contains("s3cr3t_p@ss"));
assert!(!result.text.contains("ak-1234567890"));
assert!(result.text.contains("safe_value"));
assert_eq!(result.redactions.len(), 2);
}
// ---------------------------------------------------------------------------
// Redaction metadata correctness
// ---------------------------------------------------------------------------
#[test]
fn redaction_metadata_offset_and_len() {
let r = make_redactor(
vec![entry("SECRET", "ABCDEFGH")],
RedactionStyle::Masked,
);
let text = "prefix_ABCDEFGH_suffix";
let result = r.redact(text);
assert_eq!(result.redactions.len(), 1);
let red = &result.redactions[0];
assert_eq!(red.key, "SECRET");
assert_eq!(red.offset, 7); // "prefix_" is 7 bytes
assert_eq!(red.original_len, 8); // "ABCDEFGH" is 8 bytes
}
#[test]
fn redaction_metadata_multiple_offsets() {
let r = make_redactor(
vec![entry("TOK", "xxxx1234")],
RedactionStyle::Masked,
);
// The same secret appears twice; each occurrence gets its own redaction record.
let text = "a=xxxx1234 b=xxxx1234";
let result = r.redact(text);
assert_eq!(result.redactions.len(), 2);
assert_eq!(result.redactions[0].offset, 2); // after "a="
assert_eq!(result.redactions[0].original_len, 8);
assert_eq!(result.redactions[1].offset, 13); // after " b="
assert_eq!(result.redactions[1].original_len, 8);
}
// ---------------------------------------------------------------------------
// has_secrets() helper
// ---------------------------------------------------------------------------
#[test]
fn has_secrets_with_entries() {
let r = make_redactor(
vec![entry("KEY", "long_enough_value")],
RedactionStyle::Masked,
);
assert!(r.has_secrets());
}
#[test]
fn has_secrets_empty() {
let r = make_redactor(vec![], RedactionStyle::Masked);
assert!(!r.has_secrets());
}
// ---------------------------------------------------------------------------
// was_redacted() helper
// ---------------------------------------------------------------------------
#[test]
fn was_redacted_true_when_match() {
let r = make_redactor(
vec![entry("KEY", "findme_value")],
RedactionStyle::Masked,
);
let result = r.redact("findme_value");
assert!(result.was_redacted());
}
#[test]
fn was_redacted_false_when_no_match() {
let r = make_redactor(
vec![entry("KEY", "findme_value")],
RedactionStyle::Masked,
);
let result = r.redact("nothing here");
assert!(!result.was_redacted());
}
// ---------------------------------------------------------------------------
// Deduplication in from_entries
// ---------------------------------------------------------------------------
#[test]
fn duplicate_entries_deduplicated() {
let manifest = Manifest::from_entries(vec![
entry("KEY", "same_value_here"),
entry("KEY", "same_value_here"),
]);
assert_eq!(manifest.len(), 1);
}
+254
@@ -0,0 +1,254 @@
use dirigent_fermata::core::secrets::config::HeuristicConfig;
use dirigent_fermata::core::secrets::scanner::{shannon_entropy, Confidence, Scanner};
// ---------------------------------------------------------------------------
// Helper: build a scanner with default config (built-in rules only)
// ---------------------------------------------------------------------------
fn default_scanner() -> Scanner {
Scanner::builtin().expect("built-in rules must compile")
}
// ---------------------------------------------------------------------------
// Specific provider patterns
// ---------------------------------------------------------------------------
#[test]
fn detects_aws_access_key() {
let scanner = default_scanner();
let findings = scanner.scan("here is my key: AKIAIOSFODNN7EXAMPLE ok");
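// Check the confidence of the AWS finding itself rather than whichever finding happens to be first.
let aws = findings
.iter()
.find(|f| f.pattern_id == "aws-access-key")
.unwrap_or_else(|| panic!("expected aws-access-key finding, got: {findings:?}"));
assert_eq!(aws.confidence, Confidence::High);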
}
#[test]
fn detects_github_pat_classic() {
let scanner = default_scanner();
let findings = scanner.scan("ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij");
assert!(
findings.iter().any(|f| f.pattern_id == "github-pat-classic"),
"expected github-pat-classic finding, got: {findings:?}"
);
}
#[test]
fn detects_stripe_secret_key() {
let scanner = default_scanner();
let findings = scanner.scan("STRIPE_KEY=sk_live_abcdefghijklmnopqrstuvwx");
assert!(
findings.iter().any(|f| f.pattern_id == "stripe-secret-key"),
"expected stripe-secret-key finding, got: {findings:?}"
);
}
#[test]
fn detects_private_key_header() {
let scanner = default_scanner();
let text = "-----BEGIN RSA PRIVATE KEY-----\nMIIEpAIBAAK...\n-----END RSA PRIVATE KEY-----";
let findings = scanner.scan(text);
assert!(
findings
.iter()
.any(|f| f.pattern_id == "private-key-header"),
"expected private-key-header finding, got: {findings:?}"
);
}
#[test]
fn detects_jwt_token() {
let scanner = default_scanner();
// A realistic-looking (but fake) JWT.
let jwt = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ik\
pvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c";
let findings = scanner.scan(jwt);
assert!(
findings.iter().any(|f| f.pattern_id == "jwt"),
"expected jwt finding, got: {findings:?}"
);
}
#[test]
fn detects_database_connection_url() {
let scanner = default_scanner();
let findings = scanner.scan("DATABASE_URL=postgres://admin:s3cretP4ss@db.example.com:5432/mydb");
assert!(
findings
.iter()
.any(|f| f.pattern_id == "database-connection-url"),
"expected database-connection-url finding, got: {findings:?}"
);
}
#[test]
fn detects_slack_webhook() {
let scanner = default_scanner();
let findings = scanner
.scan("https://hooks.slack.com/services/T0ABCDEFG/B0ABCDEFG/abcdefghijklmnopqrstuvwx");
assert!(
findings.iter().any(|f| f.pattern_id == "slack-webhook"),
"expected slack-webhook finding, got: {findings:?}"
);
}
#[test]
fn detects_anthropic_api_key() {
let scanner = default_scanner();
let key = "sk-ant-aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789abcdefgh";
let findings = scanner.scan(&format!("my key is {key}"));
assert!(
findings
.iter()
.any(|f| f.pattern_id == "anthropic-api-key"),
"expected anthropic-api-key finding, got: {findings:?}"
);
}
#[test]
fn detects_sendgrid_api_key() {
let scanner = default_scanner();
let key = "SG.abcdefghijklmnopqrstuv.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrst";
let findings = scanner.scan(key);
assert!(
findings.iter().any(|f| f.pattern_id == "sendgrid-api-key"),
"expected sendgrid-api-key finding, got: {findings:?}"
);
}
// ---------------------------------------------------------------------------
// Generic patterns — entropy filtering
// ---------------------------------------------------------------------------
#[test]
fn rejects_low_entropy_generic_api_key() {
let scanner = default_scanner();
// "test" repeated has very low entropy — should NOT trigger.
let findings = scanner.scan(r#"api_key = "testtesttesttesttest""#);
let generic_hits: Vec<_> = findings
.iter()
.filter(|f| f.pattern_id == "generic-api-key")
.collect();
assert!(
generic_hits.is_empty(),
"low-entropy api_key should be filtered out, got: {generic_hits:?}"
);
}
#[test]
fn accepts_high_entropy_generic_secret() {
let scanner = default_scanner();
// A high-entropy random-looking value.
let findings = scanner.scan(r#"secret = "a8Kz3Lm9Xq2Wp7Yn""#);
let has_generic = findings
.iter()
.any(|f| f.pattern_id == "generic-secret");
assert!(
has_generic,
"high-entropy secret should be detected, got: {findings:?}"
);
}
// ---------------------------------------------------------------------------
// Custom patterns from config
// ---------------------------------------------------------------------------
#[test]
fn custom_pattern_from_config() {
let config = HeuristicConfig {
enabled: true,
patterns: vec![r"MY_CUSTOM_[A-Z]{10}".to_string()],
..Default::default()
};
let scanner = Scanner::new(&config).expect("should compile custom pattern");
let findings = scanner.scan("found MY_CUSTOM_ABCDEFGHIJ in output");
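// Check the confidence of the custom finding itself rather than whichever finding happens to be first.
let custom = findings
.iter()
.find(|f| f.pattern_id == "custom-0")
.unwrap_or_else(|| panic!("expected custom-0 finding, got: {findings:?}"));
assert_eq!(custom.confidence, Confidence::High);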
}
// ---------------------------------------------------------------------------
// Edge cases
// ---------------------------------------------------------------------------
#[test]
fn empty_text_returns_no_findings() {
let scanner = default_scanner();
assert!(scanner.scan("").is_empty());
}
#[test]
fn plain_text_returns_no_findings() {
let scanner = default_scanner();
let findings = scanner.scan("This is just a normal paragraph with no secrets.");
assert!(
findings.is_empty(),
"plain text should have no findings, got: {findings:?}"
);
}
#[test]
fn overlapping_matches_are_deduplicated() {
// Construct text where the same span could match multiple patterns.
// The bearer token pattern and a generic pattern could overlap on the same region.
let scanner = default_scanner();
let text = "Authorization: Bearer ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefgh";
let findings = scanner.scan(text);
// Verify each later finding starts at or after the end of every earlier one.
// This check relies on findings being ordered by span start.
for i in 0..findings.len() {
for j in (i + 1)..findings.len() {
assert!(
findings[j].span.start >= findings[i].span.end,
"findings {i} and {j} overlap: {:?} vs {:?}",
findings[i].span,
findings[j].span,
);
}
}
}
// ---------------------------------------------------------------------------
// Shannon entropy unit tests (supplement the inline mod tests)
// ---------------------------------------------------------------------------
#[test]
fn entropy_known_values() {
// Single character repeated → 0.
assert!((shannon_entropy("aaaa") - 0.0).abs() < f64::EPSILON);
// Perfectly balanced binary → 1.0 bits/char.
let balanced = "ababababab";
assert!((shannon_entropy(balanced) - 1.0).abs() < 0.01);
// High diversity.
let diverse = "aB3$kL9!mZ7@wQ1#xR5^";
assert!(shannon_entropy(diverse) > 3.5);
}
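// Illustrative sketch (not the crate's implementation): one way the values
// asserted in `entropy_known_values` can be reproduced, assuming entropy is
// measured in bits per byte over the byte frequencies of the input.
// `reference_shannon_entropy` is a hypothetical helper introduced here only
// for illustration.
fn reference_shannon_entropy(s: &str) -> f64 {
    if s.is_empty() {
        return 0.0;
    }
    // Count occurrences of each byte value.
    let mut counts = [0usize; 256];
    for b in s.bytes() {
        counts[b as usize] += 1;
    }
    let len = s.len() as f64;
    // H = -sum(p * log2(p)) over byte values that occur at least once.
    counts
        .iter()
        .filter(|&&c| c > 0)
        .map(|&c| {
            let p = c as f64 / len;
            -p * p.log2()
        })
        .sum()
}
#[test]
fn reference_entropy_matches_expected_values() {
    assert!(reference_shannon_entropy("aaaa").abs() < f64::EPSILON);
    assert!((reference_shannon_entropy("ababababab") - 1.0).abs() < 1e-9);
    assert!(reference_shannon_entropy("aB3$kL9!mZ7@wQ1#xR5^") > 3.5);
}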
// ---------------------------------------------------------------------------
// Scanner construction
// ---------------------------------------------------------------------------
#[test]
fn builtin_scanner_has_rules() {
let scanner = default_scanner();
assert!(
scanner.rule_count() >= 30,
"expected at least 30 built-in rules, got {}",
scanner.rule_count()
);
}
#[test]
fn invalid_custom_pattern_returns_error() {
let config = HeuristicConfig {
enabled: true,
patterns: vec![r"[invalid".to_string()],
..Default::default()
};
assert!(Scanner::new(&config).is_err());
}
+58 -1
@@ -1,5 +1,5 @@
use dirigent_fermata::core::{Decision, Reason};
use dirigent_fermata::harness::{HarnessAdapter, PathKind, ToolOp};
use dirigent_fermata::harness::{HarnessAdapter, HookEvent, PathKind, ToolOp};
use dirigent_fermata::harness::claude::ClaudeAdapter;
#[test]
@@ -84,3 +84,60 @@ fn renders_ask_as_ask() {
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v["hookSpecificOutput"]["permissionDecision"], "ask");
}
// ---------------------------------------------------------------------------
// PostToolUse
// ---------------------------------------------------------------------------
#[test]
fn parses_post_tool_use_payload() {
let payload = br#"{"tool_name":"Read","tool_input":{"file_path":"/proj/.env"},"tool_response":"SECRET=abc"}"#;
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
assert_eq!(p.tool_name, "Read");
assert_eq!(p.tool_response, "SECRET=abc");
}
#[test]
fn parses_post_tool_use_missing_response() {
// tool_response absent → defaults to empty string.
let payload = br#"{"tool_name":"Bash","tool_input":{"command":"ls"}}"#;
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
assert_eq!(p.tool_response, "");
}
#[test]
fn renders_post_tool_use_with_redacted_output() {
let payload = br#"{"tool_name":"Read","tool_input":{},"tool_response":"x"}"#;
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
let out = ClaudeAdapter
.render_post_tool_use(&p, Some("redacted text"))
.unwrap();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v["hookSpecificOutput"]["hookEventName"], "PostToolUse");
assert_eq!(
v["hookSpecificOutput"]["updatedToolOutput"],
"redacted text"
);
}
#[test]
fn renders_post_tool_use_passthrough() {
let payload = br#"{"tool_name":"Read","tool_input":{},"tool_response":"clean"}"#;
let p = ClaudeAdapter.parse_post_tool_use(payload).unwrap();
let out = ClaudeAdapter.render_post_tool_use(&p, None).unwrap();
let v: serde_json::Value = serde_json::from_slice(&out).unwrap();
assert_eq!(v, serde_json::json!({}));
}
// ---------------------------------------------------------------------------
// HookEvent parsing
// ---------------------------------------------------------------------------
#[test]
fn hook_event_parse_variants() {
assert_eq!(HookEvent::parse("pre-tool-use"), Some(HookEvent::PreToolUse));
assert_eq!(HookEvent::parse("PreToolUse"), Some(HookEvent::PreToolUse));
assert_eq!(HookEvent::parse("post-tool-use"), Some(HookEvent::PostToolUse));
assert_eq!(HookEvent::parse("PostToolUse"), Some(HookEvent::PostToolUse));
assert_eq!(HookEvent::parse("unknown"), None);
}
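// Illustrative sketch (not the crate's implementation): a parser that accepts
// both spellings asserted above. `SketchHookEvent` is a hypothetical stand-in
// for `HookEvent`, defined here only so the example is self-contained.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum SketchHookEvent {
    PreToolUse,
    PostToolUse,
}
impl SketchHookEvent {
    // Accept the kebab-case CLI spelling and the PascalCase event name.
    fn parse(s: &str) -> Option<Self> {
        match s {
            "pre-tool-use" | "PreToolUse" => Some(Self::PreToolUse),
            "post-tool-use" | "PostToolUse" => Some(Self::PostToolUse),
            _ => None,
        }
    }
}
#[test]
fn sketch_hook_event_parse_variants() {
    assert_eq!(SketchHookEvent::parse("post-tool-use"), Some(SketchHookEvent::PostToolUse));
    assert_eq!(SketchHookEvent::parse("unknown"), None);
}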