✨ feat(fermata): add secret filtering engine — the security brain

Implement Goals 1–3 and 5 from the reveal-layer security brain goal. fermata now detects, redacts, and scans for secrets in AI agent tool output, filling the ecosystem gap where no coding agent filters secrets post-read.

New core/secrets/ module:
- config.rs: .botsecrets TOML format with hierarchical merge and ~40 built-in key patterns
- parser.rs: multi-format secret file parser (.env, TOML, YAML, JSON, Python assignments, Java properties)
- manifest.rs: file discovery + parsing → known-secrets set
- redactor.rs: Aho-Corasick multi-pattern replacement with 4 styles
- scanner.rs: RegexSet heuristic detection with 35 gitleaks-derived patterns (MIT) and Shannon entropy filtering
- patterns.rs: curated rules for AWS, GitHub, Stripe, Slack, JWT, etc.

Hook integration:
- fermata hook --event post-tool-use reads tool output, runs redactor + scanner, returns updatedToolOutput for Claude Code
- Backward compatible: --event pre-tool-use (default) unchanged
- Fail-open: errors produce {} and exit 0

Library API:
- Redactor::new(manifest, style).redact(text) → RedactedText
- Scanner::new(config).scan(text) → Vec<Finding>
- Compiles without the `cli` feature for embedding in other crates

195 tests (130 new), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-25 17:29:07 +02:00
parent f77fd73966
commit 087429d275
22 changed files with 4557 additions and 172 deletions
@@ -1,44 +1,40 @@
-# 𝄐 dirigent_fermata
+# dirigent_fermata

-**A fast, harness-agnostic policy gate for AI coding agents.**
+**A fast, harness-agnostic policy gate and secret filtering engine for AI coding agents.**

-Drop a `.botignore` file in your project root. Fermata reads it and blocks your agent from reading, writing, or running things it shouldn't — before the tool call happens.
-
-```
-.env
-.env.*
-secrets/**
-conf/settings.local.yaml
-```
-
-That's all it takes.
+Drop a `.botignore` to control what your agent can touch. Drop a `.botsecrets` to control what secret values your agent can see. Fermata enforces the first before each tool call runs and the second after it returns.

 ---

 ## Why Fermata

-AI coding agents are powerful, but they don't have an innate sense of "don't touch `.env`." Native hook systems in tools like Claude Code let you intercept every file operation — but wiring up your own secure, fast hook for each project is friction. Fermata is that hook, ready to drop in.
+AI coding agents don't have an innate sense of "don't touch `.env`" -- and even if you block the file, they can still see its contents through shell output, log files, and indirect reads. Fermata solves both problems:

-- **Fast** — written in Rust; ~1–5ms per call. Hooks fire on every read, write, and bash operation. Python cold-start (~50–150ms) compounds fast. Fermata doesn't.
-- **Familiar syntax** — `.botignore` uses gitignore rules via the `ignore` crate (the same engine powering ripgrep).
-- **Per-operation control** — `botignore.toml` lets you block writes to `vendor/**` while still allowing reads, or deny specific bash patterns without touching path rules.
-- **Harness-agnostic** — plain CLI exit codes work from any shell wrapper; the hook adapter speaks Claude Code's JSON natively.
+- **Policy gate** -- `.botignore` blocks reads, writes, and dangerous commands before they execute (PreToolUse).
+- **Secret filtering** -- `.botsecrets` redacts secret values from tool output before they enter the LLM context (PostToolUse).
+- **Fast** -- Rust, Aho-Corasick automaton for redaction, ~1-5ms per call.
+- **Familiar syntax** -- `.botignore` uses gitignore rules; `.botsecrets` uses TOML with glob patterns.
+- **Harness-agnostic** -- hook adapters for Claude Code (shipped), Codex and Gemini (planned), MCP proxy (planned).

 ---

-## Status: v0.1
+## Status: v0.2

 | Component | Status |
 |-----------|--------|
-| Library (`Op`, `Decision`, `Policy::check`, `Policy::check_command`) | Done |
-| `.botignore` walker (project-root walk-up, gitignore semantics) | Done |
+| Library (`Policy::check`, `Policy::check_command`) | Done |
+| `.botignore` walker (gitignore semantics) | Done |
 | `botignore.toml` parser (read / write / bash namespaces) | Done |
-| Path identification heuristics | Done |
-| CLI: `fermata check <path>...` | Done |
-| CLI: `fermata hook --harness claude` | Done |
+| CLI: `fermata check` / `fermata hook` | Done |
 | Claude Code PreToolUse adapter | Done |
+| Claude Code PostToolUse adapter (output redaction) | Done |
+| `.botsecrets` config parser | Done |
+| Secret manifest discovery and loading | Done |
+| Multi-format secret file parser (.env, TOML, YAML, JSON, Python, Java properties) | Done |
+| `Redactor` (known-value Aho-Corasick replacement) | Done |
+| `Scanner` (heuristic regex + gitleaks patterns) | Done |

-Out of scope for v0.1: Codex / Gemini hook adapters, MCP server mode, audit log, filesystem watcher.
+Out of scope for v0.2: Codex / Gemini hook adapters, MCP proxy mode, audit log, filesystem watcher.

 ---

@@ -50,87 +46,43 @@ From source (this monorepo):
 cargo install --path crates/dirigent_fermata --features cli
 ```

-This installs the `fermata` binary into `~/.cargo/bin/`.
+---
+
+## Secret Filtering
+
+Fermata's secret filtering operates in three layers:
+
+1. **Policy gate** (PreToolUse) -- `.botignore` blocks direct access to sensitive files. Catches ~90% of accidental reads.
+2. **Known-value redaction** (PostToolUse) -- `.botsecrets` declares which files contain secrets. Fermata parses them, extracts values, and replaces them in all tool output using an Aho-Corasick automaton. No false negatives for verbatim occurrences of declared values.
+3. **Heuristic scanning** (PostToolUse) -- regex patterns derived from gitleaks, filtered by Shannon entropy, detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs). Safety net for secrets not covered by the manifest.
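Known-value redaction boils down to multi-pattern string replacement. The std-only sketch below shows the matching semantics one would expect: the leftmost match is taken first, the longest value wins when several start at the same position, and matches never overlap. It is a naive quadratic scan standing in for the Aho-Corasick automaton, and `redact_known` is a hypothetical helper, not part of fermata's API.

```rust
/// Replace every occurrence of any known secret value in `text` with `mask`.
/// At each position the longest matching value wins, and scanning resumes
/// after the replaced span (non-overlapping, leftmost-longest semantics).
/// Naive O(n * k) stand-in for an Aho-Corasick automaton, which finds the
/// same matches in a single linear pass.
fn redact_known(text: &str, secrets: &[&str], mask: &str) -> String {
    // Drop empty values: they would match at every position and never advance.
    let secrets: Vec<&str> = secrets.iter().copied().filter(|s| !s.is_empty()).collect();
    let mut out = String::with_capacity(text.len());
    let mut i = 0;
    while i < text.len() {
        let rest = &text[i..];
        // Length of the longest declared value starting exactly at position i.
        let hit = secrets
            .iter()
            .filter(|s| rest.starts_with(**s))
            .map(|s| s.len())
            .max();
        match hit {
            Some(len) => {
                out.push_str(mask);
                i += len;
            }
            None => {
                // Copy one whole UTF-8 character so `i` stays on a char boundary.
                let ch = rest.chars().next().unwrap();
                out.push(ch);
                i += ch.len_utf8();
            }
        }
    }
    out
}
```

With the `aho-corasick` crate, the same behaviour comes from building the automaton with `MatchKind::LeftmostLongest` and calling `replace_all`; whether fermata uses that match kind is an assumption.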
+
+### `.botsecrets` format
+
+Create a `.botsecrets` file at your project root:
+
+```toml
+# Files that contain secrets -- fermata parses these and redacts values
+[files]
+patterns = [".env", ".env.*", "secrets.*"]
+
+# Additional secret key names (built-in defaults cover *_KEY, *_SECRET, etc.)
+[keys]
+include = ["STRIPE_*", "MY_APP_SIGNING_*"]
+
+# Heuristic scanning on all tool output
+[heuristic]
+enabled = true
+```
+
+That's the typical case. Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
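The built-in key patterns are simple globs over variable names. Here is a minimal sketch of how a `.env` file listed under `[files]` could be turned into a set of known values, assuming `*` is the only wildcard and ignoring escapes and multi-line values. `glob_match` and `secret_values` are hypothetical helpers, and the real parser also covers TOML, YAML, JSON and other formats.

```rust
/// Match a key name against a glob where `*` matches any run of characters
/// (including none). Comparison is case-sensitive.
fn glob_match(pattern: &str, key: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == key; // no wildcard: exact match
    }
    let (first, last) = (parts[0], parts[parts.len() - 1]);
    if key.len() < first.len() + last.len() || !key.starts_with(first) || !key.ends_with(last) {
        return false;
    }
    // Middle segments must appear, in order, between the prefix and suffix.
    let mut rest = &key[first.len()..key.len() - last.len()];
    for &mid in &parts[1..parts.len() - 1] {
        match rest.find(mid) {
            Some(pos) => rest = &rest[pos + mid.len()..],
            None => return false,
        }
    }
    true
}

/// Extract values from `.env`-style text whose key matches any glob.
/// Handles `KEY=value`, an optional `export ` prefix, `#` comment lines,
/// and one pair of matching single or double quotes around the value.
fn secret_values(env: &str, key_globs: &[&str]) -> Vec<String> {
    let mut values = Vec::new();
    for line in env.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let line = line.strip_prefix("export ").unwrap_or(line);
        let Some((key, value)) = line.split_once('=') else { continue };
        let key = key.trim();
        let mut value = value.trim();
        for q in ['"', '\''] {
            if value.len() >= 2 && value.starts_with(q) && value.ends_with(q) {
                value = &value[1..value.len() - 1];
                break;
            }
        }
        if !value.is_empty() && key_globs.iter().any(|g| glob_match(g, key)) {
            values.push(value.to_string());
        }
    }
    values
}
```

The values returned this way are exactly what a known-value redactor would be built from.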

 ---

 ## Usage

-### Checking a path
+### Claude Code hook configuration

-```bash
-fermata check --op read /path/to/.env
-# exit 1 — blocked
-# stderr: blocked by rule ".env" in /your/project/.botignore
-
-fermata check --op write /path/to/src/main.rs
-# exit 0 — allowed
-```
-
-### Claude Code hook adapter
-
-```bash
-fermata hook --harness claude < hook_payload.json
-```
-
-Reads the PreToolUse JSON from stdin, extracts the tool name and path or command, applies policy, and emits the Claude-shaped JSON response. The hook's exit code is always `0`; the verdict is in the JSON body.
-
---
-
-## Configuration
-
-### `.botignore` — the 80% case
-
-Create a `.botignore` at your project root. Gitignore syntax. Blocks both reads and writes.
-
-```gitignore
-# Secrets
-.env
-.env.*
-secrets/**
-
-# Local config overrides
-conf/settings.local.yaml
-conf/settings.test.yaml
-
-# Generated files — let the tools rebuild them, not patch them
-dist/**
-*.lock
-```
-
-Fermata walks up from the target file to find the nearest `.botignore`, so it works correctly even when an agent changes directory.
-
-### `botignore.toml` — per-operation rules
-
-For cases where `.botignore`'s uniform read+write block isn't granular enough:
-
-```toml
-[read]
-# Block reading secrets outright
-patterns = [".env*", "secrets/**", "conf/settings.local.yaml"]
-
-[write]
-# Allow reading vendor code but block patching it
-patterns = ["vendor/**", "*.lock"]
-
-[bash]
-# Hard-block destructive or exfiltrating commands
-deny = [
-  "rm -rf /",
-  "curl * | sh",
-  "git push --force*",
-]
-# Ask before any removal or move
-ask = ["rm:*", "mv:*"]
-# Narrow allowlist for automated commands
-allow_prefixes = ["make test", "git checkout:*"]
-```
-
---
-
-## How it fits into Claude Code
-
-Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
+Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:

 ```json
 {
@@ -139,10 +91,15 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
      {
        "matcher": "Bash|Read|Edit|Write",
        "hooks": [
-          {
-            "type": "command",
-            "command": "fermata hook --harness claude"
-          }
+          { "type": "command", "command": "fermata hook --harness claude" }
+        ]
+      }
+    ],
+    "PostToolUse": [
+      {
+        "matcher": "Bash|Read|Edit|Write",
+        "hooks": [
+          { "type": "command", "command": "fermata hook --harness claude --event post-tool-use" }
        ]
      }
    ]
@@ -150,50 +107,68 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
 }
 ```

-When Claude attempts a `Read(.env)`, `Write(vendor/foo.js)`, or `Bash(rm ./secrets/key.pem)`, fermata intercepts the call, checks policy, and returns a deny with a human-readable reason — before any damage is done.
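+PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM. The hook fails open: an internal error produces `{}` and exit code 0, which lets the call and its output through unfiltered.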
+
+### Checking a path
+
+```bash
+fermata check --op read /path/to/.env
+# exit 1 -- blocked
+
+fermata check --op write /path/to/src/main.rs
+# exit 0 -- allowed
+```
+
+### Library API
+
+```rust
+use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
+
+fn main() -> Result<(), Box<dyn std::error::Error>> {
+    // Load the .botsecrets config and build the manifest of known secret values
+    let config = SecretsConfig::load("/path/to/project")?;
+    let manifest = Manifest::discover(&config)?;
+
+    // Known-value redaction (Aho-Corasick, sub-millisecond). Only values found in
+    // manifest files are replaced, so this assumes `hunter2` and `sk-abc123` are declared.
+    let redactor = Redactor::from_manifest(&manifest);
+    let clean = redactor.redact("DB_PASSWORD=hunter2\nAPI_KEY=sk-abc123");
+    // -> "DB_PASSWORD=*****\nAPI_KEY=*****" (exact mask depends on the redaction style)
+
+    // Heuristic scanning (regex patterns) for secrets the manifest doesn't declare
+    let scanner = Scanner::new(&config);
+    let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
+    // -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
+
+    Ok(())
+}
+```
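The commit message describes the scanner as RegexSet matching plus Shannon entropy filtering. Entropy, H = -Σ p(c) log2 p(c) over the characters of a candidate, is low for repetitive placeholders such as `xxxxxxxx` and high for random key material, so it can discard regex hits that are obviously fake. A std-only sketch follows; the function names and the threshold parameter are assumptions, not fermata's API.

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per character:
/// H = -sum over distinct chars c of p(c) * log2(p(c)).
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for c in s.chars() {
        *counts.entry(c).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Keep a regex candidate only if it looks random enough.
/// `min_bits_per_char` is a tunable assumption, not a value taken from fermata.
fn passes_entropy_filter(candidate: &str, min_bits_per_char: f64) -> bool {
    shannon_entropy(candidate) >= min_bits_per_char
}
```

A string of 20 identical characters scores 0 bits per character, while the sample AWS key above scores roughly 4, so any threshold in between separates them.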

 ---

-## Real-world scenario
+## Configuration

-A project has `.env`, `conf/settings.local.yaml`, and a `vendor/` tree it doesn't want patched. With `.botignore`:
+### `.botignore` -- access control
+
+Gitignore syntax. Blocks both reads and writes.

 ```gitignore
 .env
 .env.*
-conf/settings.local.yaml
-vendor/**
+secrets/**
 ```

-Claude attempts to read credentials:
-
-```
-Tool: Read
-Path: ./conf/settings.local.yaml
-Decision: BLOCK — matched rule "conf/settings.local.yaml" (.botignore)
-```
-
-Claude attempts to read application code:
-
-```
-Tool: Read
-Path: ./src/app/main.rs
-Decision: ALLOW
-```
-
-Claude attempts to run `cat .env` via bash — which would bypass a path-only check:
+### `botignore.toml` -- per-operation rules

 ```toml
-# botignore.toml
+[read]
+patterns = [".env*", "secrets/**"]
+
+[write]
+patterns = ["vendor/**", "*.lock"]
+
 [bash]
-deny = ["cat .env*", "cat conf/settings.local*"]
+deny = ["rm -rf /", "curl * | sh"]
 ```

-```
-Tool: Bash
-Command: cat .env
-Decision: BLOCK — matched bash deny rule "cat .env*"
-```
+### `.botsecrets` -- secret value redaction
+
+See the Secret Filtering section above.

 ---

@@ -201,14 +176,16 @@ Decision: BLOCK — matched bash deny rule "cat .env*"

 Three concentric layers; nothing inner imports from anything outer:

-- **`core/`** — harness-unaware, sync. Types, `.botignore` walker, `botignore.toml` parser, `Policy::check` / `check_command`, path extraction.
-- **`harness/`** — `HarnessAdapter` trait over a normalized `ToolCall`. Each adapter lives in its own submodule, feature-gated.
-- **`bin/fermata.rs`** — the only place `clap`, stdio, and exit codes appear.
+- **`core/`** -- harness-unaware, sync. Policy types, `.botignore` walker, `botignore.toml` parser, `Policy::check`.
+  - **`core/secrets/`** -- `.botsecrets` config, manifest discovery, multi-format parser, Aho-Corasick redactor, heuristic scanner.
+- **`harness/`** -- `HarnessAdapter` trait for PreToolUse (policy gate) and PostToolUse (output redaction). Each adapter is feature-gated.
+- **`bin/fermata.rs`** -- the only place `clap`, stdio, and exit codes appear.

 ---

 ## See also

-- `docs/tools/fermata.md` — Dirigent integration plan
-- `docs/workpad/brainstorm/fermata.md` — full product spec and field notes
-- `docs/architecture/crates.md` — crate dependency map
+- `docs/tools/fermata.md` -- Dirigent integration plan
+- `docs/architecture/fermata-security-philosophy.md` -- security philosophy and the reveal triangle
+- `docs/workpad/brainstorm/fermata.md` -- full product spec and field notes
+- `docs/architecture/crates.md` -- crate dependency map