feat(fermata): add secret filtering engine — the security brain

Implement Goals 1–3 and 5 from the reveal-layer security brain goal.
fermata now detects, redacts, and scans for secrets in AI agent tool
output, filling the ecosystem gap where no coding agent filters secrets
post-read.

New core/secrets/ module:
- config.rs: .botsecrets TOML format with hierarchical merge and ~40
  built-in key patterns
- parser.rs: multi-format secret file parser (.env, TOML, YAML, JSON,
  Python assignments, Java properties)
- manifest.rs: file discovery + parsing → known-secrets set
- redactor.rs: Aho-Corasick multi-pattern replacement with 4 styles
- scanner.rs: RegexSet heuristic detection with 35 gitleaks-derived
  patterns (MIT) and Shannon entropy filtering
- patterns.rs: curated rules for AWS, GitHub, Stripe, Slack, JWT, etc.

Hook integration:
- fermata hook --event post-tool-use reads tool output, runs redactor +
  scanner, returns updatedToolOutput for Claude Code
- Backward compatible: --event pre-tool-use (default) unchanged
- Fail-open: errors produce {} and exit 0

Library API:
- Redactor::new(manifest, style).redact(text) → RedactedText
- Scanner::new(config).scan(text) → Vec<Finding>
- Compiles without CLI feature for embedding in other crates

195 tests (130 new), all passing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gabor Körber
2026-05-25 17:29:07 +02:00
parent f77fd73966
commit 087429d275
22 changed files with 4557 additions and 172 deletions
+115 -138
@@ -1,44 +1,40 @@
# 𝄐 dirigent_fermata
# dirigent_fermata
**A fast, harness-agnostic policy gate for AI coding agents.**
**A fast, harness-agnostic policy gate and secret filtering engine for AI coding agents.**
Drop a `.botignore` file in your project root. Fermata reads it and blocks your agent from reading, writing, or running things it shouldn't — before the tool call happens.
```
.env
.env.*
secrets/**
conf/settings.local.yaml
```
That's all it takes.
Drop a `.botignore` to control what your agent can touch. Drop a `.botsecrets` to control what secret values your agent can see. Fermata enforces both -- before and after tool calls happen.
---
## Why Fermata
AI coding agents are powerful, but they don't have an innate sense of "don't touch `.env`." Native hook systems in tools like Claude Code let you intercept every file operation — but wiring up your own secure, fast hook for each project is friction. Fermata is that hook, ready to drop in.
AI coding agents don't have an innate sense of "don't touch `.env`" -- and even if you block the file, they can still see its contents through shell output, log files, and indirect reads. Fermata solves both problems:
- **Fast** -- written in Rust; ~15ms per call. Hooks fire on every read, write, and bash operation. Python cold-start (~50-150ms) compounds fast. Fermata doesn't.
- **Familiar syntax** -- `.botignore` uses gitignore rules via the `ignore` crate (the same engine powering ripgrep).
- **Per-operation control** — `botignore.toml` lets you block writes to `vendor/**` while still allowing reads, or deny specific bash patterns without touching path rules.
- **Harness-agnostic** — plain CLI exit codes work from any shell wrapper; the hook adapter speaks Claude Code's JSON natively.
- **Policy gate** -- `.botignore` blocks reads, writes, and dangerous commands before they execute (PreToolUse).
- **Secret filtering** -- `.botsecrets` redacts secret values from tool output before they enter the LLM context (PostToolUse).
- **Fast** -- Rust, Aho-Corasick automaton for redaction, ~1-5ms per call.
- **Familiar syntax** -- `.botignore` uses gitignore rules; `.botsecrets` uses TOML with glob patterns.
- **Harness-agnostic** -- hook adapters for Claude Code (shipped), Codex and Gemini (planned), MCP proxy (planned).
---
## Status: v0.1
## Status: v0.2
| Component | Status |
|-----------|--------|
| Library (`Op`, `Decision`, `Policy::check`, `Policy::check_command`) | Done |
| `.botignore` walker (project-root walk-up, gitignore semantics) | Done |
| Library (`Policy::check`, `Policy::check_command`) | Done |
| `.botignore` walker (gitignore semantics) | Done |
| `botignore.toml` parser (read / write / bash namespaces) | Done |
| Path identification heuristics | Done |
| CLI: `fermata check <path>...` | Done |
| CLI: `fermata hook --harness claude` | Done |
| CLI: `fermata check` / `fermata hook` | Done |
| Claude Code PreToolUse adapter | Done |
| Claude Code PostToolUse adapter (output redaction) | Done |
| `.botsecrets` config parser | Done |
| Secret manifest discovery and loading | Done |
| Multi-format secret file parser (.env, TOML, YAML, JSON) | Done |
| `Redactor` (known-value Aho-Corasick replacement) | Done |
| `Scanner` (heuristic regex + gitleaks patterns) | Done |
Out of scope for v0.1: Codex / Gemini hook adapters, MCP server mode, audit log, filesystem watcher.
Out of scope for v0.2: Codex / Gemini hook adapters, MCP proxy mode, audit log, filesystem watcher.
---
@@ -50,87 +46,43 @@ From source (this monorepo):
cargo install --path crates/dirigent_fermata --features cli
```
This installs the `fermata` binary into `~/.cargo/bin/`.
---
## Secret Filtering
Fermata's secret filtering operates in three layers:
1. **Policy gate** (PreToolUse) -- `.botignore` blocks direct access to sensitive files. Catches ~90% of accidental reads.
2. **Known-value redaction** (PostToolUse) -- `.botsecrets` declares which files contain secrets. Fermata parses them, extracts values, and replaces them in all tool output using an Aho-Corasick automaton. Zero false negatives for declared secrets that appear verbatim in the output.
3. **Heuristic scanning** (PostToolUse) -- regex patterns derived from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs). Safety net for secrets not covered by the manifest.
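As a rough illustration of layer 2, here is a minimal sketch of known-value redaction. It is not fermata's implementation: `redact_known` is a hypothetical helper that uses a naive leftmost-longest scan from the standard library as a stand-in for an Aho-Corasick automaton, and the fixed mask string is an assumption.

```rust
/// Replace every occurrence of a known secret value with `mask`.
/// Hypothetical helper for illustration; a naive O(text * secrets) scan,
/// whereas an Aho-Corasick automaton finds all values in a single pass.
fn redact_known(text: &str, secrets: &[&str], mask: &str) -> String {
    // Try longer values first so a secret that contains another secret
    // is masked as a whole (leftmost-longest semantics).
    let mut ordered: Vec<&str> = secrets.iter().copied().filter(|s| !s.is_empty()).collect();
    ordered.sort_by(|a, b| b.len().cmp(&a.len()));

    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(ch) = rest.chars().next() {
        if let Some(hit) = ordered.iter().find(|s| rest.starts_with(**s)) {
            // A known value starts here: emit the mask and skip past it.
            out.push_str(mask);
            rest = &rest[hit.len()..];
        } else {
            // No secret starts here: copy one character and move on.
            out.push(ch);
            rest = &rest[ch.len_utf8()..];
        }
    }
    out
}
```

Calling `redact_known("DB_PASSWORD=hunter2", &["hunter2"], "*****")` would yield `DB_PASSWORD=*****`. Only verbatim occurrences are caught, which is why a heuristic layer sits behind it.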
### `.botsecrets` format
Create a `.botsecrets` file at your project root:
```toml
# Files that contain secrets -- fermata parses these and redacts values
[files]
patterns = [".env", ".env.*", "secrets.*"]
# Additional secret key names (built-in defaults cover *_KEY, *_SECRET, etc.)
[keys]
include = ["STRIPE_*", "MY_APP_SIGNING_*"]
# Heuristic scanning on all tool output
[heuristic]
enabled = true
```
That's the typical case. Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
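To make the key patterns concrete, the following sketch shows one way a `*` wildcard could be matched against key names. Both functions are hypothetical and case-sensitive by assumption; how fermata actually evaluates `[keys]` patterns is not specified here.

```rust
/// Match `key` against `pattern`, where `*` stands for any run of characters.
/// Hypothetical helper; case-sensitive matching is an assumption.
fn key_matches(pattern: &str, key: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        return pattern == key; // no wildcard: exact match only
    }
    let first = parts[0];
    let last = parts[parts.len() - 1];
    if key.len() < first.len() + last.len() || !key.starts_with(first) || !key.ends_with(last) {
        return false;
    }
    // Any middle segments must appear, in order, between prefix and suffix.
    let mut rest = &key[first.len()..key.len() - last.len()];
    for part in &parts[1..parts.len() - 1] {
        match rest.find(*part) {
            Some(idx) => rest = &rest[idx + part.len()..],
            None => return false,
        }
    }
    true
}

/// True if any pattern in the list matches the key name.
fn is_secret_key(key: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| key_matches(p, key))
}
```

With the example config above, `is_secret_key("STRIPE_WEBHOOK_SECRET", &["STRIPE_*", "MY_APP_SIGNING_*"])` would return `true`, while a key such as `LOG_LEVEL` would not match.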
---
## Usage
### Checking a path
### Claude Code hook configuration
```bash
fermata check --op read /path/to/.env
# exit 1 — blocked
# stderr: blocked by rule ".env" in /your/project/.botignore
fermata check --op write /path/to/src/main.rs
# exit 0 — allowed
```
### Claude Code hook adapter
```bash
fermata hook --harness claude < hook_payload.json
```
Reads the PreToolUse JSON from stdin, extracts the tool name and path or command, applies policy, and emits the Claude-shaped JSON response. The hook's exit code is always `0`; the verdict is in the JSON body.
---
## Configuration
### `.botignore` — the 80% case
Create a `.botignore` at your project root. Gitignore syntax. Blocks both reads and writes.
```gitignore
# Secrets
.env
.env.*
secrets/**
# Local config overrides
conf/settings.local.yaml
conf/settings.test.yaml
# Generated files — let the tools rebuild them, not patch them
dist/**
*.lock
```
Fermata walks up from the target file to find the nearest `.botignore`, so it works correctly even when an agent changes directory.
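The walk-up can be pictured roughly as follows. This is a sketch using only the standard library; `nearest_botignore` is a hypothetical name, and the real walker (built on the `ignore` crate) may behave differently, for example in how it treats nested `.botignore` files.

```rust
use std::path::{Path, PathBuf};

/// Walk from `start` toward the filesystem root and return the first
/// `.botignore` file found. Hypothetical helper, not fermata's actual code.
fn nearest_botignore(start: &Path) -> Option<PathBuf> {
    start
        .ancestors() // yields `start` itself, then each parent directory
        .map(|dir| dir.join(".botignore"))
        .find(|candidate| candidate.is_file())
}
```

Because the search starts from the target path rather than the process's working directory, the result does not depend on where the agent happens to be when it issues the tool call.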
### `botignore.toml` — per-operation rules
For cases where `.botignore`'s uniform read+write block isn't granular enough:
```toml
[read]
# Block reading secrets outright
patterns = [".env*", "secrets/**", "conf/settings.local.yaml"]
[write]
# Allow reading vendor code but block patching it
patterns = ["vendor/**", "*.lock"]
[bash]
# Hard-block destructive or exfiltrating commands
deny = [
"rm -rf /",
"curl * | sh",
"git push --force*",
]
# Ask before any removal or move
ask = ["rm:*", "mv:*"]
# Narrow allowlist for automated commands
allow_prefixes = ["make test", "git checkout:*"]
```
---
## How it fits into Claude Code
Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
```json
{
@@ -139,10 +91,15 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
{
"matcher": "Bash|Read|Edit|Write",
"hooks": [
{
"type": "command",
"command": "fermata hook --harness claude"
}
{ "type": "command", "command": "fermata hook --harness claude" }
]
}
],
"PostToolUse": [
{
"matcher": "Bash|Read|Edit|Write",
"hooks": [
{ "type": "command", "command": "fermata hook --harness claude --event post-tool-use" }
]
}
]
@@ -150,50 +107,68 @@ Add fermata as a `PreToolUse` hook in `.claude/settings.json`:
}
```
When Claude attempts a `Read(.env)`, `Write(vendor/foo.js)`, or `Bash(rm ./secrets/key.pem)`, fermata intercepts the call, checks policy, and returns a deny with a human-readable reason — before any damage is done.
PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.
### Checking a path
```bash
fermata check --op read /path/to/.env
# exit 1 -- blocked
fermata check --op write /path/to/src/main.rs
# exit 0 -- allowed
```
### Library API
```rust
use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
// Load .botsecrets config and build the manifest
let config = SecretsConfig::load("/path/to/project")?;
let manifest = Manifest::discover(&config)?;
// Known-value redaction (Aho-Corasick, sub-millisecond)
let redactor = Redactor::from_manifest(&manifest);
let clean = redactor.redact("DB_PASSWORD=hunter2\nAPI_KEY=sk-abc123");
// -> "DB_PASSWORD=*****\nAPI_KEY=*****"
// Heuristic scanning (regex patterns)
let scanner = Scanner::new(&config);
let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
```
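The commit message mentions Shannon entropy filtering in the scanner. A minimal sketch of that idea, assuming entropy is measured in bits per character over the candidate string: the function names and the threshold are illustrative assumptions, not part of the `Scanner` API.

```rust
use std::collections::HashMap;

/// Shannon entropy of `s` in bits per character (0.0 for an empty string).
fn shannon_entropy(s: &str) -> f64 {
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut total = 0usize;
    for ch in s.chars() {
        *counts.entry(ch).or_insert(0) += 1;
        total += 1;
    }
    if total == 0 {
        return 0.0;
    }
    counts
        .values()
        .map(|&n| {
            let p = n as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Keep a regex match only if it looks random enough to be a real secret.
/// The threshold is a tunable assumption, not a value taken from fermata.
fn looks_random(candidate: &str, threshold: f64) -> bool {
    shannon_entropy(candidate) >= threshold
}
```

A filter like this would let a scanner discard regex matches such as placeholder values made of repeated characters, while keeping strings whose characters are spread evenly.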
---
## Real-world scenario
## Configuration
A project has `.env`, `conf/settings.local.yaml`, and a `vendor/` tree it doesn't want patched. With `.botignore`:
### `.botignore` -- access control
Gitignore syntax. Blocks both reads and writes.
```gitignore
.env
.env.*
conf/settings.local.yaml
vendor/**
secrets/**
```
Claude attempts to read credentials:
```
Tool: Read
Path: ./conf/settings.local.yaml
Decision: BLOCK — matched rule "conf/settings.local.yaml" (.botignore)
```
Claude attempts to read application code:
```
Tool: Read
Path: ./src/app/main.rs
Decision: ALLOW
```
Claude attempts to run `cat .env` via bash — which would bypass a path-only check:
### `botignore.toml` -- per-operation rules
```toml
# botignore.toml
[read]
patterns = [".env*", "secrets/**"]
[write]
patterns = ["vendor/**", "*.lock"]
[bash]
deny = ["cat .env*", "cat conf/settings.local*"]
deny = ["rm -rf /", "curl * | sh"]
```
```
Tool: Bash
Command: cat .env
Decision: BLOCK — matched bash deny rule "cat .env*"
```
### `.botsecrets` -- secret value redaction
See the Secret Filtering section above.
---
@@ -201,14 +176,16 @@ Decision: BLOCK — matched bash deny rule "cat .env*"
Three concentric layers; nothing inner imports from anything outer:
- **`core/`** -- harness-unaware, sync. Types, `.botignore` walker, `botignore.toml` parser, `Policy::check` / `check_command`, path extraction.
- **`harness/`** -- `HarnessAdapter` trait over a normalized `ToolCall`. Each adapter lives in its own submodule, feature-gated.
- **`bin/fermata.rs`** -- the only place `clap`, stdio, and exit codes appear.
- **`core/`** -- harness-unaware, sync. Policy types, `.botignore` walker, `botignore.toml` parser, `Policy::check`.
- **`core/secrets/`** -- `.botsecrets` config, manifest discovery, multi-format parser, Aho-Corasick redactor, heuristic scanner.
- **`harness/`** -- `HarnessAdapter` trait for PreToolUse (policy gate) and PostToolUse (output redaction). Each adapter is feature-gated.
- **`bin/fermata.rs`** -- `clap`, stdio, and exit codes.
---
## See also
- `docs/tools/fermata.md` -- Dirigent integration plan
- `docs/workpad/brainstorm/fermata.md` — full product spec and field notes
- `docs/architecture/crates.md` — crate dependency map
- `docs/tools/fermata.md` -- Dirigent integration plan
- `docs/architecture/fermata-security-philosophy.md` -- security philosophy and the reveal triangle
- `docs/workpad/brainstorm/fermata.md` -- full product spec and field notes
- `docs/architecture/crates.md` -- crate dependency map