📝 fermata: rewrite docs for public-facing export

New user-friendly README modeled after sandcage's layout (Why / Quick Start / How It Works), plus four focused docs under docs/:

- commands.md -- full CLI reference with options, exit codes, examples
- configuration.md -- .botignore, botignore.toml, .botsecrets reference
- security-model.md -- the Reveal Triangle and defense-in-depth layers
- threat-model.md -- L0-L6 coverage, honest limitations, pairing guidance

All Dirigent/monorepo internals stripped -- ready for standalone export.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-25 18:27:51 +02:00
parent 087429d275
commit 77520819f6
5 changed files with 1030 additions and 107 deletions
@@ -1,88 +1,44 @@
-# dirigent_fermata
+# fermata

-**A fast, harness-agnostic policy gate and secret filtering engine for AI coding agents.**
+**A fast, harness-agnostic security layer for AI coding agents.**

-Drop a `.botignore` to control what your agent can touch. Drop a `.botsecrets` to control what secret values your agent can see. Fermata enforces both -- before and after tool calls happen.
+AI coding agents read files, run commands, and inspect output as part of their normal workflow. When they read `.env`, secret values get tokenized into the LLM's context window -- and from there they can leak into commits, PR descriptions, log messages, or API calls. No AI coding agent ships built-in post-read secret filtering today. fermata fixes that.

---
+## Why

-## Why Fermata
+Traditional security blocks the file and hopes the agent doesn't find the data through another path. This is insufficient -- secrets appear in shell output, log files, error messages, and indirect reads that bypass any access-control list.

-AI coding agents don't have an innate sense of "don't touch `.env`" -- and even if you block the file, they can still see its contents through shell output, log files, and indirect reads. Fermata solves both problems:
+fermata operates on two independent levels:

- **Policy gate** -- `.botignore` blocks reads, writes, and dangerous commands before they execute (PreToolUse).
- **Secret filtering** -- `.botsecrets` redacts secret values from tool output before they enter the LLM context (PostToolUse).
- **Fast** -- Rust, Aho-Corasick automaton for redaction, ~1-5ms per call.
- **Familiar syntax** -- `.botignore` uses gitignore rules; `.botsecrets` uses TOML with glob patterns.
- **Harness-agnostic** -- hook adapters for Claude Code (shipped), Codex and Gemini (planned), MCP proxy (planned).
+- **Policy gate** (PreToolUse) -- `.botignore` blocks reads, writes, and dangerous commands before they execute. Catches ~90% of accidental secret access.
+- **Secret filtering** (PostToolUse) -- `.botsecrets` redacts secret *values* from tool output before they enter the LLM context. Covers the remaining cases, whichever path (shell output, log files, indirect reads) the secret takes into the output.

---
+The key insight: blocking a file is necessary but not sufficient. Redaction also works in the other direction: an agent can be allowed to read `.env` without any secret value being revealed, as long as the output is redacted before it reaches the model.

-## Status: v0.2
+## Quick Start

-| Component | Status |
-|-----------|--------|
-| Library (`Policy::check`, `Policy::check_command`) | Done |
-| `.botignore` walker (gitignore semantics) | Done |
-| `botignore.toml` parser (read / write / bash namespaces) | Done |
-| CLI: `fermata check` / `fermata hook` | Done |
-| Claude Code PreToolUse adapter | Done |
-| Claude Code PostToolUse adapter (output redaction) | Done |
-| `.botsecrets` config parser | Done |
-| Secret manifest discovery and loading | Done |
-| Multi-format secret file parser (.env, TOML, YAML, JSON) | Done |
-| `Redactor` (known-value Aho-Corasick replacement) | Done |
-| `Scanner` (heuristic regex + gitleaks patterns) | Done |
-
-Out of scope for v0.2: Codex / Gemini hook adapters, MCP proxy mode, audit log, filesystem watcher.
-
---
-
-## Install
-
-From source (this monorepo):
+### Install

 ```bash
-cargo install --path crates/dirigent_fermata --features cli
+cargo install --path . --features cli
 ```

---
+### Protect a project in 30 seconds

-## Secret Filtering
+```bash
+# Block direct access to secret files
+echo ".env" > .botignore

-Fermata's secret filtering operates in three layers:
-
-1. **Policy gate** (PreToolUse) -- `.botignore` blocks direct access to sensitive files. Catches ~90% of accidental reads.
-2. **Known-value redaction** (PostToolUse) -- `.botsecrets` declares which files contain secrets. Fermata parses them, extracts values, and replaces them in all tool output using an Aho-Corasick automaton. Zero false negatives for declared secrets.
-3. **Heuristic scanning** (PostToolUse) -- regex patterns derived from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs). Safety net for secrets not covered by the manifest.
-
-### `.botsecrets` format
-
-Create a `.botsecrets` file at your project root:
-
-```toml
-# Files that contain secrets -- fermata parses these and redacts values
+# Declare where secrets live -- fermata parses them and redacts values
+cat > .botsecrets << 'EOF'
 [files]
 patterns = [".env", ".env.*", "secrets.*"]
-
-# Additional secret key names (built-in defaults cover *_KEY, *_SECRET, etc.)
-[keys]
-include = ["STRIPE_*", "MY_APP_SIGNING_*"]
-
-# Heuristic scanning on all tool output
-[heuristic]
-enabled = true
+EOF
 ```

-That's the typical case. Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
+### Wire into Claude Code

---
-
-## Usage
-
-### Claude Code hook configuration
-
-Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
+Add both hooks in `.claude/settings.json`:

 ```json
 {
@@ -107,45 +63,43 @@ Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
 }
 ```

-PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.
+That's it. PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.

-### Checking a path
+## How It Works

-```bash
-fermata check --op read /path/to/.env
-# exit 1 -- blocked
+fermata interposes on every tool call in the agent's lifecycle:

-fermata check --op write /path/to/src/main.rs
-# exit 0 -- allowed
+```
+Agent wants to run a tool
+        |
+   PreToolUse ── fermata checks .botignore / botignore.toml
+        |            blocked? → deny with reason
+        |            allowed? ↓
+   Tool executes
+        |
+   PostToolUse ── fermata scans output for secret values
+        |            found? → replace with ***** before LLM sees it
+        |
+   Clean output enters LLM context
 ```

-### Library API
+Three layers of defense, each independent:

-```rust
-use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
+| Layer | Mechanism | What it catches |
+|-------|-----------|-----------------|
+| **Access control** | `.botignore` rules block tool calls by path | Direct reads/writes to sensitive files |
+| **Known-value redaction** | `.botsecrets` declares secret files; fermata parses them and builds an Aho-Corasick automaton | Every occurrence of a declared secret value, in any tool output, regardless of source |
+| **Heuristic detection** | Regex patterns from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs) | Secrets not covered by the manifest -- runtime-generated, unexpected locations |

-// Load .botsecrets config and build the manifest
-let config = SecretsConfig::load("/path/to/project")?;
-let manifest = Manifest::discover(&config)?;
-
-// Known-value redaction (Aho-Corasick, sub-millisecond)
-let redactor = Redactor::from_manifest(&manifest);
-let clean = redactor.redact("DB_PASSWORD=hunter2\nAPI_KEY=sk-abc123");
-// -> "DB_PASSWORD=*****\nAPI_KEY=*****"
-
-// Heuristic scanning (regex patterns)
-let scanner = Scanner::new(&config);
-let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
-// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
-```
-
---
+Performance: ~1-5ms per tool call. Cold start (loading config + parsing secret files) is ~10-20ms.

 ## Configuration

-### `.botignore` -- access control
+Three files, each optional, each solving a different problem:

-Gitignore syntax. Blocks both reads and writes.
+### `.botignore` -- the 80% case
+
+Gitignore syntax. Blocks both reads and writes. Onboarding is one line.

 ```gitignore
 .env
@@ -155,6 +109,8 @@ secrets/**

 ### `botignore.toml` -- per-operation rules

+Separate namespaces so the same file can be readable but not writable:
+
 ```toml
 [read]
 patterns = [".env*", "secrets/**"]
@@ -168,24 +124,99 @@ deny = ["rm -rf /", "curl * | sh"]

 ### `.botsecrets` -- secret value redaction

-See the Secret Filtering section above.
+Declares which files contain secrets. fermata parses them, extracts values, and redacts every occurrence in tool output.

---
+```toml
+[files]
+patterns = [".env", ".env.*", "secrets.*"]

-## Architecture
+[keys]
+include = ["STRIPE_*", "MY_APP_SIGNING_*"]

-Three concentric layers; nothing inner imports from anything outer:
+[heuristic]
+enabled = true
+```

- **`core/`** -- harness-unaware, sync. Policy types, `.botignore` walker, `botignore.toml` parser, `Policy::check`.
-  - **`core/secrets/`** -- `.botsecrets` config, manifest discovery, multi-format parser, Aho-Corasick redactor, heuristic scanner.
- **`harness/`** -- `HarnessAdapter` trait for PreToolUse (policy gate) and PostToolUse (output redaction). Each adapter is feature-gated.
- **`bin/fermata.rs`** -- `clap`, stdio, and exit codes.
+Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.

---
+See [docs/configuration.md](docs/configuration.md) for the full reference.

-## See also
+## Commands

- `docs/tools/fermata.md` -- Dirigent integration plan
- `docs/architecture/fermata-security-philosophy.md` -- security philosophy and the reveal triangle
- `docs/workpad/brainstorm/fermata.md` -- full product spec and field notes
- `docs/architecture/crates.md` -- crate dependency map
+```bash
+# Check if a path is allowed
+fermata check --op read /path/to/.env     # exit 1 = blocked
+fermata check --op write src/main.rs       # exit 0 = allowed
+
+# Run as a hook (reads harness JSON from stdin)
+fermata hook --harness claude
+fermata hook --harness claude --event post-tool-use
+```
+
+See [docs/commands.md](docs/commands.md) for the full CLI reference.
+
+## Library API
+
+fermata is also a Rust library:
+
+```rust
+use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
+
+// Load .botsecrets and build the redaction manifest
+let config = SecretsConfig::load("/path/to/project")?;
+let manifest = Manifest::discover(&config)?;
+
+// Known-value redaction (Aho-Corasick, sub-millisecond)
+let redactor = Redactor::from_manifest(&manifest);
+let clean = redactor.redact("DB_PASSWORD=hunter2");
+// -> "DB_PASSWORD=*****"
+
+// Heuristic scanning (regex patterns)
+let scanner = Scanner::new(&config);
+let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
+// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
+```
+
+## Security Model
+
+fermata addresses a novel security concern: **reveal** -- whether secret *values* enter the LLM context. Traditional file-level access control operates on file identity (which file). Secret redaction operates on data content (which values). The reveal problem can only be solved at the data-content level.
+
+Read [docs/security-model.md](docs/security-model.md) for the full analysis, including the Reveal Triangle and defense-in-depth architecture.
+
+## Threat Model
+
+fermata is a heuristic guard, not a sandbox. It defends against statistical agent behavior and prompt-driven mistakes -- not a deliberate adversary. This is a strength: the threat model is well-defined, and the boundaries are documented honestly.
+
+Read [docs/threat-model.md](docs/threat-model.md) for what fermata catches, what it doesn't, and what to combine it with.
+
+## Harness Support
+
+| Harness | Status | Mechanism |
+|---------|--------|-----------|
+| Claude Code | Shipped | PreToolUse + PostToolUse hooks |
+| Codex CLI | Planned | Pre-exec hook adapter |
+| Gemini CLI | Planned | MCP server mode |
+| Any MCP agent | Planned | MCP proxy wrapping existing servers |
+
+The policy engine and redaction logic are identical across all modes. Only the I/O adapter changes.
+
+## Status
+
+v0.2 -- the policy gate and the secret filtering engine are production-ready. All core components are implemented and tested:
+
+- `.botignore` walker with gitignore semantics
+- `botignore.toml` with read/write/bash namespaces
+- Claude Code PreToolUse and PostToolUse adapters
+- `.botsecrets` config, manifest discovery, multi-format parser (.env, TOML, YAML, JSON)
+- Aho-Corasick known-value redactor
+- Heuristic scanner with gitleaks-derived patterns
+
+## The `.botsecrets` Vision
+
+`.botsecrets` is designed to be the **`.gitignore` of AI agent security**: a simple, declarative, human-readable file that every project can drop in to protect its secrets from AI agents.
+
+The format is harness-agnostic from day one. It declares *what* to protect, not *how*. The same `.botsecrets` works with Claude Code, Codex, Gemini, and any future harness that supports tool lifecycle hooks.
+
+## License
+
+Licensed under either of [Apache License, Version 2.0](LICENSE-APACHE) or [MIT License](LICENSE-MIT) at your option.
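
The known-value redaction layer described in the README above can be illustrated with a small, std-only Rust sketch. It is not fermata's implementation: the README says fermata builds an Aho-Corasick automaton, whereas this sketch swaps in a naive multi-pass `str::replace`, which gives the same result on small inputs but scales worse. The helpers `parse_env` and `redact` are hypothetical names introduced only for this example, and the `.env` parsing ignores most quoting and escaping rules.

```rust
use std::collections::HashMap;

/// Hypothetical helper: parse `KEY=value` lines from .env-style text.
/// Skips blank lines and `#` comments; strips surrounding double quotes.
fn parse_env(text: &str) -> HashMap<String, String> {
    let mut out = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let value = value.trim().trim_matches('"');
            // Empty values are skipped: replacing "" would corrupt the output.
            if !value.is_empty() {
                out.insert(key.trim().to_string(), value.to_string());
            }
        }
    }
    out
}

/// Hypothetical helper: mask every occurrence of each declared secret value.
/// Naive stand-in for an Aho-Corasick automaton (one pass per secret).
fn redact(output: &str, secrets: &HashMap<String, String>) -> String {
    let mut values: Vec<&str> = secrets.values().map(String::as_str).collect();
    // Longest first, so a secret that contains a shorter one is masked whole.
    values.sort_by(|a, b| b.len().cmp(&a.len()));
    let mut clean = output.to_string();
    for value in values {
        clean = clean.replace(value, "*****");
    }
    clean
}

fn main() {
    let env = "DB_PASSWORD=hunter2\nAPI_KEY=\"sk-abc123\"\n# a comment\n";
    let secrets = parse_env(env);
    // The secret appears in tool output, not in the file that declared it.
    let tool_output = "connecting with password hunter2; key=sk-abc123";
    println!("{}", redact(tool_output, &secrets));
}
```

The point of the sketch is that redaction keys on the secret's value rather than on the file it came from, so the same masking applies whether the value shows up in a file read, shell output, or a log line.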