📝 fermata: rewrite docs for public-facing export

New user-friendly README modeled after sandcage's layout (Why / Quick Start /
How It Works), plus four focused docs under docs/:

- commands.md — full CLI reference with options, exit codes, examples
- configuration.md — .botignore, botignore.toml, .botsecrets reference
- security-model.md — the Reveal Triangle and defense-in-depth layers
- threat-model.md — L0-L6 coverage, honest limitations, pairing guidance

All Dirigent/monorepo internals stripped — ready for standalone export.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-25 18:27:51 +02:00
parent 087429d275
commit 77520819f6
5 changed files with 1030 additions and 107 deletions
+138 -107
@@ -1,88 +1,44 @@
# dirigent_fermata
# fermata
**A fast, harness-agnostic policy gate and secret filtering engine for AI coding agents.**
**A fast, harness-agnostic security layer for AI coding agents.**
Drop a `.botignore` to control what your agent can touch. Drop a `.botsecrets` to control what secret values your agent can see. Fermata enforces both -- before and after tool calls happen.
AI coding agents read files, run commands, and inspect output as part of their normal workflow. When they read `.env`, secret values get tokenized into the LLM's context window -- and from there they can leak into commits, PR descriptions, log messages, or API calls. No AI coding agent ships built-in post-read secret filtering today. fermata fixes that.
---
## Why
## Why Fermata
Traditional security blocks the file and hopes the agent doesn't find the data through another path. This is insufficient -- secrets appear in shell output, log files, error messages, and indirect reads that bypass any access-control list.
AI coding agents don't have an innate sense of "don't touch `.env`" -- and even if you block the file, they can still see its contents through shell output, log files, and indirect reads. Fermata solves both problems:
fermata operates on two independent levels:
- **Policy gate** -- `.botignore` blocks reads, writes, and dangerous commands before they execute (PreToolUse).
- **Secret filtering** -- `.botsecrets` redacts secret values from tool output before they enter the LLM context (PostToolUse).
- **Fast** -- Rust, Aho-Corasick automaton for redaction, ~1-5ms per call.
- **Familiar syntax** -- `.botignore` uses gitignore rules; `.botsecrets` uses TOML with glob patterns.
- **Harness-agnostic** -- hook adapters for Claude Code (shipped), Codex and Gemini (planned), MCP proxy (planned).
- **Policy gate** (PreToolUse) -- `.botignore` blocks reads, writes, and dangerous commands before they execute. Catches ~90% of accidental secret access.
- **Secret filtering** (PostToolUse) -- `.botsecrets` redacts secret *values* from tool output before they enter the LLM context. Catches the remaining cases regardless of how secrets appear.
---
The key insight: blocking a file is not sufficient on its own. With redaction in place, the agent can even have read access to `.env` without secret values being revealed, because the output is redacted before it reaches the model.
## Status: v0.2
## Quick Start
| Component | Status |
|-----------|--------|
| Library (`Policy::check`, `Policy::check_command`) | Done |
| `.botignore` walker (gitignore semantics) | Done |
| `botignore.toml` parser (read / write / bash namespaces) | Done |
| CLI: `fermata check` / `fermata hook` | Done |
| Claude Code PreToolUse adapter | Done |
| Claude Code PostToolUse adapter (output redaction) | Done |
| `.botsecrets` config parser | Done |
| Secret manifest discovery and loading | Done |
| Multi-format secret file parser (.env, TOML, YAML, JSON) | Done |
| `Redactor` (known-value Aho-Corasick replacement) | Done |
| `Scanner` (heuristic regex + gitleaks patterns) | Done |
Out of scope for v0.2: Codex / Gemini hook adapters, MCP proxy mode, audit log, filesystem watcher.
---
## Install
From source (this monorepo):
### Install
```bash
cargo install --path crates/dirigent_fermata --features cli
cargo install --path . --features cli
```
---
### Protect a project in 30 seconds
## Secret Filtering
```bash
# Block direct access to secret files
echo ".env" > .botignore
Fermata's secret filtering operates in three layers:
1. **Policy gate** (PreToolUse) -- `.botignore` blocks direct access to sensitive files. Catches ~90% of accidental reads.
2. **Known-value redaction** (PostToolUse) -- `.botsecrets` declares which files contain secrets. Fermata parses them, extracts values, and replaces them in all tool output using an Aho-Corasick automaton. Zero false negatives for declared secrets.
3. **Heuristic scanning** (PostToolUse) -- regex patterns derived from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs). Safety net for secrets not covered by the manifest.
### `.botsecrets` format
Create a `.botsecrets` file at your project root:
```toml
# Files that contain secrets -- fermata parses these and redacts values
# Declare where secrets live -- fermata parses them and redacts values
cat > .botsecrets << 'EOF'
[files]
patterns = [".env", ".env.*", "secrets.*"]
# Additional secret key names (built-in defaults cover *_KEY, *_SECRET, etc.)
[keys]
include = ["STRIPE_*", "MY_APP_SIGNING_*"]
# Heuristic scanning on all tool output
[heuristic]
enabled = true
EOF
```
That's the typical case. Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
### Wire into Claude Code
---
## Usage
### Claude Code hook configuration
Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
Add both hooks in `.claude/settings.json`:
```json
{
@@ -107,45 +63,43 @@ Add both PreToolUse and PostToolUse hooks in `.claude/settings.json`:
}
```
PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.
That's it. PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.
### Checking a path
## How It Works
```bash
fermata check --op read /path/to/.env
# exit 1 -- blocked
fermata interposes on every tool call in the agent's lifecycle:
fermata check --op write /path/to/src/main.rs
# exit 0 -- allowed
```
Agent wants to run a tool
|
PreToolUse ── fermata checks .botignore / botignore.toml
| blocked? → deny with reason
| allowed? ↓
Tool executes
|
PostToolUse ── fermata scans output for secret values
| found? → replace with ***** before LLM sees it
|
Clean output enters LLM context
```
### Library API
Three layers of defense, each independent:
```rust
use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
| Layer | Mechanism | What it catches |
|-------|-----------|-----------------|
| **Access control** | `.botignore` rules block tool calls by path | Direct reads/writes to sensitive files |
| **Known-value redaction** | `.botsecrets` declares secret files; fermata parses them and builds an Aho-Corasick automaton | Every occurrence of a declared secret value, in any tool output, regardless of source |
| **Heuristic detection** | Regex patterns from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs) | Secrets not covered by the manifest -- runtime-generated, unexpected locations |
// Load .botsecrets config and build the manifest
let config = SecretsConfig::load("/path/to/project")?;
let manifest = Manifest::discover(&config)?;
// Known-value redaction (Aho-Corasick, sub-millisecond)
let redactor = Redactor::from_manifest(&manifest);
let clean = redactor.redact("DB_PASSWORD=hunter2\nAPI_KEY=sk-abc123");
// -> "DB_PASSWORD=*****\nAPI_KEY=*****"
// Heuristic scanning (regex patterns)
let scanner = Scanner::new(&config);
let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
```
---
Performance: ~1-5ms per tool call. Cold start (loading config + parsing secret files) is ~10-20ms.
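
The two hook stages can be illustrated with a minimal, standard-library-only sketch. This is not fermata's implementation: `pre_tool_use`, `post_tool_use`, and `Decision` are hypothetical names, the path check only compares file names rather than applying gitignore semantics, and redaction uses repeated `str::replace` as a stand-in for the single-pass Aho-Corasick automaton.

```rust
/// Outcome of the pre-tool-use policy check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny(String),
}

/// Stand-in for the `.botignore` check: deny any path whose final component
/// exactly equals a blocked name. Real gitignore semantics (globs, negation,
/// directory rules) are much richer than this.
fn pre_tool_use(path: &str, blocked_names: &[&str]) -> Decision {
    let file_name = path.rsplit('/').next().unwrap_or(path);
    if blocked_names.contains(&file_name) {
        Decision::Deny(format!("{path} is blocked by .botignore"))
    } else {
        Decision::Allow
    }
}

/// Stand-in for known-value redaction: replace every verbatim occurrence of
/// each declared secret with a fixed mask.
fn post_tool_use(output: &str, secrets: &[&str]) -> String {
    // Longest first, so a secret that contains a shorter one is masked whole.
    let mut ordered: Vec<&str> = secrets
        .iter()
        .copied()
        .filter(|s| !s.is_empty())
        .collect();
    ordered.sort_by_key(|s| std::cmp::Reverse(s.len()));

    let mut clean = output.to_string();
    for secret in ordered {
        clean = clean.replace(secret, "*****");
    }
    clean
}
```

Because the second stage matches on values rather than on file identity, it masks a secret wherever it shows up, whether the output came from a file read, a shell command, or a log. It only catches verbatim occurrences; an encoded or split value would need the heuristic layer.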
## Configuration
### `.botignore` -- access control
Three files, each optional, each solving a different problem:
Gitignore syntax. Blocks both reads and writes.
### `.botignore` -- the 80% case
Gitignore syntax. Blocks both reads and writes. Onboarding is one line.
```gitignore
.env
@@ -155,6 +109,8 @@ secrets/**
### `botignore.toml` -- per-operation rules
Separate namespaces so the same file can be readable but not writable:
```toml
[read]
patterns = [".env*", "secrets/**"]
@@ -168,24 +124,99 @@ deny = ["rm -rf /", "curl * | sh"]
### `.botsecrets` -- secret value redaction
See the Secret Filtering section above.
Declares which files contain secrets. fermata parses them, extracts values, and redacts every occurrence in tool output.
---
```toml
[files]
patterns = [".env", ".env.*", "secrets.*"]
## Architecture
[keys]
include = ["STRIPE_*", "MY_APP_SIGNING_*"]
Three concentric layers; nothing inner imports from anything outer:
[heuristic]
enabled = true
```
- **`core/`** -- harness-unaware, sync. Policy types, `.botignore` walker, `botignore.toml` parser, `Policy::check`.
- **`core/secrets/`** -- `.botsecrets` config, manifest discovery, multi-format parser, Aho-Corasick redactor, heuristic scanner.
- **`harness/`** -- `HarnessAdapter` trait for PreToolUse (policy gate) and PostToolUse (output redaction). Each adapter is feature-gated.
- **`bin/fermata.rs`** -- `clap`, stdio, and exit codes.
Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
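
To make the key patterns concrete, here is a rough sketch of how a name such as `API_KEY` could be tested against patterns like `*_KEY` or `STRIPE_*`. It assumes `*` is the only wildcard and that matching is case-sensitive; fermata's actual glob rules are not specified here, and `key_matches` and `is_secret_key` are hypothetical helpers.

```rust
/// Returns true if `name` matches `pattern`, where `*` matches any run of
/// characters (including none) and every other character matches itself.
fn key_matches(pattern: &str, name: &str) -> bool {
    let parts: Vec<&str> = pattern.split('*').collect();
    if parts.len() == 1 {
        // No wildcard in the pattern: require an exact match.
        return pattern == name;
    }

    let first = parts[0];
    let last = parts[parts.len() - 1];
    // The literal prefix and suffix must both fit without overlapping.
    if name.len() < first.len() + last.len()
        || !name.starts_with(first)
        || !name.ends_with(last)
    {
        return false;
    }

    // Literals between wildcards must appear, in order, in the middle part.
    let mut rest = &name[first.len()..name.len() - last.len()];
    for &part in &parts[1..parts.len() - 1] {
        match rest.find(part) {
            Some(idx) => rest = &rest[idx + part.len()..],
            None => return false,
        }
    }
    true
}

/// True if `name` matches at least one of the given patterns.
fn is_secret_key(name: &str, patterns: &[&str]) -> bool {
    patterns.iter().any(|p| key_matches(p, name))
}
```

Under these assumptions, `is_secret_key("API_KEY", &["*_KEY"])` is true while a name like `LOG_LEVEL` matches none of the built-in patterns listed above, so its value would be left alone by known-value redaction.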
---
See [docs/configuration.md](docs/configuration.md) for the full reference.
## See also
## Commands
- `docs/tools/fermata.md` -- Dirigent integration plan
- `docs/architecture/fermata-security-philosophy.md` -- security philosophy and the reveal triangle
- `docs/workpad/brainstorm/fermata.md` -- full product spec and field notes
- `docs/architecture/crates.md` -- crate dependency map
```bash
# Check if a path is allowed
fermata check --op read /path/to/.env # exit 1 = blocked
fermata check --op write src/main.rs # exit 0 = allowed
# Run as a hook (reads harness JSON from stdin)
fermata hook --harness claude
fermata hook --harness claude --event post-tool-use
```
See [docs/commands.md](docs/commands.md) for the full CLI reference.
## Library API
fermata is also a Rust library:
```rust
use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
// Load .botsecrets and build the redaction manifest
let config = SecretsConfig::load("/path/to/project")?;
let manifest = Manifest::discover(&config)?;
// Known-value redaction (Aho-Corasick, sub-millisecond)
let redactor = Redactor::from_manifest(&manifest);
let clean = redactor.redact("DB_PASSWORD=hunter2");
// -> "DB_PASSWORD=*****"
// Heuristic scanning (regex patterns)
let scanner = Scanner::new(&config);
let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
```
## Security Model
fermata addresses a novel security concern: **reveal** -- whether secret *values* enter the LLM context. Traditional file-level access control operates on file identity (which file). Secret redaction operates on data content (which values). The reveal problem can only be solved at the data-content level.
Read [docs/security-model.md](docs/security-model.md) for the full analysis, including the Reveal Triangle and defense-in-depth architecture.
## Threat Model
fermata is a heuristic guard, not a sandbox. It defends against statistical agent behavior and prompt-driven mistakes -- not a deliberate adversary. This is a strength: the threat model is well-defined, and the boundaries are documented honestly.
Read [docs/threat-model.md](docs/threat-model.md) for what fermata catches, what it doesn't, and what to combine it with.
## Harness Support
| Harness | Status | Mechanism |
|---------|--------|-----------|
| Claude Code | Shipped | PreToolUse + PostToolUse hooks |
| Codex CLI | Planned | Pre-exec hook adapter |
| Gemini CLI | Planned | MCP server mode |
| Any MCP agent | Planned | MCP proxy wrapping existing servers |
The policy engine and redaction logic are identical across all modes. Only the I/O adapter changes.
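
The split between a shared engine and a thin I/O adapter might look roughly like the sketch below. Everything in it is illustrative: the `Adapter` trait, `ToolCall`, `Verdict`, and the tab-separated wire format are invented for this example and do not reflect fermata's actual adapter trait or any real harness protocol.

```rust
/// Harness-independent view of a tool call (hypothetical shape).
struct ToolCall {
    tool: String,
    path: String,
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Allow,
    Deny(String),
}

/// Hypothetical adapter interface: only parsing the input and rendering the
/// verdict differ per harness.
trait Adapter {
    fn parse(&self, raw: &str) -> Option<ToolCall>;
    fn render(&self, verdict: &Verdict) -> String;
}

/// Toy adapter for a made-up "tool<TAB>path" wire format.
struct TabAdapter;

impl Adapter for TabAdapter {
    fn parse(&self, raw: &str) -> Option<ToolCall> {
        let (tool, path) = raw.trim_end().split_once('\t')?;
        Some(ToolCall {
            tool: tool.to_string(),
            path: path.to_string(),
        })
    }

    fn render(&self, verdict: &Verdict) -> String {
        match verdict {
            Verdict::Allow => "allow".to_string(),
            Verdict::Deny(reason) => format!("deny: {reason}"),
        }
    }
}

/// Shared policy logic, identical for every adapter. The rule used here
/// (deny anything under `secrets/`) is a placeholder.
fn evaluate(call: &ToolCall) -> Verdict {
    if call.path.starts_with("secrets/") {
        Verdict::Deny(format!("{} may not access {}", call.tool, call.path))
    } else {
        Verdict::Allow
    }
}

/// Glue: decode with the adapter, decide with the shared engine, encode.
fn handle(adapter: &dyn Adapter, raw: &str) -> String {
    match adapter.parse(raw) {
        Some(call) => adapter.render(&evaluate(&call)),
        // Failing closed on unparseable input is a choice made by this sketch.
        None => adapter.render(&Verdict::Deny("unparseable input".to_string())),
    }
}
```

Supporting another harness in this layout means writing one more `Adapter` implementation; `evaluate` stays untouched, which is the property the table above relies on.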
## Status
v0.2 -- policy gate and secret filtering engine are production-ready. All core components are implemented and tested:
- `.botignore` walker with gitignore semantics
- `botignore.toml` with read/write/bash namespaces
- Claude Code PreToolUse and PostToolUse adapters
- `.botsecrets` config, manifest discovery, multi-format parser (.env, TOML, YAML, JSON)
- Aho-Corasick known-value redactor
- Heuristic scanner with gitleaks-derived patterns
## The `.botsecrets` Vision
`.botsecrets` is designed to be the **`.gitignore` of AI agent security**: a simple, declarative, human-readable file that every project can drop in to protect its secrets from AI agents.
The format is harness-agnostic from day one. It declares *what* to protect, not *how*. The same `.botsecrets` works with Claude Code, Codex, Gemini, and any future harness that supports tool lifecycle hooks.
## License
Licensed under either of [Apache License, Version 2.0](LICENSE-APACHE) or [MIT License](LICENSE-MIT) at your option.