# fermata

**A fast, harness-agnostic security layer for AI coding agents.**

AI coding agents read files, run commands, and inspect output as part of their normal workflow. When they read `.env`, secret values get tokenized into the LLM's context window -- and from there they can leak into commits, PR descriptions, log messages, or API calls. No AI coding agent ships built-in post-read secret filtering today. fermata fixes that.

## Why

Traditional security blocks the file and hopes the agent doesn't find the data through another path. This is insufficient -- secrets appear in shell output, log files, error messages, and indirect reads that bypass any access-control list.

fermata operates on two independent levels:

- **Policy gate** (PreToolUse) -- `.botignore` blocks reads, writes, and dangerous commands before they execute. Catches ~90% of accidental secret access.
- **Secret filtering** (PostToolUse) -- `.botsecrets` redacts secret *values* from tool output before they enter the LLM context. Catches the remaining cases regardless of how secrets appear.

The key insight: blocking a file is necessary but not sufficient. The agent can have read access to `.env` without secret values being revealed -- if the output is redacted before it reaches the model.

## Quick Start

### Install

```bash
cargo install --path . --features cli
```

### Protect a project in 30 seconds

```bash
# Block direct access to secret files
echo ".env" > .botignore

# Declare where secrets live -- fermata parses them and redacts values
cat > .botsecrets << 'EOF'
[files]
patterns = [".env", ".env.*", "secrets.*"]
EOF
```

### Wire into Claude Code

Add both hooks in `.claude/settings.json`:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash|Read|Edit|Write",
        "hooks": [
          { "type": "command", "command": "fermata hook --harness claude" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash|Read|Edit|Write",
        "hooks": [
          { "type": "command", "command": "fermata hook --harness claude --event post-tool-use" }
        ]
      }
    ]
  }
}
```

That's it. PreToolUse blocks forbidden operations. PostToolUse redacts secret values from tool output before they reach the LLM.

## How It Works

fermata interposes on every tool call in the agent's lifecycle:

```
Agent wants to run a tool
        |
   PreToolUse ── fermata checks .botignore / botignore.toml
        |  blocked? → deny with reason
        |  allowed? ↓
   Tool executes
        |
   PostToolUse ── fermata scans output for secret values
        |  found? → replace with ***** before LLM sees it
        |
   Clean output enters LLM context
```
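
The flow above can be sketched in a few lines of std-only Rust. This is a minimal illustration, not fermata's implementation: `Decision`, `pre_tool_use`, `post_tool_use`, and `guarded_call` are hypothetical names, the gate uses exact path matching instead of gitignore semantics, and the filter is a plain string replace.

```rust
/// Outcome of the PreToolUse policy check (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Decision {
    Allow,
    Deny(String),
}

/// Policy gate: deny when the path exactly matches a blocked entry.
/// Exact matching keeps the sketch short; it is not gitignore matching.
pub fn pre_tool_use(path: &str, blocked: &[&str]) -> Decision {
    if blocked.iter().any(|b| *b == path) {
        Decision::Deny(format!("{path} is blocked by policy"))
    } else {
        Decision::Allow
    }
}

/// Output filter: replace every known secret value with a mask.
pub fn post_tool_use(output: &str, secrets: &[&str]) -> String {
    let mut clean = output.to_string();
    for s in secrets.iter().filter(|s| !s.is_empty()) {
        clean = clean.replace(*s, "*****");
    }
    clean
}

/// Runs one tool call through both stages.
/// `run_tool` stands in for whatever the agent actually executes.
pub fn guarded_call(
    path: &str,
    blocked: &[&str],
    secrets: &[&str],
    run_tool: impl Fn(&str) -> String,
) -> Result<String, String> {
    match pre_tool_use(path, blocked) {
        // Blocked calls never execute, so there is no output to filter.
        Decision::Deny(reason) => Err(reason),
        // Allowed calls run, and their output is redacted before it is returned.
        Decision::Allow => Ok(post_tool_use(&run_tool(path), secrets)),
    }
}
```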

These two hooks give three layers of defense, each independent:

| Layer | Mechanism | What it catches |
|-------|-----------|-----------------|
| **Access control** | `.botignore` rules block tool calls by path | Direct reads/writes to sensitive files |
| **Known-value redaction** | `.botsecrets` declares secret files; fermata parses them and builds an Aho-Corasick automaton | Every occurrence of a declared secret value, in any tool output, regardless of source |
| **Heuristic detection** | Regex patterns from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs) | Secrets not covered by the manifest -- runtime-generated, unexpected locations |

Performance: ~1-5ms per tool call. Cold start (loading config + parsing secret files) is ~10-20ms.
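
To make the known-value layer concrete, here is a small std-only sketch of multi-pattern redaction. It is a naive stand-in, not the Aho-Corasick automaton fermata builds: it checks every secret at every position, which is fine for illustration but slower than a single automaton pass. The function name `redact_known` and the `*****` mask width are assumptions for the example.

```rust
/// Redacts every occurrence of any known secret in `text` in one
/// left-to-right pass, preferring the longest secret at each position.
/// Cost is O(text length * number of secrets); an Aho-Corasick automaton
/// finds the same matches in a single linear scan.
pub fn redact_known(text: &str, secrets: &[&str]) -> String {
    let mut out = String::with_capacity(text.len());
    let mut rest = text;
    while let Some(ch) = rest.chars().next() {
        // Longest secret that starts exactly at the current position.
        let hit = secrets
            .iter()
            .filter(|s| !s.is_empty() && rest.starts_with(**s))
            .max_by_key(|s| s.len());
        match hit {
            Some(s) => {
                out.push_str("*****");
                rest = &rest[s.len()..];
            }
            None => {
                // No secret starts here: copy one character and move on.
                out.push(ch);
                rest = &rest[ch.len_utf8()..];
            }
        }
    }
    out
}
```

Preferring the longest match matters when one secret is a prefix of another: with `abc` and `abc123` both declared, the text `abc123` is masked as a whole rather than leaving `123` behind.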

## Configuration

Three files, each optional, each solving a different problem:

### `.botignore` -- the 80% case

Gitignore syntax. Blocks both reads and writes. Onboarding is one line.

```gitignore
.env
.env.*
secrets/**
```

### `botignore.toml` -- per-operation rules

Separate namespaces so the same file can be readable but not writable:

```toml
[read]
patterns = [".env*", "secrets/**"]

[write]
patterns = ["vendor/**", "*.lock"]

[bash]
deny = ["rm -rf /", "curl * | sh"]
```
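
The per-operation idea in the config above can be modeled as one pattern list per operation. The sketch below is an assumption-laden illustration: `Op`, `Policy`, and `glob_match` are hypothetical names, `*` simply matches any run of characters (including `/`), and the `[bash]` namespace is left out. Real gitignore semantics treat `/`, `**`, and negation more carefully.

```rust
/// Matches `text` against a pattern in which `*` stands for any run of
/// characters. Simplified on purpose: no special handling of `/` or `!`.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t): (Vec<char>, Vec<char>) = (pattern.chars().collect(), text.chars().collect());
    let (mut pi, mut ti) = (0, 0);
    // (pattern index just after the last `*`, text index that `*` has reached)
    let mut backtrack: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            backtrack = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((bp, bt)) = backtrack {
            // Let the last `*` absorb one more character and retry.
            pi = bp;
            ti = bt + 1;
            backtrack = Some((bp, bt + 1));
        } else {
            return false;
        }
    }
    // Any pattern characters left over must all be `*`.
    p[pi..].iter().all(|c| *c == '*')
}

/// The operation a tool call performs on a path.
#[derive(Clone, Copy)]
pub enum Op {
    Read,
    Write,
}

/// One pattern list per operation, mirroring `[read]` and `[write]`.
pub struct Policy {
    pub read: Vec<String>,
    pub write: Vec<String>,
}

impl Policy {
    /// Returns true when `path` is blocked for the given operation.
    pub fn is_blocked(&self, op: Op, path: &str) -> bool {
        let patterns = match op {
            Op::Read => &self.read,
            Op::Write => &self.write,
        };
        patterns.iter().any(|p| glob_match(p, path))
    }
}
```

With the patterns from the TOML example, `Cargo.lock` is blocked for `Op::Write` but not for `Op::Read`, which is the "readable but not writable" case.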

### `.botsecrets` -- secret value redaction

Declares which files contain secrets. fermata parses them, extracts values, and redacts every occurrence in tool output.

```toml
[files]
patterns = [".env", ".env.*", "secrets.*"]

[keys]
include = ["STRIPE_*", "MY_APP_SIGNING_*"]

[heuristic]
enabled = true
```
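
As a rough picture of the "parse and extract values" step, the sketch below pulls values out of `.env`-style text for keys that match patterns such as `STRIPE_*` or `*_TOKEN`. Everything here is an assumption for illustration: the names `key_matches` and `secret_values` are hypothetical, only a single leading or trailing `*` is supported, and the parser handles just `KEY=value` lines, `#` comments, and surrounding quotes. fermata's own multi-format parser may behave differently.

```rust
/// Returns true when `key` matches `pattern`, where a single leading or
/// trailing `*` acts as a wildcard (for example `*_TOKEN` or `STRIPE_*`).
pub fn key_matches(pattern: &str, key: &str) -> bool {
    if let Some(suffix) = pattern.strip_prefix('*') {
        key.ends_with(suffix)
    } else if let Some(prefix) = pattern.strip_suffix('*') {
        key.starts_with(prefix)
    } else {
        pattern == key
    }
}

/// Extracts the values of matching keys from `.env`-style text.
/// Blank lines, `#` comments, and empty values are skipped.
pub fn secret_values(env_text: &str, key_patterns: &[&str]) -> Vec<String> {
    let mut values = Vec::new();
    for line in env_text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            let key = key.trim();
            // Strip surrounding single or double quotes from the value.
            let value = value.trim().trim_matches(|c: char| c == '"' || c == '\'');
            if !value.is_empty() && key_patterns.iter().any(|p| key_matches(p, key)) {
                values.push(value.to_string());
            }
        }
    }
    values
}
```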

Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.

See [docs/configuration.md](docs/configuration.md) for the full reference.

## Commands

```bash
# Check if a path is allowed
fermata check --op read /path/to/.env   # exit 1 = blocked
fermata check --op write src/main.rs    # exit 0 = allowed

# Run as a hook (reads harness JSON from stdin)
fermata hook --harness claude
fermata hook --harness claude --event post-tool-use
```

See [docs/commands.md](docs/commands.md) for the full CLI reference.

## Library API

fermata is also a Rust library:

```rust
use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};

// The `?` operators below assume an enclosing function that returns a `Result`.

// Load .botsecrets and build the redaction manifest
let config = SecretsConfig::load("/path/to/project")?;
let manifest = Manifest::discover(&config)?;

// Known-value redaction (Aho-Corasick, sub-millisecond)
let redactor = Redactor::from_manifest(&manifest);
let clean = redactor.redact("DB_PASSWORD=hunter2");
// -> "DB_PASSWORD=*****"

// Heuristic scanning (regex patterns)
let scanner = Scanner::new(&config);
let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
```
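
For intuition about what a heuristic pattern looks for, the std-only sketch below hand-rolls a single check for strings shaped like the AWS access key ID in the example above: the literal `AKIA` followed by 16 more characters from `[A-Z0-9]`. This is a simplified shape chosen for illustration, not the gitleaks-derived regex the scanner uses, and `find_aws_key_ids` is a hypothetical helper rather than part of the library.

```rust
/// Returns the byte offsets where a candidate AWS access key ID starts:
/// `AKIA` plus 16 more characters from `[A-Z0-9]`, not embedded in a
/// longer run of such characters.
pub fn find_aws_key_ids(text: &str) -> Vec<usize> {
    const PREFIX: &str = "AKIA";
    const LEN: usize = 20;
    let bytes = text.as_bytes();
    let is_key_byte = |b: u8| b.is_ascii_uppercase() || b.is_ascii_digit();
    let mut hits = Vec::new();
    let mut i = 0;
    while i + LEN <= bytes.len() {
        let window = &bytes[i..i + LEN];
        // Require a boundary on both sides so longer tokens are not flagged.
        let starts_clean = i == 0 || !is_key_byte(bytes[i - 1]);
        let ends_clean = i + LEN == bytes.len() || !is_key_byte(bytes[i + LEN]);
        if window.starts_with(PREFIX.as_bytes())
            && window.iter().all(|b| is_key_byte(*b))
            && starts_clean
            && ends_clean
        {
            hits.push(i);
            i += LEN;
        } else {
            i += 1;
        }
    }
    hits
}
```

A shape check like this can flag strings that are not real credentials, which is why heuristic findings are best treated as candidates alongside the exact known-value matches.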

## Security Model

fermata addresses a novel security concern: **reveal** -- whether secret *values* enter the LLM context. Traditional file-level access control operates on file identity (which file). Secret redaction operates on data content (which values). The reveal problem can only be solved at the data-content level.

Read [docs/security-model.md](docs/security-model.md) for the full analysis, including the Reveal Triangle and defense-in-depth architecture.

## Threat Model

fermata is a heuristic guard, not a sandbox. It defends against statistical agent behavior and prompt-driven mistakes -- not a deliberate adversary. This is a strength: the threat model is well-defined, and the boundaries are documented honestly.

Read [docs/threat-model.md](docs/threat-model.md) for what fermata catches, what it doesn't, and what to combine it with.

## Harness Support

| Harness | Status | Mechanism |
|---------|--------|-----------|
| Claude Code | Shipped | PreToolUse + PostToolUse hooks |
| Codex CLI | Planned | Pre-exec hook adapter |
| Gemini CLI | Planned | MCP server mode |
| Any MCP agent | Planned | MCP proxy wrapping existing servers |

The policy engine and redaction logic are identical across all modes. Only the I/O adapter changes.

## Status

v0.2 -- policy gate and secret filtering engine are production-ready. All core components are implemented and tested:

- `.botignore` walker with gitignore semantics
- `botignore.toml` with read/write/bash namespaces
- Claude Code PreToolUse and PostToolUse adapters
- `.botsecrets` config, manifest discovery, multi-format parser (.env, TOML, YAML, JSON)
- Aho-Corasick known-value redactor
- Heuristic scanner with gitleaks-derived patterns

## The `.botsecrets` Vision

`.botsecrets` is designed to be the **`.gitignore` of AI agent security**: a simple, declarative, human-readable file that every project can drop in to protect its secrets from AI agents.

The format is harness-agnostic from day one. It declares *what* to protect, not *how*. The same `.botsecrets` works with Claude Code, Codex, Gemini, and any future harness that supports tool lifecycle hooks.

## License

Licensed under either of [Apache License, Version 2.0](LICENSE-APACHE) or [MIT License](LICENSE-MIT) at your option.