🛰️ export standalone-repo assets (c86caab7)

2026-05-29 18:19:22 +02:00
parent 168aefd415
commit ebd5abeac4
5 changed files with 288 additions and 219 deletions
@@ -1,35 +1,60 @@
-# fermata
+# 𝄐 fermata

-**A fast, harness-agnostic security layer for AI coding agents.**
+**The security layer for AI coding agents.**

-AI coding agents read files, run commands, and inspect output as part of their normal workflow. When they read `.env`, secret values get tokenized into the LLM's context window -- and from there they can leak into commits, PR descriptions, log messages, or API calls. The solution is not blocking the read -- the agent needs to see config structure and key names to reason about your project. The solution is **redacting secret values from the output before they reach the model**. No AI coding agent ships built-in post-read secret filtering today. fermata fixes that.
+AI coding agents read files, run shell commands, and inspect output as part of normal work. When they read `.env`, the secret values get tokenized into the LLM's context window. From there, they can leak into commits, PR descriptions, or API calls the agent makes. Once a value is in the context, the secret is irrecoverably revealed.

-## Why
+fermata sits between the agent and its tools. It blocks operations that shouldn't happen, and scrubs secret values from the output of operations that should.

-Blocking reads is the wrong approach. The agent needs to see file structure. It needs to know which keys exist in `.env`, what your database config looks like, how your secrets are organized. What it does *not* need to see is the actual secret values. An agent can have full read access to `.env` without secret values being revealed -- if the output is redacted before it reaches the model.
+> [!CAUTION]
+> **Alpha software.** Fermata is functional and in daily use by the author, but not widely tested across diverse environments. The core library and Claude Code hook adapters are production-grade; other features are earlier in maturity. Expect rough edges and breaking changes.

-fermata operates on two independent levels:
+---

- **Secret filtering** (PostToolUse) -- `.botsecrets` declares where secrets live; fermata parses them, builds an Aho-Corasick automaton, and redacts secret *values* from tool output before they enter the LLM context. This is the primary defense. It catches secrets regardless of how they appear -- direct reads, shell output, log files, error messages.
- **Policy gate** (PreToolUse) -- `.botsecrets [policy]` / `.botignore` blocks dangerous writes and destructive commands before they execute. Supplementary protection for write safety and anti-jailbreak.
+## The Problem

-The key insight: file-level access control operates on file identity (*which file*). Secret redaction operates on data content (*which values*). The reveal problem can only be solved at the data-content level.
+Traditional security blocks the file. But secrets also appear in shell output, log files, error messages, environment variable dumps, and indirect reads that bypass any access-control list.

-> **Note:** fermata also accepts `fermata.toml` as an alias for `.botsecrets` (same format, `.botsecrets` takes priority when both exist).
+<p align="center">
+  <img src="threat-landscape.svg" alt="Where secrets leak from — blocking the file is necessary but not sufficient" width="720">
+</p>
+
+The actual concern is not "can the agent open this file?" but "do secret *values* enter the LLM context?" An agent can have read access to `.env` without the secret values being revealed — if the output is redacted before it reaches the model.
+
+---
+
+## How It Works
+
+fermata interposes on the tool lifecycle at two points:
+
+<p align="center">
+  <img src="interception-flow.svg" alt="How fermata intercepts — PreToolUse blocks, PostToolUse redacts" width="720">
+</p>
+
+**PreToolUse** — Before the tool executes, fermata checks `.botsecrets [policy]` and `.botignore` rules against the operation. A blocked write never happens. A blocked command never runs. Most harnesses already handle basic file blocking, but fermata catches stragglers and works in permissive/yolo modes too.
+
+**PostToolUse** — After the tool executes, fermata scans the output for secret values. Declared secrets (loaded from files matched by `.botsecrets`) are replaced using an Aho-Corasick automaton, which finds every verbatim occurrence of a declared value in sub-millisecond time. A secondary heuristic scan catches undeclared secrets that match known formats (AWS keys, JWTs, GitHub PATs, database URLs). This is the primary defense layer.
+
+This means `source .env && echo $DB_PASSWORD` is caught even though no file read was blocked — the secret value itself is scrubbed from the output before the LLM ever sees it.
+
+---
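The known-value redaction step described above can be illustrated with a minimal sketch. This is not fermata's implementation: it substitutes a naive replacement loop over `str::replace` for the Aho-Corasick automaton, and the function name `redact_known_values` and the `*****` mask are assumptions made for illustration.

```rust
/// Replace every verbatim occurrence of each declared secret with a mask.
/// Hypothetical helper: a naive stand-in for an Aho-Corasick automaton.
fn redact_known_values(output: &str, secrets: &[&str], mask: &str) -> String {
    // Drop empty strings: `str::replace` with an empty pattern would insert
    // the mask between every character of the output.
    let mut ordered: Vec<&str> = secrets
        .iter()
        .copied()
        .filter(|s| !s.is_empty())
        .collect();

    // Process longer secrets first, so a secret that contains another
    // secret as a substring is masked as a whole.
    ordered.sort_by(|a, b| b.len().cmp(&a.len()));

    let mut clean = output.to_string();
    for secret in ordered {
        clean = clean.replace(secret, mask);
    }
    clean
}
```

Calling `redact_known_values("DB_PASSWORD=hunter2", &["hunter2"], "*****")` returns `DB_PASSWORD=*****`. The match is on the value, not on the file it came from, so the output of `echo $DB_PASSWORD` would be masked in the same way. A value that never appears verbatim in a single output (for example, one split across several tool calls) passes through unchanged.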

 ## Quick Start

 ### Install

 ```bash
-cargo install --path . --features cli
+cargo install --git https://git.g4b.org/dirigence/fermata --features cli
 ```

-### Protect a project in 30 seconds
+Requires a working [Rust toolchain](https://rustup.rs).

-```bash
-# Declare where secrets live -- fermata parses them and redacts values from agent output
-cat > .botsecrets << 'EOF'
+### Protect a project
+
+Create a `.botsecrets` file at your project root — the primary (and usually only) config you need:
+
+```toml
+# .botsecrets
 [files]
 patterns = [".env", ".env.*", "secrets.*"]

@@ -38,10 +63,21 @@ patterns = [".claude/**", "vendor/**", "*.lock"]

 [policy.bash]
 deny = ["rm -rf /", "curl * | sh"]
-EOF
 ```

-One file. The agent can read `.env` freely -- fermata redacts the secret values from the output before they reach the model. Write protection and bash safety rules live in the same `.botsecrets` under `[policy]`.
+One file. The agent can read `.env` freely — fermata redacts the secret values from the output before they reach the model. Write protection and bash safety rules live in the same `.botsecrets` under `[policy]`.
+
+fermata ships with built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, and ~25 more) that cover the common cases automatically.
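As a rough illustration of how key patterns such as `*_KEY` or `DATABASE_URL` could select which entries count as secrets, here is a small sketch. The function names `key_matches` and `is_secret_key` are hypothetical, and the matching rules (a single `*` wildcard, case-sensitive comparison) are assumptions; fermata's actual matcher may behave differently.

```rust
/// Match a key name against a pattern containing at most one `*` wildcard,
/// where `*` stands for any run of characters (including none).
/// Hypothetical helper; comparison is case-sensitive by assumption.
fn key_matches(pattern: &str, key: &str) -> bool {
    match pattern.split_once('*') {
        // No wildcard: the pattern must equal the key exactly.
        None => pattern == key,
        // One wildcard: the key must start with the text before `*`, end
        // with the text after it, and be long enough that the two parts
        // do not overlap.
        Some((prefix, suffix)) => {
            key.len() >= prefix.len() + suffix.len()
                && key.starts_with(prefix)
                && key.ends_with(suffix)
        }
    }
}

/// Return true if any pattern in the list matches the key name.
fn is_secret_key(patterns: &[&str], key: &str) -> bool {
    patterns.iter().any(|p| key_matches(p, key))
}
```

With the patterns `["*_KEY", "*_TOKEN", "DATABASE_URL"]`, the key `GITHUB_TOKEN` is treated as a secret while `PORT` is not. Only the values of matching keys would then feed the known-value redactor.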
+
+> **Note:** fermata also accepts `fermata.toml` as an alias for `.botsecrets` (same format, `.botsecrets` takes priority when both exist).
+
+Optionally, add a `.botignore` for simple path blocking using gitignore syntax:
+
+```gitignore
+# .botignore (optional — complements .botsecrets)
+vendor/
+*.lock
+```

 ### Wire into Claude Code

@@ -70,153 +106,89 @@ Add both hooks in `.claude/settings.json`:
 }
 ```

-That's it. PostToolUse redacts secret values from tool output before they reach the LLM. PreToolUse blocks forbidden writes and dangerous commands.
+PostToolUse redacts secret values from output before they reach the LLM. PreToolUse blocks forbidden writes and dangerous commands.

-## How It Works
+---

-fermata interposes on every tool call in the agent's lifecycle:
+## What Fermata Does Not Do

-```
-Agent wants to run a tool
-        |
-   PreToolUse ── .botsecrets [policy] / .botignore
-        |            write blocked? → deny
-        |            bash denied? → deny
-        |            otherwise → allow (including reads of .env!)
-        |
-   Tool executes
-        |
-   PostToolUse ── .botsecrets [files] + [keys] + [heuristic]
-        |            secret values found? → redact before LLM sees it
-        |
-   Clean output enters LLM context
-```
+fermata is a heuristic guard, not a sandbox. It defends against statistical agent behavior — the unguided LLM reaching for `.env`, the overly-broad glob, the stray `cat` of a credential file. It does not defend against a deliberate adversary trying to escape the box.

-Three layers of defense, each independent:
+Things fermata cannot catch:

-| Layer | Mechanism | What it catches |
-|-------|-----------|-----------------|
-| **Known-value redaction** | `.botsecrets` declares secret files; fermata parses them and builds an Aho-Corasick automaton | Every occurrence of a declared secret value, in any tool output, regardless of source |
-| **Heuristic detection** | Regex patterns from gitleaks detect undeclared secrets (AWS keys, JWTs, GitHub PATs, database URLs) | Secrets not covered by the manifest -- runtime-generated, unexpected locations |
-| **Access control** | `.botsecrets [policy]` / `.botignore` rules block writes and dangerous commands | Destructive writes, anti-jailbreak (agent modifying its own hooks), dangerous shell commands |
+- **Network exfiltration** — an agent sending secrets via `curl` or `git push`. Use network-level controls (firewall, container networking) for this.
+- **Kernel-level file access** — a process bypassing tool hooks entirely. Use container isolation or a sandbox for hard filesystem boundaries.
+- **Character-by-character reconstruction** — an adversarial agent reassembling a secret across multiple tool calls.

-Performance: ~1-5ms per tool call. Cold start (loading config + parsing secret files) is ~10-20ms.
+These are honest boundaries, not future promises. See [docs/threat-model.md](docs/threat-model.md) for the full analysis.
+
+---

 ## Configuration

-### `.botsecrets` -- the primary (and usually only) config
+**`.botsecrets`** is the primary configuration file. It declares which files contain secrets (`[files]`), how to redact them (`[redaction]`), and optionally embeds access-control policy (`[policy.write]`, `[policy.bash]`, `[policy.read]`). Most projects need only this file. `.botsecrets` can do everything `.botignore` can and more.

-`.botsecrets` is the unified configuration file. It declares both what to redact and what to restrict:
+**`.botignore`** uses gitignore syntax to block reads and writes. Useful for monorepo subtree exclusion or teams that prefer gitignore syntax for simple path blocking. Complements `.botsecrets` but is not required.

-```toml
-[files]
-patterns = [".env", ".env.*", "secrets.*"]
+See [docs/configuration.md](docs/configuration.md) for the full reference with examples.

-[keys]
-include = ["STRIPE_*", "MY_APP_SIGNING_*"]
+---

-[heuristic]
-enabled = true
+## Status

-# Access control: write protection and bash safety.
-# Reading secret-containing files is allowed -- Layer 1 redacts the values.
+v0.2 — secret filtering engine and policy gate are production-ready:

-[policy.write]
-patterns = [".claude/**", "vendor/**", "*.lock"]
+| Component | Status | Maturity |
+|-----------|--------|----------|
+| `.botsecrets` config + `[policy]` section | Done | production |
+| `.botignore` walker (gitignore semantics) | Done | production |
+| Known-value redactor (Aho-Corasick) | Done | production |
+| Heuristic scanner (gitleaks-derived patterns) | Done | production |
+| Multi-format secret parser (.env, TOML, YAML, JSON) | Done | production |
+| Claude Code PreToolUse + PostToolUse adapters | Done | production |
+| CLI: `fermata check` and `fermata hook` | Done | production |

-[policy.bash]
-deny = ["rm -rf /", "curl * | sh"]
-```
+Out of scope for v0.2: Codex / Gemini hook adapters, MCP server mode, audit log, filesystem watcher.

-Built-in key patterns (`*_KEY`, `*_SECRET`, `*_PASSWORD`, `*_TOKEN`, `DATABASE_URL`, etc.) handle most projects without custom configuration.
-
-### `.botignore` -- optional simple layer
-
-Gitignore syntax. For projects that want a minimal, familiar format for write protection. Complements `.botsecrets` but is not required.
-
-```gitignore
-vendor/**
-*.lock
-```
-
-See [docs/configuration.md](docs/configuration.md) for the full reference.
-
-## Commands
-
-```bash
-# Check if a path is allowed
-fermata check --op read /path/to/.env     # exit 1 = blocked
-fermata check --op write src/main.rs       # exit 0 = allowed
-
-# Run as a hook (reads harness JSON from stdin)
-fermata hook --harness claude
-fermata hook --harness claude --event post-tool-use
-```
-
-See [docs/commands.md](docs/commands.md) for the full CLI reference.
-
-## Library API
-
-fermata is also a Rust library:
-
-```rust
-use dirigent_fermata::core::secrets::{Manifest, Redactor, Scanner, SecretsConfig};
-
-// Load .botsecrets and build the redaction manifest
-let config = SecretsConfig::load("/path/to/project")?;
-let manifest = Manifest::discover(&config)?;
-
-// Known-value redaction (Aho-Corasick, sub-millisecond)
-let redactor = Redactor::from_manifest(&manifest);
-let clean = redactor.redact("DB_PASSWORD=hunter2");
-// -> "DB_PASSWORD=*****"
-
-// Heuristic scanning (regex patterns)
-let scanner = Scanner::new(&config);
-let findings = scanner.scan("Found key: AKIA1234567890ABCDEF");
-// -> [Finding { pattern: "AWS Access Key", confidence: High, .. }]
-```
-
-## Security Model
-
-fermata addresses a novel security concern: **reveal** -- whether secret *values* enter the LLM context. Traditional file-level access control operates on file identity (which file). Secret redaction operates on data content (which values). The reveal problem can only be solved at the data-content level.
-
-Read [docs/security-model.md](docs/security-model.md) for the full analysis, including the Reveal Triangle and defense-in-depth architecture.
-
-## Threat Model
-
-fermata is a heuristic guard, not a sandbox. It defends against statistical agent behavior and prompt-driven mistakes -- not a deliberate adversary. This is a strength: the threat model is well-defined, and the boundaries are documented honestly.
-
-Read [docs/threat-model.md](docs/threat-model.md) for what fermata catches, what it doesn't, and what to combine it with.
+---

 ## Harness Support

 | Harness | Status | Mechanism |
 |---------|--------|-----------|
 | Claude Code | Shipped | PreToolUse + PostToolUse hooks |
-| Codex CLI | Planned | Pre-exec hook adapter |
+| Codex CLI | Planned | Hook adapter |
 | Gemini CLI | Planned | MCP server mode |
 | Any MCP agent | Planned | MCP proxy wrapping existing servers |
+| Any shell-based hook | Supported | CLI exit codes |

 The policy engine and redaction logic are identical across all modes. Only the I/O adapter changes.

-## Status
+---

-v0.2 -- secret filtering engine and policy gate are production-ready. All core components are implemented and tested:
+## Background

- `.botsecrets` config with `[files]`, `[keys]`, `[heuristic]`, and `[policy]` sections
- Aho-Corasick known-value redactor
- Heuristic scanner with gitleaks-derived patterns
- Manifest discovery, multi-format parser (.env, TOML, YAML, JSON)
- Claude Code PreToolUse and PostToolUse adapters
- `.botignore` walker with gitignore semantics
+fermata addresses a novel security concern — **reveal**: whether secret *values* enter the LLM context, independent of whether the agent can open a file. This distinction (file identity vs. data content) is explored in:

-## The `.botsecrets` Vision
+- [docs/security-model.md](docs/security-model.md) — the Reveal Triangle and defense-in-depth architecture
+- [docs/threat-model.md](docs/threat-model.md) — what fermata catches at each detection level, and where it stops
+- [docs/commands.md](docs/commands.md) — full CLI reference

-`.botsecrets` is designed to be the **`.gitignore` of AI agent security**: a simple, declarative, human-readable file that every project can drop in to protect its secrets from AI agents.
+The `.botsecrets` format is designed to be the **`.gitignore` of AI agent security**: a simple, declarative, harness-agnostic file that every project can drop in. The portable sections (`[files]`, `[keys]`, `[redaction]`, `[heuristic]`) declare *what* to protect; the `[policy]` section adds fermata-specific access control.

-The format is harness-agnostic from day one. It declares *what* to protect, not *how*. One file covers both redaction (`[files]`, `[keys]`, `[heuristic]`) and access control (`[policy]`). The same `.botsecrets` works with Claude Code, Codex, Gemini, and any future harness that supports tool lifecycle hooks.
+---
+
+## Part of Dirigent
+
+Fermata is the security subsystem of [Dirigent](https://git.g4b.org/dirigence/dirigent), a multi-agent orchestration platform. It is developed in the upstream monorepo and exported here for standalone use — no other Dirigent component is required.
+
+---

 ## License

-Licensed under either of [Apache License, Version 2.0](LICENSE-APACHE) or [MIT License](LICENSE-MIT) at your option.
+Licensed under either of
+
+- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
+- MIT License ([LICENSE-MIT](LICENSE-MIT))
+
+at your option.