🛰️ export standalone-repo assets (c86caab7)

2026-05-29 18:20:19 +02:00
parent ed8bc3e5fd
commit 31fae4a84c
3 changed files with 205 additions and 188 deletions
-148
@@ -1,148 +0,0 @@
# Package: dirigent_anth
Claude Code JSONL session parser and toolkit.
## Quick Facts
- **Type**: Library
- **Main Entry**: src/lib.rs
- **Dependencies**: serde, serde_json, chrono, uuid, camino, thiserror, tracing, dirs
- **Status**: Core parsing complete — ready for downstream consumers
## Purpose
Reads Claude Code's local JSONL session storage (`~/.claude/projects/`) and produces typed, deduplicated, correlated Rust data structures. The types are the product — downstream consumers (archivist import, shell usage analyzers, session browsers) depend on these structs.
## Key Features
- **Session Discovery**: Scan `~/.claude/projects/` for all Claude Code projects and sessions
- **JSONL Parsing**: Lenient line-by-line parser that handles unknown fields and message types
- **Streaming Dedup**: Collapse streamed assistant messages to their final version
- **Tool Correlation**: ID-based pairing of tool_use → tool_result across parallel calls
- **Conversation Tree**: Reconstruct uuid/parentUuid threading with branch detection
- **Noise Classification**: Identify meta messages, warmup, interruptions, API errors
- **Sub-Agent Loading**: Recursive parsing of sub-agent JSONL with metadata
- **Timestamp Parsing**: Handle ISO 8601, Unix seconds, and Unix milliseconds
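The timestamp item above mentions three accepted representations. A minimal sketch of how such input might be told apart follows, using only the standard library. The names `TimestampKind`, `classify_timestamp`, and `MILLIS_CUTOFF` are hypothetical and not part of this crate, and the magnitude cutoff is an assumed heuristic rather than the crate's documented rule.

```rust
/// Which representation a raw timestamp value appears to use.
/// (Hypothetical type for illustration; not the crate's API.)
#[derive(Debug, PartialEq, Eq)]
enum TimestampKind {
    Iso8601,
    UnixSeconds,
    UnixMillis,
}

/// Assumed cutoff: integers at or above this magnitude are read as milliseconds.
/// As seconds, this value lies thousands of years in the future; as
/// milliseconds, it falls in 1973, so current data lands on the expected side.
const MILLIS_CUTOFF: u64 = 100_000_000_000;

/// Guess the representation of `raw` without fully parsing it.
fn classify_timestamp(raw: &str) -> Option<TimestampKind> {
    let raw = raw.trim();

    // Pure integers are Unix timestamps; magnitude decides the unit.
    if let Ok(n) = raw.parse::<i64>() {
        return Some(if n.unsigned_abs() >= MILLIS_CUTOFF {
            TimestampKind::UnixMillis
        } else {
            TimestampKind::UnixSeconds
        });
    }

    // Rough shape check for a leading `YYYY-MM-DD`; this does not validate the date.
    let bytes = raw.as_bytes();
    let looks_iso = bytes.len() >= 10
        && bytes[..4].iter().all(u8::is_ascii_digit)
        && bytes[4] == b'-'
        && bytes[7] == b'-';
    looks_iso.then_some(TimestampKind::Iso8601)
}
```

A real implementation would hand the classified value to a date library such as `chrono`, which is listed in the dependencies; the sketch stops at classification so it stays dependency-free.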
## Architecture
### Design Principles
1. **Types are the product** — Well-typed Rust structs that downstream consumers import
2. **Lenient parsing** — Unknown fields ignored, unknown message types logged and skipped
3. **Stream-oriented** — Line-by-line BufReader parsing, never loads entire files
4. **Sync-first** — File parsing is CPU-bound; no async overhead
5. **Cross-platform** — camino::Utf8PathBuf throughout for Windows/Unix compatibility
### Module Organization
- **`types.rs`** — All public data types (Content, ContentBlock, RawMessage variants, ToolCall, etc.)
- **`error.rs`** — AntError enum with I/O, JSON parse, home-not-found, invalid-path variants
- **`parser.rs`** — JSONL line parser and file parser with lenient error handling
- **`dedup.rs`** — Streaming deduplication of assistant messages by uuid
- **`correlation.rs`** — Tool call ↔ result pairing by tool_use_id
- **`tree.rs`** — Conversation tree from uuid/parentUuid relationships
- **`noise.rs`** — Noise pattern classification (meta, warmup, interruptions, etc.)
- **`discovery.rs`** — Filesystem scanning for Claude projects and sessions
- **`subagent.rs`** — Sub-agent JSONL and metadata loading
- **`util.rs`** — Timestamp parsing utilities
## Public API
### Quick Start
```rust
use dirigent_anth::{discover_claude_home, discover_projects, load_session};
// Discover all projects
let home = discover_claude_home()?;
let projects = discover_projects(&home)?;
// Load a session with full parsing
for project in &projects {
for session_ref in &project.sessions {
let session = load_session(session_ref)?;
println!("Messages: {}, Tools: {}, Subagents: {}",
session.messages.len(),
session.tool_exchanges.len(),
session.subagents.len());
}
}
```
### Key Functions
| Function | Purpose |
|----------|---------|
| `discover_claude_home()` | Find `~/.claude/` directory |
| `discover_projects(home)` | Scan for all project directories |
| `parse_session(path)` | Parse a JSONL file into messages |
| `parse_session_deduped(path)` | Parse with streaming dedup applied |
| `dedup_messages(msgs)` | Deduplicate streamed assistant messages |
| `correlate_tools(msgs)` | Pair tool calls with results by ID |
| `ConversationTree::build(msgs)` | Build conversation tree |
| `classify_noise(msg)` | Classify a message as noise |
| `load_subagents(dir)` | Load sub-agent sessions from artifacts |
| `load_session(ref)` | Full parse: dedup + correlate + tree + subagents |
| `parse_timestamp(value)` | Parse ISO/Unix timestamps |
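To make the tree-building row in the table above more concrete, here is a rough, dependency-free sketch of reconstructing parent/child threading and spotting branch points. `Node`, `Tree`, `build_tree`, and `branch_points` are hypothetical names chosen for illustration; the actual `ConversationTree` type may be shaped quite differently.

```rust
use std::collections::HashMap;

/// Hypothetical minimal message carrying only the two threading fields.
struct Node {
    uuid: String,
    parent_uuid: Option<String>,
}

/// Children grouped by parent uuid, plus the uuids that have no parent.
struct Tree {
    roots: Vec<String>,
    children: HashMap<String, Vec<String>>,
}

/// Group every node under its parent, preserving input order among siblings.
/// A node whose parent never appears in the input is still recorded under
/// that missing parent's uuid.
fn build_tree(nodes: &[Node]) -> Tree {
    let mut roots = Vec::new();
    let mut children: HashMap<String, Vec<String>> = HashMap::new();
    for node in nodes {
        match &node.parent_uuid {
            Some(parent) => children
                .entry(parent.clone())
                .or_default()
                .push(node.uuid.clone()),
            None => roots.push(node.uuid.clone()),
        }
    }
    Tree { roots, children }
}

impl Tree {
    /// Uuids with more than one child, i.e. where the conversation forks.
    /// Sorted so the result does not depend on hash-map iteration order.
    fn branch_points(&self) -> Vec<&str> {
        let mut points: Vec<&str> = self
            .children
            .iter()
            .filter(|(_, kids)| kids.len() > 1)
            .map(|(parent, _)| parent.as_str())
            .collect();
        points.sort_unstable();
        points
    }
}
```

Storing an adjacency map rather than nested owned nodes keeps the build a single pass over the messages and avoids recursion on long sessions.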
## Data Model
### Claude Code JSONL Format
Each line in `~/.claude/projects/<encoded-path>/<session-uuid>.jsonl` is a JSON object with a `type` field discriminator. Five types: `user`, `assistant`, `progress`, `system`, `queue-operation`.
- **Outer wrapper**: camelCase fields (sessionId, parentUuid, isSidechain, gitBranch)
- **Inner message body**: snake_case fields (stop_reason, tool_use_id, is_error)
- **Content**: Either a plain string or array of typed content blocks
### Content Blocks
| Type | Fields |
|------|--------|
| text | `text` |
| tool_use | `id`, `name`, `input` |
| tool_result | `tool_use_id`, `content`, `is_error` |
| thinking | `thinking` |
| image | `source` |
Unknown content block types are silently dropped (lenient deserialization).
## Testing
```bash
cargo test --package dirigent_anth
```
Tests use synthetic JSONL fixtures in `tests/fixtures/`:
- `minimal_session.jsonl` — Basic session with all message types
- `streaming_dedup.jsonl` — Streaming dedup scenario
- `tool_correlation.jsonl` — Parallel and sequential tool calls
- `branching_tree.jsonl` — Conversation with branches
- `noise_patterns.jsonl` — All noise pattern types
- `subagent/` — Sub-agent session with parent and metadata
## Error Handling
- Individual unparseable JSONL lines are logged and skipped (lenient)
- I/O errors and missing directories are propagated as AntError
- Unknown message types are skipped via serde
- Unknown content blocks are silently filtered
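The split described above (bad lines are skipped, I/O failures propagate) can be sketched with the standard library alone. `LenientOutcome` and `parse_lenient` are hypothetical names, and the per-line parser is passed in as a closure so the example needs no JSON dependency; in practice that closure would presumably wrap something like `serde_json::from_str`.

```rust
use std::io::{self, BufRead};

/// Result of a lenient pass: parsed records plus the 1-based numbers of
/// the lines that failed to parse. (Hypothetical type for illustration.)
struct LenientOutcome<T> {
    records: Vec<T>,
    skipped: Vec<usize>,
}

/// Read `reader` line by line, never buffering the whole input.
/// Parse failures are recorded and skipped; I/O errors abort with `Err`.
fn parse_lenient<R, T, E, F>(reader: R, mut parse_line: F) -> io::Result<LenientOutcome<T>>
where
    R: BufRead,
    F: FnMut(&str) -> Result<T, E>,
{
    let mut records = Vec::new();
    let mut skipped = Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line?; // I/O error: propagate to the caller
        if line.trim().is_empty() {
            continue; // blank lines are neither records nor errors
        }
        match parse_line(&line) {
            Ok(record) => records.push(record),
            Err(_) => skipped.push(index + 1), // a real parser would log here
        }
    }
    Ok(LenientOutcome { records, skipped })
}
```

Returning the skipped line numbers alongside the records lets a caller decide whether a session with many bad lines should still be trusted.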
## Related Packages
- **dirigent_archivist** — Future consumer for session import
- No current dependencies on other dirigent packages (standalone)
## Future Enhancements
- Bash command analysis module (shell usage analytics)
- Archivist event transform/import
- CLI tool with scan/analyze/import subcommands
- SQLite caching layer
- Watch mode for new session monitoring
## Documentation
- **Package README**: `./README.md` - User-facing overview
- **API Docs**: Run `cargo doc --package dirigent_anth --open`
- **Design Plan**: `docs/superpowers/plans/2026-03-23-dirigent-ant-design.md`
+37 -40
@@ -1,40 +1,37 @@
[package]
name = "dirigent_anth"
version = "0.1.0"
edition = "2021"
[lib]
path = "src/lib.rs"
[[bin]]
name = "anth_bear"
path = "src/bin/anth.rs"
[[bin]]
name = "anth_usage"
path = "src/bin/anth_usage.rs"
[features]
default = []
dirigent-paths = ["dep:dirigent_config"]
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
chrono = { version = "0.4", features = ["serde"] }
chrono-tz = "0.10"
uuid = { version = "1.11", features = ["serde"] }
camino = { version = "1.1", features = ["serde1"] }
dirs = "6.0"
thiserror = "2.0"
tracing = "0.1"
regex = "1"
portable-pty = "0.8"
vt100 = "0.15"
dirigent_config = { path = "../dirigent_config", optional = true }
[dev-dependencies]
tempfile = "3.0"
[lints]
workspace = true
+168
@@ -0,0 +1,168 @@
# 𝄋 anth
**Independent tools for working with Anthropic product data.**
anth is a Rust toolkit for reading and analyzing data produced by Anthropic's AI products. It ships two command-line tools and a library for building your own.
> [!CAUTION]
> **Alpha software.** anth is under active development and not fully battle-tested. APIs may change between releases. Use at your own risk.
>
> **Not affiliated with Anthropic.** This is an independent, unofficial project. *Claude* and *Claude Code* are trademarks of Anthropic, PBC. anth is not endorsed by, sponsored by, or affiliated with Anthropic in any way.
---
## Tools
### `anth_usage` — token usage capture `beta`
Launches Claude Code in a pseudo-terminal, sends the `/usage` command, and captures the output as structured data. Useful for tracking costs, monitoring consumption, or feeding usage metrics into dashboards.
**Requires [Claude Code](https://docs.anthropic.com/en/docs/claude-code) installed and available as `claude` on your PATH.**
```bash
# Default: JSON output of token usage
anth_usage
# Raw terminal output (for debugging)
anth_usage --raw
# Specify a working directory
anth_usage --workdir /path/to/project
# Use current directory
anth_usage --cwd
```
| Flag | Description |
|------|-------------|
| `--debug` | Enable debug logging to stderr |
| `--raw` | Output raw screen content instead of parsed JSON |
| `--no-trust` | Fail if Claude Code prompts for folder trust (instead of auto-confirming) |
| `--workdir <path>` | Working directory for Claude Code |
| `--cwd` | Use current working directory |
How it works: anth_usage spawns a PTY, waits for Claude Code's UI to render, handles the folder trust prompt if needed, sends `/usage`, and parses the terminal output via a vt100 emulator. The result is structured JSON with token counts and cost data.
Install:
```bash
cargo install --git https://git.g4b.org/dirigence/dirigent_anth --bin anth_usage
```
---
### `anth_bear` — session inspector `beta`
Browse and search Claude Code's local session storage (`~/.claude/projects/`). Validates parsing, searches message content, and reports aggregate statistics.
```bash
# Validate all sessions parse correctly
anth_bear validate
# Search user and assistant messages
anth_bear search "database migration"
# Aggregate statistics — tool usage, message counts
anth_bear stats
```
| Command | Description |
|---------|-------------|
| `validate` | Parse all sessions, report errors, count messages/tools/subagents |
| `search <query>` | Case-insensitive search across user and assistant messages |
| `stats` | Aggregate statistics: project counts, tool usage rankings (top 15) |
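As a rough illustration of the ranking that `stats` reports, the sketch below counts tool names and keeps the most frequent ones. `top_tools` is a hypothetical helper, not the binary's actual code, and the tie-break by name is an assumption made so the output is deterministic.

```rust
use std::collections::HashMap;

/// Count how often each tool name occurs and return the `limit` most
/// frequent, ordered by descending count and then by name.
fn top_tools<'a>(
    names: impl IntoIterator<Item = &'a str>,
    limit: usize,
) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for name in names {
        *counts.entry(name).or_insert(0) += 1;
    }

    let mut ranked: Vec<(&'a str, usize)> = counts.into_iter().collect();
    // Highest count first; equal counts fall back to alphabetical order.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(b.0)));
    ranked.truncate(limit);
    ranked
}
```

Calling it with a limit of 15 would correspond to the "top 15" mentioned in the table.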
Install:
```bash
cargo install --git https://git.g4b.org/dirigence/dirigent_anth --bin anth_bear
```
---
## Library `production`
The tools above are built on the anth library. Use it to build your own session browsers, usage analyzers, and archival tooling.
Claude Code stores every session as JSONL on the local filesystem. Each line is a message — user input, assistant response, tool call, tool result, meta event. The format is lenient (fields come and go between releases) and noisy (streamed assistant messages produce many partial lines that should collapse to one).
anth reads those files and gives you back well-typed structs:
- **Discovery** — scan `~/.claude/projects/` for projects and sessions
- **Lenient parsing** — line-by-line `BufReader`, unknown fields ignored, format drift tolerated
- **Streaming dedup** — collapse streamed assistant messages by `uuid` to their final version
- **Tool correlation** — pair `tool_use` → `tool_result` by `tool_use_id`, including parallel calls
- **Conversation tree** — reconstruct `uuid` / `parentUuid` threading, with branch detection
- **Noise classification** — flag meta messages, warmup, interruptions, API errors
- **Sub-agent loading** — recursive parsing of sub-agent JSONL with metadata
- **Cross-platform paths** — `camino::Utf8PathBuf` throughout for Windows + Unix
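The streaming dedup item in the list above can be pictured as a "last write wins" pass keyed by `uuid`. The sketch below is one way to do that with the standard library; `Line` and `dedup_last_wins` are hypothetical names, and it assumes the final version of a streamed message is simply the last line carrying its `uuid`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed JSONL line.
#[derive(Debug, Clone, PartialEq)]
struct Line {
    uuid: String,
    text: String,
}

/// Keep one entry per uuid: the position of its first appearance,
/// with the content of its last appearance.
fn dedup_last_wins(lines: Vec<Line>) -> Vec<Line> {
    let mut slot_by_uuid: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Line> = Vec::new();
    for line in lines {
        // Copy the slot index out so the map is free to be mutated below.
        let existing = slot_by_uuid.get(&line.uuid).copied();
        match existing {
            // Seen before: overwrite the earlier partial version in place.
            Some(slot) => out[slot] = line,
            // First sighting: remember where this uuid lives in `out`.
            None => {
                slot_by_uuid.insert(line.uuid.clone(), out.len());
                out.push(line);
            }
        }
    }
    out
}
```

Overwriting in place keeps the conversation order stable, which matters when later steps walk the messages sequentially.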
### Quick start
```rust
use dirigent_anth::{discover_claude_home, discover_projects, load_session};
let home = discover_claude_home()?;
let projects = discover_projects(&home)?;
for project in &projects {
for session in &project.sessions {
let parsed = load_session(&session.path)?;
println!("{}: {} messages, {} tool calls",
session.id,
parsed.messages.len(),
parsed.tool_calls.len());
}
}
```
### Library dependency
```toml
[dependencies]
dirigent_anth = { git = "https://git.g4b.org/dirigence/dirigent_anth" }
```
### Modules
| Module | Purpose |
|--------|---------|
| `types` | Public data types — `Content`, `ContentBlock`, `RawMessage` variants, `ToolCall` |
| `parser` | JSONL line and file parser with lenient error handling |
| `dedup` | Streaming deduplication of assistant messages by uuid |
| `correlation` | Tool call ↔ result pairing by `tool_use_id` |
| `tree` | Conversation tree from uuid / parentUuid relationships |
| `noise` | Meta / warmup / interruption / API-error classification |
| `discovery` | Filesystem scanning for Claude projects and sessions |
| `subagent` | Sub-agent JSONL and metadata loading |
| `util` | Timestamp parsing — ISO 8601, Unix seconds, Unix milliseconds |
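The `correlation` row in the table above describes ID-based pairing. A minimal sketch of that idea follows; `Block`, `Exchange`, and `correlate` are hypothetical simplifications that borrow the `id`, `name`, `tool_use_id`, and `is_error` field names from the content-block format, and they are not the library's real types.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down content blocks.
enum Block {
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, is_error: bool },
}

/// One tool call and, if a result was seen, whether it reported an error.
#[derive(Debug, PartialEq)]
struct Exchange {
    id: String,
    name: String,
    is_error: Option<bool>, // None: no matching result seen
}

/// Pair each result with its call by ID, so results that arrive out of
/// order (as with parallel calls) still land on the right exchange.
fn correlate(blocks: &[Block]) -> Vec<Exchange> {
    let mut exchanges: Vec<Exchange> = Vec::new();
    let mut index_by_id: HashMap<&str, usize> = HashMap::new();
    for block in blocks {
        match block {
            Block::ToolUse { id, name } => {
                index_by_id.insert(id.as_str(), exchanges.len());
                exchanges.push(Exchange {
                    id: id.clone(),
                    name: name.clone(),
                    is_error: None,
                });
            }
            Block::ToolResult { tool_use_id, is_error } => {
                // Results without a known call are ignored in this sketch.
                if let Some(&i) = index_by_id.get(tool_use_id.as_str()) {
                    exchanges[i].is_error = Some(*is_error);
                }
            }
        }
    }
    exchanges
}
```

Matching by ID rather than by position is what makes parallel calls safe: two calls issued together may have their results written in either order.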
---
## Design principles
1. **Types are the product.** Downstream consumers depend on the structs, not the binaries.
2. **Lenient parsing.** Unknown fields are ignored; unknown message types are logged and skipped. Claude Code's format changes between releases, and anth keeps reading.
3. **Stream-oriented.** Line-by-line `BufReader`. Sessions can be large; nothing is loaded whole.
4. **Sync-first.** File parsing is CPU-bound. No async overhead.
5. **Cross-platform.** `camino::Utf8PathBuf` for Windows / Unix parity.
---
## About this repository
This is a downstream mirror. anth is developed inside the upstream
[Dirigent](https://git.g4b.org/dirigence/dirigent) monorepo and exported here
for standalone distribution. Issues and pull requests are accepted on the
`develop` branch, but canonical development happens upstream.
---
## License
Licensed under either of
- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE))
- MIT License ([LICENSE-MIT](LICENSE-MIT))
at your option.