# Package: dirigent_anth
Claude Code JSONL session parser and toolkit.
## Quick Facts
- **Type**: Library
- **Main Entry**: src/lib.rs
- **Dependencies**: serde, serde_json, chrono, uuid, camino, thiserror, tracing, dirs
- **Status**: Core parsing complete — ready for downstream consumers
## Purpose
Reads Claude Code's local JSONL session storage (`~/.claude/projects/`) and produces typed, deduplicated, correlated Rust data structures. The types are the product — downstream consumers (archivist import, shell usage analyzers, session browsers) depend on these structs.
## Key Features
- **Session Discovery**: Scan `~/.claude/projects/` for all Claude Code projects and sessions
- **JSONL Parsing**: Lenient line-by-line parser that handles unknown fields and message types
- **Streaming Dedup**: Collapse streamed assistant messages to their final version
- **Tool Correlation**: ID-based pairing of tool_use → tool_result across parallel calls
- **Conversation Tree**: Reconstruct uuid/parentUuid threading with branch detection
- **Noise Classification**: Identify meta messages, warmup, interruptions, API errors
- **Sub-Agent Loading**: Recursive parsing of sub-agent JSONL with metadata
- **Timestamp Parsing**: Handle ISO 8601, Unix seconds, and Unix milliseconds
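The streaming dedup feature can be pictured with a small standard-library sketch. It assumes that repeated records sharing a `uuid` are successive snapshots of one message and that the last snapshot is the final version. `Msg` and `dedup_keep_last` are hypothetical names for illustration, not the crate's real types or functions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a parsed message; not the crate's real type.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub uuid: String,
    pub text: String,
}

/// Keep one entry per uuid: the last snapshot seen, placed where the
/// first snapshot appeared so the overall message order is preserved.
pub fn dedup_keep_last(msgs: Vec<Msg>) -> Vec<Msg> {
    let mut index: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Msg> = Vec::new();
    for msg in msgs {
        // Copy the slot index out so no borrow of `index` is held below.
        let existing = index.get(&msg.uuid).copied();
        match existing {
            // A later snapshot overwrites the earlier one in place.
            Some(i) => out[i] = msg,
            None => {
                index.insert(msg.uuid.clone(), out.len());
                out.push(msg);
            }
        }
    }
    out
}
```

Overwriting in place keeps the conversation order stable, which matters if later steps such as tree building rely on the sequence of messages.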
## Architecture
### Design Principles
1. **Types are the product** — Well-typed Rust structs that downstream consumers import
2. **Lenient parsing** — Unknown fields ignored, unknown message types logged and skipped
3. **Stream-oriented** — Line-by-line BufReader parsing, never loads entire files
4. **Sync-first** — File parsing is CPU-bound; no async overhead
5. **Cross-platform** — camino::Utf8PathBuf throughout for Windows/Unix compatibility
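Principles 2 and 3 combine into a simple loop shape. The sketch below uses only the standard library and is an illustration, not the crate's parser: `parse_lenient` and `LenientOutcome` are hypothetical names, and the `parse_line` closure stands in for whatever per-line JSON deserialization the real code performs.

```rust
use std::io::BufRead;

/// Outcome of a lenient pass: parsed records plus a count of skipped lines.
#[derive(Debug)]
pub struct LenientOutcome<T> {
    pub records: Vec<T>,
    pub skipped: usize,
}

/// Read `reader` line by line, keeping the lines that `parse_line` accepts.
/// I/O errors are propagated; per-line parse failures are counted and skipped.
pub fn parse_lenient<R, T, E, F>(reader: R, parse_line: F) -> std::io::Result<LenientOutcome<T>>
where
    R: BufRead,
    F: Fn(&str) -> Result<T, E>,
{
    let mut records = Vec::new();
    let mut skipped = 0;
    for line in reader.lines() {
        let line = line?; // an I/O failure aborts the whole parse
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are neither records nor errors
        }
        match parse_line(trimmed) {
            Ok(record) => records.push(record),
            Err(_) => skipped += 1, // a real parser would log the failure here
        }
    }
    Ok(LenientOutcome { records, skipped })
}
```

Because only one line is held in memory at a time, the cost of a pass grows with the longest line rather than with the file size.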
### Module Organization
- **`types.rs`** — All public data types (Content, ContentBlock, RawMessage variants, ToolCall, etc.)
- **`error.rs`** — AntError enum with I/O, JSON parse, home-not-found, invalid-path variants
- **`parser.rs`** — JSONL line parser and file parser with lenient error handling
- **`dedup.rs`** — Streaming deduplication of assistant messages by uuid
- **`correlation.rs`** — Tool call ↔ result pairing by tool_use_id
- **`tree.rs`** — Conversation tree from uuid/parentUuid relationships
- **`noise.rs`** — Noise pattern classification (meta, warmup, interruptions, etc.)
- **`discovery.rs`** — Filesystem scanning for Claude projects and sessions
- **`subagent.rs`** — Sub-agent JSONL and metadata loading
- **`util.rs`** — Timestamp parsing utilities
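As an illustration of what `tree.rs` is described as doing, here is a minimal standard-library sketch of threading by uuid/parentUuid with branch detection. `Node` and `Tree` are hypothetical types, and treating a message whose parent is missing from the file as a root is an assumption, not documented behavior.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical minimal node: a message id and its optional parent id.
pub struct Node {
    pub uuid: String,
    pub parent_uuid: Option<String>,
}

/// Parent-to-children index plus the list of roots.
pub struct Tree {
    pub roots: Vec<String>,
    pub children: BTreeMap<String, Vec<String>>,
}

impl Tree {
    pub fn build(nodes: &[Node]) -> Tree {
        let known: HashSet<&str> = nodes.iter().map(|n| n.uuid.as_str()).collect();
        let mut roots = Vec::new();
        let mut children: BTreeMap<String, Vec<String>> = BTreeMap::new();
        for node in nodes {
            match &node.parent_uuid {
                // The parent exists in this set of nodes: record the edge.
                Some(p) if known.contains(p.as_str()) => {
                    children.entry(p.clone()).or_default().push(node.uuid.clone());
                }
                // No parent, or a parent we never saw: treat as a root (assumption).
                _ => roots.push(node.uuid.clone()),
            }
        }
        Tree { roots, children }
    }

    /// Uuids with more than one child, i.e. points where the conversation forks.
    pub fn branch_points(&self) -> Vec<&str> {
        self.children
            .iter()
            .filter(|(_, kids)| kids.len() > 1)
            .map(|(id, _)| id.as_str())
            .collect()
    }
}
```

A `BTreeMap` is used here only so that `branch_points` returns ids in a deterministic order; a `HashMap` would work equally well when order does not matter.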
## Public API
### Quick Start
```rust
use dirigent_anth::{discover_claude_home, discover_projects, load_session};

// Assumes the crate's error type implements std::error::Error,
// so `?` can convert it into a boxed error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Discover all projects
    let home = discover_claude_home()?;
    let projects = discover_projects(&home)?;

    // Load each session with full parsing
    for project in &projects {
        for session_ref in &project.sessions {
            let session = load_session(session_ref)?;
            println!(
                "Messages: {}, Tools: {}, Subagents: {}",
                session.messages.len(),
                session.tool_exchanges.len(),
                session.subagents.len()
            );
        }
    }
    Ok(())
}
```
### Key Functions
| Function | Purpose |
|----------|---------|
| `discover_claude_home()` | Find `~/.claude/` directory |
| `discover_projects(home)` | Scan for all project directories |
| `parse_session(path)` | Parse a JSONL file into messages |
| `parse_session_deduped(path)` | Parse with streaming dedup applied |
| `dedup_messages(msgs)` | Deduplicate streamed assistant messages |
| `correlate_tools(msgs)` | Pair tool calls with results by ID |
| `ConversationTree::build(msgs)` | Build conversation tree |
| `classify_noise(msg)` | Classify a message as noise |
| `load_subagents(dir)` | Load sub-agent sessions from artifacts |
| `load_session(ref)` | Full parse: dedup + correlate + tree + subagents |
| `parse_timestamp(value)` | Parse ISO/Unix timestamps |
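The ID-based pairing that `correlate_tools` is described as performing can be sketched with the standard library alone. `Event`, `Exchange`, and `correlate` below are hypothetical simplifications, not the crate's signatures, and silently ignoring a result with no matching call is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical flattened events; the crate's real types carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, is_error: bool },
}

/// One call and, if it has arrived, the status of its result.
#[derive(Debug, PartialEq)]
pub struct Exchange {
    pub id: String,
    pub name: String,
    pub is_error: Option<bool>, // None means no result was seen
}

pub fn correlate(events: &[Event]) -> Vec<Exchange> {
    let mut exchanges: Vec<Exchange> = Vec::new();
    let mut by_id: HashMap<&str, usize> = HashMap::new();
    for event in events {
        match event {
            Event::ToolUse { id, name } => {
                by_id.insert(id.as_str(), exchanges.len());
                exchanges.push(Exchange {
                    id: id.clone(),
                    name: name.clone(),
                    is_error: None,
                });
            }
            Event::ToolResult { tool_use_id, is_error } => {
                // Match by id, not by position, so parallel calls whose
                // results arrive out of order still pair correctly.
                if let Some(&i) = by_id.get(tool_use_id.as_str()) {
                    exchanges[i].is_error = Some(*is_error);
                }
                // A result with no known call is ignored (assumption).
            }
        }
    }
    exchanges
}
```

Leaving `is_error` as `None` for unanswered calls lets a consumer distinguish an interrupted tool call from one that completed successfully.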
## Data Model
### Claude Code JSONL Format
Each line in `~/.claude/projects/<encoded-path>/<session-uuid>.jsonl` is a JSON object discriminated by its `type` field. There are five types: `user`, `assistant`, `progress`, `system`, and `queue-operation`.
- **Outer wrapper**: camelCase fields (sessionId, parentUuid, isSidechain, gitBranch)
- **Inner message body**: snake_case fields (stop_reason, tool_use_id, is_error)
- **Content**: Either a plain string or array of typed content blocks
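A lenient reading of the `type` discriminator might look like the following sketch, which maps the five documented tags to enum variants and keeps anything else in a catch-all. `MessageKind` and `from_tag` are hypothetical names; the crate's own representation may differ.

```rust
/// The five documented `type` tags, plus a catch-all for anything else.
#[derive(Debug, Clone, PartialEq)]
pub enum MessageKind {
    User,
    Assistant,
    Progress,
    System,
    QueueOperation,
    /// Any tag not in the list above; the raw value is kept for logging.
    Unknown(String),
}

impl MessageKind {
    /// Map a raw tag to a kind without ever failing.
    pub fn from_tag(tag: &str) -> MessageKind {
        match tag {
            "user" => MessageKind::User,
            "assistant" => MessageKind::Assistant,
            "progress" => MessageKind::Progress,
            "system" => MessageKind::System,
            "queue-operation" => MessageKind::QueueOperation,
            other => MessageKind::Unknown(other.to_string()),
        }
    }

    /// True for the five documented tags.
    pub fn is_known(&self) -> bool {
        !matches!(self, MessageKind::Unknown(_))
    }
}
```

Keeping the unrecognized tag inside `Unknown` means a caller can log it and skip the record instead of treating it as a parse failure.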
### Content Blocks
| Type | Fields |
|------|--------|
| text | `text` |
| tool_use | `id`, `name`, `input` |
| tool_result | `tool_use_id`, `content`, `is_error` |
| thinking | `thinking` |
| image | `source` |
Unknown content block types are silently dropped (lenient deserialization).
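The string-or-blocks shape of message content can be modelled roughly as below. This is a standard-library sketch that mirrors the table above; the real `Content` and `ContentBlock` definitions in `types.rs` may differ, and storing `input`, `content`, and `source` as plain strings is a simplification made here. `visible_text` is a hypothetical helper.

```rust
/// Sketch of the block table; JSON-valued fields are kept as raw text here.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
    Thinking { thinking: String },
    Image { source: String },
}

/// Content is either a plain string or a list of typed blocks.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Plain(String),
    Blocks(Vec<ContentBlock>),
}

impl Content {
    /// Join the text blocks with newlines, ignoring every other block type.
    pub fn visible_text(&self) -> String {
        match self {
            Content::Plain(s) => s.clone(),
            Content::Blocks(blocks) => blocks
                .iter()
                .filter_map(|b| match b {
                    ContentBlock::Text { text } => Some(text.as_str()),
                    _ => None,
                })
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}
```

Modelling both shapes in one enum lets downstream code handle a message body with a single `match`, whichever form the JSONL line used.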
## Testing
```bash
cargo test --package dirigent_anth
```
Tests use synthetic JSONL fixtures in `tests/fixtures/`:
- `minimal_session.jsonl` — Basic session with all message types
- `streaming_dedup.jsonl` — Streaming dedup scenario
- `tool_correlation.jsonl` — Parallel and sequential tool calls
- `branching_tree.jsonl` — Conversation with branches
- `noise_patterns.jsonl` — All noise pattern types
- `subagent/` — Sub-agent session with parent and metadata
## Error Handling
- Individual unparseable JSONL lines are logged and skipped (lenient)
- I/O errors and missing directories are propagated as AntError
- Unknown message types are logged and skipped during serde deserialization
- Unknown content blocks are silently filtered
## Related Packages
- **dirigent_archivist** — Future consumer for session import
- No current dependencies on other dirigent packages (standalone)
## Future Enhancements
- Bash command analysis module (shell usage analytics)
- Archivist event transform/import
- CLI tool with scan/analyze/import subcommands
- SQLite caching layer
- Watch mode for new session monitoring
## Documentation
- **Package README**: `./README.md` - User-facing overview
- **API Docs**: Run `cargo doc --package dirigent_anth --open`
- **Design Plan**: `docs/superpowers/plans/2026-03-23-dirigent-ant-design.md`