# Package: dirigent_anth

Claude Code JSONL session parser and toolkit.

## Quick Facts
- **Type**: Library
- **Main Entry**: src/lib.rs
- **Dependencies**: serde, serde_json, chrono, uuid, camino, thiserror, tracing, dirs
- **Status**: Core parsing complete — ready for downstream consumers

## Purpose

Reads Claude Code's local JSONL session storage (`~/.claude/projects/`) and produces typed, deduplicated, correlated Rust data structures. The types are the product — downstream consumers (archivist import, shell usage analyzers, session browsers) depend on these structs.

## Key Features

- **Session Discovery**: Scan `~/.claude/projects/` for all Claude Code projects and sessions
- **JSONL Parsing**: Lenient line-by-line parser that handles unknown fields and message types
- **Streaming Dedup**: Collapse streamed assistant messages to their final version
- **Tool Correlation**: ID-based pairing of tool_use → tool_result across parallel calls
- **Conversation Tree**: Reconstruct uuid/parentUuid threading with branch detection
- **Noise Classification**: Identify meta messages, warmup, interruptions, API errors
- **Sub-Agent Loading**: Recursive parsing of sub-agent JSONL with metadata
- **Timestamp Parsing**: Handle ISO 8601, Unix seconds, and Unix milliseconds

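To make the conversation-tree feature concrete, here is a minimal, std-only sketch of branch detection over uuid/parentUuid links. The `Node` struct and both function names are hypothetical and are not the crate's `ConversationTree` API. The sketch assumes a branch point is any message with more than one child.

```rust
use std::collections::HashMap;

/// Hypothetical minimal message: only the fields needed for threading.
#[derive(Debug, Clone)]
pub struct Node {
    pub uuid: String,
    pub parent_uuid: Option<String>,
}

/// Maps each parent uuid to the uuids of its children, in input order.
pub fn children_by_parent(nodes: &[Node]) -> HashMap<String, Vec<String>> {
    let mut children: HashMap<String, Vec<String>> = HashMap::new();
    for node in nodes {
        if let Some(parent) = &node.parent_uuid {
            children
                .entry(parent.clone())
                .or_default()
                .push(node.uuid.clone());
        }
    }
    children
}

/// Returns the uuids that have more than one child, sorted so the
/// result does not depend on hash map iteration order.
pub fn branch_points(nodes: &[Node]) -> Vec<String> {
    let mut points: Vec<String> = children_by_parent(nodes)
        .into_iter()
        .filter(|(_, kids)| kids.len() > 1)
        .map(|(parent, _)| parent)
        .collect();
    points.sort();
    points
}
```

Root messages are the nodes whose `parent_uuid` is `None`. A real implementation would also have to decide what to do with parents that never appear in the file.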
## Architecture

### Design Principles

1. **Types are the product** — Well-typed Rust structs that downstream consumers import
2. **Lenient parsing** — Unknown fields ignored, unknown message types logged and skipped
3. **Stream-oriented** — Line-by-line BufReader parsing, never loads entire files
4. **Sync-first** — File parsing is CPU-bound; no async overhead
5. **Cross-platform** — camino::Utf8PathBuf throughout for Windows/Unix compatibility

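The lenient, stream-oriented principles can be sketched with the standard library alone. The `parse_lines` function and `Parsed` struct below are hypothetical names, and the per-line parser is passed in as a closure because the crate's actual serde-based parsing is not shown here. Lines the closure rejects are counted and skipped, while I/O errors are propagated.

```rust
use std::io::{self, BufRead};

/// Outcome of a lenient pass over a line-oriented input.
#[derive(Debug, PartialEq)]
pub struct Parsed<T> {
    pub items: Vec<T>,
    /// Number of non-empty lines the parser rejected.
    pub skipped: usize,
}

/// Reads `reader` one line at a time, so the whole input is never
/// held in memory at once.
pub fn parse_lines<R, T, F>(reader: R, parse_line: F) -> io::Result<Parsed<T>>
where
    R: BufRead,
    F: Fn(&str) -> Option<T>,
{
    let mut items = Vec::new();
    let mut skipped = 0;
    for line in reader.lines() {
        let line = line?; // I/O errors abort the pass
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are neither parsed nor counted
        }
        match parse_line(trimmed) {
            Some(item) => items.push(item),
            None => skipped += 1, // a bad line does not fail the file
        }
    }
    Ok(Parsed { items, skipped })
}
```

For a file on disk, `reader` would typically be a `std::io::BufReader` wrapping a `std::fs::File`. A real parser would log each skipped line rather than only counting it.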
### Module Organization

- **`types.rs`** — All public data types (Content, ContentBlock, RawMessage variants, ToolCall, etc.)
- **`error.rs`** — AntError enum with I/O, JSON parse, home-not-found, invalid-path variants
- **`parser.rs`** — JSONL line parser and file parser with lenient error handling
- **`dedup.rs`** — Streaming deduplication of assistant messages by uuid
- **`correlation.rs`** — Tool call ↔ result pairing by tool_use_id
- **`tree.rs`** — Conversation tree from uuid/parentUuid relationships
- **`noise.rs`** — Noise pattern classification (meta, warmup, interruptions, etc.)
- **`discovery.rs`** — Filesystem scanning for Claude projects and sessions
- **`subagent.rs`** — Sub-agent JSONL and metadata loading
- **`util.rs`** — Timestamp parsing utilities

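As an illustration of what `dedup.rs` is described as doing, the following sketch collapses repeated uuids to their last occurrence. The `Msg` struct and the `dedup_keep_last` function are hypothetical. The sketch assumes the final streamed version is the last one in file order, which the crate may decide differently.

```rust
use std::collections::HashMap;

/// Hypothetical minimal message carrying only a uuid and a text payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub uuid: String,
    pub text: String,
}

/// Keeps the last message seen for each uuid, placed at the position
/// where that uuid first appeared.
pub fn dedup_keep_last(msgs: Vec<Msg>) -> Vec<Msg> {
    let mut index_of: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Msg> = Vec::new();
    for msg in msgs {
        match index_of.get(&msg.uuid) {
            // Seen before: overwrite the earlier, partial version.
            Some(&i) => out[i] = msg,
            // First sighting: remember where it lives in `out`.
            None => {
                index_of.insert(msg.uuid.clone(), out.len());
                out.push(msg);
            }
        }
    }
    out
}
```

Keeping the first-seen position preserves conversation order, so a streamed message that is updated after a later message arrives does not jump ahead of it.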
## Public API

### Quick Start

```rust
use dirigent_anth::{discover_claude_home, discover_projects, load_session};

// Assumes the crate's error type implements std::error::Error
// (it is described as a thiserror-based enum).
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Discover all projects
    let home = discover_claude_home()?;
    let projects = discover_projects(&home)?;

    // Load a session with full parsing
    for project in &projects {
        for session_ref in &project.sessions {
            let session = load_session(session_ref)?;
            println!(
                "Messages: {}, Tools: {}, Subagents: {}",
                session.messages.len(),
                session.tool_exchanges.len(),
                session.subagents.len()
            );
        }
    }
    Ok(())
}
```

### Key Functions

| Function | Purpose |
|----------|---------|
| `discover_claude_home()` | Find `~/.claude/` directory |
| `discover_projects(home)` | Scan for all project directories |
| `parse_session(path)` | Parse a JSONL file into messages |
| `parse_session_deduped(path)` | Parse with streaming dedup applied |
| `dedup_messages(msgs)` | Deduplicate streamed assistant messages |
| `correlate_tools(msgs)` | Pair tool calls with results by ID |
| `ConversationTree::build(msgs)` | Build conversation tree |
| `classify_noise(msg)` | Classify a message as noise |
| `load_subagents(dir)` | Load sub-agent sessions from artifacts |
| `load_session(ref)` | Full parse: dedup + correlate + tree + subagents |
| `parse_timestamp(value)` | Parse ISO/Unix timestamps |

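To show why ID-based pairing matters when tool calls run in parallel, here is a small sketch of the idea behind `correlate_tools`. The `Event` and `Exchange` types and the `correlate` function are hypothetical stand-ins, not the crate's real signatures. Results are matched to calls by id, so they may arrive in any order.

```rust
use std::collections::HashMap;

/// Hypothetical flattened view of the two block kinds that matter here.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, is_error: bool },
}

/// One tool call together with the outcome of its result, if any.
#[derive(Debug, Clone, PartialEq)]
pub struct Exchange {
    pub id: String,
    pub name: String,
    /// `None` while no result with a matching id has been seen.
    pub is_error: Option<bool>,
}

/// Pairs each tool call with its result by id, in call order.
pub fn correlate(events: &[Event]) -> Vec<Exchange> {
    let mut exchanges: Vec<Exchange> = Vec::new();
    let mut index_of: HashMap<&str, usize> = HashMap::new();
    for event in events {
        match event {
            Event::ToolUse { id, name } => {
                index_of.insert(id.as_str(), exchanges.len());
                exchanges.push(Exchange {
                    id: id.clone(),
                    name: name.clone(),
                    is_error: None,
                });
            }
            Event::ToolResult { tool_use_id, is_error } => {
                // Results without a known call are ignored in this sketch.
                if let Some(&i) = index_of.get(tool_use_id.as_str()) {
                    exchanges[i].is_error = Some(*is_error);
                }
            }
        }
    }
    exchanges
}
```

Pairing by position instead of by id would mismatch calls whenever two tools are invoked in the same assistant turn and their results come back out of order.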
## Data Model

### Claude Code JSONL Format

Each line in `~/.claude/projects/<encoded-path>/<session-uuid>.jsonl` is a JSON object discriminated by its `type` field. There are five types: `user`, `assistant`, `progress`, `system`, `queue-operation`.

- **Outer wrapper**: camelCase fields (sessionId, parentUuid, isSidechain, gitBranch)
- **Inner message body**: snake_case fields (stop_reason, tool_use_id, is_error)
- **Content**: Either a plain string or array of typed content blocks

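The `type` discriminator can be modelled as a plain enum. The sketch below uses a hypothetical `MessageKind` type and maps the five documented values by hand; the crate itself is described as relying on serde for this. An unrecognised value yields `None`, so the caller can skip the line.

```rust
/// Hypothetical enum mirroring the five documented `type` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageKind {
    User,
    Assistant,
    Progress,
    System,
    QueueOperation,
}

impl MessageKind {
    /// Maps the raw `type` field to a kind, or `None` if it is unknown.
    pub fn from_type_field(value: &str) -> Option<Self> {
        match value {
            "user" => Some(Self::User),
            "assistant" => Some(Self::Assistant),
            "progress" => Some(Self::Progress),
            "system" => Some(Self::System),
            "queue-operation" => Some(Self::QueueOperation),
            _ => None,
        }
    }
}
```

Returning `Option` rather than an error keeps an unknown message type from being treated as a parse failure.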
### Content Blocks

| Type | Fields |
|------|--------|
| text | `text` |
| tool_use | `id`, `name`, `input` |
| tool_result | `tool_use_id`, `content`, `is_error` |
| thinking | `thinking` |
| image | `source` |

Unknown content block types are silently dropped (lenient deserialization).

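The table above, combined with the "plain string or array of blocks" rule, suggests a two-level enum. The sketch below is an assumption about the shape, not a copy of `types.rs`: field types are simplified to `String` where the real data is JSON, and the `plain_text` helper is hypothetical.

```rust
/// Simplified stand-in for the documented content block kinds.
/// `input`, `content`, and `source` hold JSON in real sessions;
/// they are plain strings here to keep the sketch std-only.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentBlock {
    Text { text: String },
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
    Thinking { thinking: String },
    Image { source: String },
}

/// Message content is either a bare string or a list of typed blocks.
#[derive(Debug, Clone, PartialEq)]
pub enum Content {
    Plain(String),
    Blocks(Vec<ContentBlock>),
}

impl Content {
    /// Joins the text blocks with newlines and ignores every other kind.
    pub fn plain_text(&self) -> String {
        match self {
            Content::Plain(s) => s.clone(),
            Content::Blocks(blocks) => blocks
                .iter()
                .filter_map(|b| match b {
                    ContentBlock::Text { text } => Some(text.as_str()),
                    _ => None,
                })
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}
```

A helper like `plain_text` lets consumers treat both content shapes uniformly without matching on the enum themselves.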
## Testing

```bash
cargo test --package dirigent_anth
```

Tests use synthetic JSONL fixtures in `tests/fixtures/`:
- `minimal_session.jsonl` — Basic session with all message types
- `streaming_dedup.jsonl` — Streaming dedup scenario
- `tool_correlation.jsonl` — Parallel and sequential tool calls
- `branching_tree.jsonl` — Conversation with branches
- `noise_patterns.jsonl` — All noise pattern types
- `subagent/` — Sub-agent session with parent and metadata

## Error Handling

- Individual unparseable JSONL lines are logged and skipped (lenient)
- I/O errors and missing directories are propagated as AntError
- Unknown message types are skipped via serde
- Unknown content blocks are silently filtered

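The split between propagated and skipped failures can be illustrated with a hand-written error enum. `SketchError` below is a hypothetical std-only stand-in for the four error categories the module list mentions. The variant names, fields, and messages are assumptions, and the crate itself uses thiserror rather than manual `Display` and `Error` impls.

```rust
use std::fmt;
use std::io;

/// Hypothetical error type with one variant per documented category.
#[derive(Debug)]
pub enum SketchError {
    Io(io::Error),
    Json { line: usize, message: String },
    HomeNotFound,
    InvalidPath(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "I/O error: {e}"),
            SketchError::Json { line, message } => {
                write!(f, "JSON parse error on line {line}: {message}")
            }
            SketchError::HomeNotFound => write!(f, "home directory not found"),
            SketchError::InvalidPath(p) => write!(f, "invalid path: {p}"),
        }
    }
}

impl std::error::Error for SketchError {
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            SketchError::Io(e) => Some(e),
            _ => None,
        }
    }
}

/// Lets `?` convert `io::Error` into `SketchError` automatically.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Example of propagation: a missing file surfaces as `SketchError::Io`.
pub fn read_session_text(path: &str) -> Result<String, SketchError> {
    Ok(std::fs::read_to_string(path)?)
}
```

Only failures that make the whole operation meaningless, such as a missing file, are returned to the caller. Per-line problems are handled inside the parser and never reach this type's consumers.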
## Related Packages

- **dirigent_archivist** — Future consumer for session import
- No current dependencies on other dirigent packages (standalone)

## Future Enhancements

- Bash command analysis module (shell usage analytics)
- Archivist event transform/import
- CLI tool with scan/analyze/import subcommands
- SQLite caching layer
- Watch mode for new session monitoring

## Documentation

- **Package README**: `./README.md` - User-facing overview
- **API Docs**: Run `cargo doc --package dirigent_anth --open`
- **Design Plan**: `docs/superpowers/plans/2026-03-23-dirigent-ant-design.md`