Package: dirigent_anth
Claude Code JSONL session parser and toolkit.
Quick Facts
- Type: Library
- Main Entry: src/lib.rs
- Dependencies: serde, serde_json, chrono, uuid, camino, thiserror, tracing, dirs
- Status: Core parsing complete — ready for downstream consumers
Purpose
Reads Claude Code's local JSONL session storage (~/.claude/projects/) and produces typed, deduplicated, correlated Rust data structures. The types are the product — downstream consumers (archivist import, shell usage analyzers, session browsers) depend on these structs.
Key Features
- Session Discovery: Scan ~/.claude/projects/ for all Claude Code projects and sessions
- JSONL Parsing: Lenient line-by-line parser that handles unknown fields and message types
- Streaming Dedup: Collapse streamed assistant messages to their final version
- Tool Correlation: ID-based pairing of tool_use → tool_result across parallel calls
- Conversation Tree: Reconstruct uuid/parentUuid threading with branch detection
- Noise Classification: Identify meta messages, warmup, interruptions, API errors
- Sub-Agent Loading: Recursive parsing of sub-agent JSONL with metadata
- Timestamp Parsing: Handle ISO 8601, Unix seconds, and Unix milliseconds
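The timestamp feature has to tell Unix seconds from Unix milliseconds. A minimal sketch of one way to do that, using a magnitude cutoff, is shown here. The names `classify_timestamp`, `unix_to_millis`, and `SECONDS_CUTOFF` are hypothetical, and the cutoff value is an assumption, not necessarily what util.rs does. ISO 8601 strings are only recognized as "not a plain integer"; real parsing would be delegated to a date library such as chrono.

```rust
/// Assumed cutoff: integers with magnitude below 10^11 are treated as seconds,
/// anything larger as milliseconds. Millisecond values from before early 1973
/// would be misread as seconds under this heuristic.
const SECONDS_CUTOFF: u64 = 100_000_000_000;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimestampKind {
    UnixSeconds,
    UnixMillis,
    /// Not a plain integer; a real parser would try ISO 8601 here.
    Text,
}

/// Classify a raw timestamp field by shape and magnitude (hypothetical helper).
pub fn classify_timestamp(raw: &str) -> TimestampKind {
    match raw.trim().parse::<i64>() {
        Ok(n) if n.unsigned_abs() < SECONDS_CUTOFF => TimestampKind::UnixSeconds,
        Ok(_) => TimestampKind::UnixMillis,
        Err(_) => TimestampKind::Text,
    }
}

/// Normalize a numeric Unix timestamp to milliseconds (hypothetical helper).
pub fn unix_to_millis(raw: i64) -> i64 {
    if raw.unsigned_abs() < SECONDS_CUTOFF {
        raw * 1000 // seconds: cannot overflow below the cutoff
    } else {
        raw // already milliseconds
    }
}
```

Normalizing everything to one unit early keeps later comparisons and sorting simple, at the cost of the edge case noted in the comment.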
Architecture
Design Principles
- Types are the product — Well-typed Rust structs that downstream consumers import
- Lenient parsing — Unknown fields ignored, unknown message types logged and skipped
- Stream-oriented — Line-by-line BufReader parsing, never loads entire files
- Sync-first — File parsing is CPU-bound; no async overhead
- Cross-platform — camino::Utf8PathBuf throughout for Windows/Unix compatibility
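The lenient and stream-oriented principles can be illustrated with a small std-only sketch. `parse_lines_lenient` and `LenientParse` are hypothetical names, not part of the crate's API. The per-line parser is passed in as a closure; in the real crate that role would presumably be played by a serde_json call.

```rust
use std::io::BufRead;

/// Result of a lenient pass over line-oriented input (hypothetical type).
#[derive(Debug)]
pub struct LenientParse<T> {
    /// Successfully parsed items, in input order.
    pub items: Vec<T>,
    /// 1-based numbers of lines that failed to parse and were skipped.
    pub skipped: Vec<usize>,
}

/// Read `reader` one line at a time, keeping good lines and skipping bad ones.
/// I/O errors abort the pass; per-line parse errors do not.
pub fn parse_lines_lenient<R, T, E, F>(
    reader: R,
    mut parse_line: F,
) -> std::io::Result<LenientParse<T>>
where
    R: BufRead,
    F: FnMut(&str) -> Result<T, E>,
{
    let mut items = Vec::new();
    let mut skipped = Vec::new();
    for (index, line) in reader.lines().enumerate() {
        let line = line?; // propagate I/O failures
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are neither items nor errors
        }
        match parse_line(trimmed) {
            Ok(item) => items.push(item),
            Err(_) => skipped.push(index + 1), // a real parser would log here
        }
    }
    Ok(LenientParse { items, skipped })
}
```

Only one line is held in memory at a time, so the cost of a pass scales with the longest line rather than the file size.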
Module Organization
- types.rs: All public data types (Content, ContentBlock, RawMessage variants, ToolCall, etc.)
- error.rs: AntError enum with I/O, JSON parse, home-not-found, invalid-path variants
- parser.rs: JSONL line parser and file parser with lenient error handling
- dedup.rs: Streaming deduplication of assistant messages by uuid
- correlation.rs: Tool call ↔ result pairing by tool_use_id
- tree.rs: Conversation tree from uuid/parentUuid relationships
- noise.rs: Noise pattern classification (meta, warmup, interruptions, etc.)
- discovery.rs: Filesystem scanning for Claude projects and sessions
- subagent.rs: Sub-agent JSONL and metadata loading
- util.rs: Timestamp parsing utilities
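To make the uuid/parentUuid threading concrete, here is a minimal std-only sketch of tree building with branch detection. `Node`, `Tree`, and `branch_points` are hypothetical stand-ins and do not reflect the actual ConversationTree type. The sketch assumes that a message whose parent is absent from the input is treated as a root.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical minimal message: only the threading fields matter here.
#[derive(Debug, Clone)]
pub struct Node {
    pub uuid: String,
    pub parent_uuid: Option<String>,
}

#[derive(Debug, Default)]
pub struct Tree {
    /// uuids with no parent, or with a parent that is not in the input.
    pub roots: Vec<String>,
    /// parent uuid -> child uuids, in input order.
    pub children: HashMap<String, Vec<String>>,
}

impl Tree {
    pub fn build(nodes: &[Node]) -> Self {
        let known: HashSet<&str> = nodes.iter().map(|n| n.uuid.as_str()).collect();
        let mut tree = Tree::default();
        for node in nodes {
            match node.parent_uuid.as_deref() {
                Some(parent) if known.contains(parent) => tree
                    .children
                    .entry(parent.to_string())
                    .or_default()
                    .push(node.uuid.clone()),
                // No parent, or a dangling parent reference: treat as a root.
                _ => tree.roots.push(node.uuid.clone()),
            }
        }
        tree
    }

    /// A branch point is any message with more than one child (sorted output).
    pub fn branch_points(&self) -> Vec<&str> {
        let mut points: Vec<&str> = self
            .children
            .iter()
            .filter(|(_, kids)| kids.len() > 1)
            .map(|(parent, _)| parent.as_str())
            .collect();
        points.sort_unstable();
        points
    }
}
```

Treating dangling parents as roots keeps the build total: a truncated or partially skipped session still yields a usable forest instead of an error.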
Public API
Quick Start
use dirigent_anth::{discover_claude_home, discover_projects, load_session};

// Discover all projects
let home = discover_claude_home()?;
let projects = discover_projects(&home)?;

// Load a session with full parsing
for project in &projects {
    for session_ref in &project.sessions {
        let session = load_session(session_ref)?;
        println!("Messages: {}, Tools: {}, Subagents: {}",
            session.messages.len(),
            session.tool_exchanges.len(),
            session.subagents.len());
    }
}
Key Functions
| Function | Purpose |
|---|---|
| discover_claude_home() | Find ~/.claude/ directory |
| discover_projects(home) | Scan for all project directories |
| parse_session(path) | Parse a JSONL file into messages |
| parse_session_deduped(path) | Parse with streaming dedup applied |
| dedup_messages(msgs) | Deduplicate streamed assistant messages |
| correlate_tools(msgs) | Pair tool calls with results by ID |
| ConversationTree::build(msgs) | Build conversation tree |
| classify_noise(msg) | Classify a message as noise |
| load_subagents(dir) | Load sub-agent sessions from artifacts |
| load_session(ref) | Full parse: dedup + correlate + tree + subagents |
| parse_timestamp(value) | Parse ISO/Unix timestamps |
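As an illustration of what collapsing streamed messages to their final version can look like, the following std-only sketch keeps the last entry seen for each uuid. `Msg` and `dedup_keep_last` are hypothetical and simplified; the real dedup_messages works on the crate's own message types and may apply additional rules.

```rust
use std::collections::HashMap;

/// Hypothetical, stripped-down message record.
#[derive(Debug, Clone, PartialEq)]
pub struct Msg {
    pub uuid: String,
    pub text: String,
}

/// Keep one entry per uuid: the last version seen, placed at the position
/// where that uuid first appeared.
pub fn dedup_keep_last(messages: Vec<Msg>) -> Vec<Msg> {
    let mut slot_of: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<Msg> = Vec::new();
    for msg in messages {
        let existing = slot_of.get(&msg.uuid).copied();
        match existing {
            // Later version of a known uuid: overwrite in place.
            Some(slot) => out[slot] = msg,
            // First sighting: remember its slot and append.
            None => {
                slot_of.insert(msg.uuid.clone(), out.len());
                out.push(msg);
            }
        }
    }
    out
}
```

Overwriting in place preserves conversation order, which matters if tool correlation and tree building run on the deduplicated list afterwards.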
Data Model
Claude Code JSONL Format
Each line in ~/.claude/projects/<encoded-path>/<session-uuid>.jsonl is a JSON object with a type field discriminator. Five types: user, assistant, progress, system, queue-operation.
- Outer wrapper: camelCase fields (sessionId, parentUuid, isSidechain, gitBranch)
- Inner message body: snake_case fields (stop_reason, tool_use_id, is_error)
- Content: Either a plain string or array of typed content blocks
Content Blocks
| Type | Fields |
|---|---|
| text | text |
| tool_use | id, name, input |
| tool_result | tool_use_id, content, is_error |
| thinking | thinking |
| image | source |
Unknown content block types are silently dropped (lenient deserialization).
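The tool_use and tool_result rows above are linked by id, which is what makes pairing work across parallel calls. A minimal std-only sketch of that pairing follows. `Block`, `Exchange`, and `correlate` are hypothetical simplifications (only the fields needed for pairing are kept), and the tool names in the usage note are illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the content block types listed in the table.
#[derive(Debug, Clone, PartialEq)]
pub enum Block {
    Text { text: String },
    ToolUse { id: String, name: String },
    ToolResult { tool_use_id: String, is_error: bool },
}

/// One tool call, plus the outcome of its result if one was seen.
#[derive(Debug, PartialEq)]
pub struct Exchange {
    pub id: String,
    pub name: String,
    /// None while no matching tool_result has been seen.
    pub is_error: Option<bool>,
}

/// Pair each tool_use with the tool_result that carries the same id.
pub fn correlate(blocks: &[Block]) -> Vec<Exchange> {
    let mut exchanges: Vec<Exchange> = Vec::new();
    let mut index_of: HashMap<&str, usize> = HashMap::new();
    for block in blocks {
        match block {
            Block::ToolUse { id, name } => {
                index_of.insert(id.as_str(), exchanges.len());
                exchanges.push(Exchange {
                    id: id.clone(),
                    name: name.clone(),
                    is_error: None,
                });
            }
            Block::ToolResult { tool_use_id, is_error } => {
                // Results may arrive in any order; look up by id, not position.
                if let Some(&i) = index_of.get(tool_use_id.as_str()) {
                    exchanges[i].is_error = Some(*is_error);
                }
            }
            Block::Text { .. } => {}
        }
    }
    exchanges
}
```

With two calls `t1` (Read) and `t2` (Bash) issued in parallel and results arriving as `t2` then `t1`, each exchange still receives its own result, and a call with no result stays at `is_error: None`.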
Testing
cargo test --package dirigent_anth
Tests use synthetic JSONL fixtures in tests/fixtures/:
- minimal_session.jsonl: Basic session with all message types
- streaming_dedup.jsonl: Streaming dedup scenario
- tool_correlation.jsonl: Parallel and sequential tool calls
- branching_tree.jsonl: Conversation with branches
- noise_patterns.jsonl: All noise pattern types
- subagent/: Sub-agent session with parent and metadata
Error Handling
- Individual unparseable JSONL lines are logged and skipped (lenient)
- I/O errors and missing directories are propagated as AntError
- Unknown message types are skipped via serde
- Unknown content blocks are silently filtered
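The split between propagated and skipped errors could be modeled roughly as below. This is a std-only sketch with hypothetical names (`SketchError`, `require_dir`). The variant list mirrors the four cases the README attributes to AntError, but the payloads and messages are assumptions, and the real enum is built with thiserror.

```rust
use std::fmt;
use std::io;
use std::path::Path;

/// Hypothetical error type with the four cases named in this README.
#[derive(Debug)]
pub enum SketchError {
    Io(io::Error),
    JsonParse { line: usize, message: String },
    HomeNotFound,
    InvalidPath(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(e) => write!(f, "I/O error: {e}"),
            SketchError::JsonParse { line, message } => {
                write!(f, "JSON parse error on line {line}: {message}")
            }
            SketchError::HomeNotFound => write!(f, "Claude home directory not found"),
            SketchError::InvalidPath(p) => write!(f, "invalid path: {p}"),
        }
    }
}

impl std::error::Error for SketchError {}

// Lets `?` convert io::Error into SketchError automatically.
impl From<io::Error> for SketchError {
    fn from(e: io::Error) -> Self {
        SketchError::Io(e)
    }
}

/// Fail if `path` is missing (I/O error) or exists but is not a directory.
pub fn require_dir(path: &Path) -> Result<(), SketchError> {
    let meta = std::fs::metadata(path)?; // missing path propagates as Io
    if meta.is_dir() {
        Ok(())
    } else {
        Err(SketchError::InvalidPath(path.display().to_string()))
    }
}
```

Per-line parse failures would not pass through a type like this at all in the lenient path; they are logged and skipped, so only structural problems reach the caller.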
Related Packages
- dirigent_archivist — Future consumer for session import
- No current dependencies on other dirigent packages (standalone)
Future Enhancements
- Bash command analysis module (shell usage analytics)
- Archivist event transform/import
- CLI tool with scan/analyze/import subcommands
- SQLite caching layer
- Watch mode for new session monitoring
Documentation
- Package README: ./README.md (user-facing overview)
- API Docs: Run cargo doc --package dirigent_anth --open
- Design Plan: docs/superpowers/plans/2026-03-23-dirigent-ant-design.md