# Package: dirigent_anth

Claude Code JSONL session parser and toolkit.

## Quick Facts

- **Type**: Library
- **Main Entry**: src/lib.rs
- **Dependencies**: serde, serde_json, chrono, uuid, camino, thiserror, tracing, dirs
- **Status**: Core parsing complete; ready for downstream consumers

## Purpose

Reads Claude Code's local JSONL session storage (`~/.claude/projects/`) and produces typed, deduplicated, correlated Rust data structures. The types are the product: downstream consumers (archivist import, shell usage analyzers, session browsers) depend on these structs.

## Key Features

- **Session Discovery**: Scan `~/.claude/projects/` for all Claude Code projects and sessions
- **JSONL Parsing**: Lenient line-by-line parser that handles unknown fields and message types
- **Streaming Dedup**: Collapse streamed assistant messages to their final version
- **Tool Correlation**: ID-based pairing of tool_use → tool_result across parallel calls
- **Conversation Tree**: Reconstruct uuid/parentUuid threading with branch detection
- **Noise Classification**: Identify meta messages, warmup, interruptions, API errors
- **Sub-Agent Loading**: Recursive parsing of sub-agent JSONL with metadata
- **Timestamp Parsing**: Handle ISO 8601, Unix seconds, and Unix milliseconds

## Architecture

### Design Principles

1. **Types are the product**: Well-typed Rust structs that downstream consumers import
2. **Lenient parsing**: Unknown fields ignored, unknown message types logged and skipped
3. **Stream-oriented**: Line-by-line BufReader parsing, never loads entire files
4. **Sync-first**: File parsing is CPU-bound; no async overhead
5. **Cross-platform**: camino::Utf8PathBuf throughout for Windows/Unix compatibility

### Module Organization

- **`types.rs`**: All public data types (Content, ContentBlock, RawMessage variants, ToolCall, etc.)
- **`error.rs`**: AntError enum with I/O, JSON parse, home-not-found, invalid-path variants
- **`parser.rs`**: JSONL line parser and file parser with lenient error handling
- **`dedup.rs`**: Streaming deduplication of assistant messages by uuid
- **`correlation.rs`**: Tool call ↔ result pairing by tool_use_id
- **`tree.rs`**: Conversation tree from uuid/parentUuid relationships
- **`noise.rs`**: Noise pattern classification (meta, warmup, interruptions, etc.)
- **`discovery.rs`**: Filesystem scanning for Claude projects and sessions
- **`subagent.rs`**: Sub-agent JSONL and metadata loading
- **`util.rs`**: Timestamp parsing utilities

## Public API

### Quick Start

```rust
use dirigent_anth::{discover_claude_home, discover_projects, load_session};

// Wrapped in `main` so that `?` compiles; this assumes AntError implements
// std::error::Error, as a thiserror-derived enum would.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Discover all projects
    let home = discover_claude_home()?;
    let projects = discover_projects(&home)?;

    // Load a session with full parsing
    for project in &projects {
        for session_ref in &project.sessions {
            let session = load_session(session_ref)?;
            println!(
                "Messages: {}, Tools: {}, Subagents: {}",
                session.messages.len(),
                session.tool_exchanges.len(),
                session.subagents.len()
            );
        }
    }
    Ok(())
}
```

### Key Functions

| Function | Purpose |
|----------|---------|
| `discover_claude_home()` | Find `~/.claude/` directory |
| `discover_projects(home)` | Scan for all project directories |
| `parse_session(path)` | Parse a JSONL file into messages |
| `parse_session_deduped(path)` | Parse with streaming dedup applied |
| `dedup_messages(msgs)` | Deduplicate streamed assistant messages |
| `correlate_tools(msgs)` | Pair tool calls with results by ID |
| `ConversationTree::build(msgs)` | Build conversation tree |
| `classify_noise(msg)` | Classify a message as noise |
| `load_subagents(dir)` | Load sub-agent sessions from artifacts |
| `load_session(ref)` | Full parse: dedup + correlate + tree + subagents |
| `parse_timestamp(value)` | Parse ISO/Unix timestamps |

## Data Model

### Claude Code JSONL Format

Each line in
`~/.claude/projects//.jsonl` is a JSON object with a `type` field discriminator. Five types: `user`, `assistant`, `progress`, `system`, `queue-operation`.

- **Outer wrapper**: camelCase fields (sessionId, parentUuid, isSidechain, gitBranch)
- **Inner message body**: snake_case fields (stop_reason, tool_use_id, is_error)
- **Content**: Either a plain string or array of typed content blocks

### Content Blocks

| Type | Fields |
|------|--------|
| text | `text` |
| tool_use | `id`, `name`, `input` |
| tool_result | `tool_use_id`, `content`, `is_error` |
| thinking | `thinking` |
| image | `source` |

Unknown content block types are silently dropped (lenient deserialization).

## Testing

```bash
cargo test --package dirigent_anth
```

Tests use synthetic JSONL fixtures in `tests/fixtures/`:

- `minimal_session.jsonl`: Basic session with all message types
- `streaming_dedup.jsonl`: Streaming dedup scenario
- `tool_correlation.jsonl`: Parallel and sequential tool calls
- `branching_tree.jsonl`: Conversation with branches
- `noise_patterns.jsonl`: All noise pattern types
- `subagent/`: Sub-agent session with parent and metadata

## Error Handling

- Individual unparseable JSONL lines are logged and skipped (lenient)
- I/O errors and missing directories are propagated as AntError
- Unknown message types are skipped via serde
- Unknown content blocks are silently filtered

## Related Packages

- **dirigent_archivist**: Future consumer for session import
- No current dependencies on other dirigent packages (standalone)

## Future Enhancements

- Bash command analysis module (shell usage analytics)
- Archivist event transform/import
- CLI tool with scan/analyze/import subcommands
- SQLite caching layer
- Watch mode for new session monitoring

## Documentation

- **Package README**: `./README.md` - User-facing overview
- **API Docs**: Run `cargo doc --package dirigent_anth --open`
- **Design Plan**: `docs/superpowers/plans/2026-03-23-dirigent-ant-design.md`
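
## Example Sketch: Tool Correlation by ID

The Tool Correlation feature pairs each `tool_use` block with the `tool_result` that carries the same ID, even when parallel calls return out of order. The following is a minimal, std-only sketch of that idea, not the crate's actual `correlate_tools` implementation. The `Block` enum, the `ToolExchange` struct, and the `correlate` function are hypothetical stand-ins; only the field names (`id`, `name`, `input`, `tool_use_id`, `content`, `is_error`) come from the Content Blocks table, and `input` and `content` are simplified to plain strings here.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified content block. Field names follow the
/// Content Blocks table; payloads are plain strings for illustration.
#[derive(Debug, Clone, PartialEq)]
enum Block {
    ToolUse { id: String, name: String, input: String },
    ToolResult { tool_use_id: String, content: String, is_error: bool },
    /// text, thinking, image: irrelevant to pairing.
    Other,
}

/// Hypothetical pairing record: one tool call plus its result, if any.
#[derive(Debug, Clone, PartialEq)]
struct ToolExchange {
    id: String,
    name: String,
    input: String,
    /// (content, is_error); None while the call has no result yet.
    result: Option<(String, bool)>,
}

/// Pair each tool_use with the tool_result carrying the same ID.
/// Output follows the order of the calls, not the order of the results.
fn correlate(blocks: &[Block]) -> Vec<ToolExchange> {
    let mut exchanges: Vec<ToolExchange> = Vec::new();
    let mut index_by_id: HashMap<String, usize> = HashMap::new();

    for block in blocks {
        match block {
            Block::ToolUse { id, name, input } => {
                // Remember where this call lives so a later result can find it.
                index_by_id.insert(id.clone(), exchanges.len());
                exchanges.push(ToolExchange {
                    id: id.clone(),
                    name: name.clone(),
                    input: input.clone(),
                    result: None,
                });
            }
            Block::ToolResult { tool_use_id, content, is_error } => {
                // A result without a matching call is ignored in this sketch.
                if let Some(&i) = index_by_id.get(tool_use_id) {
                    exchanges[i].result = Some((content.clone(), *is_error));
                }
            }
            Block::Other => {}
        }
    }
    exchanges
}

fn main() {
    // Two parallel calls whose results arrive in reverse order.
    // IDs and tool names are made up for the example.
    let blocks = vec![
        Block::ToolUse { id: "a".into(), name: "Read".into(), input: "{}".into() },
        Block::ToolUse { id: "b".into(), name: "Bash".into(), input: "{}".into() },
        Block::Other,
        Block::ToolResult { tool_use_id: "b".into(), content: "ok".into(), is_error: false },
        Block::ToolResult { tool_use_id: "a".into(), content: "denied".into(), is_error: true },
    ];
    for ex in correlate(&blocks) {
        println!("{} ({}) input={} result={:?}", ex.name, ex.id, ex.input, ex.result);
    }
}
```

Keying on the ID rather than on position is what keeps parallel calls correctly paired: a positional match would attach the first result to the first call, which is wrong whenever results arrive out of order. Calls that never receive a result stay in the output with `result: None`.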