sync from monorepo @ 2452e92e
@@ -0,0 +1,148 @@
# Package: dirigent_anth

Claude Code JSONL session parser and toolkit.

## Quick Facts

- **Type**: Library
- **Main Entry**: src/lib.rs
- **Dependencies**: serde, serde_json, chrono, uuid, camino, thiserror, tracing, dirs
- **Status**: Core parsing complete; ready for downstream consumers

## Purpose

Reads Claude Code's local JSONL session storage (`~/.claude/projects/`) and produces typed, deduplicated, correlated Rust data structures. The types are the product: downstream consumers (archivist import, shell usage analyzers, session browsers) depend on these structs.

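As a rough illustration of what reading this storage involves, the sketch below walks a projects root and collects the `.jsonl` files in each project directory. It uses only the standard library. `ProjectSketch` and `scan_projects` are hypothetical names for this example, not the crate's `discover_projects` API, and the sketch uses `std::path::PathBuf` where the crate itself is documented as using `camino::Utf8PathBuf`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// One project directory and the session files found directly inside it.
/// (Hypothetical type for this sketch, not the crate's own.)
#[derive(Debug)]
pub struct ProjectSketch {
    pub dir: PathBuf,
    pub sessions: Vec<PathBuf>,
}

/// Scan `projects_root` (for example `~/.claude/projects`) for project
/// directories and collect the `*.jsonl` files in each one.
pub fn scan_projects(projects_root: &Path) -> io::Result<Vec<ProjectSketch>> {
    let mut projects = Vec::new();
    for entry in fs::read_dir(projects_root)? {
        let dir = entry?.path();
        if !dir.is_dir() {
            continue; // stray files at the top level are not projects
        }
        let mut sessions = Vec::new();
        for file in fs::read_dir(&dir)? {
            let path = file?.path();
            if path.is_file() && path.extension().map_or(false, |ext| ext == "jsonl") {
                sessions.push(path);
            }
        }
        sessions.sort(); // deterministic order for callers and tests
        projects.push(ProjectSketch { dir, sessions });
    }
    projects.sort_by(|a, b| a.dir.cmp(&b.dir));
    Ok(projects)
}
```

Calling `scan_projects` on a directory returns one `ProjectSketch` per subdirectory; any I/O failure is returned to the caller rather than skipped.
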
## Key Features

- **Session Discovery**: Scan `~/.claude/projects/` for all Claude Code projects and sessions
- **JSONL Parsing**: Lenient line-by-line parser that handles unknown fields and message types
- **Streaming Dedup**: Collapse streamed assistant messages to their final version
- **Tool Correlation**: ID-based pairing of `tool_use` → `tool_result` across parallel calls
- **Conversation Tree**: Reconstruct `uuid`/`parentUuid` threading with branch detection
- **Noise Classification**: Identify meta messages, warmup, interruptions, API errors
- **Sub-Agent Loading**: Recursive parsing of sub-agent JSONL with metadata
- **Timestamp Parsing**: Handle ISO 8601, Unix seconds, and Unix milliseconds

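To make the streaming dedup feature concrete, here is a minimal sketch of "last snapshot wins" deduplication. It assumes each streamed snapshot of a message carries the same `uuid` and that the final snapshot appears last in the file. `MsgSketch` and `dedup_keep_last` are hypothetical names; the crate's `dedup_messages` may use a different key or tie-breaking rule.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a parsed message (hypothetical, for illustration).
#[derive(Debug, Clone, PartialEq)]
pub struct MsgSketch {
    pub uuid: String,
    pub text: String,
}

/// Keep one message per `uuid`: the last one seen wins, but it stays at the
/// position where that uuid first appeared, so conversation order is kept.
pub fn dedup_keep_last(messages: Vec<MsgSketch>) -> Vec<MsgSketch> {
    let mut slot_of: HashMap<String, usize> = HashMap::new();
    let mut out: Vec<MsgSketch> = Vec::new();
    for msg in messages {
        let existing = slot_of.get(&msg.uuid).copied();
        match existing {
            // A later snapshot replaces the earlier one in place.
            Some(slot) => out[slot] = msg,
            None => {
                slot_of.insert(msg.uuid.clone(), out.len());
                out.push(msg);
            }
        }
    }
    out
}
```

Replacing in place, rather than appending the final snapshot, keeps a user message that arrived mid-stream after the assistant message it follows.
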
## Architecture

### Design Principles

1. **Types are the product**: well-typed Rust structs that downstream consumers import
2. **Lenient parsing**: unknown fields are ignored; unknown message types are logged and skipped
3. **Stream-oriented**: line-by-line `BufReader` parsing; a file is never loaded into memory whole
4. **Sync-first**: file parsing is CPU-bound, so the API is synchronous and avoids async overhead
5. **Cross-platform**: `camino::Utf8PathBuf` throughout for Windows/Unix compatibility

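The lenient, stream-oriented principles can be sketched together in a few lines. The example below reads one line at a time from any `BufRead`, skips lines the supplied parser rejects, and still propagates real I/O errors. `parse_lines_lenient` and `LenientOutcome` are hypothetical names, and the per-line parser is passed in as a closure so the sketch stays independent of `serde_json`.

```rust
use std::io::{self, BufRead};

/// Result of a lenient pass over a line-oriented input (hypothetical type).
#[derive(Debug)]
pub struct LenientOutcome<T> {
    pub parsed: Vec<T>,
    pub skipped: usize,
}

/// Parse `reader` line by line. Lines for which `parse_line` returns `None`
/// are counted and skipped; I/O errors abort the whole pass.
pub fn parse_lines_lenient<R, T, F>(reader: R, mut parse_line: F) -> io::Result<LenientOutcome<T>>
where
    R: BufRead,
    F: FnMut(&str) -> Option<T>,
{
    let mut parsed = Vec::new();
    let mut skipped = 0;
    for line in reader.lines() {
        let line = line?; // I/O errors are propagated, not swallowed
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are neither records nor errors
        }
        match parse_line(trimmed) {
            Some(value) => parsed.push(value),
            None => skipped += 1, // a real parser would log the bad line here
        }
    }
    Ok(LenientOutcome { parsed, skipped })
}
```

Only the current line is buffered while reading, so memory use for the input does not grow with file size; the parsed values are still collected into a `Vec` in this sketch.
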
### Module Organization

- **`types.rs`**: All public data types (Content, ContentBlock, RawMessage variants, ToolCall, etc.)
- **`error.rs`**: AntError enum with I/O, JSON parse, home-not-found, invalid-path variants
- **`parser.rs`**: JSONL line parser and file parser with lenient error handling
- **`dedup.rs`**: Streaming deduplication of assistant messages by uuid
- **`correlation.rs`**: Tool call ↔ result pairing by tool_use_id
- **`tree.rs`**: Conversation tree from uuid/parentUuid relationships
- **`noise.rs`**: Noise pattern classification (meta, warmup, interruptions, etc.)
- **`discovery.rs`**: Filesystem scanning for Claude projects and sessions
- **`subagent.rs`**: Sub-agent JSONL and metadata loading
- **`util.rs`**: Timestamp parsing utilities

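As one example of what a module like `tree.rs` has to do, the sketch below rebuilds parent/child links from `uuid` and `parentUuid` values and reports branch points. `NodeSketch` and `TreeSketch` are hypothetical types for illustration, and treating a message whose parent is missing from the file as a root is an assumption of this sketch, not documented crate behavior.

```rust
use std::collections::{HashMap, HashSet};

/// Only the two fields needed for threading (hypothetical type).
pub struct NodeSketch {
    pub uuid: String,
    pub parent_uuid: Option<String>,
}

pub struct TreeSketch {
    /// Messages with no parent, or whose parent is not in the input.
    pub roots: Vec<String>,
    /// Parent uuid -> direct children, in input order.
    pub children: HashMap<String, Vec<String>>,
}

impl TreeSketch {
    pub fn build(nodes: &[NodeSketch]) -> Self {
        let known: HashSet<&str> = nodes.iter().map(|n| n.uuid.as_str()).collect();
        let mut roots = Vec::new();
        let mut children: HashMap<String, Vec<String>> = HashMap::new();
        for node in nodes {
            match node.parent_uuid.as_deref() {
                Some(parent) if known.contains(parent) => {
                    children
                        .entry(parent.to_string())
                        .or_default()
                        .push(node.uuid.clone());
                }
                // No parent, or a dangling parent reference: treat as a root.
                _ => roots.push(node.uuid.clone()),
            }
        }
        TreeSketch { roots, children }
    }

    /// A branch point is any message with more than one direct reply.
    pub fn branch_points(&self) -> Vec<&str> {
        let mut points: Vec<&str> = self
            .children
            .iter()
            .filter(|(_, kids)| kids.len() > 1)
            .map(|(uuid, _)| uuid.as_str())
            .collect();
        points.sort(); // HashMap order is arbitrary; sort for stable output
        points
    }
}
```
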
## Public API

### Quick Start

```rust
use dirigent_anth::{discover_claude_home, discover_projects, load_session};

// Assumes the crate's error type implements std::error::Error,
// so that `?` can convert it into a boxed error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Discover all projects
    let home = discover_claude_home()?;
    let projects = discover_projects(&home)?;

    // Load every session with full parsing
    for project in &projects {
        for session_ref in &project.sessions {
            let session = load_session(session_ref)?;
            println!(
                "Messages: {}, Tools: {}, Subagents: {}",
                session.messages.len(),
                session.tool_exchanges.len(),
                session.subagents.len()
            );
        }
    }
    Ok(())
}
```

### Key Functions

| Function | Purpose |
|----------|---------|
| `discover_claude_home()` | Find `~/.claude/` directory |
| `discover_projects(home)` | Scan for all project directories |
| `parse_session(path)` | Parse a JSONL file into messages |
| `parse_session_deduped(path)` | Parse with streaming dedup applied |
| `dedup_messages(msgs)` | Deduplicate streamed assistant messages |
| `correlate_tools(msgs)` | Pair tool calls with results by ID |
| `ConversationTree::build(msgs)` | Build conversation tree |
| `classify_noise(msg)` | Classify a message as noise |
| `load_subagents(dir)` | Load sub-agent sessions from artifacts |
| `load_session(ref)` | Full parse: dedup + correlate + tree + subagents |
| `parse_timestamp(value)` | Parse ISO/Unix timestamps |

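The table does not say how `parse_timestamp` tells Unix seconds from Unix milliseconds, so the following is only one plausible approach: a magnitude heuristic. `guess_unit`, `unix_to_millis`, and the threshold value are assumptions made for this sketch. ISO 8601 strings are left out; they would normally go through a date library such as `chrono`, which is already in the dependency list.

```rust
/// Which unit a raw numeric timestamp most likely uses (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum UnixUnit {
    Seconds,
    Millis,
}

/// Magnitudes at or above this are treated as milliseconds (assumption).
/// Read as seconds, 1e11 is roughly the year 5138; read as milliseconds,
/// it falls in 1973, so real session timestamps land clearly on one side.
pub const MILLIS_THRESHOLD: u64 = 100_000_000_000;

pub fn guess_unit(raw: i64) -> UnixUnit {
    if raw.unsigned_abs() >= MILLIS_THRESHOLD {
        UnixUnit::Millis
    } else {
        UnixUnit::Seconds
    }
}

/// Normalize a raw Unix timestamp to milliseconds since the epoch.
pub fn unix_to_millis(raw: i64) -> i64 {
    match guess_unit(raw) {
        UnixUnit::Millis => raw,
        // Below the threshold, multiplying by 1000 cannot overflow an i64.
        UnixUnit::Seconds => raw * 1000,
    }
}
```

The trade-off is that millisecond timestamps from before March 1973 would be misread as seconds, which should not matter for session logs.
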
## Data Model

### Claude Code JSONL Format

Each line in `~/.claude/projects/<encoded-path>/<session-uuid>.jsonl` is a JSON object with a `type` field discriminator. Five types: `user`, `assistant`, `progress`, `system`, `queue-operation`.

- **Outer wrapper**: camelCase fields (sessionId, parentUuid, isSidechain, gitBranch)
- **Inner message body**: snake_case fields (stop_reason, tool_use_id, is_error)
- **Content**: Either a plain string or array of typed content blocks

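The "plain string or array of blocks" shape maps naturally onto a Rust enum. The sketch below shows that idea with hand-built values and no deserialization; `ContentSketch` and `BlockSketch` are hypothetical stand-ins for the crate's `Content` and `ContentBlock`, whose actual definitions may differ. With serde, a shape like this is commonly expressed using `#[serde(untagged)]`.

```rust
/// A reduced set of block kinds, enough for the example (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum BlockSketch {
    Text { text: String },
    ToolUse { id: String, name: String },
}

/// Message content is either one plain string or a list of typed blocks.
#[derive(Debug, Clone, PartialEq)]
pub enum ContentSketch {
    Plain(String),
    Blocks(Vec<BlockSketch>),
}

impl ContentSketch {
    /// Return the human-readable text regardless of which shape was used.
    /// Text blocks are joined with newlines; other block kinds are ignored.
    pub fn plain_text(&self) -> String {
        match self {
            ContentSketch::Plain(s) => s.clone(),
            ContentSketch::Blocks(blocks) => blocks
                .iter()
                .filter_map(|block| match block {
                    BlockSketch::Text { text } => Some(text.as_str()),
                    _ => None,
                })
                .collect::<Vec<_>>()
                .join("\n"),
        }
    }
}
```

Giving both shapes one accessor means downstream code does not need to match on the enum just to display a message.
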
### Content Blocks

| Type | Fields |
|------|--------|
| text | `text` |
| tool_use | `id`, `name`, `input` |
| tool_result | `tool_use_id`, `content`, `is_error` |
| thinking | `thinking` |
| image | `source` |

Unknown content block types are silently dropped (lenient deserialization).

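The `id` on a `tool_use` block and the `tool_use_id` on a `tool_result` block are what make ID-based correlation possible, including when parallel calls return out of order. A minimal sketch follows, using a flattened event list instead of full messages. `ToolEvent`, `ExchangeSketch`, and `correlate` are hypothetical names, and ignoring results that match no known call is an assumption of this sketch rather than documented behavior of `correlate_tools`.

```rust
use std::collections::HashMap;

/// Tool-related blocks in the order they appear in a session (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum ToolEvent {
    Use { id: String, name: String },
    Result { tool_use_id: String, is_error: bool },
}

/// One call paired with its result, if the result has been seen.
#[derive(Debug, Clone, PartialEq)]
pub struct ExchangeSketch {
    pub id: String,
    pub name: String,
    /// `None` until a result with a matching `tool_use_id` is found.
    pub is_error: Option<bool>,
}

pub fn correlate(events: &[ToolEvent]) -> Vec<ExchangeSketch> {
    let mut exchanges: Vec<ExchangeSketch> = Vec::new();
    let mut index_by_id: HashMap<&str, usize> = HashMap::new();
    for event in events {
        match event {
            ToolEvent::Use { id, name } => {
                index_by_id.insert(id.as_str(), exchanges.len());
                exchanges.push(ExchangeSketch {
                    id: id.clone(),
                    name: name.clone(),
                    is_error: None,
                });
            }
            ToolEvent::Result { tool_use_id, is_error } => {
                // Results with no known call are ignored in this sketch.
                if let Some(&i) = index_by_id.get(tool_use_id.as_str()) {
                    exchanges[i].is_error = Some(*is_error);
                }
            }
        }
    }
    exchanges
}
```

Pairing by ID rather than by position is what keeps two parallel calls from being matched to each other's results.
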
## Testing

```bash
cargo test --package dirigent_anth
```

Tests use synthetic JSONL fixtures in `tests/fixtures/`:

- `minimal_session.jsonl`: Basic session with all message types
- `streaming_dedup.jsonl`: Streaming dedup scenario
- `tool_correlation.jsonl`: Parallel and sequential tool calls
- `branching_tree.jsonl`: Conversation with branches
- `noise_patterns.jsonl`: All noise pattern types
- `subagent/`: Sub-agent session with parent and metadata

## Error Handling

- Individual unparseable JSONL lines are logged and skipped (lenient)
- I/O errors and missing directories are propagated as AntError
- Unknown message types are skipped via serde
- Unknown content blocks are silently filtered

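For the errors that are propagated rather than skipped, the sketch below shows the general shape of such an error enum and how `?` converts an `io::Error` into it. `SketchError` and `session_size` are hypothetical; the variants only mirror the categories listed for `AntError`, and the trait impls are written by hand here where the crate would presumably derive them with `thiserror`.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hand-written stand-in for a crate error type (hypothetical).
#[derive(Debug)]
pub enum SketchError {
    Io(io::Error),
    JsonParse(String),
    HomeNotFound,
    InvalidPath(PathBuf),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Io(err) => write!(f, "I/O error: {err}"),
            SketchError::JsonParse(msg) => write!(f, "JSON parse error: {msg}"),
            SketchError::HomeNotFound => write!(f, "home directory not found"),
            SketchError::InvalidPath(path) => write!(f, "invalid path: {}", path.display()),
        }
    }
}

impl std::error::Error for SketchError {}

impl From<io::Error> for SketchError {
    fn from(err: io::Error) -> Self {
        SketchError::Io(err)
    }
}

/// Return the size of a session file, rejecting paths that are not `.jsonl`.
pub fn session_size(path: &Path) -> Result<u64, SketchError> {
    if path.extension().map_or(true, |ext| ext != "jsonl") {
        return Err(SketchError::InvalidPath(path.to_path_buf()));
    }
    let meta = fs::metadata(path)?; // io::Error becomes SketchError::Io via From
    Ok(meta.len())
}
```
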
## Related Packages

- **dirigent_archivist**: Future consumer for session import
- No current dependencies on other dirigent packages (standalone)

## Future Enhancements

- Bash command analysis module (shell usage analytics)
- Archivist event transform/import
- CLI tool with scan/analyze/import subcommands
- SQLite caching layer
- Watch mode for new session monitoring

## Documentation

- **Package README**: `./README.md` (user-facing overview)
- **API Docs**: Run `cargo doc --package dirigent_anth --open`
- **Design Plan**: `docs/superpowers/plans/2026-03-23-dirigent-ant-design.md`