Package: dirigent_archivist
Persistent storage for all agentic interactions in Dirigent.
Quick Facts
- Type: Library
- Main Entry: src/lib.rs
- Dependencies: dirigent_protocol, uuid, chrono, serde, tokio, tracing, thiserror, sha2, hex, async-trait
- Status: Complete - Production ready with comprehensive tests
Purpose
The Archivist provides file-based archival storage for all session data, messages, and attachments in Dirigent. It implements an archive-first architecture with connector API fallback, using NDJSON, JSON, and TSV formats for durability and human-readability.
Key Features
- File-based Storage: NDJSON for messages, JSON for metadata, TSV for indices
- Content-Addressable Files: SHA-256 based storage for attachments with automatic deduplication
- Session Lineage: Track splits, continuations, and mutations with parent references
- Connector Registry: Coordinate UID assignment across connectors with collision detection
- Event Streaming: Real-time updates via EventHandler subscribing to dirigent_protocol events
- Archive-First Design: Read from archive first, fall back to connector API when needed
- Caching: In-memory caching of connector and session mappings for performance
Architecture
The Archivist is built on three core architectural principles:
1. Archive-First Read Strategy
The Archivist is the primary source of truth for historical data:
- UI and APIs query the archive first
- Only fall back to connector APIs if data is not in archive
- This enables offline access and consistent history across restarts
2. Write-Through Event Capture (Append-Only)
The EventHandler subscribes to the global event stream from dirigent_core:
- Captures session creation, message streaming, and tool calls in real-time
- Uses MessageAccumulator to assemble streaming chunks into complete messages
- Writes complete messages to archive immediately upon finalization
- No polling required - fully event-driven
- Append-only writes: Messages are appended as events arrive, NOT in chronological order
- File order reflects event timing, not message timestamps
3. File-Based Storage with Sort-on-Read
All data is stored in human-readable, grep-able formats:
- NDJSON (Newline-Delimited JSON): Incremental append-only logs for messages and mappings
- JSON: Structured metadata for sessions and connectors
- TSV (Tab-Separated Values): Fast indices for cross-references
- Content-Addressed Files: Binary attachments stored by SHA-256 hash for deduplication
- Sort-on-Read: get_messages() sorts by timestamp and message_id to ensure chronological order despite append-only writes
Backend Trait Layer (Phase 2)
The archivist uses a trait-based backend abstraction. ArchiveBackend
defines the mandatory session and message primitives every backend must
provide, plus as_xxx() accessors returning optional sub-traits:
- SearchBackend: reserved for Phase 3+ indexed backends (not wired)
- DagBackend: session lineage DAG edges
- MetaEventsBackend: ACP connection lifecycle events
- ConnectorRegistryBackend: per-archive connector metadata
- SessionMappingBackend: native↔scroll session ID mapping
JsonlBackend is the Phase 2 concrete implementation (file-based
NDJSON/JSON/TSV) and opts into every sub-trait except SearchBackend
(content search continues to be served by ripgrep via
crates/api/src/archivist/search_task.rs).
The Archivist struct (in src/coordinator/) owns a registry of backends
keyed by archive name and performs orchestration (alias detection, session
lineage, move/copy, DAG walks, archive lifecycle). Consumers hold
Arc<Archivist> directly — the coordinator is concrete, not a trait.
See docs/plans/2026-04-18-archivist-phase2-design.md for design rationale.
Multi-Backend Registry (Phase 3)
The coordinator (Archivist) holds Vec<Arc<ArchiveRegistration>> sorted
by read_priority instead of a flat HashMap<name, Arc<dyn ArchiveBackend>>.
Each registration carries:
- backend: Arc<dyn ArchiveBackend> + its declared capabilities
- failure_mode: Required (must succeed) | BestEffort (errors log + drift health)
- read_priority: lower = tried first for reads; also selects the default write target when no archive is named
- write_active: participates in fanout writes
- enabled: kill-switch without removing config
- write_policy: Inline (default; await per call) or Queued (mpsc + batch_window + overflow policy)
- Runtime state: last_health, last_error, consecutive_failures (all Arc<RwLock<_>>, shared with the writer task when queued)
- Optional writer: Option<WriterHandle> (Some iff write_policy = Queued)
Backends are declared in dirigent.toml under [[archives]] and
constructed at boot via Archivist::from_config(cfg, &BackendRegistry).
Add a new backend type by implementing BackendFactory and registering
it on the BackendRegistry before from_config.
Reads
get_session, get_messages_paged, count_messages, get_meta_events,
get_children, etc. walk the registry in priority order via
read_walk_per_session(scroll_id, predicate, op). The predicate
capability-filters; Unavailable backends are skipped. The first backend
that returns Some(value) wins and its name is cached against the
scroll_id in a positive LRU (capacity 10_000). Subsequent reads for the
same scroll_id short-circuit to the cached backend before falling back
to the full priority walk.
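The read walk described above can be sketched as follows. This is only an illustration under stated assumptions: Backend, Registry, Health, and demo_registry are hypothetical stand-ins (not the crate's types), a plain HashMap replaces the bounded LRU, and scroll IDs are u64 instead of UUIDs.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real registration and health types.
#[derive(Clone, Copy, PartialEq)]
enum Health {
    Healthy,
    Unavailable,
}

struct Backend {
    name: &'static str,
    read_priority: u32,
    health: Health,
    sessions: HashMap<u64, String>, // scroll_id -> session title
}

struct Registry {
    backends: Vec<Backend>,     // kept sorted by read_priority (lower first)
    cache: HashMap<u64, usize>, // scroll_id -> index of the backend that answered
}

impl Registry {
    fn new(mut backends: Vec<Backend>) -> Self {
        backends.sort_by_key(|b| b.read_priority);
        Registry { backends, cache: HashMap::new() }
    }

    fn lookup(b: &Backend, scroll_id: u64) -> Option<String> {
        if b.health == Health::Unavailable {
            return None; // Unavailable backends are skipped on reads
        }
        b.sessions.get(&scroll_id).cloned()
    }

    fn get_session(&mut self, scroll_id: u64) -> Option<(&'static str, String)> {
        // 1. Short-circuit to the backend that answered last time.
        if let Some(&idx) = self.cache.get(&scroll_id) {
            if let Some(v) = Self::lookup(&self.backends[idx], scroll_id) {
                return Some((self.backends[idx].name, v));
            }
        }
        // 2. Full walk in priority order; the first hit wins and is cached.
        for (idx, b) in self.backends.iter().enumerate() {
            if let Some(v) = Self::lookup(b, scroll_id) {
                self.cache.insert(scroll_id, idx);
                return Some((b.name, v));
            }
        }
        None
    }
}

fn demo_registry() -> Registry {
    let session = HashMap::from([(1u64, "Implement user authentication".to_string())]);
    Registry::new(vec![
        Backend { name: "slow", read_priority: 10, health: Health::Healthy, sessions: session.clone() },
        Backend { name: "fast", read_priority: 0, health: Health::Unavailable, sessions: session },
    ])
}

fn main() {
    let mut reg = demo_registry();
    // "fast" sorts first but is Unavailable, so "slow" answers and is cached.
    println!("{:?}", reg.get_session(1));
    println!("{:?}", reg.get_session(2)); // unknown session
}
```

The sketch omits the capability predicate and cache eviction; both would sit inside lookup and the cache type respectively.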
Collection-shape reads (list_sessions_paged, list_connectors,
list_meta_sessions, find_meta_session_by_client) use
read_walk_collection — first enabled backend that can answer wins, no
cache, no aggregation across backends. Phase 3 explicitly defers
cross-backend merge/dedup to a later phase.
Writes
Mutating methods (append_messages, register_session, update_session_*,
append_meta_events, append_dag_edge, clear_session_messages,
update_connector_fingerprint) resolve a primary (per-call archive: Some(name) override or the default-write target) and fan out to every
other enabled && write_active backend that has the required capability.
Capability-mismatched backends are skipped with a debug capability_skip
log (never an error). Required failures propagate to the caller;
BestEffort failures log + drift health.
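A minimal sketch of the fanout rules above, assuming a simplified synchronous model. Target, fan_out, demo_targets, and the write_fails flag are hypothetical names used only for illustration; the sketch also applies the same failure rule to the primary, which the text does not spell out.

```rust
#[derive(Clone, Copy, PartialEq, Debug)]
enum FailureMode {
    Required,
    BestEffort,
}

// Hypothetical, flattened view of a registration for one write call.
struct Target {
    name: &'static str,
    enabled: bool,
    write_active: bool,
    has_capability: bool,
    failure_mode: FailureMode,
    write_fails: bool, // simulates a backend error
}

/// Returns the names of backends that accepted the write,
/// or the first failure reported by a Required backend.
fn fan_out(primary: &str, targets: &[Target]) -> Result<Vec<&'static str>, String> {
    let mut written = Vec::new();
    // Primary first, then every other enabled && write_active target.
    let ordered = targets
        .iter()
        .filter(|t| t.name == primary)
        .chain(targets.iter().filter(|t| t.name != primary && t.enabled && t.write_active));
    for t in ordered {
        if !t.has_capability {
            eprintln!("capability_skip: {}", t.name); // a skip, never an error
            continue;
        }
        if t.write_fails {
            match t.failure_mode {
                FailureMode::Required => {
                    return Err(format!("required backend {} failed", t.name));
                }
                FailureMode::BestEffort => {
                    eprintln!("best-effort backend {} failed; health drifts", t.name);
                    continue;
                }
            }
        }
        written.push(t.name);
    }
    Ok(written)
}

fn demo_targets(backup_fails: bool) -> Vec<Target> {
    use FailureMode::*;
    let t = |name, write_active, has_capability, failure_mode, write_fails| Target {
        name, enabled: true, write_active, has_capability, failure_mode, write_fails,
    };
    vec![
        t("main", true, true, Required, false),
        t("mirror", true, true, BestEffort, true),  // fails, but only logs
        t("search", true, false, BestEffort, false), // lacks the capability
        t("cold", false, true, Required, false),     // not write_active
        t("backup", true, true, Required, backup_fails),
    ]
}

fn main() {
    println!("{:?}", fan_out("main", &demo_targets(false)));
    println!("{:?}", fan_out("main", &demo_targets(true)));
}
```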
register_connector currently does NOT fan out — alias detection + the
tri-state Accepted/Aliased/Rejected return shape make replication
non-trivial. Fanout for connectors is deferred; single-backend setups are
unaffected.
For write_policy = Queued backends, the primary/secondary write paths
enqueue a WriteOp into the backend's writer task instead of awaiting.
Errors drift the backend's health but do not propagate to the caller.
Coalescing merges consecutive AppendMessages/AppendMetaEvents for the
same scroll_id within batch_window_ms.
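The coalescing step might look roughly like this. It is a sketch, not the crate's writer task: the WriteOp variants carry simplified payloads (strings instead of records, u64 instead of UUIDs), and the batch_window_ms timing is assumed to have already produced the batch passed to the hypothetical coalesce function.

```rust
#[derive(Debug, Clone, PartialEq)]
enum WriteOp {
    AppendMessages { scroll_id: u64, messages: Vec<String> },
    AppendMetaEvents { scroll_id: u64, events: Vec<String> },
    Shutdown,
}

/// Merge consecutive appends of the same kind for the same scroll_id.
/// Ops that are not adjacent are left alone so relative order is preserved.
fn coalesce(batch: Vec<WriteOp>) -> Vec<WriteOp> {
    let mut out: Vec<WriteOp> = Vec::new();
    for op in batch {
        let merged = match (out.last_mut(), &op) {
            (
                Some(WriteOp::AppendMessages { scroll_id: prev, messages: acc }),
                WriteOp::AppendMessages { scroll_id, messages },
            ) if *prev == *scroll_id => {
                acc.extend(messages.iter().cloned());
                true
            }
            (
                Some(WriteOp::AppendMetaEvents { scroll_id: prev, events: acc }),
                WriteOp::AppendMetaEvents { scroll_id, events },
            ) if *prev == *scroll_id => {
                acc.extend(events.iter().cloned());
                true
            }
            _ => false,
        };
        if !merged {
            out.push(op);
        }
    }
    out
}

fn msgs(scroll_id: u64, items: &[&str]) -> WriteOp {
    WriteOp::AppendMessages {
        scroll_id,
        messages: items.iter().map(|s| s.to_string()).collect(),
    }
}

fn main() {
    let batch = vec![
        msgs(1, &["a"]),
        msgs(1, &["b"]), // merges into the previous op
        msgs(2, &["c"]), // different session: kept separate
        WriteOp::AppendMetaEvents { scroll_id: 2, events: vec!["e".into()] },
        msgs(1, &["d"]), // same session as the first op, but not consecutive
        WriteOp::Shutdown,
    ];
    for op in coalesce(batch) {
        println!("{:?}", op);
    }
}
```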
Cross-backend operations
- delete_session(scroll_id, _) fans out to every enabled backend that has the session. Copies in write_active=false backends produce ArchivistError::DeleteOnReadOnlyBackend (write-active copies are still deleted); the cache is invalidated regardless of outcome.
- copy_session(scroll_id, from, to) reads from the source archive (from) and writes to the target archive (to), including DAG and meta-events when both sides have the capability. The source remains canonical (the cache is NOT rewritten).
- move_session(scroll_id, from, to) is copy + delete-from-source. If the source-side delete fails after the copy succeeded, ArchivistError::PartialMove { copied_to, delete_error } is returned so the caller knows the session now lives in both places.
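To make the partial-move outcome concrete, here is a small in-memory sketch. Store, MoveError, and demo_store are hypothetical; only the PartialMove shape (copied_to, delete_error) is taken from the description above, and the delete error is a plain string here.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq)]
enum MoveError {
    NotFound,
    PartialMove { copied_to: String, delete_error: String },
}

// Hypothetical in-memory model: archive name -> scroll_ids it holds.
struct Store {
    sessions: HashMap<String, HashSet<u64>>,
    read_only: HashSet<String>, // archives that refuse deletes
}

impl Store {
    fn copy_session(&mut self, id: u64, from: &str, to: &str) -> Result<(), MoveError> {
        if !self.sessions.get(from).map_or(false, |s| s.contains(&id)) {
            return Err(MoveError::NotFound);
        }
        self.sessions.entry(to.to_string()).or_default().insert(id);
        Ok(())
    }

    fn delete_from(&mut self, id: u64, archive: &str) -> Result<(), String> {
        if self.read_only.contains(archive) {
            return Err(format!("{archive} is read-only"));
        }
        if let Some(s) = self.sessions.get_mut(archive) {
            s.remove(&id);
        }
        Ok(())
    }

    /// move = copy, then delete from the source. A failed delete is reported
    /// as PartialMove because the session now exists in both archives.
    fn move_session(&mut self, id: u64, from: &str, to: &str) -> Result<(), MoveError> {
        self.copy_session(id, from, to)?;
        self.delete_from(id, from).map_err(|delete_error| MoveError::PartialMove {
            copied_to: to.to_string(),
            delete_error,
        })
    }
}

fn demo_store(source_read_only: bool) -> Store {
    let mut read_only = HashSet::new();
    if source_read_only {
        read_only.insert("hot".to_string());
    }
    Store {
        sessions: HashMap::from([("hot".to_string(), HashSet::from([7u64]))]),
        read_only,
    }
}

fn main() {
    println!("{:?}", demo_store(false).move_session(7, "hot", "cold"));
    println!("{:?}", demo_store(true).move_session(7, "hot", "cold"));
}
```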
The Phase 2 connector-aware move_session(scroll_id, target_connector_uid, _)
and copy_session(scroll_id, target_connector_uid, _) survived the Phase
3 rename as move_session_to_connector / copy_session_to_connector.
Their bulk variant is move_sessions_to_connector.
Health
HealthStatus drifts on every coordinator call that observes a backend:
- Successful write → Healthy; consecutive_failures reset to 0.
- Successful read → Healthy (only rescues Degraded; does not reset the counter).
- Write failure → Degraded { reason }; consecutive_failures += 1; after K = 5 consecutive failures drifts to Unavailable { reason }. Reads skip Unavailable backends; writes against an Unavailable Required backend fail, while writes against an Unavailable BestEffort backend are still attempted.
- Read failure alone never drifts past Degraded; writes are the authoritative health signal.
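The drift rules can be read as a small state machine. The sketch below is one interpretation of them, not the coordinator's code: BackendHealth and its method names are hypothetical, and the behaviour of a read failure on an already Degraded backend (left unchanged here) is an assumption.

```rust
#[derive(Debug, Clone, PartialEq)]
enum HealthStatus {
    Healthy,
    Degraded { reason: String },
    Unavailable { reason: String },
}

const MAX_CONSECUTIVE_FAILURES: u32 = 5; // K = 5 from the rules above

struct BackendHealth {
    status: HealthStatus,
    consecutive_failures: u32,
}

impl BackendHealth {
    fn new() -> Self {
        BackendHealth { status: HealthStatus::Healthy, consecutive_failures: 0 }
    }

    fn on_write_ok(&mut self) {
        self.status = HealthStatus::Healthy;
        self.consecutive_failures = 0;
    }

    fn on_read_ok(&mut self) {
        // A read only rescues Degraded; it never resets the failure counter.
        if matches!(self.status, HealthStatus::Degraded { .. }) {
            self.status = HealthStatus::Healthy;
        }
    }

    fn on_write_err(&mut self, reason: &str) {
        self.consecutive_failures += 1;
        let reason = reason.to_string();
        self.status = if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES {
            HealthStatus::Unavailable { reason }
        } else {
            HealthStatus::Degraded { reason }
        };
    }

    fn on_read_err(&mut self, reason: &str) {
        // Reads alone never push a backend past Degraded.
        if self.status == HealthStatus::Healthy {
            self.status = HealthStatus::Degraded { reason: reason.to_string() };
        }
    }
}

fn main() {
    let mut h = BackendHealth::new();
    h.on_read_err("timeout");
    println!("{:?}", h.status);
    for _ in 0..MAX_CONSECUTIVE_FAILURES {
        h.on_write_err("disk full");
    }
    println!("{:?} after {} failures", h.status, h.consecutive_failures);
    h.on_write_ok();
    println!("{:?}", h.status);
}
```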
list_archives_with_health() returns a Vec<ArchiveStatus> snapshot of
every registration: name, type, capabilities, health, last_error, and
queue_depth (for queued backends).
Lifecycle
Phase 3 is startup-only. add_archive / remove_archive /
set_default_archive on the coordinator return
ArchivistError::DynamicRegistryUnsupported. To change the registry,
edit dirigent.toml and restart the server. Archivist::shutdown()
drains queued writer tasks (sends WriteOp::Shutdown on each writer's
mpsc and awaits ack); call it before process exit.
Test-only constructors Archivist::from_registrations(regs) and
SessionMetadata::stub(scroll_id) live under #[cfg(any(test, feature = "test-utils"))] for integration tests that bypass the factory.
See docs/plans/2026-04-19-archivist-phase3-design.md for the full
design rationale, and examples/multi_backend.rs for a runnable
end-to-end example.
Module Organization
Core Modules
- lib.rs: Public API surface and re-exports
- types.rs: Core data structures (session metadata, message records, connector info, API types)
- error.rs: Error types and Result alias for archivist operations
Backend Layer (backend/)
- traits.rs: ArchiveBackend trait + 5 optional sub-traits
- capability.rs: ArchiveCapability enum + CapabilitySet type
- health.rs: HealthStatus enum returned by health_check
- contract.rs: Reusable behavioral tests for any &dyn ArchiveBackend (cfg-gated)
- mock.rs: In-memory MockBackend for coordinator unit tests (cfg-gated)
Concrete Backends (backends/)
- jsonl/: The file-based JsonlBackend, the only Phase 2 backend. Reuses storage/ primitives for NDJSON/JSON/TSV operations.
Coordinator (coordinator/)
- mod.rs: The Archivist struct + constructors
- archives.rs: Archive lifecycle (add/remove/list/default)
- connectors.rs: Connector registration + alias detection
- sessions.rs: Session registration, metadata updates, move/copy
- meta.rs: Meta events, DAG walks, cleanup
Storage Layer (storage/)
Low-level file I/O primitives used by JsonlBackend. All storage operations are async and use tokio.
- paths.rs: ArchivePaths utility for consistent directory structure and path resolution
- ndjson.rs: Newline-delimited JSON operations (read_ndjson, append_ndjson)
- json.rs: JSON operations (read_json, write_json)
- tsv.rs: Tab-separated value operations for connector index
- files.rs: Content-addressable file storage with SHA-256 hashing and deduplication
Supporting Modules
- registry.rs: Archive registry persistence (multi-archive metadata)
- migration.rs: Single-archive → multi-archive migration path
- session.rs: Session lineage types shared across layers
- accumulator.rs: MessageAccumulator for assembling streaming message chunks
- backfill.rs: Backfill helpers for importing historical sessions
- import/: External conversation importers (e.g. Claude export)
Events
events.rs: EventHandler for subscribing to dirigent_protocol events and archiving them
Configuration
The Archivist archive root is determined by DirigentPaths resolution:
- Set DIRIGENT_DATA_DIR to override the data directory; archives will be stored at <data_dir>/archives/
- Defaults to ~/.local/share/dirigent/archives/ (or platform equivalent)
DIRIGENT_DATA_DIR=/path/to/data dx serve
Archive Structure
dirigent_archive/
├── .contexts/
│   └── {scroll_id:uuidv7}/          # One directory per session
│       ├── session.json             # Session metadata
│       ├── messages.jsonl           # Incremental message log (.ndjson also supported)
│       └── lineage.json             # Session lineage info (optional)
├── .db/
│   └── connectors/
│       ├── index.tsv                # Fast connector lookup (TSV)
│       └── {connector_uid}/
│           ├── connector.json       # Connector metadata
│           └── sessions.jsonl       # Session mappings (.ndjson also supported)
└── .files/
    └── {sha256-hash}                # Content-addressable file storage
Why Hidden Directories?
The .contexts, .db, and .files directories are hidden (prefixed with .) to keep the archive root clean for future rendered outputs (like chat.md exports). This is similar to how .git hides implementation details in a codebase.
File Formats
Session Metadata (session.json)
{
  "version": 1,
  "scroll_id": "01936e8f-e5a7-7000-8000-000000000001",
  "created_at": "2025-01-01T12:00:00Z",
  "updated_at": "2025-01-01T12:30:00Z",
  "title": "Implement user authentication",
  "connector_uid": "01936e8f-e5a7-7000-8000-000000000002",
  "native_session_id": "abc123",
  "agent_id": null,
  "parent_scroll_id": null,
  "continuation": null,
  "tags": ["backend", "auth"],
  "metadata": {
    "source": "OpenCode",
    "model": "claude-3-5-sonnet"
  }
}
Messages Log (messages.jsonl)
One JSON object per line, append-only:
{"version":1,"message_id":"01936e8f-e5a7-7000-8000-000000000003","session":"01936e8f-e5a7-7000-8000-000000000001","parent_id":null,"ts":"2025-01-01T12:01:00Z","role":"user","author":"alice","content_md":"How do I implement JWT auth?","attachments":[],"metadata":{}}
{"version":1,"message_id":"01936e8f-e5a7-7000-8000-000000000004","session":"01936e8f-e5a7-7000-8000-000000000001","parent_id":"01936e8f-e5a7-7000-8000-000000000003","ts":"2025-01-01T12:01:10Z","role":"assistant","author":"claude","content_md":"Here's how to implement JWT authentication...","attachments":[],"metadata":{"model":"claude-3-5-sonnet"}}
IMPORTANT - Ordering: The order of lines in the message log file (messages.jsonl or messages.ndjson) reflects event arrival order, NOT chronological order. Assistant replies often arrive after subsequent user messages due to streaming latency, resulting in non-chronological file order. Always use the Archivist::get_messages() API to retrieve messages, which sorts by ts (timestamp) and message_id (UUIDv7) to guarantee chronological order.
File Format Compatibility: The archivist supports both .ndjson and .jsonl file extensions for newline-delimited JSON files. When reading, .jsonl is preferred if present, with automatic fallback to .ndjson for backward compatibility. Write operations use .jsonl (canonical format). Both formats are identical in content - the difference is purely the file extension.
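The extension rule above amounts to a two-step lookup on read and a fixed name on write. The helper names below (resolve_log_for_read, resolve_log_for_write) are hypothetical and use blocking std::path checks for brevity, whereas the storage layer itself is described as async.

```rust
use std::path::{Path, PathBuf};

/// Pick the log file to read: prefer `<stem>.jsonl`, fall back to `<stem>.ndjson`.
/// Returns None when neither file exists.
fn resolve_log_for_read(dir: &Path, stem: &str) -> Option<PathBuf> {
    let preferred = dir.join(format!("{stem}.jsonl"));
    if preferred.is_file() {
        return Some(preferred);
    }
    let legacy = dir.join(format!("{stem}.ndjson"));
    if legacy.is_file() {
        return Some(legacy);
    }
    None
}

/// Writes always target the canonical `.jsonl` name.
fn resolve_log_for_write(dir: &Path, stem: &str) -> PathBuf {
    dir.join(format!("{stem}.jsonl"))
}

fn main() {
    let dir = Path::new(".");
    println!("read:  {:?}", resolve_log_for_read(dir, "messages"));
    println!("write: {:?}", resolve_log_for_write(dir, "messages"));
}
```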
Connector Index (index.tsv)
Tab-separated values with header row:
connector_uid type title client_native_id alias_of created_at
01936e8f-e5a7-7000-8000-000000000002 OpenCode OpenCode Local opencode@http://localhost:12225 2025-01-01T12:00:00Z
Session Mappings (sessions.jsonl)
Maps native session IDs from connectors to scroll IDs in the archive:
{"version":1,"connector_uid":"01936e8f-e5a7-7000-8000-000000000002","native_session_id":"abc123","scroll_id":"01936e8f-e5a7-7000-8000-000000000001","created_at":"2025-01-01T12:00:00Z","alias_of":null}
Message Ordering Guarantees
The Problem: Append Order ≠ Chronological Order
In the event-driven architecture, messages are written to the message log file (messages.jsonl) as completion events arrive. Due to streaming latency:
- User messages complete nearly instantly and are written immediately
- Assistant messages stream over time and complete later
- A second user message can be written before the first assistant reply completes
Example scenario:
T0: User sends "tell me a joke about snakes" (ts=18:23:36.947)
T1: Assistant starts streaming reply (ts=18:23:36.969)
T2: User sends "now one about tigers" (ts=18:23:49.429) <- completes and writes BEFORE assistant finishes
T3: Assistant finishes "snakes" reply <- writes AFTER "tigers" user message
File order in the message log file:
1. user "snakes" (18:23:36.947)
2. user "tigers" (18:23:49.429) <- written second
3. assistant "snakes" (18:23:36.969) <- written third, but timestamp is earlier!
The Solution: Sort-on-Read
The Archivist::get_messages() implementation sorts messages before returning:
- Primary sort: ts (timestamp) ascending
- Secondary sort: message_id (UUIDv7) ascending for stable tie-breaking
This guarantees chronological order regardless of NDJSON append order:
1. user "snakes" (18:23:36.947)
2. assistant "snakes" (18:23:36.969)
3. user "tigers" (18:23:49.429)
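The snakes/tigers scenario can be reproduced with a few lines of Rust. This is a sketch of the sort rule only: Msg, sort_on_read, and append_order are hypothetical, ts is modelled as milliseconds since midnight, and message_id as a plain integer standing in for a UUIDv7.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Msg {
    ts_millis: u64,   // stand-in for the RFC 3339 `ts` field
    message_id: u128, // stand-in for a UUIDv7
    label: &'static str,
}

/// Sort by timestamp, then by message_id as a stable tie-breaker.
fn sort_on_read(mut msgs: Vec<Msg>) -> Vec<Msg> {
    msgs.sort_by_key(|m| (m.ts_millis, m.message_id));
    msgs
}

/// Messages in the order they were appended to the log file.
fn append_order() -> Vec<Msg> {
    vec![
        Msg { ts_millis: 66_216_947, message_id: 1, label: "user snakes" },      // 18:23:36.947
        Msg { ts_millis: 66_229_429, message_id: 3, label: "user tigers" },      // 18:23:49.429
        Msg { ts_millis: 66_216_969, message_id: 2, label: "assistant snakes" }, // 18:23:36.969
    ]
}

fn main() {
    for m in sort_on_read(append_order()) {
        println!("{} ({})", m.label, m.ts_millis);
    }
}
```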
Why This Approach?
- Maintains durability: Append-only writes preserve crash safety
- No migration needed: Existing archives work without rewrites
- Simple implementation: No buffered writes or complex write-time ordering
- Performance trade-off: Small CPU cost on read (sorting) vs. complex write-time coordination
Consumer Guidance
- DO: Use Archivist::get_messages() to retrieve messages
- DON'T: Read the message log file directly and assume file order = chronological order
- UI/API: Always sort by ts then message_id for defense in depth
- Tie-breaking: Use message_id (UUIDv7) as secondary sort for stable ordering when timestamps match
Key Types
SessionMetadata
Stores all metadata about a session including:
- scroll_id: UUIDv7 identifier for the session
- connector_uid: Which connector owns this session
- native_session_id: Original session ID from the connector (optional)
- title: Optional human-readable session title (see Title Management below)
- parent_scroll_id: For session lineage (splits, continuations)
- continuation: Type of continuation (SPLIT, COMPACT, REFERENCE, EDIT)
- tags: User-defined categorization
- metadata: Free-form JSON for connector-specific fields
Title Management
Session titles are fully supported and persist across restarts. Titles are stored in the SessionMetadata struct and saved to the session.json file.
Setting Titles:
// Update title for an existing session
archivist.update_session_metadata(
    scroll_id,
    Some("My Custom Session Title".to_string()),
    None, // model
    None, // archive
).await?;
Default Behavior:
- New sessions can specify an initial title during registration
- If no title is provided, sessions default to None
- The UI typically displays "Untitled" for sessions without titles
Title Loading:
- Titles are automatically loaded when retrieving session metadata via get_session_metadata()
- Session lists include titles via list_sessions() and list_sessions_all()
- Titles are part of the SessionMetadata struct returned by all session queries
UI Integration:
- The web UI displays session titles in the session list and sidebar
- Users can rename sessions via the "Rename" button in the session list view
- Renaming calls api::archivist::rename_session() which uses update_session_metadata()
- Title changes are persisted immediately and survive application restarts
MessageRecord
Represents a single message in the archive:
- message_id: UUIDv7 identifier
- session: scroll_id this message belongs to
- role: "user", "assistant", or "system"
- content_md: Message content in Markdown format
- attachments: References to attached files
- metadata: Free-form JSON for connector-specific fields
ConnectorRecord
Metadata about a connector:
- connector_uid: UUIDv7 identifier
- type: "OpenCode", "ACP", or custom
- client_native_id: Unique identifier from client (e.g., "opencode@http://localhost:12225")
- alias_of: If this connector is an alias of another (for deduplication)
Archivist Public API
The Archivist struct (in coordinator/) is the main public entry point
for archival operations. Consumers hold Arc<Archivist> and call inherent
methods — there is no Archivist trait anymore. The coordinator resolves
the target backend per call (via archive: Option<String>) and delegates
to ArchiveBackend methods.
Key method families (see coordinator/*.rs for full signatures):
- Archive lifecycle (archives.rs): add_archive, remove_archive, list_archives, set_default_archive
- Connectors (connectors.rs): register_connector with tri-state result (Accepted / Aliased / Rejected), list_connectors
- Sessions (sessions.rs): register_session, get_session_metadata, update_session_metadata, list_sessions_paged, move_session, copy_session, resolve_session
- Messages: append_messages, get_messages (sorts by ts then message_id for stable chronological order)
- Meta / DAG (meta.rs): meta-event recording, session lineage DAG walks, cleanup routines
List Filter vs. Full-Text Search
Two distinct query paths exist — do not conflate them.
List filter — Archivist::list_sessions_paged(SessionListQuery) returns a
cursor-paged list of sessions, AND-filtered by title_query (substring on
title), tags, model_filter (substring on metadata.model), project_id,
connector_uid, and include_hidden. This is the right tool for "narrow the
list of visible sessions."
Full-text search — api::search_sessions (in the api package, backed by
api::archivist::search_task::SearchTask) spawns rg --json over the
archive's .contexts/ tree to find messages containing text. It streams
SearchExcerpts with parsed NDJSON content and supports cancellation via
CancellationToken. This is the right tool for "find messages containing
text."
Do not extend list_sessions_paged to do content search. Content search
belongs in the ripgrep pipeline. Future improvements to content search
(indexed backends, relevance scoring) are Phase 2d / Phase 3 concerns.
JsonlBackend Implementation
The Phase 2 production backend — an implementation of ArchiveBackend plus
every sub-trait except SearchBackend:
- Thread-safe: Uses RwLock for in-memory caches
- Async: All operations use tokio for non-blocking I/O
- Caching: In-memory caches for connector and session mappings
- Collision Detection: Tri-state registration for connectors and sessions
Located under src/backends/jsonl/ and split by concern (backend.rs,
connectors.rs, dag.rs, mapping.rs, meta.rs).
Caching Strategy
JsonlBackend maintains two in-memory caches:
- connector_cache: HashMap<Uuid, ConnectorRecord>
  - Populated on registration
  - Read from TSV index on startup (future enhancement)
- session_cache: HashMap<(Uuid, String), Uuid>
  - Maps (connector_uid, native_session_id) to scroll_id
  - Populated on registration and session resolution
  - Enables fast session lookups without disk I/O
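A sketch of how the session cache might be wrapped in an RwLock so that many readers can resolve sessions concurrently. SessionCache and its methods are hypothetical; Uuid is aliased to u128 so the example needs no external crate, and a blocking std RwLock stands in for whatever lock type the backend actually uses.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type Uuid = u128; // stand-in for uuid::Uuid

/// (connector_uid, native_session_id) -> scroll_id
struct SessionCache {
    inner: RwLock<HashMap<(Uuid, String), Uuid>>,
}

impl SessionCache {
    fn new() -> Self {
        SessionCache { inner: RwLock::new(HashMap::new()) }
    }

    /// Shared (read) lock: many lookups can run at once.
    fn resolve(&self, connector_uid: Uuid, native_id: &str) -> Option<Uuid> {
        let key = (connector_uid, native_id.to_string());
        self.inner.read().unwrap().get(&key).copied()
    }

    /// Exclusive (write) lock: returns the previously cached scroll_id, if any.
    fn insert(&self, connector_uid: Uuid, native_id: &str, scroll_id: Uuid) -> Option<Uuid> {
        let key = (connector_uid, native_id.to_string());
        self.inner.write().unwrap().insert(key, scroll_id)
    }
}

fn main() {
    let cache = SessionCache::new();
    cache.insert(2, "abc123", 1);
    println!("hit:  {:?}", cache.resolve(2, "abc123"));
    println!("miss: {:?}", cache.resolve(2, "unknown")); // would fall through to disk
}
```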
Event Handling
The EventHandler subscribes to dirigent_protocol events and archives them in real-time:
// Create archivist and event handler
let archivist = Archivist::new_with_single_archive(archive_path).await?;
let handler = EventHandler::new(Arc::new(archivist));
// Subscribe to event stream from dirigent_core
let events = event_stream.subscribe();
// Run event loop (blocking)
handler.run(events).await;
Supported Events
- SessionCreated: Registers new sessions with the archivist
- MessageCompleted: Writes finalized messages to the archive
- SessionUpdate: Accumulates streaming message chunks
  - AgentMessageChunk
  - UserMessageChunk
  - AgentThoughtChunk
  - ToolCall
MessageAccumulator
Assembles streaming message chunks into complete messages:
- Accumulates text chunks by message_id
- Tracks thinking blocks separately
- Stores tool calls with input/output
- Finalizes messages on MessageCompleted event
- Converts to MessageRecord for archival
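The accumulation steps can be sketched as a map of per-message buffers that is drained on completion. Everything here is hypothetical (the Assembled output type, the push_* and finalize method names, string message IDs); tool calls are left out and the real MessageAccumulator converts to MessageRecord instead.

```rust
use std::collections::HashMap;

/// Hypothetical output type; the crate converts to MessageRecord instead.
#[derive(Debug, PartialEq)]
struct Assembled {
    message_id: String,
    content_md: String,
    thinking: String,
}

#[derive(Default)]
struct Pending {
    text: String,
    thinking: String,
}

#[derive(Default)]
struct MessageAccumulator {
    pending: HashMap<String, Pending>,
}

impl MessageAccumulator {
    /// Append a text chunk to the buffer for `message_id`.
    fn push_text(&mut self, message_id: &str, chunk: &str) {
        self.pending.entry(message_id.to_string()).or_default().text.push_str(chunk);
    }

    /// Thinking chunks are tracked separately from visible text.
    fn push_thought(&mut self, message_id: &str, chunk: &str) {
        self.pending.entry(message_id.to_string()).or_default().thinking.push_str(chunk);
    }

    /// Called on the completion event: removes the buffer and returns the
    /// assembled message, or None if no chunks were seen for this ID.
    fn finalize(&mut self, message_id: &str) -> Option<Assembled> {
        let p = self.pending.remove(message_id)?;
        Some(Assembled {
            message_id: message_id.to_string(),
            content_md: p.text,
            thinking: p.thinking,
        })
    }
}

fn main() {
    let mut acc = MessageAccumulator::default();
    acc.push_thought("m1", "User wants JWT. ");
    acc.push_text("m1", "Here's how to implement ");
    acc.push_text("m1", "JWT authentication...");
    println!("{:?}", acc.finalize("m1"));
    println!("{:?}", acc.finalize("m1")); // already drained
}
```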
Integration with dirigent_core
The Archivist integrates with dirigent_core via the global event stream:
- CoreRuntime emits events for all connector operations
- EventHandler subscribes to event stream
- MessageAccumulator assembles streaming chunks
- Archivist writes complete messages to archive
This enables:
- Automatic archival of all sessions and messages
- No polling required - fully event-driven
- Consistent history across restarts
- Offline access to historical data
Testing
The package has comprehensive test coverage across multiple dimensions:
Unit Tests
Located in each module (src/*.rs, src/storage/*.rs):
- Type serialization/deserialization
- UUIDv7 generation and ordering
- Timestamp formatting (RFC 3339)
- Storage operations (NDJSON, JSON, TSV, files)
- Connector registration tri-state logic
- Session registration and alias detection
Integration Tests
Located in tests/:
- integration_tests.rs: Full Archivist + JsonlBackend lifecycle, event handler integration, multi-connector scenarios, session lineage, message accumulation
- list_sessions_paged_test.rs, pagination_test.rs: List filter + cursor pagination coverage
- import_claude_idempotency_test.rs: Claude export re-import idempotency
Backend Contract Tests
src/backend/contract.rs holds reusable async assertions that any
&dyn ArchiveBackend must pass. JsonlBackend and MockBackend both
run the contract suite; new backends added in Phase 3+ should do the same.
Examples
Located in examples/:
- basic_usage.rs: Core archivist operations
- event_handling.rs: EventHandler and MessageAccumulator
- file_storage.rs: Content-addressable file storage
Run tests:
cargo test --package dirigent_archivist
Run examples:
cargo run --package dirigent_archivist --example basic_usage
cargo run --package dirigent_archivist --example event_handling
cargo run --package dirigent_archivist --example file_storage
Performance Characteristics
- Append Operations: O(1) with sequential file writes
- Session Lookup: O(1) on an in-memory cache hit, O(n) on a cache miss
- Message Retrieval: O(n) where n = number of messages (NDJSON parsing)
- File Storage: O(1) content-addressable lookup with SHA-256 hashing
- Connector Index: O(n) TSV scan, suitable for hundreds of connectors
Scalability Considerations
- Large Sessions: NDJSON is append-only, so reading large sessions requires parsing all lines
- Many Sessions: TSV indices are suitable for thousands of sessions per connector
- File Deduplication: SHA-256 hashing provides automatic deduplication across sessions
- Concurrent Access: RwLock allows multiple concurrent readers, single writer
Error Handling
The Archivist uses thiserror for rich error types:
pub enum ArchivistError {
    IoError(std::io::Error),
    SerdeError(serde_json::Error),
    SessionUnknown(Uuid),
    CollisionInconsistent(Uuid),
    // ... etc
}
All public APIs return Result<T, ArchivistError> for explicit error handling.
Development Notes
- All storage operations are async (using tokio)
- Content-addressable storage uses SHA-256 hashes (hex-encoded)
- Archive directory structure mirrors session/message hierarchy
- UUIDv7 provides time-ordered, sortable identifiers
- RFC 3339 UTC timestamps for all time-based fields
- Schema versioning via version field in all records
Related Packages
- dirigent_protocol: Shared types and protocol definitions (dependency)
- dirigent_core: Runtime integration for SSE event capture (integration point)
- api: Server functions for archive queries (future)
- web: UI for archive browsing and search (future)
Phase 4: ArchiveFilter (2026-04-21)
Every ArchiveRegistration carries a filter: ArchiveFilter. The filter
describes which sessions/writes the backend wants to receive. Fields:
- include_connectors: Option<HashSet<Uuid>>: if Some, only these connector UIDs pass. None means no connector gate.
- exclude_connectors: HashSet<Uuid>: always rejected.
- include_tags: HashSet<String>: if non-empty, the session must carry at least one matching tag.
- exclude_tags: HashSet<String>: any matching tag rejects.
- include_hidden: bool: default true. If false, sessions whose metadata has "hidden": true are skipped.
Primary-always-writes invariant
The per-call primary (either the archive: Some(name) argument or the
default write-target) is never filtered. If a caller explicitly asks
to write to archive X, the filter on X is not consulted. Filters only
gate secondary fanout.
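The filter fields and the primary-always-writes rule combine roughly as below. This is a sketch under assumptions: the field names follow the list above, but SessionInfo, accepts, and should_write are hypothetical, Uuid is aliased to u128, and the order in which the checks run is a guess.

```rust
use std::collections::HashSet;

type Uuid = u128; // stand-in for uuid::Uuid

struct ArchiveFilter {
    include_connectors: Option<HashSet<Uuid>>,
    exclude_connectors: HashSet<Uuid>,
    include_tags: HashSet<String>,
    exclude_tags: HashSet<String>,
    include_hidden: bool,
}

impl Default for ArchiveFilter {
    fn default() -> Self {
        ArchiveFilter {
            include_connectors: None,
            exclude_connectors: HashSet::new(),
            include_tags: HashSet::new(),
            exclude_tags: HashSet::new(),
            include_hidden: true, // default is true per the field list
        }
    }
}

// Hypothetical projection of the session fields the filter looks at.
struct SessionInfo {
    connector_uid: Uuid,
    tags: Vec<String>,
    hidden: bool,
}

impl ArchiveFilter {
    fn accepts(&self, s: &SessionInfo) -> bool {
        if let Some(allowed) = &self.include_connectors {
            if !allowed.contains(&s.connector_uid) {
                return false;
            }
        }
        if self.exclude_connectors.contains(&s.connector_uid) {
            return false;
        }
        if !self.include_tags.is_empty() && !s.tags.iter().any(|t| self.include_tags.contains(t)) {
            return false;
        }
        if s.tags.iter().any(|t| self.exclude_tags.contains(t)) {
            return false;
        }
        self.include_hidden || !s.hidden
    }
}

/// The per-call primary is never filtered; filters only gate secondary fanout.
fn should_write(filter: &ArchiveFilter, s: &SessionInfo, is_primary: bool) -> bool {
    is_primary || filter.accepts(s)
}

fn main() {
    let filter = ArchiveFilter {
        exclude_tags: HashSet::from(["private".to_string()]),
        ..Default::default()
    };
    let s = SessionInfo { connector_uid: 2, tags: vec!["private".into()], hidden: false };
    println!("secondary: {}", should_write(&filter, &s, false));
    println!("primary:   {}", should_write(&filter, &s, true));
}
```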
Boot validator
At boot (coordinator/boot.rs), the validator rejects configurations
where:
- No write-active + enabled registration has an unrestricted filter (ArchiveFilter::default() is unrestricted). Prevents configurations that silently drop all writes.
- An archive's filter has include_connectors = Some(empty set), equivalent to "reject everything", which is almost certainly a config bug.
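The two boot checks could be expressed along these lines. The sketch assumes "unrestricted" means every field equals its default; Registration, validate, is_unrestricted, and the error strings are hypothetical and do not reflect the contents of coordinator/boot.rs.

```rust
use std::collections::HashSet;

type Uuid = u128; // stand-in for uuid::Uuid

struct ArchiveFilter {
    include_connectors: Option<HashSet<Uuid>>,
    exclude_connectors: HashSet<Uuid>,
    include_tags: HashSet<String>,
    exclude_tags: HashSet<String>,
    include_hidden: bool,
}

impl Default for ArchiveFilter {
    fn default() -> Self {
        ArchiveFilter {
            include_connectors: None,
            exclude_connectors: HashSet::new(),
            include_tags: HashSet::new(),
            exclude_tags: HashSet::new(),
            include_hidden: true,
        }
    }
}

impl ArchiveFilter {
    /// Assumption: unrestricted == identical to ArchiveFilter::default().
    fn is_unrestricted(&self) -> bool {
        self.include_connectors.is_none()
            && self.exclude_connectors.is_empty()
            && self.include_tags.is_empty()
            && self.exclude_tags.is_empty()
            && self.include_hidden
    }
}

struct Registration {
    name: &'static str,
    enabled: bool,
    write_active: bool,
    filter: ArchiveFilter,
}

fn validate(regs: &[Registration]) -> Result<(), String> {
    // Check 2: Some(empty set) rejects everything and is treated as a config bug.
    for r in regs {
        if matches!(&r.filter.include_connectors, Some(set) if set.is_empty()) {
            return Err(format!("archive {}: include_connectors is an empty set", r.name));
        }
    }
    // Check 1: at least one enabled, write-active archive must accept every write.
    let has_catch_all = regs
        .iter()
        .any(|r| r.enabled && r.write_active && r.filter.is_unrestricted());
    if !has_catch_all {
        return Err("no enabled, write-active archive has an unrestricted filter".to_string());
    }
    Ok(())
}

fn reg(name: &'static str, filter: ArchiveFilter) -> Registration {
    Registration { name, enabled: true, write_active: true, filter }
}

fn main() {
    let tagged = ArchiveFilter {
        include_tags: HashSet::from(["work".to_string()]),
        ..Default::default()
    };
    println!("{:?}", validate(&[reg("main", ArchiveFilter::default())]));
    println!("{:?}", validate(&[reg("work-only", tagged)]));
}
```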
See docs/plans/2026-04-21-archivist-phase4-design.md §4 for the full
design rationale.
Phase 5: Importers (2026-04-21)
The import:: module centres on an Importer trait with per-source
implementations under import::sources::*. Each source produces a
ParsedConversation (ChatGPT) / ParsedSession (Codex) / session
directory walk (Claude) and feeds the results through the common
import_sessions orchestrator, which fires ImportProgressEvents on a
bounded ImportProgressSink.
Importer trait
Every importer declares a config_shape() so UIs can render a dynamic
form; a discover() that returns an ImportDiscovery preview; and an
import() that does the actual work. All three methods are async.
The trait lives in import::trait_def. Shape types (ImportConfig,
ImportTarget, ConfigField, ConfigFieldKind, ImportError) are
serialisable and safe to cross the WASM boundary.
Registry
ImporterRegistry::with_defaults() registers every enabled
importer-* feature. Currently: claude, chatgpt, codex. The
registry is constructed at boot and stored on AppState.
Progress sink
ImportProgressSink::channel() returns a bounded mpsc pair.
Non-terminal events use try_send (dropped on full); terminal events
use send().await so consumers always see the final state.
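The drop-on-full versus always-deliver split can be demonstrated with the standard library's bounded channel. This is an analogue, not the crate's sink: std::sync::mpsc::sync_channel stands in for the async mpsc, a blocking send stands in for send().await, and Sink, emit, finish, run_demo, and the event variants are hypothetical.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::thread;

#[derive(Debug, PartialEq)]
enum ImportProgressEvent {
    Progress { done: u32 },
    Finished { imported: u32 },
}

struct Sink {
    tx: SyncSender<ImportProgressEvent>,
}

impl Sink {
    fn channel(capacity: usize) -> (Sink, Receiver<ImportProgressEvent>) {
        let (tx, rx) = sync_channel(capacity);
        (Sink { tx }, rx)
    }

    /// Non-terminal event: dropped if the buffer is full. Returns whether it was queued.
    fn emit(&self, ev: ImportProgressEvent) -> bool {
        self.tx.try_send(ev).is_ok()
    }

    /// Terminal event: waits for room so the consumer always sees the final state.
    fn finish(&self, ev: ImportProgressEvent) {
        let _ = self.tx.send(ev); // only fails if the receiver is gone
    }
}

/// Fills a 2-slot channel, then delivers the terminal event from another thread.
fn run_demo() -> (bool, Vec<ImportProgressEvent>) {
    let (sink, rx) = Sink::channel(2);
    sink.emit(ImportProgressEvent::Progress { done: 1 });
    sink.emit(ImportProgressEvent::Progress { done: 2 });
    // Buffer is full and nobody has read yet, so this one is dropped.
    let third_accepted = sink.emit(ImportProgressEvent::Progress { done: 3 });
    // finish() blocks until the consumer below frees a slot.
    let producer = thread::spawn(move || {
        sink.finish(ImportProgressEvent::Finished { imported: 3 });
    });
    // Iteration ends once the sink is dropped and the buffer is drained.
    let received: Vec<ImportProgressEvent> = rx.iter().collect();
    producer.join().unwrap();
    (third_accepted, received)
}

fn main() {
    let (third_accepted, received) = run_demo();
    println!("third progress event queued: {third_accepted}");
    println!("{:?}", received);
}
```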
Source crates
- dirigent_chatgpt: parses conversations.json from the OpenAI data export.
- dirigent_codex: parses *.jsonl session files under ~/.codex/sessions.
Both crates hold pure parser types with zero dirigent-specific types.
See docs/plans/2026-04-21-archivist-phase5-design.md.
Future Enhancements
- Indexed SearchBackend implementations (tantivy/sqlite); currently content search is ripgrep-based in the api package
- Session splitting and lineage management (mutations.ndjson)
- Knowledge overview generation (chat.md exports)
- Embedding storage and search (embeds/)
- Network RPC interface for remote archivist
- Compaction and pruning policies
- Additional concrete backends (e.g. SQLite, remote)
Documentation
- Package README: ./README.md - User-facing overview
- Architecture Docs: ../../docs/building/05_archivist/ - Design and planning
- API Docs: Run cargo doc --package dirigent_archivist --open
- Examples: See examples/ directory for working code samples