Source integration is organized by family: each family defines its own
trait surface tuned to the semantics of that source category. Families compose
from a shared paging and value vocabulary (PageBuf, Cursor, ItemKey,
Budgets, error types in gossip-contracts/src/connector/) but have
independent trait surfaces. Once a family runtime finishes executing one work
unit, it can hand that unit to the shared runtime commit pipeline to make
durable progress through a family-neutral path.
The ordered-content family covers item-at-a-time enumeration and byte reads. The coordination layer assigns shard ranges; the runtime drives the source through page-fill / scan / read cycles.
| Trait | Crate | File | Role |
|---|---|---|---|
| OrderedContentSource | gossip-contracts | src/connector/ordered.rs | Fill pages of ScanItem, open/read item bytes |

Key types: OrderedContentCapabilities, ScanItem, PageBuf<ScanItem>,
Cursor, Budgets, EnumerateError, ReadError.
Worker loop sketch:

    fill_page(shard, cursor, budgets)
        -> Result<Option<PageBuf<ScanItem>>, EnumerateError>
        Ok(None)          => exhausted-empty terminal signal
        Ok(Some(PageBuf)) => { items, state: HasMore{cursor} | Complete }
                             Complete pages are terminal non-empty pages; the
                             runtime performs one exhausted-empty suffix call
                             before treating the shard as fully enumerated.
    for each item:
        open(item_ref, budgets) -> Result<Box<dyn io::Read + Send>, ReadError>
        (optionally) read_range(item_ref, offset, dst, budgets) -> Result<usize, ReadError>
    checkpoint cursor
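
The worker loop above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the contract itself: `Source`, `Page`, `VecSource`, and `drain_shard` are hypothetical stand-ins, the cursor is a plain index, budgets are reduced to a page-size limit, and stepping the cursor past a Complete page before the suffix call is an assumption about how the runtime issues that call.

```rust
use std::io::{self, Read};

// Hypothetical stand-ins; the real types live in gossip-contracts and differ.
type Cursor = usize;

enum PageState {
    HasMore { cursor: Cursor },
    Complete,
}

struct Page {
    items: Vec<String>, // non-empty by contract
    state: PageState,
}

trait Source {
    fn fill_page(&mut self, cursor: Cursor, max_items: usize) -> Result<Option<Page>, String>;
    fn open(&self, item: &str) -> io::Result<Box<dyn Read + Send>>;
}

// In-memory source used only to exercise the loop.
struct VecSource {
    items: Vec<(String, Vec<u8>)>,
}

impl Source for VecSource {
    fn fill_page(&mut self, cursor: Cursor, max_items: usize) -> Result<Option<Page>, String> {
        if cursor >= self.items.len() {
            return Ok(None); // exhausted-empty terminal signal
        }
        let end = (cursor + max_items.max(1)).min(self.items.len());
        let items = self.items[cursor..end].iter().map(|(k, _)| k.clone()).collect();
        let state = if end < self.items.len() {
            PageState::HasMore { cursor: end }
        } else {
            PageState::Complete
        };
        Ok(Some(Page { items, state }))
    }

    fn open(&self, item: &str) -> io::Result<Box<dyn Read + Send>> {
        let bytes = self
            .items
            .iter()
            .find(|(k, _)| k.as_str() == item)
            .map(|(_, b)| b.clone())
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown item"))?;
        Ok(Box::new(io::Cursor::new(bytes)))
    }
}

// Returns (number of fill_page calls, total bytes read).
fn drain_shard<S: Source>(src: &mut S, max_items: usize) -> Result<(usize, u64), String> {
    let mut cursor: Cursor = 0;
    let (mut calls, mut bytes) = (0usize, 0u64);
    loop {
        calls += 1;
        let Some(page) = src.fill_page(cursor, max_items)? else {
            break; // Ok(None): shard fully enumerated
        };
        for item in &page.items {
            let mut reader = src.open(item).map_err(|e| e.to_string())?;
            bytes += io::copy(&mut reader, &mut io::sink()).map_err(|e| e.to_string())?;
        }
        // A real worker would durably checkpoint the cursor at this point.
        cursor = match page.state {
            PageState::HasMore { cursor } => cursor,
            // Assumption: step past the terminal page, then make the
            // exhausted-empty suffix call on the next iteration.
            PageState::Complete => cursor + page.items.len(),
        };
    }
    Ok((calls, bytes))
}
```

Against a three-item source with a page size of two, this loop makes three fill_page calls: one HasMore page, one Complete page, and the suffix call that returns Ok(None).
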
Concrete connectors:
FilesystemConnector directly implements OrderedContentSource and keeps
matching inherent helper methods in gossip-connectors/src/filesystem.rs.
InMemoryDeterministicConnector exposes the same read/split surface as
inherent methods in gossip-connectors/src/in_memory.rs.
Submission staging:
gossip-orchestrator stages filesystem and Git submissions before runtime execution:

- request.rs canonicalizes raw paths, validates them against the requested source mode (single file vs. directory root), and enforces path/mode consistency. For untrusted input, normalize_within(allowed_root) also verifies that the canonical path resides within a server-configured root directory, rejecting symlink escapes and traversal attempts. Produces NormalizedFilesystemRequest.
- planner.rs maps normalized requests into the deterministic one-shard startup geometry consumed by later payload and registration stages.
- payload.rs encodes the typed filesystem shard metadata that coordination stores in connector_extra and that the runtime later decodes during lease hydration.
- setup.rs lowers the normalized request, planned geometry, and typed payload into a validated initial manifest, then executes the create_run_with_shards lifecycle that makes the startup shard set claimable.
- git_request.rs canonicalizes raw Git repo targets, validates repository identity via tenant-scoped normalization, and preserves request-side selection intent (default-branch, explicit refs, or explicit commit) for later Git control-plane stages.
- git_payload.rs encodes the typed Git shard metadata that coordination stores in connector_extra for repo-frontier shards and that the runtime later decodes during Git lease hydration.

The filesystem stages are required for filesystem security and determinism. Git request normalization is required for tenant-scoped identity resolution and target deduplication.
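
The root-containment check described for untrusted input can be illustrated with the standard library alone. The sketch below is an assumption about the general technique (canonicalize, then compare path components), not the orchestrator's actual normalize_within implementation; `canonicalize_within` is a hypothetical free function.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `raw` on the real filesystem and accept it
/// only if the resolved path lies under `allowed_root`.
fn canonicalize_within(raw: &Path, allowed_root: &Path) -> io::Result<PathBuf> {
    // fs::canonicalize resolves symlinks and `..` components, so the
    // comparison below sees the path the OS would actually open.
    let root = fs::canonicalize(allowed_root)?;
    let canonical = fs::canonicalize(raw)?;

    // Path::starts_with compares whole components, so `/srv/data-evil`
    // is not treated as being inside `/srv/data`.
    if canonical.starts_with(&root) {
        Ok(canonical)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes the allowed root",
        ))
    }
}
```

Because both sides are canonicalized before comparison, a request such as `root/../other.txt` resolves outside the root and is rejected, as is a symlink inside the root that points elsewhere.
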
The Git family covers whole-repository operations: commit walks, tree diffs, and pack scans. Git execution is intentionally separate from ordered-content because the runtime operates on entire repositories rather than individual items.
| Trait | Crate | File | Role |
|---|---|---|---|
| GitRepoDiscoverySource | gossip-contracts | src/connector/git.rs | Page over GitRepoTarget in RepoKey order |
| GitMirrorManager | gossip-contracts | src/connector/git.rs | Acquire or refresh a LocalMirror |
| GitRepoExecutor | gossip-contracts | src/connector/git.rs | Run repo-native scanning against a mirror |

Key types: RepoKey, RepoLocator, GitRepoTarget, GitSelection,
LocalMirror, GitExecutionLimits, GitRunOutcome, GitRunError,
GitDiscoveryCapabilities.
Pipeline:

    1. GitRepoDiscoverySource::discover_page(shard, cursor, budgets)
           -> Result<Option<PageBuf<GitRepoTarget>>, EnumerateError>
           Ok(None) => terminal completion (shard fully enumerated)
    2. GitMirrorManager::sync_mirror(locator)
           -> Result<LocalMirror, GitRunError>
    3. GitRepoExecutor::run_repo(mirror, selection, limits)
           -> Result<GitRunOutcome, GitRunError>
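
The three stages compose naturally as discover, then sync, then run. The Rust sketch below shows that composition with heavily simplified, hypothetical traits (`Discovery`, `MirrorManager`, `Executor`) and fakes. It omits the shard, budgets, selection, and limits parameters, and it uses string keys, so it should be read as an illustration of the control flow rather than the contract in src/connector/git.rs.

```rust
// Hypothetical, simplified stand-ins for the Git family contract.
#[derive(Debug, Clone, PartialEq)]
struct RepoTarget {
    key: String,
    locator: String,
}

#[derive(Debug)]
struct Mirror {
    path: String,
}

#[derive(Debug, PartialEq)]
struct RunOutcome {
    repo: String,
}

#[derive(Debug, PartialEq)]
enum RunError {
    MirrorUnavailable(String),
}

trait Discovery {
    // None means terminal completion; Some pages are non-empty by contract.
    fn discover_page(&mut self, cursor: Option<&str>) -> Option<Vec<RepoTarget>>;
}
trait MirrorManager {
    fn sync_mirror(&mut self, locator: &str) -> Result<Mirror, RunError>;
}
trait Executor {
    fn run_repo(&mut self, mirror: &Mirror) -> Result<RunOutcome, RunError>;
}

fn run_pipeline(
    discovery: &mut impl Discovery,
    mirrors: &mut impl MirrorManager,
    executor: &mut impl Executor,
) -> Result<Vec<RunOutcome>, RunError> {
    let mut outcomes = Vec::new();
    let mut cursor: Option<String> = None;
    // Stage 1: page over targets until discovery reports completion.
    while let Some(page) = discovery.discover_page(cursor.as_deref()) {
        let Some(last) = page.last() else { break }; // defensive guard against empty pages
        let next_cursor = last.key.clone();
        for target in &page {
            let mirror = mirrors.sync_mirror(&target.locator)?; // stage 2
            outcomes.push(executor.run_repo(&mirror)?); // stage 3
        }
        cursor = Some(next_cursor);
    }
    Ok(outcomes)
}

// Fakes used only to exercise the pipeline.
struct StaticDiscovery {
    targets: Vec<RepoTarget>, // assumed sorted by key
}
impl Discovery for StaticDiscovery {
    fn discover_page(&mut self, cursor: Option<&str>) -> Option<Vec<RepoTarget>> {
        let next = self
            .targets
            .iter()
            .find(|t| cursor.map_or(true, |c| t.key.as_str() > c))?;
        Some(vec![next.clone()])
    }
}

struct PrefixMirrors;
impl MirrorManager for PrefixMirrors {
    fn sync_mirror(&mut self, locator: &str) -> Result<Mirror, RunError> {
        if locator.is_empty() {
            return Err(RunError::MirrorUnavailable(locator.to_string()));
        }
        Ok(Mirror { path: format!("/mirrors/{locator}") })
    }
}

struct EchoExecutor;
impl Executor for EchoExecutor {
    fn run_repo(&mut self, mirror: &Mirror) -> Result<RunOutcome, RunError> {
        Ok(RunOutcome { repo: mirror.path.clone() })
    }
}
```

In this sketch a mirror failure aborts the whole pipeline through `?`; whether the real runtime fails the shard or retries is not specified here.
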
gossip-scanner-runtime/src/git_discovery.rs defines the payload-backed
StaticGitRepoDiscoverySource for one-target repo-frontier shards. It
emits one terminal page when the carried RepoKey is inside the assigned shard
and otherwise relies on the ordered key boundary for replay-safe completion.
gossip-scanner-runtime/src/git_repo.rs wires that source into
GitRepoRuntime::execute_discovery, which the distributed runtime uses before
and after repo execution to decide whether a singleton shard is already covered
by its cursor or is complete after a durable finalize-backed checkpoint.
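
The singleton discovery decision can be pictured as a pure function over ordered keys. The sketch below is an assumption-laden illustration, not the code in git_discovery.rs: `discover_singleton` is hypothetical, keys are plain strings, the shard is modeled as a half-open range, and the cursor is modeled as the last key already durably covered.

```rust
/// Hypothetical singleton discovery.
/// Returns the target when it should be emitted as the one terminal page,
/// or None when there is nothing left to emit for this shard.
fn discover_singleton<'a>(
    target: &'a str,
    shard_start: &str,
    shard_end: Option<&str>, // None models an unbounded upper end
    cursor: Option<&str>,
) -> Option<&'a str> {
    // Assumption: shards are half-open ranges [start, end) over ordered keys.
    let in_shard = target >= shard_start && shard_end.map_or(true, |end| target < end);
    // Assumption: a cursor at or past the target means a replay; the key
    // ordering alone tells the runtime the shard is already covered.
    let already_covered = cursor.map_or(false, |c| target <= c);

    if in_shard && !already_covered {
        Some(target)
    } else {
        None
    }
}
```

Modeled this way, calling discovery again after a durable checkpoint is idempotent: the second call sees a cursor that covers the target and returns nothing.
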
gossip-scanner-runtime/src/git_persistence.rs defines the runtime-owned
adapter that satisfies scanner-git's ref-watermark, seen-blob, and finalize
persistence seams. When artifact metadata matches, it restores a
fingerprint-scoped MIDX ordinal cache for the live worker, and it maps complete
inner finalizes onto the shared repo-frontier receipt/checkpoint path.
run_git_repo_worker in gossip-scanner-runtime/src/distributed.rs composes the
full singleton path: static discovery, mirror sync, mirror-backed execution
through GitRepoRuntime::execute_repo, capture of emitted finding payloads
through FindingsCaptureSink, translation via the shared
PersistenceFinding/translate_git_item_result path, and finally shard
advancement from the durable repo-frontier checkpoint cursor.
All families build on types in gossip-contracts/src/connector/common.rs:
| Type | Role |
|---|---|
| PageBuf<T> | Non-empty page container with PageState (HasMore / Complete) |
| PageState | Cursor-carrying continuation or terminal signal |
| PagingCapabilities | Feature flags: ordered_keys, resumable, splittable |
| KeyedPageItem | Trait: item_key() + size_hint() |
| validate_filled_page | Validates non-empty, strictly increasing keys within shard bounds |

Error types (EnumerateError, ReadError) and value types (ItemKey,
ItemRef, Cursor, Budgets) live in gossip-contracts/src/connector/api.rs
and types.rs.
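
The page invariants listed for validate_filled_page (non-empty, strictly increasing keys, within shard bounds) are simple to state in code. The sketch below is a hypothetical re-statement of those three checks, not the function in common.rs: `validate_page` and `PageError` are invented names, keys are raw byte slices, and the shard is assumed to be a half-open range with an optional upper bound.

```rust
#[derive(Debug, PartialEq)]
enum PageError {
    Empty,
    OutOfBounds(usize),   // index of the offending key
    NotIncreasing(usize), // index of the key that fails to exceed its predecessor
}

/// Hypothetical page validation over byte-string keys.
/// Assumption: the shard covers [start, end), with None meaning unbounded.
fn validate_page(keys: &[&[u8]], start: &[u8], end: Option<&[u8]>) -> Result<(), PageError> {
    if keys.is_empty() {
        return Err(PageError::Empty);
    }
    for (i, key) in keys.iter().enumerate() {
        let below = *key < start;
        let above = end.map_or(false, |e| *key >= e);
        if below || above {
            return Err(PageError::OutOfBounds(i));
        }
        // Strictly increasing: equal neighbours are rejected as well.
        if i > 0 && keys[i - 1] >= *key {
            return Err(PageError::NotIncreasing(i));
        }
    }
    Ok(())
}
```

Rejecting equal neighbours is what makes a cursor derived from the last key unambiguous: no later page can legally contain a key at or before it.
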
| File | Purpose |
|---|---|
| crates/gossip-contracts/src/connector/ordered.rs | Ordered-content family contract |
| crates/gossip-contracts/src/connector/git.rs | Git family contract (three traits + supporting types) |
| crates/gossip-contracts/src/connector/common.rs | Shared paging vocabulary |
| crates/gossip-contracts/src/connector/types.rs | Toxic-byte wrappers, cursor, budgets |
| crates/gossip-contracts/src/connector/api.rs | Error taxonomy, capabilities |
| crates/gossip-contracts/src/connector/conformance.rs | Ordered-content conformance harness shared by concrete connector implementations |
| crates/gossip-contracts/src/connector/mod.rs | Re-export hub, canonical connector tags |
| crates/gossip-contracts/src/connector/api_tests.rs | Unit tests for error taxonomy and capabilities |
| crates/gossip-contracts/src/connector/common_tests.rs | Unit tests for shared paging vocabulary |
| crates/gossip-contracts/src/connector/types_tests.rs | Unit tests for toxic-byte wrappers, cursor, and budgets |
| crates/gossip-connectors/src/lib.rs | Crate root re-exports for concrete filesystem and in-memory connector implementations |
| crates/gossip-connectors/src/filesystem.rs | Filesystem ordered-content connector |
| crates/gossip-connectors/src/in_memory.rs | Deterministic in-memory test connector |
| crates/gossip-connectors/src/common.rs | Shared connector utilities |
| crates/gossip-connectors/src/split_estimator.rs | Streaming byte-weighted split-point estimator (internal; used by common.rs and FilesystemConnector) |
| crates/gossip-connectors/src/filesystem_tests.rs | Unit tests for filesystem connector |
| crates/gossip-connectors/src/in_memory_tests.rs | Unit tests for in-memory connector |
| crates/gossip-connectors/src/split_estimator_tests.rs | Unit tests for split-point estimator |
| crates/gossip-orchestrator/src/lib.rs | Re-export hub for filesystem and Git request normalization, planning, and run setup |
| crates/gossip-orchestrator/src/git_payload.rs | Typed Git shard payload wire format for repo-frontier shards (encode/decode) |
| crates/gossip-orchestrator/src/git_planner.rs | Deterministic Git initial shard geometry planner |
| crates/gossip-orchestrator/src/git_request.rs | Canonical Git submission request normalization and target deduplication |
| crates/gossip-orchestrator/src/git_setup.rs | Coordination-backed Git run setup and shard registration |
| crates/gossip-orchestrator/src/request.rs | Canonical filesystem submission request normalization |
| crates/gossip-orchestrator/src/planner.rs | Deterministic filesystem initial shard geometry planner |
| crates/gossip-orchestrator/src/payload.rs | Typed filesystem shard payload wire format (encode/decode) |
| crates/gossip-orchestrator/src/setup.rs | Coordination-backed filesystem run setup and shard registration |
| crates/gossip-orchestrator/src/test_support.rs | Shared test fixtures for orchestrator unit tests |
| crates/gossip-scanner-runtime/src/lib.rs | Runtime crate root: public family entrypoints, execution-mode selection, validation, and shared scan report/config types |
| crates/gossip-scanner-runtime/src/cli.rs | CLI/runtime flag mapping and entrypoint wiring for filesystem and Git scans |
| crates/gossip-scanner-runtime/src/ordered_content.rs | Runtime integration for ordered content |
| crates/gossip-scanner-runtime/src/git_discovery.rs | Static single-target Git repository discovery source |
| crates/gossip-scanner-runtime/src/git_executor.rs | Contract-level adapter that runs scanner-git against a local mirror |
| crates/gossip-scanner-runtime/src/git_persistence.rs | Runtime-backed Git persistence adapters and repo-frontier receipt helpers |
| crates/gossip-scanner-runtime/src/git_mirror.rs | Worker-local Git mirror lifecycle and deterministic mirror-cache naming |
| crates/gossip-scanner-runtime/src/git_repo.rs | Runtime integration for Git repo-native |
| crates/gossip-scanner-runtime/src/commit_pipeline.rs | Family-neutral bounded execution -> durable-commit bridge shared after result translation |
| crates/gossip-scanner-runtime/src/commit_sink.rs | Commit-sink trait and bridge record types for scan-loop lifecycle |
| crates/gossip-scanner-runtime/src/commit_model.rs | Frozen runtime commit vocabulary: CompletedUnit, CommitRequest, UnitCommitReceipt |
| crates/gossip-scanner-runtime/src/done_ledger_bloom.rs | In-memory Bloom filter used to prefilter done-ledger lookups during durable commit processing |
| crates/gossip-scanner-runtime/src/event_sink.rs | Owned event sinks and forwarders for CLI/runtime output surfaces |
| crates/gossip-scanner-runtime/src/parity.rs | Cross-scanner parity helpers shared by runtime tests and tooling |
| crates/gossip-scanner-runtime/src/result_translation.rs | Deterministic scan-result -> persistence-row translation |
| crates/gossip-scanner-runtime/src/result_committer.rs | Authoritative findings -> done-ledger durable commit stage |
| crates/gossip-scanner-runtime/src/checkpoint_aggregator.rs | Receipt-driven prefix checkpoint aggregation |
| crates/gossip-scanner-runtime/src/coordination_sink.rs | Coordination event recorder payloads for distributed scans |
| crates/gossip-scanner-runtime/src/distributed.rs | Distributed worker-loop runtime and receipt-backed commit plumbing |
| crates/gossip-scanner-runtime/src/cli_tests.rs | CLI argument parsing and entrypoint configuration tests |
| crates/gossip-scanner-runtime/src/distributed/commit_bridge.rs | Scan-to-commit adapter and commit pipeline draining |
| crates/gossip-scanner-runtime/src/distributed/execution.rs | Scan execution, per-lease orchestration, and top-level worker loops |
| crates/gossip-scanner-runtime/src/distributed/integration_tests.rs | End-to-end integration tests for the distributed module |
| crates/gossip-scanner-runtime/src/distributed/lease_ops.rs | Lease lifecycle: claiming, hydration, deadline watchdog, and shard advancement |
| crates/gossip-scanner-runtime/src/distributed/test_support.rs | Shared test doubles and fixture builders for distributed module tests |
| crates/gossip-scanner-runtime/src/distributed/types.rs | Distributed runtime types, errors, and shared definitions |
| crates/gossip-scanner-runtime/src/distributed/unit_tests.rs | Component-level unit tests for distributed coordination and state transitions |
| crates/gossip-scanner-runtime/src/lib_tests.rs | Runtime tests for parsing, validation, and local scan execution |
| crates/gossip-scanner-runtime/src/runtime_durability_tests.rs | Durability integration tests for translation, commit, and receipt-driven checkpoint aggregation |
| crates/gossip-scanner-runtime/src/test_fixtures.rs | Shared test data builders and git repository setup helpers |
- Define the contract in gossip-contracts/src/connector/: one or more traits plus any family-specific value types. Build on the shared paging vocabulary where applicable.
- Implement in gossip-connectors/src/: concrete connector(s) for the family.
- Wire into runtime in gossip-scanner-runtime/src/: translation from coordination shard assignments to the family's trait surface.
- Update this doc: add the new family to the table above.

See boundary-4-connectors.md for the full connector architecture including paging invariants, error taxonomy, and the ordered-content conformance harness.