ADR 0001: Git Repository Execution Model

Status: Accepted
Date: 2026-03-31

Context

Git repository scanning uses a repo-native execution path, not the ordered-content page loop. The codebase already has most of the building blocks for that path:

| Existing building block | Current location | Why it matters |
| --- | --- | --- |
| `ShardStatus::{Active, Done, Split, Parked}` | `crates/gossip-coordination/src/record.rs` | The outer coordinator lifecycle is already fixed and persisted. |
| `LeaseUncertainty::{DeadlineElapsed, AdvanceStaleFence, AdvanceLeaseExpired}` | `crates/gossip-scanner-runtime/src/distributed.rs` | Lease-loss semantics already exist at runtime level. |
| `CheckpointBoundaryKind::RepoFrontier` | `crates/gossip-contracts/src/persistence/page_commit.rs` | Repo-frontier progress is already a first-class boundary kind. |
| `CompletedUnit::repo_frontier(...)` | `crates/gossip-scanner-runtime/src/commit_model.rs` | Repo-frontier units already fit the shared runtime commit vocabulary. |
| `CheckpointAggregatorInput` | `crates/gossip-scanner-runtime/src/commit_model.rs` | Outer progress already advances from durable receipts only. |
| `PersistenceStore::commit_finalize(&FinalizeOutput)` | `crates/scanner-git/src/persist.rs` | Inner Git execution already has an atomic finalize seam. |
| `FinalizeOutcome::{Complete, Partial}` with suppressed `watermark_ops` on `Partial` | `crates/scanner-git/src/finalize.rs` | Partial runs already prevent ref-frontier advancement. |
| `SeenBlobStore`, `seen_blob` markers, and MIDX ordinal caches | `crates/scanner-git/src/seen_store.rs`, `crates/scanner-git/src/finalize.rs`, `crates/scanner-git/src/ordinal_seen.rs` | Replay already has an idempotent dedupe surface plus a fingerprint-scoped acceleration cache for configured snapshots. |
| `GitRepoDiscoverySource`, `GitMirrorManager`, `GitRepoExecutor`, `GitRunError` | `crates/gossip-contracts/src/connector/git.rs` | The repo-family contract surface already exists. |

This document locks how those pieces compose. It does not introduce a second coordinator state machine, a second checkpoint path, or a compatibility layer.

Decision Summary

Repo execution is a worker-local substate machine inside ShardStatus::Active.
Inner Git durability hands off to outer repo-frontier checkpointing through a sequential durable-receipt seam.
The connector-level Git runtime error model is ternary: Retryable, Permanent, and StaleOwner.
The first shipped scope stays narrow: one normalized repo target per shard, deterministic mirror and selection inputs, no repo-internal coordination state, and no repo-internal shard splitting.

Decision 1: Repo Execution Lives Inside `ShardStatus::Active`

The coordinator keeps its current coarse shard lifecycle. Repo execution phases are worker-local substates inside ShardStatus::Active; they do not add new persisted ShardStatus variants.

stateDiagram-v2
    direction LR

    [*] --> Active : worker acquires shard

    state "ShardStatus::Active" as Active {
        [*] --> Claimed
        Claimed --> MirrorSync : start mirror work
        MirrorSync --> Executing : mirror ready
        Executing --> InnerDurable : inner finalize durable
        InnerDurable --> OuterCheckpointed : repo-frontier receipt durably checkpointed

        MirrorSync --> MirrorSync : retryable error
        Executing --> Executing : retryable error

        Claimed --> LeaseLost : lease loss
        MirrorSync --> LeaseLost : lease loss
        Executing --> LeaseLost : lease loss
        InnerDurable --> LeaseLost : lease loss

        Claimed --> PermanentFailure : permanent error
        MirrorSync --> PermanentFailure : permanent error
        Executing --> PermanentFailure : permanent error
        InnerDurable --> PermanentFailure : permanent error
    }

    Active --> Done : completion after OuterCheckpointed
    Active --> Parked : coordinator accepts park request
    Done --> [*]
    Parked --> [*]

LeaseLost is terminal only for the current worker attempt. The worker stops immediately, discards in-flight local state, and leaves the shard Active so a new owner can replay from the last durable frontier. PermanentFailure is a worker-local decision to request parking; the actual persisted transition remains the existing outer Active -> Parked coordination transition.

Required behavior

ShardStatus stays exactly Active | Done | Split | Parked.
Claimed, MirrorSync, Executing, InnerDurable, and OuterCheckpointed are invisible to coordination storage.
Retryable errors in MirrorSync or Executing may loop inside the same lease, bounded by the lease deadline.
Lease loss from any inner phase stops execution immediately and never produces a new checkpoint or terminal transition.
Permanent failure from any inner phase requests Parked.
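
The required behavior above can be sketched as a small transition function. This is an illustrative sketch only: `RepoPhase`, `PhaseEvent`, `OuterRequest`, `step`, and `outer_request` are hypothetical names, not types from the codebase, and the real worker may organize its phases differently. The sketch shows that each worker-local phase maps onto at most one request against the existing outer lifecycle, and that lease loss produces no request at all.

```rust
/// Worker-local phases mirroring the diagram (hypothetical type, never persisted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RepoPhase {
    Claimed,
    MirrorSync,
    Executing,
    InnerDurable,
    OuterCheckpointed,
    LeaseLost,
    PermanentFailure,
}

/// Things that can happen to a worker while it holds the shard (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhaseEvent {
    Advance,
    RetryableError,
    PermanentError,
    LeaseLoss,
}

/// What the worker asks of the coordinator (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OuterRequest {
    None,     // shard stays Active
    Complete, // Active -> Done
    Park,     // Active -> Parked
}

/// Returns the next phase, or `None` when the event is not valid in `phase`.
pub fn step(phase: RepoPhase, event: PhaseEvent) -> Option<RepoPhase> {
    use PhaseEvent::*;
    use RepoPhase::*;
    match (phase, event) {
        // These phases end the current attempt and accept no further events.
        (OuterCheckpointed | LeaseLost | PermanentFailure, _) => None,
        // Lease loss and permanent errors can interrupt any in-progress phase.
        (_, LeaseLoss) => Some(LeaseLost),
        (_, PermanentError) => Some(PermanentFailure),
        // Retryable errors loop only inside MirrorSync and Executing.
        (MirrorSync, RetryableError) => Some(MirrorSync),
        (Executing, RetryableError) => Some(Executing),
        (_, RetryableError) => None,
        // The forward path.
        (Claimed, Advance) => Some(MirrorSync),
        (MirrorSync, Advance) => Some(Executing),
        (Executing, Advance) => Some(InnerDurable),
        (InnerDurable, Advance) => Some(OuterCheckpointed),
    }
}

/// Maps a worker-local phase onto the only outer transitions it may request.
pub fn outer_request(phase: RepoPhase) -> OuterRequest {
    match phase {
        RepoPhase::OuterCheckpointed => OuterRequest::Complete,
        RepoPhase::PermanentFailure => OuterRequest::Park,
        // LeaseLost deliberately requests nothing: the shard stays Active.
        _ => OuterRequest::None,
    }
}
```

The lease-deadline bound on retry loops is not modeled here; a real worker would check the deadline before re-entering `MirrorSync` or `Executing`.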

Alternatives considered

Track repo execution in a second persisted state machine. Rejected because it duplicates lease-loss handling and creates desynchronization risk between the worker view and the coordinator view.
Add repo-specific outer shard states. Rejected because coordination only needs coarse ownership and terminal-state information, and ShardStatus discriminants are already persisted.

Decision 2: Inner Git Durability Hands Off Through a Durable-Receipt Seam

Inner Git persistence owns repo-data durability. The outer runtime owns shard frontier durability. The handoff is sequential: inner persistence commits first, then the runtime translates that durable outcome into the shared receipt chain that drives repo-frontier checkpointing.

flowchart TD
    A["MirrorSync + Executing"] --> B["scanner-git builds FinalizeOutput"]
    B --> C["PersistenceStore::commit_finalize(output)"]
    C --> D{"FinalizeOutcome"}
    D -->|Complete| E["data_ops + watermark_ops durable"]
    D -->|Partial| F["data_ops durable; watermark_ops suppressed"]
    E --> G["Translate repo result + captured findings<br/>via translate_git_item_result"]
    F --> G
    G --> H["ResultCommitter -> ItemCommitReceipt"]
    H --> I["CompletedUnit::repo_frontier + ItemCommitReceipt -> UnitCommitReceipt"]
    I --> J["CheckpointAggregatorInput(RepoFrontier)"]
    J --> K["PrefixCheckpointAggregator.prepare_checkpoint()"]
    K --> L["CheckpointCommitReceipt"]
    L --> M["Coordinator checkpoint / complete transition"]

    C -. crash after inner durability .-> N["Safe replay: blob_ctx, finding, and seen_blob keys are deterministic; seen_blob markers suppress duplicate blob work; Partial never writes watermark_ops; outer repo frontier has not advanced yet"]

The seam is deliberately not a two-phase commit. The inner store already has the atomic contract it needs:

data_ops are always safe to write.
watermark_ops are written only for FinalizeOutcome::Complete.
commit_finalize writes the chosen operation set atomically.
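
A minimal sketch of that inner contract follows. The types are simplified stand-ins: the names `FinalizeOutcome`, `data_ops`, and `watermark_ops` come from this ADR, but the field layout, the `WriteOp` type, and the `ops_to_commit` helper are assumptions made for illustration and do not reflect the actual `scanner-git` definitions.

```rust
/// One keyed write (hypothetical stand-in for the real operation type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WriteOp {
    pub key: String,
    pub value: Vec<u8>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FinalizeOutcome {
    Complete,
    Partial,
}

/// Simplified stand-in for the finalize result; field layout is assumed.
pub struct FinalizeOutput {
    pub outcome: FinalizeOutcome,
    pub data_ops: Vec<WriteOp>,
    pub watermark_ops: Vec<WriteOp>,
}

/// Chooses the operation set that a single atomic batch would write.
/// `data_ops` are always included; `watermark_ops` only on `Complete`.
pub fn ops_to_commit(output: &FinalizeOutput) -> Vec<WriteOp> {
    let mut batch = output.data_ops.clone();
    if output.outcome == FinalizeOutcome::Complete {
        batch.extend(output.watermark_ops.iter().cloned());
    }
    batch
}
```

Under these assumptions a `Partial` run can never move a ref-frontier watermark, because the watermark operations never reach the batch in the first place.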

The outer runtime already has the receipt-only rule it needs:

CheckpointAggregatorInput accepts only UnitCommitReceipt.
PrefixCheckpointAggregator advances only from durable contiguous prefixes.
RepoFrontier remains a normal checkpoint-boundary kind in the shared aggregator rather than a Git-only side channel.
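
The "durable contiguous prefix" rule can be illustrated with a small tracker. This is a sketch under stated assumptions, not the `PrefixCheckpointAggregator` implementation: `PrefixTracker`, its methods, and the use of plain `u64` sequence numbers in place of receipts are all hypothetical.

```rust
use std::collections::BTreeSet;

/// Tracks which unit sequence numbers have durable receipts and exposes
/// only the contiguous prefix as checkpointable (hypothetical type).
pub struct PrefixTracker {
    /// Lowest sequence number that does not yet have a durable receipt.
    next_expected: u64,
    /// Durable receipts that arrived ahead of the contiguous prefix.
    pending: BTreeSet<u64>,
}

impl PrefixTracker {
    pub fn new() -> Self {
        Self { next_expected: 0, pending: BTreeSet::new() }
    }

    /// Records a durable receipt for unit `seq` and returns the frontier,
    /// that is, the count of contiguous durable units starting at 0.
    pub fn record_durable(&mut self, seq: u64) -> u64 {
        // Receipts below the frontier are duplicates and change nothing.
        if seq >= self.next_expected {
            self.pending.insert(seq);
        }
        // Advance while the next expected unit is durable.
        while self.pending.remove(&self.next_expected) {
            self.next_expected += 1;
        }
        self.next_expected
    }

    pub fn frontier(&self) -> u64 {
        self.next_expected
    }
}
```

Receipts that arrive out of order are held back until the gap closes, and a replayed receipt for an already-covered unit leaves the frontier unchanged.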

The translation step now includes Git findings themselves, not only finalize outcome metadata. Repo-frontier workers capture emitted finding payloads during scan execution, normalize them behind PersistenceFinding, and commit them through the same findings-first, done-ledger-second ResultCommitter path used by ordered-content scans before they synthesize the outer checkpoint receipt.
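
One way to picture the receipt chain is to make each receipt a value that only the preceding durable step can construct. The sketch below is an assumption-heavy illustration: the type names echo this ADR, but the fields, method signatures, the `repo_frontier_receipt` helper, and the in-memory vectors standing in for durable writes are all hypothetical.

```rust
pub mod receipts {
    /// Proof that one unit's findings and done-ledger entry were written.
    /// The private field means code outside this module cannot forge one.
    #[derive(Debug)]
    pub struct ItemCommitReceipt {
        unit: u64,
    }

    /// Proof that a repo-frontier unit is backed by an item receipt.
    #[derive(Debug)]
    pub struct UnitCommitReceipt {
        unit: u64,
    }

    impl UnitCommitReceipt {
        pub fn unit(&self) -> u64 {
            self.unit
        }
    }

    /// In-memory stand-in for the durable findings store and done ledger.
    #[derive(Debug, Default)]
    pub struct ResultCommitter {
        pub findings: Vec<String>,
        pub done_ledger: Vec<u64>,
    }

    impl ResultCommitter {
        /// Writes findings first and the done-ledger entry second, then
        /// hands back the receipt that proves both writes happened.
        pub fn commit(&mut self, unit: u64, findings: &[&str]) -> ItemCommitReceipt {
            self.findings.extend(findings.iter().map(|f| (*f).to_owned()));
            self.done_ledger.push(unit);
            ItemCommitReceipt { unit }
        }
    }

    /// Consumes the item receipt, so a unit receipt cannot exist without it.
    pub fn repo_frontier_receipt(item: ItemCommitReceipt) -> UnitCommitReceipt {
        UnitCommitReceipt { unit: item.unit }
    }
}
```

Because `repo_frontier_receipt` takes the item receipt by value, a caller has no path to an outer receipt that skips the findings commit.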

Crash window: `InnerDurable -> OuterCheckpointed`

The only crash window that needs its own argument opens after commit_finalize succeeds and closes once the outer checkpoint receipt is durably acknowledged.

That window is safe because:

inner writes are keyed deterministically, so reissuing them converges on the same persisted state;
seen_blob markers provide replay-time dedupe for already-scanned blobs;
configured MIDX ordinal caches are keyed by the repo artifact fingerprint, so replay only reuses them when the snapshot metadata still matches the committed seen scope;
FinalizeOutcome::Partial suppresses watermark_ops, so partial replays never advance ref-frontier watermarks; and
outer repo-frontier advancement still has not happened, because only the receipt path through CheckpointAggregatorInput and CheckpointCommitReceipt may move the shard frontier.
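
The replay argument can be checked against a toy model. This sketch assumes an in-memory store and a hypothetical `finding/<blob id>` key format; `InnerStore`, `scan_blob`, and `run_attempt` are illustrative names, not the `SeenBlobStore` API. It shows that a second attempt over the same blobs converges on the same persisted state and repeats no blob work.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// In-memory stand-in for the inner Git persistence (hypothetical).
#[derive(Default)]
pub struct InnerStore {
    /// Records keyed deterministically by blob id.
    pub records: BTreeMap<String, String>,
    /// Seen markers used for replay-time dedupe.
    pub seen_blobs: BTreeSet<String>,
    /// Counts how many blobs were actually scanned.
    pub scans_performed: u32,
}

impl InnerStore {
    /// Scans one blob unless a seen marker already exists for it.
    pub fn scan_blob(&mut self, blob_id: &str, finding: &str) {
        if self.seen_blobs.contains(blob_id) {
            return; // already scanned by an earlier attempt
        }
        self.scans_performed += 1;
        // The key depends only on the blob id, so a rewrite is a no-op.
        self.records.insert(format!("finding/{blob_id}"), finding.to_owned());
        self.seen_blobs.insert(blob_id.to_owned());
    }
}

/// One worker attempt over a fixed set of (blob id, finding) pairs.
pub fn run_attempt(store: &mut InnerStore, blobs: &[(&str, &str)]) {
    for &(blob_id, finding) in blobs {
        store.scan_blob(blob_id, finding);
    }
}
```

Running `run_attempt` twice with the same input models a crash after inner durability followed by a replay from a new owner: the record set is unchanged and the scan counter does not move on the second pass.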

The result is a single-writer outbox-style handoff: inner persistence proves the repo-local writes are durable, and the outer runtime turns that proof into the family-neutral checkpoint protocol.

Alternatives considered

Use a two-phase commit spanning scanner-git persistence and the outer runtime. Rejected because it adds a large coordination refactor without improving the safety properties that deterministic replay already provides.
Advance the outer frontier from scan completion, queue drain, or another non-durable signal. Rejected because the shared runtime model already forbids raw completion signals from acting as authoritative progress.

Decision 3: Connector-Level Git Errors Are Ternary

The Git repo-runtime contract needs a third connector-level error class: StaleOwner.

Scope of this decision

Three distinct ErrorClass enums exist today:

| Layer | Current location | Current shape | This ADR changes it? |
| --- | --- | --- | --- |
| Connector level | `crates/gossip-contracts/src/connector/api.rs` | `Retryable \| Permanent` | Yes, conceptually reserve `StaleOwner` here. |
| Scheduler level | `crates/scanner-scheduler/src/scheduler/failure.rs` | Binary with reason subtypes | No. |
| Remote backend level | `crates/scanner-scheduler/src/scheduler/remote.rs` | `Retryable \| Permanent` | No. |

This ADR changes only the connector-level meaning. Scheduler and remote error taxonomies remain separate until they need their own explicit ownership-loss mapping.

Class table

| Class | Meaning | Worker action | Example |
| --- | --- | --- | --- |
| `Retryable` | The current owner may retry within the same lease. | Retry locally while the lease is still authoritative. | transient mirror sync failure, temporary rate limit, concurrent maintenance retry surface |
| `Permanent` | The current owner cannot make progress without an external change. | Request `Parked`. | permission denied, repository missing, unsupported selection or malformed configuration |
| `StaleOwner` | The work has become non-authoritative because lease ownership is no longer trustworthy. | Stop immediately, drop in-flight local state, leave the shard `Active` for reassignment. | deadline elapsed, stale fence rejection, lease-expired rejection, mirror or persistence detecting superseded ownership |

StaleOwner is not a retry request and not a park request. It means "the work may be valid, but this worker is no longer allowed to finish it."

Migration constraint

When the connector-level enum grows `StaleOwner`, every binary `if err.is_retryable() { ... } else { ... }` call site must be audited. The current `is_retryable()` helper is intentionally binary; without an audit it would collapse `StaleOwner` into the non-retryable branch and silently treat ownership loss like a permanent failure. Future code must branch on `class()` or an equivalent three-way helper.
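
The hazard is easy to reproduce in miniature. In the sketch below, `ErrorClass` is a local stand-in with the three classes from the class table, and `WorkerAction`, `action_for`, and `legacy_binary_action` are hypothetical names introduced only for illustration; none of this is the connector crate's actual API.

```rust
/// Local stand-in for the connector-level class after `StaleOwner` lands.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorClass {
    Retryable,
    Permanent,
    StaleOwner,
}

/// What the worker does in response (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerAction {
    RetryWithinLease,
    RequestPark,
    StopAndAbandon,
}

/// Three-way branch: the compiler rejects this match if a class is missed.
pub fn action_for(class: ErrorClass) -> WorkerAction {
    match class {
        ErrorClass::Retryable => WorkerAction::RetryWithinLease,
        ErrorClass::Permanent => WorkerAction::RequestPark,
        ErrorClass::StaleOwner => WorkerAction::StopAndAbandon,
    }
}

/// The binary pattern that must be audited. It still compiles after the
/// enum grows, and it silently parks a shard whose owner is merely stale.
pub fn legacy_binary_action(class: ErrorClass) -> WorkerAction {
    let is_retryable = class == ErrorClass::Retryable;
    if is_retryable {
        WorkerAction::RetryWithinLease
    } else {
        WorkerAction::RequestPark
    }
}
```

With these definitions, `legacy_binary_action(ErrorClass::StaleOwner)` yields `RequestPark`, which is exactly the collapse described above, while `action_for` yields `StopAndAbandon`. An exhaustive `match` without a wildcard arm turns a missed class into a compile error rather than a runtime misclassification.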

Alternatives considered

Keep the connector layer binary and model ownership loss only through LeaseUncertainty in the runtime. Rejected because ownership loss may be detected inside mirror management or persistence code that is naturally exposed through GitRunError.
Add StaleOwner to every error enum immediately. Rejected because the connector, scheduler, and remote layers have different responsibilities and should not be forced into lockstep without a separate design decision.

Decision 4: The First Scope Stays Narrow

The first shipped shape is intentionally constrained.

| Topic | Locked decision | Reason | Deferred expansion |
| --- | --- | --- | --- |
| Shard granularity | One normalized repo target maps to one shard. | Control-plane state scales with repo targets instead of repo-internal objects. | Multi-repo packing and shard-level packing heuristics. |
| Shard splitting | No repo-internal split points in this shape. A repo shard is either replayed, parked, or completed as one repo target. | Split-point discovery inside a repository adds coordination and replay complexity immediately. | Split/pack strategies that operate on multi-repo shards. |
| Coordinator visibility | The coordinator stores only coarse shard state, repo-frontier progress, and terminal outcomes. | Repo-internal commits, trees, blobs, and mirror details are worker-local execution state. | Repo-internal coordination state, if a later design proves it is needed. |
| Mirror lifecycle | Mirror location and refresh behavior are deterministic per repo target. The runtime implementation lives in `gossip-scanner-runtime::git_mirror::LocalMirrorManager`. | Reassignment and replay need a stable local execution surface. | Shared mirror pools, eviction policy, and provider-specific mirror orchestration. |
| Explicit commit selection | Explicit commit inputs lower to stable synthetic refs before execution. | Identical inputs must normalize to the same ordered repo targets and checkpoint identity. | Additional selection UX and provider-specific aliases. |
| `StaleOwner` rollout | The semantic is locked now, but the connector-level enum, constructor surface, and consumers may land in a separate change. | The behavior must be documented before wiring spreads through the runtime. | Scheduler-level and remote-level ownership-loss mappings. |
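
The determinism requirements for repo targets and explicit commits can be sketched as pure functions. Everything here is an assumption made for illustration: the normalization rules (trim, drop trailing slashes and a `.git` suffix), the `refs/gossip/explicit/` namespace, and the function names are hypothetical and are not taken from the codebase.

```rust
/// Normalizes one repo locator. The rules are illustrative assumptions:
/// trim whitespace, drop trailing slashes, drop a trailing ".git".
pub fn normalize_target(raw: &str) -> String {
    let trimmed = raw.trim().trim_end_matches('/');
    let trimmed = trimmed.strip_suffix(".git").unwrap_or(trimmed);
    trimmed.to_owned()
}

/// Normalizes a request into an ordered, duplicate-free list of targets,
/// so that input order and spelling variants do not change the result.
pub fn normalize_targets(raw: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = raw.iter().map(|r| normalize_target(r)).collect();
    out.sort();
    out.dedup();
    out
}

/// Lowers an explicit commit id to a synthetic ref name. The namespace is
/// hypothetical. Returns `None` unless the id is 40 or 64 hex digits.
pub fn synthetic_ref(commit_hex: &str) -> Option<String> {
    let id = commit_hex.trim().to_ascii_lowercase();
    let valid = (id.len() == 40 || id.len() == 64)
        && id.bytes().all(|b| b.is_ascii_hexdigit());
    valid.then(|| format!("refs/gossip/explicit/{id}"))
}
```

Because both functions depend only on their arguments, two submissions that differ in ordering, whitespace, or hex casing produce the same targets and the same ref, which is the property that checkpoint identity relies on.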

Alternatives considered

Pack multiple repositories into a single shard from the start. Rejected because it immediately couples checkpointing, split policy, and replay policy.
Introduce repo-internal coordinator state early. Rejected because it increases storage and protocol surface before the coarse repo-frontier path is proven.

Invariants for All Downstream Implementation Work

Every downstream implementation task, design note, and test plan for this execution model keeps these invariants explicit:

A stale fence token is never accepted again.
Loss of lease stops repo execution immediately.
The outer repo frontier never advances before inner durable persistence completes.
Control-plane state scales with shards, not with objects inside repositories.
Identical request inputs normalize to the same ordered repo targets.
Identical explicit commit inputs lower to the same synthetic ref.
Logs, metrics, and traces never contain raw secret bytes, repo paths, refs, or tokens.
Duplicate submission and worker replay are idempotent.

Operational telemetry follows the same rule. Stage-oriented Git observability may emit scalar timings, shard digests, and closed-set retry or lease-loss labels, but never raw mirror roots, repo locators, refs, commit IDs, or connector tokens.
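
A minimal sketch of a redaction-safe stage event follows. The `StageEvent`, `Stage`, and `LeaseLossLabel` types and the rendered line format are hypothetical. The FNV-1a hash is used only to keep the example dependency-free; it is not cryptographic, and an unkeyed digest of a guessable locator can be reversed by enumeration, so a real implementation would likely use a keyed or cryptographic digest.

```rust
/// FNV-1a 64-bit hash (non-cryptographic; illustration only).
pub fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325; // FNV offset basis
    for b in bytes {
        hash ^= u64::from(*b);
        hash = hash.wrapping_mul(0x100000001b3); // FNV prime
    }
    hash
}

/// Closed set of stage names (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    MirrorSync,
    Executing,
    Finalize,
}

/// Closed set of lease-loss labels (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeaseLossLabel {
    DeadlineElapsed,
    StaleFence,
    LeaseExpired,
}

impl LeaseLossLabel {
    pub fn as_str(self) -> &'static str {
        match self {
            LeaseLossLabel::DeadlineElapsed => "deadline_elapsed",
            LeaseLossLabel::StaleFence => "stale_fence",
            LeaseLossLabel::LeaseExpired => "lease_expired",
        }
    }
}

/// A telemetry event that holds only a digest, scalars, and closed labels.
pub struct StageEvent {
    pub shard_digest: u64,
    pub stage: Stage,
    pub elapsed_ms: u64,
    pub lease_loss: Option<LeaseLossLabel>,
}

impl StageEvent {
    /// The raw locator is hashed here and never stored on the event.
    pub fn new(
        repo_locator: &str,
        stage: Stage,
        elapsed_ms: u64,
        lease_loss: Option<LeaseLossLabel>,
    ) -> Self {
        let shard_digest = fnv1a64(repo_locator.as_bytes());
        Self { shard_digest, stage, elapsed_ms, lease_loss }
    }

    /// Renders a log line from the redacted fields only.
    pub fn render(&self) -> String {
        let loss = self.lease_loss.map_or("none", LeaseLossLabel::as_str);
        format!(
            "stage={:?} shard_digest={:016x} elapsed_ms={} lease_loss={}",
            self.stage, self.shard_digest, self.elapsed_ms, loss
        )
    }
}
```

Keeping the stage and lease-loss fields as enums rather than free-form strings is what keeps label cardinality bounded: a new label requires a code change, not just a new call site.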

Consequences

ShardStatus remains the only persisted outer lifecycle for repo work.
RepoFrontier remains part of the shared receipt and checkpoint model rather than a Git-only completion path.
Lease loss is a first-class outcome that stops work without parking the shard.
Git stage telemetry stays low-cardinality and redaction-safe: worker logs and recorder events emit digests plus scalar timings rather than raw repository identifiers.
The first scope optimizes for deterministic replay and narrow control-plane state, not for packing density.

Complementary Documentation

docs/source-families.md describes the family split between ordered-content and repo-native Git execution.
docs/gossip-coordination/coordination-error-model.md complements this ADR with the coordination-layer error taxonomy and lease/fence validation model.
diagrams/05-shard-and-run-state-machines.md describes the outer shard and run lifecycle that this ADR intentionally preserves.
diagrams/10-failure-modes-and-recovery.md describes the failure and replay model that this ADR relies on for ownership loss and durable replay.
