
Coordination Error Model

1. Overview

The coordination error model provides compile-time-enforced, operation-specific error types for every shard coordination operation. Rather than a single mega-enum that forces callers to handle irrelevant variants, each operation returns a dedicated error type that admits only the CoordError variants semantically valid for that operation.

Design philosophy

  1. Compile-time exhaustiveness over wildcard convenience. Every From<CoordError> impl explicitly enumerates all variants (accepted and rejected). Adding a new CoordError variant triggers a compile error in every From impl, forcing a conscious routing decision for the new variant. Rejected variants map to unreachable!().

  2. Security-first redaction. Custom Debug and Display impls on error types redact sensitive fields (hash values, worker identities, raw key bytes) to prevent information leakage in logs and error messages.

  3. Idempotency before authorization. The three-step validation pipeline checks op-log replay before lease validation, so a worker that successfully completed an operation but crashed before receiving the response can retry and get the cached result -- even after the lease has expired or the shard has reached a terminal state.

  4. Operation-specific narrowing. Callers get precise match arms: a CheckpointError can never produce AlreadyLeased, and a RenewError can never produce OpIdConflict.

  5. #[non_exhaustive] on all enums. Adding variants is a non-breaking change for downstream crate consumers who match on them.
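The exhaustive routing described in point 1 can be sketched as follows. The enums here are simplified stand-ins (fewer variants, placeholder field types), not the real definitions in error.rs:

```rust
// Sketch of the exhaustive From<CoordError> routing pattern, using
// simplified stand-in types. Accepted variants map 1:1; rejected
// variants hit unreachable!(), so adding a CoordError variant forces
// every From impl to make an explicit routing decision.
#[non_exhaustive]
#[derive(Debug)]
pub enum CoordError {
    ShardNotFound { shard: u64 },
    LeaseExpired { deadline: u64, now: u64 },
    OpIdConflict { op_id: u64 },
}

#[non_exhaustive]
#[derive(Debug)]
pub enum RenewError {
    ShardNotFound { shard: u64 },
    LeaseExpired { deadline: u64, now: u64 },
}

impl From<CoordError> for RenewError {
    fn from(e: CoordError) -> Self {
        match e {
            // Accepted variants are routed 1:1.
            CoordError::ShardNotFound { shard } => RenewError::ShardNotFound { shard },
            CoordError::LeaseExpired { deadline, now } => {
                RenewError::LeaseExpired { deadline, now }
            }
            // Rejected variant: reaching here means validation produced an
            // error this operation cannot emit -- a routing bug.
            CoordError::OpIdConflict { .. } => {
                unreachable!("renew never checks op idempotency")
            }
        }
    }
}
```

Within the defining crate the match stays exhaustive despite #[non_exhaustive], so a new variant is a compile error here while remaining non-breaking for downstream matchers.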

Source files

| File | Role |
| --- | --- |
| crates/gossip-coordination/src/error.rs | CoordError, all shard-level operation errors, AcquireScratch, FixedBuf, IdempotentOutcome |
| crates/gossip-coordination/src/validation.rs | validate_lease, validate_cursor_update_pooled, check_op_idempotency |
| crates/gossip-coordination/src/lease.rs | Lease, LeaseHolder, OpLogEntry, OpKind, OpResult |
| crates/gossip-coordination/src/run_errors.rs | CreateRunError, RegisterShardsError, GetRunError, RunTransitionError, UnparkError |
| crates/gossip-coordination/src/split_execution.rs | DerivedShardKind, derived shard IDs, payload hash functions |
| crates/gossip-coordination/src/facade.rs | ClaimError, ShardClaiming trait, default claim algorithm |

2. Error Hierarchy

2.1 CoordError -- the shared building block

CoordError (error.rs) is the core coordination error enum. It provides the shared vocabulary of error conditions used across all lease-gated shard operations. Individual operation error types wrap these variants via From<CoordError> impls.

```
CoordError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── StaleFence { presented: FenceEpoch, current: FenceEpoch }
├── LeaseExpired { deadline: LogicalTime, now: LogicalTime }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
├── OpIdConflict { op_id: OpId, expected_hash: u64, actual_hash: u64 }
├── CursorRegression { old_key: Option<usize>, new_key: Option<usize> }
├── CursorOutOfBounds(CursorOutOfBoundsDetail)
├── CursorKeyTooLarge { size: usize, max: usize }
├── CursorTokenTooLarge { size: usize, max: usize }
├── SplitInvalid(SplitValidationError)
└── CheckpointMissingKey
```

CursorOutOfBoundsDetail (error.rs) captures redacted byte lengths:

| Field | Type | Meaning |
| --- | --- | --- |
| last_key | usize | Length of the cursor key that fell outside range |
| spec_start | usize | Length of shard's inclusive lower bound |
| spec_end | usize | Length of shard's exclusive upper bound |

2.2 Shard-level operation error types

Each shard operation has a dedicated error enum that wraps the subset of CoordError variants relevant to that operation. The From<CoordError> impls (error.rs) route accepted variants and unreachable!() on rejected ones.

AcquireError (error.rs)

Acquire is special: it does NOT require a pre-existing lease, so it cannot produce StaleFence or LeaseExpired. It has no From<CoordError> impl.

```
AcquireError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
├── AlreadyLeased { current_owner: WorkerId, lease_deadline: LogicalTime }
└── BackendError(InfraError)
```

RenewError (error.rs)

Renew extends a lease deadline without modifying shard progress. Excludes OpIdConflict, cursor, and split variants.

```
RenewError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── StaleFence { presented: FenceEpoch, current: FenceEpoch }
├── LeaseExpired { deadline: LogicalTime, now: LogicalTime }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
└── BackendError(InfraError)
```

CheckpointError (error.rs)

Checkpoint is the richest mutation: it advances the cursor and is idempotent via the op-log. Together with CompleteError it has the widest error surface, excluding only SplitInvalid. Has From<SlabFull> for ResourceExhausted.

```
CheckpointError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── StaleFence { presented: FenceEpoch, current: FenceEpoch }
├── LeaseExpired { deadline: LogicalTime, now: LogicalTime }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
├── OpIdConflict { op_id: OpId, expected_hash: u64, actual_hash: u64 }
├── CursorRegression { old_key: Option<usize>, new_key: Option<usize> }
├── CursorOutOfBounds(CursorOutOfBoundsDetail)
├── CursorKeyTooLarge { size: usize, max: usize }
├── CursorTokenTooLarge { size: usize, max: usize }
├── CheckpointMissingKey
├── ResourceExhausted(SlabFull)
└── BackendError(InfraError)
```

CompleteError (error.rs)

Complete records a final cursor position and shares nearly all variants with CheckpointError. Excludes SplitInvalid. Has From<SlabFull>.

```
CompleteError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── StaleFence { presented: FenceEpoch, current: FenceEpoch }
├── LeaseExpired { deadline: LogicalTime, now: LogicalTime }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
├── OpIdConflict { op_id: OpId, expected_hash: u64, actual_hash: u64 }
├── CursorRegression { old_key: Option<usize>, new_key: Option<usize> }
├── CursorOutOfBounds(CursorOutOfBoundsDetail)
├── CursorKeyTooLarge { size: usize, max: usize }
├── CursorTokenTooLarge { size: usize, max: usize }
├── CheckpointMissingKey
├── ResourceExhausted(SlabFull)
└── BackendError(InfraError)
```

ParkError (error.rs)

Park transitions a shard to the Parked terminal state. Carries OpIdConflict but excludes all cursor variants, CheckpointMissingKey, and SplitInvalid.

```
ParkError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── StaleFence { presented: FenceEpoch, current: FenceEpoch }
├── LeaseExpired { deadline: LogicalTime, now: LogicalTime }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
├── OpIdConflict { op_id: OpId, expected_hash: u64, actual_hash: u64 }
└── BackendError(InfraError)
```

SplitError (error.rs)

Used by both split operations. Carries OpIdConflict and SplitInvalid but excludes cursor variants and CheckpointMissingKey. Has From<SlabFull>.

```
SplitError (#[non_exhaustive])
├── ShardNotFound { shard: ShardKey }
├── TenantMismatch { expected: TenantId }
├── StaleFence { presented: FenceEpoch, current: FenceEpoch }
├── LeaseExpired { deadline: LogicalTime, now: LogicalTime }
├── ShardTerminal { shard: ShardKey, status: ShardStatus }
├── OpIdConflict { op_id: OpId, expected_hash: u64, actual_hash: u64 }
├── SplitInvalid(SplitValidationError)
├── ResourceExhausted(SlabFull)
└── BackendError(InfraError)
```

Type aliases (error.rs):

  • SplitReplaceError = SplitError
  • SplitResidualError = SplitError

2.3 Variant routing matrix

Each operation error type accepts a subset of CoordError variants via its From<CoordError> impl. Rejected variants panic at unreachable!().

| CoordError variant | Renew | Checkpoint | Complete | Park | Split |
| --- | --- | --- | --- | --- | --- |
| ShardNotFound | yes | yes | yes | yes | yes |
| TenantMismatch | yes | yes | yes | yes | yes |
| StaleFence | yes | yes | yes | yes | yes |
| LeaseExpired | yes | yes | yes | yes | yes |
| ShardTerminal | yes | yes | yes | yes | yes |
| OpIdConflict | -- | yes | yes | yes | yes |
| CursorRegression | -- | yes | yes | -- | -- |
| CursorOutOfBounds | -- | yes | yes | -- | -- |
| CursorKeyTooLarge | -- | yes | yes | -- | -- |
| CursorTokenTooLarge | -- | yes | yes | -- | -- |
| SplitInvalid | -- | -- | -- | -- | yes |
| CheckpointMissingKey | -- | yes | yes | -- | -- |

AcquireError is excluded from the matrix because it defines its own variants directly and has no From<CoordError> impl.

ResourceExhausted(SlabFull) is not routed through CoordError -- it originates from From<SlabFull> impls on CheckpointError, CompleteError, and SplitError.

2.4 Run-level error types

Run-level operations (run_errors.rs) use per-operation error types where variants differ meaningfully, rather than the From<CoordError> narrowing pattern. The only shared conversion across run-level types is From<RunOpIdConflict>.

CreateRunError (run_errors.rs)

```
CreateRunError (#[non_exhaustive])
├── InvalidConfig(RunConfigError)
├── RunAlreadyExists { run: RunId }
├── RegisterShardsFailed(RegisterShardsError)
├── GetRunFailed(GetRunError)
├── ConfigMismatch { run: RunId }
└── BackendError(InfraError)
```

RegisterShardsError (run_errors.rs)

```
RegisterShardsError (#[non_exhaustive])
├── RunNotFound
├── TenantMismatch { expected: TenantId }
├── WrongStatus { status: RunStatus }
├── ManifestInvalid(ManifestValidationError)
├── OpIdConflict(RunOpIdConflict)
├── ShardLimitExceeded { current, additional, max, scope: ShardLimitScope }
├── ResourceExhausted { resource: &'static str }
└── BackendError(InfraError)
```

GetRunError (run_errors.rs)

```
GetRunError (#[non_exhaustive])
├── RunNotFound
├── TenantMismatch { expected: TenantId }
└── BackendError(InfraError)
```

RunTransitionError (run_errors.rs)

Shared by complete_run, fail_run, and cancel_run.

```
RunTransitionError (#[non_exhaustive])
├── RunNotFound
├── TenantMismatch { expected: TenantId }
├── RunTerminal { status: RunStatus }
├── WrongStatus { status: RunStatus, target: RunStatus }
├── OpIdConflict(RunOpIdConflict)
└── BackendError(InfraError)
```

UnparkError (run_errors.rs)

```
UnparkError (#[non_exhaustive])
├── ShardNotFound
├── TenantMismatch { expected: TenantId }
├── RunTerminal { status: RunStatus }
├── NotParked { status: ShardStatus }
├── OpIdConflict(RunOpIdConflict)
└── BackendError(InfraError)
```

2.5 ClaimError (facade.rs)

ClaimError is intentionally coarser than AcquireError: transient race conditions are absorbed by the claim retry loop and surface as NoneAvailable only when every candidate has been exhausted.

```
ClaimError (#[non_exhaustive])
├── NoneAvailable { earliest_deadline: Option<LogicalTime> }
├── RunNotFound
├── TenantMismatch { expected: TenantId }
├── Throttled { retry_after: LogicalTime }
└── BackendError(InfraError)
```

Has From<GetRunError> mapping RunNotFound and TenantMismatch 1:1.


3. Validation Pipeline

Every lease-gated mutation (checkpoint, complete, park, split) follows the same three-step validation pipeline. The ordering is deliberate and safety-critical.

Step 1: Idempotency check -- check_op_idempotency (validation.rs)

Called first so that replays succeed even after the lease expires or the shard reaches a terminal state.

Three-way return:

| Condition | Return | Caller action |
| --- | --- | --- |
| op_id not in log | Ok(None) | Proceed with fresh execution |
| op_id found, hash matches | Ok(Some(entry)) | Return cached result (replay) |
| op_id found, hash differs | Err(OpIdConflict) | Reject -- caller reused an OpId with different parameters |

Preconditions: payload_hash != 0 (asserted at runtime -- zero indicates broken hashing).

Op-log capacity: The op-log is bounded to 16 entries. If an OpId has been evicted, this function returns Ok(None) and the caller treats it as a new operation. This is safe because:

  • Terminal operations (complete, park, split_replace): the shard is terminal after first execution, so no further ops push entries and the terminal op's entry is never evicted.
  • Non-terminal operations (checkpoint, split_residual): eviction requires 16+ intervening ops within the same lease epoch. The fence epoch is the primary deduplication guard.
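The three-way return above can be sketched as a reverse scan over the bounded log. The types here are simplified stand-ins for the real OpLogEntry and CoordError:

```rust
// Illustrative sketch of the three-way idempotency lookup described above.
// OpLogEntry fields and the error type are simplified stand-ins.
#[derive(Clone, Debug, PartialEq)]
pub struct OpLogEntry {
    pub op_id: u64,
    pub payload_hash: u64,
}

#[derive(Debug, PartialEq)]
pub enum IdempotencyError {
    OpIdConflict { op_id: u64 },
}

/// Reverse scan (most-recent first) over the bounded per-shard op-log.
pub fn check_op_idempotency(
    log: &[OpLogEntry],
    op_id: u64,
    payload_hash: u64,
) -> Result<Option<&OpLogEntry>, IdempotencyError> {
    // Precondition from the doc: a zero hash indicates broken hashing.
    assert!(payload_hash != 0, "zero payload hash indicates broken hashing");
    match log.iter().rev().find(|e| e.op_id == op_id) {
        // Not found: fresh operation (or an evicted entry).
        None => Ok(None),
        // Found with matching hash: idempotent replay, return cached entry.
        Some(e) if e.payload_hash == payload_hash => Ok(Some(e)),
        // Found with different hash: OpId reuse with different parameters.
        Some(_) => Err(IdempotencyError::OpIdConflict { op_id }),
    }
}
```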

Step 2: Lease validation -- validate_lease (validation.rs)

Five checks in security-priority order:

  1. Tenant isolation (validation.rs) -- record.tenant == tenant. Checked first so a mismatch never reveals whether the shard is terminal, has a stale fence, or has an expired lease.

  2. Terminal status (validation.rs) -- !record.status.is_terminal(). Fast rejection of dead shards before more expensive checks.

  3. Fence epoch (validation.rs) -- lease.fence() == record.fence_epoch. Zombie worker fencing (Kleppmann protocol). A mismatch means another worker has since acquired the shard.

  4. Lease expiry (validation.rs) -- now < record.lease_deadline(). If the fence check passes but the record has no lease holder (deadline is None), the record is in an inconsistent state and StaleFence is returned to force re-acquisition. The half-open interval means now >= deadline is expired.

  5. Owner divergence (validation.rs) -- record.lease_owner() == Some(lease.owner()). Defense-in-depth: catches identity mismatches when fence epochs agree (e.g., state reconstruction bugs).

Preconditions: now > LogicalTime::ZERO (asserted at runtime -- zero indicates a broken clock).

Postconditions (on Ok(())):

  • record.tenant == tenant
  • record.status is non-terminal
  • lease.fence() == record.fence_epoch
  • record.is_leased_at(now) is true
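A compact sketch of the five ordered checks, using simplified stand-in types (the real ShardRecord/Lease carry richer fields, and the error chosen for owner divergence here is an illustrative assumption):

```rust
// Stand-in types; the real CoordError carries detail fields.
#[derive(Debug, PartialEq)]
pub enum CoordError {
    TenantMismatch,
    ShardTerminal,
    StaleFence,
    LeaseExpired,
}

pub struct Record {
    pub tenant: u32,
    pub terminal: bool,
    pub fence_epoch: u64,
    pub lease: Option<(u64 /* owner */, u64 /* deadline */)>,
}

pub fn validate_lease(
    record: &Record,
    tenant: u32,
    lease_fence: u64,
    lease_owner: u64,
    now: u64,
) -> Result<(), CoordError> {
    // Precondition: a zero clock indicates a broken time source.
    assert!(now > 0, "zero clock");
    // 1. Tenant isolation first: a mismatch reveals nothing else.
    if record.tenant != tenant {
        return Err(CoordError::TenantMismatch);
    }
    // 2. Terminal status: fast rejection of dead shards.
    if record.terminal {
        return Err(CoordError::ShardTerminal);
    }
    // 3. Fence epoch: zombie worker fencing.
    if lease_fence != record.fence_epoch {
        return Err(CoordError::StaleFence);
    }
    // 4. Expiry (half-open: now >= deadline is expired). A fence match with
    //    no lease holder is inconsistent -> StaleFence forces re-acquisition.
    let (owner, deadline) = match record.lease {
        Some(h) => h,
        None => return Err(CoordError::StaleFence),
    };
    if now >= deadline {
        return Err(CoordError::LeaseExpired);
    }
    // 5. Owner divergence: defense-in-depth (error choice illustrative).
    if owner != lease_owner {
        return Err(CoordError::StaleFence);
    }
    Ok(())
}
```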

Step 3: Operation-specific validation

Varies by operation:

  • Checkpoint / Complete: validate_cursor_update_pooled (validation.rs)
  • Split: validate_split_coverage / validate_residual_split (from gossip_contracts::coordination::shard_spec)
  • Park: No additional validation beyond Steps 1-2
  • Renew: Step 1 is skipped (renew is not idempotent via OpId)

Cursor validation -- validate_cursor_update_pooled (validation.rs)

Operates entirely on borrowed slices to avoid materializing owned cursor/spec values on the checkpoint hot path. Checks five properties in order:

  1. Key presence (validation.rs) -- new_cursor.last_key() must be Some. A checkpoint with no key represents no progress.

  2. Key size (validation.rs) -- last_key.len() <= MAX_KEY_SIZE. Defense-in-depth against oversized keys.

  3. Token size (validation.rs) -- token.len() <= MAX_TOKEN_SIZE. Defense-in-depth before storing in the slab.

  4. Monotonicity (validation.rs) -- new_last_key >= old_last_key (lexicographic). Prevents cursor regression within a lease epoch.

  5. Bounds (validation.rs) -- last_key must fall within [spec_start, spec_end). Empty spec_start / spec_end are treated as unbounded on that side.

The ordering matters: a missing key is reported before monotonicity, and monotonicity is reported before bounds. This gives the caller the most actionable error first.
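The five checks can be sketched over borrowed slices; names, limits, and the flattened error enum are illustrative stand-ins:

```rust
#[derive(Debug, PartialEq)]
pub enum CursorError {
    MissingKey,
    KeyTooLarge,
    TokenTooLarge,
    Regression,
    OutOfBounds,
}

// Illustrative limits; the real constants live elsewhere.
const MAX_KEY_SIZE: usize = 1024;
const MAX_TOKEN_SIZE: usize = 4096;

pub fn validate_cursor_update(
    new_last_key: Option<&[u8]>,
    token: &[u8],
    old_last_key: Option<&[u8]>,
    spec_start: &[u8], // empty = unbounded below
    spec_end: &[u8],   // empty = unbounded above
) -> Result<(), CursorError> {
    // 1. Key presence: a checkpoint with no key represents no progress.
    let key = new_last_key.ok_or(CursorError::MissingKey)?;
    // 2-3. Size limits before any comparison.
    if key.len() > MAX_KEY_SIZE {
        return Err(CursorError::KeyTooLarge);
    }
    if token.len() > MAX_TOKEN_SIZE {
        return Err(CursorError::TokenTooLarge);
    }
    // 4. Lexicographic monotonicity within the lease epoch.
    if let Some(old) = old_last_key {
        if key < old {
            return Err(CursorError::Regression);
        }
    }
    // 5. Bounds: key must fall in [spec_start, spec_end).
    if !spec_start.is_empty() && key < spec_start {
        return Err(CursorError::OutOfBounds);
    }
    if !spec_end.is_empty() && key >= spec_end {
        return Err(CursorError::OutOfBounds);
    }
    Ok(())
}
```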


4. Lease Semantics

Lease structure (lease.rs)

Lease is the capability token returned by acquire_and_restore_into and required by every lease-gated mutation. It carries:

| Field | Type | Purpose |
| --- | --- | --- |
| tenant | TenantId | Tenant isolation scope |
| run | RunId | Run identity |
| shard | ShardId | Shard identity |
| owner | WorkerId | The worker holding this lease |
| fence | FenceEpoch | Monotonic fencing token |
| deadline | LogicalTime | Expiry time (half-open: now < deadline is active) |

Fields are private. The coordinator is the sole producer; workers receive leases as opaque tokens.

Construction panics: fence < FenceEpoch::INITIAL or deadline == LogicalTime::ZERO.

Renewal: set_deadline (lease.rs) extends the deadline without changing the fence epoch. Panics if the new deadline is before the current deadline (monotonicity violation). Equal deadlines are allowed (harmless no-op).

LeaseHolder (lease.rs)

Stored inside the ShardRecord. Bundles owner: WorkerId and deadline: LogicalTime into a single value so the record stores Option<LeaseHolder> instead of two separate Option fields -- making the "both-present-or-both-absent" invariant structurally impossible to violate.

Construction panics: deadline == LogicalTime::ZERO.

Fence epoch protocol

Every acquire_and_restore_into increments the shard's fence_epoch. Subsequent mutations must present a lease whose fence matches the record's current epoch. Stale epochs are rejected.

This implements the "fencing token protocol" from Kleppmann (2016): all writes carry the fence epoch, and the backend rejects stale epochs. The epoch is monotonically increasing -- it increments on every ownership transfer and on administrative unpark.

Lease expiry semantics

The coordinator uses a half-open interval:

  • now < deadline -- lease is active
  • now >= deadline -- lease is expired

At expiry, the coordinator rejects mutations and allows other workers to re-acquire the shard.
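As a tiny illustration of the half-open rule (stand-in function; the real check sits behind ShardRecord::is_leased_at and validate_lease):

```rust
// Half-open expiry: active strictly before the deadline.
fn lease_active(now: u64, deadline: u64) -> bool {
    now < deadline // now == deadline already counts as expired
}
```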


5. Idempotency Log

OpKind (lease.rs)

Operation kinds that participate in per-shard idempotency. #[repr(u8)] with stable discriminants persisted in the op-log.

| Variant | Discriminant | Status Change | Lease Required? |
| --- | --- | --- | --- |
| Checkpoint | 0 | none (cursor advances) | yes |
| Complete | 1 | Active -> Done | yes |
| Park | 2 | Active -> Parked | yes |
| SplitReplace | 3 | Active -> Split | yes |
| SplitResidual | 4 | none (parent stays Active) | yes |
| Unpark | 5 | Parked -> Active (admin) | no |

OpResult (lease.rs)

Stored result status for replay detection. #[repr(u8)].

| Variant | Discriminant | Meaning |
| --- | --- | --- |
| Completed | 0 | Operation executed successfully |
| Error | 1 | Validation failed, shard NOT mutated |
| Superseded | 2 | Valid but overtaken by a later mutation |

OpLogEntry (lease.rs)

A single entry in the bounded per-shard operation log.

| Field | Type | Purpose |
| --- | --- | --- |
| op_id | OpId | Unique idempotency key |
| kind | OpKind | Operation type |
| result | OpResult | Cached outcome |
| payload_hash | u64 | BLAKE3 fingerprint of operation parameters |
| executed_at | LogicalTime | Timestamp of first execution |

Construction panics: executed_at == LogicalTime::ZERO or payload_hash == 0.

Deduplication key: (OpId, payload_hash):

  • Same pair -> idempotent replay, return cached result
  • Same OpId, different hash -> conflict, reject
  • OpId not found -> new operation (or evicted)

Replay detection flow

  1. Look up OpId in the shard's op-log (reverse scan, most-recent first).
  2. Found, same hash -> return cached OpResult as IdempotentOutcome::Replayed.
  3. Found, different hash -> reject as OpIdConflict.
  4. Not found -> execute the mutation, record the entry, return IdempotentOutcome::Executed.

IdempotentOutcome (error.rs)

```rust
pub enum IdempotentOutcome<T> {
    Executed(T),   // first execution
    Replayed(T),   // retry from op-log
}
```

Methods: into_inner(), is_replay(), is_executed(), map(), as_ref().

Used by: checkpoint/complete/park_shard return IdempotentOutcome<()>, split_replace returns IdempotentOutcome<SplitReplaceResult>, split_residual returns IdempotentOutcome<SplitResidualResult>.
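A possible shape for two of the listed methods, shown on a local copy of the enum so the snippet stands alone (the real impls in error.rs may differ):

```rust
// Local copy of the enum so this sketch compiles standalone.
pub enum IdempotentOutcome<T> {
    Executed(T),
    Replayed(T),
}

impl<T> IdempotentOutcome<T> {
    /// Unwrap the value regardless of whether it was freshly executed.
    pub fn into_inner(self) -> T {
        match self {
            Self::Executed(v) | Self::Replayed(v) => v,
        }
    }

    /// True when the result came from the op-log rather than execution.
    pub fn is_replay(&self) -> bool {
        matches!(self, Self::Replayed(_))
    }
}
```

Callers that only care about the value call into_inner(); callers emitting metrics can branch on is_replay() first.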


6. Error Classification

Retryable errors (worker should back off and retry)

| Error | Condition | Retry guidance |
| --- | --- | --- |
| AlreadyLeased | Another worker holds the shard | Wait until lease_deadline |
| LeaseExpired | Worker's lease expired | Re-acquire the shard |
| ResourceExhausted(SlabFull) | Byte slab full | Retry after freeing slab space |
| ClaimError::NoneAvailable | All shards claimed or terminal | Wait until earliest_deadline |
| ClaimError::Throttled | Worker claim cooldown active | Wait until retry_after |

Fatal errors (worker must stop processing this shard)

| Error | Condition | Required action |
| --- | --- | --- |
| StaleFence | Zombie worker -- another worker acquired the shard | Stop immediately, re-acquire |
| ShardTerminal | Shard is Done/Split/Parked | Stop, shard is finished |
| OpIdConflict | Same OpId reused with different payload | Bug in caller -- investigate |
| CursorRegression | New cursor key < old cursor key | Bug in caller -- safety invariant |
| CursorOutOfBounds | Cursor outside shard key range | Bug in caller -- safety invariant |
| CursorKeyTooLarge | Key exceeds MAX_KEY_SIZE | Bug in caller or corruption |
| CursorTokenTooLarge | Token exceeds MAX_TOKEN_SIZE | Bug in caller or corruption |
| SplitInvalid | Split coverage validation failed | Bug in caller -- fix split plan |
| CheckpointMissingKey | Checkpoint without last_key | Bug in caller -- include key |

Escalation errors (always a bug or configuration error)

| Error | Condition | Required action |
| --- | --- | --- |
| TenantMismatch | Cross-tenant request | Always a logic bug -- investigate |
| ShardNotFound | Shard not in store | Index corruption or race with deletion |
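A worker-side dispatcher over these three classes might look like the following sketch; the flattened error enum and action names are illustrative stand-ins, not types from the crate:

```rust
// Stand-in union of the variants classified above (detail fields omitted).
#[derive(Debug)]
enum WorkerError {
    LeaseExpired,
    ResourceExhausted,
    StaleFence,
    ShardTerminal,
    OpIdConflict,
    CursorRegression,
    TenantMismatch,
    ShardNotFound,
}

#[derive(Debug, PartialEq)]
enum Action {
    RetryAfterBackoff,
    Reacquire,
    StopShard,
    Escalate,
}

fn classify(e: &WorkerError) -> Action {
    use WorkerError::*;
    match e {
        // Retryable: transient resource pressure.
        ResourceExhausted => Action::RetryAfterBackoff,
        // Expired lease: the shard may still be workable after re-acquire.
        LeaseExpired => Action::Reacquire,
        // Fatal for this shard: fencing, terminal state, or caller bugs.
        StaleFence | ShardTerminal | OpIdConflict | CursorRegression => Action::StopShard,
        // Always a bug or configuration error: escalate to operators.
        TenantMismatch | ShardNotFound => Action::Escalate,
    }
}
```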

7. Security Redaction

All error types implement custom Debug and Display to prevent information leakage.

| Error | Redacted fields | Reason |
| --- | --- | --- |
| OpIdConflict | expected_hash, actual_hash shown as <redacted> in Debug, omitted from Display | Prevents oracle attacks on payload hashing |
| AlreadyLeased | current_owner shown as <redacted> in both Debug and Display | Prevents worker identity leakage |
| CursorRegression | Raw key bytes replaced with Some([N bytes]) summaries | Prevents data exfiltration via error messages |
| CursorOutOfBounds | Raw key bytes replaced with [N bytes] summaries | Same as above |
| TenantMismatch | Only expected (caller's tenant) exposed; actual tenant deliberately omitted | Prevents cross-tenant enumeration |

Run-level OpIdConflict variants wrap RunOpIdConflict as a tuple variant (not exposing fields) to prevent external callers from extracting raw hash values via pattern matching.


8. AcquireScratch and FixedBuf

FixedBuf<N> (error.rs)

Fixed-capacity byte buffer with logical length tracking. Used for inline storage of small variable-length fields without heap allocation.

| Method | Purpose |
| --- | --- |
| new() | Create empty buffer (zeroed backing, length 0) |
| write(bytes, oversized_msg) | Write bytes; panics if bytes.len() > N |
| from_slice(bytes, oversized_msg) | Convenience constructor: create and write |
| read() | Borrow written portion &bytes[..len] |
| reset() | Set logical length to zero (no zeroing) |
| has_data() | Returns true if len > 0 |

Implements Clone, PartialEq, Eq, Debug (shows first 8 hex bytes).
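The method table above maps onto a small const-generic type. The following is a minimal stand-in consistent with the table, not the real implementation (from_slice and the Debug impl are omitted for brevity):

```rust
// Fixed-capacity byte buffer with logical length tracking; no heap use.
pub struct FixedBuf<const N: usize> {
    buf: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    /// Empty buffer: zeroed backing storage, logical length 0.
    pub fn new() -> Self {
        Self { buf: [0; N], len: 0 }
    }

    /// Copy `bytes` in; panics with `oversized_msg` if they exceed capacity.
    pub fn write(&mut self, bytes: &[u8], oversized_msg: &str) {
        assert!(bytes.len() <= N, "{}", oversized_msg);
        self.buf[..bytes.len()].copy_from_slice(bytes);
        self.len = bytes.len();
    }

    /// Borrow only the written portion.
    pub fn read(&self) -> &[u8] {
        &self.buf[..self.len]
    }

    /// Reset the logical length without zeroing the backing array.
    pub fn reset(&mut self) {
        self.len = 0;
    }

    pub fn has_data(&self) -> bool {
        self.len > 0
    }
}
```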

AcquireScratch (error.rs)

Caller-owned scratch buffer for allocation-free acquire snapshots. Pre-allocates fixed-capacity arrays for every variable-size field in a shard snapshot.

Internal fields:

| Field | Type | Purpose |
| --- | --- | --- |
| spec_start | FixedBuf<MAX_KEY_SIZE> | Shard spec key range start |
| spec_end | FixedBuf<MAX_KEY_SIZE> | Shard spec key range end |
| spec_metadata | FixedBuf<MAX_METADATA_SIZE> | Shard spec opaque metadata |
| cursor_last_key | FixedBuf<MAX_KEY_SIZE> | Cursor last_key |
| cursor_token | FixedBuf<MAX_TOKEN_SIZE> | Cursor opaque token |
| spawned | InlineVec<ShardId, MAX_SPAWNED_PER_SHARD> | Spawned child shard IDs |

Usage pattern:

```rust
let mut scratch = AcquireScratch::new();  // once per session
loop {
    let view = backend.acquire_and_restore_into(now, tenant, key, worker, &mut scratch)?;
    // use view.snapshot.spec(), view.snapshot.cursor(), etc.
    // scratch is overwritten on next call -- view is invalidated
}
```

Methods:

| Method | Purpose |
| --- | --- |
| new() | Create empty scratch |
| reset() | Reset all logical lengths |
| write_spec(start, end, metadata) | Copy spec bytes into scratch |
| write_cursor(last_key, token) | Copy cursor bytes into scratch |
| write_spawned_iter(iter) | Copy lineage IDs from iterator |
| view(status, cursor_semantics, parent) | Build ShardSnapshotView backed by scratch |

Borrow-checker enforcement: acquire_and_restore_into takes &'a mut AcquireScratch and returns AcquireResultView<'a>, preventing the caller from calling it again while the view is still live.

Related result types

ShardSnapshotView<'a> (error.rs): Borrowed shard state view backed by AcquireScratch. Fields: status, spec, cursor, cursor_semantics, parent, spawned.

AcquireResultView<'a> (error.rs): Contains lease: Lease, snapshot: ShardSnapshotView<'a>, capacity: CapacityHint.

CapacityHint (error.rs): Advisory capacity information piggybacked on acquire/renew results. Fields: available_count: u32, earliest_deadline: Option<LogicalTime>. Fail-open advisory metadata.

RenewResult (error.rs): Contains new_deadline: LogicalTime, capacity: CapacityHint.


9. Integration: Error Flow

Shard-level flow

```
  ┌──────────────────────┐
  │   Caller (facade,    │
  │   orchestrator)      │
  │                      │
  │  calls checkpoint()  │
  └──────────┬───────────┘
             │
  ┌──────────▼───────────┐
  │  check_op_idempotency│ ──▶ Ok(Some(entry)) → return Replayed
  │  (validation.rs)     │ ──▶ Err(OpIdConflict) → CoordError
  │                      │ ──▶ Ok(None) → continue
  └──────────┬───────────┘
             │
  ┌──────────▼────────────┐
  │  validate_lease       │ ──▶ Err(TenantMismatch|ShardTerminal|
  │  (validation.rs)      │      StaleFence|LeaseExpired) → CoordError
  │                       │ ──▶ Ok(()) → continue
  └──────────┬────────────┘
             │
  ┌──────────▼───────────────┐
  │  validate_cursor_update  │ ──▶ Err(CheckpointMissingKey|CursorKeyTooLarge|
  │  _pooled (validation.rs) │      CursorTokenTooLarge|CursorRegression|
  │                          │      CursorOutOfBounds) → CoordError
  │                          │ ──▶ Ok(()) → continue
  └──────────┬───────────────┘
             │
  ┌──────────▼────────────┐
  │  Apply mutation       │
  │  (in_memory.rs)       │ ──▶ SlabFull → CheckpointError::ResourceExhausted
  └──────────┬────────────┘
             │
  ┌──────────▼────────────┐
  │  ? operator triggers  │ ──▶ From<CoordError> for CheckpointError
  │  From conversion      │     (error.rs)
  └──────────┬────────────┘
             │
  ┌──────────▼────────────┐
  │  Return to caller     │ ──▶ Result<IdempotentOutcome<()>, CheckpointError>
  └───────────────────────┘
```

The ? operator in backend methods uses From<CoordError> conversions automatically. CoordError is the shared intermediate type that all validation functions return; it gets narrowed to the operation-specific error type at the backend boundary.
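A minimal sketch of that narrowing, with stand-in single-variant enums (the real impls enumerate every variant as described in section 2):

```rust
#[derive(Debug)]
enum CoordError {
    LeaseExpired,
}

#[derive(Debug)]
enum CheckpointError {
    LeaseExpired,
}

impl From<CoordError> for CheckpointError {
    fn from(e: CoordError) -> Self {
        match e {
            CoordError::LeaseExpired => CheckpointError::LeaseExpired,
        }
    }
}

// Stand-in validation step returning the shared intermediate type.
fn validate(lease_ok: bool) -> Result<(), CoordError> {
    if lease_ok { Ok(()) } else { Err(CoordError::LeaseExpired) }
}

// Backend boundary: `?` applies From<CoordError> for CheckpointError.
fn checkpoint(lease_ok: bool) -> Result<(), CheckpointError> {
    validate(lease_ok)?;
    Ok(())
}
```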

Claim-level flow

```
  ┌─────────────────────────┐
  │  ShardClaiming::        │
  │  claim_next_available   │
  └──────────┬──────────────┘
             │
  ┌──────────▼───────────────┐
  │  collect_claim_candidates│ ──▶ Err(GetRunError) → From<GetRunError>
  │  _into (RunManagement)   │     → ClaimError::RunNotFound|TenantMismatch
  └──────────┬───────────────┘
             │
  ┌──────────▼───────────────┐
  │  For each candidate:     │
  │  acquire_and_restore_into│ ──▶ Ok → return success
  │                          │ ──▶ Err(AlreadyLeased|ShardTerminal|
  │                          │      ShardNotFound) → skip, try next
  │                          │ ──▶ Err(TenantMismatch) → fail immediately
  └──────────┬───────────────┘
             │ all candidates exhausted
  ┌──────────▼────────────────┐
  │  ClaimError::NoneAvailable│
  │  { earliest_deadline }    │
  └───────────────────────────┘
```

Run-level flow

Run-level operations use per-operation error types directly. The shared From<RunOpIdConflict> conversion routes idempotency conflicts into each type's OpIdConflict variant. Run-level errors do not go through CoordError.


10. Source of Truth Table

Shard-level error types (error.rs)

| Type | Location | Variants |
| --- | --- | --- |
| CoordError | error.rs | 12 |
| CursorOutOfBoundsDetail | error.rs | struct (3 fields) |
| AcquireError | error.rs | 5 |
| RenewError | error.rs | 6 |
| CheckpointError | error.rs | 13 |
| CompleteError | error.rs | 13 |
| ParkError | error.rs | 7 |
| SplitError | error.rs | 9 |
| SplitReplaceError (alias) | error.rs | = SplitError |
| SplitResidualError (alias) | error.rs | = SplitError |

From impls (error.rs)

| Conversion | Location |
| --- | --- |
| From<CoordError> for CheckpointError | error.rs |
| From<CoordError> for CompleteError | error.rs |
| From<CoordError> for ParkError | error.rs |
| From<CoordError> for SplitError | error.rs |
| From<CoordError> for RenewError | error.rs |
| From<SlabFull> for CheckpointError | error.rs |
| From<SlabFull> for CompleteError | error.rs |
| From<SlabFull> for SplitError | error.rs |

Result and outcome types (error.rs)

| Type | Location | Purpose |
| --- | --- | --- |
| CapacityHint | error.rs | Advisory shard availability |
| ShardSnapshotView<'a> | error.rs | Borrowed shard state view |
| AcquireResultView<'a> | error.rs | Lease + snapshot + capacity |
| FixedBuf<N> | error.rs | Fixed-capacity byte buffer |
| AcquireScratch | error.rs | Caller-owned scratch buffer |
| RenewResult | error.rs | Renewal result |
| IdempotentOutcome<T> | error.rs | Executed vs. Replayed wrapper |

Validation functions (validation.rs)

| Function | Location | Returns |
| --- | --- | --- |
| validate_lease | validation.rs | Result<(), CoordError> |
| validate_cursor_update_pooled | validation.rs | Result<(), CoordError> |
| check_op_idempotency | validation.rs | Result<Option<&OpLogEntry>, CoordError> |

Lease types (lease.rs)

| Type | Location | Purpose |
| --- | --- | --- |
| LeaseHolder | lease.rs | Owner + deadline stored in ShardRecord |
| Lease | lease.rs | Capability token returned to workers |
| OpKind | lease.rs | Operation type discriminant (#[repr(u8)]) |
| OpResult | lease.rs | Cached result status (#[repr(u8)]) |
| OpLogEntry | lease.rs | Single op-log entry |

Run-level error types (run_errors.rs)

| Type | Location | Variants |
| --- | --- | --- |
| CreateRunError | run_errors.rs | 6 |
| RegisterShardsError | run_errors.rs | 8 |
| GetRunError | run_errors.rs | 3 |
| RunTransitionError | run_errors.rs | 6 |
| UnparkError | run_errors.rs | 6 |

Facade-level error (facade.rs)

| Type | Location | Variants |
| --- | --- | --- |
| ClaimError | facade.rs | 5 |

Split execution types (split_execution.rs)

| Type | Location | Purpose |
| --- | --- | --- |
| DerivedShardKind | split_execution.rs | Child vs. Residual discriminant |
| SplitReplaceResult | split_execution.rs | Child shard IDs from split-replace |
| SplitResidualResult | split_execution.rs | Residual shard ID from split-residual |

11. References

  • Kleppmann, M. "How to do distributed locking." 2016. https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
  • Gray, C. and Cheriton, D. "Leases: An Efficient Fault-Tolerant Mechanism for Distributed File Cache Consistency." SOSP 1989.
  • Stripe. "Designing robust and predictable APIs with idempotency." (Brandur Leach, 2017).
  • Cho, J. et al. "Breakwater: Overload Control for Microservices." OSDI 2020. (Credit piggybacking pattern used for CapacityHint.)