This guide explains how to use the deterministic scheduler simulation harness to test the executor's work-stealing policy.
The scheduler test harness validates the executor's work distribution policy, not scanning logic. It simulates multi-worker scheduling without OS threads, enabling reproducible bug discovery and systematic interleaving exploration.
What it tests:
- Local LIFO queue semantics (cache locality)
- Global FIFO injector behavior
- Work-stealing with randomized victim selection
- Gate/count correctness (no lost tasks after `join()`)
- Park/unpark semantics (no deadlocks while work exists)
- Resource accounting (bounded permits)
Why deterministic simulation:
- Reproducible bugs: Any failure produces an artifact that replays identically
- Systematic exploration: Enumerate interleavings that would require millions of runs to hit randomly
- Shared policy code: `executor_core.rs` runs identically in production and simulation
```bash
# Run corpus regression tests
cargo test --features scheduler-sim scheduler_sim_replay_corpus

# Run stress tests with random case generation
cargo test --features scheduler-sim scheduler_sim_stress_smoke

# Scale stress runs via environment variables
SCHEDULER_SIM_STRESS_SEEDS=100 cargo test --features scheduler-sim scheduler_sim_stress_smoke
```

Key paths:
- Corpus: `crates/scanner-engine-integration-tests/tests/simulation/corpus/*.json` - regression tests replayed on every run
- Failures: `crates/scanner-engine-integration-tests/tests/failures/*.json` - where stress failures are written
A ReproArtifact defines a complete simulation scenario in JSON. The harness loads the case, initializes the executor, and steps through actions until completion or failure.
```json
{
  "schema_version": 1,
  "seed": 42,
  "case": {
    "exec_cfg": { ... },
    "resources": [ ... ],
    "programs": [ ... ],
    "tasks": [ ... ],
    "initial_runnable": [ ... ],
    "external_events": [ ... ],
    "max_steps": 100
  },
  "driver_choices": [ ... ],
  "expected_trace_hash": 12345678901234567890,
  "failure": { ... }
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `workers` | `usize` | required | Number of simulated worker threads |
| `steal_tries` | `u32` | 4 | Steal attempts before parking |
| `seed` | `u64` | required | RNG seed for deterministic victim selection |
| `wake_on_hoard_threshold` | `u32` | 32 | Local spawns before waking a sibling worker |
```json
"exec_cfg": {
  "workers": 2,
  "steal_tries": 2,
  "seed": 178,
  "wake_on_hoard_threshold": 32
}
```

Models semaphores, byte budgets, or other bounded resources. Tasks can acquire and release units; the harness checks for overflow/underflow.
```json
"resources": [
  { "id": 0, "total": 4 },
  { "id": 1, "total": 1 }
]
```

Each program is a sequence of bytecode instructions. Multiple tasks can share a program (like a function definition).
```json
"programs": [
  {
    "name": "worker",
    "code": [
      { "Yield": { "placement": "Local" } },
      "Complete"
    ]
  }
]
```

Logical task states. Each task references a program and starts at a program counter (pc).
```json
"tasks": [
  { "tid": 0, "program": 0, "pc": 0 },
  { "tid": 1, "program": 1, "pc": 0 }
]
```

Task IDs to enqueue at simulation start. A task ID can appear multiple times to model multiple run-tokens.

```json
"initial_runnable": [0, 1, 0]
```

Events delivered by the driver at specific steps. These model external I/O completions and the `join()` call that closes the accepting gate.
```json
"external_events": [
  { "at_step": 5, "event": { "IoComplete": { "token": 7 } } },
  { "at_step": 10, "event": "CloseGateJoin" }
]
```

`max_steps` is a hard cap on simulation steps to prevent infinite runs. Reaching this limit without an invariant violation means the test passed.

Each driver choice is an index into the enabled action list at that step. An empty list means use the default policy (first enabled action).
```json
"driver_choices": [
  { "idx": 0 },
  { "idx": 1 }
]
```

`expected_trace_hash` is a 64-bit hash of the trace events. Replay must produce this exact hash, validating determinism.

`failure` records the outcome captured when the artifact was created. Replay currently asserts `expected_trace_hash`; this field is metadata.
```json
"failure": {
  "kind": "Timeout",
  "step": 11,
  "message": "corpus artifact"
}
```

`"Timeout"` with `"corpus artifact"` is the common convention for non-violation corpus cases. Some corpus artifacts intentionally use other kinds (for example `"Panic"`).
Each instruction executes atomically. After execution, the task either continues (new PC), blocks, or completes.
| Instruction | Scanner Equivalent | JSON Syntax |
|---|---|---|
| `Yield` | Async yield / reschedule self | `{ "Yield": { "placement": "Local" } }` |
| `Spawn` | `ctx.spawn_local()` / `spawn_global()` | `{ "Spawn": { "program": 1, "placement": "Local" } }` |
| `Sleep` | Timer-based wait / backoff | `{ "Sleep": { "ticks": 5 } }` |
| `WaitIo` | Waiting for network/disk I/O | `{ "WaitIo": { "token": 7 } }` |
| `TryAcquire` | ByteBudget / permit check | `{ "TryAcquire": { "res": 0, "units": 1, "ok": 2, "fail": 3 } }` |
| `Release` | Return budget / permit | `{ "Release": { "res": 0, "units": 1 } }` |
| `Jump` | Control flow (goto PC) | `{ "Jump": { "target": 0 } }` |
| `Complete` | Task finishes normally | `"Complete"` |
| `Panic` | Intentional crash (error path testing) | `"Panic"` |
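The atomic-step rule above can be sketched in Rust. This is an illustrative model, not the harness's actual types: the enum mirrors the JSON syntax in the table, and `StepOutcome` encodes the three fates of a task after one instruction (continue at a new PC, block, or complete). The `acquire_ok` parameter stands in for resource state, which a real interpreter would consult.

```rust
// Sketch of the instruction set and per-step semantics (names are
// illustrative assumptions, mirroring the JSON syntax above).
#[derive(Debug, Clone)]
enum Instr {
    Yield { placement: Placement },
    Spawn { program: usize, placement: Placement },
    Sleep { ticks: u64 },
    WaitIo { token: u64 },
    TryAcquire { res: usize, units: u64, ok: usize, fail: usize },
    Release { res: usize, units: u64 },
    Jump { target: usize },
    Complete,
    Panic,
}

#[derive(Debug, Clone, Copy)]
enum Placement { Local, Global, External }

/// What happens to a task after one atomic instruction step.
#[derive(Debug)]
enum StepOutcome {
    Continue { next_pc: usize }, // keeps running at the new program counter
    Yielded { next_pc: usize },  // re-enqueued as a run-token; resumes at PC+1
    Blocked,                     // parked until an event or time advance wakes it
    Done,                        // Complete: no more instructions execute
}

/// `acquire_ok` stands in for resource availability, which the real
/// interpreter would check against the resource table.
fn step(pc: usize, instr: &Instr, acquire_ok: bool) -> StepOutcome {
    match instr {
        Instr::Yield { .. } => StepOutcome::Yielded { next_pc: pc + 1 },
        Instr::Spawn { .. } | Instr::Release { .. } => StepOutcome::Continue { next_pc: pc + 1 },
        Instr::Jump { target } => StepOutcome::Continue { next_pc: *target },
        Instr::Sleep { .. } | Instr::WaitIo { .. } => StepOutcome::Blocked,
        Instr::TryAcquire { ok, fail, .. } => {
            StepOutcome::Continue { next_pc: if acquire_ok { *ok } else { *fail } }
        }
        Instr::Complete => StepOutcome::Done,
        Instr::Panic => panic!("intentional test panic"),
    }
}

fn main() {
    let prog = [Instr::Yield { placement: Placement::Local }, Instr::Complete];
    assert!(matches!(step(0, &prog[0], true), StepOutcome::Yielded { next_pc: 1 }));
    assert!(matches!(step(1, &prog[1], true), StepOutcome::Done));
}
```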
Used by Yield and Spawn to control where run-tokens are enqueued:

| Placement | Queue | Semantics |
|---|---|---|
| `Local` | Worker's local LIFO queue | Cache locality, same worker likely executes |
| `Global` | Global FIFO injector | Fairness, any worker can pick up |
| `External` | Global injector (gated) | Respects accepting gate; fails after `join()` |
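A minimal sketch of this routing, assuming the documented VecDeque-based queue models (the `Sim` struct and field names are illustrative, not the harness's API). The only interesting branch is `External`, which consults the accepting gate:

```rust
use std::collections::VecDeque;

#[derive(Clone, Copy)]
enum Placement { Local, Global, External }

// Illustrative simulation state: one local deque per worker, a global
// FIFO injector, and the accepting gate closed by join().
struct Sim {
    local: Vec<VecDeque<u32>>, // run-token ids, one LIFO deque per worker
    injector: VecDeque<u32>,   // global FIFO injector
    accepting: bool,           // gate state
}

impl Sim {
    /// Route a run-token by placement. Returns false if an External
    /// enqueue is rejected because the gate has closed.
    fn enqueue(&mut self, worker: usize, token: u32, p: Placement) -> bool {
        match p {
            Placement::Local => { self.local[worker].push_back(token); true }
            Placement::Global => { self.injector.push_back(token); true }
            Placement::External if self.accepting => { self.injector.push_back(token); true }
            Placement::External => false, // fails after join() closes the gate
        }
    }
}

fn main() {
    let mut sim = Sim { local: vec![VecDeque::new(); 2], injector: VecDeque::new(), accepting: true };
    assert!(sim.enqueue(0, 1, Placement::Local));
    assert!(sim.enqueue(0, 2, Placement::External)); // gate open: accepted
    sim.accepting = false;                           // join() closes the gate
    assert!(!sim.enqueue(0, 3, Placement::External)); // rejected
}
```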
Yield: Re-enqueues the current task as a new run-token. The task continues at PC+1 when next executed.

```json
{ "Yield": { "placement": "Local" } }
```

Spawn: Creates a new task from the specified program and enqueues it. Parent continues at PC+1.

```json
{ "Spawn": { "program": 1, "placement": "Global" } }
```

Sleep: Blocks until `ticks` time units pass. The driver's AdvanceTimeTo action wakes sleepers.

```json
{ "Sleep": { "ticks": 3 } }
```

WaitIo: Blocks until the matching IoComplete event is delivered.

```json
{ "WaitIo": { "token": 42 } }
```

TryAcquire: Attempts to acquire `units` from resource `res`. Jumps to `ok` on success, `fail` on insufficient availability.

```json
{
  "TryAcquire": {
    "res": 0,
    "units": 1,
    "ok": 1,
    "fail": 3
  }
}
```

Release: Returns `units` to resource `res`. Panics on over-release.

```json
{ "Release": { "res": 0, "units": 1 } }
```

Jump: Unconditional jump to `target` PC.

```json
{ "Jump": { "target": 0 } }
```

Complete: Task finishes. No more instructions execute.

```json
"Complete"
```

Panic: Triggers a panic for testing error handling paths.

```json
"Panic"
```

Tests that a single task can yield and complete.
```json
{
  "exec_cfg": { "workers": 1, "steal_tries": 2, "seed": 161, "wake_on_hoard_threshold": 32 },
  "resources": [],
  "programs": [
    {
      "name": "basic",
      "code": [
        { "Yield": { "placement": "Local" } },
        "Complete"
      ]
    }
  ],
  "tasks": [{ "tid": 0, "program": 0, "pc": 0 }],
  "initial_runnable": [0],
  "external_events": [{ "at_step": 3, "event": "CloseGateJoin" }],
  "max_steps": 50
}
```

Tests that tasks spawned locally can be stolen by idle workers.
```json
{
  "exec_cfg": { "workers": 2, "steal_tries": 2, "seed": 178, "wake_on_hoard_threshold": 32 },
  "resources": [],
  "programs": [
    {
      "name": "root",
      "code": [
        { "Spawn": { "program": 1, "placement": "Local" } },
        "Complete"
      ]
    },
    {
      "name": "child",
      "code": ["Complete"]
    }
  ],
  "tasks": [{ "tid": 0, "program": 0, "pc": 0 }],
  "initial_runnable": [0],
  "external_events": [],
  "max_steps": 20
}
```

Worker 0 spawns a child locally. Worker 1 can steal the child from worker 0's queue (FIFO steal = oldest first).
Tests that tasks block on I/O until completion events arrive, and that sleepers wake after time advances.
```json
{
  "exec_cfg": { "workers": 1, "steal_tries": 2, "seed": 195, "wake_on_hoard_threshold": 32 },
  "resources": [],
  "programs": [
    {
      "name": "io_sleep",
      "code": [
        { "WaitIo": { "token": 7 } },
        { "Sleep": { "ticks": 2 } },
        "Complete"
      ]
    }
  ],
  "tasks": [{ "tid": 0, "program": 0, "pc": 0 }],
  "initial_runnable": [0],
  "external_events": [
    { "at_step": 1, "event": { "IoComplete": { "token": 7 } } },
    { "at_step": 5, "event": "CloseGateJoin" }
  ],
  "max_steps": 40
}
```

Tests that bounded resources are correctly accounted and that TryAcquire branches correctly.
```json
{
  "exec_cfg": { "workers": 1, "steal_tries": 2, "seed": 212, "wake_on_hoard_threshold": 32 },
  "resources": [{ "id": 0, "total": 1 }],
  "programs": [
    {
      "name": "resource_release",
      "code": [
        { "TryAcquire": { "res": 0, "units": 1, "ok": 1, "fail": 3 } },
        { "Release": { "res": 0, "units": 1 } },
        "Complete",
        "Complete"
      ]
    },
    {
      "name": "resource_fail",
      "code": [
        { "TryAcquire": { "res": 0, "units": 1, "ok": 1, "fail": 2 } },
        "Complete",
        "Complete"
      ]
    }
  ],
  "tasks": [
    { "tid": 0, "program": 0, "pc": 0 },
    { "tid": 1, "program": 1, "pc": 0 }
  ],
  "initial_runnable": [0, 1, 0],
  "external_events": [],
  "max_steps": 20
}
```

With only 1 unit available, one task acquires while the other fails and takes the fail branch.
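The resource accounting this case exercises can be sketched as a small model. This is an illustrative type, not the harness's implementation, assuming the documented semantics: TryAcquire succeeds only when enough units remain, and Release panics on over-release:

```rust
// Bounded-resource model with overflow/underflow checks (illustrative).
struct Resource {
    total: u64,     // capacity, from the "total" field in the JSON
    available: u64, // units currently free
}

impl Resource {
    fn new(total: u64) -> Self { Self { total, available: total } }

    /// TryAcquire semantics: succeed only if enough units are available.
    fn try_acquire(&mut self, units: u64) -> bool {
        if self.available >= units { self.available -= units; true } else { false }
    }

    /// Release semantics: panics when more units are released than acquired.
    fn release(&mut self, units: u64) {
        self.available += units;
        assert!(self.available <= self.total, "over-release: available exceeds total");
    }
}

fn main() {
    let mut budget = Resource::new(1); // matches the corpus case: total = 1
    assert!(budget.try_acquire(1));    // first task takes the ok branch
    assert!(!budget.try_acquire(1));   // second task takes the fail branch
    budget.release(1);
    assert!(budget.try_acquire(1));    // released unit is available again
}
```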
- After finding a concurrency bug: Create a regression test from the failure artifact
- Before major executor changes: Add stress coverage for the changed behavior
- When adding new executor features: Write specification tests that exercise the feature
- Create a JSON file in `crates/scanner-engine-integration-tests/tests/simulation/corpus/` following the schema above
- Run `cargo test --features scheduler-sim scheduler_sim_replay_corpus`
- The test will fail with `trace hash mismatch` and print the actual hash
- Copy the printed hash into your JSON as `expected_trace_hash`
- Re-run to verify the test passes
Example workflow:
```bash
# Create your test case
vim crates/scanner-engine-integration-tests/tests/simulation/corpus/my_test.json

# Run to get the trace hash (will fail first time)
cargo test --features scheduler-sim scheduler_sim_replay_corpus

# Update expected_trace_hash with the printed value
# Re-run to verify
cargo test --features scheduler-sim scheduler_sim_replay_corpus
```

When stress tests find a violation, they write an artifact to `crates/scanner-engine-integration-tests/tests/failures/`:
```json
{
  "failure": {
    "kind": { "Violation": "RunnableStarvation" },
    "step": 47,
    "message": "task 3 runnable since step 12 (backlog 5)"
  }
}
```

Key fields:
- `failure.kind`: The violation type (see Invariants below)
- `failure.step`: Step number where the violation occurred
- `driver_choices`: Exact interleaving to reproduce the bug
- `expected_trace_hash`: Hash for determinism verification
```bash
# Default: 16 seeds, small bounds, fast
cargo test --features scheduler-sim scheduler_sim_stress_smoke

# Scale up for deeper testing
SCHEDULER_SIM_STRESS_SEEDS=1000 \
SCHEDULER_SIM_STRESS_MAX_STEPS=200 \
cargo test --features scheduler-sim scheduler_sim_stress_smoke

# Reproduce a specific failure
SCHEDULER_SIM_STRESS_SEEDS=1 \
SCHEDULER_SIM_STRESS_SEED_BASE=3735928559 \
cargo test --features scheduler-sim scheduler_sim_stress_smoke
```

Environment variables:
| Variable | Default | Description |
|---|---|---|
| `SCHEDULER_SIM_STRESS_SEEDS` | 16 | Number of random cases to generate |
| `SCHEDULER_SIM_STRESS_MAX_STEPS` | 80 | Maximum steps per case |
| `SCHEDULER_SIM_STRESS_MAX_PROGRAMS` | 4 | Maximum programs per case |
| `SCHEDULER_SIM_STRESS_MAX_TASKS` | 4 | Maximum initial tasks per case |
| `SCHEDULER_SIM_STRESS_SEED_BASE` | 3490561775 (`0xD00D_BEEF`) | Base seed for RNG (env value is parsed as decimal) |
The harness validates these invariants on every step:
| Invariant | Violation Kind | Meaning |
|---|---|---|
| Resource bounds | `ResourceOverflow` | Released more units than acquired |
| Gate consistency | `GateViolation` | Gate closed with zero in-flight but done not set |
| Bounded starvation | `RunnableStarvation` | Runnable task not executed within bound |
Other ViolationKind variants exist in the enum (InFlightMismatch, LostWakeup, IllegalUnblock, etc.), but they are not emitted by the current oracle checks.
The starvation check uses a dynamic bound: `K = workers * (steal_tries + 2) * 4 + backlog`. A runnable task must execute within K steps of becoming runnable, where `backlog` is the in-flight count when it became runnable. This accounts for legitimate queuing delays in busy systems.
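The bound is cheap to compute; a direct transcription of the stated formula (the function name is illustrative):

```rust
// Dynamic starvation bound: K = workers * (steal_tries + 2) * 4 + backlog.
fn starvation_bound(workers: u32, steal_tries: u32, backlog: u32) -> u32 {
    workers * (steal_tries + 2) * 4 + backlog
}

fn main() {
    // Example: workers = 2, steal_tries = 2 (the exec_cfg used throughout
    // this guide), with a backlog of 5 in-flight tasks:
    // 2 * (2 + 2) * 4 + 5 = 37
    let k = starvation_bound(2, 2, 5);
    assert_eq!(k, 37);
    println!("runnable task must execute within {k} steps");
}
```

Note that an idle system (backlog 0) still gets `workers * (steal_tries + 2) * 4` steps of slack, so transient scheduling delay alone never trips the check.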
Q: Do I need to write tests for my scanning logic?
A: No. This harness tests the executor (work distribution), not scanning. Use regular unit tests for scanning logic, rule validation, and transform correctness.
Q: How do I model waiting for a file read?
A: Use WaitIo { token: N } in the program, then schedule IoComplete { token: N } in external_events. The task blocks until the event is delivered.
Q: Why does the corpus use "Timeout" as the failure kind?
A: In corpus artifacts, "Timeout" is the common metadata convention for a non-violation case ("corpus artifact"). Replay correctness is enforced by expected_trace_hash.
Q: How do I debug a failing stress test?
A:
- Find the artifact in `crates/scanner-engine-integration-tests/tests/failures/`
- Set `SCHEDULER_SIM_STRESS_SEEDS=1` and `SCHEDULER_SIM_STRESS_SEED_BASE=<seed-as-decimal>` to reproduce
- Add the artifact to `crates/scanner-engine-integration-tests/tests/simulation/corpus/` to make it a permanent regression test
Q: Can I test the actual crossbeam queues?
A: No. The simulation uses deterministic VecDeque models. The production crossbeam queues require real parallelism to test (consider loom for that). The harness tests the scheduling policy, not the queue implementation.
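The deque model's asymmetry is the interesting part: the owner pops LIFO for cache locality, while thieves steal FIFO (oldest first, as in the work-stealing example above). A sketch under those assumptions (`LocalQueue` is illustrative, not the harness's type):

```rust
use std::collections::VecDeque;

// Deterministic model of a worker's local queue: LIFO for the owner,
// FIFO for thieves (illustrative, mirroring the documented semantics).
struct LocalQueue {
    tokens: VecDeque<u32>, // run-token ids, oldest at the front
}

impl LocalQueue {
    fn new() -> Self { Self { tokens: VecDeque::new() } }

    fn push(&mut self, token: u32) { self.tokens.push_back(token); }

    /// Owner takes the newest token (LIFO: cache locality).
    fn pop_local(&mut self) -> Option<u32> { self.tokens.pop_back() }

    /// A thief takes the oldest token (FIFO steal).
    fn steal(&mut self) -> Option<u32> { self.tokens.pop_front() }
}

fn main() {
    let mut q = LocalQueue::new();
    for t in [1, 2, 3] { q.push(t); }
    assert_eq!(q.steal(), Some(1));     // thief gets the oldest token
    assert_eq!(q.pop_local(), Some(3)); // owner gets the newest token
}
```

Because every operation is an ordinary method call on a `VecDeque`, the driver fully controls interleaving; there are no atomics or races to model.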
Q: How do driver_choices work?
A: At each step, the harness computes a list of enabled actions in a stable order. The driver_choices[i].idx selects the action at that index. Empty choices means use the default (index 0). This enables exact replay of any execution.
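That selection step can be sketched as follows (the `Action` variants and function name are illustrative assumptions; the point is the index-into-stable-order mechanism):

```rust
// Illustrative driver-choice selection: the enabled action list is computed
// in a stable order each step, and a recorded choice indexes into it.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    RunLocal(usize),  // worker runs from its local queue
    StealFrom(usize), // worker steals from a victim
    Park(usize),      // worker parks
}

/// A missing choice falls back to index 0 (the default policy).
/// Returns None if the recorded index is out of range for this step.
fn select_action(enabled: &[Action], choice: Option<usize>) -> Option<Action> {
    let idx = choice.unwrap_or(0);
    enabled.get(idx).cloned()
}

fn main() {
    let enabled = vec![Action::RunLocal(0), Action::StealFrom(1), Action::Park(1)];
    // No recorded choice: default policy picks the first enabled action.
    assert_eq!(select_action(&enabled, None), Some(Action::RunLocal(0)));
    // Recorded choice { "idx": 1 } forces the second enabled action.
    assert_eq!(select_action(&enabled, Some(1)), Some(Action::StealFrom(1)));
}
```

Since the enabled list is deterministic given the same state, replaying the same choice indices reproduces the same execution, which is what makes the artifact replay exact.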