This document inventories every testing approach in the repository, maps overlap between encoding/mutation and infrastructure, records keep/merge/migrate decisions for each component, defines a deterministic contract for a shared mutation core, and lays out a staged rollout plan. It serves as the initial design deliverable for the Counterexample Testing Unification effort.
| # | Area | Location | Feature Gate | Tests What |
|---|---|---|---|---|
| 1 | Property tests | `crates/scanner-engine-integration-tests/tests/property/` (20 files) | `property-tests` | Mathematical invariants (soundness, determinism, roundtrip) |
| 2 | Simulation tests | `crates/scanner-engine-integration-tests/tests/simulation/` (15 .rs files) | `sim-harness` etc. | System-level behavior (scheduling, chunking, faults) |
| 3 | Corpus replay: scanner | `crates/scanner-engine-integration-tests/tests/corpus/scanner/` (71 .case.json) | `sim-harness` | Deterministic regression replay |
| 4 | Corpus replay: git | `crates/scanner-engine-integration-tests/tests/corpus/git_scan/` (11 .case.json) | `sim-harness` | Git deterministic regression replay |
| 5 | Scanner sim module | `crates/scanner-scheduler/src/sim_scanner/` (7 files) | `sim-harness` | Scenario generation, runner, oracles |
| 6 | Git sim module | `crates/scanner-git/src/sim_git_scan/` (18 files) | `sim-harness` | Git repo model generation, stage pipeline |
| 7 | Shared sim infra | `crates/scanner-scheduler/src/sim/` (9 files + `mutation/` subdir) | `sim-harness` | RNG, fault injection, minimization, executor |
| 8 | Offline validators | `crates/scanner-engine/src/engine/offline_validate.rs` | None | Structural token validation |
| 9 | YAML unit tests | `crates/scanner-engine/src/rules/yaml_unit_tests.rs` | None | Rule parsing/scanning roundtrip |
| 10 | Integration tests | `crates/scanner-engine-integration-tests/tests/integration/` (23 scenario files + `main.rs` harness) | `integration-tests` | Handcrafted regression tests |
| 11 | Fuzz targets | Per-crate `fuzz/fuzz_targets/` (29 targets across 5 crates) | Nightly | Coverage-guided mutation |
| 12 | Real-rules harness | `crates/scanner-engine-integration-tests/tests/simulation/scanner_real_rules.rs` | `real-rules-harness` | Golden baseline comparison |
| 13 | Smoke tests | `crates/scanner-engine-integration-tests/tests/smoke/` (1 file) | `smoke-tests` | End-to-end sanity |
| 14 | Diagnostic tests | `crates/scanner-engine-integration-tests/tests/diagnostic/` (3 files) | `diagnostic-tests` | Allocation and runtime diagnostics |
Twenty files in crates/scanner-engine-integration-tests/tests/property/ exercise component-level mathematical
invariants using proptest. Each file targets a specific subsystem: anchor
soundness (regex2anchor_soundness.rs), entropy thresholds
(entropy_threshold_soundness.rs), binary classification
(binary_classification.rs), git pack delta application (git_pack_delta.rs),
path policy (path_policy_soundness.rs), and others. Gated behind
property-tests and stdx-proptest. Run as cargo test --features property-tests,stdx-proptest --test property. No mutation/encoding logic —
these verify mathematical properties of pure functions.
Fifteen test files in crates/scanner-engine-integration-tests/tests/simulation/ drive the scanner and git simulation
harnesses. Scanner variants include random stress (scanner_random.rs), corpus
replay (scanner_corpus.rs), archive-specific random and corpus tests, budget
invariance, discovery, and max-file-size. Git variants include random stress
(git_scan_random.rs), corpus replay (git_scan_corpus.rs), and shallow
limits. A scheduler simulation (scheduler_sim.rs) tests scheduling logic in
isolation. All simulation tests use deterministic seeded RNG and are gated
behind sim-harness.
Seventy-one scanner .case.json files and eleven git-scan .case.json files
in crates/scanner-engine-integration-tests/tests/corpus/. Each case is a serialized scenario + fault plan + schedule
seed that deterministically replays in milliseconds. New failures discovered by
random simulation are minimized and added here. This is the fastest regression
gate.
Seven source files in crates/scanner-scheduler/src/sim_scanner/. The generator (generator.rs)
builds deterministic in-memory filesystems with embedded secrets. The
runner (runner.rs) implements the chunked-scanning event loop
with seven oracles: termination, monotonic progress, overlap dedup, duplicate
suppression, in-flight budget, ground-truth, and differential. Scenario types,
replay, and virtual path tables round out the module.
Eighteen source files in crates/scanner-git/src/sim_git_scan/. The generator builds synthetic
git repository models (commit graphs, tree entries, pack files). The runner
(runner.rs) executes a five-stage pipeline (RepoOpen → CommitWalk → TreeDiff → PackExec → Finalize) and validates output shape
(sorted/disjoint OID sets) and stability across schedule seeds. Additional
modules handle pack byte serialization, commit-graph construction, fault
injection, and persistence.
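The output-shape check on OID sets can be pictured as two small predicates. The following is a minimal sketch, assuming OIDs are 20-byte arrays held in sorted slices; `Oid`, `is_strictly_sorted`, `are_disjoint`, and `oid` are hypothetical names for illustration and are not taken from the git sim runner.

```rust
use std::cmp::Ordering;

/// Hypothetical 20-byte object ID; the real type in the git sim may differ.
type Oid = [u8; 20];

/// True when the slice is strictly ascending, which also rules out duplicates.
fn is_strictly_sorted(oids: &[Oid]) -> bool {
    oids.windows(2).all(|w| w[0] < w[1])
}

/// True when two strictly sorted slices share no element (linear merge walk).
fn are_disjoint(a: &[Oid], b: &[Oid]) -> bool {
    let (mut i, mut j) = (0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => return false,
        }
    }
    true
}

/// Build an illustrative OID whose first byte is `tag`; the rest are zero.
fn oid(tag: u8) -> Oid {
    let mut o = [0u8; 20];
    o[0] = tag;
    o
}
```

Because both inputs are sorted, the disjointness check is a single linear pass and needs no hash set, which keeps the check itself free of iteration-order nondeterminism.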
Nine files plus a mutation/ subdirectory in crates/scanner-scheduler/src/sim/: rng.rs (deterministic xorshift64*), fault.rs
(path-keyed fault plans), fs.rs (in-memory filesystem), executor.rs
(deterministic task scheduler), trace.rs (bounded ring trace), clock.rs
(simulated clock), artifact.rs (reproduction artifacts), and minimize.rs
(scanner-side deterministic minimizer with greedy shrink passes).
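To make the "deterministic seeded RNG" concrete, here is a minimal sketch of an xorshift64* generator using the shift amounts, multiplier, and zero-seed remap that this document records in its determinism contract. `SketchRng`, `new`, and `draw` are hypothetical names; the real `SimRng` API in `rng.rs` may look different.

```rust
/// Illustrative stand-in for the shared RNG; not the real `SimRng` type.
struct SketchRng {
    state: u64,
}

impl SketchRng {
    /// xorshift state must be non-zero, so a zero seed is remapped.
    fn new(seed: u64) -> Self {
        let state = if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed };
        Self { state }
    }

    /// One xorshift64* step: three xor-shifts, then a wrapping multiply.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

/// Draw `n` values from a fresh generator seeded with `seed`.
fn draw(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = SketchRng::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Two calls to `draw` with the same seed return identical vectors, which is the property that lets a corpus case replay from nothing more than its stored seed.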
crates/scanner-engine/src/engine/offline_validate.rs provides structural token
validation: length checks, charset constraints, checksum verification, and
entropy gates. Thirty-nine #[test] functions with hand-authored test
vectors. No mutation — vectors are static byte slices.
crates/scanner-engine/src/rules/yaml_unit_tests.rs tests rule parsing and single-rule
scanning via roundtrip assertions. Each test constructs a rule from YAML,
compiles it, and scans a test string. No mutation — inputs are string literals.
Scenario files in crates/scanner-engine-integration-tests/tests/integration/ cover handcrafted regression
scenarios. Each file targets a specific behavior area (chunking, dedup,
transforms, multi-rule interaction). Gated behind integration-tests. Inputs
are manually constructed byte sequences. Clear, readable, but labor-intensive to
extend.
Fuzz targets across scanner-engine, scanner-git, gossip-stdx, gossip-contracts, and gossip-coordination-etcd live in per-crate fuzz/fuzz_targets/ directories and use cargo-fuzz/libFuzzer.
Coverage-guided mutation across anchor soundness, base64 gate ops, pack parsing,
offline validators, SIMD classification, text sanitization, and more. Run on
nightly; not in CI default gate.
crates/scanner-engine-integration-tests/tests/simulation/scanner_real_rules.rs scans curated fixtures
at crates/scanner-engine-integration-tests/tests/corpus/real_rules/fixtures/ with production rules from
default_rules.yaml and compares findings against a golden baseline at
crates/scanner-engine-integration-tests/tests/corpus/real_rules/expected/findings.json. Gated behind
real-rules-harness. Fixtures are hand-authored; no mutation generation.
One file in crates/scanner-engine-integration-tests/tests/smoke/ providing end-to-end sanity checks for the
scanner and git-scan pipelines. Fast, minimal, gated behind smoke-tests.
Three files in crates/scanner-engine-integration-tests/tests/diagnostic/ checking runtime properties like
post-startup allocation behavior and unfilterable rule analysis. Gated behind
diagnostic-tests.
| Capability | Scanner Sim | Git Sim | Property | Real-Rules | Offline Validators | Fuzz |
|---|---|---|---|---|---|---|
| Base64 encode | `generator.rs` | - | - | Manual fixtures | Static vectors | `fuzz_b64_gate_*` |
| URL percent encode | `generator.rs` | - | - | - | - | - |
| UTF-16 LE/BE encode | `generator.rs` | - | - | - | - | - |
| Nested (alternating layers) | `generator.rs` | - | - | - | - | - |
| Near-miss mutation | NONE | NONE | NONE | NONE | NONE | Coverage-guided only |
| Representation selection | `generator.rs` | - | - | - | - | - |
| Token generation | `generator.rs` | - | - | - | - | - |
Key finding: encoding functions in crates/scanner-scheduler/src/sim_scanner/generator.rs are the ONLY place systematic secret transforms happen. No near-miss
mutation operators exist anywhere in the codebase. Fuzz targets apply
coverage-guided byte mutation but without semantic awareness of token structure.
| Component | Scanner Sim | Git Sim | Shared? |
|---|---|---|---|
| Deterministic RNG | `sim/rng.rs:SimRng` | Same | Yes, already in `crates/scanner-scheduler/src/sim/` |
| Fault plan | `sim/fault.rs:FaultPlan` (path-keyed) | `sim_git_scan/fault.rs:GitFaultPlan` (resource-keyed) | No, domain-specific keying |
| Minimizer | `sim/minimize.rs` (greedy shrink) | `sim_git_scan/minimize.rs` (graph-aware) | No, different shrink strategies |
| Trace ring | `sim/trace.rs` | `sim_git_scan/trace.rs` | No, different event types |
| Executor | `sim/executor.rs:SimExecutor` | Same | Yes, already in `crates/scanner-scheduler/src/sim/` |
| Artifact format | `sim/artifact.rs:ReproArtifact` | `sim_git_scan/artifact.rs:GitReproArtifact` | No, different payload shapes |
Encoding/decoding is domain-independent (byte transforms on secrets). Infrastructure components are appropriately split: shared when the interface is identical (RNG, executor), separate when domain constraints differ (fault keying, minimization strategy, trace event types).
Encoding transforms (base64, URL-percent, UTF-16, nested) are pure byte-to-byte functions with no dependency on scanner or git domain logic. They belong in a shared module.
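As an illustration of what "pure byte-to-byte" means here, the following is a minimal sketch of padded standard base64 encoding written against the standard library only. It is not the repository's `base64_encode_std`; the name `base64_encode_sketch` and the internal layout are assumptions made for this example.

```rust
/// Standard base64 alphabet (RFC 4648, using `+` and `/`).
const B64: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode `input` as padded standard base64.
/// Pure: the output depends only on `input`; no RNG, no global state.
fn base64_encode_sketch(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of `n`.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63]);
        out.push(B64[(n >> 12) as usize & 63]);
        // Missing input bytes become `=` padding.
        out.push(if chunk.len() > 1 { B64[(n >> 6) as usize & 63] } else { b'=' });
        out.push(if chunk.len() > 2 { B64[n as usize & 63] } else { b'=' });
    }
    out
}
```

Nothing in the function refers to files, chunks, commits, or faults, which is why a transform of this shape can be shared by the scanner and git harnesses without either depending on the other.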
Harnesses are domain-specific and must remain separate:
- Scanner sim: chunked I/O with overlap dedup, 7 oracles (`runner.rs`), path-keyed `FaultPlan`, in-memory filesystem.
- Git sim: five-stage pipeline (`RepoOpen → CommitWalk → TreeDiff → PackExec → Finalize`), output-shape and stability oracles (`runner.rs`), resource-keyed `GitFaultPlan`, commit-graph model.
The harness boundary is where domain-specific scheduling, fault injection, and oracle verification happen. The mutation boundary is where domain-independent byte transforms happen. These are cleanly separable.
Property tests and simulation tests operate at different abstraction layers and are complementary, not redundant:
- Property tests verify component-level mathematical invariants: regex-to-anchor soundness, entropy threshold monotonicity, base64 gate roundtrip, path policy prefix closure. Each test is self-contained and exercises a single function's contract.
- Simulation tests verify system-level emergent behavior: the interaction of chunking + overlap dedup + transform decoding + fault injection + scheduling nondeterminism. These properties only emerge when multiple components compose.
Neither subsumes the other. A property test cannot catch a chunking boundary bug; a simulation cannot efficiently enumerate entropy threshold corner cases.
The 82 corpus .case.json files (71 scanner + 11 git) are the fastest
regression path:
- Each case replays in single-digit milliseconds.
- Cases are deterministic: same input, same schedule seed, same output.
- New failures discovered by random simulation or fuzz targets are minimized and added to the corpus, converting expensive discovery into cheap replay.
More expensive tests (random simulation, property tests, fuzz targets) serve as discovery mechanisms. The corpus serves as the regression gate. This two-tier model keeps CI fast while maintaining coverage depth.
CI must be deterministic, fast, and network-independent. If LLM-generated fixtures are ever introduced:
- LLM output is serialized as static fixture files and committed to the repository.
- Fixtures are reviewed like any other code change.
- CI treats LLM-authored fixtures identically to hand-authored ones.
- No LLM calls happen during `cargo test`.
- The generation script (if any) is a developer tool, not a CI dependency.
| Current Component | Location | Purpose | Action | Target | Rationale |
|---|---|---|---|---|---|
| `encode_secret()` | `generator.rs` | Dispatch raw→representation | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Domain-independent; reusable by git sim and real-rules |
| `base64_encode_std()` | `generator.rs` | Base64 standard encoding | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Pure byte transform |
| `percent_encode_all()` | `generator.rs` | URL percent encoding | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Pure byte transform |
| `encode_utf16()` | `generator.rs` | UTF-16 LE/BE widening | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Pure byte transform |
| `encode_nested()` | `generator.rs` | Alternating layer nesting | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Pure byte transform |
| `hex_nibble()` | `generator.rs` | Nibble→hex helper | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Dependency of `percent_encode_all` |
| `SecretRepr` | `scenario.rs` | Encoding representation enum | Done | `crates/scanner-scheduler/src/sim/mutation/encode.rs` | Domain-independent type |
| `make_token()` | `generator.rs` | Rule prefix + random tail | Keep | `crates/scanner-scheduler/src/sim_scanner/generator.rs` | Scanner-specific format (`SIM{id}_...`) |
| `generate_scenario()` | `sim_scanner/generator.rs` | Full scanner scenario | Keep | `crates/scanner-scheduler/src/sim_scanner/generator.rs` | Domain-specific orchestration |
| `generate_scenario()` | `sim_git_scan/generator.rs` | Full git scenario | Keep | `crates/scanner-git/src/sim_git_scan/generator.rs` | Domain-specific orchestration |
| `SimRng` | `sim/rng.rs` | Deterministic RNG | No change | Already in `crates/scanner-scheduler/src/sim/` | Already correct location |
| `SimExecutor` | `sim/executor.rs` | Deterministic scheduler | No change | Already in `crates/scanner-scheduler/src/sim/` | Already correct location |
| `FaultPlan` | `sim/fault.rs` | Scanner fault injection | Keep | `crates/scanner-scheduler/src/sim/fault.rs` | Path-keyed, scanner-specific |
| `GitFaultPlan` | `sim_git_scan/fault.rs` | Git fault injection | Keep | `crates/scanner-git/src/sim_git_scan/fault.rs` | Resource-keyed, git-specific |
| Scanner minimizer | `sim/minimize.rs` | Greedy shrink passes | Keep | `crates/scanner-scheduler/src/sim/minimize.rs` | Domain-specific shrink logic |
| Git minimizer | `sim_git_scan/minimize.rs` | Graph-aware shrink | Keep | `crates/scanner-git/src/sim_git_scan/minimize.rs` | Graph-aware, git-specific |
| Scanner corpus | `crates/scanner-engine-integration-tests/tests/corpus/scanner/` (71 cases) | Regression replay | Keep | Same | Canonical fast gate |
| Git corpus | `crates/scanner-engine-integration-tests/tests/corpus/git_scan/` (11 cases) | Regression replay | Keep | Same | Canonical fast gate |
| Near-miss operators | `crates/scanner-scheduler/src/sim/mutation/op.rs` | Near-miss mutation ops | Done | `crates/scanner-scheduler/src/sim/mutation/op.rs` | Core new capability |
| Property tests | `crates/scanner-engine-integration-tests/tests/property/` (20 files) | Math invariants | Keep | Same | Different abstraction layer |
| Offline validator tests | `crates/scanner-engine/src/engine/offline_validate.rs` (39 tests) | Validator vectors | Keep + Augment | Same + mutation-derived vectors | Add near-miss vectors in the fixture/validator augmentation section |
| Integration tests | `crates/scanner-engine-integration-tests/tests/integration/` (23 scenario files + `main.rs` harness) | Handcrafted regression | Keep | Same | Clear, readable, stable |
| Real-rules fixtures | `crates/scanner-engine-integration-tests/tests/corpus/real_rules/` | Curated corpus | Keep + Augment | Same + near-miss fixtures | Add near-miss fixtures in the fixture/validator augmentation section |
| Fuzz targets | Per-crate `fuzz/fuzz_targets/` (29 targets) | Coverage-guided | Keep | Same | Complementary discovery mechanism |
The shared mutation core must satisfy a strict deterministic contract.
Inputs:

- `family: TokenFamily`, the structural category of the secret
- `base_seed: u64`, the seed for the deterministic RNG
- `ops: Vec<MutOp>`, the ordered mutation/encoding pipeline
- `context: ContextWrap`, the surrounding context (e.g. JSON field, YAML value)

Outputs:

- `mutated_bytes: Vec<u8>`, the final byte sequence
- `expected_outcome: Outcome`, the expected detection result

DETERMINISM: Given identical (family, base_seed, ops, context),
the output is byte-for-byte identical across runs, platforms, and Rust
versions. No HashMap iteration, no system randomness, no floating point.
ISOLATION: Each mutation call is stateless. No global mutable state, no thread-locals, no ambient configuration.
SEED STABILITY: The RNG is SimRng (xorshift64* with multiplier
0x2545F4914F6CDD1D, zero-seed remapped to 0x9E3779B97F4A7C15). The
algorithm and constants are frozen. Any change to the RNG breaks all
corpus artifacts and requires a full re-minimization pass.
Reference implementation (crates/scanner-scheduler/src/sim/rng.rs):

```rust
pub fn next_u64(&mut self) -> u64 {
    let mut x = self.state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    self.state = x;
    x.wrapping_mul(0x2545F4914F6CDD1D)
}
```

ENCODING PURITY: Every encoding function is a pure function from
`&[u8]` → `Vec<u8>`. No side effects, no RNG consumption, no allocation
beyond the output buffer.
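The purity clause can be illustrated with a percent-encoder that allocates only its output buffer. This is a minimal sketch, not the migrated `percent_encode_all` or `hex_nibble`; the `_sketch` names, the choice of uppercase hex digits, and the reading of "all" as "every byte, including unreserved ones" are assumptions.

```rust
/// Map the low 4 bits of `n` to an uppercase ASCII hex digit.
/// (Uppercase is an assumption; the real helper may emit lowercase.)
fn hex_nibble_sketch(n: u8) -> u8 {
    match n & 0x0F {
        d @ 0..=9 => b'0' + d,
        d => b'A' + (d - 10),
    }
}

/// Percent-encode every input byte as `%XX`.
/// The only allocation is the output buffer, sized exactly up front.
fn percent_encode_all_sketch(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() * 3);
    for &b in input {
        out.push(b'%');
        out.push(hex_nibble_sketch(b >> 4)); // high nibble
        out.push(hex_nibble_sketch(b)); // low nibble
    }
    out
}
```

Since the function takes `&[u8]` and returns `Vec<u8>` without touching an RNG, applying it never shifts the seed stream seen by later operations in a pipeline.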
EXPECTED OUTCOME: Each mutation produces an `Outcome`:

- `MustMatch`: the engine must detect this token (positive case).
- `MustNotMatch`: the engine must NOT detect this token (negative/near-miss).
- `MayMatch`: indeterminate; useful for stress testing but not gated.

SERIALIZATION: All types derive serde::Serialize and
serde::Deserialize. Corpus artifacts round-trip through JSON without loss.
The types below reflect the actual implementation in
crates/scanner-scheduler/src/sim/mutation/.

```rust
// -- family.rs --
/// Token structural family. Determines which mutation operators are valid.
enum TokenFamily {
    AwsAccessKey,
    GithubFinegrainedPat,
    GithubClassicPat,
    JwtLike,
    Base64Blob,
    UrlEncodedBlob,
}

/// Expected detection outcome.
enum Outcome {
    MustMatch,
    MustNotMatch,
    MayMatch,
}

// -- op.rs --
/// Individual mutation operation (fully specified parameters).
enum MutOp {
    Truncate { len: usize },
    CharsetViolate { positions: Vec<usize>, replacement: u8 },
    PrefixMangle { replacement: Vec<u8> },
    ChecksumCorrupt,
    EntropyReduce { repeat_byte: u8, count: usize },
    Encode { repr: SecretRepr },
    Extend { suffix: Vec<u8> },
}

/// Fieldless discriminant mirror of MutOp for allowed-ops declarations.
enum MutOpKind {
    Truncate, CharsetViolate, PrefixMangle, ChecksumCorrupt,
    EntropyReduce, Encode, Extend,
}

// -- encode.rs --
/// Encoding layer applied to a secret before it appears in a file.
enum SecretRepr {
    Raw,
    Base64,
    UrlPercent,
    Utf16Le,
    Utf16Be,
    Nested { depth: u8 },
}
```

An AWS access key has the form `AKIA[A-Z2-7]{16}`. A `CharsetViolate`
near-miss:

- Start: `AKIAIOSFODNN7EXAMPLE` (valid, `MustMatch`)
- Apply `CharsetViolate { positions: [0, 1, 2, 3], replacement: b'a' }`: overwrite the bytes at the specified positions with lowercase `a` → `aaaaIOSFODNN7EXAMPLE`
- Result: the lowercase characters fall outside the rule's uppercase charset, so the token no longer matches `AKIA[A-Z2-7]{16}` → `MustNotMatch`

The mutation is deterministic (no RNG needed for this op), produces a structurally similar but invalid token, and has a clear expected outcome.
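The same kind of near-miss can be written as code. The sketch below is an assumption-laden illustration rather than the `op.rs` implementation: `charset_violate_sketch` and `looks_like_aws_access_key` are hypothetical helpers, out-of-range positions are silently ignored (the real operator may treat them differently), and the pattern check is hand-rolled to avoid a regex dependency. It uses its own position list, `[4, 9]`, so that the `AKIA` prefix stays intact and only the tail charset is violated.

```rust
/// Overwrite each in-range position with `replacement`.
/// Out-of-range positions are ignored here (an assumption for this sketch).
fn charset_violate_sketch(token: &[u8], positions: &[usize], replacement: u8) -> Vec<u8> {
    let mut out = token.to_vec();
    for &p in positions {
        if let Some(slot) = out.get_mut(p) {
            *slot = replacement;
        }
    }
    out
}

/// Hand-rolled equivalent of the pattern `AKIA[A-Z2-7]{16}` on a whole token.
fn looks_like_aws_access_key(token: &[u8]) -> bool {
    token.len() == 20
        && token.starts_with(b"AKIA")
        && token[4..]
            .iter()
            .all(|b| b.is_ascii_uppercase() || (b'2'..=b'7').contains(b))
}

/// Apply the near-miss to the documented example key and return the result.
fn example_near_miss() -> Vec<u8> {
    charset_violate_sketch(b"AKIAIOSFODNN7EXAMPLE", &[4, 9], b'a')
}
```

`example_near_miss()` yields `AKIAaOSFOaNN7EXAMPLE`: same length, same prefix, two bytes outside `[A-Z2-7]`, so the hand-rolled check rejects it while accepting the original.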
- Created `crates/scanner-scheduler/src/sim/mutation/` as a submodule directory with 7 files:
  - `mod.rs`: public re-exports
  - `op.rs`: `MutOp`/`MutOpKind` enums, `apply_ops` pipeline with `MAX_OUTPUT_BYTES` safety guard
  - `family.rs`: `TokenFamily` (6 variants: `AwsAccessKey`, `GithubFinegrainedPat`, `GithubClassicPat`, `JwtLike`, `Base64Blob`, `UrlEncodedBlob`) with `gen_valid`, `allowed_ops`, and `expectation` oracle; `Outcome` enum
  - `encode.rs`: `SecretRepr` enum, migrated encoding functions (`base64_encode_std`, `percent_encode_all`, `encode_utf16`, `encode_nested`, `base64url_encode_nopad`, `hex_nibble`, `encode_secret`)
  - `plan.rs`: `MutationPlan`, `GeneratedCase`, `ContextWrap`, `WrappedToken`, `execute_plan` pipeline
  - `plan_gen.rs`: `random_mutation_plan`, `random_mutation_plans_all_families` for sim-harness integration
  - `adapter.rs`: `build_mutation_scenario`, `build_mutation_engine`, `check_mutation_expectations` two-oracle adapter for `ScannerSimRunner`
- Near-miss mutation operators implemented: `Truncate`, `CharsetViolate`, `PrefixMangle`, `ChecksumCorrupt`, `EntropyReduce`, `Encode`, `Extend`.
- Thin shims in `sim_scanner/generator.rs` delegate to the shared module.
- All existing corpus artifacts and tests pass unchanged.
- Acceptance: `cargo test --features sim-harness` passes with zero delta.

- Add `near_miss_count: u32` field to `ScenarioGenConfig` (default 0).
- Generator produces near-miss mutation cases whose expected outcome is `MustNotMatch` when `near_miss_count > 0`.
- Runner validates that near-miss secrets are NOT found (new oracle check).
- New random sim seeds exercise near-miss scenarios.
- Minimized failures added to corpus.
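
The new oracle check in the steps above amounts to comparing each planted case's expected outcome against what the engine reported. The following is a minimal sketch under stated assumptions: `PlantedCase`, `check_expectations_sketch`, and the representation of findings as raw byte strings are all hypothetical, and the real adapter (`check_mutation_expectations`) may be structured quite differently.

```rust
/// Expected detection outcome, mirroring the `Outcome` variants in this document.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    MustMatch,
    MustNotMatch,
    MayMatch,
}

/// Hypothetical record of one planted secret and its expectation.
struct PlantedCase {
    bytes: Vec<u8>,
    expected: Outcome,
}

/// Return the indices of cases whose expectation is violated, given the
/// byte strings the engine reported as findings.
fn check_expectations_sketch(cases: &[PlantedCase], reported: &[Vec<u8>]) -> Vec<usize> {
    cases
        .iter()
        .enumerate()
        .filter(|(_, c)| {
            let found = reported.iter().any(|r| r == &c.bytes);
            match c.expected {
                Outcome::MustMatch => !found,   // missed a real secret
                Outcome::MustNotMatch => found, // flagged a near-miss
                Outcome::MayMatch => false,     // never gated
            }
        })
        .map(|(i, _)| i)
        .collect()
}
```

Returning indices rather than a boolean keeps the failing cases identifiable, which is what a minimizer needs in order to shrink a scenario down to the one near-miss that was wrongly detected.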
- Generate near-miss fixtures for `crates/scanner-engine-integration-tests/tests/corpus/real_rules/fixtures/` using the mutation core. Commit as static files.
- Add mutation-derived vectors to `offline_validate.rs` tests, particularly for charset, length, and checksum boundary conditions.
- Update golden baseline if new fixtures alter expected findings.
- Create `proptest` strategies that compose `MutOp` sequences.
- Property: for any seed and op sequence, the output is deterministic.
- Property: encoding-only op sequences produce `MustMatch` outcomes.
- New fuzz target: `fuzz_mutation_pipeline.rs` for coverage-guided mutation op sequence exploration.
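
The encoding-only property can be sketched without `proptest` by exhaustively enumerating short op-kind sequences. Everything here is hypothetical: `OpKind` is a simplified stand-in for `MutOpKind`, `expectation_sketch` is not the `family.rs` oracle, and returning `MayMatch` for any sequence containing a non-encoding op is a conservative placeholder rather than the documented behavior.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OpKind {
    Truncate,
    CharsetViolate,
    PrefixMangle,
    ChecksumCorrupt,
    EntropyReduce,
    Encode,
    Extend,
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    MustMatch,
    MustNotMatch,
    MayMatch,
}

const ALL_KINDS: [OpKind; 7] = [
    OpKind::Truncate,
    OpKind::CharsetViolate,
    OpKind::PrefixMangle,
    OpKind::ChecksumCorrupt,
    OpKind::EntropyReduce,
    OpKind::Encode,
    OpKind::Extend,
];

/// Simplified expectation: encoding-only pipelines keep the token detectable;
/// anything else is left indeterminate in this sketch.
fn expectation_sketch(ops: &[OpKind]) -> Outcome {
    if ops.iter().all(|k| *k == OpKind::Encode) {
        Outcome::MustMatch
    } else {
        Outcome::MayMatch
    }
}

/// Enumerate all 7^len sequences of length `len` by counting in base 7.
fn sequences(len: u32) -> Vec<Vec<OpKind>> {
    (0..7usize.pow(len))
        .map(|mut n| {
            (0..len)
                .map(|_| {
                    let kind = ALL_KINDS[n % 7];
                    n /= 7;
                    kind
                })
                .collect()
        })
        .collect()
}
```

Exhaustive enumeration only scales to very short sequences; a `proptest` strategy would sample longer ones and shrink failures, but the asserted property is the same.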
- Document the format for LLM-generated fixture files.
- Provide a generation script that calls an LLM, serializes output, and writes `.fixture.json` files.
- No CI dependency on LLM availability.
- This section specifies a fixture-generation contract and introduces no CI-time runtime dependency.
- Chunk-boundary transforms: A base64-encoded secret may span a chunk boundary after encoding but not before (or vice versa). The mutation core must compute expected outcome based on the encoded byte length, not the raw length. The scanner sim's overlap-dedup oracle already handles cross-chunk findings; near-miss tests must exercise this boundary.
- Non-root findings and `SCANNER_SIM_STRICT_NON_ROOT`: Transform-decoded findings are non-root. The differential oracle only compares non-root findings when `SCANNER_SIM_STRICT_NON_ROOT=1`. Near-miss mutations on encoded secrets must respect this flag: a `MustNotMatch` near-miss inside a base64 layer should not cause a failure when strict non-root is off.
- UTF-16/nested depth interaction: `encode_utf16` doubles byte length; `encode_nested` alternates base64 and percent-encoding. Composing both can produce very large outputs. The mutation contract should cap maximum output size (e.g., 64 KiB) and return `MayMatch` for capped results.
- Tokens valid only after encoding: Some mutation + encoding sequences might accidentally produce a valid token for a different rule. The `expected_outcome` must be computed against the specific rule under test. Cross-rule false positives are `MayMatch`, not `MustNotMatch`.
- Conservative indeterminate for unknown families: For `TokenFamily` variants where we lack structural knowledge, near-miss operators that depend on structure (e.g., `ChecksumCorrupt`) should return `MayMatch` rather than `MustNotMatch`.
- Archive corruption interaction: Scanner sim injects archive corruption faults. If a near-miss secret is inside an archive entry and the archive is corrupted, the entry may not be extracted at all. The expected outcome depends on whether the entry was successfully inflated; use the existing `ExpectedDisposition` machinery to handle this.
- Entropy gate interaction: The `EntropyReduce` operator intentionally lowers Shannon entropy. If the engine's entropy gate rejects the token before rule matching, the outcome is `MustNotMatch`. But the threshold is rule-specific. The operator must consult the `TokenFamily`'s entropy floor to produce correct expectations.
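
The chunk-boundary case above comes down to simple arithmetic on the encoded length. The sketch below is illustrative only: `base64_len` and `spans_chunk_boundary` are hypothetical helpers, chunks are assumed to be fixed-size and non-overlapping (the real scanner uses overlap, which this ignores), and the offsets are made-up numbers.

```rust
/// Length of padded standard base64 for `raw_len` input bytes: 4 * ceil(raw_len / 3).
fn base64_len(raw_len: usize) -> usize {
    (raw_len + 2) / 3 * 4
}

/// True when the half-open byte range [offset, offset + len) touches more
/// than one fixed-size chunk. Empty ranges and a zero chunk size never span.
fn spans_chunk_boundary(offset: usize, len: usize, chunk_size: usize) -> bool {
    if len == 0 || chunk_size == 0 {
        return false;
    }
    let first_chunk = offset / chunk_size;
    let last_chunk = (offset + len - 1) / chunk_size;
    first_chunk != last_chunk
}

/// Worked example: a 20-byte token at offset 40 with 64-byte chunks.
/// Returns (raw spans boundary, base64-encoded spans boundary).
fn worked_example() -> (bool, bool) {
    let (offset, raw_len, chunk) = (40, 20, 64);
    (
        spans_chunk_boundary(offset, raw_len, chunk),
        spans_chunk_boundary(offset, base64_len(raw_len), chunk),
    )
}
```

In the worked example the raw token occupies bytes 40..60 and stays inside the first chunk, while its 28-byte base64 form occupies 40..68 and crosses into the second, which is exactly the situation where an expectation computed from the raw length would be wrong.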
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Serde compatibility break | Low | High (all corpus artifacts) | Variant names for SecretRepr unchanged during migration. MutOp is new, no existing artifacts to break. Add a serde roundtrip test alongside the shared mutation core extraction. |
| False failures from near-miss | Medium | Medium (CI noise) | near_miss_count defaults to 0; existing tests unaffected. Near-miss tests are additive. New failures are always minimized before corpus addition. |
| Kitchen-sink module | Medium | Low (maintenance) | Exceeded 500-line threshold; factored into submodule directory mutation/ with 7 files (mod.rs, op.rs, family.rs, encode.rs, plan.rs, plan_gen.rs, adapter.rs). |
Rollback: Each rollout slice is independently reversible.
- Shared mutation core extraction: revert by removing the `crates/scanner-scheduler/src/sim/mutation/` directory and restoring the inline functions in `generator.rs`. No corpus changes.
- Near-miss scanner-sim integration: remove `near_miss_count` from config, remove the new oracle check. Corpus additions are additive and can be deleted.
- Fixture and validator augmentation: delete generated fixtures, revert the baseline. Offline validator vector additions are additive.

Verify that all existing test suites pass before any code changes:

```bash
# Property tests
cargo test --features property-tests,stdx-proptest --test property

# Scanner corpus replay
cargo test --features sim-harness --test simulation scanner_corpus

# Git corpus replay
cargo test --features sim-harness --test simulation git_scan_corpus

# Scanner random simulation (single seed)
cargo test --features sim-harness --test simulation scanner_random

# Integration tests
cargo test --features integration-tests --test integration

# Offline validator tests
cargo test --lib offline_validate

# YAML unit tests
cargo test --lib yaml_unit_tests
```

Record pass counts as the baseline for later regression checks.

- Run the command list above to capture a local baseline before changing code.
- Keep command output with the run artifacts rather than embedding snapshots in this design doc.
| File | Relevance |
|---|---|
| `crates/scanner-scheduler/src/sim_scanner/generator.rs` | Encoding functions, token generation |
| `crates/scanner-scheduler/src/sim_scanner/scenario.rs` | `SecretRepr` enum, `ExpectedDisposition` |
| `crates/scanner-scheduler/src/sim_scanner/runner.rs` | Scanner oracles (7), chunked scanning event loop |
| `crates/scanner-git/src/sim_git_scan/runner.rs` | Git stage pipeline, output-shape validation |
| `crates/scanner-git/src/sim_git_scan/fault.rs` | `GitFaultPlan`, `GitResourceId` (resource-keyed) |
| `crates/scanner-scheduler/src/sim/rng.rs` | `SimRng` xorshift64* implementation |
| `crates/scanner-scheduler/src/sim/fault.rs` | `FaultPlan` (path-keyed) |
| `crates/scanner-scheduler/src/sim/minimize.rs` | Scanner minimizer (greedy shrink passes) |
| `crates/scanner-git/src/sim_git_scan/minimize.rs` | Git minimizer (graph-aware shrink) |
| `crates/scanner-scheduler/src/sim/executor.rs` | `SimExecutor` deterministic scheduler |
| `crates/scanner-scheduler/src/sim/mutation/` | Shared mutation core output: 7 files |
| `crates/scanner-engine/src/engine/offline_validate.rs` | Structural token validators |
| `crates/scanner-engine/src/rules/yaml_unit_tests.rs` | Rule parsing/scanning roundtrip |
| `crates/scanner-engine-integration-tests/tests/simulation/scanner_real_rules.rs` | Real-rules golden baseline harness |

| Document | Relevance |
|---|---|
| `docs/scanner-scheduler/scanner_harness_modes.md` | Synthetic vs real-rules mode comparison |
| `docs/scanner-scheduler/scanner_test_harness_guide.md` | Synthetic harness usage guide |
| `docs/scanner-git/git_simulation_harness_guide.md` | Git simulation harness guide |
| `docs/scanner-engine/detection-engine.md` | Engine architecture (transform pipeline context) |
| `docs/scanner-engine/engine-transforms.md` | Transform chain design (encoding context) |
| `docs/kani-verification.md` | Formal verification approach |