Commit e5a00ea

CommanderStorm, pre-commit-ci[bot], and Copilot authored
test: switch from roundtrip based tests to e2e based fuzzers (#59)
This is stacked on top of #58

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 2d5d4e7 commit e5a00ea

9 files changed

Lines changed: 368 additions & 283 deletions

.github/workflows/ci.yml

Lines changed: 3 additions & 2 deletions
@@ -67,8 +67,9 @@ jobs:
     strategy:
       matrix:
         include:
-          - fuzz_target: fastpfor_cpp
-          - fuzz_target: fastpfor_rust
+          - fuzz_target: cpp_roundtrip
+          - fuzz_target: rust_compress_oracle
+          - fuzz_target: rust_decompress_oracle

     steps:
       - uses: actions/checkout@v6

fuzz/Cargo.toml

Lines changed: 10 additions & 4 deletions
@@ -17,13 +17,19 @@ fastpfor = { path = "..", features = ["cpp", "rust"] }
 members = ["."]

 [[bin]]
-name = "fastpfor_rust"
-path = "fuzz_targets/fastpfor_rust.rs"
+name = "cpp_roundtrip"
+path = "fuzz_targets/cpp_roundtrip.rs"
 test = false
 doc = false

 [[bin]]
-name = "fastpfor_cpp"
-path = "fuzz_targets/fastpfor_cpp.rs"
+name = "rust_compress_oracle"
+path = "fuzz_targets/rust_compress_oracle.rs"
+test = false
+doc = false
+
+[[bin]]
+name = "rust_decompress_oracle"
+path = "fuzz_targets/rust_decompress_oracle.rs"
 test = false
 doc = false

fuzz/README.md

Lines changed: 25 additions & 37 deletions
@@ -1,36 +1,17 @@
 # Fuzzing FastPFOR

-This directory contains a fuzz test for the FastPFOR compression codec to find bugs, panics, and data corruption issues.
+This directory contains fuzz tests for the FastPFOR compression codec to find bugs, panics, and data corruption issues.

 ## Why Fuzz FastPFOR?

-The FastPFOR codec is the core compression algorithm. Fuzzing helps catch:
-- Data corruption during compress/decompress roundtrips
+The FastPFOR codec is a core compression algorithm. Fuzzing helps catch:
+- Implementation discrepancies between Rust and C++
+- Data corruption during compress/decompress cycles
 - Panics on edge case inputs
 - Buffer overflows or underflows
 - Incorrect handling of different block sizes (128 vs 256)
 - Issues with boundary conditions (empty data, very large values, etc.)

-## Known Issues Found
-
-The fuzzer has already discovered the following issues:
-
-### Data Loss with Small Inputs (Issue #1)
-
-**Input:** Single element array `[0]` with block size 128
-
-**Issue:** FastPFOR silently drops data that doesn't fit into complete blocks. When the input length is less than the block size (128 or 256), `greatest_multiple(input_length, block_size)` returns 0, causing the codec to:
-1. Write 0 to the output header
-2. Skip compression entirely
-3. On decompression, return 0 elements instead of the original input
-
-**Expected:** Should either:
-- Compress partial blocks correctly, or
-- Return an error indicating input is too small, or
-- Document this limitation clearly
-
-This is a **data corruption bug** where the codec claims success but loses data.
-
 ## Prerequisites

 Install cargo-fuzz and switch to nightly Rust:
@@ -40,34 +21,41 @@ cargo install cargo-fuzz
 rustup install nightly
 ```

-## Running the Fuzzer
+## Running the Fuzzers
+
+Run the oracle-based compression fuzzer:

 ```bash
 cd fuzz
-cargo +nightly fuzz run fastpfor_rust
+cargo +nightly fuzz run rust_compress_oracle
+# or
+cargo +nightly fuzz run rust_decompress_oracle
+# or
+cargo +nightly fuzz run cpp_roundtrip
 ```

 Run for a specific duration (e.g., 60 seconds):

 ```bash
-cargo +nightly fuzz run fastpfor_rust -- -max_total_time=60
+cargo +nightly fuzz run rust_compress_oracle -- -max_total_time=60
 ```

-## What It Tests
+Run with specific number of iterations:

-The fuzzer:
-1. Generates random sequences of u32 integers
-2. Randomly selects block size (128 or 256)
-3. Compresses the data with FastPFOR
-4. Decompresses the result
-5. Verifies the output matches the original input exactly
-
-This ensures the codec is lossless and doesn't corrupt data under any input pattern.
+```bash
+cargo +nightly fuzz run rust_compress_oracle -- -runs=1000
+```

 ## If a Crash Is Found

-Crashes are saved to `fuzz/artifacts/fastpfor_rust/`. To reproduce:
+Crashes are saved to `fuzz/artifacts/<target_name>/`. To reproduce:
+
+```bash
+cargo +nightly fuzz run <target_name> fuzz/artifacts/<target_name>/crash-<hash>
+```
+
+For example:

 ```bash
-cargo +nightly fuzz run fastpfor_rust fuzz/artifacts/fastpfor_rust/crash-<hash>
+cargo +nightly fuzz run rust_compress_oracle fuzz/artifacts/rust_compress_oracle/crash-abc123
 ```
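
Reviewer note: the README change above lists "implementation discrepancies between Rust and C++" as a target for the new oracle fuzzers. The oracle targets themselves are not shown in this part of the diff, so the following is only a minimal sketch of the general idea (differential testing against a reference). The functions `max_bits_reference`, `max_bits_candidate`, and `check_against_oracle` are hypothetical stand-ins and are not part of the `fastpfor` crate.

```rust
/// Reference ("oracle") implementation: the number of bits needed to
/// represent the largest value, computed one element at a time.
/// Hypothetical stand-in for a trusted implementation.
fn max_bits_reference(input: &[u32]) -> u32 {
    input
        .iter()
        .map(|&v| 32 - v.leading_zeros())
        .max()
        .unwrap_or(0)
}

/// Candidate implementation under test: OR all values together and
/// measure the bit width once. Hypothetical stand-in for a port.
fn max_bits_candidate(input: &[u32]) -> u32 {
    let folded = input.iter().fold(0u32, |acc, &v| acc | v);
    32 - folded.leading_zeros()
}

/// Differential check: both implementations must agree on every input.
fn check_against_oracle(input: &[u32]) -> Result<(), String> {
    let expected = max_bits_reference(input);
    let actual = max_bits_candidate(input);
    if expected == actual {
        Ok(())
    } else {
        Err(format!("oracle says {expected} bits, candidate says {actual} bits"))
    }
}
```

In a fuzz target, the body would presumably call a check like `check_against_oracle` on the fuzzer-provided data and panic on `Err`, so that any disagreement is reported as a crash.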

fuzz/fuzz_targets/common.rs

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+use fastpfor::cpp;
+use fastpfor::rust;
+
+pub type BoxedCppCodec = Box<dyn cpp::Codec32>;
+
+#[derive(arbitrary::Arbitrary)]
+pub struct FuzzInput<C> {
+    pub data: Vec<u32>,
+    pub codec: C,
+}
+
+impl<C: std::fmt::Debug> std::fmt::Debug for FuzzInput<C> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("FuzzInput<C>")
+            .field("data_length", &self.data.len())
+            .field("codec", &self.codec)
+            .finish()
+    }
+}
+
+#[derive(arbitrary::Arbitrary, Clone, Copy, PartialEq, Eq, Debug)]
+pub enum RustCodec {
+    FastPFOR256,
+    FastPFOR128,
+    VariableByte,
+    JustCopy,
+}
+
+impl From<RustCodec> for rust::Codec {
+    fn from(codec: RustCodec) -> Self {
+        use rust::*;
+        match codec {
+            RustCodec::FastPFOR256 => Codec::from(FastPFOR::new(DEFAULT_PAGE_SIZE, BLOCK_SIZE_256)),
+            RustCodec::FastPFOR128 => Codec::from(FastPFOR::new(DEFAULT_PAGE_SIZE, BLOCK_SIZE_128)),
+            RustCodec::VariableByte => Codec::from(VariableByte::new()),
+            RustCodec::JustCopy => Codec::from(JustCopy::new()),
+        }
+    }
+}
+
+#[derive(Clone, Copy, Eq, PartialEq, arbitrary::Arbitrary, Debug)]
+pub enum CppCodec {
+    BP32,
+    Copy,
+    FastBinaryPacking8,
+    FastPFor128,
+    FastPFor256,
+    FastBinaryPacking16,
+    FastBinaryPacking32,
+    MaskedVByte,
+    NewPFor,
+    OptPFor,
+    PFor2008,
+    PFor,
+    SimdBinaryPacking,
+    SimdFastPFor128,
+    SimdFastPFor256,
+    SimdGroupSimple,
+    SimdGroupSimpleRingBuf,
+    SimdNewPFor,
+    SimdOptPFor,
+    SimdPFor,
+    SimdSimplePFor,
+    // Simple16, // cannot encode arbitrary bytes
+    // Simple8b, // cannot encode arbitrary bytes
+    // Simple8bRle, // cannot encode arbitrary bytes
+    // Simple9, // cannot encode arbitrary bytes
+    // Simple9Rle, // cannot encode arbitrary bytes
+    // SimplePFor, // cannot encode arbitrary bytes
+    // Snappy, // Conditional with #ifdef
+    StreamVByte,
+    VByte,
+    VarInt,
+    // VarIntG8iu, // Conditional with #ifdef
+    VarIntGb,
+    // VsEncoding, // This is leaking memory
+}
+
+impl From<CppCodec> for BoxedCppCodec {
+    fn from(codec: CppCodec) -> Self {
+        use cpp::*;
+        match codec {
+            CppCodec::BP32 => Box::new(BP32Codec::default()),
+            CppCodec::Copy => Box::new(CopyCodec::default()),
+            CppCodec::FastBinaryPacking8 => Box::new(FastBinaryPacking8Codec::default()),
+            CppCodec::FastPFor128 => Box::new(FastPFor128Codec::default()),
+            CppCodec::FastPFor256 => Box::new(FastPFor256Codec::default()),
+            CppCodec::FastBinaryPacking16 => Box::new(FastBinaryPacking16Codec::default()),
+            CppCodec::FastBinaryPacking32 => Box::new(FastBinaryPacking32Codec::default()),
+            CppCodec::MaskedVByte => Box::new(MaskedVByteCodec::default()),
+            CppCodec::NewPFor => Box::new(NewPForCodec::default()),
+            CppCodec::OptPFor => Box::new(OptPForCodec::default()),
+            CppCodec::PFor2008 => Box::new(PFor2008Codec::default()),
+            CppCodec::PFor => Box::new(PForCodec::default()),
+            CppCodec::SimdBinaryPacking => Box::new(SimdBinaryPackingCodec::default()),
+            CppCodec::SimdFastPFor128 => Box::new(SimdFastPFor128Codec::default()),
+            CppCodec::SimdFastPFor256 => Box::new(SimdFastPFor256Codec::default()),
+            CppCodec::SimdGroupSimple => Box::new(SimdGroupSimpleCodec::default()),
+            CppCodec::SimdGroupSimpleRingBuf => Box::new(SimdGroupSimpleRingBufCodec::default()),
+            CppCodec::SimdNewPFor => Box::new(SimdNewPForCodec::default()),
+            CppCodec::SimdOptPFor => Box::new(SimdOptPForCodec::default()),
+            CppCodec::SimdPFor => Box::new(SimdPForCodec::default()),
+            CppCodec::SimdSimplePFor => Box::new(SimdSimplePForCodec::default()),
+            // CppCodec::Simple16 => Box::new(Simple16Codec::default()),
+            // CppCodec::Simple8b => Box::new(Simple8bCodec::default()),
+            // CppCodec::Simple8bRle => Box::new(Simple8bRleCodec::default()),
+            // CppCodec::Simple9 => Box::new(Simple9Codec::default()),
+            // CppCodec::Simple9Rle => Box::new(Simple9RleCodec::default()),
+            // CppCodec::SimplePFor => Box::new(SimplePForCodec::default()),
+            // CppCodec::Snappy => Box::new(SnappyCodec::default()),
+            CppCodec::StreamVByte => Box::new(StreamVByteCodec::default()),
+            CppCodec::VByte => Box::new(VByteCodec::default()),
+            CppCodec::VarInt => Box::new(VarIntCodec::default()),
+            // CppCodec::VarIntG8iu => Box::new(VarIntG8iuCodec::default()),
+            CppCodec::VarIntGb => Box::new(VarIntGbCodec::default()),
+            // CppCodec::VsEncoding => Box::new(VsEncodingCodec::default()),
+        }
+    }
+}
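
Reviewer note: `common.rs` lets the fuzzer pick a codec by generating a plain enum variant, then converts that variant into a boxed trait object through a `From` impl. The pattern can be sketched in isolation as below. Everything here (`ToyKind`, `ToyCodec`, `PassThrough`, `Delta`) is a hypothetical toy and does not reflect the real `cpp::Codec32` trait or its signatures.

```rust
/// Hypothetical selector enum; in the real code this role is played by
/// an enum that derives `arbitrary::Arbitrary`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ToyKind {
    PassThrough,
    Delta,
}

/// Hypothetical codec trait, used only for this illustration.
trait ToyCodec {
    fn encode(&self, input: &[u32]) -> Vec<u32>;
    fn decode(&self, input: &[u32]) -> Vec<u32>;
}

/// Copies the input unchanged.
struct PassThrough;

/// Stores each value as the wrapping difference from its predecessor.
struct Delta;

impl ToyCodec for PassThrough {
    fn encode(&self, input: &[u32]) -> Vec<u32> {
        input.to_vec()
    }
    fn decode(&self, input: &[u32]) -> Vec<u32> {
        input.to_vec()
    }
}

impl ToyCodec for Delta {
    fn encode(&self, input: &[u32]) -> Vec<u32> {
        let mut prev = 0u32;
        input
            .iter()
            .map(|&v| {
                let d = v.wrapping_sub(prev);
                prev = v;
                d
            })
            .collect()
    }
    fn decode(&self, input: &[u32]) -> Vec<u32> {
        let mut acc = 0u32;
        input
            .iter()
            .map(|&d| {
                acc = acc.wrapping_add(d);
                acc
            })
            .collect()
    }
}

/// One match arm per variant keeps the selector and the codec list in sync:
/// adding a variant without an arm is a compile error.
impl From<ToyKind> for Box<dyn ToyCodec> {
    fn from(kind: ToyKind) -> Self {
        match kind {
            ToyKind::PassThrough => Box::new(PassThrough),
            ToyKind::Delta => Box::new(Delta),
        }
    }
}
```

A caller would write `let codec: Box<dyn ToyCodec> = ToyKind::Delta.into();` and use it without knowing the concrete type, which is what allows a single fuzz target to cover many codecs.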

fuzz/fuzz_targets/cpp_roundtrip.rs

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+#![no_main]
+
+use libfuzzer_sys::fuzz_target;
+mod common;
+use common::*;
+
+fuzz_target!(|data: FuzzInput<CppCodec>| {
+    let codec = BoxedCppCodec::from(data.codec);
+    let input = data.data;
+
+    // Allocate output buffer with generous size
+    let mut output = vec![0u32; input.len() * 2 + 1024];
+
+    // Compress the data
+    let enc_slice = codec.encode32(&input, &mut output).unwrap();
+
+    // Now decompress
+    let mut decoded = vec![0u32; input.len() * 2 + 1024];
+    let dec_slice = codec.decode32(enc_slice, &mut decoded).unwrap();
+
+    // Verify roundtrip
+    if dec_slice.len() + input.len() < 200 {
+        assert_eq!(input, dec_slice, "Decompressed output mismatches");
+    } else {
+        assert_eq!(dec_slice.len(), input.len(), "Decompressed length mismatch");
+        for (i, (&original, &decoded)) in input.iter().zip(dec_slice.iter()).enumerate() {
+            assert_eq!(
+                original, decoded,
+                "Mismatch at position {i}: expected {original}, got {decoded}"
+            );
+        }
+    }
+});
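
Reviewer note: the roundtrip property checked by `cpp_roundtrip.rs` (encode, decode, compare length, then compare element by element) can be tried without the C++ bindings. The sketch below substitutes a toy variable-byte codec for the real one. `vbyte_encode`, `vbyte_decode`, and `check_roundtrip` are hypothetical helpers written for this note, not functions from the `fastpfor` crate, and the byte layout is not claimed to match any codec in the repository.

```rust
/// Toy variable-byte encoder: 7 payload bits per byte, with the high bit
/// set on every byte except the last one of each value.
fn vbyte_encode(input: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    for &value in input {
        let mut v = value;
        while v >= 0x80 {
            out.push((v as u8 & 0x7F) | 0x80);
            v >>= 7;
        }
        out.push(v as u8);
    }
    out
}

/// Inverse of `vbyte_encode`. Returns `None` on truncated input or on a
/// value that runs past five bytes.
fn vbyte_decode(bytes: &[u8]) -> Option<Vec<u32>> {
    let mut out = Vec::new();
    let mut value = 0u32;
    let mut shift = 0u32;
    for &b in bytes {
        if shift > 28 {
            return None;
        }
        value |= u32::from(b & 0x7F) << shift;
        if b & 0x80 == 0 {
            out.push(value);
            value = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    if shift != 0 {
        None
    } else {
        Some(out)
    }
}

/// Roundtrip property: decode(encode(x)) == x, reporting the first
/// differing position instead of dumping both buffers.
fn check_roundtrip(input: &[u32]) -> Result<(), String> {
    let encoded = vbyte_encode(input);
    let decoded = vbyte_decode(&encoded).ok_or("decode failed")?;
    if decoded.len() != input.len() {
        return Err(format!(
            "length mismatch: expected {}, got {}",
            input.len(),
            decoded.len()
        ));
    }
    match input.iter().zip(&decoded).position(|(a, b)| a != b) {
        Some(i) => Err(format!("mismatch at position {i}")),
        None => Ok(()),
    }
}
```

Reporting only the first mismatching index presumably serves the same purpose as the `< 200` branch in the fuzz target: it keeps failure output readable when the input is large.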
