Commit a2e3f0b

chore: consolidate Rust and C++ API (#73)
* Split the API into a cleanly defined "block" API and a "variable length" API. The block API requires the input to be blocks of a constant size, fixed at compile time, and the codec to match that block size.
* Made the code shorter by a thousand lines
* Identical C++ and Rust usage
* Introduce a new Composite codec that requires a generic `BlockCodec` and `AnyLenCodec`

```rust
pub trait BlockCodec {
    /// The fixed-size block type. Must be plain-old-data (`Pod`).
    /// In practice this will be `[u32; 128]` or `[u32; 256]`.
    type Block: Pod;

    /// Compress a slice of complete, fixed-size blocks.
    ///
    /// No remainder is possible; the caller must split the input first using
    /// [`slice_to_blocks`] and handle any remainder separately.
    fn encode_blocks(
        &mut self,
        blocks: &[Self::Block],
        out: &mut Vec<u32>,
    ) -> Result<(), FastPForError>;

    /// Decompress exactly `n_blocks` blocks from `input`.
    fn decode_blocks(
        &mut self,
        input: &[u32],
        n_blocks: usize,
        out: &mut Vec<u32>,
    ) -> Result<(), FastPForError>;
}

/// Compresses and decompresses an arbitrary-length `&[u32]` slice.
pub trait AnyLenCodec {
    /// Compress an arbitrary-length slice of `u32` values.
    fn encode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), FastPForError>;

    /// Decompress a previously compressed slice of `u32` values.
    fn decode(&mut self, input: &[u32], out: &mut Vec<u32>) -> Result<(), FastPForError>;
}

/// Split a flat `&[u32]` into `(&[Blocks::Block], &[u32])` without copying.
///
/// # Example
///
/// ```rust,ignore
/// let data: Vec<u32> = (0..600).collect(); // 2 × 256 + 88 remainder
/// let (blocks, remainder) = slice_to_blocks::<FastPFor256>(&data);
/// assert_eq!(blocks.len(), 2); // 2 blocks of [u32; 256]
/// assert_eq!(remainder.len(), 88);
/// ```
#[must_use]
pub fn slice_to_blocks<Blocks: BlockCodec>(input: &[u32]) -> (&[Blocks::Block], &[u32]) { ... }

/// Combines a block-oriented codec with an arbitrary-length tail codec.
///
/// `CompositeCodec<Blocks, Tail>` implements [`AnyLenCodec`]: it accepts any
/// input length, encodes the aligned prefix with `Blocks`, and the
/// sub-block remainder with `Tail`.
///
/// # Wire format
///
/// ```text
/// [ n_blocks: u32 ] [ Blocks encoded data... ] [ Tail encoded data... ]
/// ```
///
/// # Example
///
/// ```rust,ignore
/// use fastpfor::{AnyLenCodec, FastPFor256WithVByte};
///
/// let data: Vec<u32> = (0..600).collect(); // 2 × 256 + 88 remainder
/// let mut codec = FastPFor256WithVByte::default(); // `encode`/`decode` take `&mut self`
///
/// let mut encoded = Vec::new();
/// codec.encode(&data, &mut encoded).unwrap();
///
/// let mut decoded = Vec::new();
/// codec.decode(&encoded, &mut decoded).unwrap();
/// assert_eq!(decoded, data);
/// ```
pub struct CompositeCodec<Blocks: BlockCodec, Tail: AnyLenCodec> {...}
```
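
The wire format described in the commit message can be illustrated with a small standalone sketch. This is not the crate's implementation: `BLOCK_LEN`, `split_blocks`, `toy_encode`, and `toy_decode` are hypothetical names, and both the block codec and the tail codec are replaced by plain copies so that only the framing (`n_blocks` header, aligned prefix, sub-block remainder) is shown.

```rust
/// Values per block in this sketch (the commit mentions 128 and 256).
const BLOCK_LEN: usize = 256;

/// Split `input` into a block-aligned prefix and a sub-block remainder
/// without copying. Hypothetical stand-in for `slice_to_blocks`.
fn split_blocks(input: &[u32]) -> (&[u32], &[u32]) {
    let aligned = input.len() / BLOCK_LEN * BLOCK_LEN;
    input.split_at(aligned)
}

/// Toy composite encoder: `[n_blocks] [block data...] [tail data...]`.
/// Real codecs would compress both parts; here they are copied verbatim.
fn toy_encode(input: &[u32], out: &mut Vec<u32>) {
    let (blocks, tail) = split_blocks(input);
    let n_blocks = (blocks.len() / BLOCK_LEN) as u32;
    out.push(n_blocks); // header word
    out.extend_from_slice(blocks); // placeholder for the block codec
    out.extend_from_slice(tail); // placeholder for the tail codec
}

/// Toy composite decoder: reads the header, then the blocks, then the tail.
/// Returns `None` if the input is shorter than the header claims.
fn toy_decode(input: &[u32], out: &mut Vec<u32>) -> Option<()> {
    let (&n_blocks, rest) = input.split_first()?;
    let block_words = (n_blocks as usize).checked_mul(BLOCK_LEN)?;
    if rest.len() < block_words {
        return None;
    }
    let (blocks, tail) = rest.split_at(block_words);
    out.extend_from_slice(blocks);
    out.extend_from_slice(tail);
    Some(())
}
```

With 600 input values, `split_blocks` yields a 512-value prefix (two blocks) and an 88-value remainder, matching the example in the commit message. Storing `n_blocks` up front is what lets the decoder know where the block data ends and the tail begins.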
1 parent 085fa08 commit a2e3f0b

46 files changed

Lines changed: 2711 additions & 3475 deletions

.github/workflows/ci.yml

Lines changed: 3 additions & 3 deletions
```diff
@@ -75,9 +75,9 @@ jobs:
       matrix:
         include:
           - fuzz_target: cpp_roundtrip
-          - fuzz_target: rust_compress_oracle
-          - fuzz_target: rust_decompress_oracle
-          - fuzz_target: rust_decompress_arbitrary
+          - fuzz_target: encode_oracle
+          - fuzz_target: decode_oracle
+          - fuzz_target: decode_arbitrary
     steps:
       - uses: actions/checkout@v6
         with: {persist-credentials: false, submodules: recursive}
```

Cargo.toml

Lines changed: 1 addition & 9 deletions
```diff
@@ -21,16 +21,8 @@ name = "fastpfor_benchmark"
 required-features = ["rust"]
 harness = false
 
-[[bench]]
-name = "bench_utils"
-required-features = ["rust"]
-harness = false
-bench = false
-
 [features]
-# Eventually we may want to build without the C++ bindings by default.
-# Keeping it on for now to simplify development.
-default = ["cpp", "rust"]
+default = ["rust"]
 # Used internally for testing and benchmarking. Not intended for public use.
 _all_compatible = ["cpp_portable", "rust"]
 # Use portable C++ code that will not rely on the latest CPU features. This is the default for the C++ bindings.
```

README.md

Lines changed: 147 additions & 97 deletions
````diff
@@ -8,48 +8,140 @@
 [![CI build status](https://github.com/fast-pack/FastPFOR-rs/actions/workflows/ci.yml/badge.svg)](https://github.com/fast-pack/FastPFOR-rs/actions)
 [![Codecov](https://img.shields.io/codecov/c/github/fast-pack/FastPFOR-rs)](https://app.codecov.io/gh/fast-pack/FastPFOR-rs)
 
-This is a Rust wrapper for the [C++ FastPFor library](https://github.com/fast-pack/FastPFor), as well as a pure Rust re-implementation. Supports 32-bit and 64-bit integers, and SIMD-optimized codecs for 128-bit and 256-bit vectors. Based on the [Decoding billions of integers per second through vectorization, 2012](https://arxiv.org/abs/1209.2137) paper.
+Fast integer compression for Rust — both a pure-Rust implementation and a wrapper around the [C++ FastPFor library](https://github.com/fast-pack/FastPFor).
+Supports 32-bit (and for some codecs 64-bit) integers.
+Based on the [Decoding billions of integers per second through vectorization, 2012](https://arxiv.org/abs/1209.2137) paper.
 
 The Rust **decoder** is about 29% faster than the C++ version. The Rust implementation contains no `unsafe` code, and when built without the `cpp` feature this crate has `#![forbid(unsafe_code)]`.
 
-### Supported algorithms
-Unless otherwise specified, all codecs support `&[u32]` only.
-
-```text
-* BP32
-* Copy
-* FastBinaryPacking16
-* FastBinaryPacking32
-* FastBinaryPacking8
-* FastPFor128 (both `&[u32]` and `&[u64]`)
-* FastPFor256 (both `&[u32]` and `&[u64]`)
-* MaskedVByte
-* NewPFor
-* OptPFor
-* PFor
-* PFor2008
-* SimdBinaryPacking
-* SimdFastPFor128
-* SimdFastPFor256
-* SimdGroupSimple
-* SimdGroupSimpleRingBuf
-* SimdNewPFor
-* SimdOptPFor
-* SimdPFor
-* SimdSimplePFor
-* Simple16
-* Simple8b
-* Simple8bRle
-* Simple9
-* Simple9Rle
-* SimplePFor
-* StreamVByte
-* VByte
-* VarInt (both `&[u32]` and `&[u64]`)
-* VarIntGb
+## Usage
+
+### Rust Implementation (default)
+
+The simplest way is `FastPFor256` — a composite codec that handles any input
+length by compressing aligned 256-element blocks with `FastPForBlock256` and encoding any
+leftover values with `VariableByte`.
+
+```rust
+use fastpfor::{AnyLenCodec, FastPFor256};
+
+let mut codec = FastPFor256::default();
+let input: Vec<u32> = (0..1000).collect();
+
+let mut encoded = Vec::new();
+codec.encode(&input, &mut encoded).unwrap();
+
+let mut decoded = Vec::new();
+codec.decode(&encoded, &mut decoded, None).unwrap();
+
+assert_eq!(decoded, input);
 ```
 
+For block-aligned inputs you can use the lower-level `BlockCodec` API:
+
+```rust
+use fastpfor::{BlockCodec, FastPForBlock256, slice_to_blocks};
+
+let mut codec = FastPForBlock256::default();
+let input: Vec<u32> = (0..512).collect(); // exactly 2 blocks of 256
+
+let (blocks, remainder) = slice_to_blocks::<FastPForBlock256>(&input);
+assert_eq!(blocks.len(), 2);
+assert!(remainder.is_empty());
+
+let mut encoded = Vec::new();
+codec.encode_blocks(blocks, &mut encoded).unwrap();
+
+let mut decoded = Vec::new();
+codec.decode_blocks(&encoded, Some(u32::try_from(blocks.len() * 256).expect("block count fits in u32")), &mut decoded).unwrap();
+
+assert_eq!(decoded, input);
+```
+
+### C++ Wrapper (`cpp` feature)
+
+Enable the `cpp` feature in `Cargo.toml`:
+
+```toml
+fastpfor = { version = "0.1", features = ["cpp"] }
+```
+
+All C++ codecs implement the same `AnyLenCodec` trait (`encode` / `decode`), so
+the usage pattern is identical to the Rust examples above — just swap the codec type,
+e.g. `cpp::CppFastPFor128::new()`.
+
+**Thread safety:** C++ codec instances have internal state and are **not thread-safe**.
+Create one instance per thread or synchronize access externally.
+
+## Crate Features
+
+| Feature | Default | Description |
+|----------------|---------|----------------------------------------------------------------------------------------------|
+| `rust` | **yes** | Pure-Rust implementation — no `unsafe`, no build dependencies |
+| `cpp` | no | C++ wrapper via CXX — requires a C++14 compiler with SIMD support |
+| `cpp_portable` | no | Enables `cpp`, compiles C++ with SSE4.2 baseline (runs on any x86-64 from ~2008+) |
+| `cpp_native` | no | Enables `cpp`, compiles C++ with `-march=native` for maximum throughput on the build machine |
+
+The `FASTPFOR_SIMD_MODE` environment variable (`portable` or `native`) can override the SIMD mode at build time.
+
+**Recommendation:** Use `cpp_portable` (not `cpp_native`) for distributable binaries.
+
+## Supported Algorithms
+
+### Rust (`rust` feature)
+
+Rust block codecs require block-aligned input. `CompositeCodec` chains a block codec with a tail codec (e.g. `VariableByte`) to handle arbitrary-length input. `FastPFor256` and `FastPFor128` are type aliases for such composites.
+
+| Codec | Description |
+|--------------------|--------------------------------------------------------------|
+| `FastPFor256` | `CompositeCodec` of `FastPForBlock256` + `VariableByte` |
+| `FastPFor128` | `CompositeCodec` of `FastPForBlock128` + `VariableByte` |
+| `VariableByte` | Variable-byte encoding, MSB is opposite to protobuf's varint |
+| `JustCopy` | No compression; useful as a baseline |
+| `FastPForBlock256` | `FastPFor` with 256-element blocks; block-aligned input only |
+| `FastPForBlock128` | `FastPFor` with 128-element blocks; block-aligned input only |
+
+### C++ (`cpp` feature)
+
+All C++ codecs are composite (any-length) and implement `AnyLenCodec` only.
+`u64`-capable codecs (`CppFastPFor128`, `CppFastPFor256`, `CppVarInt`) also implement `BlockCodec64` with `encode64` / `decode64`.
+
+| Codec | Notes |
+|-----------------------------|------------------------------------------------------------------------|
+| `CppFastPFor128` | `FastPFor + VByte` composite, 128-element blocks. Also supports `u64`. |
+| `CppFastPFor256` | `FastPFor + VByte` composite, 256-element blocks. Also supports `u64`. |
+| `CppSimdFastPFor128` | SIMD-optimized 128-element variant |
+| `CppSimdFastPFor256` | SIMD-optimized 256-element variant |
+| `CppBP32` | Binary packing, 32-bit blocks |
+| `CppFastBinaryPacking8` | Binary packing, 8-bit groups |
+| `CppFastBinaryPacking16` | Binary packing, 16-bit groups |
+| `CppFastBinaryPacking32` | Binary packing, 32-bit groups |
+| `CppSimdBinaryPacking` | SIMD-optimized binary packing |
+| `CppPFor` | Patched frame-of-reference |
+| `CppSimplePFor` | Simplified `PFor` variant |
+| `CppNewPFor` | `PFor` with improved exception handling |
+| `CppOptPFor` | Optimized `PFor` |
+| `CppPFor2008` | Reference implementation from original paper |
+| `CppSimdPFor` | SIMD `PFor` |
+| `CppSimdSimplePFor` | SIMD `SimplePFor` |
+| `CppSimdNewPFor` | SIMD `NewPFor` |
+| `CppSimdOptPFor` | SIMD `OptPFor` |
+| `CppSimple16` | 16 packing modes in 32-bit words |
+| `CppSimple9` | 9 packing modes |
+| `CppSimple9Rle` | Simple9 with run-length encoding |
+| `CppSimple8b` | 8 packing modes in 64-bit words |
+| `CppSimple8bRle` | Simple8b with run-length encoding |
+| `CppSimdGroupSimple` | SIMD group-simple encoding |
+| `CppSimdGroupSimpleRingBuf` | SIMD group-simple with ring buffer |
+| `CppVByte` | Standard variable-byte encoding |
+| `CppMaskedVByte` | SIMD masked variable-byte |
+| `CppStreamVByte` | SIMD stream variable-byte |
+| `CppVarInt` | Standard varint. Also supports `u64`. |
+| `CppVarIntGb` | Group varint |
+| `CppCopy` | No compression (baseline) |
+
 ## Benchmarks
+
 ### Decoding
 
 Using Linux x86-64 running `just bench::cpp-vs-rust-decode native`. The values below are time measurements; smaller values indicate faster decoding.
````
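
The new README's `VariableByte` row says its MSB is "opposite to protobuf's varint". A minimal sketch of one reading of that remark follows: the high bit marks the final byte of a value, where protobuf uses it to mark continuation. The function names are hypothetical, and the crate's actual byte layout (for example, how bytes are packed into `u32` words) is not shown in this commit and may differ.

```rust
/// Encode `value` as little-endian 7-bit groups where the high bit is SET
/// on the final byte (terminator convention, assumed for `VariableByte`).
fn encode_terminated(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value & 0x7F) as u8); // more bytes follow: high bit clear
        value >>= 7;
    }
    out.push(value as u8 | 0x80); // last byte: high bit set
}

/// Protobuf-style varint for comparison: the high bit is SET on every byte
/// except the last (continuation convention).
fn encode_protobuf(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value & 0x7F) as u8 | 0x80); // more bytes follow
        value >>= 7;
    }
    out.push(value as u8); // last byte: high bit clear
}

/// Decode one terminator-convention value from the start of `input`.
/// Returns the value and the number of bytes consumed, or `None` if no
/// terminating byte appears within the 5 bytes a `u32` can need.
fn decode_terminated(input: &[u8]) -> Option<(u32, usize)> {
    let mut value: u32 = 0;
    for (i, &byte) in input.iter().enumerate().take(5) {
        value |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 != 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```

For the value 300, the terminator convention produces the bytes `0x2C, 0x82`, while the protobuf convention produces `0xAC, 0x02`: the same 7-bit payloads with the flag bit inverted.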
````diff
@@ -67,100 +159,58 @@ Using Linux x86-64 running `just bench::cpp-vs-rust-decode native`. The values b
 | `uniform_small_value_distribution/1024` | 606.4 | 405.44 | 33.14% |
 | `uniform_small_value_distribution/4096` | 2017.3 | 1403.7 | 30.42% |
 
-Rust Encoding has not yet been either optimized or even fully verified.
-
-## Usage
-
-### Crate Features
-* `cpp` - C++ implementation (uses portable SIMD mode)
-* `rust` - Rust implementation (safe Rust code, no `unsafe` blocks)
-
-#### SIMD Mode Configuration
-
-The C++ backend can be compiled with different SIMD instruction sets. Control this by enabling one of these features:
-| Mode | Description |
-|------|-------------|
-| `cpp_portable` | **Default.** Uses SSE4.2 baseline only. Binaries run on any x86-64 CPU from ~2008+. Best for distributable libraries. |
-| `cpp_native` | Uses `-march=native` to enable all SIMD instructions supported by the build machine (AVX, AVX2, etc.). Maximum performance but may crash on CPUs lacking those instructions. |
-
-Feature selection can be overridden with the `FASTPFOR_SIMD_MODE` environment variable set to "portable" or "native".
-
-**Recommendation:** Use `portable` (default) for libraries and distributed binaries. Use `native` only when building for a specific machine where you need maximum performance.
-
-### Using C++ Wrapper
-
-```rust
-use fastpfor::AnyLenCodec as _;
-use fastpfor::cpp::CppSimdFastPFor128;
-
-fn main() {
-    let mut codec = CppSimdFastPFor128::new();
-
-    let input = vec![1u32, 2, 3, 4, 5];
-    let mut compressed = Vec::new();
-    codec.encode(&input, &mut compressed).unwrap();
-
-    let mut decoded = Vec::new();
-    codec
-        .decode(&compressed, &mut decoded, None)
-        .unwrap();
-
-    assert_eq!(input, decoded);
-}
-```
+Rust encoding has not yet been fully optimized or verified.
 
 ## Build Requirements
 
-- When using the **Rust implementation**:
-  no additional dependencies are required.
-- When using the **C++ implementation**:
-  you need to have a C++ compiler that supports C++14 and SIMD intrinsics.
+- **Rust feature** (`rust`, the default): no additional dependencies.
+- **C++ feature** (`cpp`): requires a C++14-capable compiler with SIMD intrinsics.
   See [FastPFor C++ requirements](https://github.com/fast-pack/FastPFor?tab=readme-ov-file#software-requirements).
 
 ### Linux
 
-The default GitHub action runner for Linux has all the needed dependencies.
+The default GitHub Actions runner has all needed dependencies.
 
-For local development, you may need to install the following packages:
+For local development:
 
 ```bash
 # This list may be incomplete
 sudo apt-get install build-essential
 ```
 
-`libsimde-dev` is optional. On ARM/aarch64, the C++ build fetches `SIMDe` via `CMake`,
-and the Rust CXX bridge now reuses that fetched include path automatically.
-Install `libsimde-dev` only if you prefer a system package fallback.
+`libsimde-dev` is optional. On ARM/aarch64, the C++ build fetches `SIMDe` via `CMake`
+and the CXX bridge reuses that include path automatically.
 
 ### macOS
-On Apple Silicon, manual `SIMDe` installation is usually not required.
-The C++ build fetches `SIMDe` via `CMake`, and the Rust CXX bridge reuses that path.
 
-If you prefer a system package fallback, install `SIMDe` with Homebrew and set include flags.
+On Apple Silicon, `SIMDe` installation is usually not required — the C++ build fetches it via `CMake`.
+
+If you prefer a Homebrew fallback:
 
 ```bash
-# optional: install SIMDe via Homebrew
 brew install simde
-
-# optional fallback: ensure the compiler can find Homebrew headers
 export CXXFLAGS="-I/opt/homebrew/include"
 export CFLAGS="-I/opt/homebrew/include"
 ```
 
 ## Development
 
-* This project is easier to develop with [just](https://github.com/casey/just#readme), a modern alternative to `make`.
-  Install it with `cargo install just`.
-* To get a list of available commands, run `just`.
-* To run tests, use `just test`.
+This project uses [just](https://github.com/casey/just#readme) as a task runner:
+
+```bash
+cargo install just # install once
+just # list available commands
+just test # run all tests
+```
 
 ## License
 
 Licensed under either of
 
 * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <https://www.apache.org/licenses/LICENSE-2.0>)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or <https://opensource.org/licenses/MIT>)
-at your option.
+
+at your option.
 
 ### Contribution
 
````