This document provides guidance for AI assistants working with the rustims codebase.
rustims is a framework for processing raw data from Ion-Mobility Spectrometry (IMS) in proteomics mass spectrometry. It provides efficient algorithm implementations and robust data structures using Rust as the backend, with Python bindings via PyO3.
- License: MIT
- Author: David Teschner
- Python Version: >=3.11, <3.14
- Rust Version: 1.84+
rustims/
├── mscore/ # Core Rust library - data structures & algorithms
├── rustdf/ # Rust TDF file reader/writer (Bruker timsTOF format)
├── rustms/ # Additional Rust MS utilities (chemistry, proteomics)
├── imspy_connector/ # PyO3 Python bindings (compiles to wheel)
├── packages/ # Modular Python packages
│ ├── imspy-core/ # Base data structures, timsTOF dataset access
│ ├── imspy-predictors/ # PyTorch models for CCS, RT, fragment intensities
│ ├── imspy-dia/ # DIA-PASEF clustering and feature extraction
│ ├── imspy-search/ # Database search integration (sagepy, mokapot)
│ ├── imspy-simulation/ # TimSim synthetic data generation
│ └── imspy-vis/ # Visualization tools
├── imsjl_connector/ # Julia bindings via FFI (experimental)
├── IMSJL/ # Julia code (experimental)
└── .github/workflows/ # CI/CD pipelines
The project follows a layered architecture:
- Rust Core Layer (`mscore`, `rustdf`, `rustms`): Low-level, high-performance implementations
- Binding Layer (`imspy_connector`): PyO3 bindings exposing Rust to Python
- Python API Layer (`packages/`): Modular Python packages with ML/DL features
`mscore` is the core library, containing:
- `data/` - Spectrum, peptide, SMILES data structures
- `chemistry/` - Elements, amino acids, UNIMOD, sum formulas
- `algorithm/` - Isotope calculations, peptide utilities
- `timstof/` - Frame, slice, spectrum structures for timsTOF data
- `simulation/` - Annotation and simulation utilities

Key types: MzSpectrum, MsType, PeptideSequence, ImsFrame, TimsFrame
`rustdf` is the TDF file I/O library:
- `data/` - Dataset reading, DDA/DIA handling, raw data access
- `sim/` - Simulation containers, precursor handling, synthetic data generation

`rustms` provides additional MS utilities:
- `chemistry/` - Chemical formulas, elements, UNIMOD
- `proteomics/` - Amino acids, peptides
- `algorithm/` - Isotope and peptide algorithms
- `ms/` - Spectrum utilities

`imspy_connector` provides the Python bindings, organized as 21 submodules:
- `py_mz_spectrum`, `py_peptide`, `py_tims_frame`, `py_tims_slice`, `py_dataset`
- `py_dda`, `py_dia`, `py_quadrupole`, `py_feature`, `py_pseudo`
- `py_chemistry`, `py_elements`, `py_sumformula`, `py_amino_acids`, `py_unimod`, `py_constants`
- `py_annotation`, `py_simulation`, `py_ml_utility`, `py_spectrum_processing`, `py_utility`
- Each `py_*.rs` file wraps the corresponding Rust types
- Uses `#[pyclass]` and `#[pymethods]` attributes
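
From Python, these submodules are reached as attributes of the compiled connector module (`ims = imspy_connector.py_<module>`). The following is a minimal sketch of a guarded lookup that degrades gracefully when the wheel has not been built yet. The helper name `load_binding_submodule` is hypothetical and not part of rustims, and the sketch assumes the submodules are exposed as plain attributes of the top-level module.

```python
import importlib
from types import ModuleType
from typing import Optional


def load_binding_submodule(
    name: str, package: str = "imspy_connector"
) -> Optional[ModuleType]:
    """Return `<package>.<name>`, or None if the package or attribute is missing.

    Hypothetical helper for illustration only; not part of rustims.
    """
    try:
        connector = importlib.import_module(package)
    except ImportError:
        # The wheel has not been built or installed in this environment.
        return None
    return getattr(connector, name, None)


# Mirrors the documented idiom `ims = imspy_connector.py_<module>`.
ims = load_binding_submodule("py_mz_spectrum")
if ims is None:
    print("imspy_connector is not available; build and install the wheel first")
```
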
| Package | Version | Module | Description |
|---|---|---|---|
| imspy-core | 0.4.0 | imspy_core | Base data structures, timsTOF dataset access |
| imspy-predictors | 0.5.0 | imspy_predictors | PyTorch models for CCS, RT, fragment intensities |
| imspy-dia | 0.4.0 | imspy_dia | DIA-PASEF clustering and feature extraction |
| imspy-search | 0.4.0 | imspy_search | Database search integration (sagepy, mokapot) |
| imspy-simulation | 0.4.0 | imspy_simulation | TimSim synthetic data generation & EVAL pipeline |
| imspy-vis | 0.4.0 | imspy_vis | Visualization and plotting tools |

imspy-core (`imspy_core`):
- `core/` - RustWrapperObject base class
- `data/` - MzSpectrum, PeptideSequence wrappers
- `chemistry/` - Elements, amino acids, UNIMOD, mobility, constants
- `timstof/` - TimsDataset, TimsDatasetDDA, TimsDatasetDIA, TimsFrame, TimsSlice
- `utility/` - General helpers, sequence processing

imspy-predictors (`imspy_predictors`):
- `ccs/` - Collisional cross-section predictors (PyTorchCCSPredictor)
- `rt/` - Retention time predictors (PyTorchRTPredictor)
- `intensity/` - Ion intensity predictors (Prosit2023TimsTofWrapper, DeepPeptideIntensityPredictor)
- `ionization/` - Charge state predictors
- `koina_models/` - Koina remote prediction service
- `models/` - Neural network model definitions
- `pretrained/` - Pretrained model management
- `utilities/` - Tokenizers and utilities

imspy-dia (`imspy_dia`):
- `clustering/` - DIA clustering utilities
- `pipeline/` - DIA processing pipeline (cluster_pipeline, cluster_report)

imspy-search (`imspy_search`):
- `cli/` - Command-line interfaces (imspy_dda, imspy_ccs, imspy_rescore_sage)
- `configs/` - Configuration files
- `dda_extensions.py` - Sage extension methods for PrecursorDDA/TimsDatasetDDA
- `rescoring.py` - PSM rescoring
- `utility.py`, `mgf.py`, `sage_output_utility.py`

imspy-simulation (`imspy_simulation`):
- `timsim/` - TimSim simulator, GUI, jobs, validation, integration tests (EVAL pipeline)
- `builders/` - Simulation builders
- `core/` - Core simulation logic
- `data/` - Simulation data structures
- `annotation.py`, `experiment.py`, `acquisition.py`, `tdf.py`

imspy-vis (`imspy_vis`):
- `frame_rendering.py` - Frame rendering
- `pointcloud.py` - Point cloud visualization

| Entry Point | Package | Function |
|---|---|---|
| `imspy-dda` | imspy-search | `imspy_search.cli.imspy_dda:main` |
| `imspy-ccs` | imspy-search | `imspy_search.cli.imspy_ccs:main` |
| `imspy-rescore-sage` | imspy-search | `imspy_search.cli.imspy_rescore_sage:main` |
| `open_tracer` | imspy-dia | `imspy_dia.pipeline.cluster_pipeline:main` |
| `imspy-cluster-report` | imspy-dia | `imspy_dia.pipeline.cluster_report:main` |
| `timsim` | imspy-simulation | `imspy_simulation.timsim.simulator:main` |
| `timsim-gui` | imspy-simulation | `imspy_simulation.timsim.gui:main` |

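
Each entry in the Function column uses the `module:function` form of Python console-script entry points. As a rough sketch of how such a string maps to a callable, the hypothetical helper `resolve_entry_point` below (not part of rustims) imports the module and walks to the named attribute. A standard-library target stands in for the imspy functions so the example runs without the packages installed.

```python
import importlib
from typing import Callable


def resolve_entry_point(spec: str) -> Callable:
    """Resolve a 'package.module:function' string to the callable it names."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as 'Class.method' one step at a time.
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj


# Standard-library stand-in; with the packages installed, a spec such as
# "imspy_search.cli.imspy_dda:main" would be resolved the same way.
dumps = resolve_entry_point("json:dumps")
print(dumps({"tool": "timsim"}))
```
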
# Build specific crate
cd mscore && cargo build --release
cd rustdf && cargo build --release
# Run tests
cargo test --verbose

# Install maturin (quoted so shells such as zsh do not expand the brackets)
pip install "maturin[patchelf]"
# Build wheel
cd imspy_connector
maturin build --release
# Install wheel
pip install --force-reinstall ./target/wheels/<filename>.whl

# Install from packages/ directory
cd packages
pip install -e ./imspy-core
pip install -e ./imspy-predictors
pip install -e ./imspy-dia
pip install -e ./imspy-search
pip install -e ./imspy-simulation
pip install -e ./imspy-vis

# Rust tests
cd mscore && cargo test --verbose
cd rustdf && cargo test --verbose
# Python tests
cd packages/imspy-predictors && pytest tests/

- rust.yml: Builds and tests `mscore` and `rustdf` on push/PR to main
- imspy-connector-publish.yml: Builds wheels for multiple platforms on release
- imspy-publish.yml: Publishes Python packages on release
- docs.yml: Builds Rust docs (cargo doc) and Python docs (Sphinx)

- Use `#[derive(Clone, Debug, Serialize, Deserialize)]` for data structures
- Add `Encode, Decode` from bincode for binary serialization
- Document public APIs with `///` doc comments
- Use `rayon` for parallelism
- Tests go in `#[cfg(test)]` modules at end of files

/// Brief description of the function.
///
/// # Arguments
///
/// * `param` - Description of the parameter.
///
/// # Examples
///
/// ```
/// use mscore::...;
/// // example code
/// ```
pub fn example_function(param: Type) -> ReturnType {
    // implementation
}

Pattern for wrapping Rust types:
#[pyclass]
#[derive(Clone)]
pub struct PyRustType {
    pub inner: RustType,
}

#[pymethods]
impl PyRustType {
    #[new]
    pub fn new(/* params */) -> PyResult<Self> {
        Ok(PyRustType { inner: RustType::new(/* params */) })
    }

    #[getter]
    pub fn property(&self) -> Type {
        self.inner.property.clone()
    }
}

- Use type hints for all function signatures
- Wrapper classes implement the `RustWrapperObject` pattern with `get_py_ptr()` and `from_py_ptr()` methods
- Use `imspy_connector` submodules: `ims = imspy_connector.py_<module>`

class PythonWrapper(RustWrapperObject):
    def __init__(self, param: Type):
        self.__py_ptr = ims.PyRustType(param)

    @classmethod
    def from_py_ptr(cls, ptr):
        instance = cls.__new__(cls)
        instance.__py_ptr = ptr
        return instance

    def get_py_ptr(self):
        return self.__py_ptr

Spectrum types:
- `MzSpectrum`: m/z and intensity vectors
- `IndexedMzSpectrum`: MzSpectrum with index
- `TimsSpectrum`: IMS spectrum with scan/mobility
- `MzSpectrumAnnotated`: Annotated spectrum with peak metadata

Frame types:
- `ImsFrame`: Ion mobility frame (retention_time, mobility, mz, intensity)
- `TimsFrame`: timsTOF frame with frame_id and ms_type
- `RawTimsFrame`: Raw frame with scan/tof indices

Peptide types:
- `PeptideSequence`: Sequence with modifications (UNIMOD format)
- `PeptideIon`: Peptide with charge and intensity
- `PeptideProductIonSeries`: Fragment ion series (b/y ions)

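
The `RustWrapperObject` pattern described above can be tried end to end without the compiled bindings. The sketch below is an illustration under stated assumptions: `FakePyMzSpectrum` is a hypothetical pure-Python stand-in for a PyO3 class, the abstract base class only approximates the interface of the real one in imspy-core, and `MzSpectrumSketch` is not the actual `MzSpectrum` wrapper. It shows why `from_py_ptr` uses `cls.__new__`: an object coming back from Rust is wrapped directly, without running `__init__` and rebuilding it.

```python
from abc import ABC, abstractmethod
from typing import List


class RustWrapperObject(ABC):
    """Approximation of the base class interface (assumed, not copied)."""

    @classmethod
    @abstractmethod
    def from_py_ptr(cls, ptr):
        ...

    @abstractmethod
    def get_py_ptr(self):
        ...


class FakePyMzSpectrum:
    """Hypothetical stand-in for a PyO3 class from imspy_connector."""

    def __init__(self, mz: List[float], intensity: List[float]):
        self.mz = list(mz)
        self.intensity = list(intensity)


class MzSpectrumSketch(RustWrapperObject):
    def __init__(self, mz: List[float], intensity: List[float]):
        if len(mz) != len(intensity):
            raise ValueError("mz and intensity must have the same length")
        # The double underscore is name-mangled to _MzSpectrumSketch__py_ptr.
        self.__py_ptr = FakePyMzSpectrum(mz, intensity)

    @classmethod
    def from_py_ptr(cls, ptr: FakePyMzSpectrum) -> "MzSpectrumSketch":
        instance = cls.__new__(cls)  # skip __init__, reuse the existing object
        instance.__py_ptr = ptr
        return instance

    def get_py_ptr(self) -> FakePyMzSpectrum:
        return self.__py_ptr

    @property
    def mz(self) -> List[float]:
        return self.__py_ptr.mz


spectrum = MzSpectrumSketch([100.0, 200.0], [10.0, 20.0])
rewrapped = MzSpectrumSketch.from_py_ptr(spectrum.get_py_ptr())
print(rewrapped.mz)
```

Both wrappers share the same underlying object, which is the point of the pattern: crossing between the Python layer and the binding layer does not copy the data.
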
To add new functionality:
- Implement it in the appropriate `mscore`, `rustdf`, or `rustms` module
- Add the Python binding in `imspy_connector/src/py_<module>.rs`
- Register it in the module's `#[pymodule]` function
- Create the Python wrapper in the appropriate package under `packages/`

To change an existing data structure:
- Update the Rust struct in `mscore` or `rustdf`
- Update the PyO3 wrapper in `imspy_connector`
- Update the Python wrapper in the appropriate package under `packages/`
- Run tests: `cargo test` and `pytest`

Versions are maintained in:
- `mscore/Cargo.toml` (0.4.1)
- `rustdf/Cargo.toml` (0.4.1)
- `rustms/Cargo.toml` (0.1.0)
- `imspy_connector/Cargo.toml` (0.4.1)
- `packages/imspy-core/pyproject.toml` (0.4.0)
- `packages/imspy-predictors/pyproject.toml` (0.5.0)
- `packages/imspy-dia/pyproject.toml` (0.4.0)
- `packages/imspy-search/pyproject.toml` (0.4.0)
- `packages/imspy-simulation/pyproject.toml` (0.4.0)
- `packages/imspy-vis/pyproject.toml` (0.4.0)

Dependencies between Rust crates reference specific versions (e.g., mscore = { version = "0.4.1" }).
- Rust docs: https://thegreatherrlebert.github.io/rustims/main/mscore/
- Python docs: https://thegreatherrlebert.github.io/rustims/main/imspy/
- TimSim docs: `packages/imspy-simulation/SIMULATOR_README.md`
- EVAL pipeline: `packages/imspy-simulation/src/imspy_simulation/timsim/integration/VALIDATION_README.md`

- Python >=3.11: Required by all packages (upper bound <3.14)
- Bruker SDK: Optional but recommended for accurate mass/mobility calibration
- GPU Support: PyTorch ships with CUDA support (`pip install torch`)
- Local Development: Use `path = "../mscore"` in Cargo.toml for local development (commented out in production)

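
The Python constraint (`>=3.11, <3.14`) has an inclusive lower bound and an exclusive upper bound. A small sketch of checking an interpreter against it before installing; the function name `python_supported` is hypothetical and not part of rustims.

```python
import sys
from typing import Optional, Sequence

MIN_PYTHON = (3, 11)  # inclusive lower bound
MAX_PYTHON = (3, 14)  # exclusive upper bound


def python_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Check a (major, minor, ...) version against >=3.11, <3.14."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON


if not python_supported():
    print(f"Python {sys.version_info.major}.{sys.version_info.minor} is outside >=3.11, <3.14")
```
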
Key Rust dependencies:
- `pyo3`: Python bindings
- `numpy`: NumPy array interop
- `rayon`: Parallelism
- `serde`/`bincode`: Serialization
- `rusqlite`: SQLite for TDF files
- `zstd`/`lzf`: Compression

Key Python dependencies:
- `torch`: Deep learning models (PyTorch)
- `sagepy`: Peptide search (imspy-search)
- `mokapot`: Statistical validation (imspy-search)
- `numba`: JIT compilation
- `pandas`/`numpy`: Data handling
- `koinapy`: Remote prediction service (optional, imspy-predictors)

- Rust source: `<crate>/src/**/*.rs`
- Python source: `packages/<package>/src/<module>/**/*.py`
- Tests (Rust): Inline in source files
- Tests (Python): `packages/<package>/tests/`
- CI: `.github/workflows/*.yml`
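
The glob patterns above translate directly to `pathlib`. The sketch below lists source files for one crate or package relative to the repository root; the function names are hypothetical, and a missing directory yields an empty list rather than an error.

```python
from pathlib import Path
from typing import List


def rust_sources(repo_root: Path, crate: str) -> List[Path]:
    """Files matching <crate>/src/**/*.rs under repo_root."""
    src_dir = repo_root / crate / "src"
    if not src_dir.is_dir():
        return []
    return sorted(src_dir.rglob("*.rs"))


def python_sources(repo_root: Path, package: str, module: str) -> List[Path]:
    """Files matching packages/<package>/src/<module>/**/*.py under repo_root."""
    src_dir = repo_root / "packages" / package / "src" / module
    if not src_dir.is_dir():
        return []
    return sorted(src_dir.rglob("*.py"))


# Run from the repository root; prints nothing if the paths do not exist.
for path in rust_sources(Path("."), "mscore"):
    print(path)
for path in python_sources(Path("."), "imspy-core", "imspy_core"):
    print(path)
```
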