Automated validation framework for TIMSIM simulations. Generate synthetic datasets, analyze with production proteomics tools, and validate against ground truth.
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Simulate   │ ──▶ │   Analyze   │ ──▶ │  Validate   │ ──▶ │   Report    │
│  (timsim)   │     │ (DiaNN/FP)  │     │ (vs truth)  │     │ (HTML/JSON) │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
# From PyPI (recommended)
pip install imspy-simulation
# With KOINA remote model support (optional)
pip install imspy-predictors[koina]
# From source
source /path/to/your/env/bin/activate
pip install -e /path/to/rustims/packages/imspy-core
pip install -e /path/to/rustims/packages/imspy-predictors
pip install -e /path/to/rustims/packages/imspy-simulation
# Rebuild Rust backend if needed
cd /path/to/rustims/imspy_connector
maturin develop --release
Create env.toml in your working directory:
# Analysis tools
[tools]
diann_path = "/path/to/diann-linux"
fragpipe_path = "/path/to/fragpipe/bin/fragpipe"
fragpipe_tools = "/path/to/fragpipe/tools"
fragpipe_workflow_dia = "/path/to/workflows/DIA_SpecLib_Quant_diaPASEF.workflow"
fragpipe_workflow_dda = "/path/to/workflows/LFQ-noMBR.workflow"
sage_path = "/path/to/sage"
# Optional: additional workflow files for phospho tests
# fragpipe_workflow_dia_phospho = "/path/to/workflows/DIA_Phospho.workflow"
# fragpipe_workflow_dda_phospho = "/path/to/workflows/DDA_Phospho.workflow"
# fragpipe_python = "/path/to/python" # Python used by FragPipe (if different)
# Output and reference data
[paths]
output_base = "/path/to/output"
reference_dia = "/path/to/blank_dia.d"
reference_dda = "/path/to/blank_dda.d"
fasta_hela = "/path/to/hela.fasta"
fasta_hela_decoys = "/path/to/hela-decoys.fasta"
# Performance settings
[performance]
num_threads = -1
use_gpu = true
# Optional: tool-specific thread and timeout overrides (timeouts in seconds)
# diann_threads = 8 # Override thread count for DiaNN
# diann_timeout = 7200 # DiaNN timeout (default: 2h)
# fragpipe_timeout = 7200 # FragPipe timeout (default: 2h)
DiaNN, FragPipe, and Sage are not bundled with imspy-simulation due to licensing restrictions. Install them separately and configure their paths in env.toml.
DiaNN:
- Download the Linux binary from https://github.com/vdemichev/DiaNN
- Make it executable: chmod +x diann-linux
- Set diann_path in env.toml
FragPipe:
- Download a release from https://github.com/Nesvilab/FragPipe
- Requires a Java runtime (run java -version to verify)
- Set fragpipe_path, fragpipe_tools, and the workflow paths in env.toml
Sage (optional, DDA only):
- Open source: download a binary from https://github.com/lazear/sage or build from source
- Set sage_path in env.toml (optional; auto-discovered if on $PATH)
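Because the three tools are installed by hand, a quick pre-flight check of env.toml can catch a wrong path before a long run starts. The following is a minimal sketch, not part of imspy-simulation: the function names load_env and check_tools are hypothetical, and the checks (file exists, is executable, Sage falls back to a $PATH lookup) are assumptions drawn from the setup steps above.

```python
import os
import shutil
from pathlib import Path


def load_env(path: str) -> dict:
    """Parse env.toml into a dict (tomllib ships with Python 3.11+)."""
    import tomllib  # imported lazily so check_tools() also works on older Pythons

    with open(path, "rb") as fh:
        return tomllib.load(fh)


def check_tools(env: dict) -> dict:
    """Map each tool key in [tools] to a problem string, or None if it looks fine."""
    tools = env.get("tools", {})
    report = {}
    for key in ("diann_path", "fragpipe_path"):
        path = tools.get(key)
        if not path:
            report[key] = "not set"
        elif not Path(path).is_file():
            report[key] = f"file not found: {path}"
        elif not os.access(path, os.X_OK):
            report[key] = f"not executable (try chmod +x): {path}"
        else:
            report[key] = None
    # Sage is optional: use sage_path if given, otherwise look on $PATH.
    sage = tools.get("sage_path") or shutil.which("sage")
    report["sage_path"] = None if sage else "not configured and not on $PATH"
    return report
```

Calling check_tools(load_env("env.toml")) returns one entry per tool; any value other than None describes what to fix.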
# List available tests
python -m imspy_simulation.timsim.integration.sim --env env.toml --list
# Run a single simulation
python -m imspy_simulation.timsim.integration.sim --env env.toml --test IT-DIA-HELA
# Run evaluation (analysis + validation)
python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA
# Run all tests
python -m imspy_simulation.timsim.integration.sim --env env.toml --all
python -m imspy_simulation.timsim.integration.eval --env env.toml --all
| Test ID | Mode | Sample | Description |
|---|---|---|---|
| IT-DIA-HELA | DIA | HeLa | Standard proteomics (150K peptides) |
| IT-DIA-HYE | DIA | HYE | Multi-species quantification |
| IT-DIA-HYE-A/B | DIA | HYE | Fold-change benchmark (paired) |
| IT-DIA-PHOS | DIA | HeLa | Phosphoproteomics |
| IT-DIA-PHOS-A/B | DIA | HeLa | PTM site localization benchmark |
| IT-DIA-PARTIAL-FRAG | DIA | HeLa | Partial fragmentation (30% unfrag) |
| IT-DDA-TOPN | DDA | HeLa | TopN DDA (250K peptides) |
| IT-DDA-HLA | DDA | HeLa | Immunopeptidomics |
| IT-DDA-PARTIAL-FRAG | DDA | HeLa | DDA with partial fragmentation |
Generates synthetic timsTOF datasets:
python -m imspy_simulation.timsim.integration.sim --env env.toml --test IT-DIA-HELA
Output:
output/IT-DIA-HELA/
├── SIM_SUCCESS # Status marker
├── IT-DIA-HELA_config.toml # Resolved configuration
├── IT-DIA-HELA/
│ └── IT-DIA-HELA.d/ # Simulated timsTOF data
├── synthetic_data.db # Ground truth database
└── IT-DIA-HELA_preview.mp4 # Preview video (if enabled)
Runs analysis tools and validates results:
python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA
Steps:
- Run DiaNN analysis
- Run FragPipe analysis
- Run Sage analysis (DDA only)
- Extract identifications from each tool
- Match against ground truth
- Calculate metrics (ID rate, RT/IM correlation)
- Check against pass/fail thresholds
- Generate reports
Output:
output/IT-DIA-HELA/
├── EVAL_PASS # Status marker (or EVAL_FAIL)
├── diann/
│ ├── report.parquet # DiaNN results
│ └── report.log.txt
├── fragpipe/
│ ├── psm.tsv
│ └── peptide.tsv
├── sage/ # DDA only
│ └── results.sage.tsv
└── validation/
├── validation_metrics.json # Detailed metrics
├── validation_report.html # Visual report
└── plots/ # Comparison plots
| Metric | Description | Typical Threshold |
|---|---|---|
| ID Rate | Ground truth peptides identified | ≥ 25-30% |
| RT Correlation | Expected vs observed retention time | ≥ 0.95 |
| IM Correlation | Expected vs observed ion mobility | ≥ 0.95 |
| Metric | Description | Typical Threshold |
|---|---|---|
| Species Ratio Error | Deviation from expected H:Y:E ratios | ≤ 20% |
| Fold Change Error | Deviation from expected fold changes | ≤ 30% |
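The table does not spell out how the fold change error is computed. One plausible reading, sketched below purely as an assumption, is the median relative deviation of observed B/A intensity ratios from the expected fold change. The function name fold_change_error and the per-peptide intensity dicts are hypothetical.

```python
from statistics import median


def fold_change_error(intensity_a: dict, intensity_b: dict, expected_fc: float):
    """Median relative deviation of observed B/A fold changes from expected_fc.

    Returns None when no peptide is quantified in both conditions.
    """
    observed = [
        intensity_b[pep] / intensity_a[pep]
        for pep in intensity_a
        if pep in intensity_b and intensity_a[pep] > 0
    ]
    if not observed:
        return None
    return median(abs(fc - expected_fc) / expected_fc for fc in observed)
```

Under this definition, observed fold changes of 2.2, 2.0, and 1.8 against an expected 2.0 give an error of about 0.10, which would pass a 30% threshold.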
| Metric | Description | Typical Threshold |
|---|---|---|
| PTM Site Accuracy | Correctly localized phosphosites | ≥ 80% |
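The pass/fail step compares each computed metric with its threshold. A minimal sketch of that comparison follows, assuming threshold keys use the min_ prefix seen in the test configs (min_id_rate, min_rt_correlation, min_im_correlation). The max_ prefix for error-type metrics, the metric names, and the function name check_thresholds are assumptions for illustration.

```python
def check_thresholds(metrics: dict, thresholds: dict) -> list:
    """Return human-readable failures; an empty list means the test passes."""
    failures = []
    for key, limit in thresholds.items():
        # "min_id_rate" -> bound "min", metric name "id_rate"
        bound, _, name = key.partition("_")
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: no value reported")
        elif bound == "min" and value < limit:
            failures.append(f"{name}: {value:.3f} < {limit}")
        elif bound == "max" and value > limit:
            failures.append(f"{name}: {value:.3f} > {limit}")
    return failures
```

A missing metric is reported as a failure rather than skipped, so a tool that produced no output cannot pass by accident.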
Test configs live in the configs/ directory. Each .toml file defines:
[test_metadata]
test_id = "IT-DIA-HELA"
description = "Standard HeLa DIA-PASEF benchmark"
acquisition_type = "DIA"
sample_type = "hela"
analysis_tools = ["diann", "fragpipe"]
[thresholds]
min_id_rate = 0.28
min_rt_correlation = 0.95
min_im_correlation = 0.95
[paths]
save_path = "${output_base}/IT-DIA-HELA"
reference_path = "${reference_dia}"
fasta_path = "${fasta_hela}"
# ... simulation parameters ...
Use ${variable} syntax for machine-specific paths:
- ${output_base} → from env.toml [paths] output_base
- ${reference_dia} → from env.toml [paths] reference_dia
- ${fasta_hela} → from env.toml [paths] fasta_hela
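The ${variable} placeholders follow the same syntax as Python's string.Template, so the substitution can be illustrated in a few lines. This is a sketch of the idea only; the framework's own resolver may behave differently, and the function name resolve_paths and the example paths are hypothetical.

```python
from string import Template


def resolve_paths(test_paths: dict, env_paths: dict) -> dict:
    """Expand ${name} placeholders in a test config's [paths] table.

    Raises KeyError if a placeholder has no matching key in env_paths.
    """
    return {
        key: Template(value).substitute(env_paths)
        for key, value in test_paths.items()
    }


# Example values standing in for env.toml [paths] and a test config [paths].
env_paths = {
    "output_base": "/data/out",
    "reference_dia": "/data/blank_dia.d",
    "fasta_hela": "/data/hela.fasta",
}
test_paths = {
    "save_path": "${output_base}/IT-DIA-HELA",
    "reference_path": "${reference_dia}",
    "fasta_path": "${fasta_hela}",
}
resolved = resolve_paths(test_paths, env_paths)
```

Using substitute rather than safe_substitute makes a misspelled variable fail loudly instead of leaving a literal ${...} in a path.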
Create configs/IT-NEW-TEST.toml:
[test_metadata]
test_id = "IT-NEW-TEST"
description = "My new test"
acquisition_type = "DIA" # or "DDA"
sample_type = "hela"
analysis_tools = ["diann", "fragpipe"]
[thresholds]
min_id_rate = 0.25
min_rt_correlation = 0.95
min_im_correlation = 0.95
[paths]
save_path = "${output_base}/IT-NEW-TEST"
reference_path = "${reference_dia}"
fasta_path = "${fasta_hela}"
[experiment]
experiment_name = "IT-NEW-TEST"
acquisition_type = "DIA"
gradient_length = 3600.0
# ... add other sections as needed ...
Edit sim.py and eval.py, adding the new test ID to AVAILABLE_TESTS in both:
AVAILABLE_TESTS = [
"IT-DIA-HELA",
# ...
"IT-NEW-TEST", # Add here
]
python -m imspy_simulation.timsim.integration.sim --env env.toml --test IT-NEW-TEST
python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-NEW-TEST
# List tests
python -m imspy_simulation.timsim.integration.sim --env env.toml --list
# Run single test
python -m imspy_simulation.timsim.integration.sim --env env.toml --test IT-DIA-HELA
# Run multiple tests
python -m imspy_simulation.timsim.integration.sim --env env.toml --tests IT-DIA-HELA,IT-DDA-TOPN
# Run all tests
python -m imspy_simulation.timsim.integration.sim --env env.toml --all
# Run with all tools
python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA
# Run with specific tool
python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA --tool diann
# Skip analysis (validate existing results)
python -m imspy_simulation.timsim.integration.eval --env env.toml --test IT-DIA-HELA --skip-analysis
# Run all evaluations
python -m imspy_simulation.timsim.integration.eval --env env.toml --all
After running all tests, check:
output/
├── evaluation_summary.json # Machine-readable results
├── index.html # Dashboard with pass/fail status
└── full_report.log # Complete execution log
Each test directory contains status markers:
- SIM_SUCCESS/SIM_FAILED - Simulation status
- EVAL_PASS/EVAL_FAIL - Evaluation status
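Since the status markers are plain files, a short script can collect them across all test directories, for example in a CI job. The sketch below assumes only the marker file names listed above; the function name summarize and the returned dict layout are hypothetical.

```python
from pathlib import Path

SIM_MARKERS = ("SIM_SUCCESS", "SIM_FAILED")
EVAL_MARKERS = ("EVAL_PASS", "EVAL_FAIL")


def _first_marker(test_dir: Path, markers: tuple):
    """Return the first marker file present in test_dir, or None."""
    for name in markers:
        if (test_dir / name).exists():
            return name
    return None


def summarize(output_base: str) -> dict:
    """Map each test directory under output_base to its sim/eval markers."""
    summary = {}
    for test_dir in sorted(Path(output_base).iterdir()):
        if test_dir.is_dir():  # skip index.html, logs, and other files
            summary[test_dir.name] = {
                "sim": _first_marker(test_dir, SIM_MARKERS),
                "eval": _first_marker(test_dir, EVAL_MARKERS),
            }
    return summary
```

A value of None means the corresponding stage has not run yet or did not write a marker.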
# Check DiaNN is executable
chmod +x /path/to/diann-linux
# Verify FASTA has decoys (or let DiaNN generate them)
# Check diann/report.log.txt for errors
# Ensure Java is available
java -version
# Check workflow file path in env.toml
# Verify fragpipe_tools path is correct
- Check FASTA file matches sample type
- Verify reference .d file has correct acquisition mode
- Review simulation parameters (noise, gradient length)
- Consider lowering threshold for initial testing
- Check available disk space
- Reduce num_sample_peptides
- Enable lazy_frame_assembly = true
- Check GPU memory if using CUDA
imspy_simulation/timsim/integration/
├── sim.py # Simulation runner
├── eval.py # Evaluation runner
├── configs/ # Test configurations
│ ├── IT-DIA-HELA.toml
│ ├── IT-DDA-TOPN.toml
│ └── ...
└── VALIDATION_README.md # This file
| Tool | Version | Purpose | Download |
|---|---|---|---|
| DiaNN | 1.8+ | DIA/DDA analysis | github.com/vdemichev/DiaNN |
| FragPipe | 21+ | DIA/DDA analysis | github.com/Nesvilab/FragPipe |
| Sage | 0.14+ | DDA analysis (optional) | github.com/lazear/sage |
imspy-simulation
imspy-core
imspy-connector (Rust backend)
pandas
numpy
toml
- CPU: 8+ cores recommended
- RAM: 32GB+ for large datasets
- GPU: NVIDIA CUDA-capable (optional, speeds up ML models)
- Disk: 50GB+ free space per test
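The CPU and disk recommendations above can be checked before launching a run. The following sketch uses only the standard library; the function name check_resources is hypothetical, the defaults mirror the figures in this list, and RAM and GPU are left out because checking them needs third-party packages.

```python
import os
import shutil


def check_resources(output_base: str, min_free_gb: float = 50, min_cores: int = 8) -> list:
    """Return warnings for CPU count and free disk space under output_base."""
    warnings = []
    cores = os.cpu_count() or 1  # cpu_count() may return None
    if cores < min_cores:
        warnings.append(f"only {cores} CPU cores, {min_cores}+ recommended")
    free_gb = shutil.disk_usage(output_base).free / 1e9
    if free_gb < min_free_gb:
        warnings.append(f"only {free_gb:.0f} GB free, {min_free_gb}+ GB recommended per test")
    return warnings
```

These are recommendations rather than hard limits, so the function returns warnings instead of raising.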
MIT License