A framework for measuring the environmental impact of ML inference. Tracks CO2 emissions, energy consumption, and water usage across different hardware setups.
Training gets all the attention, but inference runs 24/7 in production. We built this to answer: "How much does running this model actually cost the environment?"
Available on PyPI:

```bash
pip install ml-ecolyzer
```

With framework-specific dependencies:

```bash
pip install ml-ecolyzer[huggingface]  # transformers, diffusers
pip install ml-ecolyzer[pytorch]      # torchvision, torchaudio
pip install ml-ecolyzer[all]          # everything
```

Quick start:

```python
from mlecolyzer import EcoLyzer

config = {
    "project": "my_analysis",
    "models": [{"name": "gpt2", "task": "text"}],
    "datasets": [{"name": "wikitext", "task": "text", "limit": 100}],
}

eco = EcoLyzer(config)
results = eco.run()

summary = results["final_report"]["analysis_summary"]
print(f"CO2: {summary['total_co2_emissions_kg']:.6f} kg")
print(f"Energy: {summary['total_energy_kwh']:.6f} kWh")
```

Tracked metrics:

- CO2 emissions - Based on power draw and regional carbon intensity
- Energy usage - Via NVIDIA-SMI, psutil, or RAPL
- Water footprint - Cooling overhead varies by hardware tier
- ESS (Environmental Sustainability Score) - Parameters per gram of CO2, useful for comparing models
ESS = Effective Parameters (M) / CO2 (g)
Higher ESS = more efficient. INT8 models typically score ~74% higher than FP32.
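The ESS formula above is simple enough to sketch directly. The numbers below are hypothetical, chosen only to illustrate the ~74% INT8 figure; the package computes ESS internally from measured emissions:

```python
def ess(effective_params_millions: float, co2_grams: float) -> float:
    """Environmental Sustainability Score: effective parameters (M) per gram of CO2."""
    return effective_params_millions / co2_grams

# Hypothetical example: a 350M-parameter model emitting 5 g of CO2 per workload.
# If INT8 quantization cuts emissions by a factor of 1.74 at the same parameter
# count, ESS comes out ~74% higher.
fp32_score = ess(350, 5.0)         # 70.0
int8_score = ess(350, 5.0 / 1.74)  # 121.8, i.e. ~74% higher than FP32
```

Because ESS normalizes by model size, it lets you compare a small efficient model against a large wasteful one on the same scale.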
- GPUs: A100, T4, RTX series, GTX series
- CPU-only works too
- Frameworks: HuggingFace, PyTorch, scikit-learn
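As a rough sketch of how power sampling on NVIDIA GPUs turns into an energy figure: this is not ML-EcoLyzer's internal code, just a simplified illustration. The `nvidia-smi` query flags are standard; the trapezoidal integration and function names are my own:

```python
import subprocess

def read_gpu_power_watts() -> float:
    """Sample instantaneous GPU power draw via nvidia-smi (requires an NVIDIA GPU)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=power.draw", "--format=csv,noheader,nounits"]
    )
    return float(out.decode().strip().splitlines()[0])

def energy_kwh(power_samples_watts: list, interval_s: float) -> float:
    """Integrate evenly spaced power samples (W) into energy (kWh), trapezoidal rule."""
    if len(power_samples_watts) < 2:
        return 0.0
    joules = sum(
        (a + b) / 2 * interval_s
        for a, b in zip(power_samples_watts, power_samples_watts[1:])
    )
    return joules / 3.6e6  # 1 kWh = 3.6 MJ

# e.g. a steady 300 W draw sampled every second for 60 s ≈ 0.005 kWh
```

On CPU-only hosts the same integration applies, with the power samples coming from psutil estimates or RAPL counters instead.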
YAML configs are also supported:

```yaml
project: "benchmark_run"
models:
  - name: "facebook/opt-350m"
    task: "text"
    quantization:
      enabled: true
      target_dtype: "int8"
datasets:
  - name: "wikitext"
    task: "text"
    limit: 500
hardware:
  device_profile: "auto"
output:
  output_dir: "./results"
  export_formats: ["json", "csv"]
```

CLI usage:

```bash
# Single run
mlecolyzer analyze --model gpt2 --dataset wikitext --task text

# System info
mlecolyzer info
```

We ran 1,500+ inference configs across:
- Hardware: GTX 1650, RTX 4090, Tesla T4, A100
- Models: GPT-2, OPT, Qwen, LLaMA, Phi, Whisper, ViT
- Precisions: FP32, FP16, INT8
Key findings:
- A100 has poor ESS when underutilized (overkill for small batches)
- Consumer GPUs (RTX/T4) often more efficient for single-batch inference
- Quantization helps a lot, especially INT8
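A toy illustration of the first finding, with invented numbers (not taken from the benchmark): an underutilized A100 burns far more power than a small single-batch job needs, so it emits more CO2 per run than a modest GPU and its ESS suffers accordingly.

```python
def ess(effective_params_millions: float, co2_grams: float) -> float:
    """Effective parameters (M) per gram of CO2; higher is better."""
    return effective_params_millions / co2_grams

# Hypothetical single-batch run of a 350M-parameter model.
# The A100 finishes faster but draws much more power while doing so.
a100_score = ess(350, co2_grams=2.0)  # 175.0
t4_score   = ess(350, co2_grams=0.8)  # 437.5 -- the smaller GPU wins on ESS
```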
See CONTRIBUTING.md. PRs welcome.

```bash
# Dev setup
pip install -e ".[dev]"
pytest
```

If you use ML-EcoLyzer, please cite:

```bibtex
@misc{mlecolyzer2026,
  title={ML-EcoLyzer: A Framework for Quantifying the Environmental Impact of Machine Learning Inference},
  author={Minoza, Jose Marie Antonio and Laylo, Rex Gregor and Villarin, Christian and Ibanez, Sebastian},
  year={2026},
  note={AAAI Workshop on AI for Environmental Science},
  eprint={2511.06694},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  doi={10.48550/arXiv.2511.06694}
}
```

License: MIT
