Heart Disease Predictor (reproducible ML pipeline)

A small, production-style ML project that predicts heart disease using the UCI Heart Disease dataset. Includes a reproducible pipeline, threshold tuning on a validation set, model comparison, PR curve summaries, and a final aggregated report.

Quickstart

1) Install dependencies

This repo uses uv for fast, pinned installs.

Install uv (once): https://docs.astral.sh/uv/
Then run:

uv sync --dev

2) Run the end-to-end value pipeline

This runs the checks, trains the models, and generates the reports:

make report-e2e VAL_BEST_METRIC=f1

Demo / proof

For a quick, read-only look at the results that does not rely on generated reports being committed, see:

docs/results_snapshot.md
reports/model_card.md
Latest Release (download the reports_bundle_<tag>.zip asset): ../../releases/latest

What this project produces

All outputs are written to reports/. The main entry point is:

reports/final_report.md (aggregates everything)

Also included:

reports/model_comparison.md (baseline vs RF (random forest) vs HGB (histogram-based gradient boosting), each at its own val-tuned threshold)
reports/*val_tuning_report.md (chosen threshold on val + resulting test metrics)
reports/pr_curve_*.md and reports/pr_curve_*.csv (Precision–Recall summaries + curve data)
reports/*.json (machine-readable metrics)
reports/predictions_*.csv (predictions for val/test runs)
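
After a run, it can be handy to check which of the outputs listed above actually exist. The following is a small sketch, not part of the repo: the helper name list_reports is hypothetical, and the glob patterns are copied from the list above rather than from the Makefile, so they may not match every file the pipeline writes.

```python
from pathlib import Path

# Patterns taken from the README's list of outputs (not from the pipeline code).
REPORT_PATTERNS = [
    "final_report.md",
    "model_comparison.md",
    "*val_tuning_report.md",
    "pr_curve_*.md",
    "pr_curve_*.csv",
    "*.json",
    "predictions_*.csv",
]


def list_reports(reports_dir="reports"):
    """Map each pattern to the sorted file names it matches in reports_dir."""
    root = Path(reports_dir)
    return {
        pattern: sorted(path.name for path in root.glob(pattern))
        for pattern in REPORT_PATTERNS
    }


if __name__ == "__main__":
    for pattern, names in list_reports().items():
        status = ", ".join(names) if names else "(nothing found)"
        print(f"{pattern}: {status}")
```

A pattern that reports "(nothing found)" after make report-e2e would be a hint that a step was skipped or that the file naming differs from what this README describes.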

How evaluation works (high level)

The model outputs probabilities.
We select the best classification threshold on the validation set based on VAL_BEST_METRIC (default: f1).
We then report metrics on the test set at that chosen threshold.
PR curves are generated to summarize the precision/recall tradeoff.
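
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repo's implementation: the function names (confusion_counts, f1_at, pick_threshold, pr_points) and the toy labels and probabilities are made up, F1 is hard-coded as the selection metric (the pipeline reads it from VAL_BEST_METRIC), and the candidate thresholds are simply the probabilities observed on the validation set.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Count TP, FP, FN when predicting positive for prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    return tp, fp, fn


def f1_at(y_true, y_prob, threshold):
    """F1 of the positive class at the given threshold (0.0 if undefined)."""
    tp, fp, fn = confusion_counts(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def pick_threshold(y_val, p_val):
    """Return the observed validation probability that maximizes F1."""
    candidates = sorted(set(p_val))
    return max(candidates, key=lambda t: f1_at(y_val, p_val, t))


def pr_points(y_true, y_prob):
    """List (threshold, precision, recall) for each observed probability."""
    points = []
    for t in sorted(set(y_prob)):
        tp, fp, fn = confusion_counts(y_true, y_prob, t)
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        points.append((t, precision, recall))
    return points


# Toy data for illustration only (not from the UCI dataset).
y_val = [0, 0, 1, 1, 0, 1]
p_val = [0.10, 0.40, 0.35, 0.80, 0.55, 0.70]
y_test = [0, 1, 1, 0]
p_test = [0.20, 0.90, 0.60, 0.75]

threshold = pick_threshold(y_val, p_val)       # tuned on validation only
test_f1 = f1_at(y_test, p_test, threshold)     # reported on test, threshold frozen
curve = pr_points(y_test, p_test)

print(f"chosen threshold: {threshold:.2f}, test F1: {test_f1:.2f}")
for t, precision, recall in curve:
    print(f"t={t:.2f} precision={precision:.2f} recall={recall:.2f}")
```

On this toy data the chosen threshold is 0.70 (validation F1 of 0.8), which gives a test F1 of 0.5. The point of the split is that the test labels never influence the threshold: it is picked once on validation and then applied unchanged.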

Make targets (useful ones)

make check — format/lint/typecheck/tests
make data — download dataset
make preprocess — build processed dataset
make split — train/val/test split
make train-baseline / make train-rf / make train-hgb — train models + write metric reports
make final-report-print VAL_BEST_METRIC=f1 — generate final report
make pr-curves-print VAL_BEST_METRIC=f1 — generate PR summaries
make report-e2e VAL_BEST_METRIC=f1 — one-command end-to-end “value step”

CI

GitHub Actions runs the same value step:

make report-e2e VAL_BEST_METRIC=f1

It also uploads the reports/ outputs as an artifact (even if the job fails), so you can download final_report.md and related files directly from the workflow run.

Notes / limitations

This is a small dataset; metrics can vary with random seeds/splits.
This repo emphasizes a clean workflow and reproducibility over state-of-the-art modeling.
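
To get a feel for why a small dataset makes metrics seed-dependent, here is a purely synthetic sketch. It does not use the project's data or models: it assumes a hypothetical classifier that is correct on each example with a fixed probability (p_correct = 0.8, an arbitrary value) and compares how much the measured accuracy moves across seeds for a small and a large evaluation set. The set sizes 60 and 6000 are also arbitrary.

```python
import random
import statistics


def simulated_accuracy(n_examples, seed, p_correct=0.8):
    """Accuracy of a hypothetical classifier on n_examples random draws."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_examples) if rng.random() < p_correct)
    return hits / n_examples


def spread_across_seeds(n_examples, seeds=range(30)):
    """Standard deviation of simulated accuracy over the given seeds."""
    scores = [simulated_accuracy(n_examples, seed) for seed in seeds]
    return statistics.stdev(scores)


small = spread_across_seeds(60)     # small evaluation set
large = spread_across_seeds(6000)   # 100x larger evaluation set

print(f"std of accuracy, n=60:   {small:.4f}")
print(f"std of accuracy, n=6000: {large:.4f}")
```

The spread for the small set should come out several times larger than for the large one, which is why a single number from one seed or split is best read as an estimate rather than a fixed property of the model.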

License

MIT
