A decision-first fraud screening mini-system with a clean separation between inference and analytics UI:
- FastAPI inference service: `/health`, `/metadata`, `/predict`, `/predict/batch`
- Streamlit dashboard: data overview + batch scoring + thresholds + metrics + segments
- Pre-trained artifacts in `artifacts/`: RandomForest + XGBoost + threshold policy
Runs out-of-the-box with demo data. For meaningful results, upload real labeled data (or place a compatible CSV locally).
- Strict input validation (schema-driven)
- Single-record and batch scoring
- Model selection (`rf`/`xgb`) and threshold control
- Low-latency JSON responses with measured `latency_ms`
- Upload a CSV or auto-load a local dataset (if present)
- Built-in synthetic demo dataset when no data is available
- Decision policy presets + custom thresholding
- Metrics + diagnostic plots + segmented analysis
Policy presets are just safe defaults for the operating threshold:
| Preset | Intent | Typical effect |
|---|---|---|
| Strict | Minimize false positives | Higher threshold → fewer flagged cases, may miss more fraud |
| Balanced | Default trade-off | Mid threshold → balanced precision/recall |
| Lenient | Maximize recall | Lower threshold → catch more fraud, more false positives |
You can always override the threshold manually in the UI.
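As a sketch of how the presets trade off (the threshold values here are illustrative, not the shipped policy), applying different thresholds to the same fraud probabilities changes how many records get flagged:

```python
# Illustrative preset thresholds (hypothetical values, not the shipped policy).
PRESETS = {"strict": 0.50, "balanced": 0.20, "lenient": 0.05}


def apply_threshold(probas, threshold):
    """Label a record 1 (fraud) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probas]


probas = [0.02, 0.07, 0.30, 0.65]
for name, t in PRESETS.items():
    labels = apply_threshold(probas, t)
    print(f"{name}: labels={labels} flagged={sum(labels)}")
    # strict flags 1 record, balanced 2, lenient 3 on this sample
```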
```mermaid
flowchart LR
  DATA["CSV upload / auto-load / synthetic demo"] --> UI["Streamlit UI"]
  UI -->|"httpx: /metadata /predict /predict/batch"| API["FastAPI Inference API"]
  API -->|"load once at startup"| ART["artifacts/ (models + metadata + thresholds)"]
```
Key modules:
- `src/fraud_dashboard/api/` — FastAPI app (loads artifacts once, validates input, returns strict JSON)
- `src/fraud_dashboard/ui/` — Streamlit UI (calls the API via `httpx`)
- `src/fraud_dashboard/data/` — schema validation + synthetic demo generator
- `artifacts/` — serialized models + `metadata.json` + `thresholds.json`
- `tests/` — API + artifact loading + contract sanity checks
- Python 3.11+ (recommended). Streamlit Cloud runs on Python 3.13.
- Docker Desktop (for Docker Compose quickstart)
Windows PowerShell:

```powershell
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -U pip setuptools wheel
pip install -r requirements.txt -r requirements-dev.txt
pip install -e .
```

Linux/macOS:

```bash
python3.11 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip setuptools wheel
pip install -r requirements.txt -r requirements-dev.txt
pip install -e .
```

Recommended (repo entrypoint):

```bash
python api.py
```

Alternative (standard Uvicorn):

```bash
uvicorn fraud_dashboard.api.app:app --host 127.0.0.1 --port 8000
```

Open:
- API docs: http://127.0.0.1:8000/docs
- Health: http://127.0.0.1:8000/health
- Metadata: http://127.0.0.1:8000/metadata
```bash
python -m streamlit run app.py
```

Open: http://127.0.0.1:8501

On Windows, prefer `python -m streamlit ...` to ensure you're using the venv-installed Streamlit (avoiding global `AppData\Roaming` installs).

Streamlit Cloud entrypoint is `streamlit_app.py`. Local entrypoint is `app.py`.
Runs two containers: api and ui.
```bash
docker compose up --build      # build and start in the foreground
docker compose up -d --build   # build and start detached
docker compose logs -f         # follow logs
docker compose down            # stop and remove containers
```

Open:
- API: http://127.0.0.1:8000
- UI: http://127.0.0.1:8501
The UI container calls the API at `http://api:8000` via Compose service DNS.
If you see `Bind for 0.0.0.0:8501 failed: port is already allocated`:
- Stop the process/container using the port, or
- Change the host port mapping in `docker-compose.yml` (e.g. `"8502:8501"`) and re-run `docker compose up --build`.
The UI supports uploaded CSVs.
Auto-load behavior:
- If `creditcard.csv` exists in one of the search paths below, the UI loads it automatically.
- Otherwise, the UI generates a synthetic demo dataset (same schema) so the dashboard remains runnable.

Default search paths:
- `data/creditcard.csv` (recommended)
- `creditcard.csv` (repo root)
- `data/demo_creditcard.csv`
- `/mnt/data/creditcard.csv`
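A minimal sketch of that resolution order (the helper name `find_dataset` is illustrative, not the repo's actual function):

```python
from pathlib import Path

# Candidate locations, checked in order (mirrors the search paths above).
SEARCH_PATHS = [
    Path("data/creditcard.csv"),
    Path("creditcard.csv"),
    Path("data/demo_creditcard.csv"),
    Path("/mnt/data/creditcard.csv"),
]


def find_dataset(paths=SEARCH_PATHS):
    """Return the first existing CSV, or None to signal the synthetic fallback."""
    for p in paths:
        if p.is_file():
            return p
    return None  # caller generates the synthetic demo dataset instead
```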
Dataset sources are not redistributed in this repo. See DATA_LICENSE.md for attribution and terms.
```bash
pip install kaggle
kaggle datasets download -d mlg-ulb/creditcardfraud -p data --unzip
```

Environment variables:
- `FRAUD_API_URL` (preferred): FastAPI base URL for the Streamlit UI. Examples:
  - local: `http://127.0.0.1:8000`
  - compose: `http://api:8000`
- `API_BASE_URL` (back-compat): same purpose

See: `.env.example`
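One way a client could resolve the base URL, honoring the preferred variable first (a sketch; the repo's actual lookup logic may differ):

```python
import os


def resolve_api_url(default="http://127.0.0.1:8000"):
    """Prefer FRAUD_API_URL, fall back to legacy API_BASE_URL, then a local default."""
    return os.environ.get("FRAUD_API_URL") or os.environ.get("API_BASE_URL") or default


print(resolve_api_url())  # local default unless an override variable is set
```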
- `GET /health`
- `GET /metadata`
- `POST /predict`
- `POST /predict/batch`
```python
import httpx

meta = httpx.get("http://127.0.0.1:8000/metadata").json()
features = meta["schema"]["features"]
record = {f: 0.0 for f in features}

resp = httpx.post(
    "http://127.0.0.1:8000/predict",
    json={"record": record, "model": "rf"},
).json()
print(resp)
```

PowerShell helper:

```powershell
./scripts/predict.ps1 -ApiUrl "http://127.0.0.1:8000" -Model rf
```

Model + threshold can be passed in the request body, or via query parameters for quick manual testing. If both are present, query parameters take precedence.
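A batch payload can be built the same way. The request shape below (a `records` list keyed by feature name) is an assumption extrapolated from the single-record contract, so check `/docs` for the authoritative schema:

```python
def build_batch_payload(features, rows, model="rf"):
    """Assemble a /predict/batch payload (shape is assumed; verify against /docs)."""
    records = [dict(zip(features, row)) for row in rows]
    return {"records": records, "model": model}


features = ["V1", "V2", "Amount"]  # illustrative subset; fetch from /metadata in practice
payload = build_batch_payload(features, [[0.1, -1.2, 42.0], [0.0, 0.3, 7.5]])
# httpx.post("http://127.0.0.1:8000/predict/batch", json=payload).json()
```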
A successful `/predict` response is strict JSON:

```json
{
  "model": "rf",
  "threshold": 0.05348,
  "proba_fraud": 0.00033,
  "label": 0,
  "latency_ms": 49
}
```

```bash
ruff check .
ruff format --check .
pytest -q
```

Optional (visibility only):

```bash
pip-audit
```

CI runs: Ruff + Pytest + Docker build stage (`.github/workflows/ci.yml`).
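Because the `/predict` response is strict JSON with a fixed key set, clients can verify the contract cheaply. A hedged sketch of such a check (the validator itself is not part of the repo):

```python
# Keys the documented /predict response always contains.
REQUIRED_KEYS = {"model", "threshold", "proba_fraud", "label", "latency_ms"}


def validate_prediction(resp):
    """Raise ValueError if a /predict response violates the documented contract."""
    missing = REQUIRED_KEYS - resp.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if resp["label"] not in (0, 1):
        raise ValueError("label must be 0 or 1")
    if not 0.0 <= resp["proba_fraud"] <= 1.0:
        raise ValueError("proba_fraud must be a probability in [0, 1]")
    return resp


validate_prediction(
    {"model": "rf", "threshold": 0.05348, "proba_fraud": 0.00033,
     "label": 0, "latency_ms": 49}
)
```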
This repo ships pre-trained serialized artifacts (`artifacts/*.joblib`). To keep the project runnable out of the box, the ML/data stack is pinned to the training-time versions.
Note (Streamlit Cloud): Streamlit Cloud runs on Python 3.13, so `requirements.txt` uses environment markers to keep the artifact stack stable on Python 3.11/3.12 while remaining wheel-installable on Python 3.13.
If you want to upgrade dependencies, treat it as a refresh cycle:
- upgrade deps
- re-export artifacts (via `scripts/train.py`)
- update `artifacts/metadata.json`
- re-run tests
```text
.
├─ api.py                  # FastAPI entrypoint (robust for src layout)
├─ app.py                  # Local Streamlit entrypoint
├─ streamlit_app.py        # Streamlit Cloud entrypoint
├─ src/                    # package code (src layout)
├─ artifacts/              # pre-trained models + threshold policy
├─ scripts/                # utility scripts (PowerShell + training)
├─ tests/                  # unit + contract tests
├─ docker-compose.yml
├─ Dockerfile
└─ docs/CASE_STUDY.md
```
See: SECURITY.md
See: docs/CASE_STUDY.md
- Code license: MIT — see `LICENSE`.
- Dataset is not redistributed. If you download it, follow the dataset terms — see `DATA_LICENSE.md`.




