Portfolio Streamlit app for reviewing employee retention risk in the Salifort Motors HR dataset.
Live app: https://salifort-retention-risk-explorer.streamlit.app/
This repo is designed for recruiters, hiring managers, interviewers, and technical reviewers who want to see a complete analytics product rather than only a notebook. It shows how a retention-risk modeling project can be packaged into a readable Streamlit app with clear limitations and responsible-use boundaries.
This is not a production HR platform and should not be used to make employment decisions.
How can Salifort Motors spot early retention risk, focus manager attention, and avoid treating a model score as an automated HR decision?
- Explores workforce patterns in the cleaned Salifort HR dataset.
- Shows model and threshold trade-offs for the public reference model.
- Uses SHAP outputs to explain why the model flags risk.
- Shows department exposure to support manager review.
- Explains limitations, runtime behavior, and responsible-use boundaries.
- Provides optional advanced reviewer tools for citations, retrieval evidence, source previews, workflow readiness, and plan previews.
The project has an offline build layer and a Streamlit app layer.
- Dataset: `data/hr_capstone_dataset.csv` is checked into the repo so the app is reproducible.
- Cleaning: `app/utils/load_data.py` standardizes the dataset and removes duplicates before metrics are shown.
- Generated artifacts: `artifacts/v2/` contains model metadata, row-level scores, threshold tables, department exposure, and SHAP summaries when available.
- Static figures: `outputs/figures/` contains stable PNG figures used for EDA, validation, threshold, and SHAP pages.
- Streamlit runtime: `app/app.py` and `app/pages/` load local files and render the app. Streamlit does not retrain models or regenerate SHAP values during a visitor session.
- Offline builders: scripts in `scripts/` can rebuild artifacts or validate advanced Navigator assets outside Streamlit.
For a page-by-page tour, see the Streamlit app walkthrough. For the full docs index, see docs/README.md.
Use these documents when you want more detail than this README:
- Documentation Guide: central index for all project docs.
- Product Requirements Document: business framing, target audiences, scope, and non-goals.
- Technical Design and Architecture: runtime layers, artifacts, retrieval design, workflow contracts, and boundaries.
- Environment Setup and Deployment Guide: local setup, optional API configuration, and deployment steps.
- User Manual: how to use the app responsibly and what each page is for.
- Streamlit App Walkthrough: page-by-page review order and guidance.
- Navigator Notes: advanced reviewer documentation for the PACE Navigator.
The public portfolio story preserves:
- Public reference model: weighted XGBoost.
- Selected threshold: 0.29.
- Runtime approach: load generated artifacts when available; do not train models inside Streamlit.
If row-level artifacts are missing, selected app views can fall back to a simpler screening score so the demo remains explorable. That fallback is clearly separated from the final model probability.
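A minimal sketch of that artifact-first, fallback-second loading pattern follows. The file name, column names, and fallback weights below are hypothetical illustrations, not the repo's actual code:

```python
from pathlib import Path

import pandas as pd

ARTIFACT_DIR = Path("artifacts/v2")  # generated offline, only read at runtime
SELECTED_THRESHOLD = 0.29            # public reference threshold from this README


def load_risk_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Prefer precomputed model scores; otherwise fall back to a simple screening score."""
    scores_path = ARTIFACT_DIR / "row_scores.csv"  # hypothetical artifact filename
    if scores_path.exists():
        scores = pd.read_csv(scores_path)
        scores["source"] = "model_probability"
    else:
        # Transparent heuristic fallback so the demo stays explorable:
        # blend percentile ranks of two workload columns into a 0-1 screening score.
        screening = (
            df["average_montly_hours"].rank(pct=True) * 0.5
            + df["number_project"].rank(pct=True) * 0.5
        )
        scores = pd.DataFrame({"score": screening, "source": "screening_fallback"})
    # The "source" column keeps the fallback clearly separated from model output.
    scores["flagged"] = scores["score"] >= SELECTED_THRESHOLD
    return scores
```

Labeling every row with its `source` is what lets the UI keep the fallback visually distinct from the final model probability.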
- Overview: Explains the project question, dataset, model, threshold, and suggested review path. Start here.
- PACE Navigator: Gives a guided project map first, then optional advanced review tools for fixed-question answers, citations, retrieval evidence, workflow readiness, and plan preview.
- Workforce Explorer: Lets reviewers filter departments, salary bands, tenure bands, and risk flags to inspect workforce slices and department exposure.
- EDA & Patterns: Shows stable visual evidence for workload, salary, department, tenure, and project-load patterns.
- Model & Threshold Lab: Compares model metrics and explains how the selected threshold changes recall, precision, false positives, and review workload.
- Explainability: Uses SHAP outputs to explain which features influence the model signal, while keeping causal claims off-limits.
- Manager Action View: Turns exposure patterns into practical review priorities and responsible-use guidance.
- Methods & Limitations: Explains the architecture, artifacts, fallback logic, PACE, retrieval, Airflow scaffold, agent shell, and production boundaries.
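The recall/precision/workload trade-off the Model & Threshold Lab walks through can be sketched as a small helper. This is illustrative only and assumes binary labels with 1 meaning "left the company"; it is not the app's actual code:

```python
def threshold_metrics(probs, labels, threshold=0.29):
    """Confusion counts plus precision and recall at a given decision threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # tp + fp is the number of flagged employees, i.e. the manager review workload.
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Lowering the threshold raises recall (fewer missed leavers) at the cost of more false positives, which directly means more manager review workload.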
- Overview
- Workforce Explorer
- EDA & Patterns
- Model & Threshold Lab
- Explainability
- Manager Action View
- Methods & Limitations
- PACE Navigator if you want the advanced reviewer layer
PACE means Plan, Analyze, Construct, and Execute. In this repo it is a project map that helps organize the work:
- Plan: frame the business question and responsible-use boundaries.
- Analyze: explore workforce patterns and data signals.
- Construct: build the model story, artifacts, threshold view, and explanations.
- Execute: present review-ready outputs and manager-facing decision support.
The PACE Navigator includes advanced review surfaces built from prepared project files.
- Retrieval pack: selected project metadata and documentation are converted into traceable chunks.
- Embedding index: those chunks can be embedded locally through the OpenAI API when the user supplies their own key.
- Answer viewer: fixed review questions plus retrieval depth can retrieve relevant chunks and assemble structured, citation-backed answers.
- Source preview: small eligible source files can be previewed safely from governed paths.
- Audit exports: reviewer summaries can be exported as markdown, text, or JSON.
This is retrieval-backed review support, not a chatbot and not open-ended answer generation.
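Under the hood, this style of retrieval-backed answering typically reduces to ranking prepared chunks by embedding similarity and keeping each hit's source path for citation. A hedged sketch, where the chunk schema is assumed rather than taken from the repo:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, index, k=3):
    """Rank chunks by similarity to the query; keep chunk id and source path for citations."""
    ranked = sorted(index, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [(c["chunk_id"], c["source_path"]) for c in ranked[:k]]
```

Because every returned hit carries a `source_path`, assembled answers stay traceable back to the governed project files they quote.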
The repo includes workflow contracts and an Airflow-ready scaffold so reviewers can inspect how future offline jobs could be organized. Streamlit does not run Airflow jobs.
The agent shell is a controlled plan-preview surface. It maps fixed request types to known workflows and blockers. It does not execute workflows, trigger background jobs, or act autonomously.
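One way such a controlled shell can be structured is a static mapping from fixed request types to plans and blockers, so nothing is ever executed. The request-type and workflow names below are invented for illustration:

```python
# Hypothetical request-type -> plan mapping; nothing here runs a workflow.
PLAN_PREVIEWS = {
    "rebuild_artifacts": {
        "workflow": "artifacts_v2_build",
        "blockers": ["runs offline only; no scheduler inside Streamlit"],
    },
    "refresh_retrieval_pack": {
        "workflow": "navigator_pack_build",
        "blockers": ["requires an offline builder run"],
    },
}


def preview(request_type):
    """Return the mapped plan, or a refusal for any unknown request type."""
    return PLAN_PREVIEWS.get(
        request_type, {"workflow": None, "blockers": ["unsupported request type"]}
    )
```

Keeping the mapping static is what makes the shell a preview surface rather than an agent: unknown requests get a refusal, and known ones only describe what an offline run would do.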
```shell
pip install -r requirements.txt
streamlit run app/app.py
```

Some advanced retrieval-backed reviewer features require a local OpenAI API key supplied through environment variables such as `RAG_STREAMLIT_OPENAI_API_KEY` or `OPENAI_API_KEY`. No API key or secret should be committed to this repository.
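A minimal sketch of how that key lookup could work, with retrieval features simply staying disabled when no key is present (the precedence order is an assumption, not the app's documented behavior):

```python
import os


def resolve_api_key(env=os.environ):
    """Return a user-supplied OpenAI key, preferring the app-specific variable."""
    return env.get("RAG_STREAMLIT_OPENAI_API_KEY") or env.get("OPENAI_API_KEY")


# Retrieval-backed pages can gate themselves on this flag instead of failing.
retrieval_enabled = resolve_api_key() is not None
```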
- `app/`: Streamlit app, pages, loaders, services, and view models.
- `data/`: checked-in HR dataset used by the app.
- `outputs/figures/`: stable project figures used in app pages.
- `artifacts/v2/`: generated model and explanation artifacts consumed by the app.
- `navigator/`: registries, retrieval packs, readiness contracts, and advanced review metadata.
- `scripts/`: offline builders and validation scripts.
- `docs/`: walkthroughs and Navigator implementation notes.
Production-like ideas in this repo include clear runtime boundaries, generated artifact contracts, validation scripts, citation-backed review, and responsible-use language.
Portfolio/demo-only boundaries include no live HR data feed, no production scheduler, no autonomous agent execution, no employee action workflow, and no hidden model retraining.