The Train section has 3 control panels above the Train button:
- Train/Test Split slider — drag between 60/40 and 90/10
- Hyperparameter sliders — SVM C, KNN k, Decision Tree max depth
- Feature toggles — switch any of the 8 input features on/off
After training you get a full dashboard: model comparison bar chart, all 4 ROC curves on one chart, and per-model metric cards.
Enter 8 clinical values → animated risk gauge + 4 model vote cards + auto risk summary.
DIA PREDICT is a full-stack ML web application that trains 4 scikit-learn classifiers on the Pima Indians Diabetes Dataset and lets you explore, compare, and tune them entirely in the browser — no terminal needed after setup.
The app has 10 sections accessible from the navbar:
| Section | What you see |
|---|---|
| Home | Hero with project intro and Get Started CTA |
| Train | Dataset stats, control panels, one-click training |
| Results | 4 model cards with live metrics and verdict chips |
| ROC | All 4 ROC curves on one chart with AUC labels |
| Confusion Matrix | Per-model matrix + full classification report |
| Learning Curves | Train vs. CV score as training data grows |
| Feature Importance | Ranked table with animated bars, diabetic vs. healthy averages |
| Feature Scatter | Interactive 2D scatter coloured by outcome |
| Predict | Patient form → ensemble vote → animated gauge |
| History | All past predictions, clickable to reload, CSV export |
| Model | Kernel / Config | Tunable via slider |
|---|---|---|
| SVM | RBF kernel | C (regularization strength) |
| KNN | Distance-weighted | k (number of neighbors, odd values) |
| Logistic Regression | L2 regularization | — |
| Decision Tree | CART algorithm | max_depth |
All 4 train simultaneously. Results update all cards, charts, and the confusion matrix instantly.
1 — Train/Test Split slider Drag from 60% training → 90% training. The model cards and all metrics update to reflect the new split. Shows the bias-variance tradeoff hands-on: too little training data → underfit; too little test data → unreliable evaluation.
2 — Hyperparameter sliders
- SVM C (0.1 → 10): higher C = less regularization = risk of overfitting
- KNN k (1 → 21, odd): lower k = more sensitive, higher k = smoother boundary
- Tree depth (2 → 15): deeper = more complex model = risk of memorizing training data
3 — Feature toggles Switch any of the 8 clinical features on or off before training. Minimum 2 required. Removing weak features can improve generalization; removing important ones (like Glucose) shows a clear accuracy drop — a great way to understand feature importance.
ROC Curves All 4 model curves on one chart. Each label shows the model's AUC score. A dashed diagonal represents a random classifier (AUC = 0.5). A perfect model hugs the top-left corner.
Confusion Matrix Switch between models using tab buttons. Shows True Positive, False Positive, False Negative, True Negative counts with a full classification report table (precision, recall, F1, support per class).
Learning Curves For each model: how does accuracy change as training data grows? Training score (pink) vs. 5-fold cross-validation score (blue). A large gap = overfitting. Converging lines = good generalization.
Model Comparison Bar Chart Side-by-side bars for Accuracy, F1, and ROC-AUC across all 4 models. Makes it easy to spot which model performs best on each metric.
Feature Importance Table Ranked by absolute Logistic Regression coefficients. Each row shows: importance score, animated bar, diabetic average, healthy average, and % difference. Glucose and BMI consistently top the rankings.
Feature Scatter Plot Pick any 2 features from dropdowns — the chart plots all 768 patients coloured by outcome (pink = diabetic, grey = healthy). Great for spotting clusters and correlations.
Enter values for all 8 features:
| Field | Normal Range |
|---|---|
| Pregnancies | 0 – 17 |
| Glucose (mg/dL) | 50 – 200 |
| Blood Pressure (mmHg) | 40 – 130 |
| Skin Thickness (mm) | 0 – 99 |
| Insulin (μU/mL) | 0 – 846 |
| BMI | 15 – 67 |
| Diabetes Pedigree Function | 0.08 – 2.5 |
| Age | 21 – 81 |
You get back:
- An animated semicircle gauge showing ensemble risk probability (green → amber → pink)
- 4 vote cards — one per model with its individual probability and Safe / Risk verdict
- A risk summary that auto-flags high glucose (>126), elevated BMI (>30), age risk (>40), multiple pregnancies (>5), and high pedigree function (>0.6)
Every prediction is saved in the session. The history panel shows glucose, BMI, age, outcome tag, and probability for each run. Click any row to reload those values into the prediction form. Export all predictions as a CSV file.
| Model | Accuracy | F1 | Precision | Recall | ROC-AUC |
|---|---|---|---|---|---|
| SVM (RBF, C=1) | ~79% | ~0.67 | ~0.72 | ~0.63 | ~0.84 |
| Logistic Regression | ~77% | ~0.65 | ~0.70 | ~0.61 | ~0.83 |
| KNN (k=9) | ~75% | ~0.62 | ~0.68 | ~0.57 | ~0.81 |
| Decision Tree (depth=5) | ~74% | ~0.60 | ~0.66 | ~0.55 | ~0.79 |
Numbers vary slightly each run. Use the hyperparameter sliders to push them higher.
| Layer | Technology |
|---|---|
| Backend | Python 3.11, Flask |
| ML | scikit-learn — SVC, KNeighborsClassifier, LogisticRegression, DecisionTreeClassifier |
| Data processing | Pandas, NumPy |
| Frontend | HTML5, Bootstrap 5, Chart.js |
| Fonts | Google Fonts — Montserrat |
| Icons | Font Awesome 6 |
| Deployment | Vercel (serverless) |
# 1. Clone
git clone https://github.com/Parisaroozgarian/Diabetes-Prediction-Platform.git
cd Diabetes-Prediction-Platform
# 2. Virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
.\venv\Scripts\activate # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Run
python diabetes.pyOpen http://localhost:5000 in your browser.
Important: click Train All Models first before using the Predict section.
Diabetes-Prediction-Platform/
├── diabetes.py # Flask app + all ML logic
│ # /api/train — train with custom params
│ # /api/predict — patient inference
│ # /api/results — return cached results
├── requirements.txt # flask, scikit-learn, pandas, numpy
├── runtime.txt # python-3.11
├── vercel.json # Vercel serverless routing config
├── static/
│ ├── app.js # All frontend: charts, sliders, toggles, predict, history
│ ├── style.css # Custom styles + hero gradient fallback
│ ├── diabetes.csv # Pima Indians dataset — 768 patients, 8 features
│ └── images/ # Model icons, hero background image
├── templates/
│ └── index.html # Single-page app — all 10 sections
└── Jupyter/ # Exploratory notebooks (one per model)
├── svm-pima-indians-diabetes-database.ipynb
├── logisticregression.ipynb
├── decisiontreeclassifier.ipynb
└── kneighborsclassifier-*.ipynb
Train all 4 models. All parameters are optional.
{
"test_size": 0.2,
"hyperparams": {
"svm_C": 1.0,
"knn_k": 9,
"dt_depth": 5
},
"features": [0, 1, 2, 3, 4, 5, 6, 7]
}Returns: accuracy, F1, ROC-AUC, precision, recall, confusion matrix, classification report, learning curve data, and ROC curve points for all 4 models.
Run all trained models on one patient record.
{
"pregnancies": 3,
"glucose": 120,
"bloodpressure": 70,
"skinthickness": 20,
"insulin": 80,
"bmi": 32.0,
"diabetespedigreefunction": 0.47,
"age": 33
}Returns:
{
"probs": { "svm": 0.42, "lr": 0.38, "knn": 0.45, "dt": 0.40 },
"preds": { "svm": 0, "lr": 0, "knn": 0, "dt": 0 },
"ensemble": 0.41
}Returns the cached training results without retraining.
npm i -g vercel
vercelMIT License — see LICENSE for details.
Parisa Roozgarian
- Pima Indians Diabetes Database — UCI ML Repository
- scikit-learn · Flask · Chart.js · Bootstrap 5
Made with ❤️ by Parisa Roozgarian