🚀 Live App: breast-cancer-prediction-app-by-rudra.streamlit.app
This is a Streamlit-based web application that predicts whether a breast tumor is benign or malignant using a trained logistic regression model. Built using the Breast Cancer Wisconsin Diagnostic Dataset, this project demonstrates the complete data science lifecycle — from problem definition to model deployment.
🎯 Objective: Build a reliable, interactive, and modular diagnostic tool that aids in early breast cancer detection using medical imaging features.
- ✅ Demonstrates end-to-end ML workflow
- ✅ Real-time prediction app with deployed UI
- ✅ Modular, readable, and scalable Python code
- ✅ Excellent use case for health tech/AI in diagnostics
- Predict tumor type (Benign or Malignant)
- Real-time probability scores
- User-friendly UI with input sliders
- Modular codebase for easy scalability
- Live deployment on Streamlit Cloud
├── assets/ # Visual assets (e.g., logos, screenshots)
├── data/ # Dataset and derived files
├── model/ # Trained model (Pickle file)
├── notebooks/ # Exploratory and preprocessing notebooks
│ ├── p1-understand-the-data.ipynb
│ ├── p2-eda.ipynb
│ └── p3-outliers.ipynb
├── utils/ # Custom modules for charts, sidebar, data
│ ├── charts.py
│ ├── data_model.py
│ └── sidebar.py
├── streamlit_app.py # Main app entry point
├── requirements.txt # Python dependencies
└── README.md # Project documentation
The model uses 30 numeric features computed from digitized images of cell nuclei. Ten base measurements are each summarized three ways (mean, standard error, and worst value), including:
- Mean: Radius, Texture, Perimeter, Area, Smoothness
- Standard Error: Radius SE, Perimeter SE, Concavity SE, etc.
- Worst (largest): Texture worst, Area worst, Symmetry worst, etc.
Each of these is captured using an intuitive sidebar interface in the app.
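The feature layout and the sidebar inputs can be sketched roughly as follows. This is a minimal illustration, not the code in `utils/sidebar.py`: the column naming (`radius_mean`, `radius_se`, `radius_worst`, modeled on the Kaggle CSV style), the helper names `feature_names`, `slider_specs`, and `render_sidebar`, and the choice of the column mean as slider default are all assumptions.

```python
# Ten base measurements, each reported as mean, standard error, and worst value.
BASE_MEASUREMENTS = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave points", "symmetry",
    "fractal_dimension",
]
GROUPS = ["mean", "se", "worst"]


def feature_names():
    """Return the 30 feature names (assumed naming: '<measurement>_<group>')."""
    return [f"{base}_{group}" for group in GROUPS for base in BASE_MEASUREMENTS]


def slider_specs(rows):
    """Derive (min, max, default) per feature from a list of row dicts.

    The default is the column mean, which is an assumption for this sketch.
    """
    specs = {}
    for name in feature_names():
        values = [float(row[name]) for row in rows]
        specs[name] = (min(values), max(values), sum(values) / len(values))
    return specs


def render_sidebar(specs):
    """Draw one sidebar slider per feature and return the chosen values."""
    import streamlit as st  # imported lazily so the helpers above work without Streamlit

    st.sidebar.header("Cell nuclei measurements")
    return {
        name: st.sidebar.slider(label=name, min_value=lo, max_value=hi, value=default)
        for name, (lo, hi, default) in specs.items()
    }
```

Inside a Streamlit script, `render_sidebar(slider_specs(rows))` would return a dictionary of the 30 values selected by the user, ready to be passed to the model in a fixed column order.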
- Algorithm: Logistic Regression
- Library: scikit-learn
- Training Dataset: UCI Breast Cancer Diagnostic Data
- Model File: Saved as `model.pkl` for production use
Note: The pickle file is version-sensitive. Load the model with a compatible scikit-learn version (1.6.1 recommended); a pickle written by one scikit-learn release is not guaranteed to load correctly under another.
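One way to guard against a version mismatch is to check the installed scikit-learn release before unpickling. The sketch below is an assumption-laden illustration: the path `model/model.pkl`, the helper names, and the "same major.minor" rule are choices made for this example, not a guarantee given by scikit-learn or by this project.

```python
import pickle
import warnings
from importlib import metadata

EXPECTED_SKLEARN = "1.6.1"  # version recommended in the note above


def installed_sklearn_version():
    """Return the installed scikit-learn version string, or None if absent."""
    try:
        return metadata.version("scikit-learn")
    except metadata.PackageNotFoundError:
        return None


def versions_compatible(installed, expected=EXPECTED_SKLEARN):
    """Heuristic: treat versions as compatible when major.minor match."""
    if installed is None:
        return False
    return installed.split(".")[:2] == expected.split(".")[:2]


def load_model(path="model/model.pkl"):
    """Load the pickled model, warning first if the versions look mismatched.

    Only unpickle files you trust: pickle can execute arbitrary code on load.
    """
    installed = installed_sklearn_version()
    if not versions_compatible(installed):
        warnings.warn(
            f"scikit-learn {installed} found, {EXPECTED_SKLEARN} recommended; "
            "the pickled model may fail to load or behave differently."
        )
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Pinning the scikit-learn version in `requirements.txt` achieves the same goal at install time; the runtime check simply makes a mismatch visible when it happens.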
This project was developed following an end-to-end data science pipeline:
- Problem Definition
- Data Collection & Cleaning
- Exploratory Data Analysis (EDA)
- Outlier Handling & Feature Engineering
- Model Building & Evaluation
- Prediction Pipeline Creation
- Web App Development using Streamlit
- Deployment on Streamlit Cloud
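The model building, evaluation, and serialization steps above might look roughly like the sketch below. It is not the project's training code: it uses the copy of the Wisconsin diagnostic data bundled with scikit-learn rather than the files in `data/`, and the `StandardScaler` step, the 80/20 split, and the function name `train_and_evaluate` are assumptions made for illustration.

```python
import pickle

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(random_state=42):
    """Fit a scaled logistic regression and report held-out accuracy."""
    # In scikit-learn's bundled copy, target 0 = malignant and 1 = benign.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    # Scaling is an assumed preprocessing step; it helps the solver converge.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


model, accuracy = train_and_evaluate()
print(f"Held-out accuracy: {accuracy:.3f}")

# Serialize and restore in memory; writing the bytes to a file works the same way.
model_bytes = pickle.dumps(model)
restored = pickle.loads(model_bytes)
```

Keeping the scaler and the classifier in one pipeline means a single pickle carries the full prediction path, so the app does not need to repeat the preprocessing by hand.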
| Tool/Library | Usage |
|---|---|
| Python | Core programming language |
| Pandas / NumPy | Data manipulation |
| Matplotlib / Seaborn | Data visualization |
| Plotly | Interactive data visualization |
| Scikit-learn | Model building & evaluation |
| Streamlit | Web interface & deployment |
| Pickle | Model serialization |
Example inputs include cell features like radius, area, smoothness, etc.
The prediction result is shown instantly with confidence levels.
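The confidence levels come from how logistic regression works: a weighted sum of the inputs is passed through the sigmoid function, which yields a probability between 0 and 1. The sketch below shows that arithmetic with made-up numbers. The weights, bias, three-feature input, 0.5 threshold, and the choice of "Malignant" as the positive class are all hypothetical and do not come from the trained model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def positive_class_probability(features, weights, bias):
    """Probability of the positive class for one sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def describe(probability, threshold=0.5):
    """Map a probability to a label plus both class probabilities."""
    label = "Malignant" if probability >= threshold else "Benign"
    return label, probability, 1.0 - probability


# Hypothetical coefficients and standardized inputs, for illustration only.
weights = [1.2, -0.4, 0.8]
bias = -0.5
features = [0.9, 0.1, 0.3]

label, p_malignant, p_benign = describe(
    positive_class_probability(features, weights, bias)
)
print(f"{label}: malignant={p_malignant:.2%}, benign={p_benign:.2%}")
```

With a fitted scikit-learn model, `predict_proba` performs the equivalent computation and returns both class probabilities directly.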
- Clone the repository: `git clone https://github.com/Rudra-G-23/breast-cancer-prediction-app.git`, then `cd breast-cancer-prediction-app`
- Install dependencies: `pip install -r requirements.txt`
- Launch the app: `streamlit run streamlit_app.py`
- 📘 Streamlit Deployment Guide
- 📊 Kaggle Dataset
- 🔬 Feature Engineering Notebook
- 📈 Modeling Example
- 🎥 Inspiration
Rudra Prasad Bhuyan
📧 rudraprasadbhuyan000@gmail.com
🔗 GitHub | LinkedIn | Kaggle
