End-to-end customer churn prediction project using machine learning, with business insights and a deployment-ready pipeline.
Customer churn is a critical challenge in the telecom industry, directly impacting revenue and customer lifetime value. This project aims to predict customer churn and identify key factors influencing customer attrition, enabling proactive retention strategies.
- Telco Customer Churn Dataset
- Total Records: 7,043 customers
- Target Variable: Churn (Yes/No)
The project follows an end-to-end machine learning workflow:
- Data cleaning and preprocessing
- Exploratory Data Analysis (EDA)
- Feature engineering
- Model building using machine learning algorithms
- Model evaluation and comparison
- Feature importance analysis
- Business recommendations
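
The cleaning and preprocessing steps above could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the column names (`tenure`, `MonthlyCharges`, `TotalCharges`, `Contract`, `Churn`) are assumed to follow the public Telco dataset, the six rows are invented, and the `clean` helper is hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative frame: values are made up, column names are assumed.
raw = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10],
    "TotalCharges": ["29.85", "1889.5", "108.15", " ", "820.5", "1949.4"],
    "Contract": ["Month-to-month", "One year", "Month-to-month",
                 "One year", "Month-to-month", "Two year"],
    "Churn": ["No", "No", "Yes", "No", "Yes", "No"],
})


def clean(df):
    """Hypothetical cleaning step: fix dtypes and encode the target."""
    out = df.copy()
    # Blank strings cannot be parsed as numbers, so they become NaN here.
    out["TotalCharges"] = pd.to_numeric(out["TotalCharges"], errors="coerce")
    out["Churn"] = out["Churn"].map({"Yes": 1, "No": 0})
    return out


numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
categorical_cols = ["Contract"]

# Impute and scale numeric columns, one-hot encode categorical ones.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

cleaned = clean(raw)
X = preprocessor.fit_transform(cleaned.drop(columns="Churn"))
y = cleaned["Churn"].to_numpy()
```

Keeping imputation, scaling, and encoding inside a `ColumnTransformer` means the same transformations are applied at training and at scoring time.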
- Logistic Regression (baseline model)
- Random Forest Classifier (final selected model)
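
A baseline-versus-final comparison of these two models might be set up as in the sketch below. It runs on synthetic data from `make_classification` rather than the churn dataset, and the hyperparameters, class weights, and split ratio are assumptions chosen only to make the example run.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data, with an arbitrary class imbalance.
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.75, 0.25], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameters are illustrative assumptions, not the project's settings.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # ROC-AUC is computed from the predicted probability of the positive class.
    proba = model.predict_proba(X_test)[:, 1]
    auc_scores[name] = roc_auc_score(y_test, proba)

for name, score in auc_scores.items():
    print(f"{name}: ROC-AUC = {score:.3f}")
```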
- ROC-AUC Score
- Precision, Recall, F1-score
- Confusion Matrix
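
The metrics listed above can all be computed with `sklearn.metrics`. The sketch below uses ten hand-written labels and scores (invented for illustration, not project results) and assumes a 0.5 decision threshold.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Invented labels (1 = churned) and predicted churn probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.4, 0.2, 0.9, 0.5])

# Assumed decision threshold; in practice it can be tuned to favor recall.
y_pred = (y_score >= 0.5).astype(int)

auc = roc_auc_score(y_true, y_score)       # threshold-independent
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)      # rows: actual, columns: predicted

print(f"ROC-AUC={auc:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} F1={f1:.3f}")
print(cm)
```

ROC-AUC is computed from the raw scores, while precision, recall, F1, and the confusion matrix depend on the chosen threshold.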
- Customers on month-to-month contracts have the highest churn risk
- Low-tenure customers are more likely to churn
- Higher monthly charges are associated with a significantly higher churn probability
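
Segment-level findings like these are typically checked by comparing churn rates across groups. The sketch below shows one way to do that with pandas; the eight rows are invented purely so the code runs and do not reproduce the project's figures, and the column names and tenure bands are assumptions.

```python
import pandas as pd

# Invented toy data; Churn is already encoded as 1 (yes) / 0 (no).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year", "Two year", "Month-to-month"],
    "tenure": [1, 5, 30, 60, 3, 24, 48, 12],
    "Churn": [1, 1, 0, 0, 0, 0, 0, 1],
})

# Mean of a 0/1 target within each group is that group's churn rate.
churn_by_contract = (
    df.groupby("Contract")["Churn"].mean().sort_values(ascending=False)
)

# Bucket tenure (in months) into assumed bands before comparing churn rates.
df["tenure_band"] = pd.cut(
    df["tenure"], bins=[0, 12, 36, 72], labels=["0-12", "13-36", "37-72"]
)
churn_by_tenure = df.groupby("tenure_band", observed=True)["Churn"].mean()

print(churn_by_contract)
print(churn_by_tenure)
```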
- Introduce early engagement and onboarding programs for new customers
- Encourage customers to switch to long-term contracts through incentives
- Offer personalized plans and discounts to high-value customers
The final preprocessing and modeling pipeline was serialized using joblib, making it reusable for batch scoring or future production deployment.
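
A minimal sketch of that serialization round trip follows. The pipeline steps, the synthetic training data, and the file name `churn_pipeline.joblib` are assumptions for illustration; only the use of joblib comes from the project itself.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared churn features and labels.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Assumed pipeline layout: preprocessing followed by the final model.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pipeline.fit(X, y)

# Persist the fitted pipeline (hypothetical file name, temporary directory).
model_path = os.path.join(tempfile.mkdtemp(), "churn_pipeline.joblib")
joblib.dump(pipeline, model_path)

# Later, e.g. in a batch-scoring job: reload and score new rows.
restored = joblib.load(model_path)
batch_scores = restored.predict_proba(X[:5])[:, 1]
print(np.round(batch_scores, 3))
```

Because the preprocessing steps are saved together with the model, the scoring job only needs raw feature rows in the same column layout used for training. Loading should happen with library versions compatible with those used when the file was written.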
- Python
- Pandas, NumPy
- Scikit-learn
- Matplotlib, Seaborn
- Built an end-to-end customer churn prediction pipeline
- Identified high-risk customer segments using machine learning
- Translated model outputs into clear business recommendations
- Prepared a deployment-ready preprocessing and modeling pipeline