Fraud detection is a classic example of an extremely imbalanced classification problem, where fraudulent transactions represent only ~1–2% of total data.
In such cases, accuracy becomes misleading. A model predicting "no fraud" for every transaction would still achieve 98%+ accuracy.
This project focuses on building and evaluating models using appropriate metrics and threshold tuning strategies.
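The accuracy trap described above is easy to demonstrate. The sketch below (a minimal illustration, not the project's code; the 2% rate and seed are assumptions) scores a classifier that always predicts "no fraud":

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Simulate 10,000 labels with a ~2% fraud rate (assumed for illustration)
rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.02).astype(int)

# A "model" that always predicts the majority class (no fraud)
y_pred = np.zeros_like(y_true)

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")  # ~0.98
print(f"Fraud F1: {f1_score(y_true, y_pred, zero_division=0):.3f}")  # 0.000
```

Despite ~98% accuracy, the model catches zero fraud, which is why this project evaluates with fraud-focused metrics instead.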
A synthetic dataset was generated using scikit-learn's make_classification, with:
- 10,000 transactions
- 10 features
- Severe class imbalance: ~1.6% fraud, ~98% normal transactions
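A dataset like the one described can be generated along these lines (the informative/redundant feature counts, flip_y noise, and seed are assumptions, not the project's exact parameters):

```python
from sklearn.datasets import make_classification

# Generate 10,000 transactions with 10 features and ~2% positives (fraud)
X, y = make_classification(
    n_samples=10_000,
    n_features=10,
    n_informative=6,      # assumed split of informative vs. redundant features
    n_redundant=2,
    weights=[0.98, 0.02], # class weights drive the imbalance
    flip_y=0.01,          # small amount of label noise (assumed)
    random_state=42,
)
print(X.shape)   # (10000, 10)
print(y.mean())  # observed fraud rate, roughly 0.02
```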
- Generated a highly imbalanced dataset
- Applied feature scaling using StandardScaler
- Compared two models:
  - Logistic Regression
  - Random Forest
- Evaluated using:
  - ROC-AUC
  - Precision-Recall curve
  - F1-score
- Tuned the classification threshold to optimize fraud detection performance
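The steps above can be sketched end to end. This is a minimal reconstruction under assumed dataset parameters and default model hyperparameters, not the project's exact pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data (parameters assumed for illustration)
X, y = make_classification(n_samples=10_000, n_features=10,
                           weights=[0.98, 0.02], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Fit the scaler on the training split only to avoid leakage
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

for name, model in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                    ("Random Forest", RandomForestClassifier(random_state=42))]:
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name}: ROC-AUC={roc_auc_score(y_te, proba):.2f}, "
          f"fraud F1={f1_score(y_te, model.predict(X_te)):.2f}")
```

Stratified splitting keeps the rare fraud class represented in both splits, which matters at a ~2% positive rate.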
- Logistic Regression:
  - ROC-AUC: 0.69
  - Poor precision on the fraud class
  - High false-positive rate
- Random Forest (default 0.5 threshold):
  - ROC-AUC: 0.81
  - Fraud F1-score: 0.31
  - Low recall on fraud
- After threshold tuning:
  - Fraud F1-score improved to 0.56
  - Fraud precision: 0.78
  - Fraud recall: 0.44
Threshold tuning significantly improved fraud detection performance without drastically increasing false positives.
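One common way to tune the threshold is to sweep the precision-recall curve and pick the operating point that maximizes fraud F1. This sketch assumes the same synthetic data and a Random Forest; it illustrates the technique rather than reproducing the project's exact procedure:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=10,
                           weights=[0.98, 0.02], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

# precision/recall have one more entry than thresholds, so drop the last point
precision, recall, thresholds = precision_recall_curve(y_te, proba)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = np.argmax(f1[:-1])
print(f"best threshold={thresholds[best]:.2f}, fraud F1={f1[best]:.2f}")
```

Predictions at the tuned threshold are then simply `(proba >= thresholds[best]).astype(int)` instead of the default 0.5 cutoff.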
- Accuracy is not reliable for imbalanced datasets
- Precision-Recall curve is more informative than ROC in rare-event detection
- Default threshold (0.5) is not always optimal
- Threshold tuning can dramatically improve real-world fraud detection systems
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
pip install -r requirements.txt