# 🚨 Phishing Detector ML

Machine Learning–Based Phishing URL Detection System with Threat Intelligence & Visualization

Created by farixzz

## 🚀 Live Demo

Click here to try the live application!


## 🖼️ Application Preview

Phishing Detector ML is a security tool that detects phishing URLs by combining a machine learning pipeline (TF-IDF features with a LightGBM classifier) with real-time threat intelligence from the VirusTotal API. It offers two interfaces: a Streamlit web GUI for interactive analysis and a CLI for automation and batch processing.


## ✨ Key Features

-   **📊 Batch Analysis & Threat Mapping (GUI):**
    -   Upload a CSV file containing hundreds of URLs for efficient batch processing.
    -   Visualize the geographic locations of detected phishing domains on an interactive Global Threat Map.
    -   Download the complete analysis results as a CSV file.
-   **🧠 Production-Grade Machine Learning Model:**
    -   A sophisticated pipeline combining a TF-IDF Vectorizer with a powerful LightGBM classifier, trained on a massive dataset of over 500,000 verified URLs.
    -   **Data-Driven Auto-Thresholding:** The detection threshold isn't a guess. It's automatically calculated using ROC curve analysis during each training cycle to maintain a security-first posture (optimized for >= 95% recall).
-   **🚩 Explainable AI (XAI):**
    -   When a phishing URL is detected, the tool provides a list of "Potential Red Flags" (e.g., presence of suspicious keywords, use of an IP address), explaining why a URL was flagged.
-   **📡 Real-Time API Intelligence:**
    -   Enhances ML predictions by cross-referencing URLs with the VirusTotal API, leveraging data from over 70 security vendors.
-   **🖥️ Dual Interface for All Users:**
    -   **Graphical User Interface (GUI):** An intuitive and polished web application built with Streamlit.
    -   **Powerful Command-Line Interface (CLI):** A feature-rich CLI for scripting, automation, and integration, supporting JSON and CSV output.
-   **🚨 Enterprise-Grade Alerting:**
    -   Includes a SIEM integration feature that can send high-confidence alerts in the standard Common Event Format (CEF), allowing it to plug into a professional Security Operations Center (SOC).
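
To make the CEF alerting concrete, here is a minimal sketch of how a high-confidence detection could be serialized as a single CEF line. This is not the code in `siem_alerter.py`. The function name `build_cef_alert`, the vendor, product, and version strings, the event ID `PHISH-001`, the 0.9 alert threshold, and the use of `cs1` for the score are all assumptions made for illustration. Only the `CEF:0|Vendor|Product|Version|Event ID|Name|Severity|Extension` layout and its escaping rules come from the CEF format itself.

```python
from typing import Optional


def escape_header(value: str) -> str:
    # In CEF header fields, backslashes and pipes are backslash-escaped.
    return value.replace("\\", "\\\\").replace("|", "\\|")


def escape_extension(value: str) -> str:
    # In CEF extension values, backslashes and equals signs are backslash-escaped.
    return value.replace("\\", "\\\\").replace("=", "\\=")


def build_cef_alert(url: str, probability: float,
                    alert_threshold: float = 0.9) -> Optional[str]:
    """Return one CEF line for a high-confidence detection, or None otherwise."""
    if probability < alert_threshold:
        return None
    severity = 10 if probability >= 0.99 else 8  # CEF severity ranges from 0 to 10
    header_fields = [
        "farixzz",                # Device Vendor (assumed value)
        "phishing-detector-ml",   # Device Product (assumed value)
        "1.0",                    # Device Version (assumed value)
        "PHISH-001",              # Event class ID (assumed value)
        "Phishing URL detected",  # Human-readable event name
        str(severity),
    ]
    extension = {
        "request": url,                      # standard CEF key for the requested URL
        "cs1Label": "phishingProbability",   # custom string field label (assumed use)
        "cs1": f"{probability:.4f}",
    }
    head = "|".join(["CEF:0"] + [escape_header(f) for f in header_fields])
    ext = " ".join(f"{k}={escape_extension(v)}" for k, v in extension.items())
    return f"{head}|{ext}"
```

A caller would typically forward the returned line to a syslog collector and skip URLs for which the function returns `None`.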

## 🏗️ System Architecture

```mermaid
flowchart LR

U[User / Analyst] --> UI[Streamlit UI / CLI]

UI --> N[URL Normalizer]
N --> F[Feature Extraction]

F --> M[ML Pipeline<br/>TF-IDF + LightGBM]
M --> P[Probability Scoring<br/>+ Auto Threshold]

P --> E[Explainability Engine<br/>Red Flags]

P --> TI[Threat Intelligence<br/>VirusTotal API]
P --> G[Threat Map<br/>Geo Visualization]
P --> S[SIEM Alerts<br/>CEF Format]

TI --> UI
G --> UI
E --> UI
```
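
The "Probability Scoring + Auto Threshold" stage can be illustrated with a small dependency-free sketch. It picks the highest score threshold that still reaches a target recall on labelled validation data, which is the same operating point one would read off a ROC curve (for example from scikit-learn's `roc_curve`). The function names `pick_threshold` and `recall_and_fpr` are hypothetical, and this is not the code in `train_model.py`; it only assumes that label `1` means phishing and that a URL is flagged when its score is at or above the threshold.

```python
import math
from typing import Sequence, Tuple


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   min_recall: float = 0.95) -> float:
    """Return the highest threshold whose recall on y_true is >= min_recall."""
    # Scores of the true phishing samples, highest first.
    positives = sorted((s for y, s in zip(y_true, scores) if y == 1), reverse=True)
    if not positives:
        raise ValueError("at least one phishing (label 1) sample is required")
    # Number of phishing samples that must be caught to reach the target recall.
    needed = min(len(positives), max(1, math.ceil(min_recall * len(positives))))
    # The needed-th highest phishing score is the largest threshold that catches them.
    return positives[needed - 1]


def recall_and_fpr(y_true: Sequence[int], scores: Sequence[float],
                   threshold: float) -> Tuple[float, float]:
    """Recall (true-positive rate) and false-positive rate at a given threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    return tp / pos, (fp / neg if neg else 0.0)
```

Favoring recall this way accepts more false positives in exchange for missing fewer phishing URLs, which matches the security-first posture described in the feature list.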

## 🛠️ Technology Stack

-   **Backend & ML:** Python, Scikit-learn, LightGBM, Pandas, Joblib
-   **GUI:** Streamlit
-   **CLI:** Click
-   **Visualization:** Folium, Streamlit-Folium
-   **APIs & Data:** Requests, tldextract
-   **Packaging:** PyInstaller
-   **Version Control:** Git, Git LFS

---

## 🚀 Setup and Installation

1.  **Clone this repository:**
    ```bash
    # If you forked the project, replace the URL with your fork's URL
    git clone https://github.com/farixzz/phishing-detector-ml.git
    cd phishing-detector-ml
    ```

2.  **Set up the environment:**
    ```bash
    python3 -m venv venv
    source venv/bin/activate    # Linux/macOS
    # On Windows, run instead: venv\Scripts\activate
    ```

3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

4.  **Configure API Keys:**
    -   Create a copy of `config.py.template` and rename it to `config.py`.
    -   Open `config.py` and add your free API key from **VirusTotal**.
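
For orientation, the key configured above is what authenticates requests to VirusTotal. The sketch below shows one way a URL report could be fetched from the VirusTotal v3 API using only the standard library. It is not the code in `api_checker.py` (the project lists Requests in its stack), and the function names are hypothetical. How `config.py` names the key is not shown in this README, so the key is simply passed in as an argument.

```python
import base64
import json
import urllib.request

VT_URL_ENDPOINT = "https://www.virustotal.com/api/v3/urls/{}"


def vt_url_id(url: str) -> str:
    """VirusTotal v3 URL identifier: URL-safe base64 of the URL without padding."""
    return base64.urlsafe_b64encode(url.encode()).decode().strip("=")


def summarize_stats(report: dict) -> dict:
    """Reduce a parsed v3 URL report to flagged and total vendor counts."""
    stats = report["data"]["attributes"]["last_analysis_stats"]
    flagged = stats.get("malicious", 0) + stats.get("suspicious", 0)
    return {"flagged": flagged, "total": sum(stats.values())}


def lookup_url(url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch the existing VirusTotal report for a URL (performs a network call)."""
    request = urllib.request.Request(
        VT_URL_ENDPOINT.format(vt_url_id(url)),
        headers={"x-apikey": api_key},  # the API key travels in this header
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return summarize_stats(json.load(response))
```

`urlopen` raises `urllib.error.HTTPError` on error responses, such as a rejected key or a URL VirusTotal has no report for, so a real caller would want to catch that and fall back to the ML verdict alone.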

---

## 💻 How to Use

1. **Graphical User Interface (GUI)**

- Launch the Streamlit web application for the most user-friendly experience.

```bash
streamlit run app.py
```

2. **Command-Line Interface (CLI)**

- The CLI is ideal for automation and batch processing.

```bash
python main.py --url "https://www.example.com"
```

- Analyze URLs from a File and Save Results

```bash
python main.py --input-file urls.txt --output-file results.csv
```

- Get JSON Output for Scripting

```bash
python main.py --url "http://suspicious-site.com" --json-output
```

## 📂 Project Structure

```
phishing-detector-ml/
├── data/               # Raw datasets used for aggregation
├── models/             # Trained .joblib model (tracked with Git LFS)
├── .gitignore          # Git ignore rules (data/, config.py, etc.)
├── README.md           # Project documentation
├── aggregate_data.py   # Dataset aggregation & cleaning
├── api_checker.py      # VirusTotal API integration
├── app.py              # Streamlit GUI application
├── config.py.template  # Template for local API keys
├── detector.py         # Core analysis & prediction logic
├── geo_utils.py        # Threat map geolocation helper
├── main.py             # CLI entry point
├── requirements.txt    # Minimal, cloud-safe dependencies
├── siem_alerter.py     # CEF-based SIEM alert generator
├── train_model.py      # Model training with auto-threshold tuning
├── ui_helpers.py       # Explainability (Red Flags) logic
└── url_normalizer.py   # URL normalization & preprocessing
```
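
As an illustration of the kind of check the Red Flags logic is described as performing, here is a minimal sketch covering the two examples this README mentions: a raw IP address used as the host, and suspicious keywords in the URL. It is not the code in `ui_helpers.py`; the function name `red_flags` and the keyword list are assumptions chosen for the example.

```python
import ipaddress
from typing import List
from urllib.parse import urlsplit

# Illustrative keyword list; the project's actual list is not shown in this README.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def red_flags(url: str) -> List[str]:
    """Return human-readable reasons why a URL looks suspicious."""
    # urlsplit only finds the host when a scheme is present, so add one if missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    flags = []

    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary domain names
        flags.append("Host is a raw IP address")
    except ValueError:
        pass

    lowered = url.lower()
    hits = [kw for kw in SUSPICIOUS_KEYWORDS if kw in lowered]
    if hits:
        flags.append("Suspicious keywords: " + ", ".join(hits))

    return flags
```

Rule-based flags like these are independent of the model score, so they can explain a verdict without needing access to the classifier's internals.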

## 📄 License

## ⚠️ Disclaimer

This project is intended for educational and research purposes only. Do not use this tool for illegal, unethical, or unauthorized activities.

## ⭐ Author

farixzz

If you found this project useful, feel free to ⭐ the repository!
