Susreel7/SocialMedia_Hate_Speech_Detection

Hate Speech Detection 🚫🗣️

📌 Overview

This project is a machine learning application that detects and classifies hate speech in text. Using Natural Language Processing (NLP) techniques and a Decision Tree classifier, it labels a given text as hate speech, offensive language, or neither.

The project includes a complete machine learning pipeline, from data preprocessing and model training through evaluation to deployment as a user-friendly Streamlit web interface.

🚀 Features

  • Text Classification: Classifies text into three categories:
    • Hate Speech: Content that expresses hatred.
    • Offensive Language: Content that is offensive but not necessarily hate speech.
    • Neither: Neutral or non-offensive content.
  • Interactive Web App: A Streamlit-based interface for real-time predictions.
  • NLP Pipeline: Includes stemming, stop-word removal, and Count Vectorization.
  • Visualizations: Provides confusion matrices and classification reports for model evaluation.
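The preprocessing steps named above can be sketched as follows. This is a minimal illustration, not the code in ml_pipeline/data_preprocessing.py: the regular expressions, the function name clean_text, the choice of NLTK's PorterStemmer, and the use of scikit-learn's built-in English stop-word list (chosen here so that no NLTK corpus download is needed) are all assumptions. CountVectorizer then turns the cleaned strings into a sparse matrix of token counts.

```python
import re

from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, CountVectorizer

# Assumption: scikit-learn's built-in English stop-word list stands in for
# whatever list the repo uses; it requires no separate corpus download.
STOP_WORDS = ENGLISH_STOP_WORDS
STEMMER = PorterStemmer()  # assumed stemmer choice


def clean_text(text: str) -> str:
    """Lowercase, strip URLs and @mentions, keep letters only, drop stop words, stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"@\w+", " ", text)  # @mentions
    text = re.sub(r"[^a-z\s]", " ", text)  # anything that is not a-z or whitespace
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(STEMMER.stem(tok) for tok in tokens)


if __name__ == "__main__":
    docs = [
        "The runners were running past @someone https://t.co/xyz",
        "Offensive tweets get flagged quickly!",
    ]
    cleaned = [clean_text(doc) for doc in docs]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(cleaned)  # sparse document-term count matrix
    print(cleaned)
    print(vectorizer.get_feature_names_out())
    print(counts.toarray())
```

Passing the function as CountVectorizer(preprocessor=clean_text) is an alternative that keeps cleaning and vectorization in one object, so the same cleaning is applied to new text at prediction time.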

📂 Project Structure

Hate_Speech_Detection/
├── data/                      # Dataset files (e.g., tweets.csv)
├── ml_pipeline/               # Source code for the ML pipeline
│   ├── data_preprocessing.py  # Data cleaning and preprocessing
│   ├── deploy.py              # Streamlit application
│   ├── model_training.py      # Model training script
│   ├── model_evaluation.py    # Model evaluation script
│   ├── pipeline.py            # Main pipeline runner
│   └── ...
├── model/                     # Saved models (.pkl files)
├── Notebooks/                 # Jupyter notebooks for exploration
├── requirements.txt           # Python dependencies
├── setup.sh                   # Setup script
└── README.md                  # Project documentation

📊 Dataset

The project uses a dataset of tweets labeled for hate speech detection.

  • Source: Kaggle
  • Labels:
    • 0: Neither
    • 1: Hate Speech
    • 2: Offensive Language
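A minimal sketch of loading the labelled data and attaching the names above to the numeric classes, assuming pandas and a CSV with one text column and one integer label column. The default column names tweet and class, and the function name load_tweets, are hypothetical; check them against the actual file in data/.

```python
import pandas as pd

# Label encoding as listed in this README.
LABELS = {0: "Neither", 1: "Hate Speech", 2: "Offensive Language"}


def load_tweets(source, text_col="tweet", label_col="class"):
    """Read a labelled CSV (path or file-like object) and add readable label names.

    text_col and label_col are hypothetical defaults; match them to the real file.
    """
    df = pd.read_csv(source, usecols=[text_col, label_col])
    # Rows without any text cannot be classified, so drop them.
    df = df.dropna(subset=[text_col]).reset_index(drop=True)
    df["label_name"] = df[label_col].map(LABELS)
    return df
```

For example, df = load_tweets("data/tweets.csv") followed by df["label_name"].value_counts() gives a quick view of how the three classes are distributed.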

🛠️ Tech Stack

  • Language: Python
  • Libraries:
    • scikit-learn: For model building and evaluation.
    • pandas & numpy: For data manipulation.
    • nltk: For Natural Language Processing tasks.
    • streamlit: For the web application.
    • matplotlib & seaborn: For data visualization.

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/Susreel7/SocialMedia_Hate_Speech_Detection.git
    cd SocialMedia_Hate_Speech_Detection
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt

🏃‍♂️ Usage

Running the Web App

To launch the interactive Streamlit application:

streamlit run ml_pipeline/deploy.py

The app will open in your browser at http://localhost:8501.
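A prediction app of this kind loads the saved model and classifies whatever text the user enters. The sketch below shows only that core step, without the Streamlit UI. It assumes the .pkl file holds a single fitted scikit-learn Pipeline that accepts raw strings; the file name hate_speech_pipeline.pkl and the helper names load_model and predict_label are hypothetical. If the repo pickles the vectorizer and classifier separately, the text has to go through the vectorizer's transform before predict, and any cleaning applied during training has to be applied to the input as well.

```python
import pickle
from pathlib import Path

# Hypothetical artifact path: the repo keeps .pkl files under model/, but the
# exact file name used by ml_pipeline/deploy.py is not given in this README.
MODEL_PATH = Path("model") / "hate_speech_pipeline.pkl"

# Label encoding from the Dataset section.
LABELS = {0: "Neither", 1: "Hate Speech", 2: "Offensive Language"}


def load_model(path=MODEL_PATH):
    """Unpickle a fitted model. Only load pickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_label(model, text: str) -> str:
    """Classify one string with a fitted text pipeline and return the label name."""
    prediction = model.predict([text])[0]
    # Fall back to the raw prediction if it is not one of the known class ids.
    return LABELS.get(int(prediction), str(prediction))
```

Inside a Streamlit script, the input would typically come from st.text_area and the result would be shown with st.write after an st.button click; decorating load_model with st.cache_resource keeps Streamlit from reloading the pickle on every rerun.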

Retraining the Model

If you want to retrain the model with new data:

  1. Place your dataset in the data/ directory.
  2. Run the pipeline script:
    python ml_pipeline/pipeline.py

This will preprocess the data, train the model, evaluate it, and save the new artifacts in the model/ directory.
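The train-and-save part of that pipeline can be sketched with scikit-learn, assuming the model is a CountVectorizer feeding a DecisionTreeClassifier and is saved with pickle. The 80/20 stratified split, random_state=42, the function names, and the output path are illustrative choices, not values taken from ml_pipeline/model_training.py.

```python
import pickle
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


def build_pipeline(random_state: int = 42) -> Pipeline:
    """Bag-of-words token counts feeding a decision tree."""
    return Pipeline([
        ("vectorizer", CountVectorizer()),
        ("classifier", DecisionTreeClassifier(random_state=random_state)),
    ])


def train_and_save(texts, labels, out_path, test_size=0.2, random_state=42):
    """Split, fit, pickle the fitted pipeline, and return it with the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=random_state, stratify=labels
    )
    model = build_pipeline(random_state).fit(X_train, y_train)
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create model/ if missing
    with open(out_path, "wb") as f:
        pickle.dump(model, f)
    return model, X_test, y_test
```

For example, with hypothetical column names, train_and_save(df["tweet"], df["class"], "model/hate_speech_pipeline.pkl") fits and saves the pipeline (after any text cleaning) and returns the held-out split for evaluation. Bundling the vectorizer and classifier in one Pipeline means a single .pkl file carries both, so the vocabulary used at prediction time is the one the tree was trained on.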

📈 Model Performance

The model is evaluated using standard metrics:

  • Accuracy: ~88%
  • Precision, Recall, F1-Score: Per-class classification reports are generated when the pipeline is run.
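The evaluation step can be sketched with scikit-learn's metrics module. The function name evaluate and the fixed label order [0, 1, 2] (matching the encoding in the Dataset section) are assumptions; any fitted model with a predict method works.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Label order follows the encoding in the Dataset section (0, 1, 2).
LABEL_IDS = [0, 1, 2]
LABEL_NAMES = ["Neither", "Hate Speech", "Offensive Language"]


def evaluate(model, X_test, y_test):
    """Return overall accuracy, a per-class text report, and the confusion matrix."""
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(
        y_test, y_pred, labels=LABEL_IDS, target_names=LABEL_NAMES, zero_division=0
    )
    # Rows are true classes, columns are predicted classes, both in LABEL_IDS order.
    matrix = confusion_matrix(y_test, y_pred, labels=LABEL_IDS)
    return accuracy, report, matrix
```

Per-class recall is worth reading alongside the overall accuracy, since a single accuracy figure can hide weak performance on a less frequent class. The confusion matrix can be drawn with seaborn, for example sns.heatmap(matrix, annot=True, fmt="d", xticklabels=LABEL_NAMES, yticklabels=LABEL_NAMES).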

🤝 Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/YourFeature).
  5. Open a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Dataset provided by Kaggle.
  • Inspiration from various NLP research papers on hate speech detection.
