Hate Speech Detection 🚫🗣️

📌 Overview

This project is a machine learning application that detects and classifies hate speech in text. It uses Natural Language Processing (NLP) techniques and a Decision Tree classifier to label a given text as hate speech, offensive language, or neither.

The project covers the full machine learning pipeline, from data preprocessing to deployment through a Streamlit web interface.

🚀 Features

- Text Classification: Classifies text into three categories:
  - Hate Speech: Content that expresses hatred.
  - Offensive Language: Content that is offensive but not necessarily hate speech.
  - Neither: Neutral or non-offensive content.
- Interactive Web App: A Streamlit-based interface for real-time predictions.
- NLP Pipeline: Includes stemming, stop-word removal, and Count Vectorization.
- Visualizations: Provides confusion matrices and classification reports for model evaluation.
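
The NLP pipeline steps listed above (stop-word removal, stemming, count vectorization) can be illustrated with a minimal, dependency-free sketch. This is not the project's code: the stop-word set is a tiny hypothetical sample, `crude_stem` is a simplified suffix stripper standing in for a real stemmer such as NLTK's `PorterStemmer`, and `count_vectorize` mimics the idea behind scikit-learn's `CountVectorizer` rather than its API.

```python
import re
from collections import Counter

# Tiny illustrative stop-word sample; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "to", "of", "in", "this"}


def crude_stem(token):
    """Strip a few common suffixes; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def count_vectorize(docs):
    """Build a sorted vocabulary and one term-count row per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


docs = ["The cats are sleeping", "This cat jumped and jumped"]
vocab, matrix = count_vectorize(docs)
print(vocab)   # → ['cat', 'jump', 'sleep']
print(matrix)  # → [[1, 0, 1], [1, 2, 0]]
```

Each row of `matrix` is the feature vector a classifier would receive for the corresponding document.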

📂 Project Structure

Hate_Speech_Detection/
├── data/                   # Dataset files (e.g., tweets.csv)
├── ml_pipeline/            # Source code for the ML pipeline
│   ├── data_preprocessing.py # Data cleaning and preprocessing
│   ├── deploy.py           # Streamlit application
│   ├── model_training.py   # Model training script
│   ├── model_evaluation.py # Model evaluation script
│   ├── pipeline.py         # Main pipeline runner
│   └── ...
├── model/                  # Saved models (.pkl files)
├── Notebooks/              # Jupyter notebooks for exploration
├── requirements.txt        # Python dependencies
├── setup.sh                # Setup script
└── README.md               # Project documentation

📊 Dataset

The project uses a dataset of tweets labeled for hate speech detection.

Source: Kaggle
Labels:
- 0: Neither
- 1: Hate Speech
- 2: Offensive Language
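
When working with the numeric labels above, a small lookup keeps predictions readable. A minimal sketch: `LABEL_NAMES` and `decode_label` are hypothetical names, not identifiers from the repository, and the mapping simply mirrors the list above.

```python
# Mapping copied from the label list above; the names are illustrative.
LABEL_NAMES = {0: "Neither", 1: "Hate Speech", 2: "Offensive Language"}


def decode_label(label_id):
    """Return the readable class name for a numeric label."""
    try:
        return LABEL_NAMES[int(label_id)]
    except KeyError:
        raise ValueError(f"Unknown label id: {label_id!r}") from None


predictions = [2, 0, 1]
print([decode_label(p) for p in predictions])
# → ['Offensive Language', 'Neither', 'Hate Speech']
```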

🛠️ Tech Stack

Language: Python
Libraries:
- scikit-learn: For model building and evaluation.
- pandas & numpy: For data manipulation.
- nltk: For Natural Language Processing tasks.
- streamlit: For the web application.
- matplotlib & seaborn: For data visualization.

⚙️ Installation

Clone the repository:

git clone https://github.com/Susreel7/SocialMedia_Hate_Speech_Detection.git
cd SocialMedia_Hate_Speech_Detection

Create a virtual environment (optional but recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```

🏃‍♂️ Usage

Running the Web App

To launch the interactive Streamlit application:

streamlit run ml_pipeline/deploy.py

The app will open in your browser at http://localhost:8501.

Retraining the Model

If you want to retrain the model with new data:

Place your dataset in the data/ directory.
Run the pipeline script:
```
python ml_pipeline/pipeline.py
```

This will preprocess the data, train the model, evaluate it, and save the new artifacts in the model/ directory.
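
The internals of `pipeline.py` are not shown here, so the following is only a rough sketch of what a train-and-save step could look like with the libraries named in the tech stack. It assumes a scikit-learn `Pipeline` combining `CountVectorizer` and `DecisionTreeClassifier`; the toy texts, the temporary directory (standing in for `model/`), and the file name `hate_speech_model.pkl` are all made up for illustration.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy, made-up examples; the real pipeline reads its dataset from data/.
texts = [
    "what a lovely sunny morning",
    "thanks for sharing this recipe",
    "hateful placeholder aimed at a group",
    "another hateful placeholder about a group",
    "this traffic is damn annoying",
    "damn what a stupid referee",
]
labels = [0, 0, 1, 1, 2, 2]  # 0 = Neither, 1 = Hate Speech, 2 = Offensive Language

# Vectorize the text into token counts, then fit a decision tree on them.
model = make_pipeline(
    CountVectorizer(stop_words="english"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(texts, labels)

# Save to a temporary directory that stands in for model/ (hypothetical file name).
model_path = Path(tempfile.mkdtemp()) / "hate_speech_model.pkl"
with model_path.open("wb") as fh:
    pickle.dump(model, fh)

# Reload the artifact, as a deployment script would, and predict.
with model_path.open("rb") as fh:
    restored = pickle.load(fh)
print(restored.predict(texts))
```

Saving the vectorizer and classifier together as one pipeline object means the app only has to load a single file and can pass raw text straight to `predict`.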

📈 Model Performance

The model is evaluated using standard metrics:

Accuracy: ~88%
Precision, Recall, F1-Score: Detailed reports are generated during training.
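
For reference, the metrics named above can be computed from true and predicted labels as follows. This is a dependency-free sketch on made-up labels, not the project's evaluation code (which presumably relies on scikit-learn), and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_report(matrix):
    """Return (precision, recall, f1) for each class from a confusion matrix."""
    report = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report.append((precision, recall, f1))
    return report


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for illustration (0 = Neither, 1 = Hate Speech, 2 = Offensive Language).
y_true = [0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 2, 1, 2, 2, 2, 0, 2]

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
print(cm)            # → [[1, 0, 1], [0, 1, 1], [1, 0, 3]]
print(accuracy(cm))  # → 0.625
for label, (p, r, f1) in enumerate(per_class_report(cm)):
    print(label, round(p, 2), round(r, 2), round(f1, 2))
```

Per-class precision and recall matter here because a single accuracy figure can hide weak performance on a less frequent class.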

🤝 Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature/YourFeature).
Commit your changes (git commit -m 'Add some feature').
Push to the branch (git push origin feature/YourFeature).
Open a Pull Request.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Dataset provided by Kaggle.
Inspiration from various NLP research papers on hate speech detection.
