COVID19-WEB-SCRAPER

A robust data scraping and visualization tool for monitoring live COVID-19 statistics in India, implemented using Python, Beautiful Soup, and Seaborn for scholarly analysis.

Source Code  ·  Technical Specification  ·  Kaggle Notebook  ·  Live Demo


Authors  ·  Overview  ·  Features  ·  Structure  ·  Quick Start  ·  Visualization  ·  Usage Guidelines  ·  License  ·  About  ·  Acknowledgments


Authors

Terna Engineering College | Computer Engineering | Batch of 2022

Amey Thakur  ·  ORCID

Hasan Rizvi  ·  GitHub

Important

🤝🏻 Special Acknowledgement

Special thanks to Hasan Rizvi for his meaningful contributions, guidance, and support that helped shape this work.


Overview

The COVID19-WEB-SCRAPER is a Python-based utility developed to provide real-time insights into the COVID-19 situation in India. It programmatically extracts live data from the Ministry of Health and Family Welfare (MOHFW) website and processes it into actionable visualizations.

Developed as a mini-project for the Open Source Tech Lab, this tool demonstrates the practical application of web scraping (BeautifulSoup), data manipulation (Pandas), and complex statistical visualization (Matplotlib & Seaborn).
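
As a rough illustration of the scrape-and-parse step described above, the sketch below parses a small, made-up HTML table with BeautifulSoup. The markup, the column names, and the `parse_state_table` helper are assumptions for illustration; the actual MOHFW page structure is not shown in this document and may differ. The figures are placeholders, not real statistics.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a state-wise table; placeholder values only.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>State/UT</th><th>Confirmed</th><th>Recovered</th><th>Deceased</th></tr>
  </thead>
  <tbody>
    <tr><td>State A</td><td>120</td><td>100</td><td>5</td></tr>
    <tr><td>State B</td><td>80</td><td>60</td><td>2</td></tr>
  </tbody>
</table>
"""


def parse_state_table(html):
    """Return one list of cell strings per body row of the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


rows = parse_state_table(SAMPLE_HTML)
print(rows)
```

In a live run the HTML string would come from an HTTP response (for example `requests.get(url).text`) rather than a constant, and the selectors would need to match whatever the target page actually serves.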

Note

Research Impact: The logic and architectural overview of this project are part of a curated Computer Engineering collection on ResearchGate.

Resources

| # | Resource | Description |
|---|----------|-------------|
| 1 | Kaggle Notebook | Interactive Jupyter Notebook environment |
| 2 | Technical Specification | Technical architecture and logic specification |
| 3 | Source Code | Core Python implementation files |
| 4 | OST Laboratory | Academic repository for Open Source Tech |

Tip

Scraping Efficiency

For large-scale data extraction or automated monitoring, the scraping logic can be hardened by using headless browser sessions for pages rendered with JavaScript and by introducing random sleep intervals between requests. Spacing requests out reduces load on the target server and lowers the risk of rate-limiting or IP blocking.
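
One possible way to add random sleep intervals is sketched below. The `fetch_politely` helper, its parameters, and the example URLs are hypothetical and not part of this repository; the fetch and sleep functions are injected so the sketch runs offline.

```python
import random
import time


def fetch_politely(urls, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random interval between calls."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            # Wait before every request except the first.
            sleep(random.uniform(min_delay, max_delay))
        results.append(fetch(url))
    return results


# Demonstration with stand-ins so the example runs offline and instantly:
# the fake fetch returns a string, and the fake sleep only records the delay.
recorded_delays = []
pages = fetch_politely(
    ["https://example.org/a", "https://example.org/b", "https://example.org/c"],
    fetch=lambda url: f"<html>{url}</html>",
    sleep=recorded_delays.append,
)
print(len(pages), "pages fetched with", len(recorded_delays), "pauses")
```

In real use, `fetch` could wrap an HTTP client call and the default `time.sleep` would apply the pauses.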


Features

| Feature | Description |
|---------|-------------|
| Live Scraping | Real-time data extraction from official MOHFW sources |
| Tabular Reports | Clean, formatted console representation using PrettyTable |
| Data Processing | Automated parsing and integer conversion via Pandas |
| Bar Plots | Comparative analysis of confirmed cases across different States/UTs |
| Donut Charts | Proportional distribution of Nationwide Confirmed, Recovered, and Deceased cases |
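
The Data Processing feature could look roughly like the following sketch, which assumes scraped cell values arrive as strings that may carry thousands separators or footnote marks. The column names, the sample values, and the `to_int_column` helper are illustrative assumptions, not the repository's actual code.

```python
import pandas as pd

# Placeholder rows; the real scraper's columns and values may differ.
df = pd.DataFrame(
    {
        "State/UT": ["State A", "State B", "State C"],
        "Confirmed": ["1,200", "80", "45#"],
    }
)


def to_int_column(series):
    """Strip every non-digit character, then convert the column to integers."""
    digits_only = series.astype(str).str.replace(r"\D", "", regex=True)
    return digits_only.astype(int)


df["Confirmed"] = to_int_column(df["Confirmed"])
print(df.sort_values("Confirmed", ascending=False))
```

Once the column is numeric it can be sorted, summed, or passed straight to the plotting functions.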

Tech Stack

  • Language: Python 3.x
  • Libraries: BeautifulSoup4, Requests, Pandas, Matplotlib, Seaborn, PrettyTable
  • Environment: Local Machine / Google Colab / Kaggle
  • Web Dashboard: HTML, CSS, JavaScript (GitHub Pages)

Project Structure

COVID19-WEB-SCRAPER/
│
├── docs/                                    # Formal Documentation
│   └── SPECIFICATION.md                     # Technical Architecture & Specification
│
├── Mini-Project/                            # Academic Reports
│   └── Outputs/                             # Generated Data Visualizations
│       ├── Statewise_Confirmed_Cases.jpg    # Statistical Bar Chart
│       └── Nationwide_Distribution.jpg      # Statistical Donut Chart
│
├── Source Code/                             # Core Implementation
│   ├── Scraper_Notebook.ipynb               # Jupyter Implementation
│   ├── Main_Scraper.py                      # Standalone Script
│   └── requirements.txt                     # Execution Dependencies
│
├── .gitattributes                           # Git Configuration
├── .gitignore                               # Git Ignore Rules
├── CITATION.cff                             # Citation Metadata
├── codemeta.json                            # Project Metadata (JSON-LD)
├── LICENSE                                  # MIT License
├── README.md                                # Main Documentation
└── SECURITY.md                              # Security Policy & Posture

Quick Start

Prerequisites

  • Python 3.x: Ensure the core interpreter is installed on your local environment.
  • Terminal: Access to a Bash shell or command prompt for manual execution.
  • Dependencies: Install the required analytical libraries using pip:
pip install pandas seaborn matplotlib requests beautifulsoup4 prettytable

Warning

Technical Posture & Ethics

This tool is designed for educational purposes. Web scraping is highly dependent on the target website's DOM structure; any modifications to the MOHFW portal may require iterative updates to the scraping logic. Always adhere to the target site's robots.txt and ethical data collection standards.
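
A robots.txt check can be done with Python's standard library, as in the minimal sketch below. The rules, the user-agent string, and the example URLs are hypothetical and do not reflect the real MOHFW robots.txt.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical rules for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

parser = build_parser(SAMPLE_ROBOTS)
allowed = parser.can_fetch("covid19-scraper", "https://example.org/index.html")
blocked = parser.can_fetch("covid19-scraper", "https://example.org/private/data.html")
print(allowed, blocked)
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the file directly; the scraper would then skip any URL for which `can_fetch` returns `False`.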

Installation & Deployment

  1. Clone the Repository
    Retrieve a local copy of the repository using the following Git commands:

    git clone https://github.com/Amey-Thakur/COVID19-WEB-SCRAPER.git
    cd COVID19-WEB-SCRAPER
  2. Environment Configuration
    From the repository root, verify that all dependencies listed under Prerequisites are installed.

  3. Execution
    Execute the scraping utility directly from the terminal:

    python "Source Code/Covid19_Web_Scraper.py"
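
The dependency check in step 2 could be automated along these lines. This is a sketch only: the `missing_modules` helper is hypothetical, and the module list is derived from the pip command under Prerequisites (note that `beautifulsoup4` is imported as `bs4`).

```python
from importlib.util import find_spec

# Import names for the packages in the pip command above.
REQUIRED_MODULES = ["pandas", "seaborn", "matplotlib", "requests", "bs4", "prettytable"]


def missing_modules(module_names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```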

Tip

Real-Time COVID-19 Statistical Visualization Dashboard

Access nationwide statistics and state-wise comparative analysis programmatically scraped from official health sources, optimized for high-fidelity data visualization and scholarly research.

Launch Live Dashboard


Visualization Results

Statewise Confirmed Cases (Bar Plot)

[Figure: bar plot of confirmed cases by State/UT]
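
A bar plot of this kind can be produced with Seaborn roughly as follows. The data frame holds placeholder values rather than real case counts, and the column names and output file name are assumptions for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder values for illustration only; not real case counts.
df = pd.DataFrame(
    {"State/UT": ["State A", "State B", "State C"], "Confirmed": [1200, 80, 45]}
)

fig, ax = plt.subplots(figsize=(8, 4))
# Numeric x and categorical y give horizontal bars, which keeps
# long State/UT names readable.
sns.barplot(data=df, x="Confirmed", y="State/UT", ax=ax)
ax.set_xlabel("Confirmed cases")
ax.set_title("Statewise Confirmed Cases")
fig.tight_layout()
fig.savefig("statewise_confirmed_cases.png")
```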

Nationwide Distribution (Donut Chart)

[Figure: donut chart of nationwide Confirmed, Recovered, and Deceased cases]
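
A donut chart is a pie chart whose wedges are narrower than the radius. The sketch below shows one way to draw it with Matplotlib; the totals are placeholders, not real statistics, and the output file name is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is required
import matplotlib.pyplot as plt

# Placeholder totals for illustration only.
labels = ["Confirmed", "Recovered", "Deceased"]
values = [1325, 1100, 25]

fig, ax = plt.subplots(figsize=(5, 5))
# A wedge width smaller than the default radius of 1 leaves the central hole.
wedges = ax.pie(
    values,
    labels=labels,
    autopct="%1.1f%%",
    startangle=90,
    wedgeprops={"width": 0.4},
)[0]
ax.set_title("Nationwide Distribution")
fig.savefig("nationwide_distribution.png")
```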


Usage Guidelines

This repository is openly shared to support learning and knowledge exchange across the academic community.

For Students
Use this mini-project as a reference for understanding web scraping logic, high-precision data normalization, and comparative statistical visualization using Python. The source code is documented to support self-paced learning and exploration of real-world data science challenges.

For Educators
This project may serve as a practical example or supplementary teaching resource for Open Source Tech Lab courses (CSL405). Attribution is appreciated when utilizing content.

For Researchers
The documentation and organization provide insights into academic project curation and educational software structuring.


License

This repository and all linked academic content are made available under the MIT License. See the LICENSE file for complete terms.

Note

Summary: You are free to use, modify, and distribute this content for any purpose, including commercially, provided the copyright notice and license text are included in all copies or substantial portions of the work.

Copyright © 2020 Amey Thakur, Hasan Rizvi


About This Repository

Created & Maintained by: Amey Thakur & Hasan Rizvi
Academic Journey: Bachelor of Engineering in Computer Engineering (2018-2022)
Institution: Terna Engineering College, Navi Mumbai
University: University of Mumbai

This project features the COVID19-WEB-SCRAPER, a terminal-based data analysis utility developed as a 4th-semester mini-project for the Open Source Tech Lab course. It showcases the practical application of web scraping logic, high-precision data normalization, and comparative statistical visualization.

Connect: GitHub  ·  LinkedIn  ·  Kaggle  ·  ORCID

Acknowledgments

Grateful acknowledgment to Hasan Rizvi for his exceptional collaboration and innovative contributions during the development of this project. His technical expertise in logical structuring, high-precision data normalization, and commitment to software quality were instrumental in building this robust web scraper. Learning alongside him was a transformative experience; his thoughtful approach to problem-solving and steady encouragement turned complex challenges into meaningful learning moments. This work reflects the growth and insights gained from our side-by-side academic journey. Thank you, Hasan, for everything you shared and taught along the way.

Grateful acknowledgment to the faculty members of the Department of Computer Engineering at Terna Engineering College for their guidance and instruction in Open Source Technology. Their expertise in collaborative development and Unix-like environments helped shape the technical foundation of this project.

Special thanks to the mentors and peers whose encouragement, discussions, and support contributed meaningfully to this learning endeavor.



Presented as part of the 4th Semester Mini-Project @ Terna Engineering College

