
Kuba27x/London-Smart-Meter-Analytics

⚡ London Smart Meters: End-to-End Big Data Analytics

Apache Spark Databricks Python Power BI DAX

📖 Project Overview

This project explores the energy consumption patterns of London households using 5.6 million smart meter readings, combined with historical weather data and UK demographic classifications (ACORN). The goal was to build a highly optimized, end-to-end data pipeline and an interactive Business Intelligence dashboard to uncover how weather conditions, holidays, and social classes impact the power grid.

📂 Data Sources

The data used in this project was sourced from the public Kaggle dataset: Smart Meters in London.

To build the final model, three separate data domains were extracted and combined:

  • Energy Consumption: Granular, half-hourly smart meter readings from London households (processed from multiple CSV blocks).
  • Historical Weather: Hourly weather metrics (temperature, conditions summary) via the Dark Sky API.
  • Demographics: CACI's ACORN socio-economic classification mapping for each household.

🛠️ Tech Stack & Architecture

  • Data Engineering (Cloud Computing): Databricks, PySpark
  • Data Modeling & Transformation: Power Query, DAX
  • Data Visualization: Microsoft Power BI
  • Architecture Flow: Raw CSVs (5.6M rows) ➔ Databricks (PySpark Cleansing & Joins) ➔ Aggregated Gold Table (~68k rows) ➔ Power BI Data Model ➔ Interactive Dashboard
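The join step in this flow can be illustrated with a minimal plain-Python sketch. This is not the project's PySpark code: the field names (`household_id`, `timestamp`, `kwh`, `temperature_c`, `summary`, `acorn`) and the sample rows are made up for illustration. It only shows the idea of truncating a half-hourly reading to its hour so it can be matched against hourly weather, then attaching the household's ACORN group.

```python
from datetime import datetime

# Hypothetical sample rows; field names and values are illustrative only.
readings = [
    {"household_id": "H1", "timestamp": datetime(2013, 1, 1, 7, 0), "kwh": 0.25},
    {"household_id": "H1", "timestamp": datetime(2013, 1, 1, 7, 30), "kwh": 0.5},
    {"household_id": "H2", "timestamp": datetime(2013, 1, 1, 7, 0), "kwh": 0.125},
]
weather_by_hour = {
    datetime(2013, 1, 1, 7, 0): {"temperature_c": 3.5, "summary": "Overcast"},
}
acorn_by_household = {"H1": "ACORN-A", "H2": "ACORN-"}


def enrich(reading):
    """Attach the hourly weather record and ACORN group to one reading."""
    # Half-hourly timestamps are truncated to the hour to match the weather key.
    hour = reading["timestamp"].replace(minute=0, second=0, microsecond=0)
    weather = weather_by_hour.get(hour, {})
    return {
        **reading,
        "hour": hour,
        "temperature_c": weather.get("temperature_c"),  # None if no weather row
        "summary": weather.get("summary"),
        "acorn": acorn_by_household.get(reading["household_id"]),
    }


enriched = [enrich(r) for r in readings]
```

In a distributed setting the same logic would typically be expressed as left joins on the truncated hour and on the household key, so that readings without a matching weather or demographic row are kept rather than dropped.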

📊 Interactive Dashboard

1️⃣ Overview: Big Data & Daily Trends

Provides a high-level view of the dataset, KPI metrics, and the baseline daily/weekly routines of Londoners. Reveals the "Weekend Effect" on energy demand.

[Image: Overview Page]

2️⃣ Weather Impact: Correlation & Heatmaps

Shows the correlation between dropping temperatures, severe weather conditions (wind, overcast), and spikes in energy consumption, using Combo Charts and an Hour-by-Weather Matrix.

[Image: Weather Impact Page]

3️⃣ Holidays & Demographics: Behavioral Analysis

Analyzes grid stress during UK Bank Holidays, highlighting the "sleep-in" effect (shifted morning peaks) and showing that affluent households (ACORN-A) drive the majority of holiday energy surges.

[Image: Holidays Page]

💡 Key Business Insights

  1. The "Windy & Overcast" Spike: Cold temperatures alone do not push the grid to its peak; the combination of high winds and cloud cover drastically increases heating and lighting usage compared to cold but clear days.
  2. The Holiday Shift: During Bank Holidays, the typical morning energy peak (7:00-8:00 AM) flattens out, and high demand sustains throughout the midday hours as people stay home.
  3. Demographic Discrepancy: The wealthiest demographic group (ACORN-A) exhibits a significantly higher spike (+10-15%) in energy usage during holidays compared to lower-income groups.
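A percentage spike like the one in the third insight is a relative change against a non-holiday baseline. The sketch below shows that arithmetic; the function name and the input numbers are hypothetical and are not taken from the project's data.

```python
def holiday_uplift_pct(normal_avg_kwh, holiday_avg_kwh):
    """Relative change of holiday usage versus the non-holiday baseline, in percent."""
    if normal_avg_kwh <= 0:
        raise ValueError("baseline average must be positive")
    return (holiday_avg_kwh - normal_avg_kwh) / normal_avg_kwh * 100.0


# Made-up averages for one group: 2.0 kWh on normal days, 2.25 kWh on holidays.
example_uplift = holiday_uplift_pct(2.0, 2.25)
```

Comparing this value across ACORN groups, rather than comparing raw kWh, keeps groups with different baseline consumption on the same scale.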

🚧 Challenges & Solutions

Challenge 1: Data Volume & Excel Limitations

Processing 5.6 million raw rows along with complex multi-table joins (weather, demographics) exceeded standard desktop processing capabilities.

  • Solution: Deployed a cluster on Databricks and utilized PySpark for distributed data processing. Aggregated the granular half-hourly data into hourly blocks, compressing the dataset into a highly optimized "Gold Table" of ~68,000 rows for lightning-fast Power BI performance.
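The half-hourly to hourly roll-up can be sketched in plain Python as follows. This is an illustration of the aggregation idea only, not the project's PySpark job: the grouping key (ACORN group plus hour) and the output fields are assumptions, since the actual grain of the Gold Table is not stated here.

```python
from collections import defaultdict
from datetime import datetime


def to_hourly(readings):
    """Roll (acorn, timestamp, kwh) readings up to one row per (acorn, hour)."""
    totals = defaultdict(lambda: {"kwh": 0.0, "n_readings": 0})
    for acorn, ts, kwh in readings:
        # Truncate 07:30 to 07:00 so both half-hours land in the same bucket.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bucket = totals[(acorn, hour)]
        bucket["kwh"] += kwh
        bucket["n_readings"] += 1
    return dict(totals)


# Made-up sample readings for illustration.
sample = [
    ("ACORN-A", datetime(2013, 1, 1, 7, 0), 0.25),
    ("ACORN-A", datetime(2013, 1, 1, 7, 30), 0.5),
    ("ACORN-E", datetime(2013, 1, 1, 7, 30), 0.125),
]
hourly = to_hourly(sample)
```

Keeping a reading count next to each total makes it possible to recompute averages later without going back to the raw rows.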

Challenge 2: Handling Demographic Outliers (Missing Data)

The raw demographic table contained unassigned households labeled simply as ACORN-. Standard find-and-replace tools in Power Query inadvertently corrupted valid categories (e.g., changing ACORN-A to UndefinedA).

  • Solution: Implemented advanced Power Query exact-match filtering (Match entire cell contents) to safely isolate and rename orphaned records to Undefined, preserving data integrity for valid ACORN groups.
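The difference between the two replacement modes can be reproduced in a few lines of Python. This is an analogy to the Power Query behaviour described above, not Power Query code; the function names are hypothetical.

```python
def naive_replace(label):
    """Substring replace: rewrites every label that contains 'ACORN-'."""
    return label.replace("ACORN-", "Undefined")


def exact_match_replace(label):
    """Whole-value match: only the bare 'ACORN-' placeholder is renamed."""
    return "Undefined" if label == "ACORN-" else label


labels = ["ACORN-A", "ACORN-", "ACORN-E"]
corrupted = [naive_replace(x) for x in labels]
cleaned = [exact_match_replace(x) for x in labels]
```

Here `corrupted` contains `UndefinedA`, matching the failure described above, while `cleaned` leaves `ACORN-A` and `ACORN-E` untouched and renames only the unassigned entry.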

Challenge 3: Extreme Weather Anomalies

Initial bar charts showed a paradoxical drop in energy consumption during the absolute lowest temperatures (-4°C to -2°C), misleading the trend line.

  • Solution: Built a Combo Chart overlaying the average energy usage (bars) with a total data point count (line). This visually proved to stakeholders that extreme sub-zero temperatures were rare outliers (very small sample size, mostly occurring at night), securing the validity of the overall temperature-dependency trend.
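The data behind such a Combo Chart is an average and a sample count per temperature bin. A minimal Python sketch of that summary is shown below; the 2°C bin width, the function name, and the sample values are assumptions for illustration and do not come from the project's data.

```python
import math
from collections import defaultdict


def summarize_by_temperature(rows, bin_width=2):
    """Group (temp_c, kwh) rows into bins; return average kWh and count per bin."""
    bins = defaultdict(list)
    for temp_c, kwh in rows:
        # Lower edge of the bin, e.g. -3.0 falls into the [-4, -2) bin.
        lower = math.floor(temp_c / bin_width) * bin_width
        bins[lower].append(kwh)
    return {
        lower: {"avg_kwh": sum(values) / len(values), "n": len(values)}
        for lower, values in sorted(bins.items())
    }


# Made-up rows: one sub-zero reading, three readings just above zero.
rows = [(-3.0, 0.5), (1.0, 1.0), (1.5, 0.5), (0.2, 0.75)]
summary = summarize_by_temperature(rows)
```

Plotting `avg_kwh` as bars and `n` as a line exposes bins where the average rests on very few observations, which is the situation described for the coldest temperatures.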

About

End-to-end Big Data analytics pipeline processing 5.6M smart meter records. Built with Databricks (PySpark) for data engineering and Power BI for interactive visualization of weather and demographic impacts on the energy grid.