This project explores the energy consumption patterns of London households using 5.6 million smart meter readings, combined with historical weather data and UK demographic classifications (ACORN). The goal was to build a highly optimized, end-to-end data pipeline and an interactive Business Intelligence dashboard to uncover how weather conditions, holidays, and social classes impact the power grid.
The data used in this project was sourced from the public Kaggle dataset: Smart Meters in London.
To build the final model, three separate data domains were extracted and combined:
- Energy Consumption: Granular, half-hourly smart meter readings from London households (processed from multiple CSV blocks).
- Historical Weather: Hourly weather metrics (temperature, conditions summary) via the Dark Sky API.
- Demographics: CACI's ACORN socio-economic classification mapping for each household.
- Data Engineering (Cloud Computing): Databricks, PySpark
- Data Modeling & Transformation: Power Query, DAX
- Data Visualization: Microsoft Power BI
- Architecture Flow:
Raw CSVs (5.6M rows) ➔ Databricks (PySpark Cleansing & Joins) ➔ Aggregated Gold Table (~68k rows) ➔ Power BI Data Model ➔ Interactive Dashboard
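The join stage of this flow can be pictured with a minimal sketch. It uses plain Python rather than PySpark so it stays self-contained, and every field name (`household`, `ts`, `kwh`, `temp_c`, `summary`) and sample value is a hypothetical placeholder, not the real dataset schema. The idea is that each half-hourly reading is truncated to its hour, then matched to the hourly weather record and to the household's ACORN group.

```python
from datetime import datetime

# Hypothetical sample rows; field names are assumptions, not the dataset's schema.
readings = [
    {"household": "H1", "ts": datetime(2013, 1, 1, 7, 0), "kwh": 0.20},
    {"household": "H1", "ts": datetime(2013, 1, 1, 7, 30), "kwh": 0.30},
    {"household": "H2", "ts": datetime(2013, 1, 1, 7, 0), "kwh": 0.10},
]
weather = {datetime(2013, 1, 1, 7, 0): {"temp_c": 3.5, "summary": "Overcast"}}
acorn = {"H1": "ACORN-A", "H2": "ACORN-"}


def enrich(readings, weather, acorn):
    """Attach hourly weather and the ACORN group to each half-hourly reading."""
    enriched = []
    for row in readings:
        # Weather is hourly, so truncate the half-hourly timestamp to its hour.
        hour = row["ts"].replace(minute=0, second=0, microsecond=0)
        w = weather.get(hour, {})  # left-join behaviour: missing weather -> None
        enriched.append({
            **row,
            "hour": hour,
            "temp_c": w.get("temp_c"),
            "summary": w.get("summary"),
            "acorn": acorn.get(row["household"]),
        })
    return enriched


enriched = enrich(readings, weather, acorn)
```

In PySpark the same step would typically be two left joins on the truncated hour and the household key; the dictionary lookups here only mimic that behaviour on a small scale.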
Provides a high-level view of the dataset, KPI metrics, and the baseline daily/weekly routines of Londoners. Reveals the "Weekend Effect" on energy demand.

Proves the direct correlation between dropping temperatures, severe weather conditions (wind, overcast), and spikes in energy consumption using advanced Combo Charts and an Hour-by-Weather Matrix.

Analyzes grid stress during UK Bank Holidays, highlighting the "sleep-in" effect (shifted morning peaks) and showing that affluent households (ACORN-A) have the largest holiday energy surges.

- The "Windy & Overcast" Spike: Cold temperatures alone do not produce the highest demand; the combination of high winds and cloud cover increases heating and lighting usage well beyond that of cold but clear days.
- The Holiday Shift: During Bank Holidays, the typical morning energy peak (7:00-8:00 AM) flattens out, and high demand sustains throughout the midday hours as people stay home.
- Demographic Discrepancy: The wealthiest demographic group (ACORN-A) exhibits a significantly higher spike (+10-15%) in energy usage during holidays compared to lower-income groups.
Challenge 1: Data Volume & Excel Limitations
Processing 5.6 million raw rows along with complex multi-table joins (weather, demographics) exceeded standard desktop processing capabilities.
- Solution: Deployed a Databricks cluster and used PySpark for distributed data processing. Aggregating the granular half-hourly data into hourly blocks compressed the dataset into a "Gold Table" of ~68,000 rows, small enough for Power BI to load and query quickly.
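The half-hourly to hourly roll-up can be sketched as follows. This is plain Python, not the project's PySpark code, and it rests on two assumptions: that readings are summed (rather than averaged) within each hour, and that the grouping key is an ACORN group plus the hour. The actual Gold Table may be keyed differently.

```python
from collections import defaultdict
from datetime import datetime


def to_hourly(readings):
    """Sum half-hourly kWh readings into one total per (group, hour).

    `readings` is an iterable of (group, timestamp, kwh) tuples.
    The (group, hour) key is an assumption made for illustration.
    """
    totals = defaultdict(float)
    for group, ts, kwh in readings:
        # Truncate 07:30 -> 07:00 so both half-hours land in the same bucket.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        totals[(group, hour)] += kwh
    return dict(totals)


# Hypothetical sample values.
sample = [
    ("ACORN-A", datetime(2013, 1, 1, 7, 0), 0.25),
    ("ACORN-A", datetime(2013, 1, 1, 7, 30), 0.5),
    ("ACORN-A", datetime(2013, 1, 1, 8, 0), 0.125),
    ("ACORN-E", datetime(2013, 1, 1, 7, 30), 0.25),
]
hourly = to_hourly(sample)
```

Four input rows collapse to three hourly rows here; applied across households and days, the same grouping is what shrinks millions of readings to a table Power BI can hold comfortably.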
Challenge 2: Handling Demographic Outliers (Missing Data)
The raw demographic table contained unassigned households labeled simply as ACORN-. Standard find-and-replace tools in Power Query inadvertently corrupted valid categories (e.g., changing ACORN-A to UndefinedA).
- Solution: Implemented advanced Power Query exact-match filtering (Match entire cell contents) to safely isolate and rename orphaned records to Undefined, preserving data integrity for valid ACORN groups.
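The difference between the two replacement modes is easy to reproduce outside Power Query. The sketch below uses Python string operations as a stand-in; the function names are hypothetical, and the point is only that a substring replace rewrites every label containing `ACORN-`, while an exact match touches only the unassigned rows.

```python
def naive_rename(label):
    # Substring replace: rewrites the "ACORN-" prefix inside valid labels too.
    return label.replace("ACORN-", "Undefined")


def exact_rename(label):
    # Exact match: only a cell that is exactly "ACORN-" is renamed.
    return "Undefined" if label == "ACORN-" else label


labels = ["ACORN-A", "ACORN-", "ACORN-E"]

naive = [naive_rename(x) for x in labels]
# -> ["UndefinedA", "Undefined", "UndefinedE"]  (valid groups corrupted)

exact = [exact_rename(x) for x in labels]
# -> ["ACORN-A", "Undefined", "ACORN-E"]        (valid groups preserved)
```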
Challenge 3: Extreme Weather Anomalies
Initial bar charts showed a paradoxical drop in energy consumption during the absolute lowest temperatures (-4°C to -2°C), misleading the trend line.
- Solution: Built a Combo Chart overlaying the average energy usage (bars) with a total data point count (line). This visually proved to stakeholders that extreme sub-zero temperatures were rare outliers (very small sample size, mostly occurring at night), securing the validity of the overall temperature-dependency trend.
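The numbers behind such a combo chart are a mean and a row count per temperature bin. The following is a minimal sketch in plain Python, assuming 2°C-wide bins and hypothetical sample values; the dashboard itself computes these in Power BI, and its actual bin width is not stated here.

```python
import math
from collections import defaultdict


def bin_stats(rows, width=2):
    """Return {bin_start: (mean_kwh, count)} for (temp_c, kwh) rows.

    The bin width is an assumption; bins are [start, start + width).
    """
    acc = defaultdict(lambda: [0.0, 0])  # bin_start -> [kwh_sum, row_count]
    for temp_c, kwh in rows:
        start = math.floor(temp_c / width) * width
        acc[start][0] += kwh
        acc[start][1] += 1
    return {b: (s / n, n) for b, (s, n) in sorted(acc.items())}


# Hypothetical sample: a single reading in the coldest bin.
rows = [(-3.0, 0.25), (1.0, 0.5), (1.5, 1.0), (0.5, 0.75), (5.0, 0.5), (4.2, 0.25)]
stats = bin_stats(rows)
# -> {-4: (0.25, 1), 0: (0.75, 3), 4: (0.375, 2)}
```

Plotting the mean as bars and the count as a line makes a low mean in the coldest bin easy to read as a small-sample artifact rather than a real drop in demand.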