This project explores a public dataset of New York City Housing Authority (NYCHA) water bills from 2013 to 2025. Using Python and pandas, I cleaned the data, created a daily consumption metric, and compared trends across boroughs over time.
How has average daily water consumption per billing period changed across New York City's boroughs between 2013 and 2025?
- Preliminary EDA: dataset shape, types, summary stats and histograms for consumption and charges
- Data cleaning:
  - Dropped negative current charges (rare) and missing values
  - Converted `Revenue Month` to datetime
  - Kept reasonable billing periods (days between 1 and 365)
- Feature engineering:
  - `consumptionPerDay = Consumption (HCF) / # days`
  - Removed the top 5% of daily consumption values to reduce the influence of extreme outliers (e.g., industrial/commercial buildings)
- Aggregation & visualization:
  - Grouped by `Borough` and `Revenue Month` to compute mean daily consumption
  - Plotted borough time series with year-labeled ticks (2013–2025)
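
The cleaning, feature-engineering, and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not necessarily how `src/analysis.py` implements them: the function names, the `Current Charges` column spelling, and the output path are assumptions, and "top 5%" is interpreted here as everything above the 95th percentile.

```python
import pandas as pd

# Column names follow the steps listed above where they are stated;
# "Current Charges" is an assumed spelling of the charges column.
CHARGES = "Current Charges"
DAYS = "# days"
CONSUMPTION = "Consumption (HCF)"
MONTH = "Revenue Month"
BOROUGH = "Borough"


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop missing values, negative charges, and implausible billing periods."""
    out = df.dropna(subset=[CHARGES, DAYS, CONSUMPTION, MONTH, BOROUGH]).copy()
    out = out[out[CHARGES] >= 0]              # negative current charges are dropped
    out[MONTH] = pd.to_datetime(out[MONTH])   # e.g. "2013-01" -> Timestamp
    out = out[out[DAYS].between(1, 365)]      # keep billing periods of 1..365 days
    return out


def add_daily_consumption(df: pd.DataFrame, trim_quantile: float = 0.95) -> pd.DataFrame:
    """Add consumptionPerDay and drop rows above the trim quantile."""
    out = df.copy()
    out["consumptionPerDay"] = out[CONSUMPTION] / out[DAYS]
    cutoff = out["consumptionPerDay"].quantile(trim_quantile)
    return out[out["consumptionPerDay"] <= cutoff]


def monthly_borough_means(df: pd.DataFrame) -> pd.DataFrame:
    """Mean daily consumption per borough and revenue month."""
    return df.groupby([BOROUGH, MONTH])["consumptionPerDay"].mean().reset_index()


def plot_borough_series(monthly: pd.DataFrame, path: str = "figures/borough_trends.png") -> None:
    """Plot one line per borough with one x-axis tick per year (hypothetical output path)."""
    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 5))
    for borough, grp in monthly.groupby(BOROUGH):
        ax.plot(grp[MONTH], grp["consumptionPerDay"], label=borough)
    ax.xaxis.set_major_locator(mdates.YearLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
    ax.set_ylabel("Mean daily consumption (HCF/day)")
    ax.legend()
    fig.savefig(path)
```

The functions are meant to be chained, for example `monthly_borough_means(add_daily_consumption(clean(df)))`, with the result passed to `plot_borough_series`.
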
Queens and the Bronx show the highest mean daily usage, followed by Manhattan and Brooklyn, while Staten Island is much lower. Most boroughs show a gradual decline after ~2020, and the 2025 series ends mid-year.
- `src/` – Python scripts
- `data/` – CSV datasets
- `figures/` – generated plots
HCF = hundred cubic feet (1 HCF ≈ 748 gallons)
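
To put HCF values in more familiar units, a small helper like the one below can convert them to gallons. It uses the approximate factor stated above (1 HCF ≈ 748 gallons); the function name is hypothetical and not part of the project code.

```python
# Approximate conversion factor taken from the note above (1 HCF ≈ 748 gallons).
GALLONS_PER_HCF = 748


def hcf_to_gallons(hcf: float) -> float:
    """Convert hundred cubic feet (HCF) to approximate US gallons."""
    return hcf * GALLONS_PER_HCF


# Example: a daily consumption of 2 HCF is roughly 1,496 gallons per day.
print(hcf_to_gallons(2))  # 1496
```
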
python src/analysis.py