SQL final project for ALY6030: Data Warehousing & SQL. In this project, I turned a small pharmacy claims sample into a simple star schema, set up primary and foreign keys in MySQL 8+, created an ERD, and wrote reporting queries for business users.
This was my final SQL project for ALY6030: Data Warehousing & SQL in Spring 2025 at Northeastern University.
The original raw file was a small pharmacy claims sample with repeated member and fill information in one wide table. My goal was to reorganize it into a cleaner warehouse structure, build the database logic in MySQL, and prepare example SQL reports that business users could use later when larger production data becomes available.
This repo is my public portfolio version. I kept the project simple and readable, but I also kept the real project files so the work still feels traceable. The workflow can be reproduced manually from the files in this repo, but it is not packaged as a one-click automated build.
- Project walkthrough
- SQL file
- SQL notes
- Data note
- Outputs gallery
- Contribution note
- Portfolio PDF
- Original final report
- Original ERD PDF
- Assignment instructions
A Pharmacy Benefit Manager (PBM) provided a small sample of pharmacy claims data for a future warehouse setup. The raw file was not ready for relational reporting because it mixed member details, drug details, and repeated fill events in one table.
The task was to clean that structure, build a usable star schema, and prepare sample SQL reports before the full production data arrives.
- reviewed the raw sample structure
- converted the raw table into 3NF-style fact and dimension tables
- created a simple pharmacy claims star schema in MySQL
- set primary keys and foreign keys
- selected referential actions for update and delete cases
- created an ERD
- wrote three SQL reporting queries
- summarized the results in report form
| Item | Details |
|---|---|
| Course | ALY6030: Data Warehousing & SQL |
| Term | Spring 2025 |
| Project type | Individual course final project |
| Main topic | SQL, normalization, star schema, ERD, reporting |
| Raw data size | 5 rows x 21 columns |
| Processed fact table | 11 prescription fill rows |
| Main tools | MySQL 8+, SQL, CSV, spreadsheet cleanup |
| Final deliverables | SQL file, ERD, report, portfolio PDF |
data/raw/- original course sample data and data descriptiondata/processed/- normalized dimension and fact tablessql/- final SQL setup and reporting querieswalkthrough/- GitHub-friendly project explanationoutputs/- query screenshots and portfolio chartsreports/- portfolio PDF and original report filesarchive/- assignment reference file
data/raw/final_project_data.csvdata/raw/final_project_data_description.csvdata/processed/dim_member.csvdata/processed/dim_drug.csvdata/processed/fact_prescription.csvsql/pharmacy_claims_star_schema_queries.sql
- Ambien had the highest prescription count in the sample with 5 fills
- Ambien also had the highest insurance-paid total at $2,518
- The age 65+ group had 1 unique member and 6 total prescriptions
- For member 10003, the most recent fill was Ambien and insurance paid $322
- The final schema used:
dim_memberdim_drugfact_prescription
I picked this project because it shows more than just writing one SQL query. It shows how I think through a small data warehousing workflow:
- understand the raw data problem
- normalize the structure
- create fact and dimension tables
- design keys and relationships
- build SQL queries for reporting
- explain the results clearly
This was an individual project, so the database setup, normalized CSV tables, SQL logic, ERD preparation, and write-up shown here were my own work for the course.
For a fuller note, see contribution-note.md.
The dataset in this repo is the same small sample file used in the course final project.
Source inside this repo:
data/raw/final_project_data.csvdata/raw/final_project_data_description.csvarchive/final-project-instructions.pdf
This sample uses made-up members for classroom use and is included here so the project can be reviewed and reproduced in the same form as my original work. The raw sample file is kept in its original course form, and the description file explains the raw columns without changing that source artifact.
Built a small pharmacy claims data warehouse prototype in MySQL 8+ by normalizing raw data into fact and dimension tables, designing PK/FK relationships, creating an ERD, and writing business reporting SQL queries.
This project shows a simple but complete SQL workflow: raw data cleanup, schema design, ERD communication, and reporting queries for a pharmacy claims use case.
For the full project story, start with the project walkthrough.



