Commit f34846b

Add comprehensive RSpec test suite (106 examples)
- 12 spec files covering all classes: module, Options, Logger, SqliteRam, DbTable, CountryTable, StateTable, CountyTable, ZipcodeTable, DataSource, Runner, CsvSource, and the full ETL pipeline
- Fixtures: GeoNames-format TSV, CSV with headers, zip archive, config YAML
- Database test helpers for in-memory SQLite setup and seeding
- Add csv gem as runtime dependency (removed from Ruby 3.4 default gems)
- Add CLAUDE.md for Claude Code guidance
1 parent e5186bb commit f34846b

23 files changed

Lines changed: 1244 additions & 36 deletions

CLAUDE.md

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

A Ruby gem that downloads postal/zipcode data from GeoNames.org, processes it via an ETL pipeline, and outputs an SQLite3 database and optional CSV files. Supports single-country or all-countries processing.

## Commands

```bash
# Install dependencies (vendored to vendor/bundle, binstubs in stubs/)
bundle install

# Run all tests
bundle exec rspec

# Run a single test file
bundle exec rspec spec/path/to/file_spec.rb

# Run a specific test by line number
bundle exec rspec spec/path/to/file_spec.rb:42

# Lint
bundle exec rubocop

# Lint with auto-correct
bundle exec rubocop -a

# Version bumping (do on develop branch, not master)
bundle exec rake version:bump_patch
bundle exec rake version:bump_minor
bundle exec rake version:bump_major

# Build and install gem
bundle exec rake build
bundle exec rake install

# Release gem
bundle exec rake release
```

## Architecture

The gem follows an ETL (Extract, Transform, Load) pattern using the Kiba gem:

1. **Extract**: `DataSource` downloads zip files from GeoNames.org, extracts them, and prepares CSV files with headers
2. **Transform**: `CsvSource` (Kiba source) reads rows from the prepared CSV
3. **Load**: Four Kiba destination table classes write rows into an in-memory SQLite database
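
A minimal sketch of the source-to-destinations flow in the list above, using plain Ruby stand-ins rather than Kiba or SQLite. `RowSource`, `ArrayDestination`, and `run_pipeline` are hypothetical names for illustration only; the gem's own classes are `CsvSource` and the four table destinations. The one Kiba convention assumed here is that a source responds to `each` and a destination responds to `write` and `close`.

```ruby
# Hypothetical stand-in for a Kiba source: yields one Hash per row.
class RowSource
  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    @rows.each(&block)
  end
end

# Hypothetical stand-in for a Kiba destination: collects the rows it is given.
class ArrayDestination
  attr_reader :rows

  def initialize
    @rows = []
    @closed = false
  end

  def write(row)
    @rows << row
  end

  def close
    @closed = true
  end

  def closed?
    @closed
  end
end

# Simplified driver loop (an assumption, not Kiba's own runner): every row
# from the source is handed to every destination, then each one is closed.
def run_pipeline(source, destinations)
  source.each do |row|
    destinations.each { |destination| destination.write(row) }
  end
  destinations.each(&:close)
end

source = RowSource.new([
  { country: 'US', postal_code: '10001' },
  { country: 'CA', postal_code: 'H2X' }
])
destinations = Array.new(4) { ArrayDestination.new }
run_pipeline(source, destinations)
```

Passing every row to all four destinations mirrors the "one source and four destinations" layout; each real table class would then pick out the columns it cares about.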

### Key Flow

`bin/free_zipcode_data` → `Runner#start` → `DataSource#download` → `SqliteRam` (in-memory DB) → `ETL::FreeZipcodeDataJob` (Kiba pipeline) → `SqliteRam#save_to_disk`

### Core Classes

- **`FreeZipcodeData::Runner`** - CLI entry point; parses args via Optimist, orchestrates the full pipeline
- **`FreeZipcodeData::DataSource`** - Downloads and extracts GeoNames zip files, prepares CSV with headers
- **`SqliteRam`** - Wraps SQLite3; works entirely in-memory then saves to disk via `SQLite3::Backup`
- **`FreeZipcodeData::DbTable`** - Base class for all table classes; provides progress bar, SQL helpers, and country lookup from `country_lookup_table.yml`
- **`CountryTable`/`StateTable`/`CountyTable`/`ZipcodeTable`** - Kiba destinations; each has `build` (creates schema + indexes) and `write` (inserts rows, swallows duplicate constraint violations)
- **`ETL::FreeZipcodeDataJob`** - Configures the Kiba pipeline with one source and four destinations
- **`CsvSource`** - Kiba-compatible CSV reader

### Singletons

`Options` and `Logger` are singletons (via Ruby's `Singleton` module). `Runner` uses a manual singleton pattern.
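
The two styles can be sketched side by side. `Settings` and `Launcher` are hypothetical classes, not the gem's `Options`, `Logger`, or `Runner`, and the manual variant shown is only one common way to write it, so it may differ from what `Runner` actually does.

```ruby
require 'singleton'

# Singleton via the standard library: `new` becomes private and
# `instance` always returns the same object.
class Settings
  include Singleton

  attr_accessor :verbose
end

# Manual singleton: memoize the single instance in a class-level
# instance variable and hide the constructor.
class Launcher
  def self.instance
    @instance ||= new
  end

  private_class_method :new
end

Settings.instance.verbose = true
```

Because both classes hold state for the life of the process, specs that touch them generally need to reset or reconfigure that state in a `before` block.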

## Configuration

- `.ruby-version`: 3.0.2
- Bundle path: `vendor/bundle` (binstubs in `stubs/`)
- Environment: `APP_ENV` controls environment (`test`, `development`)
- Config file: `~/.free_zipcode_data.yml` (overridable via `FZD_CONFIG_FILE` env var; uses `spec/fixtures/` version in test)
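
A minimal sketch of the lookup described in the list above, assuming the environment variable simply overrides the home-directory default. `config_file_path` and `app_env` are hypothetical helpers, the `'development'` fallback is an assumption, and the test-environment switch to `spec/fixtures/` is not modelled.

```ruby
# Default location named in the configuration list; left unexpanded here,
# so a caller would still pass it through File.expand_path.
DEFAULT_CONFIG_FILE = '~/.free_zipcode_data.yml'

# Hypothetical helper: prefer FZD_CONFIG_FILE, otherwise use the default.
def config_file_path(env = ENV)
  env.fetch('FZD_CONFIG_FILE', DEFAULT_CONFIG_FILE)
end

# Hypothetical helper: APP_ENV selects the environment.
# Falling back to 'development' is an assumption, not taken from the gem.
def app_env(env = ENV)
  env.fetch('APP_ENV', 'development')
end
```

Taking the environment as an argument keeps the helpers easy to exercise with a plain Hash instead of mutating `ENV` in specs.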

## Rubocop

Key style settings (`.rubocop.yml`):
- Target Ruby 2.7
- Max line length: 110
- Max method length: 30 lines
- `Style/ClassVars`, `Style/Documentation`, `Metrics/AbcSize`, `Lint/SuppressedException` disabled
- `vendor/` and `stubs/` excluded

## Git Workflow

- `master` is the release branch
- `develop` is the development branch
- Version bumps should happen on `develop`, then merge to `master` before `rake release`

Gemfile.lock

Lines changed: 47 additions & 35 deletions
@@ -3,6 +3,7 @@ PATH
   specs:
     free_zipcode_data (1.0.6)
       colored (~> 1.2)
+      csv
       kiba (~> 4.0)
       optimist (~> 3.0)
       ruby-progressbar (~> 1.9)
@@ -12,63 +13,74 @@ PATH
 GEM
   remote: https://rubygems.org/
   specs:
-    ast (2.4.2)
+    ast (2.4.3)
     coderay (1.1.3)
     colored (1.2)
-    diff-lcs (1.4.4)
-    docile (1.4.0)
+    csv (3.3.5)
+    diff-lcs (1.6.2)
+    docile (1.4.1)
+    json (2.18.1)
     kiba (4.0.0)
+    language_server-protocol (3.17.0.5)
+    lint_roller (1.1.0)
     method_source (0.9.2)
     mini_portile2 (2.8.9)
     optimist (3.2.1)
-    parallel (1.21.0)
-    parser (3.0.2.0)
+    parallel (1.27.0)
+    parser (3.3.10.1)
       ast (~> 2.4.1)
+      racc
+    prism (1.9.0)
     pry (0.12.2)
       coderay (~> 1.1.0)
       method_source (~> 0.9.0)
     pry-nav (0.3.0)
       pry (>= 0.9.10, < 0.13.0)
-    rainbow (3.0.0)
-    rake (13.0.6)
-    regexp_parser (2.1.1)
-    rexml (3.4.2)
-    rspec (3.10.0)
-      rspec-core (~> 3.10.0)
-      rspec-expectations (~> 3.10.0)
-      rspec-mocks (~> 3.10.0)
-    rspec-core (3.10.1)
-      rspec-support (~> 3.10.0)
-    rspec-expectations (3.10.1)
+    racc (1.8.1)
+    rainbow (3.1.1)
+    rake (13.3.1)
+    regexp_parser (2.11.3)
+    rspec (3.13.2)
+      rspec-core (~> 3.13.0)
+      rspec-expectations (~> 3.13.0)
+      rspec-mocks (~> 3.13.0)
+    rspec-core (3.13.6)
+      rspec-support (~> 3.13.0)
+    rspec-expectations (3.13.5)
       diff-lcs (>= 1.2.0, < 2.0)
-      rspec-support (~> 3.10.0)
-    rspec-mocks (3.10.2)
+      rspec-support (~> 3.13.0)
+    rspec-mocks (3.13.7)
       diff-lcs (>= 1.2.0, < 2.0)
-      rspec-support (~> 3.10.0)
-    rspec-support (3.10.3)
-    rubocop (1.22.3)
+      rspec-support (~> 3.13.0)
+    rspec-support (3.13.7)
+    rubocop (1.84.2)
+      json (~> 2.3)
+      language_server-protocol (~> 3.17.0.2)
+      lint_roller (~> 1.1.0)
       parallel (~> 1.10)
-      parser (>= 3.0.0.0)
+      parser (>= 3.3.0.2)
       rainbow (>= 2.2.2, < 4.0)
-      regexp_parser (>= 1.8, < 3.0)
-      rexml
-      rubocop-ast (>= 1.12.0, < 2.0)
+      regexp_parser (>= 2.9.3, < 3.0)
+      rubocop-ast (>= 1.49.0, < 2.0)
       ruby-progressbar (~> 1.7)
-      unicode-display_width (>= 1.4.0, < 3.0)
-    rubocop-ast (1.12.0)
-      parser (>= 3.0.1.1)
+      unicode-display_width (>= 2.4.0, < 4.0)
+    rubocop-ast (1.49.0)
+      parser (>= 3.3.7.2)
+      prism (~> 1.7)
     ruby-prof (0.18.0)
-    ruby-progressbar (1.11.0)
-    rubyzip (3.1.1)
-    simplecov (0.21.2)
+    ruby-progressbar (1.13.0)
+    rubyzip (3.2.2)
+    simplecov (0.22.0)
       docile (~> 1.1)
       simplecov-html (~> 0.11)
       simplecov_json_formatter (~> 0.1)
-    simplecov-html (0.12.3)
-    simplecov_json_formatter (0.1.3)
+    simplecov-html (0.13.2)
+    simplecov_json_formatter (0.1.4)
     sqlite3 (1.7.3)
       mini_portile2 (~> 2.8.0)
-    unicode-display_width (2.1.0)
+    unicode-display_width (3.2.0)
+      unicode-emoji (~> 4.1)
+    unicode-emoji (4.2.0)
 
 PLATFORMS
   ruby
@@ -84,4 +96,4 @@ DEPENDENCIES
   simplecov (~> 0.16)
 
 BUNDLED WITH
-   2.2.22
+   2.6.9

free_zipcode_data.gemspec

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ Gem::Specification.new do |spec|
   spec.add_development_dependency 'ruby-prof', '~> 0.17'
   spec.add_development_dependency 'simplecov', '~> 0.16'
 
+  spec.add_runtime_dependency 'csv'
   spec.add_runtime_dependency 'colored', '~> 1.2'
   spec.add_runtime_dependency 'kiba', '~> 4.0'
   spec.add_runtime_dependency 'optimist', '~> 3.0'

spec/etl/csv_source_spec.rb

Lines changed: 54 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,54 @@
1+
# frozen_string_literal: true
2+
3+
require 'etl/csv_source'
4+
5+
RSpec.describe CsvSource do
6+
let(:fixture_csv) { File.join(FreeZipcodeData.root, 'spec', 'fixtures', 'test_data.csv') }
7+
8+
describe '#initialize' do
9+
it 'stores the filename and options' do
10+
source = described_class.new(filename: fixture_csv)
11+
expect(source.filename).to eq(fixture_csv)
12+
expect(source.headers).to be true
13+
expect(source.delimeter).to eq("\t")
14+
end
15+
16+
it 'accepts custom delimiter and quote char' do
17+
source = described_class.new(filename: fixture_csv, delimeter: ',', quote_char: '"')
18+
expect(source.delimeter).to eq(',')
19+
expect(source.quote_char).to eq('"')
20+
end
21+
end
22+
23+
describe '#each' do
24+
it 'yields each row as a hash with symbolized keys' do
25+
source = described_class.new(filename: fixture_csv, delimeter: ',', quote_char: '"')
26+
rows = []
27+
source.each { |row| rows << row }
28+
29+
expect(rows.length).to eq(5)
30+
expect(rows.first).to be_a(Hash)
31+
expect(rows.first.keys).to include(:country, :postal_code, :city)
32+
end
33+
34+
it 'parses the correct data from each row' do
35+
source = described_class.new(filename: fixture_csv, delimeter: ',', quote_char: '"')
36+
rows = []
37+
source.each { |row| rows << row }
38+
39+
first = rows.first
40+
expect(first[:country]).to eq('US')
41+
expect(first[:postal_code]).to eq('10001')
42+
expect(first[:city]).to eq('New York')
43+
expect(first[:short_state]).to eq('NY')
44+
end
45+
46+
it 'handles rows from multiple countries' do
47+
source = described_class.new(filename: fixture_csv, delimeter: ',', quote_char: '"')
48+
countries = []
49+
source.each { |row| countries << row[:country] }
50+
51+
expect(countries.uniq.sort).to eq(%w[CA GB US])
52+
end
53+
end
54+
end
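
The examples above pin down the reader's interface: keyword arguments `filename`, `headers`, `delimeter` (spelled as in the spec), and `quote_char`, plus an `each` that yields hashes with symbolized keys. A minimal sketch of a class that would satisfy that interface, assuming Ruby's `csv` library, its built-in `:symbol` header converter, and a default quote character of `"`. `SketchCsvSource` is a hypothetical name, and the gem's own `CsvSource` may be written differently.

```ruby
require 'csv'

# Hypothetical reader with the interface the spec exercises.
class SketchCsvSource
  attr_reader :filename, :headers, :delimeter, :quote_char

  # `delimeter` keeps the spelling of the spec's keyword argument.
  # The '"' default for quote_char is an assumption.
  def initialize(filename:, headers: true, delimeter: "\t", quote_char: '"')
    @filename = filename
    @headers = headers
    @delimeter = delimeter
    @quote_char = quote_char
  end

  # Yields each data row as a Hash keyed by downcased, symbolized headers.
  # This sketch only handles files that have a header row.
  def each
    options = {
      headers: headers,
      col_sep: delimeter,
      quote_char: quote_char,
      header_converters: :symbol
    }
    CSV.foreach(filename, **options) { |row| yield row.to_h }
  end
end
```

With a header such as `COUNTRY,POSTAL_CODE,CITY`, the `:symbol` converter produces the keys `:country`, `:postal_code`, and `:city`, which matches the keys the spec checks for.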
Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# frozen_string_literal: true

require 'kiba'
require 'etl/free_zipcode_data_job'

RSpec.describe ETL::FreeZipcodeDataJob do
  let(:db) { create_test_database(line_count: 5) }
  let(:fixture_csv) { File.join(FreeZipcodeData.root, 'spec', 'fixtures', 'test_data.csv') }
  let(:logger) { FreeZipcodeData::Logger.instance }
  let(:string_io) { StringIO.new }
  let(:options) do
    OpenStruct.new(
      country_tablename: 'countries',
      state_tablename: 'states',
      county_tablename: 'counties',
      zipcode_tablename: 'zipcodes',
      verbose: false
    )
  end

  before do
    FreeZipcodeData::Options.instance.initialize_hash(options)
    logger.log_provider = ::Logger.new(string_io)
  end

  describe '.setup' do
    it 'returns a Kiba job definition' do
      job = described_class.setup(fixture_csv, db, logger, options)
      expect(job).not_to be_nil
    end
  end

  describe 'full ETL pipeline' do
    before do
      # Build all tables
      FreeZipcodeData::CountryTable.new(database: db, tablename: 'countries').build
      FreeZipcodeData::StateTable.new(database: db, tablename: 'states').build
      FreeZipcodeData::CountyTable.new(database: db, tablename: 'counties').build
      FreeZipcodeData::ZipcodeTable.new(database: db, tablename: 'zipcodes').build

      job = described_class.setup(fixture_csv, db, logger, options)
      Kiba.run(job)
    end

    it 'populates the countries table' do
      rows = db.execute('SELECT alpha2 FROM countries ORDER BY alpha2')
      expect(rows.flatten).to include('CA', 'GB', 'US')
    end

    it 'populates the states table' do
      rows = db.execute('SELECT abbr FROM states ORDER BY abbr')
      abbrs = rows.flatten
      expect(abbrs).to include('CA', 'IL', 'NY')
    end

    it 'populates the counties table' do
      rows = db.execute('SELECT name FROM counties ORDER BY name')
      names = rows.flatten
      expect(names).to include('Cook', 'Los Angeles', 'New York')
    end

    it 'populates the zipcodes table' do
      rows = db.execute('SELECT code FROM zipcodes ORDER BY code')
      codes = rows.flatten
      expect(codes).to include('10001', '60601', '90210')
    end

    it 'links zipcodes to states' do
      rows = db.execute(<<-SQL)
        SELECT z.code, s.abbr
        FROM zipcodes z
        JOIN states s ON CAST(z.state_id AS INTEGER) = s.id
        WHERE z.code = '60601'
      SQL
      expect(rows[0]).to eq(['60601', 'IL'])
    end

    it 'links states to countries' do
      rows = db.execute(<<-SQL)
        SELECT s.abbr, c.alpha2
        FROM states s
        JOIN countries c ON s.country_id = c.id
        WHERE s.abbr = 'NY'
      SQL
      expect(rows[0]).to eq(['NY', 'US'])
    end

    it 'stores geocode data for zipcodes' do
      rows = db.execute("SELECT lat, lon FROM zipcodes WHERE code = '10001'")
      lat = rows[0][0].to_f
      lon = rows[0][1].to_f
      expect(lat).to be_within(0.01).of(40.7484)
      expect(lon).to be_within(0.01).of(-73.9967)
    end
  end
end
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
---

spec/fixtures/US.txt

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
US 10001 New York New York NY New York 061 Manhattan MN 40.7484 -73.9967 4
US 90210 Beverly Hills California CA Los Angeles 037 LA 34.0901 -118.4065 4
US 60601 Chicago Illinois IL Cook 031 CK 41.8819 -87.6278 4
CA H2X Montreal Quebec QC Montreal 45.5088 -73.5878 4
GB SW1A London England ENG Westminster City of Westminster 51.5014 -0.1419 1

spec/fixtures/US.zip

400 Bytes
Binary file not shown.

spec/fixtures/test_data.csv

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
COUNTRY,POSTAL_CODE,CITY,STATE,SHORT_STATE,COUNTY,SHORT_COUNTY,COMMUNITY,SHORT_COMMUNITY,LATITUDE,LONGITUDE,ACCURACY
US,10001,New York,New York,NY,New York,061,Manhattan,MN,40.7484,-73.9967,4
US,90210,Beverly Hills,California,CA,Los Angeles,037,,LA,34.0901,-118.4065,4
US,60601,Chicago,Illinois,IL,Cook,031,,CK,41.8819,-87.6278,4
CA,H2X,Montreal,Quebec,QC,,,Montreal,,45.5088,-73.5878,4
GB,SW1A,London,England,ENG,Westminster,,City of Westminster,,51.5014,-0.1419,1

spec/fixtures/test_data.txt

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
US 10001 New York New York NY New York 061 Manhattan MN 40.7484 -73.9967 4
US 90210 Beverly Hills California CA Los Angeles 037 LA 34.0901 -118.4065 4
US 60601 Chicago Illinois IL Cook 031 CK 41.8819 -87.6278 4
CA H2X Montreal Quebec QC Montreal 45.5088 -73.5878 4
GB SW1A London England ENG Westminster City of Westminster 51.5014 -0.1419 1
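
The commit message describes these fixtures as GeoNames-format TSV, and the header row of `test_data.csv` names twelve columns. A minimal sketch of mapping one headerless tab-separated line onto those column names, assuming empty fields appear as consecutive tabs. `GEONAMES_COLUMNS` and `parse_geonames_line` are hypothetical names, not part of the gem, and the sample line is the Montreal row written out with explicit tabs.

```ruby
# Column names taken from the header row of test_data.csv, symbolized.
GEONAMES_COLUMNS = %i[
  country postal_code city state short_state county short_county
  community short_community latitude longitude accuracy
].freeze

# Hypothetical helper: split one tab-separated line into a Hash.
# The -1 limit keeps empty fields instead of dropping trailing ones.
def parse_geonames_line(line)
  fields = line.chomp.split("\t", -1)
  GEONAMES_COLUMNS.zip(fields).to_h
end

line = "CA\tH2X\tMontreal\tQuebec\tQC\t\t\tMontreal\t\t45.5088\t-73.5878\t4\n"
row = parse_geonames_line(line)
```

Values stay as strings here; converting latitude and longitude with `to_f` would be a separate step, as the pipeline spec does when it checks geocode data.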
