A lightweight local desktop + CLI tool for collecting, normalizing, and maintaining Boston Arrest data from the public ArcGIS endpoint.
This tool:
- Pulls arrest records from the Boston ArcGIS API
- Normalizes arrest numbers and keys
- Prevents duplicate records using a composite primary key
- Supports CSV import and export
- Maintains a persistent local SQLite database
- Works as both a CLI tool and a simple desktop GUI
- Can be packaged as a standalone Windows .exe
- Full sync: pulls all available records
- Recent sync: pulls last N months using YEAR + MONTH fields
- Safe upsert logic (no duplicate rows)
- Detailed sync statistics:
  - Rows processed
  - Inserted
  - Updated
  - Unchanged
  - Skipped
  - Final unique row count
- Handles case-insensitive column names
- Normalizes arrest number formats: 25-008332 → 20250008332
- Upserts into DB safely
- Reports insert/update stats
- Exports full database to CSV
- Preserves normalized structure
- Automatically appends .csv if missing
- Shows final file path
- Import button
- Sync Recent button
- Sync Full button
- Export button
- Log output panel
- Shows database location
- Displays row count and health status
- Works without GUI
- Supports file paths from terminal
- Can be packaged into a single .exe
config.py → paths and application configuration
db.py → SQLite schema + upsert + import/export logic
api.py → ArcGIS API sync logic
app.py → CLI entry point
gui.py → Desktop GUI entry point
The database is stored in:
%APPDATA%\OPATDataCollector\data.db
Exports default to:
%APPDATA%\OPATDataCollector\exports\
- Clone the repository
git clone https://github.com/yourusername/opat-data-collector.git
cd opat-data-collector
- Create a virtual environment (optional but recommended)
python -m venv venv
venv\Scripts\activate
- Install requirements
pip install -r requirements.txt
python app.py sync-full
python app.py sync
python app.py import --file path\to\input.csv
python app.py export --file path\to\output.csv
Launch:
python gui.py
Optional pre-filled file paths:
python gui.py --import-file input.csv --export-file output.csv
pyinstaller --onefile --name OPATDataCollector app.py
pyinstaller --onefile --windowed --name OPATDataCollector gui.py
Output will be in:
dist/OPATDataCollector.exe
You can distribute the single .exe file.
Boston Arrest data is retrieved from the public ArcGIS FeatureServer endpoint:
Boston_Arrests_Tbl_Pubview
No API key is required.
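The recent sync filters on the YEAR and MONTH fields. As a rough sketch of how such a filter could be built (not necessarily how api.py does it), the helper below lists the last N (year, month) pairs and joins them into a where clause. The function names are hypothetical, and the clause assumes YEAR and MONTH are numeric fields; if the endpoint stores them as strings, the values would need quoting.

```python
from datetime import date


def last_n_months(today: date, n: int) -> list[tuple[int, int]]:
    """Return (year, month) pairs for the n most recent months, oldest first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    pairs = []
    year, month = today.year, today.month
    for _ in range(n):
        pairs.append((year, month))
        month -= 1
        if month == 0:
            # Roll back from January to December of the previous year.
            year, month = year - 1, 12
    return pairs[::-1]


def build_where(pairs: list[tuple[int, int]]) -> str:
    """Build a where clause matching any of the given (year, month) pairs.

    Assumption: YEAR and MONTH are numeric fields on the layer.
    """
    return " OR ".join(f"(YEAR = {y} AND MONTH = {m})" for y, m in pairs)


if __name__ == "__main__":
    print(build_where(last_n_months(date.today(), 3)))
```

Filtering on whole months means the oldest month is always pulled in full, which is harmless here because the upsert logic ignores rows that are already stored.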
The database uses a composite primary key:
PRIMARY KEY (ARREST_NUM, CHARGE_SEQ_NUM)
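To illustrate how a composite key keeps re-synced rows from duplicating, here is a minimal sketch using Python's sqlite3 module and SQLite's ON CONFLICT clause (available in SQLite 3.24 and later). Only the table name and the two key columns come from this README; the CHARGE_DESC column and the upsert_rows helper are hypothetical, and the real schema in db.py may differ.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS arrests (
    ARREST_NUM     TEXT    NOT NULL,
    CHARGE_SEQ_NUM INTEGER NOT NULL,
    CHARGE_DESC    TEXT,  -- hypothetical payload column for illustration
    PRIMARY KEY (ARREST_NUM, CHARGE_SEQ_NUM)
)
"""

UPSERT = """
INSERT INTO arrests (ARREST_NUM, CHARGE_SEQ_NUM, CHARGE_DESC)
VALUES (?, ?, ?)
ON CONFLICT (ARREST_NUM, CHARGE_SEQ_NUM)
DO UPDATE SET CHARGE_DESC = excluded.CHARGE_DESC
"""


def upsert_rows(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Insert or update rows in one transaction and return the table's row count."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(UPSERT, rows)
    return conn.execute("SELECT COUNT(*) FROM arrests").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    rows = [
        ("20250008332", 1, "first version"),
        ("20250008332", 1, "second version"),  # same key: updates, no new row
        ("20250008332", 2, "another charge"),
    ]
    print(upsert_rows(conn, rows))
```

Because the key spans both columns, one arrest can carry several charges, while a repeated (ARREST_NUM, CHARGE_SEQ_NUM) pair updates the existing row.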
Arrest numbers are normalized:
YY-NNNNNN → YYYYNNNNNNN
- Non-digit characters stripped
- Left-padded to preserve format
This ensures:
- No duplicate logical records
- Clean consistent IDs across API + CSV
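The normalization rule above can be sketched as a small function. This is an illustration, not the code in db.py: the function name is hypothetical, the "20" century prefix is an assumption, and the seven-digit sequence width is inferred from the 25-008332 → 20250008332 example.

```python
import re


def normalize_arrest_num(raw: str, seq_width: int = 7) -> str:
    """Normalize an arrest number to the YYYYNNNNNNN form.

    Assumptions: two-digit years belong to the 2000s, and the sequence
    part is zero-padded on the left to seq_width digits.
    """
    raw = raw.strip()
    match = re.fullmatch(r"(\d{2})-(\d+)", raw)
    if match:
        yy, seq = match.groups()
        return f"20{yy}{seq.zfill(seq_width)}"
    # Anything else (for example an already-normalized value) just has
    # its non-digit characters stripped.
    return re.sub(r"\D", "", raw)


if __name__ == "__main__":
    print(normalize_arrest_num("25-008332"))
```

Running the same function over both API rows and imported CSV rows is what lets the two sources agree on one key.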
The system verifies:
- arrests table exists
- sync_state table exists
- total row count
The results are displayed in both the CLI and the GUI.
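A health check along these lines could be written against SQLite's built-in sqlite_master catalog. The sketch below is an assumption about the approach rather than the project's actual code; the check_health name and the shape of the returned dictionary are hypothetical, while the arrests and sync_state table names come from the list above.

```python
import sqlite3


def check_health(conn: sqlite3.Connection) -> dict:
    """Report whether the expected tables exist and how many arrest rows are stored."""
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    has_arrests = "arrests" in tables
    has_sync_state = "sync_state" in tables
    # Only count rows when the table exists, so a fresh database does not raise.
    row_count = (
        conn.execute("SELECT COUNT(*) FROM arrests").fetchone()[0] if has_arrests else 0
    )
    return {
        "arrests_table": has_arrests,
        "sync_state_table": has_sync_state,
        "row_count": row_count,
        "healthy": has_arrests and has_sync_state,
    }


if __name__ == "__main__":
    print(check_health(sqlite3.connect(":memory:")))
```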
- Python 3.11+
- SQLite
- Tkinter (GUI)
- Requests (API calls)
- PyInstaller (packaging)
- Incremental sync using last ARR_DATE
- Progress bar in GUI
- Auto-update mechanism
MIT License