Stop manual copy-pasting. Transition your Airbnb management from messy spreadsheets to a robust data warehouse.
This project provides an automated ETL pipeline that triggers when an Airbnb earnings CSV is uploaded to Google Cloud Storage (GCS). It cleanses the data and performs an UPSERT (MERGE) into BigQuery, ensuring your financial records are always up-to-date and free of duplicates.
This implementation is written in Ruby and is designed to run on Cloud Run with Eventarc. It preserves the behavior of the original Python pipeline as closely as possible, except for unavoidable language-level differences.
- Automated ETL: Fully event-driven. Upload a file to GCS, and your data appears in BigQuery seconds later.
- Idempotency (Smart Upsert): Implements SHA256 row hashing to uniquely identify every entry, including payouts without confirmation codes, preventing duplicate rows when the same file is uploaded multiple times.
- Data Cleansing & Normalization:
  - Maps Japanese headers to standardized English column names.
  - Converts US-style dates (`MM/DD/YYYY`) to ISO-compatible values for BigQuery.
  - Normalizes financial columns with `BigDecimal` for BigQuery `NUMERIC` compatibility.
  - Preserves unmapped source columns and emits actionable warnings when Airbnb changes the export format.
- Cloud Run Ready: Accepts HTTP requests from Eventarc and supports both structured CloudEvent payloads and raw event-style payloads.
- BI & Analytics Ready: Query with SQL or connect BigQuery to Google Sheets or Looker Studio for financial dashboards.
- Tested for OSS Use: Includes a `Minitest` suite with `SimpleCov` coverage enforcement. Current coverage is above 80%.
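To make the idempotency and cleansing features above concrete, here is a minimal sketch of how SHA256 row hashing and value normalization might look. The method names and field choices are illustrative assumptions, not the project's actual code:

```ruby
require "digest"
require "date"
require "bigdecimal"

# Build a stable SHA256 identity for a row. Sorting the fields by key makes
# the digest order-independent, so re-uploading the same file produces the
# same key and the MERGE updates instead of duplicating -- even for payout
# rows that have no confirmation code.
def row_hash(row)
  Digest::SHA256.hexdigest(row.sort.map { |k, v| "#{k}=#{v}" }.join("|"))
end

# Convert a US-style MM/DD/YYYY date to the ISO format BigQuery expects.
def normalize_date(us_date)
  Date.strptime(us_date, "%m/%d/%Y").iso8601
end

# Parse an amount like "12,345.67" into BigDecimal for NUMERIC columns.
def normalize_amount(raw)
  BigDecimal(raw.to_s.delete(","))
end

row = {
  "date"              => normalize_date("01/31/2024"),
  "amount"            => normalize_amount("12,345.67").to_s("F"),
  "confirmation_code" => ""
}
puts row_hash(row) # 64-char hex digest, stable across uploads
```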
- Storage: Google Cloud Storage (trigger bucket).
- Routing: Eventarc (object finalized trigger).
- Compute: Cloud Run (Ruby + Rack/Puma).
- Warehouse: Google BigQuery.
- Interface: Google Sheets (Connected Sheets) or Looker Studio.
The pipeline is controlled via environment variables in Cloud Run.
| Variable | Description | Default |
|---|---|---|
| `GCP_PROJECT_ID` | Your Google Cloud Project ID. | - |
| `BQ_DATASET_ID` | Destination BigQuery dataset ID. | `airbnb_management` |
| `BQ_TABLE_ID` | Destination BigQuery table ID. | `earnings_cleaned` |
| `PORT` | HTTP port used by Cloud Run. | `8080` |
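To illustrate how these variables behave, a loader might look like the following sketch (not the project's actual code): the required variable raises when unset, and the optional ones fall back to the documented defaults.

```ruby
# Sketch of reading the configuration table above. ENV.fetch with one
# argument raises KeyError when the variable is missing; the two-argument
# form applies the documented default instead.
def load_config
  {
    project_id: ENV.fetch("GCP_PROJECT_ID"), # required, no default
    dataset_id: ENV.fetch("BQ_DATASET_ID", "airbnb_management"),
    table_id:   ENV.fetch("BQ_TABLE_ID", "earnings_cleaned"),
    port:       Integer(ENV.fetch("PORT", "8080"))
  }
end
```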
Ensure the Cloud Run service account has the following permissions:
- Storage Object Viewer: To read CSV files from GCS.
- BigQuery Data Editor: To insert, copy, and merge data into tables.
- BigQuery Job User: To run query and load jobs.
If you deploy the Eventarc trigger from CI/CD or the command line, that identity also needs the relevant Eventarc and Cloud Run administration permissions.
```bash
bundle install
bundle exec puma -C config/puma.rb
```

Health check:

```bash
curl http://localhost:8080/up
```

Run the test suite:

```bash
bundle exec rake test
```

Deploy the Cloud Run service with `deploy.sh` or through Cloud Build (`cloudbuild.yaml`). Create the Eventarc trigger separately with `scripts/create_trigger.sh`.
```bash
chmod +x deploy.sh scripts/create_trigger.sh

SERVICE_NAME=airbnb-payouts-import \
REGION=asia-northeast1 \
PROJECT_ID=your-project-id \
SERVICE_ACCOUNT_EMAIL=etl-runner@your-project-id.iam.gserviceaccount.com \
BQ_DATASET_ID=airbnb_management \
BQ_TABLE_ID=earnings_cleaned \
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/... \
./deploy.sh
```

Create or update the Eventarc trigger separately:
```bash
SERVICE_NAME=airbnb-payouts-import \
TRIGGER_NAME=airbnb-payouts-import-gcs-finalized \
REGION=asia-northeast1 \
PROJECT_ID=your-project-id \
TRIGGER_BUCKET=your-bucket \
SERVICE_ACCOUNT_EMAIL=etl-runner@your-project-id.iam.gserviceaccount.com \
./scripts/create_trigger.sh
```

- Export your Transaction History CSV from the Airbnb hosting dashboard.
- Upload the CSV to your designated GCS bucket.
- Eventarc sends the object-finalized event to Cloud Run.
- The service cleans, stages, and merges the data into BigQuery.
- Receive a notification in Slack (if configured).
- Analyze your data in BigQuery, Google Sheets, or Looker Studio.
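As a starting point for the analysis step, a monthly-earnings rollup could be generated like this. The column names (`date`, `amount`) are assumptions about the cleaned schema, so adjust them to match your table:

```ruby
# Illustrative monthly-earnings query for the cleaned table. Column names
# are assumptions for this sketch, not the project's actual schema.
def monthly_earnings_sql(project:, dataset: "airbnb_management", table: "earnings_cleaned")
  <<~SQL
    SELECT FORMAT_DATE('%Y-%m', date) AS month,
           SUM(amount)                AS total_earnings
    FROM `#{project}.#{dataset}.#{table}`
    GROUP BY month
    ORDER BY month
  SQL
end

puts monthly_earnings_sql(project: "your-project-id")
```

The generated SQL can be run in the BigQuery console, with `bq query --use_legacy_sql=false`, or behind a Connected Sheets or Looker Studio data source.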
When `SLACK_WEBHOOK_URL` is provided, the pipeline sends a rich attachment message to your channel:
- Success: Shows the filename, import mode (Full Import for new tables vs. Merge Import for existing tables), and the count of inserted and updated rows.
- Failure: Sends an alert with the error message and filename to help you troubleshoot quickly (e.g., schema mismatches or permission issues).
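A success notification along these lines can be sketched with the standard library alone; the payload shape and helper names here are illustrative, not the project's actual implementation:

```ruby
require "json"
require "net/http"
require "uri"

# Build an attachment-style Slack message for a successful import.
# Field names and layout are assumptions for this sketch.
def slack_payload(filename:, mode:, inserted:, updated:)
  {
    attachments: [{
      color: "good",
      title: "Airbnb import succeeded: #{filename}",
      fields: [
        { title: "Mode",     value: mode,          short: true },
        { title: "Inserted", value: inserted.to_s, short: true },
        { title: "Updated",  value: updated.to_s,  short: true }
      ]
    }]
  }
end

# POST the payload to a Slack incoming webhook URL.
def notify_slack(webhook_url, **stats)
  Net::HTTP.post(URI(webhook_url),
                 slack_payload(**stats).to_json,
                 "Content-Type" => "application/json")
end
```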
- Like the original implementation, this project uses a staging table and then performs a `MERGE` into the target table.
- Like the original implementation, if Airbnb introduces a new column and your target BigQuery table schema is not updated, the merge can fail until the schema is aligned.
- The service entrypoint is HTTP-based because Cloud Run receives events through Eventarc rather than Cloud Functions-style `event, context` handlers.
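The staging-then-`MERGE` step can be sketched as follows. The key and column names (`row_hash`, `amount`, `paid_out_at`) are assumptions for illustration, not the project's actual schema:

```ruby
# Illustrative MERGE from a staging table into the target, keyed on the
# SHA256 row hash. When schemas match exactly, BigQuery's INSERT ROW
# shorthand copies the whole staging row for new keys.
def merge_sql(project:, dataset:, target:, staging:)
  <<~SQL
    MERGE `#{project}.#{dataset}.#{target}` AS t
    USING `#{project}.#{dataset}.#{staging}` AS s
    ON t.row_hash = s.row_hash
    WHEN MATCHED THEN
      UPDATE SET amount = s.amount, paid_out_at = s.paid_out_at
    WHEN NOT MATCHED THEN
      INSERT ROW
  SQL
end

puts merge_sql(project: "your-project-id", dataset: "airbnb_management",
               target: "earnings_cleaned", staging: "earnings_staging")
```

Because the upsert is keyed on the row hash rather than the confirmation code, payout rows without a code still dedupe correctly across repeated uploads.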
Contributions are welcome. Feel free to open an issue or submit a pull request if you have ideas for improvement.
This project is licensed under the MIT License. See LICENSE for details.