The Gateway lets you write Python functions that can be called from SQL in Redshift (and other clouds in the future). Your Python code runs on AWS Lambda, making it easy to use external libraries and complex logic from your database queries.
All deployment logic lives in gateway/logic/. See CLAUDE.md for the complete technical guide.
Before starting, make sure you have:
- Python 3.10 or newer
- AWS credentials with permissions to create Lambda functions
- A Redshift cluster (if you want to deploy functions)
```bash
# Create a Python virtual environment
make venv

# Copy the configuration template
cp .env.template .env

# Edit .env and add your AWS and Redshift credentials
```

The `.env.template` file has detailed comments explaining each setting.
Here's a minimal example of what you need in your .env file:
```bash
# AWS settings
AWS_REGION=us-east-1
AWS_PROFILE=my-profile  # Or use AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY

# Lambda settings
RS_LAMBDA_PREFIX=yourname-at-  # Prefix for Lambda functions (max 46 chars, total name ≤64)
RS_LAMBDA_OVERRIDE=1  # Override existing Lambdas (1=yes, 0=no)

# Redshift settings
RS_SCHEMA=yourname_carto  # Schema name for gateway functions
RS_HOST=<your-cluster>.redshift.amazonaws.com
RS_DATABASE=<database>
RS_USER=<user>
RS_PASSWORD=<password>
RS_LAMBDA_INVOKE_ROLE=arn:aws:iam::<account>:role/<role>
```

```bash
# Build your functions (this copies shared code where it's needed)
make build cloud=redshift

# Run the tests
make test-unit cloud=redshift

# Deploy to your dev environment
make deploy cloud=redshift
```

Each function lives in its own folder with this structure:
```
gateway/functions/<module>/<function_name>/
├── function.yaml              # Describes your function
├── code/
│   ├── lambda/python/
│   │   ├── handler.py         # Your Python code
│   │   └── requirements.txt   # Optional: Python dependencies
│   └── redshift.sql           # Optional: can be auto-generated
└── tests/
    ├── unit/                  # Unit tests
    └── integration/           # Integration tests (optional)
```
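As a quick sanity check of this layout, a small script can verify that the two required files are present before building. This is an illustrative helper, not part of the actual build tooling:

```python
from pathlib import Path

# Files every function folder must contain; everything else is optional.
# (Illustrative check only, not part of the actual build tooling.)
REQUIRED = ("function.yaml", "code/lambda/python/handler.py")


def validate_function_dir(function_dir: Path) -> list[str]:
    """Return the list of required files missing from function_dir."""
    return [rel for rel in REQUIRED if not (function_dir / rel).exists()]
```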
For a simple function, you just need to describe it in function.yaml:
```yaml
name: s2_fromtoken
module: s2

# Define your parameters and return type
parameters:
  - name: token
    type: string
  - name: resolution
    type: int
returns: bigint

clouds:
  redshift:
    type: lambda
    lambda_name: s2_ftok  # Keep this short (≤18 chars total with your prefix)
    code_file: code/lambda/python/handler.py
```

The SQL will be generated automatically! Generic types like `string` and `int` are converted to the right types for each cloud.
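To make the generation step concrete, here is a sketch of how the `function.yaml` fields above could be rendered into Redshift's `CREATE EXTERNAL FUNCTION` statement. The actual generator and its type table live in `gateway/logic/`, so treat the mapping and template below as illustrative assumptions:

```python
# Illustrative generic-to-Redshift type mapping (assumption; the real table
# lives in gateway/logic/).
TYPE_MAP = {"string": "VARCHAR(MAX)", "int": "INT", "bigint": "BIGINT", "float": "FLOAT8"}


def render_external_function(name, params, returns, lambda_name, schema, prefix, role_arn):
    """Render a Redshift CREATE EXTERNAL FUNCTION statement (sketch)."""
    args = ", ".join(TYPE_MAP[p["type"]] for p in params)
    return (
        f"CREATE OR REPLACE EXTERNAL FUNCTION {schema}.{name}({args})\n"
        f"RETURNS {TYPE_MAP[returns]}\n"
        f"STABLE\n"
        f"LAMBDA '{prefix}{lambda_name}'\n"
        f"IAM_ROLE '{role_arn}'"
    )


print(render_external_function(
    "s2_fromtoken",
    [{"name": "token", "type": "string"}, {"name": "resolution", "type": "int"}],
    "bigint", "s2_ftok", "yourname_carto", "yourname-at-",
    "arn:aws:iam::<account>:role/<role>",
))
```

Note how the deployed Lambda name is the prefix from `RS_LAMBDA_PREFIX` plus `lambda_name`, which is why both need to stay short.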
If you need custom SQL or want to use external Python packages:
```yaml
name: quadbin_polyfill
module: quadbin

clouds:
  redshift:
    type: lambda
    lambda_name: qb_polyfill
    code_file: code/lambda/python/handler.py
    requirements_file: code/lambda/python/requirements.txt
    external_function_template: code/redshift.sql
    shared_libs:
      - quadbin  # Reuses code from _shared/python/quadbin/
    config:
      memory_size: 512    # MB of memory
      timeout: 300        # Seconds
      max_batch_rows: 50  # How many rows to process at once
```

Your `handler.py` file processes the data:
```python
from carto.lambda_wrapper import redshift_handler


@redshift_handler
def process_row(row):
    """Process a single row of data."""
    if not row or row[0] is None:
        return None
    # Your logic here
    result = do_something(row[0])
    return result


lambda_handler = process_row
```

The `@redshift_handler` decorator handles batching and error handling for you.
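To see what the decorator saves you from writing, here is a minimal sketch of such a wrapper, following the Redshift Lambda UDF JSON protocol (`arguments` in, `success`/`results` out). This is an illustrative reimplementation, not the actual `carto.lambda_wrapper` code:

```python
import json


def redshift_handler(process_row):
    """Wrap a per-row function into a Redshift Lambda UDF handler (sketch)."""
    def handler(event, context):
        try:
            # Redshift batches rows: event["arguments"] is a list of argument lists
            results = [process_row(row) for row in event["arguments"]]
            return json.dumps({"success": True, "results": results})
        except Exception as exc:
            # Report the failure back to Redshift instead of raising
            return json.dumps({"success": False, "error_msg": str(exc)})
    return handler


@redshift_handler
def double_first(row):
    return None if not row or row[0] is None else row[0] * 2


# Example invocation with a Redshift-style event payload
print(double_first({"arguments": [[1], [2], [None]]}, None))
# → {"success": true, "results": [2, 4, null]}
```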
If multiple functions need the same code, put it in gateway/functions/_shared/python/<lib_name>/ and list it in your function.yaml:
```yaml
shared_libs:
  - quadbin
  - utils
```

When you build, this code gets copied to each function that needs it.
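The copy step itself is simple; a sketch of what the build might do for each function (illustrative, not the actual build code in `gateway/logic/`):

```python
# Sketch of the build-time copy of shared libraries into a function's
# Lambda code directory (illustrative, not the actual build code).
import shutil
from pathlib import Path


def copy_shared_libs(function_dir: Path, shared_libs: list[str],
                     shared_root: Path = Path("gateway/functions/_shared/python")):
    target = function_dir / "code" / "lambda" / "python"
    for lib in shared_libs:
        dest = target / lib
        if dest.exists():
            shutil.rmtree(dest)  # overwrite stale copies on rebuild
        shutil.copytree(shared_root / lib, dest)
```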
```bash
# Always build first (copies shared code)
make build cloud=redshift

# Run all tests
make test-unit cloud=redshift

# Test a specific module
make test-unit cloud=redshift modules=quadbin

# Test a specific function
make test-unit cloud=redshift functions=quadbin_polyfill

# Integration tests (needs a real Redshift cluster)
make test-integration cloud=redshift
```

```bash
# Deploy everything to your dev environment
make deploy cloud=redshift

# Deploy to production (no dev prefixes)
make deploy cloud=redshift production=1

# Deploy just one function
make deploy cloud=redshift functions=quadbin_polyfill

# Deploy all functions in a module
make deploy cloud=redshift modules=quadbin

# Deploy only what changed
make deploy cloud=redshift diff=1
```

When you deploy, the system:
- Packages your code and dependencies into a .zip file
- Uploads it to AWS Lambda
- Creates the SQL function in Redshift that calls your Lambda
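The first of those steps, packaging, can be sketched as follows; the real pipeline in `gateway/logic/` also bundles the installed `requirements.txt` dependencies, so treat this as a simplified illustration:

```python
# Sketch of the packaging step: zip a function's code directory so that
# handler.py sits at the zip root, as Lambda expects. (Illustrative only.)
import zipfile
from pathlib import Path


def package_function(src_dir: Path, zip_path: Path) -> Path:
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to src_dir so imports resolve inside Lambda
                zf.write(path, path.relative_to(src_dir))
    return zip_path
```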
```bash
# Check your code
make lint

# Auto-fix issues
make lint-fix
```

```bash
# Create a package for distribution
make create-package cloud=redshift

# Production package
make create-package cloud=redshift production=1
```

This creates `dist/carto-analytics-toolbox-redshift-<version>.zip`.
- Build before testing: Always run `make build` before `make test-unit`. This copies shared libraries where they're needed.
- Short Lambda names: Keep the `lambda_name` field short (≤18 characters including your prefix) to avoid AWS limits.
- Dev vs Production: Dev mode adds prefixes to your schema and function names. Production mode doesn't.
- Check `.env.template`: It has detailed documentation for all configuration options.
For everything technical:
- CLAUDE.md - Complete architecture, type mapping system, troubleshooting, and development guidelines
- .env.template - All configuration options explained
If something isn't working:
- Check the troubleshooting section in CLAUDE.md
- Verify your `.env` file has the right credentials
- Make sure your `function.yaml` follows the structure shown above
- Check that your AWS credentials have the necessary permissions