
feat: route use_sea=True through ADBC-Rust kernel via PyO3#782

Draft
vikrantpuppala wants to merge 1 commit into main from adbc-rust-backend

Conversation

@vikrantpuppala
Contributor

Summary

Draft / RFC. Companion to the PyO3 satellite binding being prototyped in adbc-drivers/databricks#423.

Adds a new backend, `AdbcDatabricksClient`, that delegates query execution to the `databricks_adbc_pyo3` extension module (PyO3 bindings over the Databricks ADBC Rust kernel). When `use_sea=True` is passed to `sql.connect`, requests now flow through the Rust kernel instead of the existing Python-SEA backend.

What this proves out

The kernel-strategy design (docs/kernel-strategy-final-recommendation.md in the kernel repo) calls for use_sea=True to be powered by a single Rust SEA implementation shared across all Databricks language drivers. This PR is the Python-side wiring to validate that path end-to-end.

Performance vs. the existing Thrift backend (dogfood warehouse, randomized interleaved benchmark, median wall time, `fetchall_arrow` path):

| result size | ADBC-Rust | Thrift | ratio |
|---|---|---|---|
| SELECT 1 | 394 ms | 387 ms | 1.02× |
| 10K | 893 ms | 1014 ms | 0.88× |
| 100K | 1148 ms | 1145 ms | 1.00× |
| 500K | 2178 ms | 3305 ms | 0.66× |
| 1M | 3579 ms | 3814 ms | 0.94× |
| 10M | 8677 ms | 8802 ms | 0.99× |
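The methodology behind these numbers (randomized interleaving of the two backends, median wall time) can be sketched as follows. This is an illustrative harness, not the actual benchmark script; the placeholder workloads stand in for `fetchall_arrow()` calls against each backend:

```python
import random
import statistics
import time

def bench_interleaved(workloads, runs=5, seed=0):
    """Run each workload `runs` times in a randomized interleaved order
    and return the median wall time per workload, in milliseconds."""
    rng = random.Random(seed)
    # One schedule entry per (workload, run); shuffling interleaves the
    # backends so warm-up and warehouse noise don't favor either side.
    schedule = [name for name in workloads for _ in range(runs)]
    rng.shuffle(schedule)

    timings = {name: [] for name in workloads}
    for name in schedule:
        start = time.perf_counter()
        workloads[name]()
        timings[name].append((time.perf_counter() - start) * 1000.0)

    return {name: statistics.median(ts) for name, ts in timings.items()}

# Placeholder workloads; the real benchmark calls fetchall_arrow()
# through the ADBC-Rust and Thrift backends respectively.
medians = bench_interleaved({
    "adbc-rust": lambda: sum(range(10_000)),
    "thrift": lambda: sum(range(20_000)),
})
print({k: round(v, 3) for k, v in medians.items()})
```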

What's wired through the public API

- `sql.connect(..., use_sea=True)` opens a Rust-kernel-backed session
- `cursor.execute(sql)` runs queries (sync, PAT-only)
- `cursor.fetchone()` / `fetchmany(n)` / `fetchall()` return `Row` namedtuples
- `cursor.fetchall_arrow()` / `fetchmany_arrow(n)` return `pyarrow.Table` (zero-copy from Rust via the Arrow C Data Interface)
- `cursor.description` returns PEP-249 7-tuples derived from the Arrow schema
- iteration (`for row in cursor`) and context managers
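As a sketch of the row-fetch path above: `fetchall()` pivots the columnar Arrow result into `Row` namedtuples. A simplified, pyarrow-free illustration of that column-to-row pivot (the helper and field names are illustrative, not the backend's actual code):

```python
from collections import namedtuple

def rows_from_columns(names, columns):
    """Pivot columnar data (as it arrives from an Arrow result) into
    PEP-249-style Row namedtuples, one per result row."""
    Row = namedtuple("Row", names, rename=True)  # rename=True guards odd column names
    return [Row(*values) for values in zip(*columns)]

names = ["id", "name"]
columns = [[1, 2, 3], ["a", "b", "c"]]
rows = rows_from_columns(names, columns)
print(rows[0])       # Row(id=1, name='a')
print(rows[2].name)  # 'c'
```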

What is NOT yet wired (raises NotImplementedError)

- Parameterized queries (`parameters=[...]`)
- Async execution (`async_op=True`) and `cancel()`
- Metadata methods (`cursor.catalogs()` / `schemas()` / `tables()` / `columns()`)
- Auth: PAT only; no OAuth M2M, U2M, Azure SP, or external credential providers
- Staging operations
- Ctrl-C signal handling and the logging bridge into Python `logging`
- A native exception hierarchy; for now all kernel errors map to `DatabaseError` / `OperationalError` / `ProgrammingError`
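Since there is no native exception hierarchy yet, kernel errors are folded into the three PEP-249 classes named above. A hedged sketch of what such a mapping could look like — the `kind` strings and the helper name are assumptions for illustration, not the kernel's actual error taxonomy:

```python
# Stand-ins for the driver's PEP-249 exception classes.
class DatabaseError(Exception): ...
class OperationalError(DatabaseError): ...
class ProgrammingError(DatabaseError): ...

# Illustrative mapping; the real backend inspects errors raised by
# the databricks_adbc_pyo3 extension module.
_KIND_TO_EXC = {
    "io": OperationalError,      # network failures, warehouse unavailable
    "syntax": ProgrammingError,  # bad SQL, unknown column, etc.
}

def map_kernel_error(kind, message):
    """Translate a kernel error into a PEP-249 exception instance,
    falling back to the generic DatabaseError."""
    return _KIND_TO_EXC.get(kind, DatabaseError)(message)

err = map_kernel_error("syntax", "unexpected token 'SELEC'")
print(type(err).__name__)  # ProgrammingError
```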

Code layout

```
src/databricks/sql/backend/adbc/
├── __init__.py      # re-exports AdbcDatabricksClient
├── client.py        # DatabricksClient impl, delegates to PyO3
└── result_set.py    # ResultSet impl over the streaming PyO3 ResultSet,
                     # with batch buffering for fetchone / fetchmany
```
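The batch buffering in `result_set.py` works roughly like this: the PyO3 `ResultSet` streams Arrow batches, and `fetchone` / `fetchmany` slice rows out of a buffered batch, pulling the next batch only when the buffer runs dry. A simplified sketch over plain Python lists (class and method bodies are illustrative, not the actual implementation):

```python
class BufferedResultSet:
    """Minimal sketch: buffer one streamed batch at a time and serve
    fetchone/fetchmany from it, spilling across batch boundaries."""

    def __init__(self, batches):
        self._batches = iter(batches)  # stands in for the streaming PyO3 ResultSet
        self._buffer = []              # rows left over from the current batch

    def _fill(self):
        # Pull batches until we have at least one row or the stream ends.
        while not self._buffer:
            try:
                self._buffer = list(next(self._batches))
            except StopIteration:
                return False
        return True

    def fetchone(self):
        return self._buffer.pop(0) if self._fill() else None

    def fetchmany(self, n):
        rows = []
        while len(rows) < n and self._fill():
            take = n - len(rows)
            rows.extend(self._buffer[:take])
            del self._buffer[:take]
        return rows

rs = BufferedResultSet([[1, 2, 3], [4, 5]])
print(rs.fetchmany(4))  # [1, 2, 3, 4] -- spans the batch boundary
print(rs.fetchone())    # 5
print(rs.fetchone())    # None (stream exhausted)
```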

The old backend/sea/ tree is left in place and unreachable from sql.connect; deletion is a separate cleanup once this backend reaches parity with the rest of the design doc.

Why draft?

`databricks_adbc_pyo3` is not yet on PyPI. CI here will fail to import the new backend until the satellite is published. To run locally:

```shell
git clone https://github.com/adbc-drivers/databricks
cd databricks/rust-pyo3
python3 -m venv .venv && source .venv/bin/activate
pip install 'maturin>=1.5,<2.0' pyarrow
maturin develop --release
```

```shell
cd /path/to/databricks-sql-python
pip install -e .
DATABRICKS_HOST=... DATABRICKS_HTTP_PATH=... DATABRICKS_TOKEN=... python -c "
from databricks import sql
with sql.connect(server_hostname=..., http_path=..., access_token=..., use_sea=True) as c:
    print(c.cursor().execute('SELECT 1').fetchall())
"
```

Open questions

1. Should this PR ship together with the kernel + satellite PRs, or be sequenced (kernel first, satellite second, this third)?
2. The original Python-SEA design doc (python-driver-rust-adbc-sea-design.md in the kernel repo) plans for deletion of the existing `backend/sea/` tree. Is keeping it in place for one release an acceptable migration window?
3. Authentication: do we want to bring up OAuth M2M before merging, or is PAT-only acceptable as a v0?

Test plan

- `import databricks.sql` works
- `sql.connect(use_sea=True)` succeeds against a dogfood warehouse with a PAT
- Small inline result via `fetchone()` / `fetchall_arrow()`
- 1M-row CloudFetch result via `fetchall_arrow()`
- `fetchmany(n)` slices correctly across batch boundaries
- `cursor.description` returns sensible types
- OAuth, async, metadata, parameterized queries: all explicitly out of scope
- CI integration depends on the PyO3 binding being published

This pull request and its description were written by Isaac.

