Is your feature request related to a problem? Please describe.
Pathway supports S3 and MinIO as object storage backends for reading pipeline data, but there is no equivalent connector for Azure Blob Storage. Users running Pathway on Azure infrastructure have no native way to read from or write to Blob Storage, and must resort to custom Python connectors that bypass engine-level guarantees.
Describe the solution you'd like
Add pw.io.azure_blob_storage.read and pw.io.azure_blob_storage.write, with an interface as close as possible to pw.io.s3.read and pw.io.s3.write (planned in #216).
Crates: azure_core, azure_storage, azure_storage_blobs — all MIT-licensed. They are already used by Pathway's persistence checkpoint layer, so no new dependencies are needed.
API:
pw.io.azure_blob_storage.read(
    container,
    path,            # blob prefix / directory to read from
    format,          # "csv" | "json" | ...
    ...
)

pw.io.azure_blob_storage.write(
    table,
    container,
    path,            # blob prefix / directory to write into
    format,          # "csv" | "json" | ...
    *,
    write_interval,
    ...
)
The write connector follows the same buffering and flush semantics as planned for pw.io.s3.write (#216): rows are accumulated in an in-memory buffer, and on each flush call the connector checks whether write_interval has elapsed. If yes, the buffer is uploaded as a new blob under the configured prefix and the buffer is cleared. If no, the buffer is retained for the next flush cycle.
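The buffer/flush behavior described above can be sketched as follows. This is a minimal illustration of the intended semantics, not Pathway internals; the class and parameter names (`BufferedBlobWriter`, `upload`) are hypothetical.

```python
import time


class BufferedBlobWriter:
    """Sketch of the proposed buffer/flush semantics (hypothetical names).

    Rows accumulate in an in-memory buffer. On each flush call, the buffer
    is uploaded as a new blob only if `write_interval` seconds have elapsed
    since the last upload; otherwise it is retained for the next cycle.
    """

    def __init__(self, upload, write_interval: float):
        self.upload = upload                  # callable(bytes): uploads one blob
        self.write_interval = write_interval  # seconds between uploads
        self.buffer: list[bytes] = []
        self.last_upload = time.monotonic()

    def write(self, row: bytes) -> None:
        self.buffer.append(row)

    def flush(self) -> None:
        now = time.monotonic()
        if now - self.last_upload < self.write_interval:
            return                            # interval not elapsed: retain buffer
        if self.buffer:
            self.upload(b"".join(self.buffer))  # one new blob per flushed batch
            self.buffer.clear()
        self.last_upload = now
```

Each flushed batch becomes its own blob under the configured prefix, mirroring the batched-object strategy of the planned S3 connector rather than appending to an existing object.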
Describe alternatives you've considered
Azure Blob Storage exposes an Append Blob type that supports appending blocks to an existing blob, unlike S3. This could in principle allow a simpler write strategy. However, adopting it would diverge the implementation and semantics from the S3/MinIO connectors without a compelling benefit — the batched write approach is predictable, cost-efficient, and consistent across all object storage connectors. Append Blob support can be considered as a follow-up.
Additional context
Since the azure_storage_blobs crate is already in use for persistence, the authentication setup (connection strings, SAS tokens, managed identity) is already solved and can be reused directly.
Testing should follow the same pattern as the S3/MinIO integration tests. Coverage should include read, write, flush timing, buffer retention, multiple flush cycles, and format coverage (at minimum CSV and JSON). An Azurite container (the Azure Blob Storage emulator) should be used in the Docker Compose test suite in place of a real Azure account.
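For the Azurite setup, a minimal docker-compose fragment could look like the following (a sketch; the service name and port mapping are placeholders to be adapted to the existing test suite). Azurite serves the Blob endpoint on port 10000, and the well-known development-storage connection string works against it, so no real Azure credentials are needed in CI.

```yaml
services:
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    command: azurite-blob --blobHost 0.0.0.0   # blob service only
    ports:
      - "10000:10000"                          # default Azurite blob port
```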