
Azure Blob Storage support #217

@zxqfd555

Description

Is your feature request related to a problem? Please describe.

Pathway supports S3 and MinIO as object storage backends for reading pipeline data, but there is no equivalent connector for Azure Blob Storage. Users running Pathway on Azure infrastructure have no native way to read from or write to Blob Storage, and must resort to custom Python connectors that bypass engine-level guarantees.

Describe the solution you'd like

Add pw.io.azure_blob_storage.read and pw.io.azure_blob_storage.write, with an interface as close as possible to pw.io.s3.read and pw.io.s3.write (planned in #216).

Crates: azure_core, azure_storage, and azure_storage_blobs, all MIT-licensed. They are already used by Pathway's persistence checkpoint layer, so no new dependencies are needed.

API:

pw.io.azure_blob_storage.read(
    container,
    path,        # blob prefix / directory to read from
    format,      # "csv" | "json" | ...
    ...
)

pw.io.azure_blob_storage.write(
    table,
    container,
    path,        # blob prefix / directory to write into
    format,      # "csv" | "json" | ...
    *,
    write_interval,
    ...
)

The write connector follows the same buffering and flush semantics as planned for pw.io.s3.write (#216): rows are accumulated in an in-memory buffer, and on each flush call the connector checks whether write_interval has elapsed. If yes, the buffer is uploaded as a new blob under the configured prefix and the buffer is cleared. If no, the buffer is retained for the next flush cycle.
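The buffering and flush semantics described above can be sketched as a small standalone model. Everything here is illustrative: the BufferedBlobWriter class, the upload_blob callable, the part-N blob naming, and the injected clock are hypothetical names chosen for this sketch, not the actual connector API.

```python
import time


class BufferedBlobWriter:
    """Illustrative model of the proposed write connector's buffering.

    `upload_blob` stands in for the real Azure upload call; blob naming
    and clock injection are assumptions made for this sketch only.
    """

    def __init__(self, upload_blob, write_interval_s, clock=time.monotonic):
        self._upload_blob = upload_blob  # callable(blob_name, rows)
        self._interval = write_interval_s
        self._clock = clock
        self._buffer = []
        self._last_upload = clock()
        self._seq = 0

    def write(self, row):
        # Rows accumulate in memory until a flush uploads them.
        self._buffer.append(row)

    def flush(self):
        # Upload only if write_interval has elapsed and there is data;
        # otherwise retain the buffer for the next flush cycle.
        now = self._clock()
        if now - self._last_upload < self._interval or not self._buffer:
            return False
        self._upload_blob(f"part-{self._seq:08d}", list(self._buffer))
        self._buffer.clear()
        self._last_upload = now
        self._seq += 1
        return True
```

With a fake clock, the retention behaviour is easy to see: a flush before the interval elapses keeps the buffer, and the next flush after the interval uploads everything accumulated so far as one blob.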

Describe alternatives you've considered

Unlike S3, Azure Blob Storage exposes an Append Blob type that supports appending blocks to an existing blob. This could in principle allow a simpler write strategy. However, adopting it would make the implementation and semantics diverge from the S3/MinIO connectors without a compelling benefit: the batched write approach is predictable, cost-efficient, and consistent across all object storage connectors. Append Blob support can be considered as a follow-up.

Additional context

Since the azure_storage_blobs crate is already in use for persistence, the authentication setup (connection strings, SAS tokens, managed identity) is already solved and can be reused directly.

Testing should follow the same pattern as the S3/MinIO integration tests. Coverage should include read, write, flush timing, buffer retention, multiple flush cycles, and format coverage (at minimum CSV and JSON). An Azurite container (the Azure Blob Storage emulator) should be used in the Docker Compose test suite in place of a real Azure account.
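As one concrete example of the format coverage above, an Azurite-backed test could assert a simple round-trip property per format: rows serialized by the write connector must be recovered intact by the read connector. The helpers below model that property with the standard library only; the JSON-lines layout and the helper names are assumptions for illustration, not Pathway's actual serialization code.

```python
import csv
import io
import json


def to_csv(rows):
    # Serialize a list of dicts to a CSV payload with a header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(payload):
    # Parse a CSV payload back into a list of dicts (string values).
    return list(csv.DictReader(io.StringIO(payload)))


def to_jsonlines(rows):
    # One JSON object per line, a common object-storage layout.
    return "".join(json.dumps(r) + "\n" for r in rows)


def from_jsonlines(payload):
    return [json.loads(line) for line in payload.splitlines()]
```

An integration test would perform the same round trip through Azurite: write a table via the connector, read the resulting blobs back, and compare against the input rows.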


Labels

enhancement (New feature or request)
