
Azure Blob Storage support #217

@zxqfd555

Description

Is your feature request related to a problem? Please describe.

Pathway supports S3 and MinIO as object storage backends for reading pipeline data, but there is no equivalent connector for Azure Blob Storage. Users running Pathway on Azure infrastructure have no native way to read from or write to Blob Storage, and must resort to custom Python connectors that bypass engine-level guarantees.

Describe the solution you'd like

Add pw.io.azure_blob_storage.read and pw.io.azure_blob_storage.write, with an interface as close as possible to pw.io.s3.read and pw.io.s3.write (planned in #216).

Crates: azure_core, azure_storage, and azure_storage_blobs, all MIT-licensed. They are already used by Pathway's persistence checkpoint layer, so no new dependencies are needed.

API:

pw.io.azure_blob_storage.read(
    container,
    path,        # blob prefix / directory to read from
    format,      # "csv" | "json" | ...
    ...
)

pw.io.azure_blob_storage.write(
    table,
    container,
    path,        # blob prefix / directory to write into
    format,      # "csv" | "json" | ...
    *,
    write_interval,
    ...
)

The write connector follows the same buffering and flush semantics as planned for pw.io.s3.write (#216): rows are accumulated in an in-memory buffer, and on each flush call the connector checks whether write_interval has elapsed. If yes, the buffer is uploaded as a new blob under the configured prefix and the buffer is cleared. If no, the buffer is retained for the next flush cycle.
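The buffering and flush semantics described above can be sketched as a small standalone model. Everything here is illustrative: the BufferedBlobWriter class, the upload_blob callable, the part-N blob naming, and the injected clock are hypothetical names chosen for this sketch, not the actual connector API.

```python
import time


class BufferedBlobWriter:
    """Illustrative model of the proposed write connector's buffering.

    `upload_blob` stands in for the real Azure upload call; blob naming
    and clock injection are assumptions made for this sketch only.
    """

    def __init__(self, upload_blob, write_interval_s, clock=time.monotonic):
        self._upload_blob = upload_blob  # callable(blob_name, rows)
        self._interval = write_interval_s
        self._clock = clock
        self._buffer = []
        self._last_upload = clock()
        self._seq = 0

    def write(self, row):
        # Rows accumulate in memory until a flush uploads them.
        self._buffer.append(row)

    def flush(self):
        # Upload only if write_interval has elapsed and there is data;
        # otherwise retain the buffer for the next flush cycle.
        now = self._clock()
        if now - self._last_upload < self._interval or not self._buffer:
            return False
        self._upload_blob(f"part-{self._seq:08d}", list(self._buffer))
        self._buffer.clear()
        self._last_upload = now
        self._seq += 1
        return True
```

With a fake clock, the retention behaviour is easy to see: a flush before the interval elapses keeps the buffer, and the next flush after the interval uploads everything accumulated so far as one blob.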

Describe alternatives you've considered

Unlike S3, Azure Blob Storage exposes an Append Blob type that supports appending blocks to an existing blob. This could in principle allow a simpler write strategy. However, adopting it would make the implementation and semantics diverge from the S3/MinIO connectors without a compelling benefit: the batched write approach is predictable, cost-efficient, and consistent across all object storage connectors. Append Blob support can be considered as a follow-up.

Additional context

Since the azure_storage_blobs crate is already in use for persistence, the authentication setup (connection strings, SAS tokens, managed identity) is already solved and can be reused directly.

Testing should follow the same pattern as the S3/MinIO integration tests. Coverage should include read, write, flush timing, buffer retention, multiple flush cycles, and format coverage (at minimum CSV and JSON). An Azurite container (the Azure Blob Storage emulator) should be used in the Docker Compose test suite in place of a real Azure account.
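As one concrete example of the format coverage above, an Azurite-backed test could assert a simple round-trip property per format: rows serialized by the write connector must be recovered intact by the read connector. The helpers below model that property with the standard library only; the JSON-lines layout and the helper names are assumptions for illustration, not Pathway's actual serialization code.

```python
import csv
import io
import json


def to_csv(rows):
    # Serialize a list of dicts to a CSV payload with a header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(payload):
    # Parse a CSV payload back into a list of dicts (string values).
    return list(csv.DictReader(io.StringIO(payload)))


def to_jsonlines(rows):
    # One JSON object per line, a common object-storage layout.
    return "".join(json.dumps(r) + "\n" for r in rows)


def from_jsonlines(payload):
    return [json.loads(line) for line in payload.splitlines()]
```

An integration test would perform the same round trip through Azurite: write a table via the connector, read the resulting blobs back, and compare against the input rows.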


Labels

enhancement (New feature or request)
