Backport of #100420/#100295 - Resolve problems with paths and compatibility problems with Spark in Azure (v2) #1785
Conversation
…_spark_azure_fixes Resolve problems with paths and compatibility problems with Spark in Azure (v2)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9762404674
```diff
 per_file_record_counts.push_back(static_cast<Int64>(*current_file_num_rows));
-per_file_byte_sizes.push_back(static_cast<Int64>(file_bytes));
+/// todo arthur fix the wrong counter for file bytes, probably by backporting something else
+per_file_byte_sizes.push_back(static_cast<Int64>(buffer_bytes));
```
Store the Azure fallback size in per-file stats
When the write buffer does not report bytes (the Azure path handled just above), buffer_bytes remains 0, so per_file_byte_sizes records 0 even though total_bytes was corrected from object metadata. This value is returned by getDataFileEntries and written to the import/export sidecar, which later becomes the Iceberg manifest file_size_in_bytes, so exported/imported Azure data files can be committed with a zero file size. Use the fallback size for both counters instead of only total_bytes.
…hunk`

In `addRequestedFileLikeStorageVirtualsToChunk`, the `_row_number` handling block uses `return` instead of `continue` after adding the column to the chunk. This causes the function to exit the loop early, skipping any remaining virtual columns (e.g. `_data_lake_snapshot_version`). When a query requests both `_row_number` and another virtual column after it, the chunk has fewer columns than expected, resulting in: "Invalid number of columns in chunk pushed to OutputPort."

The fix was originally in ClickHouse#100116 but was lost during merge, because ClickHouse#100208 (the revert of ClickHouse#99163) had reintroduced the `return` on master after the fix branch had already resolved it via a different code structure. The regression test `04050_iceberg_virtual_columns_return_bug` is already on master from ClickHouse#100116.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=f10711f066fd101124e088ce33061de51ebae0e9&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29

ClickHouse#87890

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
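A condensed sketch of the control flow described above; the real loop in `addRequestedFileLikeStorageVirtualsToChunk` handles more cases, and the column-building helpers here are hypothetical stand-ins:

```cpp
/// Condensed sketch; makeRowNumberColumn / makeSnapshotVersionColumn are
/// hypothetical stand-ins for the real column-building code.
for (const auto & requested : requested_virtual_columns)
{
    if (requested.name == "_row_number")
    {
        chunk.addColumn(makeRowNumberColumn(chunk.getNumRows()));
        continue; /// was `return`, which exited the whole function here and
                  /// skipped any virtual columns requested after _row_number
    }

    if (requested.name == "_data_lake_snapshot_version")
    {
        chunk.addColumn(makeSnapshotVersionColumn(chunk.getNumRows()));
        continue;
    }
}
```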
Looks like tests
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This PR addresses several issues:
- Fixes inconsistent path handling in Iceberg caused by mixed usage of storage paths and metadata paths.
- Enforces that Iceberg tables write a table location that is either a URL or an absolute path.
- Adds a fallback for counting file sizes in Azure, because some ClickHouse readers don't support byte counting after traversal.
- Handles `version-hint.txt` in a manner compatible with Spark.
- Introduces type-level abstractions that make it harder to mix up path types in the future (see the sketch below).
- Adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading/downloading.
- Fixes position-delete handling, which previously relied on path-inference heuristics in cases where inference is inappropriate.
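As an illustration of the path-type point in the list above, here is a hypothetical sketch of what such type-level separation can look like; the type and function names are illustrative, not the actual ones introduced by the PR:

```cpp
#include <string>
#include <utility>

/// Distinct wrapper types make it a compile-time error to pass a storage
/// path where a metadata path is expected (or vice versa), instead of a
/// silent runtime mix-up. Names here are illustrative only.
struct StoragePath
{
    explicit StoragePath(std::string p) : value(std::move(p)) {}
    std::string value;
};

struct MetadataPath
{
    explicit MetadataPath(std::string p) : value(std::move(p)) {}
    std::string value;
};

/// Accepts only a MetadataPath; a raw std::string or a StoragePath
/// no longer compiles at the call site.
void readVersionHint(const MetadataPath & path);
```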
Backport of ClickHouse#100420 and ClickHouse#100295
Documentation entry for user-facing changes
...
CI/CD Options
Exclude tests:
Regression jobs to run: