Skip to content

Antalya 26.3: address OAuth security audit#1777

Open
zvonand wants to merge 55 commits into
antalya-26.3from
fix/antalya-26.3/oauth-address-audit
Open

Antalya 26.3: address OAuth security audit#1777
zvonand wants to merge 55 commits into
antalya-26.3from
fix/antalya-26.3/oauth-address-audit

Conversation

@zvonand
Copy link
Copy Markdown
Collaborator

@zvonand zvonand commented May 11, 2026

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Address OAuth security audit

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

zvonand and others added 18 commits May 11, 2026 14:41
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvonand zvonand added port-antalya PRs to be ported to all new Antalya releases antalya-26.3 labels May 11, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 11, 2026

Workflow [PR], commit [75b0d44]

zvonand and others added 7 commits May 11, 2026 20:25
`test_arrowflight_interface` failed with `Token authentication is
not configured` after the M-19 split between Basic and Bearer
authentication paths.

pyarrow's `authenticate_basic_token` performs a `Handshake` with
`Basic <base64>`, captures the `authorization` response header, and
reuses it verbatim on every subsequent RPC. The middleware
unconditionally emitted `Bearer <token>` in `SendingHeaders`, so the
client's next call arrived as `Bearer <base64(user:password)>` and
was routed through the token-auth path -- which has no processors
configured in this test, hence the failure.

Echo back the same scheme the client sent: Basic stays on the Basic
path, Bearer keeps going through token validation.

CI report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=f1a7b011b93262d89f3e02dbff3876c965259f38&name_0=PR&name_1=Integration%20tests%20%28amd_binary%2C%203%2F5%29
test_sql_create_jwt_user_with_processor_pin / _with_claims /
_no_pin_uses_auto_discovery asserted on `SHOW CREATE USER` output via the
default `TabSeparated` format, which escapes single quotes -- the test then
compared an unescaped SQL literal against an escaped one. Switch the three
`SHOW CREATE USER` calls to `FORMAT TSVRaw` so the SQL literal makes it
through verbatim.

test_keycloak_auth/* all failed with `Token authentication is disabled`
because the `openid` processor's `<configuration_endpoint>` is HTTP, and
H-26 now refuses non-HTTPS URLs returned in the OIDC discovery document
(defense in depth against poisoned discovery substituting cloud-metadata
or in-cluster targets). The test Keycloak speaks plain HTTP, so wire
`keycloak_discovery` in manual trust-chain mode -- exactly the escape
hatch the H-26 error message documents -- and update the
`test_openid_discovery` docstring accordingly.

CI:
- https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=bb1b56b0ad9ec85064ffd52990e8cf0fa403a858&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20db%20disk%2C%20old%20analyzer%2C%203%2F6%29
- https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=bb1b56b0ad9ec85064ffd52990e8cf0fa403a858&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20db%20disk%2C%20old%20analyzer%2C%202%2F6%29
H-26 unconditionally rejects non-HTTPS URLs returned by an OIDC discovery
document. The check is defense in depth (the primary defense is
<remote_url_allow_hosts>), but it has no escape hatch for operators who
knowingly run their IdP over plain HTTP -- they were forced to abandon
`configuration_endpoint` and hand-wire the trust chain.

Add a per-processor `<allow_http_discovery_urls>` flag, false by default.
When true, the HTTPS scheme check on userinfo_endpoint / introspection_endpoint
/ jwks_uri is suppressed (the <remote_url_allow_hosts> check still runs),
and a warning is logged at processor construction so the gap is visible
in operator logs.

Revert the keycloak integration test back to using `configuration_endpoint`
with the new opt-out, since the test Keycloak speaks HTTP.
L-02 added a cooldown branch in `JWKSClient::getJWKS`: when
`now - last_request_send < refresh_timeout` and no JWKS is cached, throw
"in cooldown after a recent failed fetch" so concurrent waiters do not
re-hammer a failing endpoint. The L-02 change updated `last_request_send`
unconditionally before issuing the HTTP call, so a thrown fetch still
arms the cooldown for subsequent callers.

But `last_request_send` is default-initialized to `time_point{}`, i.e.
the `steady_clock` epoch. On Linux the epoch is roughly system boot time
(and containerized hosts with isolated `CLOCK_MONOTONIC` can have a very
recent epoch), so `now - time_point{}` equals the steady-clock uptime.
When the host uptime is less than `jwks_cache_lifetime` (default 3600s)
the cooldown branch triggers on the very first call -- before any real
fetch has been attempted -- and serves "in cooldown after a recent
failed fetch" on freshly-started CI runners. That is what the
`test_keycloak_auth` flake on the targeted shard shows:

  https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=ec127073b3a4fdb7d1bf3f43978ecbce7173a549&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20targeted%29

Change `last_request_send` to `std::optional<time_point>`; treat
`nullopt` as "no attempt has been made yet" and skip the cooldown check
in that case. Also drop the leftover `high_resolution_clock` in the
write-locked branch (libc++ aliases it to `steady_clock` so it built,
but mixing clock types reads as accidental).
`test_sql_create_jwt_user_with_processor_pin` and
`test_sql_create_jwt_user_with_claims` asserted that the rejected
response did NOT contain the username. ClickHouse's auth-failure body
embeds the username (`Code: 516. DB::Exception: <user>: Authentication
failed: password is incorrect, or there is no user with such name.
(AUTHENTICATION_FAILED)`), so the assertion tripped on the exact case it
was meant to verify. Switch to a positive check on the
`AUTHENTICATION_FAILED` marker.

CI:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1777&sha=495a8cf5f3052eb33a760169cb81bb2b5770d25c&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20targeted%29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya-26.3 port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant