UID2-7063: Emit salts_rotated_last_cycle Prometheus gauge #636

Closed
swibi-ttd wants to merge 1 commit into main from swi-UID2-7063

@swibi-ttd
Contributor

Summary

  • Adds uid2_salts_rotated_last_cycle, a Micrometer gauge emitted by uid2-admin that records the count of salts rotated in the most recent successful salt rotation cycle.
  • The gauge reports NaN until the first rotation completes, so any equality-based alert built on it does not fire on cold start.
  • Wired up in Main.setupMetrics via SaltRotationMetrics.register(Metrics.globalRegistry), alongside the existing DataStoreMetrics registrations. SaltRotation.rotateSalts calls SaltRotationMetrics.recordRotated(saltsToRotate.size()) after a successful rotation; the metric is not touched when rotateSalts returns noSnapshot or when rotateSaltsZero runs (which rotates no salts by design).
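
The NaN-until-first-rotation behaviour described above can be sketched roughly as follows. This is a minimal, stdlib-only illustration, not the code in this PR: the class name `SaltRotationGaugeSketch`, the method `lastRotated`, and the instance-based design are assumptions made for the example, and the Micrometer wiring is shown only as a comment.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for SaltRotationMetrics; the real class in uid2-admin may differ.
final class SaltRotationGaugeSketch {
    // Hold the double as raw bits so reads and writes are atomic across threads.
    // Starts as NaN, meaning "no rotation has completed yet".
    private final AtomicLong lastRotatedBits =
            new AtomicLong(Double.doubleToLongBits(Double.NaN));

    // Call only after a successful rotation, passing the number of salts rotated.
    void recordRotated(int count) {
        lastRotatedBits.set(Double.doubleToLongBits(count));
    }

    // Value a gauge would poll at scrape time; NaN until recordRotated is first called.
    double lastRotated() {
        return Double.longBitsToDouble(lastRotatedBits.get());
    }

    // With Micrometer on the classpath, registration might look like this (assumption):
    //   Gauge.builder("uid2_salts_rotated_last_cycle", sketch, SaltRotationGaugeSketch::lastRotated)
    //        .register(registry);

    public static void main(String[] args) {
        SaltRotationGaugeSketch sketch = new SaltRotationGaugeSketch();
        System.out.println(sketch.lastRotated()); // NaN before any rotation
        sketch.recordRotated(42);                 // 42 is an arbitrary example count
        System.out.println(sketch.lastRotated()); // 42.0
    }
}
```

Because the value only changes inside `recordRotated`, code paths that never call it (such as the noSnapshot and rotateSaltsZero cases mentioned above) leave the previously reported value in place.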

Why

Ticket: UID2-7063.

The existing P*.{PROD,INTEG}.Admin.Salt.UnexpectedSaltRotationVolume alerts key off a Loki log query (salt_count_type=rotated-salts). That has produced false positives when log ingestion glitches (Alloy re-streams, Loki push failures). A Prometheus gauge is sampled at scrape time and immune to that class of issue.

Follow-ups (not in this PR)

  • The companion alert change is in uid2-monitoring-configuration#swi-UID2-7063. That PR should be merged after this one is deployed through integ → prod, so the metric exists by the time the alert switches to query it.
  • The salt-rotation Grafana dashboard (c58e015d-b1f3-4a66-8b32-0bef4149540e) is managed in the Grafana UI, not as JSON in the monitoring repo. Adding a panel for the new metric needs to happen there (UID2 grafana.net, which auto-syncs to EUID).

Test plan

  • mvn test -Dtest=SaltRotationTest — 37 tests pass (34 existing + 3 new)
  • mvn test -Dtest='SaltRotationTest,SaltServiceTest,SaltSerializerTest,EncryptedSaltStoreWriterTest' — 51 tests pass
  • mvn compile completes cleanly
  • After merge + integ deploy: verify uid2_salts_rotated_last_cycle{env="integ",application="uid2-admin"} appears in Grafana Cloud and matches the count in the salt_count_type=rotated-salts log on the next rotation cycle
  • After integ verification: promote to prod

Adds a Micrometer gauge `uid2_salts_rotated_last_cycle` that records the
count of salts rotated in the most recent successful salt rotation
cycle. The gauge reports NaN until the first rotation completes so the
volume alert does not fire on cold start.

Lets the salt-rotation volume alert key off a Prometheus metric instead
of Loki log scraping, which has produced false positives when log
ingestion glitches (Alloy re-streams, Loki push failures).
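
The cold-start claim rests on NaN failing ordinary comparisons. The sketch below illustrates that with IEEE 754 doubles in Java; it assumes the alert condition is a plain threshold or equality comparison and that the alerting backend applies the same NaN semantics. The alert expression is not part of this PR, and the method names and threshold values here are hypothetical.

```java
// Hypothetical alert-style checks; not the actual alert rule.
final class NanAlertSketch {
    // Stand-in for a "volume too high" condition.
    static boolean exceeds(double gaugeValue, double threshold) {
        return gaugeValue > threshold;
    }

    // Stand-in for an equality-based condition.
    static boolean equalsValue(double gaugeValue, double expected) {
        return gaugeValue == expected;
    }

    public static void main(String[] args) {
        double coldStart = Double.NaN; // gauge value before the first rotation
        System.out.println(exceeds(coldStart, 100.0));   // false
        System.out.println(equalsValue(coldStart, 0.0)); // false
        System.out.println(exceeds(250.0, 100.0));       // true
    }
}
```

Note that a not-equal comparison behaves the other way: `NaN != x` is true for every `x`, so a condition written with `!=` would still match on cold start.
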
@swibi-ttd swibi-ttd closed this May 14, 2026