Skip to content

fix: add SPIRE trust bundle ConfigMap to scoped cache#271

Open
ChristianZaccaria wants to merge 2 commits intokagenti:mainfrom
ChristianZaccaria:configmap-cache
Open

fix: add SPIRE trust bundle ConfigMap to scoped cache#271
ChristianZaccaria wants to merge 2 commits intokagenti:mainfrom
ChristianZaccaria:configmap-cache

Conversation

@ChristianZaccaria
Copy link
Copy Markdown
Contributor

@ChristianZaccaria ChristianZaccaria commented Apr 10, 2026

Summary

  • Add the SPIRE trust bundle ConfigMap to the manager's scoped cache via a field selector on metadata.name, conditionally when --require-a2a-signature=true
  • Remove E2E workaround that manually labeled spire-bundle with kagenti.io/defaults=true

Closes #251

Previously

The ConfigMap cache was scoped to kagenti-labeled ConfigMaps only. The SPIRE trust bundle (e.g. spire-bundle) is managed by SPIRE and has no kagenti labels, so mgr.GetClient().Get() returned "not found". Signature verification failed silently and SignatureVerified stayed False. This PR fixes this by adding a field selector entry for the trust bundle ConfigMap to the cache when signature verification is enabled.

Test plan

    1. Pull this branch, build the operator image, push to a registry, and deploy on OpenShift
    1. Add signature verification flags to the manager deployment:
    oc edit deployment kagenti-controller-manager -n kagenti-system
    Add these args:
    - "--require-a2a-signature=true"
    - "--spire-trust-domain=<your-trust-domain>"
    - "--spire-trust-bundle-configmap=spire-bundle"
    - "--spire-trust-bundle-configmap-namespace=<spire-namespace>"
    - "--spire-trust-bundle-configmap-key=bundle.crt"
    1. Verify the operator starts without "ConfigMap not found" errors:
    oc logs -n kagenti-system deployment/kagenti-controller-manager \
      | grep -i "trust\|bundle\|signature"
    Expected: "Signature verification enabled" with no trust bundle errors.

Copy link
Copy Markdown
Contributor

@varshaprasad96 varshaprasad96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChristianZaccaria The changes look good!
Since this is a sensitive change - its related to cache, and if we don't end up spinning right set of informers, our controller would just not work and fail silently - can we have some tests to cover this? (E2E or unit using env test would also be good).

Copy link
Copy Markdown
Contributor

@pdettori pdettori left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, well-motivated fix. The SPIRE trust bundle ConfigMap is managed by SPIRE (no kagenti labels), so using a field selector on metadata.name is the correct approach. Code is well-commented, the conditional guard is correct, and the E2E workaround removal is clean.

Areas reviewed: Go (cmd/main.go), E2E test utilities, CI
Commits: 1 commit, signed-off: yes
CI status: E2E failing — pre-existing flaky metrics test already fixed on main in PR #267. Rebase should resolve it.

controller.LabelNamespaceDefaults: "true",
}),
},
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: Namespace collision edge case — if spireTrustBundleConfigMapNS equals controller.ClusterDefaultsNamespace (typically kagenti-system), this map assignment overwrites the existing label-selector entry, losing cache visibility of kagenti-platform-config and kagenti-feature-gates.

Unlikely in practice (SPIRE is usually in spire-system), but a defensive check or at minimum a comment documenting the assumption would prevent a subtle silent failure:

if spireTrustBundleConfigMapNS == controller.ClusterDefaultsNamespace {
    // Log a warning or merge selectors
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Paolo, great suggestion and a real edge case.

I've added a defensive check: if spireTrustBundleConfigMapNS collides with an existing cache-scoped namespace (e.g. kagenti-system), we skip the SPIRE entry and log an error explaining that the trust bundle won't be cached and signature verification may fail. The operator stays operational, and the cluster defaults remain visible, and the admin gets a clear message to use a different namespace for the trust bundle ConfigMap.

controller.LabelNamespaceDefaults: "true",
}),
},
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: +1 to @varshaprasad96's request for tests. This is a sensitive cache-scoping change with a silent failure mode (Get returns "not found" if the informer doesn't cover the resource). A unit test using envtest that:

  1. Creates the SPIRE trust bundle ConfigMap
  2. Verifies it's visible through the manager's cache when requireA2ASignature=true
  3. Verifies existing kagenti ConfigMaps remain visible

would catch regressions and document the expected behavior.

By("labeling spire-bundle configmap for controller cache visibility")
cmd = exec.Command("kubectl", "label", "--overwrite", "configmap", "spire-bundle",
"-n", "spire-system", "kagenti.io/defaults=true")
_, err := Run(cmd)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

praise: Clean removal of the workaround — the operator now handles cache visibility properly via field selector, so the manual labeling hack is no longer needed.

@ChristianZaccaria
Copy link
Copy Markdown
Contributor Author

@ChristianZaccaria The changes look good! Since this is a sensitive change - its related to cache, and if we don't end up spinning right set of informers, our controller would just not work and fail silently - can we have some tests to cover this? (E2E or unit using env test would also be good).

Thank you for reviewing Varsha, and the great suggestion. I think having tests to protect this code is vital too.

I've Added both unit tests and an envtest-based test. The unit tests validate the cache config building logic, correct entries for each flag combination, and the namespace collision edge case mentioned by Paolo. The envtest test starts a real ctrl.NewManager with the scoped cache against a local kube-apiserver and verifies that the SPIRE trust bundle, cluster defaults, and namespace defaults are all visible through the cached client, while unrelated ConfigMaps in the same namespace are correctly filtered out.

Both tests run automatically in CI

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: ConfigMap cache scope prevents operator from reading SPIRE trust bundle

3 participants