OCPBUGS-81476: Fix timeout in PinnedImages GC test #30962

Open
isabella-janssen wants to merge 1 commit into openshift:main from isabella-janssen:ocpbugs-81476
Conversation

@isabella-janssen (Member) commented Apr 6, 2026

This increases the timeout for a node joining a MachineConfigPool (MCP) from 5 to 10 minutes to prevent MCP degrades.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci bot added the do-not-merge/work-in-progress label (Indicates that a PR should not merge because it is a work in progress.) on Apr 6, 2026
@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot added the jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type.) and jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting.) labels on Apr 6, 2026
@openshift-ci-robot

@isabella-janssen: This pull request references Jira Issue OCPBUGS-81476, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This fixes a race condition in the "All Nodes in a custom Pool should have the PinnedImages even after Garbage Collection" test that caused nodes to get stuck in a degraded state with a missing MachineConfig.

The Problem:
The test was using defers in the wrong order, causing cleanup to happen like this:

  1. Delete KubeletConfig
  2. Delete PinnedImageSet (triggers rendered-custom deletion)
  3. Unlabel node (triggers transition to worker pool)
  4. Wait for worker config

When step 3 triggered the transition, the node would reboot to apply the worker config. However, because the rendered-custom config had already been deleted in step 2, the node would come back up with a reference to a non-existent config on disk and get stuck in a degraded state:

currentConfig: rendered-custom-d356ed29481f2de2bb31c6443e1d29ca
desiredConfig: rendered-worker-82faad7319f9e10715adbfd98a4b67ba
state: Degraded
reason: "machineconfig 'rendered-custom-d356ed29481f2de2bb31c6443e1d29ca' not found"

The Fix:
Changed cleanup order to:

  1. Unlabel node (triggers transition)
  2. Wait for worker config transition to complete
  3. Delete KubeletConfig
  4. Delete PinnedImageSet

This ensures the node successfully transitions back to the worker pool BEFORE we delete any configs, eliminating the race condition.

Changes:

  • Removed defers for unlabelNode, waitTillNodeReadyWithConfig, deletePIS, and deleteKC
  • Added explicit cleanup after GCPISTest completes that performs operations in the correct order
  • Added logging to track cleanup progress
  • Removed defer deleteKC from GCPISTest function

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai bot commented Apr 6, 2026

Walkthrough

Timeout increased from 5 minutes to 10 minutes in the waitTillNodeReadyWithConfig function within the machine config test file, affecting node readiness polling behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Timeout Configuration: test/extended/machine_config/pinnedimages.go | Increased timeout duration for node readiness polling from 5 to 10 minutes. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 8 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Test Structure And Quality | ⚠️ Warning | Documentation comment states 5 minutes but implementation uses 10-minute timeout, creating a documentation-code mismatch. | Update documentation comment at lines 601-603 from '5 minutes' to '10 minutes' to align with actual implementation. |
| Ipv6 And Disconnected Network Test Compatibility | ⚠️ Warning | Six new Ginkgo e2e tests pull images from quay.io without Disconnected skip markers or IPv6 handling, causing failures on disconnected non-metal or IPv6-only clusters. | Add [Skipped:Disconnected] to all 6 test names or update logic to use internal registry, and wrap external pulls in InIPv4ClusterContext() checks. |
✅ Passed checks (8 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: increasing a timeout in the PinnedImages GC test to fix a related issue (OCPBUGS-81476). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Stable And Deterministic Test Names | ✅ Passed | All test names are stable, descriptive static strings with no dynamic values, generated identifiers, timestamps, or node names. The PR timeout change affects implementation, not test names. |
| Microshift Test Compatibility | ✅ Passed | This PR only adjusts an existing timeout value from 5 to 10 minutes and does not add any new Ginkgo e2e tests. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | PR only modifies timeout in existing helper function; no new Ginkgo e2e tests added, so custom check is not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies only test code adjusting a timeout value; no topology-aware scheduling constraints introduced. |
| Ote Binary Stdout Contract | ✅ Passed | The change modifies a timeout value in a test helper function with no stdout operations or process-level code violations. |

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files.) on Apr 6, 2026
@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive 5

@openshift-ci
Contributor

openshift-ci bot commented Apr 6, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/d7be16f0-31ca-11f1-9d47-a6fb2a91cc22-0

@isabella-janssen
Member Author

/jira refresh

@openshift-ci-robot added the jira/valid-bug label (Indicates that a referenced Jira bug is valid for the branch this PR is targeting.) on Apr 6, 2026
@openshift-ci-robot

@isabella-janssen: This pull request references Jira Issue OCPBUGS-81476, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

@openshift-ci-robot removed the jira/invalid-bug label on Apr 6, 2026
@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive 5

@openshift-ci
Contributor

openshift-ci bot commented Apr 9, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/874ddc00-3454-11f1-9703-469c6b3fb240-0

@isabella-janssen changed the title from "OCPBUGS-81476: Fix race condition in PinnedImages GC test" to "OCPBUGS-81476: Fix timeout in PinnedImages GC test" on Apr 10, 2026
@openshift-ci-robot

@isabella-janssen: This pull request references Jira Issue OCPBUGS-81476, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

This increases the timeout for the process of a node joining an MCP to prevent MCP degrades.

@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive 5

@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4728db40-34de-11f1-96d4-ab4c33428563-0

@isabella-janssen
Member Author

/payload-aggregate periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive 5

@openshift-ci
Contributor

openshift-ci bot commented Apr 10, 2026

@isabella-janssen: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

  • periodic-ci-openshift-machine-config-operator-release-4.22-periodics-e2e-gcp-mco-disruptive

See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f71abeb0-3510-11f1-84e8-8eb280e2392d-0

@isabella-janssen marked this pull request as ready for review on April 13, 2026 14:23
@openshift-ci bot removed the do-not-merge/work-in-progress label on Apr 13, 2026
@openshift-ci bot requested review from pablintino and umohnani8 on April 13, 2026 14:23
@openshift-ci-robot removed the jira/valid-bug label on Apr 13, 2026
@openshift-ci-robot

@isabella-janssen: This pull request references Jira Issue OCPBUGS-81476, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "4.23.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

This increases the timeout for the process of a node joining an MCP to prevent MCP degrades.

Summary by CodeRabbit

  • Tests: Updated test timing parameters to improve reliability of node configuration state verification during testing.

@openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) on Apr 13, 2026
@openshift-ci-robot

@isabella-janssen: This PR has been marked to be verified later by @isabella-janssen.

Details

In response to this:

/verified later @isabella-janssen

The best way to make sure this remediates the concerns outlined in OCPBUGS-81476 is to check the test's pass rate after this fix is included in nightlies.

@isabella-janssen
Member Author

/jira refresh

@openshift-ci-robot added the jira/invalid-bug label and removed the jira/valid-bug label on Apr 13, 2026
@openshift-ci-robot

@isabella-janssen: This pull request references Jira Issue OCPBUGS-81476, which is invalid:

  • expected the bug to target either version "4.22." or "openshift-4.22.", but it targets "5.0.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/jira refresh

@isabella-janssen
Member Author

/retest-required

@isabella-janssen
Member Author

/retest-required

@isabella-janssen
Member Author

/jira refresh

@openshift-ci-robot added the jira/valid-bug label on Apr 14, 2026
@openshift-ci-robot

@isabella-janssen: This pull request references Jira Issue OCPBUGS-81476, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)
Details

In response to this:

/jira refresh

@openshift-ci-robot removed the jira/invalid-bug label on Apr 14, 2026
@isabella-janssen
Member Author

/retest-required

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 9df27cd and 2 for PR HEAD 51e6fb4 in total

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 53cc9e6 and 1 for PR HEAD 51e6fb4 in total

@isabella-janssen
Member Author

/retest-required

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d7ad0db and 0 for PR HEAD 51e6fb4 in total

@openshift-merge-bot
Contributor

/hold

Revision 51e6fb4 was retested 3 times: holding

@openshift-ci bot added the do-not-merge/hold label (Indicates that a PR should not merge because someone has issued a /hold command.) on Apr 16, 2026
@openshift-trt

openshift-trt bot commented Apr 16, 2026

Job Failure Risk Analysis for sha: 51e6fb4

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi Low
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
This test has passed 0.00% of 14 runs on release 5.0 [Architecture:amd64 FeatureSet:default Installer:upi JobTier:standard Network:ovn NetworkStack:ipv4 OS:rhcos9 Owner:eng Platform:vsphere Procedure:none SecurityMode:default Topology:ha Upgrade:none] in the last week.

@isabella-janssen
Member Author

/unhold

@openshift-ci bot removed the do-not-merge/hold label on Apr 17, 2026
@isabella-janssen
Member Author

/test e2e-vsphere-ovn-upi

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 104e20a and 2 for PR HEAD 51e6fb4 in total

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD dcd145e and 1 for PR HEAD 51e6fb4 in total

@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD a3ffcaf and 0 for PR HEAD 51e6fb4 in total

@openshift-ci
Contributor

openshift-ci bot commented Apr 18, 2026

@isabella-janssen: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-vsphere-ovn-upi | 51e6fb4 | link | true | /test e2e-vsphere-ovn-upi |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-trt

openshift-trt bot commented Apr 18, 2026

Job Failure Risk Analysis for sha: 51e6fb4

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi Low
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
This test has passed 0.00% of 13 runs on release 5.0 [Architecture:amd64 FeatureSet:default Installer:upi JobTier:standard Network:ovn NetworkStack:ipv4 OS:rhcos9 Owner:eng Platform:vsphere Procedure:none SecurityMode:default Topology:ha Upgrade:none] in the last week.

@openshift-merge-bot
Contributor

/hold

Revision 51e6fb4 was retested 3 times: holding

@openshift-ci bot added the do-not-merge/hold label on Apr 18, 2026
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
  • verified: Signifies that the PR passed pre-merge verification criteria
  • verified-later

3 participants