feat: support prechecking down peers before restarting tikv pod #6877
liubog2008 wants to merge 4 commits into pingcap:main
Signed-off-by: liubo02 <liubo02@pingcap.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Codecov Report

| Metric | main | #6877 | +/- |
|---|---|---|---|
| Coverage | 37.44% | 37.61% | +0.17% |
| Files | 392 | 392 | |
| Lines | 22432 | 22483 | +51 |
| Hits | 8399 | 8458 | +59 |
| Misses | 14033 | 14025 | -8 |
Pull request overview
This PR enhances TiKV pod restart safety by introducing PD-based prechecks (down-peer regions and leader eviction) before allowing TiKV pod recreation, and refactors leader-eviction condition syncing into the eviction task flow.
Changes:
- Add a PD API client method and types for querying regions with down peers (`/pd/api/v1/regions/check/down-peer`).
- Gate TiKV pod recreation on (a) zero non-self down peers and (b) leaders being evicted, and trigger leader-eviction scheduling when needed.
- Refactor syncing of `TiKVCondLeadersEvicted` from the status task into the leader-eviction task, with updated/added unit tests.
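To make the first change concrete, here is a minimal sketch of querying the down-peer endpoint and decoding its response. It is not the PR's `GetDownPeerRegions` implementation: the struct names, the JSON field names, and the fake PD server are assumptions for illustration, and only the endpoint path comes from the overview above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Hypothetical response types. The field names are assumed for this sketch
// and are not copied from pkg/pdapi/v1/types.go.
type downPeer struct {
	Peer struct {
		ID      uint64 `json:"id"`
		StoreID uint64 `json:"store_id"`
	} `json:"peer"`
	DownSeconds uint64 `json:"down_seconds"`
}

type region struct {
	ID        uint64     `json:"id"`
	DownPeers []downPeer `json:"down_peers"`
}

type regionsCheckInfo struct {
	Count   int      `json:"count"`
	Regions []region `json:"regions"`
}

// Endpoint path as listed in the review overview.
const downPeerPath = "/pd/api/v1/regions/check/down-peer"

// getDownPeerRegions issues a GET against the down-peer endpoint and decodes
// the JSON body. Any non-200 status is treated as an error.
func getDownPeerRegions(c *http.Client, baseURL string) (*regionsCheckInfo, error) {
	resp, err := c.Get(baseURL + downPeerPath)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var info regionsCheckInfo
	if err := json.NewDecoder(resp.Body).Decode(&info); err != nil {
		return nil, err
	}
	return &info, nil
}

// newFakePD starts a local test server that stands in for PD and returns one
// region with a single down peer on store 4 (made-up data).
func newFakePD() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != downPeerPath {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"count":1,"regions":[{"id":2,"down_peers":[{"peer":{"id":5,"store_id":4},"down_seconds":30}]}]}`)
	}))
}

func main() {
	srv := newFakePD()
	defer srv.Close()
	info, err := getDownPeerRegions(srv.Client(), srv.URL)
	if err != nil {
		panic(err)
	}
	fmt.Println("regions with down peers:", info.Count)
}
```

Using `httptest` keeps the sketch runnable without a PD cluster; a real client would presumably reuse the existing PD client plumbing rather than a bare `http.Client`.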
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/pdapi/v1/types.go | Adds PD response types for down-peer region checks. |
| pkg/pdapi/v1/client.go | Adds GetDownPeerRegions PD client call and endpoint constant. |
| pkg/pdapi/v1/mock_generated.go | Updates PD client mock to include GetDownPeerRegions. |
| pkg/pdapi/v1/client_test.go | Adds unit test coverage for GetDownPeerRegions. |
| pkg/controllers/tikv/tasks/util.go | Adds helper checks for leader-eviction status/timeout; fixes VolumeName import aliasing. |
| pkg/controllers/tikv/tasks/pod.go | Adds restart prechecks (down peers + leaders evicted) and wires PD client usage into restart flow. |
| pkg/controllers/tikv/tasks/pod_test.go | Extends pod task tests to cover down-peer filtering and leader-eviction gating behavior. |
| pkg/controllers/tikv/tasks/evict_leader.go | Changes eviction scheduler management based on ShouldEvictLeader and syncs LeadersEvicted condition here. |
| pkg/controllers/tikv/tasks/evict_leader_test.go | Adds tests for starting/stopping leader eviction scheduler behavior. |
| pkg/controllers/tikv/tasks/offline.go | Switches offline flow to use the new leader-eviction check helper and ShouldEvictLeader. |
| pkg/controllers/tikv/tasks/status.go | Removes leader-eviction condition syncing and related wait behavior from status task. |
| pkg/controllers/tikv/tasks/status_test.go | Updates expectations after removing leader-eviction condition management from status task. |
| pkg/controllers/tikv/tasks/ctx.go | Minor formatting/structure adjustments; no functional change observed. |
| pkg/controllers/tikv/builder.go | Updates runner wiring to pass PD client manager into TaskPod. |
| api/core/v1alpha1/tikv_types.go | Adds ReasonStoreNotExist and deprecates ReasonStoreIsRemoved. |
        return task.Wait().With("cannot recreate pod, check down peer: %v", err)
    }

    if err := CheckTiKVLeadersEvicted(state.TiKV()); err != nil {
        return task.Wait().With("cannot recreate pod, check leader count: %v", err)
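The ordering of the two waits above can be sketched as a small pure function. This is an illustration only: `restartBlockedReason` and its parameters are hypothetical names, and the real task returns `task.Wait()` results rather than strings.

```go
package main

import "fmt"

// restartBlockedReason is a hypothetical helper that mirrors the gating order
// in the fragment above: the down-peer check runs first, then the
// leader-eviction check. An empty string means the pod may be recreated.
func restartBlockedReason(nonSelfDownPeers int, leadersEvicted bool) string {
	switch {
	case nonSelfDownPeers > 0:
		return fmt.Sprintf("cannot recreate pod, %d regions have down peers on other stores", nonSelfDownPeers)
	case !leadersEvicted:
		return "cannot recreate pod, leaders are not evicted yet"
	default:
		return ""
	}
}

func main() {
	fmt.Printf("%q\n", restartBlockedReason(2, true))
	fmt.Printf("%q\n", restartBlockedReason(0, false))
	fmt.Printf("%q\n", restartBlockedReason(0, true))
}
```

Checking down peers before leader eviction means a pod is never restarted while another store already has unavailable replicas, even if its own leaders have been moved away.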
    func countNonSelfDownPeers(downPeerInfo *pdapi.RegionsCheckInfo, store *pdv1.Store) int {
        if store == nil || store.ID == "" {
            return downPeerInfo.Count
        }
        if downPeerInfo.Count == 0 {
            return 0
        }

        nonSelfDownPeerCount := 0
        for _, region := range downPeerInfo.Regions {
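The loop body is cut off in the excerpt above, so the following is only one plausible reading of the filtering, not the PR's code: a region counts when at least one of its down peers lives on a store other than the one being restarted. All types, field names, and the function name `countRegionsWithOtherDownPeers` are hypothetical, and the sketch additionally guards against a nil info pointer.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical, simplified stand-ins for the PD response types.
type peer struct{ StoreID uint64 }
type downPeer struct{ Peer peer }
type region struct{ DownPeers []downPeer }
type regionsCheckInfo struct {
	Count   int
	Regions []region
}

// countRegionsWithOtherDownPeers counts regions that have at least one down
// peer on a store other than selfStoreID. When the store ID is unknown, it
// falls back to the total count, as the excerpt above does.
func countRegionsWithOtherDownPeers(info *regionsCheckInfo, selfStoreID string) int {
	if info == nil {
		return 0 // assumption: no data means nothing to block on
	}
	if selfStoreID == "" {
		return info.Count
	}
	count := 0
	for _, r := range info.Regions {
		for _, dp := range r.DownPeers {
			if strconv.FormatUint(dp.Peer.StoreID, 10) != selfStoreID {
				count++
				break // count each region at most once
			}
		}
	}
	return count
}

func main() {
	info := &regionsCheckInfo{
		Count: 2,
		Regions: []region{
			{DownPeers: []downPeer{{Peer: peer{StoreID: 1}}}}, // down peer on self
			{DownPeers: []downPeer{{Peer: peer{StoreID: 4}}}}, // down peer elsewhere
		},
	}
	fmt.Println(countRegionsWithOtherDownPeers(info, "1"))
}
```

Ignoring down peers on the store itself matters because a pod that is already unhealthy would otherwise block its own restart.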
    case !state.PDSynced:
        return task.Wait().With("pd is unsynced")
    case state.Store == nil:
    if state.Store == nil {
    pc, ok := state.GetPDClient(cm)
    if !ok {
        return task.Wait().With("wait if pd client is not registered")
    }