feat(llm): introduce lightweight circuit breaker to prevent rate-limit bans and resource exhaustion (#2095)
Conversation
@ag9920 thanks for the contribution. Please add a unit test and configuration support for this new feature.
Force-pushed fe14c65 to 91f74cb
@WillemJiang hi, I just updated the unit test, please take a look.
Pull request overview
Introduces a lightweight circuit breaker inside LLMErrorHandlingMiddleware to fast-fail repeated transient LLM provider failures, reducing retry-loop hangs and avoiding repeated calls during outages/rate-limits.
Changes:
- Add circuit breaker state/config to `LLMErrorHandlingMiddleware` and fast-fail when OPEN.
- Record successes/failures to reset or trip the circuit and add a user-facing circuit-breaker message.
- Add sync + async unit tests covering circuit breaker trip/open/half-open/recovery behavior.
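A trip/open/half-open/recovery test in the spirit described above might look like the sketch below. `FakeBreaker` is a hypothetical stand-in with an artificially short recovery window; the real tests exercise `LLMErrorHandlingMiddleware` directly.

```python
# Illustrative test sketch (not the file in the PR): drives a minimal
# breaker through trip -> open (fast-fail) -> half-open probe -> recovery.
import time


class FakeBreaker:
    """Hypothetical stand-in for the middleware's circuit state."""

    def __init__(self, threshold=5, window=0.05):
        self.threshold, self.window = threshold, window
        self.state, self.failures, self.opened_at = "closed", 0, 0.0

    def record_failure(self):
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.threshold:
            self.state, self.opened_at = "open", time.monotonic()

    def record_success(self):
        self.state, self.failures = "closed", 0

    def allow(self):
        # Once the recovery window has elapsed, permit exactly one probe.
        if self.state == "open" and time.monotonic() - self.opened_at >= self.window:
            self.state = "half_open"
        return self.state != "open"


def test_trip_open_half_open_recovery():
    cb = FakeBreaker()
    for _ in range(5):
        cb.record_failure()
    assert cb.state == "open" and not cb.allow()   # tripped: fast-fail
    time.sleep(0.06)
    assert cb.allow() and cb.state == "half_open"  # window expired: probe allowed
    cb.record_success()
    assert cb.state == "closed"                    # probe succeeded: recovered
```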
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| backend/packages/harness/deerflow/agents/middlewares/llm_error_handling_middleware.py | Adds circuit breaker state, open-window logic, and fast-fail path around model calls. |
| backend/tests/test_llm_error_handling_middleware.py | Adds circuit breaker tests for sync/async execution and non-retriable error handling. |
@ag9920, thanks for your contribution. Please fix the lint error and address the review comments from Copilot.
Force-pushed 464d2be to 7bba514
@ag9920 two comments need to be addressed.
@WillemJiang I have reviewed all the comments generated by Copilot. The latest commit 7bba514 has already addressed all of these concerns.
@ag9920 Here are some comments for the code.

When the circuit is in `half_open` and the probe request raises `GraphBubbleUp`, neither `_record_success()` nor `_record_failure()` is called. This leaves `_circuit_probe_in_flight = True` permanently.

Code path: `llm_error_handling_middleware.py:210-215` and `:252-257`.

Result: on every subsequent call, `_check_circuit()` sees `state == half_open` and `probe_in_flight == True`, so it returns `True` (fast-fail). The circuit is deadlocked in `half_open`: it will never recover, because no probe can ever complete.

Fix: reset `probe_in_flight` in the `GraphBubbleUp` handler, or refactor so the probe flag is managed outside the try/except.
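The suggested fix can be sketched as follows. `probe_in_flight`, `_record_success()`, and `_record_failure()` come from the review; the `GraphBubbleUp` and `ProbeGuard` classes here are illustrative stand-ins, not the middleware's actual code.

```python
# Sketch of the fix: manage the half-open probe flag in a try/finally so
# that control-flow exceptions like GraphBubbleUp still release the slot.
import threading


class GraphBubbleUp(Exception):
    """Stand-in for the graph framework's control-flow exception."""


class ProbeGuard:
    def __init__(self):
        self._lock = threading.Lock()
        self.state = "half_open"
        self.probe_in_flight = False

    def _record_success(self):
        self.state = "closed"

    def _record_failure(self):
        self.state = "open"

    def run_probe(self, do_call):
        with self._lock:
            if self.probe_in_flight:
                raise RuntimeError("fast-fail: probe already in flight")
            self.probe_in_flight = True
        try:
            result = do_call()
        except GraphBubbleUp:
            # Deliberately records neither success nor failure...
            raise
        except Exception:
            self._record_failure()
            raise
        else:
            self._record_success()
            return result
        finally:
            # ...but the finally always clears the flag, so the circuit
            # can never deadlock in half_open with a phantom probe.
            with self._lock:
                self.probe_in_flight = False
```

Moving the flag reset into `finally` means every exit path, including exceptions that intentionally bypass the success/failure accounting, frees the probe slot.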
Force-pushed f7ea2e8 to 5d828af
@ag9920 Please fix the lint error. |
Force-pushed 5d828af to c6d83b8
@WillemJiang Thanks for the review! I've addressed all the concerns.
Port of bytedance/deer-flow#2095. Thread-safe circuit breaker (closed/open/half_open) on `LLMErrorHandlingMiddleware`. After 5 consecutive failures, fast-fails for 60s before probing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat(llm): introduce lightweight circuit breaker to prevent rate-limit bans and resource exhaustion (bytedance#2095)
🤔 What's the problem this PR solves?
Currently, `LLMErrorHandlingMiddleware` implements a robust retry mechanism with exponential backoff. However, when an LLM provider experiences a hard outage (e.g., persistent 502/503 errors) or when the user's IP/account is heavily rate-limited, the system will still blindly attempt to send requests for every new interaction. For self-hosted or single-tenant deployments, this causes two major issues: every interaction hangs for many seconds on doomed retries, and the aggressive polling risks further rate-limit penalties on the user's API key.
🛠️ What's the proposed solution?
This PR introduces a minimal, dependency-free Circuit Breaker pattern directly into the middleware.
- After `N` consecutive failed model calls (default 5, evaluated after internal retries are exhausted), the circuit trips to `OPEN`. Subsequent requests within the recovery window (default 60s) are immediately rejected with a graceful error message, bypassing the network and retry loop entirely.
- After the recovery window, the circuit enters a `HALF-OPEN` state and allows exactly one probe request using an explicit in-flight flag. Other concurrent requests will fast-fail, preventing a "thundering herd" effect on the struggling provider. If the probe succeeds, the circuit closes; if it fails, the circuit re-opens and extends the window.
- No external libraries (such as `pybreaker` or wrappers) were added. It uses a thread-safe state machine, and error logs are optimized to only print during state transitions, avoiding log spam during sustained outages.

📊 Why is this necessary for self-hosted users?
Even for personal deployments, users pay per request or have strict rate limits. When the upstream API is down, failing fast saves the user from waiting 10+ seconds per interaction just to see a timeout error, and protects their API keys from being penalized for aggressive polling.
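The closed/open/half-open machine the PR describes can be sketched as a stand-alone class. This is an illustrative sketch mirroring the stated defaults (5 failures, 60s window), not the actual middleware implementation; names like `should_fast_fail` are hypothetical.

```python
# Minimal, dependency-free circuit breaker sketch: thread-safe state
# machine with a failure threshold, a recovery window, and a single
# half-open probe slot guarded by an in-flight flag.
import threading
import time


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_window=60.0):
        self._lock = threading.Lock()
        self._state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._probe_in_flight = False
        self.failure_threshold = failure_threshold
        self.recovery_window = recovery_window

    def should_fast_fail(self):
        """Return True if the call must be rejected without hitting the API."""
        with self._lock:
            now = time.monotonic()
            if self._state == "open":
                if now - self._opened_at < self.recovery_window:
                    return True
                self._state = "half_open"  # window expired: try one probe
            if self._state == "half_open":
                if self._probe_in_flight:
                    return True  # another probe is already running
                self._probe_in_flight = True
                return False
            return False  # closed: let the call through

    def record_success(self):
        with self._lock:
            self._state = "closed"
            self._failures = 0
            self._probe_in_flight = False

    def record_failure(self):
        with self._lock:
            self._probe_in_flight = False
            self._failures += 1
            if self._state == "half_open" or self._failures >= self.failure_threshold:
                self._state = "open"
                self._opened_at = time.monotonic()
```

A failed half-open probe immediately re-opens the circuit and restarts the window, which gives the "extends the window" behavior described above.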