Fix unbounded attachable growth when monitoring actors exit#13
Open
Fix unbounded attachable growth when monitoring actors exit#13
Conversation
This reverts commit 78bb2bd.
When a detached actor reaches its max throughput limit in resume(), it was unconditionally calling sched->delay() to reschedule itself. This causes detached actors to run on the shared scheduler thread pool instead of their dedicated private thread, leading to thread starvation and potential deadlocks when many detached actors compete for the limited worker threads. Fix this by checking for a private_thread_ first and resuming on it when available, falling back to sched->delay() only for non-detached actors.
When actor A calls monitor(B, fn), do_monitor previously attached a functor_attachable (token::anonymous) to B's attachable list capturing A's address. Two problems followed: 1. B's list grew without bound: each monitoring actor that died before B left a stale entry that could only be cleared when B itself exited. 2. A's actor_control_block was kept alive by the actor_addr capture in the functor_attachable even after A's strong count hit zero. Root cause: functor_attachable uses token::anonymous, so detach() cannot find or remove it, and dispose() on the returned disposable only marked the action as cancelled without touching B's list. Fix: - Add token::monitor (= 3) to attachable::token. - Introduce detail::monitor_attachable, a detachable attachable with a token keyed on the abstract_monitor_action pointer. It replaces the anonymous functor_attachable in do_monitor. - Introduce monitor_disposable_impl: calling dispose() now both cancels the monitor_action and calls detach() on the monitored actor via a weak_actor_ptr, removing the entry from B's list immediately. - scheduled_actor tracks active monitors in active_monitors_. On on_cleanup() all are disposed automatically, so the monitoring actor's exit eagerly unregisters it from every actor it was watching. update_watched_disposables() prunes fired entries after normal delivery.
jachris
added a commit
to tenzir/tenzir
that referenced
this pull request
Apr 7, 2026
Points to tenzir/actor-framework#13, which fixes two problems in scheduled_actor::do_monitor: - B's attachable list grew without bound when many short-lived actors monitored B and exited before it, because functor_attachable (token::anonymous) cannot be removed via detach(). - A's actor_control_block was kept alive by the actor_addr capture in the stale attachable. The fix introduces a detachable monitor_attachable and a monitor_disposable_impl whose dispose() removes the entry from the monitored actor immediately. scheduled_actor now tracks active monitors in active_monitors_ and disposes them all in on_cleanup().
local_actor::do_monitor (the non-callback, deprecated overload) attached a default_attachable to the monitored actor but never tracked the relationship on the monitoring side. When the monitoring actor exited, its entries were never removed from the monitored actor's attachable list, causing: 1. Unbounded list growth — one stale entry per dead monitoring actor. 2. Control block leak — default_attachable::observer_ held a weak ref to the monitoring actor's actor_control_block until the monitored actor itself exited. The default_attachable already carries a matchable observe_token, so do_demonitor() can remove it correctly. The only missing piece was tracking and cleanup. Fix: record each monitored actor as a weak_actor_ptr in local_actor::monitored_actors_. In on_cleanup(), lock each entry and call do_demonitor() to detach the stale default_attachable before the actor is fully torn down. Entries that are already dead (lock returns null) are skipped safely.
jachris
added a commit
to tenzir/tenzir
that referenced
this pull request
Apr 7, 2026
Picks up tenzir/actor-framework#13, which fixes unbounded attachable list growth and actor_control_block leaks in both monitor() code paths: - Legacy monitor(whom): local_actor now tracks monitored actors and calls do_demonitor() for each in on_cleanup(), removing the stale default_attachable immediately on actor exit. - Callback monitor(whom, fn): new monitor_attachable with a detachable token replaces the anonymous functor_attachable; dispose() now removes the entry from the monitored actor's list. Both paths were triggered by tenzir::async_mail, which creates a temporary actor_companion per request that monitors the long-lived receiver via the legacy API.
tobim
requested changes
Apr 8, 2026
| if (ptr != nullptr) { | ||
| ptr->attach( | ||
| default_attachable::make_monitor(ptr->address(), address(), priority)); | ||
| monitored_actors_.emplace_back(ptr->ctrl(), add_ref); |
Member
There was a problem hiding this comment.
weak_actor_ptrs keep the control blocks that are pointed to alive, so we need to remove entries from this map non-lazily. I think that needs to be done in do_demonitor(), but also when a down_msg (+ node_down_msg?) is processed in scheduled actors. For completeness we might also want to do the same in blocking actors.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Two separate code paths in CAF's monitor implementation had the same fundamental flaw: when a monitoring actor exits before the monitored actor, stale entries accumulated in the monitored actor's attachable list and kept the monitoring actor's
actor_control_blockalive indefinitely.New callback API (
monitor(whom, fn)→scheduled_actor::do_monitor): used afunctor_attachablewithtoken::anonymous, whichdetach()can never match. So even callingdispose()on the returned handle couldn't remove the entry. Theactor_addrcapture kept the control block alive.Legacy API (
monitor(whom)→local_actor::do_monitor): useddefault_attachable, which is detachable viado_demonitor()— but nothing ever called it on actor exit. Same unbounded growth and control block leak resulted.Concrete manifestation:
tenzir::async_mailcreates a temporaryactor_companionper request and callscompanion->monitor(receiver)(legacy API) on the long-lived receiver. Each companion exits after receiving its response but left a permanent staledefault_attachablein the receiver's list. Under sustainedasync_mailtraffic against a single actor this caused unbounded memory growth.Solution
New callback API (
scheduled_actor):token::monitor = 3toattachable::token.detail::monitor_attachable: a detachable attachable keyed byabstract_monitor_action*, replacing the anonymousfunctor_attachable.monitor_disposable_impl:dispose()cancels the action and callsdetach()on the monitored actor via aweak_actor_ptr.scheduled_actortracks active monitors inactive_monitors_;on_cleanup()disposes all of them automatically.Legacy API (
local_actor):weak_actor_ptrinlocal_actor::monitored_actors_.on_cleanup(), lock each entry and calldo_demonitor()to detach the staledefault_attachable. Dead entries (lock returns null) are skipped safely.Review
For the new API: start in
detail/monitor_attachable.hpp, thenscheduled_actor.cppformonitor_disposable_impland the rewrittendo_monitor.For the legacy API:
local_actor.cpp—do_monitorandon_cleanupare the only changes, plusmonitored_actors_in the header.Race safety:
actor_exitedis called after the monitored actor releases its mutex (seeabstract_actor::cleanup), so callingdetach()fromon_cleanupcannot deadlock. Concurrent exits are safe:detach()on an already-swept list is a benign no-op;set_reason()on an already-disposed action returns false, preventing double-delivery.