celery · MicahLyle · May 10, 2021 · Jun 6, 2021 · Jun 6, 2021 · Jun 6, 2021
diff --git a/draft/jumpstarter.rst b/draft/jumpstarter.rst
@@ -64,6 +64,115 @@ the other features of Python's ``async/await`` programming model.
 Specification
 =============
 
+Jumpstarter is a Python implementation of an `Actor System`_ (which utilizes the `Actor Model`_). There
+are three fundamental axioms within the actor model (quoting the previous Wikipedia link):
+
+(Start Blockquote)
+
+An actor is a computational entity that, in response to a message it receives, can *concurrently* (emphasis ours):
+
+1. Send a finite number of messages to other actors;
+2. Create a finite number of new actors;
+3. Designate the behavior to be used for the next message it receives.
+
+(End Blockquote)
+
+It's important to remember that, although that is the technical definition of the actor, the interpretation and implementation of Actors and Actor Systems can be very flexible. Namely, what constitutes a "message" and "state" is very much up to the interpretation of the developer(s) and the system(s) they're using.
+
+In Jumpstarter, we've chosen to take a direct, literal approach to axiom 3, modeling the state of an Actor using an actual state machine abstraction, namely a `Hierarchical State Machine`_. The difference between a standard State Machine and a Hierarchical State Machine is that a standard State Machine consists of states and the transitions between them, whereas in a Hierarchical State Machine states can also have their own sub-state machines. Hierarchical State Machines help both tame the complexity of large (non-hierarchical) state machines and more clearly model the relationships and transitions between states. To give an example with Jumpstarter, we propose only a small number of parent states:
+
+* Initializing --> The initial state of the Actor when created.
+* Initialized --> The state of the actor when we start it for the very first time.
+* Starting --> The state of the actor immediately after calling ``actor.start()``. We'll have to transition through a number of substates of ``starting`` first (like starting dependencies, acquiring resources, and starting tasks), which we'll explain in more detail below. Think of this like powering on a computer: you typically have to wait a few seconds for the computer to set up its internal state and connect to internal and external devices before it's fully operational.
+* Stopping --> The state of the actor immediately after calling ``actor.stop()``. We'll have to transition through a number of substates of ``stopping`` first (like stopping tasks, releasing resources, and stopping dependencies), which we'll explain in more detail below (think of this like powering off a computer. You typically have to wait a few seconds for the computer to clean up its internal state nicely before it can fully shut down).
+* Stopped --> The state of the actor after it has finished all of its ``stopping`` activities (think of a computer that has been fully powered off).
+* Crashed --> The state of the actor when an exception was raised during startup or shutdown.
+
+Within those parent states, we have sub-states. For example:
+
+* Starting
+  * Dependencies Started --> The state of the actor after all of the actor's dependencies have been started.
+  * Resources Acquired --> The state of the actor after all resources have been acquired.
+  * Tasks Started --> The state of the actor after all tasks have been started.
+* Started
+  * Paused --> The state of the actor when all tasks are halted without shutting down the entire actor.
+  * Running --> The state of the actor when all tasks are running.
+    * Healthy --> The state of the actor when the actor is functioning properly.
+    * Degraded --> The state of the actor when the actor is not functioning properly but is still able to perform some of its duties.
+    * Unhealthy --> The state of the actor when the actor is temporarily not functioning.
+* Stopping
+  * Tasks Stopped --> The state of the actor after all tasks have been stopped.
+  * Resources Released --> The state of the actor after all resources have been released.
+  * Dependencies Stopped --> The state of the actor after all of the actor's dependencies have been stopped.
+
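As a rough illustration of the nesting described above, here is a minimal, standard-library-only sketch. It does not use `transitions`; the `HIERARCHY` table, the `.` separator, and the helper names are assumptions made for this example and are not part of Jumpstarter's API.

```python
# Hypothetical table mirroring the sub-state lists above (not Jumpstarter code).
HIERARCHY = {
    "starting": {
        "dependencies_started": {},
        "resources_acquired": {},
        "tasks_started": {},
    },
    "started": {
        "paused": {},
        "running": {"healthy": {}, "degraded": {}, "unhealthy": {}},
    },
    "stopping": {
        "tasks_stopped": {},
        "resources_released": {},
        "dependencies_stopped": {},
    },
}

SEPARATOR = "."  # assumed separator for fully qualified state names


def qualified_names(tree, prefix=()):
    """Yield every state as a qualified path, parents before their children."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield SEPARATOR.join(path)
        yield from qualified_names(children, path)


def is_in(state, ancestor):
    """Return True if `state` is `ancestor` itself or one of its sub-states."""
    return state == ancestor or state.startswith(ancestor + SEPARATOR)


ALL_STATES = list(qualified_names(HIERARCHY))
```

With this layout, an actor in `started.running.degraded` is also "in" `started` and `started.running`, which is what lets callers ask coarse questions (is the actor started?) without enumerating every leaf state.
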
+In order to effectively model these states in Python, we propose using the mature `transitions`_ library, along with the `transitions-anyio`_ library. This gives us:
+
+1. Mature Hierarchical State Machine library support thanks to `transitions`_.
+2. Asynchronous state machine transitions (opening up abilities for concurrency, parallelization, and the latest ``async/await`` Python support that's part of the motivation of this CEP in the first place) with `AnyIO`_ (thanks to `transitions-anyio`_) to abstract away the specific event loop of choice (like `AsyncIO`_, `Trio`_, or potentially others in the future).
+3. Native support within `transitions`_ for integrating with ``diagrams``/``graphviz`` to generate state machine diagrams (like the one below). Additionally, `transitions-gui`_ provides some interesting and promising capabilities for future Celery Flower-like projects to be able to visualize in a live, animated fashion the various Jumpstarter Actors and their states as transitions happen across all the various actors within the system.
+
+For a high level view, the parent states, their substates, and the transitions between them can be seen in the diagram below:
+
+TODO: Insert Jumpstarter State Machine Diagram Here: https://user-images.githubusercontent.com/48936/107506089-43225a00-6ba6-11eb-810e-0ac14bf0e1e9.png
+
+In that diagram you can also see the ``Restart`` state. We propose a separate state machine, which we'll call the *Actor Restart State Machine*, that models the Actor's state as it relates to restarts:
+
+* Ignore --> A special state which is ignored by the Actor (effectively meaning we're not in any sort of restart state).
+* Restarting --> The state of the actor once it has begun restarting.
+  * Stopping --> The state of the actor while stopping during a restart.
+  * Starting --> The state of the actor while starting during a restart.
+* Restarted --> The state of the actor after it has been restarted.
+
+With these states and sub-states, for both the main state machine and the restart state machine, we provide a clear public API for code to hook into any part of the Actor's lifecycle. Similar to how, for example, modern asynchronous frontend web frameworks like React and Vue provide hooks into the lifecycle of their components, `transitions`_ provides many different hooks to:
+
+* Have code run before a transition occurs or a state is entered, or conditionally block a transition from happening if certain conditions aren't met.
+* Have code run after a transition occurs (we could use this to, for example, fan out a result right before some hypothetical state ``"task_completed"`` is exited).
+* Do many other things at various granularities and moments. See https://github.com/pytransitions/transitions#callback-execution-order for specific details on the order and timing of when specific callbacks are invoked.
+
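The hook ordering described in the list above can be sketched with a tiny hand-rolled machine. This is a simplified stand-in, not the `transitions` API: `TinyMachine`, its `transition` method, and the hook names are hypothetical, and only a subset of the callback points (conditions, before, after) is modeled.

```python
import asyncio


class TinyMachine:
    """Hypothetical stand-in used only to illustrate hook ordering."""

    def __init__(self, initial):
        self.state = initial
        self.log = []

    async def transition(self, dest, *, conditions=(), before=(), after=()):
        # Conditions run first; any falsy result blocks the transition.
        for condition in conditions:
            if not await condition(self):
                self.log.append(f"blocked:{dest}")
                return False
        # "before" hooks still see the old state.
        for hook in before:
            await hook(self)
        self.state = dest
        # "after" hooks see the new state.
        for hook in after:
            await hook(self)
        return True


async def always(machine):
    return True


async def never(machine):
    return False


async def note_leaving(machine):
    machine.log.append(f"leaving:{machine.state}")


async def note_entered(machine):
    machine.log.append(f"entered:{machine.state}")


async def main():
    machine = TinyMachine("initialized")
    await machine.transition(
        "starting", conditions=[always], before=[note_leaving], after=[note_entered]
    )
    # This transition is vetoed by its condition, so the state is unchanged.
    await machine.transition("stopped", conditions=[never])
    return machine


machine = asyncio.run(main())
```

The real callback order in `transitions` has more stages than this; the linked callback-execution-order documentation remains the authority.
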
+With that base API, Jumpstarter provides a solid foundation and a lot of flexibility to help define self-contained pieces of business logic and facilitate communication between them while maintaining a separation of concerns.
+
+Three abstractions Jumpstarter provides that are addressed in both the ``starting`` and ``stopping`` states are:
+
+1. Dependencies
+2. Resources
+3. Tasks
+
+Dependencies
+------------
+Actors may depend on other actors to run before starting themselves. In some cases an actor must depend on another actor, for example when it consumes messages from that actor's stream. In `Actor System`_ language, the actor being depended upon is the parent actor, and the actor consuming its messages is the child actor. Depending on another actor implies that messages will be passed from the parent actor to the child actor (the child actor could also have a way to pass messages back to the parent, but that is out of the scope of this CEP and may be explored as the implementation of Producers and Consumers is refined).
+
+The proposed public API is as follows:
+
+```
+from jumpstarter import Actor, depends_on
+
+class AccountBalanceActor(Actor):
+  def __init__(self, user_id: int):
+    self.user_id = user_id
+
+class AccountBookkeepingActor(Actor):
+  def __init__(self, user_id: int, account_balance_actor: AccountBalanceActor):
+    # Keep a reference to the actor this one depends on.
+    self._account_balance_actor = account_balance_actor
+
+  @depends_on
+  def account_balance_actor(self):
+    # Return the stored dependency so it can be started before this actor.
+    return self._account_balance_actor
+```
+
+In this example, the ``AccountBalanceActor`` maintains the balance in a single user ID's account. The ``AccountBookkeepingActor`` is responsible for logging and auditing withdrawals and income, possibly passing these audit logs to another actor responsible for detecting fraud.
+
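One way to reason about the resulting start order is as a topological sort of the dependency graph. The sketch below uses only the standard library (`graphlib`, Python 3.9+) and is not how Jumpstarter resolves dependencies; the `DEPENDS_ON` table and the `FraudDetectionActor` name are hypothetical, the latter standing in for the fraud-detection actor mentioned above.

```python
from graphlib import TopologicalSorter

# Hypothetical table: each actor maps to the set of actors it depends on.
DEPENDS_ON = {
    "AccountBalanceActor": set(),
    "AccountBookkeepingActor": {"AccountBalanceActor"},
    "FraudDetectionActor": {"AccountBookkeepingActor"},
}


def start_order(depends_on):
    """Dependencies come first, so each actor starts after what it needs."""
    return list(TopologicalSorter(depends_on).static_order())


def stop_order(depends_on):
    """Stopping mirrors starting: an actor stops before its dependencies."""
    return list(reversed(start_order(depends_on)))
```

Reversing the order for shutdown matches the ``stopping`` sub-states listed earlier, where an actor's own tasks and resources are torn down before its dependencies are stopped.
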
+Resources
+---------
+Actors have resources they manage during their lifetime, such as:
+
+* Connections to databases and message brokers
+* File Handles
+* Synchronization Mechanisms (useful for short-lived actors)
+
+A resource can be an asynchronous context manager or a synchronous context manager. It is entered whenever the Actor is ``starting``, specifically just before the state machine transitions to the ``starting -> resources_acquired`` state.
+It is exited whenever the Actor is ``stopping``, specifically just before the state machine transitions to the ``stopping -> resources_released`` state. Given the asynchronous nature of Jumpstarter, resources can be released concurrently (even if some releases are synchronous and are run, say, in a thread pool).
+
+Additionally, every actor will have a `cancel scope`_ (acquired once the ``starting -> resources_acquired`` state has been entered) that can be used to shut down the worker or cancel any running task(s), whether because of a timeout, a crash, a restart, or some other reason. Even if a task is run in a thread pool, the cancel scope and the fact that Jumpstarter runs in an event loop mean that more robust cancellation of tasks may be possible in future versions of Celery than has been possible up to this point (see https://vorpus.org/blog/timeouts-and-cancellation-for-humans/ for some helpful background on this).
+
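A minimal sketch of entering both kinds of context manager on start and exiting them on stop, using only `contextlib.AsyncExitStack` from the standard library. The `ResourceHolder` class and the two example resources are hypothetical and do not reflect Jumpstarter's resource API. Note that an exit stack releases resources sequentially in reverse order; the concurrent release described above would need additional machinery such as a task group.

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager, contextmanager

events = []  # records the order in which resources are acquired and released


@contextmanager
def file_handle():
    """Stand-in for a synchronous resource such as a file handle."""
    events.append("open file")
    try:
        yield "file"
    finally:
        events.append("close file")


@asynccontextmanager
async def broker_connection():
    """Stand-in for an asynchronous resource such as a broker connection."""
    events.append("connect broker")
    try:
        yield "broker"
    finally:
        events.append("disconnect broker")


class ResourceHolder:
    """Hypothetical helper that owns resources for one actor's lifetime."""

    def __init__(self):
        self._stack = AsyncExitStack()
        self.resources = {}

    async def acquire(self):
        # Roughly what would happen before entering resources_acquired.
        self.resources["file"] = self._stack.enter_context(file_handle())
        self.resources["broker"] = await self._stack.enter_async_context(
            broker_connection()
        )

    async def release(self):
        # Roughly what would happen before entering resources_released.
        await self._stack.aclose()
        self.resources.clear()


async def main():
    holder = ResourceHolder()
    await holder.acquire()
    events.append("running")
    await holder.release()


asyncio.run(main())
```
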
+
+
 Motivation
 ==========
 
@@ -189,7 +298,13 @@ CC0 1.0 Universal license (https://creativecommons.org/publicdomain/zero/1.0/dee
 .. Cell https://github.com/celery/cell
 .. Thespian https://github.com/thespianpy/Thespian
 .. Pulsar https://github.com/quantmind/pulsar
-.. Asyncio https://docs.python.org/3/library/asyncio.html
+.. AsyncIO https://docs.python.org/3/library/asyncio.html
 .. Curio https://github.com/dabeaz/curio
 .. Trio https://github.com/python-trio/trio
 .. Trio-Asyncio https://github.com/python-trio/trio-asyncio
+.. Hierarchical State Machine https://www.eventhelix.com/design-patterns/hierarchical-state-machine/
+.. transitions https://github.com/pytransitions/transitions
+.. transitions-anyio https://github.com/pytransitions/transitions-anyio
+.. transitions-gui https://github.com/pytransitions/transitions-gui
+.. AnyIO https://github.com/agronholm/anyio
+.. cancel scope https://anyio.readthedocs.io/en/stable/api.html#anyio.CancelScope