Task Autopilot: a task autopilot platform for AI agents
Enter a destination, inspect the route, let the system execute what is safe, and take over when human judgment is required.
Cube Pets Office is evolving from a mission-first task operating system into a Task Autopilot platform.
It is not a chat playground where the main artifact is an answer. It is not only a workflow builder where users must manually draw every node. The product direction is to let a user state a goal, then make the task lifecycle visible and controllable:
- understand the intended destination and missing context
- recommend an executable route instead of exposing every low-level node first
- organize a role-based agent fleet around the route
- run work through the existing mission runtime, workflow engine, and executor stack
- surface drive state, logs, artifacts, evidence, audit records, and replay
- pause for clarification, approval, risk acceptance, budget, permission, or delivery review when needed
- replan when the current route is no longer safe, complete, or useful
The current engineering foundation remains mission-first. Task Autopilot is the next product layer above it: mission / workflow / runtime / task continue to be the implementation vocabulary, while Destination / Route / Drive State / Fleet / Takeover become the user-facing vocabulary.
This README intentionally keeps the product story aligned with the codebase and specs that exist today.
What is already present as foundation:
- A mission-first office shell and `/tasks` workbench for launching, monitoring, and reviewing task execution.
- A Node + Express + Socket.IO server that coordinates mission state, workflow progress, events, replay, and APIs.
- A Lobster executor service with `mock`, `native`, and `real` execution modes, including Docker-aware local behavior.
- Human-in-the-loop control paths such as wait/resume, decision handling, approvals, and manual recovery hooks.
- Review, audit, replay, lineage, evidence, and runtime observability concepts across existing specs and mainline integration.
- A Web-AIGC mainline baseline where 58 / 58 specs have been closed and multiple node/route families have been integrated into the server mainline.
- A first-phase Task Autopilot specification baseline: 18 specs, each with `requirements.md`, `design.md`, and `tasks.md`.
What is not being claimed:
- The project is not an open-domain L5 fully autonomous operator.
- The system does not promise to complete every complex task without human review.
- High-risk side effects, permission changes, external writes, budget-sensitive actions, and ambiguous goals still require explicit governance and takeover.
- The new product language does not require an immediate large-scale rename of the existing `mission / workflow / runtime` code.
- The 18 Task Autopilot specs are a completed first-phase documentation and modeling baseline; their implementation task checklists are intentionally the next body of work.
The previous product center was mission-first:
- A user launches a mission instead of asking for a one-off reply.
- The system tracks workflow stages, runtime state, artifacts, and decisions.
- Replay and audit preserve enough evidence to inspect what happened.
- `/` and `/tasks` are the high-frequency execution surfaces.
Task Autopilot keeps that foundation and adds a clearer product model:
| Mission-first foundation | Task Autopilot product layer | Meaning |
|---|---|---|
| `mission` | Destination | The outcome the user wants to reach |
| `workflow` | Route | The planned path toward the destination |
| runtime / phase state | Drive State | The user-readable state of the task journey |
| agents / skills / nodes / executors | Fleet | The role-based capability group assembled for the route |
| HITL / decision / approval | Takeover Point | A moment where the system gives control back to the user |
| retry / revision / reroute | Replan | A formal route change after risk, failure, or new context |
This is a compatibility-first evolution. The product layer should be implemented through bindings, projections, view models, and server-side aggregation before any deep rename or schema rewrite is considered.
Task Autopilot is organized around a small set of product objects.
| Concept | Product meaning | Current implementation anchor |
|---|---|---|
| Destination | A structured form of the user's intended outcome, including goal, constraints, missing information, success criteria, and expected deliverables. | Mission metadata, mission summary, runtime context, workflow config |
| Route | A recommended executable path with stages, candidate routes, risks, takeover points, expected artifacts, and possible replans. | Workflow definition, workflow instance, route family, workflow phase |
| Drive State | A high-level state machine that explains what the system is doing now. | Mission runtime state, workflow state, node state, wait/resume state, review state |
| Fleet | A role-oriented capability group such as Planner, Clarifier, Researcher, Operator, Generator, Reviewer, Auditor, and Coordinator. | Agents, skills, tools, Web-AIGC nodes, MCP tools, executors, adapters |
| Takeover Point | A user decision point for clarification, route selection, permission, budget, risk acceptance, delivery acceptance, or exception handling. | HITL, `MissionDecision`, approval, `WAITING_INPUT`, `resume()`, `escalate()` |
| Replan | A route-level change caused by new constraints, lower confidence, elevated risk, failed tools, poor intermediate results, or user override. | Workflow revision, retry/escalate paths, reroute records, runtime events |
| Confidence | The system's confidence in goal understanding, route feasibility, execution completion, and result quality. | Runtime projection, review signals, evidence completeness, UI explanation layer |
| Risk | A structured view of ambiguity, missing data, tool failure, permissions, cost, compliance, external side effects, and result quality. | Runtime governance, audit, permission checks, risk actions, replay evidence |
The main chain is:
Destination -> Route -> Fleet -> Drive State -> Result
Takeover, replan, confidence, risk, evidence, audit, and replay make that chain inspectable rather than a black box.
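As one illustration of that chain, the cockpit's Drive State could be a pure projection over existing runtime facts. The mapping below is a hypothetical sketch, not the shipped contract; only `WAITING_INPUT` appears in the current runtime vocabulary, and the other state names are placeholders:

```typescript
// Hypothetical low-level runtime states, drawn loosely from the existing
// mission/workflow vocabulary. Only WAITING_INPUT is named in current code.
type RuntimeState =
  | "PLANNING"
  | "RUNNING"
  | "WAITING_INPUT"
  | "REVIEW"
  | "FAILED"
  | "DONE";

// User-facing drive states; the real set would come from the
// drive-state-and-replan-state-machine spec.
type DriveState =
  | "Planning route"
  | "Driving"
  | "Waiting for takeover"
  | "Under review"
  | "Replanning"
  | "Arrived";

// One exhaustive lookup keeps the projection a pure function of runtime
// facts. Note that FAILED surfaces as "Replanning", not a dead end:
// a failure is a route change, not a terminal error.
const driveStateByRuntime: Record<RuntimeState, DriveState> = {
  PLANNING: "Planning route",
  RUNNING: "Driving",
  WAITING_INPUT: "Waiting for takeover",
  REVIEW: "Under review",
  FAILED: "Replanning",
  DONE: "Arrived",
};

function projectDriveState(runtime: RuntimeState): DriveState {
  return driveStateByRuntime[runtime];
}
```

Because the projection is a lookup rather than stored state, it can be recomputed at any time from runtime facts, which keeps the product layer honest with respect to the engineering layer.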
The Task Autopilot specs define L1-L5 as an execution commitment model, not as marketing shorthand. The repository should not be described as globally L5.
| Level | Meaning | Current positioning |
|---|---|---|
| L1 | Route suggestion. The system helps interpret the destination and recommend a route, while the user remains in control of execution. | A practical near-term baseline for productization. |
| L2 | Partial automatic execution. Low-risk steps may progress automatically, while key decisions require takeover. | A realistic target for the current mission-first + HITL foundations. |
| L3 | Standard-task automatic closure. Standardized tasks can mostly complete automatically inside bounded risk, review, audit, and recovery constraints. | A near-term design target for selected, well-governed task families. |
| L4 | High automation inside limited task domains. Requires whitelist policies for task domain, permissions, budget, and evidence. | A future limited-domain direction, not a blanket current claim. |
| L5 | Open-domain full automation. | A research and long-term concept only; not implemented or claimed today. |
The intended implementation model is task-level and phase-level. A mission may start with a target level, then downgrade when it hits risk, missing context, external side effects, or governance boundaries.
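A minimal sketch of that downgrade behavior, using hypothetical signal names (the real policy belongs to the `task-autopilot-levels-l1-to-l5` spec and its governance rules):

```typescript
type Level = 1 | 2 | 3 | 4 | 5;

// Hypothetical per-phase governance signals. The real signal set would
// come from runtime governance, permission checks, and risk actions.
interface PhaseSignals {
  hasExternalSideEffects: boolean; // external writes, budget-sensitive actions
  permissionBoundary: boolean;     // the phase crosses a permission boundary
  missingContext: boolean;         // the destination is still ambiguous
}

// A phase never runs above the mission's target level, and each
// governance signal steps the effective level down toward takeover.
function effectiveLevel(target: Level, s: PhaseSignals): Level {
  let level: Level = target;
  if (s.hasExternalSideEffects) level = Math.min(level, 2) as Level;
  if (s.permissionBoundary) level = Math.min(level, 2) as Level;
  if (s.missingContext) level = Math.min(level, 1) as Level;
  return level;
}
```

The design point is that downgrading is monotone: signals can only pull a phase toward more human control, never push it above the mission's declared target.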
The first Task Autopilot phase has completed its spec modeling baseline: 18 specs across 54 markdown files. Each spec has:
- `requirements.md`
- `design.md`
- `tasks.md`
The current status is documentation-complete for phase 1, not implementation-complete for every task in those specs. The unchecked task lists are the next implementation backlog.
- `task-autopilot-platform-positioning`: defines Task Autopilot as the next product layer above mission-first.
- `task-autopilot-core-concepts`: defines Destination, Route, Drive State, Fleet, Takeover, Replan, Confidence, and Risk.
- `task-autopilot-levels-l1-to-l5`: defines automation levels and prevents overclaiming open-domain autonomy.
- `destination-model-and-parser`: defines how user input becomes a structured destination.
- `route-planner-and-route-model`: defines route sets, candidate routes, stages, risks, and takeover points.
- `mission-model-to-autopilot-model-mapping`: defines the compatibility bridge from `mission / workflow / runtime` to the autopilot product model.
- `autopilot-cockpit-information-architecture`: defines the cockpit IA for destination, route, execution, takeover, evidence, and audit.
- `destination-card-and-goal-summary`: defines the destination card and stable goal summary.
- `route-recommendation-and-selection`: defines fastest, safest, and deepest route recommendation semantics.
- `fleet-status-and-live-execution-view`: defines the live fleet execution view above agents, nodes, executors, logs, and artifacts.
- `takeover-panel-and-decision-points`: defines unified takeover experiences for clarification, route confirmation, budget, permission, risk, delivery, and exceptions.
- `drive-state-and-replan-state-machine`: defines the high-level drive states and replan semantics.
- `fleet-organization-and-role-packaging`: defines role packaging and maps agents, skills, nodes, tools, MCP, and executors into fleet roles.
- `autopilot-runtime-orchestration`: defines how Destination, Route, Fleet, and Takeover bind into Mission Runtime, workflow runtime, decisions, and executor signals.
- `autopilot-explainability-and-telemetry`: defines explanations, telemetry signals, confidence, risk, remaining steps, and evidence hints.
- `autopilot-recovery-and-human-takeover-governance`: defines recovery, downgrade, escalation, and human takeover governance.
- `autopilot-evidence-replay-and-trust-chain`: defines the driving timeline, evidence chain, replay chain, and trust chain.
- `task-autopilot-success-metrics`: defines delivery rate, takeover rate, replan rate, deviation rate, completion time, review pass rate, and drill-down evidence.
The next implementation direction is to connect these specs into the running product incrementally: projection objects first, cockpit surfaces second, runtime events and governance third, then replay/audit/metrics closure.
- `/` is the default office cockpit. It brings the task queue, 3D office scene, unified launch surface, and right-side context into one desktop shell.
- `/tasks` is the full-screen task workbench for focused execution and monitoring.
- `/tasks/:taskId` keeps deep-linked task detail pages available.
- `/replay/:missionId` is the replay surface for completed runs and evidence review.
- `/debug` remains a lower-frequency internal surface for diagnostics and supporting tools.
The current surface strategy is to keep the office cockpit and `/tasks` as the main operator work areas. Replay, audit, lineage, debug, and lower-level node views remain available without becoming the first thing a user must understand.
At a high level, the repository is organized around four layers:
- `client/`: React 19 + Vite frontend, including the office shell, task workbench, replay views, 3D scene, launch surfaces, and cockpit components.
- `server/`: Node.js + Express + Socket.IO backend for missions, workflow state, events, replay, Web-AIGC routes, and APIs.
- `services/lobster-executor/`: execution service for mock, native, and real task execution.
- `shared/`: contracts and shared types used across frontend, backend, and executor.
The Task Autopilot architecture should be added as a product/projection layer above these foundations:
Product layer: Destination / Route / Drive State / Fleet / Takeover / Evidence
Projection layer: bindings, view models, server aggregation, event normalization
Runtime layer: Mission Runtime / workflow engine / HITL / review / audit / replay
Execution layer: Lobster executor / adapters / tools / Web-AIGC nodes / external services
The runtime architecture SVG is linked in the documentation section below.
The Web-AIGC spec delivery baseline is closed at 58 / 58 completed specs and 238 / 238 checked top-level tasks, spanning 52 node specs and 6 platform specs. The project has moved from spec-count tracking into mainline integration, runtime hardening, and governance closure.
This matters for Task Autopilot because the Web-AIGC work supplies much of the lower-level route and fleet substrate:
- Built-in adapters, installed extra adapters, wait/resume control flow, and replay/audit observability are already part of the runtime mainline.
- The main server entry mounts multiple Web-AIGC route families, including MCP, Office/content nodes, search and QA, `transaction_flow`, `orchestration_recognition_jump`, and vector update/delete endpoints.
- Runtime coverage includes search/QA adapters, Office/content production nodes such as `ai_ppt`, `excel_read`, `dynamic_chart`, `file_slicing`, `file_generation`, and `file_translation`, plus governed execution paths such as `transaction_flow` and `orchestration_recognition_jump`.
Task Autopilot should not expose all of those nodes as the primary product mental model. It should package them into route stages, fleet roles, takeover points, and evidence trails.
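One way to read that packaging rule in code: individual node names stay an implementation detail, while the cockpit asks which fleet role owns a node. The role assignments below are illustrative only, not the shipped mapping from the `fleet-organization-and-role-packaging` spec:

```typescript
// Fleet roles named in the core-concepts spec (a subset, for illustration).
type FleetRole = "Planner" | "Researcher" | "Operator" | "Generator" | "Reviewer";

// Illustrative grouping of existing Web-AIGC node names into roles.
// The assignments here are assumptions for the sketch, not real config.
const fleetPackaging: Record<FleetRole, string[]> = {
  Planner: ["orchestration_recognition_jump"],
  Researcher: ["excel_read"],
  Operator: ["transaction_flow"],
  Generator: ["ai_ppt", "dynamic_chart", "file_generation", "file_translation"],
  Reviewer: ["file_slicing"],
};

// The cockpit asks "which role owns this node?" instead of listing
// 50+ nodes as the primary mental model.
function roleForNode(node: string): FleetRole | undefined {
  const entry = Object.entries(fleetPackaging).find(([, nodes]) =>
    nodes.includes(node),
  );
  return entry?.[0] as FleetRole | undefined;
}
```

A lookup like this is also where takeover points and evidence trails can attach: the role, not the node, becomes the unit the user reasons about.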
For dated status snapshots and integration planning, see the steering docs linked in the documentation section below.
The repo currently has three practical runtime targets:
| Environment | Frontend | Server | Executor behavior |
|---|---|---|---|
| GitHub Pages preview | Yes | No | Browser-only preview runtime |
| Local with Docker available | Yes | Yes | `real` executor mode |
| Local without Docker | Yes | Yes | `native` executor mode fallback |
Important boundaries:
- GitHub Pages is a static preview target. It does not include the Node server or Lobster Executor.
- `pnpm run dev:all` prefers `real` execution and automatically falls back to `native` when Docker is unavailable.
- If you explicitly set `LOBSTER_EXECUTION_MODE=mock` or `LOBSTER_EXECUTION_MODE=native`, that choice is preserved.
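The fallback behavior above can be sketched as a single pure function. This is a simplification for illustration, not the executor's actual startup code:

```typescript
type ExecutionMode = "mock" | "native" | "real";

// Simplified sketch of the documented behavior: an explicitly set
// LOBSTER_EXECUTION_MODE always wins; otherwise prefer "real" and
// fall back to "native" when Docker is unavailable.
function resolveExecutionMode(
  explicit: string | undefined, // value of LOBSTER_EXECUTION_MODE, if set
  dockerAvailable: boolean,
): ExecutionMode {
  if (explicit === "mock" || explicit === "native" || explicit === "real") {
    return explicit; // an explicit choice is preserved as-is
  }
  return dockerAvailable ? "real" : "native";
}
```

Keeping the decision in one function makes the Docker-aware behavior easy to test without a Docker daemon present.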
For executor details, see `docs/executor/lobster-executor.md`.
The next Task Autopilot implementation work should stay incremental and compatibility-first.
Recommended sequence:
- Add stable projection objects for `Destination`, `Route`, `Drive State`, `Fleet`, and `Takeover` without renaming the existing runtime foundation.
- Use those projections to upgrade the office cockpit, `/tasks`, and task detail surfaces into a clearer autopilot cockpit.
- Connect route recommendation, route selection, takeover, downgrade, and replan actions to existing mission/workflow/runtime control paths.
- Normalize runtime, decision, audit, replay, artifact, and lineage events into an evidence chain that can explain why the task moved the way it did.
- Add success metrics only where the required source-of-truth data exists, and mark partial or conflicted samples explicitly.
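The event-normalization step could start as a small shared evidence-event envelope. The field names here are assumptions for the sketch, not the schema defined by the `autopilot-evidence-replay-and-trust-chain` spec:

```typescript
// Hypothetical normalized evidence event. Real field names would come
// from the evidence/replay/trust-chain spec, not from this sketch.
interface EvidenceEvent {
  missionId: string;
  at: string; // ISO timestamp
  kind: "runtime" | "decision" | "audit" | "artifact";
  summary: string; // user-readable "why the task moved this way"
  sourceRef: string; // pointer back to the raw runtime record
}

// Normalize a raw runtime event into the shared envelope, keeping a
// reference to the original record so replay never becomes the only
// source of truth.
function normalizeRuntimeEvent(raw: {
  missionId: string;
  timestamp: string;
  type: string;
  detail: string;
}): EvidenceEvent {
  return {
    missionId: raw.missionId,
    at: raw.timestamp,
    kind: "runtime",
    summary: raw.detail,
    sourceRef: `runtime:${raw.type}`,
  };
}
```

The `sourceRef` back-pointer is the important design choice: it keeps the evidence chain an index over runtime facts rather than a parallel store that can drift.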
Guardrails:
- Do not turn Task Autopilot into a UI-only rebrand; every visible state should point back to runtime facts or clearly marked inference.
- Do not force users to manage 50+ nodes as the main flow; package capabilities into route stages and fleet roles.
- Do not hide governance behind "automation"; high-risk actions must remain auditable and interruptible.
- Do not treat replay as the source of truth when mission/runtime/audit facts disagree; replay is primarily a reconstruction and review surface.
This repository uses `pnpm`. If `pnpm` is not installed globally, you can replace the commands below with `corepack pnpm`.
No API key is required for the browser-only preview flow.
```sh
pnpm install --frozen-lockfile
pnpm run dev:frontend
```

Use this when you want to explore the office shell, the 3D scene, and the demo experience quickly.
Create a local environment file first:
```sh
cp .env.example .env
```

PowerShell alternative:

```powershell
Copy-Item .env.example .env
```

Then fill the values you need in `.env` and start the stack:

```sh
pnpm run dev:all
```

Common AI-related variables:
```
LLM_API_KEY=your_api_key_here
LLM_BASE_URL=https://api.openai.com/v1
LLM_MODEL=gpt-5.4
LLM_WIRE_API=responses
```

This is useful when you want to debug the frontend, server, and executor independently.
```sh
pnpm run dev:server
pnpm run dev:frontend
```

Start the executor with an explicit mode:

```sh
LOBSTER_EXECUTION_MODE=real pnpm exec tsx services/lobster-executor/src/index.ts
```

PowerShell example:

```powershell
$env:LOBSTER_EXECUTION_MODE='native'
pnpm exec tsx services/lobster-executor/src/index.ts
```

Useful commands:
- `pnpm run lint`: check the guarded formatting targets used by release docs and workflows.
- `pnpm run typecheck`: run the TypeScript no-emit check.
- `pnpm run test`: run client, server, and executor test entrypoints.
- `pnpm run build`: build the frontend and server bundle.
- `pnpm run test:guardrails`: run the lighter decision and socket reconnect regression path.
- `pnpm run test:release`: run the pre-release aggregate check.
- `pnpm run build:pages`: build the GitHub Pages artifact.
For release-sensitive changes, the practical minimum is:
```sh
pnpm run lint
pnpm run typecheck
pnpm run test
pnpm run build
```

```
cube-pets-office/
|-- client/                      # frontend app: office shell, tasks, replay, 3D scene
|-- server/                      # backend APIs, workflow state, events, replay
|-- shared/                      # shared contracts and types
|-- services/lobster-executor/   # executor service: mock / native / real
|-- docs/                        # architecture, executor notes, reference docs
|-- scripts/                     # local dev, build, smoke, and utility scripts
|-- data/                        # local data and persisted runtime files
`-- .kiro/                       # specs, steering, and execution planning artifacts
```
If you want to start from key entrypoints, read these first:
- client/src/App.tsx
- client/src/pages/Home.tsx
- client/src/pages/tasks/TasksPage.tsx
- client/src/components/office/OfficeTaskCockpit.tsx
- server/index.ts
- server/core/workflow-engine.ts
- services/lobster-executor/src/index.ts
- ROADMAP.md
- CHANGELOG.md
- docs/architecture.svg
- docs/architecture-runtime-2026-04-21.svg
- docs/executor/lobster-executor.md
- .kiro/steering/task-autopilot-spec-roadmap-2026-04-23.md
- .kiro/steering/execution-plan.md
- .kiro/steering/spec-execution-roadmap.md
- .kiro/steering/web-aigc-58-plan-progress-summary-2026-04-22.md
- .kiro/steering/web-aigc-runtime-mainline-checkpoints-2026-04-23.md
- .kiro/steering/web-aigc-phase-2-integration-plan.md
- .kiro/steering/web-aigc-next-phase-mainline-plan-2026-04-22.md
- .kiro/specs/task-autopilot-platform-positioning/
- .kiro/specs/task-autopilot-core-concepts/
- .kiro/specs/task-autopilot-levels-l1-to-l5/
- .kiro/specs/mission-model-to-autopilot-model-mapping/
- .kiro/specs/
README.md is kept as stable product documentation for GitHub. Rolling progress, active implementation details, and dated execution notes belong in ROADMAP.md, .kiro/steering/, and the spec archives.
Use `corepack pnpm` in place of `pnpm`, for example:

```sh
corepack pnpm install --frozen-lockfile
corepack pnpm run test:release
```

GitHub Pages is a static deployment target: it has no local backend process and no local executor. The Pages demo is a browser-only preview runtime, not host-process execution.
Task Autopilot is not finished. The mission-first runtime foundation exists, the Web-AIGC mainline is integrated, and the first 18 Task Autopilot specs are documented. The next work is to land the projection layer, cockpit experience, runtime orchestration fields, takeover governance, evidence chain, and metrics in production code.
A wholesale rename is not required. The specs explicitly recommend compatibility first: keep mission / workflow / runtime / task as the engineering layer, then add Destination / Route / Drive State / Fleet / Takeover as product-facing projections and shared vocabulary.
At minimum:
```sh
pnpm run lint
pnpm run typecheck
pnpm run test
```

If your change affects packaging, deployment, or end-to-end runtime behavior, also run:

```sh
pnpm run build
pnpm run test:release
```

MIT

