Commit 447472e

Feat/OpenAI agents plugin sandbox support (#1452)
Add SandboxAgent support to Temporal OpenAI Agents plugin

Enable workflows to use the OpenAI Agents SDK's SandboxAgent by routing all sandbox lifecycle and I/O operations through Temporal activities. The user passes a real sandbox client (e.g. DaytonaSandboxClient) to OpenAIAgentsPlugin(sandbox_client=...) and the plugin handles the rest; no direct imports from the sandbox subpackage are needed.

Key changes:

Add sandbox/ subpackage with internal modules for:
- TemporalSandboxClient: workflow-side client that dispatches create/resume/delete as activities
- TemporalSandboxSession: workflow-side session that routes exec, read, write, and other I/O through activities
- TemporalSandboxActivities: worker-side activity implementations that delegate to the real BaseSandboxClient/BaseSandboxSession
- Pydantic activity arg/result models for serialization

Update TemporalOpenAIRunner to detect SandboxAgent in the agent graph and automatically inject TemporalSandboxClient when run_config.sandbox is configured

Update OpenAIAgentsPlugin to accept sandbox_client and register sandbox activities on the worker

Add tests covering:
- SandboxAgent detection in agent graphs (direct, handoff, circular)
- Validation errors (missing config, wrong client type)
- Activity delegation (each activity correctly calls the real client/session)
- Session caching and eviction in TemporalSandboxActivities
- End-to-end integration test running sandbox activities through a real Temporal workflow
1 parent b93b5c0 commit 447472e

18 files changed: +2031, -63 lines changed

pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -28,7 +28,7 @@ classifiers = [
2828
grpc = ["grpcio>=1.48.2,<2"]
2929
opentelemetry = ["opentelemetry-api>=1.11.1,<2", "opentelemetry-sdk>=1.11.1,<2"]
3030
pydantic = ["pydantic>=2.0.0,<3"]
31-
openai-agents = ["openai-agents>=0.3,<0.7", "mcp>=1.9.4, <2"]
31+
openai-agents = ["openai-agents>=0.14.0", "mcp>=1.9.4, <2"]
3232
google-adk = ["google-adk>=1.27.0,<2"]
3333
langsmith = ["langsmith>=0.7.0,<0.8"]
3434
lambda-worker-otel = [
@@ -71,8 +71,8 @@ dev = [
7171
"pytest-cov>=6.1.1",
7272
"httpx>=0.28.1",
7373
"pytest-pretty>=1.3.0",
74-
"openai-agents>=0.3,<0.7; python_version >= '3.14'",
75-
"openai-agents[litellm]>=0.3,<0.7; python_version < '3.14'",
74+
"openai-agents>=0.14.0; python_version >= '3.14'",
75+
"openai-agents[litellm]>=0.14.0; python_version < '3.14'",
7676
"litellm>=1.83.0",
7777
"openinference-instrumentation-google-adk>=0.1.8",
7878
"googleapis-common-protos==1.70.0",

temporalio/contrib/openai_agents/README.md

Lines changed: 126 additions & 0 deletions
@@ -17,6 +17,7 @@ This document is organized as follows:
1717
- **[Background Concepts](#core-concepts).** Background on durable execution and AI agents.
1818
- **[Full Example](#full-example)** Running the Hello World Durable Agent example.
1919
- **[Tool Calling](#tool-calling).** Calling agent Tools in Temporal.
20+
- **[Sandbox Support](#sandbox-support).** Running sandbox agents in Temporal.
2021
- **[Feature Support](#feature-support).** Compatibility matrix.
2122

2223
The [samples repository](https://github.com/temporalio/samples-python/tree/main/openai_agents) contains examples including basic usage, common agent patterns, and more complete samples.
@@ -450,6 +451,131 @@ To recover from such failures, you need to implement your own application-level
450451

451452
For network-accessible MCP servers, you can also use `HostedMCPTool` from the OpenAI Agents SDK, which uses an MCP client hosted by OpenAI.
452453

454+
## Sandbox Support
455+
456+
⚠️ **Pre-release** - This functionality is subject to change prior to General Availability.
457+
458+
The sandbox integration lets `SandboxAgent` from the OpenAI Agents SDK execute inside a remote or local sandbox (Daytona, Docker, E2B, local Unix, etc.) while keeping all coordination durable in Temporal.
459+
460+
Every sandbox operation — creating a session, running commands, reading/writing files, PTY interactions — is dispatched as a Temporal activity. This means sandbox work is fully observable, retryable, and recoverable like any other activity, and sandbox session state is serialized with the workflow so it survives worker restarts.
461+
462+
### Architecture
463+
464+
```text
465+
Workflow Code
466+
467+
temporal_sandbox_client("daytona") [returns TemporalSandboxClient]
468+
469+
SandboxAgent.run(run_config=RunConfig(sandbox=SandboxRunConfig(client=...)))
470+
471+
sandbox agent calls session.exec / session.read / session.write / …
472+
473+
TemporalSandboxSession routes each call as a Temporal activity
474+
("daytona-sandbox_session_exec", "daytona-sandbox_session_read", …)
475+
476+
SandboxClientProvider activities on the worker call the real sandbox client
477+
478+
Actual sandbox backend (Daytona, Docker, local, …)
479+
```
480+
481+
### Worker Configuration
482+
483+
Register one or more `SandboxClientProvider` instances with the plugin. Each provider pairs a unique name with a real `BaseSandboxClient` implementation. The plugin automatically registers all required activities on the worker.
484+
485+
```python
486+
import asyncio
487+
import docker
488+
from temporalio.client import Client
489+
from temporalio.worker import Worker
490+
from temporalio.contrib.openai_agents import OpenAIAgentsPlugin, SandboxClientProvider, ModelActivityParameters
491+
from agents.extensions.sandbox.daytona import DaytonaSandboxClient
492+
from agents.extensions.sandbox.unix_local import UnixLocalSandboxClient
493+
494+
async def main():
495+
client = await Client.connect(
496+
"localhost:7233",
497+
plugins=[
498+
OpenAIAgentsPlugin(
499+
model_params=ModelActivityParameters(
500+
start_to_close_timeout=timedelta(seconds=30)
501+
),
502+
sandbox_clients=[
503+
SandboxClientProvider("daytona", DaytonaSandboxClient()),
504+
SandboxClientProvider("local", UnixLocalSandboxClient()),
505+
],
506+
),
507+
],
508+
)
509+
510+
worker = Worker(
511+
client,
512+
task_queue="my-task-queue",
513+
workflows=[MyWorkflow],
514+
)
515+
await worker.run()
516+
```
517+
518+
Provider names must be unique. Each name becomes the prefix for that backend's activities, allowing multiple backends to coexist on a single worker.
519+
520+
### Workflow Usage
521+
522+
In the workflow, use `temporal_sandbox_client()` to create a reference to a registered backend by name. Pass it to `SandboxRunConfig` inside `RunConfig`:
523+
524+
```python
525+
from temporalio import workflow
526+
from temporalio.contrib.openai_agents.workflow import temporal_sandbox_client
527+
from agents import Runner
528+
from agents.sandbox import SandboxAgent, SandboxRunConfig
529+
from agents.run import RunConfig
530+
531+
@workflow.defn
532+
class MyWorkflow:
533+
@workflow.run
534+
async def run(self, prompt: str) -> str:
535+
agent = SandboxAgent(
536+
name="Coding Assistant",
537+
instructions="You are a helpful coding assistant with access to a sandbox.",
538+
)
539+
540+
result = await Runner.run(
541+
agent,
542+
prompt,
543+
run_config=RunConfig(
544+
sandbox=SandboxRunConfig(
545+
client=temporal_sandbox_client("daytona"),
546+
options=DaytonaSandboxClientOptions(pause_on_exit=False),
547+
),
548+
),
549+
)
550+
return result.final_output
551+
```
552+
553+
The name passed to `temporal_sandbox_client()` must exactly match the name used in `SandboxClientProvider` on the worker.
554+
555+
### Multiple Backends
556+
557+
A single workflow can target different backends by name. Register all backends on the worker and reference each by name in the workflow:
558+
559+
```python
560+
# Run a task on the "daytona" backend
561+
result = await Runner.run(
562+
agent, prompt,
563+
run_config=RunConfig(sandbox=SandboxRunConfig(
564+
client=temporal_sandbox_client("daytona"),
565+
options=DaytonaSandboxClientOptions(pause_on_exit=False),
566+
)),
567+
)
568+
569+
# Run a different task on the "local" backend
570+
result = await Runner.run(
571+
agent, prompt,
572+
run_config=RunConfig(sandbox=SandboxRunConfig(
573+
client=temporal_sandbox_client("local"),
574+
options=UnixLocalSandboxClientOptions(),
575+
)),
576+
)
577+
```
578+
453579
## Feature Support
454580

455581
This integration is presently subject to certain limitations.
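
Reviewer sketch (not part of the commit): the README section in this diff says each provider name becomes the prefix of that backend's activity names, for example "daytona-sandbox_session_exec". The plain-Python model below illustrates how such name-prefixed routing could let two backends coexist on one worker. Everything in it is hypothetical: `FakeBackend`, `register_provider`, `WorkflowSideSession`, and the dict registry are stand-ins for illustration, and none of them are the plugin's real classes or Temporal API calls.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical stand-in for a worker's activity registry. The real plugin
# registers Temporal activities; a dict is used here only for illustration.
ActivityFn = Callable[..., Awaitable[Any]]


class FakeBackend:
    """Illustrative sandbox backend that echoes the command it receives."""

    def __init__(self, label: str) -> None:
        self.label = label

    async def exec(self, command: str) -> str:
        return f"[{self.label}] ran: {command}"


def register_provider(
    registry: dict[str, ActivityFn], name: str, backend: FakeBackend
) -> None:
    """Register a backend under an activity name prefixed with the provider name."""
    activity_name = f"{name}-sandbox_session_exec"
    if activity_name in registry:
        # Mirrors the README's rule that provider names must be unique.
        raise ValueError(f"duplicate provider name: {name!r}")
    registry[activity_name] = backend.exec


class WorkflowSideSession:
    """Workflow-side proxy that knows only the provider name, not the backend."""

    def __init__(self, registry: dict[str, ActivityFn], name: str) -> None:
        self._registry = registry
        self._name = name

    async def exec(self, command: str) -> str:
        # In the real integration this step would be an activity invocation.
        activity = self._registry[f"{self._name}-sandbox_session_exec"]
        return await activity(command)


async def demo() -> list[str]:
    registry: dict[str, ActivityFn] = {}
    register_provider(registry, "daytona", FakeBackend("daytona"))
    register_provider(registry, "local", FakeBackend("local"))
    return [
        await WorkflowSideSession(registry, "daytona").exec("ls"),
        await WorkflowSideSession(registry, "local").exec("pwd"),
    ]
```

Calling `asyncio.run(demo())` routes each command to the backend registered under the matching name, and registering the same name twice raises `ValueError`. The lookup by string is also why the name passed in the workflow has to match the worker-side name exactly.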

temporalio/contrib/openai_agents/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -13,6 +13,9 @@
1313
OpenAIAgentsPlugin,
1414
OpenAIPayloadConverter,
1515
)
16+
from temporalio.contrib.openai_agents.sandbox._sandbox_client_provider import (
17+
SandboxClientProvider,
18+
)
1619
from temporalio.contrib.openai_agents.workflow import AgentsWorkflowError
1720

1821
from . import testing, workflow
@@ -22,6 +25,7 @@
2225
"ModelActivityParameters",
2326
"OpenAIAgentsPlugin",
2427
"OpenAIPayloadConverter",
28+
"SandboxClientProvider",
2529
"StatelessMCPServerProvider",
2630
"StatefulMCPServerProvider",
2731
"testing",

temporalio/contrib/openai_agents/_invoke_model_activity.py

Lines changed: 58 additions & 0 deletions
@@ -27,6 +27,13 @@
2727
UserError,
2828
WebSearchTool,
2929
)
30+
from agents.tool import (
31+
ApplyPatchTool,
32+
LocalShellTool,
33+
ShellTool,
34+
ShellToolEnvironment,
35+
ToolSearchTool,
36+
)
3037
from openai import (
3138
APIStatusError,
3239
AsyncOpenAI,
@@ -73,13 +80,47 @@ class HostedMCPToolInput:
7380
tool_config: Mcp
7481

7582

83+
@dataclass
84+
class ShellToolInput:
85+
"""Data conversion friendly representation of a ShellTool. Contains only the fields which are needed by the model
86+
execution to determine what tool to call, not the actual tool invocation, which remains in the workflow context.
87+
"""
88+
89+
name: str = "shell"
90+
environment: ShellToolEnvironment | None = None
91+
92+
93+
class _NoopApplyPatchEditor:
94+
"""Satisfies the ApplyPatchEditor protocol for tool reconstruction during model calls."""
95+
96+
def create_file(self, operation: Any) -> None: # type: ignore[reportUnusedParameter]
97+
return None
98+
99+
def update_file(self, operation: Any) -> None: # type: ignore[reportUnusedParameter]
100+
return None
101+
102+
def delete_file(self, operation: Any) -> None: # type: ignore[reportUnusedParameter]
103+
return None
104+
105+
106+
@dataclass
107+
class ApplyPatchToolInput:
108+
"""Data conversion friendly representation of an ApplyPatchTool."""
109+
110+
name: str = "apply_patch"
111+
112+
76113
ToolInput = (
77114
FunctionToolInput
78115
| FileSearchTool
79116
| WebSearchTool
80117
| ImageGenerationTool
81118
| CodeInterpreterTool
82119
| HostedMCPToolInput
120+
| ShellToolInput
121+
| LocalShellTool
122+
| ApplyPatchToolInput
123+
| ToolSearchTool
83124
)
84125

85126

@@ -181,9 +222,26 @@ def make_tool(tool: ToolInput) -> Tool:
181222
WebSearchTool,
182223
ImageGenerationTool,
183224
CodeInterpreterTool,
225+
LocalShellTool,
226+
ToolSearchTool,
184227
),
185228
):
186229
return tool
230+
elif isinstance(tool, ShellToolInput):
231+
232+
async def _noop_executor(*a: Any, **kw: Any) -> str: # type: ignore[reportUnusedParameter]
233+
return ""
234+
235+
return ShellTool(
236+
name=tool.name,
237+
environment=tool.environment,
238+
executor=_noop_executor,
239+
)
240+
elif isinstance(tool, ApplyPatchToolInput):
241+
return ApplyPatchTool(
242+
name=tool.name,
243+
editor=_NoopApplyPatchEditor(),
244+
)
187245
elif isinstance(tool, HostedMCPToolInput):
188246
return HostedMCPTool(
189247
tool_config=tool.tool_config,
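
Reviewer sketch (not part of the commit): the `ShellToolInput` and `ApplyPatchToolInput` additions above appear to follow a "serializable descriptor plus no-op callable" pattern. Only the fields the model call needs cross the activity boundary, and the tool is rebuilt on the other side with a placeholder executor, since the actual invocation stays in the workflow. The stdlib-only model below illustrates that idea. `RealTool`, `ToolDescriptor`, `to_descriptor`, and `from_descriptor` are hypothetical names, not classes from the OpenAI Agents SDK or this plugin.

```python
from dataclasses import asdict, dataclass
from typing import Any, Callable


@dataclass
class RealTool:
    """Hypothetical tool that carries a non-serializable callable."""

    name: str
    executor: Callable[..., str]


@dataclass
class ToolDescriptor:
    """Serializable subset of RealTool: only what the model call needs."""

    name: str = "shell"


def to_descriptor(tool: RealTool) -> ToolDescriptor:
    """Drop the callable so the value can be sent as plain data."""
    return ToolDescriptor(name=tool.name)


def _noop_executor(*args: Any, **kwargs: Any) -> str:
    """Placeholder executor; it is never expected to do real work."""
    return ""


def from_descriptor(desc: ToolDescriptor) -> RealTool:
    """Rebuild a structurally valid tool around a no-op executor."""
    return RealTool(name=desc.name, executor=_noop_executor)


original = RealTool(name="shell", executor=lambda cmd: f"ran {cmd}")
wire_form = asdict(to_descriptor(original))  # plain dict, safe to serialize
rebuilt = from_descriptor(ToolDescriptor(**wire_form))
```

The rebuilt tool has the same name as the original, so a model request can still advertise it, while calling `rebuilt.executor(...)` returns an empty string instead of running anything.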

temporalio/contrib/openai_agents/_mcp.py

Lines changed: 16 additions & 6 deletions
@@ -41,6 +41,7 @@ class _StatelessCallToolsArguments:
4141
tool_name: str
4242
arguments: dict[str, Any] | None
4343
factory_argument: Any | None
44+
meta: dict[str, Any] | None = None
4445

4546

4647
@dataclasses.dataclass
@@ -100,11 +101,16 @@ async def list_tools(
100101
return tools
101102

102103
async def call_tool(
103-
self, tool_name: str, arguments: dict[str, Any] | None
104+
self,
105+
tool_name: str,
106+
arguments: dict[str, Any] | None,
107+
meta: dict[str, Any] | None = None,
104108
) -> CallToolResult:
105109
return await workflow.execute_activity(
106110
self.name + "-call-tool-v2",
107-
_StatelessCallToolsArguments(tool_name, arguments, self._factory_argument),
111+
_StatelessCallToolsArguments(
112+
tool_name, arguments, self._factory_argument, meta
113+
),
108114
result_type=CallToolResult,
109115
**self._config,
110116
)
@@ -190,7 +196,7 @@ async def call_tool(args: _StatelessCallToolsArguments) -> CallToolResult:
190196
server = self._create_server(args.factory_argument)
191197
try:
192198
await server.connect()
193-
return await server.call_tool(args.tool_name, args.arguments)
199+
return await server.call_tool(args.tool_name, args.arguments, args.meta)
194200
finally:
195201
await server.cleanup()
196202

@@ -275,6 +281,7 @@ async def wrapper(*args: Any, **kwargs: Any):
275281
class _StatefulCallToolsArguments:
276282
tool_name: str
277283
arguments: dict[str, Any] | None
284+
meta: dict[str, Any] | None = None
278285

279286

280287
@dataclasses.dataclass
@@ -362,15 +369,18 @@ async def list_tools(
362369

363370
@_handle_worker_failure
364371
async def call_tool(
365-
self, tool_name: str, arguments: dict[str, Any] | None
372+
self,
373+
tool_name: str,
374+
arguments: dict[str, Any] | None,
375+
meta: dict[str, Any] | None = None,
366376
) -> CallToolResult:
367377
if not self._connect_handle:
368378
raise ApplicationError(
369379
"Stateful MCP Server not connected. Call connect first."
370380
)
371381
return await workflow.execute_activity(
372382
self.name + "-call-tool-v2",
373-
_StatefulCallToolsArguments(tool_name, arguments),
383+
_StatefulCallToolsArguments(tool_name, arguments, meta),
374384
result_type=CallToolResult,
375385
**self._config,
376386
)
@@ -460,7 +470,7 @@ async def call_tool_deprecated(
460470
@activity.defn(name=self.name + "-call-tool-v2")
461471
async def call_tool(args: _StatefulCallToolsArguments) -> CallToolResult:
462472
return await self._servers[_server_id()].call_tool(
463-
args.tool_name, args.arguments
473+
args.tool_name, args.arguments, args.meta
464474
)
465475

466476
@activity.defn(name=self.name + "-list-prompts")
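
Reviewer sketch (not part of the commit): both `_StatelessCallToolsArguments` and `_StatefulCallToolsArguments` gain `meta` as a trailing field with a default of `None`. With plain dataclasses, that placement keeps older call sites and older keyword payloads constructible, as the stdlib-only example below shows. `CallToolArgs` and the payload dicts are hypothetical. Whether Temporal's data converter treats payloads recorded before this change the same way is an assumption that should be verified separately.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CallToolArgs:
    """Hypothetical mirror of the argument dataclasses changed in this diff."""

    tool_name: str
    arguments: Optional[dict[str, Any]]
    # New trailing field with a default, so existing callers keep working.
    meta: Optional[dict[str, Any]] = None


# A payload shaped like one produced before `meta` existed.
old_payload = {"tool_name": "search", "arguments": {"q": "temporal"}}
# A payload produced by code that already knows about `meta`.
new_payload = {**old_payload, "meta": {"trace_id": "abc"}}

old_args = CallToolArgs(**old_payload)  # meta falls back to None
new_args = CallToolArgs(**new_payload)  # meta is carried through
positional = CallToolArgs("search", None)  # old positional call sites still work
```

Adding the field anywhere other than the end, or without a default, would break both the positional call and the old keyword payload.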
