Skip to content

Commit e7c67aa

Browse files
authored
chore: Add instrumentation skill and update e2e test skill (#1663)
idk maybe a bit opinionated but less sloppy
1 parent 5480576 commit e7c67aa

4 files changed

Lines changed: 94 additions & 146 deletions

File tree

.agents/skills/e2e-tests/SKILL.md

Lines changed: 49 additions & 138 deletions
Original file line numberDiff line numberDiff line change
@@ -1,165 +1,76 @@
11
---
22
name: e2e-tests
3-
description: Write, run, and debug end-to-end tests for the Braintrust SDK. Use when asked to "add an e2e test", "create a scenario", "write an e2e scenario", "add e2e coverage", "debug e2e test", "fix e2e snapshot", or any task involving the e2e/ directory.
3+
description: Write, run, and debug end-to-end tests for the Braintrust SDK. Use when asked to add an e2e test, create a scenario, write an e2e scenario, add e2e coverage, debug e2e test, fix e2e snapshot, or any task involving the e2e/ directory.
44
---
55

66
# E2E Tests
77

8-
E2E tests run SDK scenarios in subprocesses against a mock Braintrust server. Read `e2e/README.md` for full details. **Always read the existing scenario closest to your task before writing a new one.**
8+
E2E tests run SDK scenarios in subprocesses against a mock Braintrust server. Prefer extending the closest existing scenario over inventing a new pattern.
99

10-
## Commands
11-
12-
```bash
13-
pnpm run build # Build SDK (required if source changed)
14-
cd e2e && npx vitest run scenarios/<name>/scenario.test.ts # Run one scenario
15-
cd e2e && npx vitest run --reporter=verbose scenarios/<name>/scenario.test.ts # Verbose
16-
cd e2e && npx vitest run --update scenarios/<name>/scenario.test.ts # Update snapshots
17-
cd e2e && npx vitest run -t "<exact test name>" # Isolate one test when file args over-match
18-
pnpm run test:e2e # Run all (from repo root)
19-
pnpm run test:e2e:hermetic # Run hermetic-only e2e tests
20-
pnpm run test:e2e:external # Run external-api-only e2e tests
21-
pnpm run fix:formatting # Always run before committing
22-
```
23-
24-
## Creating a Scenario
25-
26-
### 1. Create directory and entrypoint
27-
28-
```bash
29-
mkdir -p e2e/scenarios/<name>
30-
```
31-
32-
**Provider wrapper scenarios** — use `runTracedScenario` + `runOperation` from `provider-runtime.mjs`. This handles `initLogger`, root span, `testRunId` tagging, and flush. See `e2e/helpers/anthropic-scenario.mjs` or `e2e/helpers/openai-scenario.mjs` for examples.
33-
34-
**SDK primitive scenarios** — use `initLogger` + `logger.traced` + `logger.flush` directly. See `e2e/scenarios/trace-primitives-basic/scenario.ts`.
35-
36-
Both patterns use `runMain` from `scenario-runtime.ts` as the entrypoint wrapper.
37-
38-
### 2. Write the test (`scenario.test.ts`)
39-
40-
```typescript
41-
import { expect, test } from "vitest";
42-
import { normalizeForSnapshot, type Json } from "../../helpers/normalize";
43-
import {
44-
prepareScenarioDir,
45-
resolveScenarioDir,
46-
withScenarioHarness,
47-
} from "../../helpers/scenario-harness";
48-
import { findLatestSpan } from "../../helpers/trace-selectors";
49-
import { E2E_TAGS } from "../../helpers/tags";
50-
51-
// Module-level: copies scenario to temp dir + installs deps once
52-
const scenarioDir = await prepareScenarioDir({
53-
scenarioDir: resolveScenarioDir(import.meta.url),
54-
});
55-
56-
test(
57-
"my-scenario captures expected spans",
58-
{ tags: [E2E_TAGS.hermetic] },
59-
async () => {
60-
await withScenarioHarness(async ({ runScenarioDir, testRunEvents }) => {
61-
await runScenarioDir({ scenarioDir, timeoutMs: 90_000 });
62-
const events = testRunEvents();
63-
const root = findLatestSpan(events, "my-root");
64-
expect(root).toBeDefined();
65-
// ...assertions and snapshots
66-
});
67-
},
68-
);
69-
```
70-
71-
Key harness methods: `runScenarioDir()`, `runNodeScenarioDir()`, `runDenoScenarioDir()`, `testRunEvents()`, `events()`, `payloads()`, `requestsAfter(cursor)`, `testRunId`.
10+
Read first:
7211

73-
For wrapper scenarios use `events()` (not `testRunEvents()`) and scope payloads via `payloadRowsForRootSpan()`.
12+
- `e2e/README.md`
13+
- Closest `e2e/scenarios/<name>/scenario.test.ts`
14+
- Relevant shared helper in `e2e/helpers/`
15+
- Relevant `assertions.ts` file when the scenario family already factors shared checks that way
7416

75-
Tagging rules:
17+
## Workflow
7618

77-
- Tag every e2e test with exactly one tag from `e2e/helpers/tags.ts`.
78-
- Use `E2E_TAGS.hermetic` for scenarios that only use local mocks and fixtures.
79-
- Use `E2E_TAGS.externalApi` for provider-backed scenarios. The shared Vitest config applies `retry: 1` to this tag automatically.
80-
- Hermetic e2e tests are expected to run in the GitHub checks workflow. External-api tests run in the integration workflow.
19+
1. Start from the closest existing scenario and keep its structure unless the new case clearly needs a new pattern.
20+
2. Default to module-scope setup with `prepareScenarioDir({ scenarioDir: resolveScenarioDir(import.meta.url) })`. This copies the scenario into an isolated temp directory and installs any scenario-local dependencies before the test bodies run.
21+
3. Use `withScenarioHarness(...)` for every scenario test. Pick the runner that matches the real entrypoint:
22+
- `runScenarioDir()` for default `tsx`-driven TypeScript scenarios
23+
- `runNodeScenarioDir()` for plain Node entrypoints and hook coverage
24+
- `runDenoScenarioDir()` for nested Deno runners
25+
4. Snapshot stable contracts, not raw noise. Normalize before snapshotting and prefer focused summaries over full payload dumps.
26+
5. Run the narrowest test first, then rerun updated scenarios three times before treating snapshots as stable.
8127

82-
### 3. Scenario-local dependencies (optional)
83-
84-
Only needed for external packages not in `e2e/package.json`. Workspace packages (e.g. `@braintrust/langchain-js`, `@braintrust/otel`) go in `e2e/package.json` as `workspace:^` — never use `workspace:` in scenario manifests.
85-
86-
```json
87-
{
88-
"name": "@braintrust/e2e-my-scenario",
89-
"private": true,
90-
"braintrustScenario": {
91-
"canary": { "dependencies": { "some-pkg": "latest" } }
92-
},
93-
"dependencies": { "some-pkg": "1.2.3" }
94-
}
95-
```
28+
## Commands
9629

97-
Generate lockfile (**must be committed**):
30+
Run workspace scripts from the repo root when you want the standard e2e entrypoints:
9831

9932
```bash
100-
cd e2e/scenarios/<name> && pnpm install --ignore-workspace --lockfile-only --strict-peer-dependencies=false
33+
pnpm run test:e2e
34+
pnpm run test:e2e:hermetic # only run tests that don't rely on external services or llm providers
35+
pnpm run test:e2e:update # updates snapshots
10136
```
10237

103-
### 4. Verify stability
38+
Try not to use specific test narrowing commands unless hunting down a very nasty and specific bug.
10439

105-
Run the test **3 times** consecutively. Snapshots must be identical each run. If they aren't, normalize the non-deterministic values (see below).
40+
## Preferred Patterns
10641

107-
## Patterns
42+
- Keep the expensive setup at module scope with `prepareScenarioDir(...)`. Only call `installScenarioDependencies(...)` directly when you are testing installer behavior or need a nonstandard setup.
43+
- Run every scenario through `withScenarioHarness(...)`.
44+
- Tag every test with exactly one tag from `e2e/helpers/tags.ts`.
45+
- Keep reusable logic in `e2e/helpers/`. Keep one-off fixtures and scenario-specific files inside the scenario directory.
46+
- Snapshot stable contracts, not raw noise. Use `normalizeForSnapshot(...)` before inline snapshots and `formatJsonFileSnapshot(...)` plus file snapshots for larger payloads or version matrices.
47+
- When a scenario family already has `assertions.ts`, keep version- or provider-specific test setup in `scenario.test.ts` and reuse the shared assertions file.
48+
- Run new or updated scenarios three times in a row before considering snapshots stable.
10849

109-
### Version matrix
50+
## Scenario Patterns
11051

111-
Use npm aliases to test multiple package versions. Shared logic in `scenario.impl.ts`, version-specific entries import from aliases.
52+
- SDK primitive scenarios: use `scenario.ts` with normal SDK calls and assert on `testRunEvents()`. See `trace-primitives-basic`.
53+
- Wrapper scenarios: use `events()` rather than `testRunEvents()`, find the root span first, and scope payload snapshots with `payloadRowsForRootSpan(...)`. Pair span and payload snapshots when the wrapper emits merged log rows.
54+
- Provider instrumentation scenarios often split setup and shared assertions. See `e2e/scenarios/anthropic-instrumentation/assertions.ts`, `e2e/scenarios/google-genai-instrumentation/assertions.ts`, and similar directories before creating a new pattern.
55+
- Version matrix scenarios: put shared logic in `scenario.impl.*` or shared assertion helpers, then loop over versions from aliases or helper-generated scenario lists. Do not duplicate the same assertions per version by hand.
56+
- Test runner integration scenarios (deno, vitest, jest, ...): keep the outer e2e suite in `scenario.test.ts`, the spawned entry in `scenario.ts`, and nested test files in names like `runner.case.ts`. Do not name nested runner files `*.test.ts`.
11257

113-
```json
114-
{
115-
"dependencies": { "ai-sdk-v5": "npm:ai@5.0.82", "ai-sdk-v6": "npm:ai@6.0.1" }
116-
}
117-
```
118-
119-
```typescript
120-
// scenario.ai-sdk-v5.ts
121-
import * as ai from "ai-sdk-v5";
122-
import { runMyImpl } from "./scenario.impl";
123-
```
124-
125-
Test loops over versions with `for (const s of scenarios) { test(...) }`. See `wrap-ai-sdk-generation-traces` or `ai-sdk-otel-export`.
126-
127-
### Runner-wrapper (vitest/node:test/deno)
128-
129-
When the wrapper runs inside a nested test runner, `scenario.ts` spawns a second process via `runNodeSubprocess`. The nested runner file must NOT be named `*.test.ts`. Tag all data with `metadata.testRunId` and use `payloadRowsForTestRunId()`. See `wrap-vitest-suite-traces`.
130-
131-
Use:
132-
133-
- `runNodeScenarioDir()` for plain Node nested runners
134-
- `runDenoScenarioDir()` for Deno nested runners
135-
- `runner.case.ts` for nested Deno entrypoints
58+
## Scenario-Local Dependencies
13659

137-
Deno scenarios can have intentionally different runtime contracts from Node. Assert the actual Deno/browser behavior rather than copying Node parent-child expectations blindly. See `e2e/scenarios/deno-browser/`.
60+
- Only add a scenario-local `package.json` for truly scenario-specific external dependencies.
61+
- Workspace packages belong in `e2e/package.json` as `workspace:^`, not in scenario manifests.
62+
- Do not use `workspace:` specs in scenario-local manifests.
63+
- If a scenario manifest exists, commit its lockfile.
13864

139-
### OTEL export
65+
Generate the lockfile with:
14066

141-
Set up `BraintrustExporter`/`BraintrustSpanProcessor` pointed at the mock server, register globally, then assert on `/otel/v1/traces` requests via `requestsAfter()` + `extractOtelSpans()`. See `ai-sdk-otel-export` or `otel-span-processor-export`.
142-
143-
## Snapshot Stability
144-
145-
`normalizeForSnapshot()` handles IDs, timestamps, paths, and `system_fingerprint`. You must handle these yourself in a scenario-specific normalizer (see `e2e/scenarios/wrap-langchain-js-traces/assertions.ts` for an example):
146-
147-
| Non-deterministic value | Replacement |
148-
| -------------------------- | ------------------ |
149-
| LLM response text | `"<llm-response>"` |
150-
| Token counts | `0` |
151-
| Tool call IDs (`call_xxx`) | `"<tool_call_id>"` |
152-
153-
## Module Resolution
154-
155-
Scenarios run from `e2e/.bt-tmp/run-<id>/scenarios/<name>/`. Node walks up to `e2e/node_modules/` for workspace deps (`braintrust`, `@braintrust/otel`, etc.). Scenario-local deps are in the scenario's own `node_modules/`. Helper imports (`../../helpers/...`) work because `prepareScenarioDir` copies `e2e/helpers/` into the temp dir.
156-
157-
Deno nested runners use `runDenoScenarioDir()`, which invokes `deno test --no-check` with the harness env vars and the prepared temp scenario path.
67+
```bash
68+
pnpm install --dir e2e/scenarios/<name> --ignore-workspace --lockfile-only --strict-peer-dependencies=false
69+
```
15870

15971
## Debugging
16072

161-
- **Subprocess error**: Read the `STDERR` section in the error message.
162-
- **Module not found**: Is it a workspace pkg? → `e2e/package.json`. External? → scenario `package.json`.
163-
- **Flaky snapshot**: Add normalization for the changing field.
164-
- **Timeout**: Increase `timeoutMs` (90-120s typical for provider calls).
165-
- **Missing lockfile**: `cd e2e/scenarios/<name> && pnpm install --ignore-workspace --lockfile-only --strict-peer-dependencies=false`
73+
- Flaky snapshot: normalize the changing field instead of snapshotting around it.
74+
- Request-flow assertions: grab `requestCursor()` before running the scenario, then inspect `requestsAfter(...)`.
75+
- If the scenario is external-provider backed, confirm the required provider env var is set before debugging the assertions.
76+
- Deno/browser scenarios may intentionally differ from Node. Assert the real runtime contract instead of copying Node expectations blindly.
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
---
2+
name: instrumentation
3+
description: Add or update Braintrust SDK instrumentation. Use when working on auto-instrumentation configs, tracing channels, provider plugins, vendored SDK typings, wrappers, or instrumentation-specific tests.
4+
---
5+
6+
# Instrumentation Rules
7+
8+
Read first based on the task:
9+
10+
- `js/src/instrumentation/README.md` for plugin and tracing-channel architecture
11+
- Closest file in `js/src/instrumentation/core/` when changing shared channel semantics
12+
- Closest file in `js/src/instrumentation/plugins/` when changing provider-specific extraction or span mapping
13+
- Closest file in `js/src/wrappers/` when manual wrappers and auto-instrumentation need to stay aligned
14+
- Closest test in `js/tests/auto-instrumentations/` when changing hook, loader, bundler, or transform behavior
15+
- Closest e2e scenario in `e2e/scenarios/*instrumentation*` or `e2e/scenarios/*node-hook*` when the user-visible trace contract changes
16+
17+
Map the change before editing:
18+
19+
- `js/src/instrumentation/core/` - tracing-channel helpers, stream patching, shared types
20+
- `js/src/instrumentation/plugins/` - provider-specific channel subscriptions and event-to-span conversion
21+
- `js/src/wrappers/` - manual instrumentation entrypoints that should mirror the same logical contracts
22+
- `js/src/auto-instrumentations/` - loader and bundler instrumentation config
23+
- `js/tests/auto-instrumentations/` - functional coverage for transformed code
24+
- `e2e/scenarios/` - subprocess-level contract coverage against the mock Braintrust server
25+
26+
- Non-invasive: instrumentation must not change user-visible behavior. Errors must still propagate, streams and promise subclasses must keep their original semantics, and any patch must be behavior-preserving and idempotent.
27+
- Inputs are untrusted: treat args, results, events, headers, and metadata as hostile. Prototype pollution is a concrete risk here. Avoid unsafe property access patterns, prototype-sensitive operations, and unnecessary mutation of third-party objects.
28+
- Support both auto-instrumentation and manual instrumentation. Auto-instrumentation does not cover every environment, loader, or framework.
29+
- For orchestrion auto-instrumentation, prefer targeting public API functions. Instrumenting internal helpers is more likely to break across SDK versions.
30+
- Auto and manual paths should share logic. Prefer both paths emitting the same tracing-channel events, with provider plugins converting those events into spans/logs/errors. Manual wrappers should not directly emit observability data.
31+
- If a public instrumentation surface changes, check whether the export surface also needs updates in `js/src/instrumentation/index.ts` or `js/src/exports.ts`.
32+
- Preserve async context propagation. Changes around tracing channels, stream patching, or loader hooks must keep the current span context across awaits and stream consumption.
33+
- Maintain isomorphic behavior. Node and browser/bundled paths must use compatible channel implementations and avoid channel-registry mismatches.
34+
- Setup, teardown, and patching must be idempotent. Enabling twice, disabling twice, or applying a patch twice should remain safe.
35+
- Promise/stream behavior must be preserved. Patches need to keep subclass/helper semantics intact.
36+
- Contain instrumentation failures. Extraction/logging bugs should be logged or ignored as appropriate, but must not break the user call path.
37+
- Log only the useful surface. Prefer narrow, stable payloads over dumping full request/response objects; exclude redundant or overly large data when possible.
38+
39+
## Process
40+
41+
Before implementing or changing instrumentation it is advisable to add or adjust the e2e tests for the desired change, make it fail, then implement the new instrumentation until the test passes.

AGENTS.md

Lines changed: 0 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -78,11 +78,3 @@ pnpm run lint # Run eslint checks
7878
pnpm run fix:formatting # Auto-fix formatting
7979
pnpm run fix:lint # Auto-fix eslint issues
8080
```
81-
82-
## Adding Agent Skills
83-
84-
Use the `dotagents` skill (in `.agents/skills/dotagents/`) to add new skills to this repo. For example:
85-
86-
```bash
87-
dotagents add getsentry/skills find-bugs
88-
```

agents.toml

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,3 +8,7 @@ source = "getsentry/dotagents"
88
[[skills]]
99
name = "e2e-tests"
1010
source = "path:.agents/skills/e2e-tests"
11+
12+
[[skills]]
13+
name = "instrumentation"
14+
source = "path:.agents/skills/instrumentation"

0 commit comments

Comments
 (0)