chore: Add instrumentation skill and update e2e test skill (#1663)

lforst · web-flow · commit e7c67aac60fa · 2026-03-25T20:52:52.000+01:00
Possibly a bit opinionated, but less sloppy.
diff --git a/.agents/skills/e2e-tests/SKILL.md b/.agents/skills/e2e-tests/SKILL.md
@@ -1,165 +1,76 @@
 ---
 name: e2e-tests
-description: Write, run, and debug end-to-end tests for the Braintrust SDK. Use when asked to "add an e2e test", "create a scenario", "write an e2e scenario", "add e2e coverage", "debug e2e test", "fix e2e snapshot", or any task involving the e2e/ directory.
+description: Write, run, and debug end-to-end tests for the Braintrust SDK. Use when asked to add an e2e test, create a scenario, write an e2e scenario, add e2e coverage, debug e2e test, fix e2e snapshot, or any task involving the e2e/ directory.
 ---
 
 # E2E Tests
 
-E2E tests run SDK scenarios in subprocesses against a mock Braintrust server. Read `e2e/README.md` for full details. **Always read the existing scenario closest to your task before writing a new one.**
+E2E tests run SDK scenarios in subprocesses against a mock Braintrust server. Prefer extending the closest existing scenario over inventing a new pattern.
 
-## Commands
-
-```bash
-pnpm run build                        # Build SDK (required if source changed)
-cd e2e && npx vitest run scenarios/<name>/scenario.test.ts          # Run one scenario
-cd e2e && npx vitest run --reporter=verbose scenarios/<name>/scenario.test.ts  # Verbose
-cd e2e && npx vitest run --update scenarios/<name>/scenario.test.ts # Update snapshots
-cd e2e && npx vitest run -t "<exact test name>"                     # Isolate one test when file args over-match
-pnpm run test:e2e                     # Run all (from repo root)
-pnpm run test:e2e:hermetic            # Run hermetic-only e2e tests
-pnpm run test:e2e:external            # Run external-api-only e2e tests
-pnpm run fix:formatting               # Always run before committing
-```
-
-## Creating a Scenario
-
-### 1. Create directory and entrypoint
-
-```bash
-mkdir -p e2e/scenarios/<name>
-```
-
-**Provider wrapper scenarios** — use `runTracedScenario` + `runOperation` from `provider-runtime.mjs`. This handles `initLogger`, root span, `testRunId` tagging, and flush. See `e2e/helpers/anthropic-scenario.mjs` or `e2e/helpers/openai-scenario.mjs` for examples.
-
-**SDK primitive scenarios** — use `initLogger` + `logger.traced` + `logger.flush` directly. See `e2e/scenarios/trace-primitives-basic/scenario.ts`.
-
-Both patterns use `runMain` from `scenario-runtime.ts` as the entrypoint wrapper.
-
-### 2. Write the test (`scenario.test.ts`)
-
-```typescript
-import { expect, test } from "vitest";
-import { normalizeForSnapshot, type Json } from "../../helpers/normalize";
-import {
-  prepareScenarioDir,
-  resolveScenarioDir,
-  withScenarioHarness,
-} from "../../helpers/scenario-harness";
-import { findLatestSpan } from "../../helpers/trace-selectors";
-import { E2E_TAGS } from "../../helpers/tags";
-
-// Module-level: copies scenario to temp dir + installs deps once
-const scenarioDir = await prepareScenarioDir({
-  scenarioDir: resolveScenarioDir(import.meta.url),
-});
-
-test(
-  "my-scenario captures expected spans",
-  { tags: [E2E_TAGS.hermetic] },
-  async () => {
-    await withScenarioHarness(async ({ runScenarioDir, testRunEvents }) => {
-      await runScenarioDir({ scenarioDir, timeoutMs: 90_000 });
-      const events = testRunEvents();
-      const root = findLatestSpan(events, "my-root");
-      expect(root).toBeDefined();
-      // ...assertions and snapshots
-    });
-  },
-);
-```
-
-Key harness methods: `runScenarioDir()`, `runNodeScenarioDir()`, `runDenoScenarioDir()`, `testRunEvents()`, `events()`, `payloads()`, `requestsAfter(cursor)`, `testRunId`.
+Read first:
 
-For wrapper scenarios use `events()` (not `testRunEvents()`) and scope payloads via `payloadRowsForRootSpan()`.
+- `e2e/README.md`
+- Closest `e2e/scenarios/<name>/scenario.test.ts`
+- Relevant shared helper in `e2e/helpers/`
+- Relevant `assertions.ts` file when the scenario family already factors shared checks that way
 
-Tagging rules:
+## Workflow
 
-- Tag every e2e test with exactly one tag from `e2e/helpers/tags.ts`.
-- Use `E2E_TAGS.hermetic` for scenarios that only use local mocks and fixtures.
-- Use `E2E_TAGS.externalApi` for provider-backed scenarios. The shared Vitest config applies `retry: 1` to this tag automatically.
-- Hermetic e2e tests are expected to run in the GitHub checks workflow. External-api tests run in the integration workflow.
+1. Start from the closest existing scenario and keep its structure unless the new case clearly needs a new pattern.
+2. Default to module-scope setup with `prepareScenarioDir({ scenarioDir: resolveScenarioDir(import.meta.url) })`. This copies the scenario into an isolated temp directory and installs any scenario-local dependencies before the test bodies run.
+3. Use `withScenarioHarness(...)` for every scenario test. Pick the runner that matches the real entrypoint:
+   - `runScenarioDir()` for default `tsx`-driven TypeScript scenarios
+   - `runNodeScenarioDir()` for plain Node entrypoints and hook coverage
+   - `runDenoScenarioDir()` for nested Deno runners
+4. Snapshot stable contracts, not raw noise. Normalize before snapshotting and prefer focused summaries over full payload dumps.
+5. Run the narrowest test first, then rerun updated scenarios three times before treating snapshots as stable.
 
-### 3. Scenario-local dependencies (optional)
-
-Only needed for external packages not in `e2e/package.json`. Workspace packages (e.g. `@braintrust/langchain-js`, `@braintrust/otel`) go in `e2e/package.json` as `workspace:^` — never use `workspace:` in scenario manifests.
-
-```json
-{
-  "name": "@braintrust/e2e-my-scenario",
-  "private": true,
-  "braintrustScenario": {
-    "canary": { "dependencies": { "some-pkg": "latest" } }
-  },
-  "dependencies": { "some-pkg": "1.2.3" }
-}
-```
+## Commands
 
-Generate lockfile (**must be committed**):
+Run workspace scripts from the repo root when you want the standard e2e entrypoints:
 
 ```bash
-cd e2e/scenarios/<name> && pnpm install --ignore-workspace --lockfile-only --strict-peer-dependencies=false
+pnpm run test:e2e
+pnpm run test:e2e:hermetic # runs only tests that don't rely on external services or LLM providers
+pnpm run test:e2e:update # updates snapshots
 ```
 
-### 4. Verify stability
+Avoid narrowing to specific tests unless you are hunting down a particularly stubborn, specific bug.
 
-Run the test **3 times** consecutively. Snapshots must be identical each run. If they aren't, normalize the non-deterministic values (see below).
+## Preferred Patterns
 
-## Patterns
+- Keep the expensive setup at module scope with `prepareScenarioDir(...)`. Only call `installScenarioDependencies(...)` directly when you are testing installer behavior or need a nonstandard setup.
+- Run every scenario through `withScenarioHarness(...)`.
+- Tag every test with exactly one tag from `e2e/helpers/tags.ts`.
+- Keep reusable logic in `e2e/helpers/`. Keep one-off fixtures and scenario-specific files inside the scenario directory.
+- Snapshot stable contracts, not raw noise. Use `normalizeForSnapshot(...)` before inline snapshots and `formatJsonFileSnapshot(...)` plus file snapshots for larger payloads or version matrices.
+- When a scenario family already has `assertions.ts`, keep version- or provider-specific test setup in `scenario.test.ts` and reuse the shared assertions file.
+- Run new or updated scenarios three times in a row before considering snapshots stable.
 
-### Version matrix
+## Scenario Patterns
 
-Use npm aliases to test multiple package versions. Shared logic in `scenario.impl.ts`, version-specific entries import from aliases.
+- SDK primitive scenarios: use `scenario.ts` with normal SDK calls and assert on `testRunEvents()`. See `trace-primitives-basic`.
+- Wrapper scenarios: use `events()` rather than `testRunEvents()`, find the root span first, and scope payload snapshots with `payloadRowsForRootSpan(...)`. Pair span and payload snapshots when the wrapper emits merged log rows.
+- Provider instrumentation scenarios often split setup and shared assertions. See `e2e/scenarios/anthropic-instrumentation/assertions.ts`, `e2e/scenarios/google-genai-instrumentation/assertions.ts`, and similar directories before creating a new pattern.
+- Version matrix scenarios: put shared logic in `scenario.impl.*` or shared assertion helpers, then loop over versions from aliases or helper-generated scenario lists. Do not duplicate the same assertions per version by hand.
+- Test runner integration scenarios (deno, vitest, jest, ...): keep the outer e2e suite in `scenario.test.ts`, the spawned entry in `scenario.ts`, and nested test files in names like `runner.case.ts`. Do not name nested runner files `*.test.ts`.
 
-```json
-{
-  "dependencies": { "ai-sdk-v5": "npm:ai@5.0.82", "ai-sdk-v6": "npm:ai@6.0.1" }
-}
-```
-
-```typescript
-// scenario.ai-sdk-v5.ts
-import * as ai from "ai-sdk-v5";
-import { runMyImpl } from "./scenario.impl";
-```
-
-Test loops over versions with `for (const s of scenarios) { test(...) }`. See `wrap-ai-sdk-generation-traces` or `ai-sdk-otel-export`.
-
-### Runner-wrapper (vitest/node:test/deno)
-
-When the wrapper runs inside a nested test runner, `scenario.ts` spawns a second process via `runNodeSubprocess`. The nested runner file must NOT be named `*.test.ts`. Tag all data with `metadata.testRunId` and use `payloadRowsForTestRunId()`. See `wrap-vitest-suite-traces`.
-
-Use:
-
-- `runNodeScenarioDir()` for plain Node nested runners
-- `runDenoScenarioDir()` for Deno nested runners
-- `runner.case.ts` for nested Deno entrypoints
+## Scenario-Local Dependencies
 
-Deno scenarios can have intentionally different runtime contracts from Node. Assert the actual Deno/browser behavior rather than copying Node parent-child expectations blindly. See `e2e/scenarios/deno-browser/`.
+- Only add a scenario-local `package.json` for truly scenario-specific external dependencies.
+- Workspace packages belong in `e2e/package.json` as `workspace:^`, not in scenario manifests.
+- Do not use `workspace:` specs in scenario-local manifests.
+- If a scenario manifest exists, commit its lockfile.
 
-### OTEL export
+Generate the lockfile with:
 
-Set up `BraintrustExporter`/`BraintrustSpanProcessor` pointed at the mock server, register globally, then assert on `/otel/v1/traces` requests via `requestsAfter()` + `extractOtelSpans()`. See `ai-sdk-otel-export` or `otel-span-processor-export`.
-
-## Snapshot Stability
-
-`normalizeForSnapshot()` handles IDs, timestamps, paths, and `system_fingerprint`. You must handle these yourself in a scenario-specific normalizer (see `e2e/scenarios/wrap-langchain-js-traces/assertions.ts` for an example):
-
-| Non-deterministic value    | Replacement        |
-| -------------------------- | ------------------ |
-| LLM response text          | `"<llm-response>"` |
-| Token counts               | `0`                |
-| Tool call IDs (`call_xxx`) | `"<tool_call_id>"` |
-
-## Module Resolution
-
-Scenarios run from `e2e/.bt-tmp/run-<id>/scenarios/<name>/`. Node walks up to `e2e/node_modules/` for workspace deps (`braintrust`, `@braintrust/otel`, etc.). Scenario-local deps are in the scenario's own `node_modules/`. Helper imports (`../../helpers/...`) work because `prepareScenarioDir` copies `e2e/helpers/` into the temp dir.
-
-Deno nested runners use `runDenoScenarioDir()`, which invokes `deno test --no-check` with the harness env vars and the prepared temp scenario path.
+```bash
+pnpm install --dir e2e/scenarios/<name> --ignore-workspace --lockfile-only --strict-peer-dependencies=false
+```
 
 ## Debugging
 
-- **Subprocess error**: Read the `STDERR` section in the error message.
-- **Module not found**: Is it a workspace pkg? → `e2e/package.json`. External? → scenario `package.json`.
-- **Flaky snapshot**: Add normalization for the changing field.
-- **Timeout**: Increase `timeoutMs` (90-120s typical for provider calls).
-- **Missing lockfile**: `cd e2e/scenarios/<name> && pnpm install --ignore-workspace --lockfile-only --strict-peer-dependencies=false`
+- Flaky snapshot: normalize the changing field instead of snapshotting around it.
+- Request-flow assertions: grab `requestCursor()` before running the scenario, then inspect `requestsAfter(...)`.
+- If the scenario is external-provider backed, confirm the required provider env var is set before debugging the assertions.
+- Deno/browser scenarios may intentionally differ from Node. Assert the real runtime contract instead of copying Node expectations blindly.
diff --git a/.agents/skills/instrumentation/SKILL.md b/.agents/skills/instrumentation/SKILL.md
@@ -0,0 +1,41 @@
+---
+name: instrumentation
+description: Add or update Braintrust SDK instrumentation. Use when working on auto-instrumentation configs, tracing channels, provider plugins, vendored SDK typings, wrappers, or instrumentation-specific tests.
+---
+
+# Instrumentation Rules
+
+Read first based on the task:
+
+- `js/src/instrumentation/README.md` for plugin and tracing-channel architecture
+- Closest file in `js/src/instrumentation/core/` when changing shared channel semantics
+- Closest file in `js/src/instrumentation/plugins/` when changing provider-specific extraction or span mapping
+- Closest file in `js/src/wrappers/` when manual wrappers and auto-instrumentation need to stay aligned
+- Closest test in `js/tests/auto-instrumentations/` when changing hook, loader, bundler, or transform behavior
+- Closest e2e scenario in `e2e/scenarios/*instrumentation*` or `e2e/scenarios/*node-hook*` when the user-visible trace contract changes
+
+Map the change before editing:
+
+- `js/src/instrumentation/core/` - tracing-channel helpers, stream patching, shared types
+- `js/src/instrumentation/plugins/` - provider-specific channel subscriptions and event-to-span conversion
+- `js/src/wrappers/` - manual instrumentation entrypoints that should mirror the same logical contracts
+- `js/src/auto-instrumentations/` - loader and bundler instrumentation config
+- `js/tests/auto-instrumentations/` - functional coverage for transformed code
+- `e2e/scenarios/` - subprocess-level contract coverage against the mock Braintrust server
+
+- Non-invasive: instrumentation must not change user-visible behavior. Errors must still propagate, streams and promise subclasses must keep their original semantics, and any patch must be behavior-preserving and idempotent.
+- Inputs are untrusted: treat args, results, events, headers, and metadata as hostile. Prototype pollution is a concrete risk here. Avoid unsafe property access patterns, prototype-sensitive operations, and unnecessary mutation of third-party objects.
+- Support both auto-instrumentation and manual instrumentation. Auto-instrumentation does not cover every environment, loader, or framework.
+- For orchestrion auto-instrumentation, prefer targeting public API functions. Instrumenting internal helpers is more likely to break across SDK versions.
+- Auto and manual paths should share logic. Prefer both paths emitting the same tracing-channel events, with provider plugins converting those events into spans/logs/errors. Manual wrappers should not directly emit observability data.
+- If a public instrumentation surface changes, check whether the export surface also needs updates in `js/src/instrumentation/index.ts` or `js/src/exports.ts`.
+- Preserve async context propagation. Changes around tracing channels, stream patching, or loader hooks must keep the current span context across awaits and stream consumption.
+- Maintain isomorphic behavior. Node and browser/bundled paths must use compatible channel implementations and avoid channel-registry mismatches.
+- Setup, teardown, and patching must be idempotent. Enabling twice, disabling twice, or applying a patch twice should remain safe.
+- Promise/stream behavior must be preserved. Patches need to keep subclass/helper semantics intact.
+- Contain instrumentation failures. Extraction/logging bugs should be logged or ignored as appropriate, but must not break the user call path.
+- Log only the useful surface. Prefer narrow, stable payloads over dumping full request/response objects; exclude redundant or overly large data when possible.
+
+## Process
+
+Before implementing or changing instrumentation, it is advisable to add or adjust the e2e tests for the desired change, confirm that they fail, and then implement the new instrumentation until the tests pass.
diff --git a/AGENTS.md b/AGENTS.md
@@ -78,11 +78,3 @@ pnpm run lint            # Run eslint checks
 pnpm run fix:formatting  # Auto-fix formatting
 pnpm run fix:lint        # Auto-fix eslint issues
 ```
-
-## Adding Agent Skills
-
-Use the `dotagents` skill (in `.agents/skills/dotagents/`) to add new skills to this repo. For example:
-
-```bash
-dotagents add getsentry/skills find-bugs
-```
diff --git a/agents.toml b/agents.toml
@@ -8,3 +8,7 @@ source = "getsentry/dotagents"
 [[skills]]
 name = "e2e-tests"
 source = "path:.agents/skills/e2e-tests"
+
+[[skills]]
+name = "instrumentation"
+source = "path:.agents/skills/instrumentation"
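
The "Non-invasive", "Setup, teardown, and patching must be idempotent", and "Contain instrumentation failures" rules in the new instrumentation skill could look roughly like the sketch below. This is only an illustration under stated assumptions: `PATCHED`, `TraceEvent`, `safeEmit`, and `patchMethod` are hypothetical names invented here and are not Braintrust SDK APIs, and the sketch uses a plain callback where the real code would presumably publish to tracing channels.

```typescript
// Hypothetical sketch; none of these names come from the Braintrust SDK.
const PATCHED = Symbol.for("example.instrumentation.patched");

interface TraceEvent {
  name: string;
  args: unknown[];
  result?: unknown;
  error?: unknown;
}

// Contain instrumentation failures: a throwing listener must never
// reach the user's call path.
function safeEmit(onEvent: (event: TraceEvent) => void, event: TraceEvent): void {
  try {
    onEvent(event);
  } catch {
    // Intentionally swallowed; a real implementation might log this.
  }
}

function patchMethod(
  target: Record<string, unknown>,
  name: string,
  onEvent: (event: TraceEvent) => void,
): void {
  const original = target[name];
  if (typeof original !== "function") return;

  // Idempotent: applying the patch a second time is a no-op.
  // The own-property check avoids trusting anything inherited via the prototype chain.
  if (Object.prototype.hasOwnProperty.call(original, PATCHED)) return;

  const wrapped = function (this: unknown, ...args: unknown[]): unknown {
    let result: unknown;
    try {
      result = Reflect.apply(original, this, args);
    } catch (error) {
      safeEmit(onEvent, { name, args, error });
      throw error; // Errors must still propagate to the caller.
    }
    safeEmit(onEvent, { name, args, result });
    // The original return value is handed back untouched (not awaited or
    // re-wrapped), so promise subclasses and streams keep their semantics.
    return result;
  };

  // Mark the wrapper so later patch attempts can detect it.
  Object.defineProperty(wrapped, PATCHED, { value: true });
  target[name] = wrapped;
}
```

Calling `patchMethod(client, "create", listener)` twice wraps `client.create` only once, and a listener that throws does not change what the caller of `client.create` observes. The sketch reports the return value synchronously and does not follow promises or streams to completion; that part is deliberately left out.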