|
1 | 1 | --- |
2 | 2 | name: e2e-tests |
3 | | -description: Write, run, and debug end-to-end tests for the Braintrust SDK. Use when asked to "add an e2e test", "create a scenario", "write an e2e scenario", "add e2e coverage", "debug e2e test", "fix e2e snapshot", or any task involving the e2e/ directory. |
| 3 | +description: Write, run, and debug end-to-end tests for the Braintrust SDK. Use when asked to add an e2e test, create a scenario, write an e2e scenario, add e2e coverage, debug e2e test, fix e2e snapshot, or any task involving the e2e/ directory. |
4 | 4 | --- |
5 | 5 |
|
6 | 6 | # E2E Tests |
7 | 7 |
|
8 | | -E2E tests run SDK scenarios in subprocesses against a mock Braintrust server. Read `e2e/README.md` for full details. **Always read the existing scenario closest to your task before writing a new one.** |
| 8 | +E2E tests run SDK scenarios in subprocesses against a mock Braintrust server. Prefer extending the closest existing scenario over inventing a new pattern. |
9 | 9 |
|
10 | | -## Commands |
11 | | - |
12 | | -```bash |
13 | | -pnpm run build # Build SDK (required if source changed) |
14 | | -cd e2e && npx vitest run scenarios/<name>/scenario.test.ts # Run one scenario |
15 | | -cd e2e && npx vitest run --reporter=verbose scenarios/<name>/scenario.test.ts # Verbose |
16 | | -cd e2e && npx vitest run --update scenarios/<name>/scenario.test.ts # Update snapshots |
17 | | -cd e2e && npx vitest run -t "<exact test name>" # Isolate one test when file args over-match |
18 | | -pnpm run test:e2e # Run all (from repo root) |
19 | | -pnpm run test:e2e:hermetic # Run hermetic-only e2e tests |
20 | | -pnpm run test:e2e:external # Run external-api-only e2e tests |
21 | | -pnpm run fix:formatting # Always run before committing |
22 | | -``` |
23 | | - |
24 | | -## Creating a Scenario |
25 | | - |
26 | | -### 1. Create directory and entrypoint |
27 | | - |
28 | | -```bash |
29 | | -mkdir -p e2e/scenarios/<name> |
30 | | -``` |
31 | | - |
32 | | -**Provider wrapper scenarios** — use `runTracedScenario` + `runOperation` from `provider-runtime.mjs`. This handles `initLogger`, root span, `testRunId` tagging, and flush. See `e2e/helpers/anthropic-scenario.mjs` or `e2e/helpers/openai-scenario.mjs` for examples. |
33 | | - |
34 | | -**SDK primitive scenarios** — use `initLogger` + `logger.traced` + `logger.flush` directly. See `e2e/scenarios/trace-primitives-basic/scenario.ts`. |
35 | | - |
36 | | -Both patterns use `runMain` from `scenario-runtime.ts` as the entrypoint wrapper. |
37 | | - |
38 | | -### 2. Write the test (`scenario.test.ts`) |
39 | | - |
40 | | -```typescript |
41 | | -import { expect, test } from "vitest"; |
42 | | -import { normalizeForSnapshot, type Json } from "../../helpers/normalize"; |
43 | | -import { |
44 | | - prepareScenarioDir, |
45 | | - resolveScenarioDir, |
46 | | - withScenarioHarness, |
47 | | -} from "../../helpers/scenario-harness"; |
48 | | -import { findLatestSpan } from "../../helpers/trace-selectors"; |
49 | | -import { E2E_TAGS } from "../../helpers/tags"; |
50 | | - |
51 | | -// Module-level: copies scenario to temp dir + installs deps once |
52 | | -const scenarioDir = await prepareScenarioDir({ |
53 | | - scenarioDir: resolveScenarioDir(import.meta.url), |
54 | | -}); |
55 | | - |
56 | | -test( |
57 | | - "my-scenario captures expected spans", |
58 | | - { tags: [E2E_TAGS.hermetic] }, |
59 | | - async () => { |
60 | | - await withScenarioHarness(async ({ runScenarioDir, testRunEvents }) => { |
61 | | - await runScenarioDir({ scenarioDir, timeoutMs: 90_000 }); |
62 | | - const events = testRunEvents(); |
63 | | - const root = findLatestSpan(events, "my-root"); |
64 | | - expect(root).toBeDefined(); |
65 | | - // ...assertions and snapshots |
66 | | - }); |
67 | | - }, |
68 | | -); |
69 | | -``` |
70 | | - |
71 | | -Key harness methods: `runScenarioDir()`, `runNodeScenarioDir()`, `runDenoScenarioDir()`, `testRunEvents()`, `events()`, `payloads()`, `requestsAfter(cursor)`, `testRunId`. |
| 10 | +Read first: |
72 | 11 |
|
73 | | -For wrapper scenarios use `events()` (not `testRunEvents()`) and scope payloads via `payloadRowsForRootSpan()`. |
| 12 | +- `e2e/README.md` |
| 13 | +- Closest `e2e/scenarios/<name>/scenario.test.ts` |
| 14 | +- Relevant shared helper in `e2e/helpers/` |
| 15 | +- Relevant `assertions.ts` file when the scenario family already factors shared checks that way |
74 | 16 |
|
75 | | -Tagging rules: |
| 17 | +## Workflow |
76 | 18 |
|
77 | | -- Tag every e2e test with exactly one tag from `e2e/helpers/tags.ts`. |
78 | | -- Use `E2E_TAGS.hermetic` for scenarios that only use local mocks and fixtures. |
79 | | -- Use `E2E_TAGS.externalApi` for provider-backed scenarios. The shared Vitest config applies `retry: 1` to this tag automatically. |
80 | | -- Hermetic e2e tests are expected to run in the GitHub checks workflow. External-api tests run in the integration workflow. |
| 19 | +1. Start from the closest existing scenario and keep its structure unless the new case clearly needs a new pattern. |
| 20 | +2. Default to module-scope setup with `prepareScenarioDir({ scenarioDir: resolveScenarioDir(import.meta.url) })`. This copies the scenario into an isolated temp directory and installs any scenario-local dependencies before the test bodies run. |
| 21 | +3. Use `withScenarioHarness(...)` for every scenario test. Pick the runner that matches the real entrypoint: |
| 22 | + - `runScenarioDir()` for default `tsx`-driven TypeScript scenarios |
| 23 | + - `runNodeScenarioDir()` for plain Node entrypoints and hook coverage |
| 24 | + - `runDenoScenarioDir()` for nested Deno runners |
| 25 | +4. Snapshot stable contracts, not raw noise. Normalize before snapshotting and prefer focused summaries over full payload dumps. |
| 26 | +5. Run the narrowest test first, then rerun updated scenarios three times before treating snapshots as stable. |
81 | 27 |
|
82 | | -### 3. Scenario-local dependencies (optional) |
83 | | - |
84 | | -Only needed for external packages not in `e2e/package.json`. Workspace packages (e.g. `@braintrust/langchain-js`, `@braintrust/otel`) go in `e2e/package.json` as `workspace:^` — never use `workspace:` in scenario manifests. |
85 | | - |
86 | | -```json |
87 | | -{ |
88 | | - "name": "@braintrust/e2e-my-scenario", |
89 | | - "private": true, |
90 | | - "braintrustScenario": { |
91 | | - "canary": { "dependencies": { "some-pkg": "latest" } } |
92 | | - }, |
93 | | - "dependencies": { "some-pkg": "1.2.3" } |
94 | | -} |
95 | | -``` |
| 28 | +## Commands |
96 | 29 |
|
97 | | -Generate lockfile (**must be committed**): |
| 30 | +Run workspace scripts from the repo root when you want the standard e2e entrypoints: |
98 | 31 |
|
99 | 32 | ```bash |
100 | | -cd e2e/scenarios/<name> && pnpm install --ignore-workspace --lockfile-only --strict-peer-dependencies=false |
| 33 | +pnpm run test:e2e |
| 34 | +pnpm run test:e2e:hermetic # only run tests that don't rely on external services or LLM providers |
| 35 | +pnpm run test:e2e:update # updates snapshots |
101 | 36 | ``` |
102 | 37 |
|
103 | | -### 4. Verify stability |
| 38 | +Avoid test-narrowing commands unless you are tracking down a specific, stubborn bug. |
104 | 39 |
|
105 | | -Run the test **3 times** consecutively. Snapshots must be identical each run. If they aren't, normalize the non-deterministic values (see below). |
| 40 | +## Preferred Patterns |
106 | 41 |
|
107 | | -## Patterns |
| 42 | +- Keep the expensive setup at module scope with `prepareScenarioDir(...)`. Only call `installScenarioDependencies(...)` directly when you are testing installer behavior or need a nonstandard setup. |
| 43 | +- Run every scenario through `withScenarioHarness(...)`. |
| 44 | +- Tag every test with exactly one tag from `e2e/helpers/tags.ts`. |
| 45 | +- Keep reusable logic in `e2e/helpers/`. Keep one-off fixtures and scenario-specific files inside the scenario directory. |
| 46 | +- Snapshot stable contracts, not raw noise. Use `normalizeForSnapshot(...)` before inline snapshots and `formatJsonFileSnapshot(...)` plus file snapshots for larger payloads or version matrices. |
| 47 | +- When a scenario family already has `assertions.ts`, keep version- or provider-specific test setup in `scenario.test.ts` and reuse the shared assertions file. |
| 48 | +- Run new or updated scenarios three times in a row before considering snapshots stable. |
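To make "snapshot stable contracts, not raw noise" concrete, here is a minimal sketch of a scenario-specific normalizer that could run alongside `normalizeForSnapshot(...)`. The function name `normalizeProviderNoise`, the `content` key, and the `*tokens` key suffix are assumptions for illustration and are not part of the repo's helpers. The placeholder values follow the replacement table in the previous version of this file.

```typescript
// Hypothetical scenario-specific normalizer; NOT the repo's normalizeForSnapshot().
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed shape of provider tool call IDs ("call_xxx").
const TOOL_CALL_ID = /^call_[A-Za-z0-9]+$/;

function normalizeProviderNoise(value: Json, key?: string): Json {
  // Arrays: normalize each item; items carry no key of their own.
  if (Array.isArray(value)) {
    return value.map((item) => normalizeProviderNoise(item));
  }
  // Objects: recurse and pass the property name down so leaves can be matched by key.
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [k, v] of Object.entries(value)) {
      out[k] = normalizeProviderNoise(v, k);
    }
    return out;
  }
  // Token counts vary between provider calls, so pin them to 0.
  if (typeof value === "number" && key !== undefined && key.endsWith("tokens")) {
    return 0;
  }
  if (typeof value === "string") {
    if (TOOL_CALL_ID.test(value)) return "<tool_call_id>";
    // Assumption: model output text lives under a "content" key.
    if (key === "content") return "<llm-response>";
  }
  return value;
}

// Usage: normalize a row before handing it to a snapshot assertion.
const sampleRow: Json = {
  content: "Sure, here is the answer.",
  usage: { prompt_tokens: 12, completion_tokens: 34 },
  tool_calls: [{ id: "call_abc123", name: "lookup" }],
};
const normalizedRow = normalizeProviderNoise(sampleRow);
console.log(JSON.stringify(normalizedRow, null, 2));
```

Matching on key names keeps the snapshot focused on structure (which fields exist, which tools were called) while dropping the values that change between runs.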
108 | 49 |
|
109 | | -### Version matrix |
| 50 | +## Scenario Patterns |
110 | 51 |
|
111 | | -Use npm aliases to test multiple package versions. Shared logic in `scenario.impl.ts`, version-specific entries import from aliases. |
| 52 | +- SDK primitive scenarios: use `scenario.ts` with normal SDK calls and assert on `testRunEvents()`. See `trace-primitives-basic`. |
| 53 | +- Wrapper scenarios: use `events()` rather than `testRunEvents()`, find the root span first, and scope payload snapshots with `payloadRowsForRootSpan(...)`. Pair span and payload snapshots when the wrapper emits merged log rows. |
| 54 | +- Provider instrumentation scenarios often split setup and shared assertions. See `e2e/scenarios/anthropic-instrumentation/assertions.ts`, `e2e/scenarios/google-genai-instrumentation/assertions.ts`, and similar directories before creating a new pattern. |
| 55 | +- Version matrix scenarios: put shared logic in `scenario.impl.*` or shared assertion helpers, then loop over versions from aliases or helper-generated scenario lists. Do not duplicate the same assertions per version by hand. |
| 56 | +- Test runner integration scenarios (deno, vitest, jest, ...): keep the outer e2e suite in `scenario.test.ts`, the spawned entry in `scenario.ts`, and nested test files in names like `runner.case.ts`. Do not name nested runner files `*.test.ts`. |
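The version matrix guidance above can be sketched as one shared assertion looped over a scenario list. This is a simplified, self-contained illustration: `VersionedScenario`, `SpanEvent`, `assertSharedContract`, and `runMatrix` are hypothetical names, the span name `"root"` is an assumption, and the alias labels are borrowed from the earlier version of this file. In a real `scenario.test.ts`, the loop body would be a Vitest `test(...)` call that runs the scenario through the harness.

```typescript
// Hypothetical types and helpers for illustration; not the repo's actual API.
interface VersionedScenario {
  label: string; // npm alias the entry imports from
  entry: string; // version-specific entry file
}

interface SpanEvent {
  name: string;
}

const scenarios: VersionedScenario[] = [
  { label: "ai-sdk-v5", entry: "scenario.ai-sdk-v5.ts" },
  { label: "ai-sdk-v6", entry: "scenario.ai-sdk-v6.ts" },
];

// One shared contract check, written once and reused for every version.
function assertSharedContract(label: string, events: SpanEvent[]): void {
  const names = events.map((event) => event.name);
  if (!names.includes("root")) {
    throw new Error(`${label}: missing root span`);
  }
}

// Loop over versions instead of copying the same assertions per version.
// `run` stands in for executing the entry and collecting its events.
function runMatrix(run: (scenario: VersionedScenario) => SpanEvent[]): string[] {
  const passed: string[] = [];
  for (const scenario of scenarios) {
    assertSharedContract(scenario.label, run(scenario));
    passed.push(scenario.label);
  }
  return passed;
}
```

Adding a version then means adding one entry to the list, and a contract change touches a single function.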
112 | 57 |
|
113 | | -```json |
114 | | -{ |
115 | | - "dependencies": { "ai-sdk-v5": "npm:ai@5.0.82", "ai-sdk-v6": "npm:ai@6.0.1" } |
116 | | -} |
117 | | -``` |
118 | | - |
119 | | -```typescript |
120 | | -// scenario.ai-sdk-v5.ts |
121 | | -import * as ai from "ai-sdk-v5"; |
122 | | -import { runMyImpl } from "./scenario.impl"; |
123 | | -``` |
124 | | - |
125 | | -Test loops over versions with `for (const s of scenarios) { test(...) }`. See `wrap-ai-sdk-generation-traces` or `ai-sdk-otel-export`. |
126 | | - |
127 | | -### Runner-wrapper (vitest/node:test/deno) |
128 | | - |
129 | | -When the wrapper runs inside a nested test runner, `scenario.ts` spawns a second process via `runNodeSubprocess`. The nested runner file must NOT be named `*.test.ts`. Tag all data with `metadata.testRunId` and use `payloadRowsForTestRunId()`. See `wrap-vitest-suite-traces`. |
130 | | - |
131 | | -Use: |
132 | | - |
133 | | -- `runNodeScenarioDir()` for plain Node nested runners |
134 | | -- `runDenoScenarioDir()` for Deno nested runners |
135 | | -- `runner.case.ts` for nested Deno entrypoints |
| 58 | +## Scenario-Local Dependencies |
136 | 59 |
|
137 | | -Deno scenarios can have intentionally different runtime contracts from Node. Assert the actual Deno/browser behavior rather than copying Node parent-child expectations blindly. See `e2e/scenarios/deno-browser/`. |
| 60 | +- Only add a scenario-local `package.json` for truly scenario-specific external dependencies. |
| 61 | +- Workspace packages belong in `e2e/package.json` as `workspace:^`, not in scenario manifests. |
| 62 | +- Do not use `workspace:` specs in scenario-local manifests. |
| 63 | +- If a scenario manifest exists, commit its lockfile. |
138 | 64 |
|
139 | | -### OTEL export |
| 65 | +Generate the lockfile with: |
140 | 66 |
|
141 | | -Set up `BraintrustExporter`/`BraintrustSpanProcessor` pointed at the mock server, register globally, then assert on `/otel/v1/traces` requests via `requestsAfter()` + `extractOtelSpans()`. See `ai-sdk-otel-export` or `otel-span-processor-export`. |
142 | | - |
143 | | -## Snapshot Stability |
144 | | - |
145 | | -`normalizeForSnapshot()` handles IDs, timestamps, paths, and `system_fingerprint`. You must handle these yourself in a scenario-specific normalizer (see `e2e/scenarios/wrap-langchain-js-traces/assertions.ts` for an example): |
146 | | - |
147 | | -| Non-deterministic value | Replacement | |
148 | | -| -------------------------- | ------------------ | |
149 | | -| LLM response text | `"<llm-response>"` | |
150 | | -| Token counts | `0` | |
151 | | -| Tool call IDs (`call_xxx`) | `"<tool_call_id>"` | |
152 | | - |
153 | | -## Module Resolution |
154 | | - |
155 | | -Scenarios run from `e2e/.bt-tmp/run-<id>/scenarios/<name>/`. Node walks up to `e2e/node_modules/` for workspace deps (`braintrust`, `@braintrust/otel`, etc.). Scenario-local deps are in the scenario's own `node_modules/`. Helper imports (`../../helpers/...`) work because `prepareScenarioDir` copies `e2e/helpers/` into the temp dir. |
156 | | - |
157 | | -Deno nested runners use `runDenoScenarioDir()`, which invokes `deno test --no-check` with the harness env vars and the prepared temp scenario path. |
| 67 | +```bash |
| 68 | +pnpm install --dir e2e/scenarios/<name> --ignore-workspace --lockfile-only --strict-peer-dependencies=false |
| 69 | +``` |
158 | 70 |
|
159 | 71 | ## Debugging |
160 | 72 |
|
161 | | -- **Subprocess error**: Read the `STDERR` section in the error message. |
162 | | -- **Module not found**: Is it a workspace pkg? → `e2e/package.json`. External? → scenario `package.json`. |
163 | | -- **Flaky snapshot**: Add normalization for the changing field. |
164 | | -- **Timeout**: Increase `timeoutMs` (90-120s typical for provider calls). |
165 | | -- **Missing lockfile**: `cd e2e/scenarios/<name> && pnpm install --ignore-workspace --lockfile-only --strict-peer-dependencies=false` |
| 73 | +- Flaky snapshot: normalize the changing field instead of snapshotting around it. |
| 74 | +- Request-flow assertions: grab `requestCursor()` before running the scenario, then inspect `requestsAfter(...)`. |
| 75 | +- If the scenario is external-provider backed, confirm the required provider env var is set before debugging the assertions. |
| 76 | +- Deno/browser scenarios may intentionally differ from Node. Assert the real runtime contract instead of copying Node expectations blindly. |
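The request-flow tip above follows a cursor pattern: record a position in the request log before the scenario runs, then inspect only what arrived afterwards. The sketch below uses a hypothetical in-memory `RequestLog` class to show the idea. The real harness exposes `requestCursor()` and `requestsAfter(...)`, but the index-based cursor and the request shape here are assumptions, not the harness's actual implementation.

```typescript
// Hypothetical in-memory stand-in for the mock server's request log.
interface RecordedRequest {
  method: string;
  path: string;
}

class RequestLog {
  private readonly requests: RecordedRequest[] = [];

  record(request: RecordedRequest): void {
    this.requests.push(request);
  }

  // Assumption: the cursor is simply the number of requests seen so far.
  requestCursor(): number {
    return this.requests.length;
  }

  // Return only the requests recorded after the cursor was taken.
  requestsAfter(cursor: number): RecordedRequest[] {
    return this.requests.slice(cursor);
  }
}

// Usage: setup traffic is excluded because the cursor is taken after it.
const log = new RequestLog();
log.record({ method: "POST", path: "/setup" });
const cursor = log.requestCursor();
log.record({ method: "POST", path: "/otel/v1/traces" });
const scenarioPaths = log.requestsAfter(cursor).map((request) => request.path);
console.log(scenarioPaths);
```

Taking the cursor immediately before the run keeps assertions scoped to the scenario's own traffic, so earlier requests do not leak into the check.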