evaluate --exit-code: escape quotes (or emit JSON Lines) in LLM-judge FAIL feedback field #84

@caohy1988

Context

PR #45 (merged) verified the LLM-judge AI.GENERATE path end-to-end against live BigQuery. During that smoke test, one FAIL line surfaced an unescaped double-quote embedded in the feedback="..." value, because the judge's justification itself contained quoted text:

FAIL session=4ca31e85 metric=faithfulness score=0.6 feedback="The agent added "(Design)" to Jordan Lee's name, which was not present in the user's request or any provided context."

That reads fine to a human eyeball, but it breaks awk -F'"', cut -d'"', or any shell parser that splits on " to extract the feedback field. Apostrophes inside the feedback (Jordan Lee's) also wouldn't survive a switch to single-quote wrapping.
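The failure mode can be reproduced in a couple of lines. The FAIL line below is copied from the smoke output above, and the split mimics what awk -F'"' or cut -d'"' -f2 would do:

```python
# The FAIL line from the smoke test, with the judge's own quotes embedded.
line = ('FAIL session=4ca31e85 metric=faithfulness score=0.6 '
        'feedback="The agent added "(Design)" to Jordan Lee\'s name, which '
        'was not present in the user\'s request or any provided context."')

# A naive reader that splits on '"' (awk -F'"', cut -d'"' -f2, etc.)
# only sees the text up to the first embedded quote.
feedback = line.split('"')[1]
print(feedback)  # -> The agent added   (truncated at the inner quote)
```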

Why this is not a blocker

The exit-code behavior itself is unaffected; only downstream machine parsing of the human-readable FAIL line is at risk. Tracking as a follow-up so it doesn't get lost.

Two options

  1. Escape " and \ inside the feedback value. Single-line, minimal change to cli._format_feedback_snippet (or to the FAIL-line composition in _emit_evaluate_failures):

    escaped = feedback.replace("\\", "\\\\").replace('"', '\\"')
    parts.append(f'feedback="{escaped}"')

    Pros: smallest delta; preserves the current key=value shape readers may already grep.
    Cons: still a hand-rolled DSL — anyone parsing it has to know about the SDK's escape convention.

  2. Add an opt-in JSON Lines mode for CI parsing. New flag like --exit-code --emit=jsonl that emits one JSON object per failing (session, metric) pair to stderr instead of the current key=value line. Default stays human-readable.

    Pros: CI integrations get a proper machine-parseable contract; no escape ambiguity ever; future fields (e.g. multi-metric judge results, criterion arrays) can grow without breaking parsers.
    Cons: bigger surface; needs a flag-shape decision; existing readers stay on the human format anyway.
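For comparison, Option 2's emitter could look roughly like this; the function name and field names here are illustrative, not a decided contract:

```python
import json
import sys

def emit_failure_jsonl(session, metric, score, feedback, stream=sys.stderr):
    # One JSON object per failing (session, metric) pair. json.dumps
    # handles all quote/backslash escaping, so the ambiguity disappears,
    # and new fields can be added later without breaking parsers.
    record = {
        "session": session,
        "metric": metric,
        "score": score,
        "feedback": feedback,
    }
    stream.write(json.dumps(record) + "\n")
```

A CI consumer then reads stderr line by line through json.loads and never needs to know an escape convention.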

Recommendation

Ship Option 1 first (a ~10-line change that drops into the next polish window). Track Option 2 as a separate, larger ask if a CI consumer actually needs the JSONL contract — don't pre-add the surface for a hypothetical use.

Surface to touch

  • src/bigquery_agent_analytics/cli.py::_format_feedback_snippet (or the line composition in _emit_evaluate_failures).
  • tests/test_cli.py::TestFormatFeedbackSnippet — add an escape-roundtrip case.
  • tests/test_cli.py::test_evaluate_exit_code_llm_judge_emits_feedback_snippet — extend the assertion to cover an embedded-quote justification (use the real (Design) example from the PR #45 smoke as the regression seed).
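The escape-roundtrip case could take roughly this shape; the local _escape helper is a stand-in for cli._format_feedback_snippet's escaping so the sketch is self-contained, and the real test would call the actual helper:

```python
def _escape(feedback: str) -> str:
    # Stand-in for the Option 1 escaping in cli._format_feedback_snippet.
    return feedback.replace("\\", "\\\\").replace('"', '\\"')

def test_escape_roundtrip():
    cases = [
        # Regression seed: the real "(Design)" justification from PR #45.
        'The agent added "(Design)" to Jordan Lee\'s name.',
        'trailing backslash \\',
    ]
    for raw in cases:
        snippet = f'feedback="{_escape(raw)}"'
        inner = snippet[len('feedback="'):-1]
        # No bare (unescaped) double quote may survive inside the value,
        # so split-on-quote readers see one contiguous field.
        assert '"' not in inner.replace('\\\\', '').replace('\\"', '')
```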
