BigQuery Agent Analytics went GA, and the SDK (this repo) is the consumption-layer story for users who want to turn plugin-logged traces into observability, evaluation, and analytics wins. This issue proposes a Medium blog post series that walks BQ AA users through the highest-leverage downstream workflows the SDK unlocks, ordered by downstream value (breadth of audience × time-to-value × pain relieved × differentiation vs. raw BigQuery).
Ranking is based on an analysis of main as of commit de2add2 and reviewed against README.md, SDK.md, src/bigquery_agent_analytics/cli.py, and the examples directory.
Final ranked series

| # | Post | Tier |
|---|------|------|
| 1 | Trace reconstruction + DAG rendering | Top-of-funnel |
| 2 | Code-based evals in CI (--exit-code) | Top-of-funnel |
| 3 | Analyst-friendly views in dbt + Looker Studio | Top-of-funnel |
| 4 | LLM-as-Judge in BigQuery (AI.GENERATE → Gemini API fallback) | Aha |
| 5 | client.insights() + drift detection | Aha |
| 6 | Real-time categorical dashboards (streaming cron) | Aha |
| 7 | Long-horizon agent memory | Power-user |
| 8 | Trajectory matching + pass@k | Power-user |
| 9 | Ontology + binding + property graphs | Power-user |
| 10 | HITL safety / world-change detection | Power-user |
Editorial briefs
1. Trace Reconstruction + DAG Rendering
Audience: Any engineer who just installed the BQ Agent Analytics plugin and opened the agent_events table for the first time.
Promise: See your agent's full conversation as a readable tree in under 10 lines of Python.
Proof: Before/after — raw BQ row dump vs. client.get_session_trace(id).render() ASCII tree, plus trace.tool_calls / final_response / error_spans access.
CTA: "Install bq-agent-sdk, run doctor, and paste your worst session ID into .render()."
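The before/after contrast can be sketched without the SDK at all. A minimal, self-contained illustration of what turning flat event rows into an ASCII tree looks like — the event fields, labels, and tree shape here are hypothetical stand-ins for what `client.get_session_trace(id).render()` produces, not the SDK's actual schema:

```python
# Hypothetical flat rows, roughly the shape of a raw agent_events dump.
events = [
    {"id": "e1", "parent": None, "type": "LLM_REQUEST", "label": "plan trip"},
    {"id": "e2", "parent": "e1", "type": "TOOL_CALL", "label": "search_flights"},
    {"id": "e3", "parent": "e1", "type": "TOOL_CALL", "label": "search_hotels"},
    {"id": "e4", "parent": "e3", "type": "ERROR", "label": "quota exceeded"},
]

def render(events):
    """Render parent/child event rows as an indented ASCII tree."""
    children = {}
    for ev in events:
        children.setdefault(ev["parent"], []).append(ev)
    lines = []
    def walk(parent, depth):
        for ev in children.get(parent, []):
            lines.append("  " * depth + f"{ev['type']}: {ev['label']}")
            walk(ev["id"], depth + 1)
    walk(None, 0)
    return "\n".join(lines)

print(render(events))
```

The point the post makes is exactly this delta: four opaque rows on the left, one readable tree on the right.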
2. Code-Based Evals in CI
Audience: Platform/infra engineers responsible for not letting agent regressions ship.
Promise: Fail a PR when p95 latency, error rate, or token cost crosses a threshold on the last 24h of production traffic.
Proof: A 20-line GitHub Actions YAML running bq-agent-sdk evaluate --evaluator=latency --last=24h --threshold=5000 --exit-code, with a red/green screenshot.
CTA: "Fork this workflow file — it's the minimum agent quality gate."
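A sketch of that workflow file — job names, the Python version, and the secret name are placeholders; the `evaluate` flags are the ones the post demonstrates:

```yaml
name: agent-quality-gate
on: [pull_request]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install bq-agent-sdk
      - name: Fail on p95 latency regression over the last 24h
        env:
          GOOGLE_APPLICATION_CREDENTIALS: ${{ secrets.GCP_SA_KEY }}  # placeholder secret
        run: |
          bq-agent-sdk evaluate --evaluator=latency --last=24h \
            --threshold=5000 --exit-code
```

`--exit-code` is what makes this a gate rather than a report: a breached threshold returns nonzero and the job goes red.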
3. Analyst-Friendly Views (Re-angled for plugin-created views)
Audience: Analytics engineers and BI folks who can't or won't write Python against agent_events.
Promise: Your plugin already created typed views — point dbt and Looker Studio at them today, no SDK call required.
Proof: Three recipes — a dbt source.yml, a Looker Studio explore screenshot, and an ad-hoc latency SQL query — all against agent_events_LLM_REQUEST. Sidebar: ViewManager.create_all_views() for pre-v1.27 plugins or custom prefixes (SDK.md §16).
CTA: "Send this post to your data team. You already have the schema you wanted."
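The dbt recipe is small enough to show inline. A sketch of the `sources.yml`, assuming the plugin's default table and view names from the post — the project and dataset identifiers are placeholders:

```yaml
# models/staging/agent_analytics/sources.yml
version: 2
sources:
  - name: agent_analytics
    database: my-gcp-project   # placeholder GCP project
    schema: agent_dataset      # placeholder BigQuery dataset
    tables:
      - name: agent_events               # raw plugin-logged event table
      - name: agent_events_LLM_REQUEST   # plugin-created typed view
```

From there, downstream dbt models and the Looker Studio explore select from the typed view rather than parsing the raw table.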
4. LLM-as-Judge in BigQuery
Audience: Teams whose agent quality question is "was the answer actually good?" not "was it fast?"
Promise: Score thousands of sessions for correctness, hallucination, or sentiment without moving data out of BigQuery.
Proof: client.evaluate(evaluator=LLMAsJudge.correctness(threshold=0.7, strict=True)) running AI.GENERATE first with a Gemini API fallback; side-by-side cost table at 1k / 10k / 100k session scale.
CTA: "Start with correctness, add hallucination when you're ready, tune strict=True before you trust a dashboard."
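The fallback behavior is the part worth showing in isolation. A minimal sketch of the control flow — the two judge functions here are stubs standing in for the AI.GENERATE SQL path and the Gemini API path, not real SDK calls:

```python
def judge_with_fallback(prompt, primary, fallback):
    """Try the in-warehouse judge first; fall back to the API judge on any failure."""
    try:
        return primary(prompt), "primary"
    except Exception:
        return fallback(prompt), "fallback"

# Stand-in judges: the post backs these with AI.GENERATE and the Gemini API.
def ai_generate_judge(prompt):
    raise RuntimeError("AI.GENERATE unavailable in this region")

def gemini_api_judge(prompt):
    return {"score": 0.82, "verdict": "correct"}

result, route = judge_with_fallback(
    "Was the final answer grounded in the retrieved documents?",
    ai_generate_judge,
    gemini_api_judge,
)
print(route, result["score"])
```

The cost table in the post then compares the two routes at 1k / 10k / 100k sessions, which is why the warehouse-first ordering matters.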
5. Insights + Drift Detection
Audience: PMs, UX researchers, and agent owners who want to know what users actually do — not what the spec says they should do.
Promise: One Python call produces a 7-section report on friction, tool usage, and emerging task areas, plus a coverage report against your golden set.
Proof: client.insights(filters=...) rendering friction_analysis + task_areas + suggestions, paired with client.drift_detection(golden_dataset=...) showing which production questions your eval suite misses.
CTA: "Run this weekly. Promote the top 5 uncovered questions into your eval suite."
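The drift-detection half reduces to a coverage question: which production questions match nothing in the golden set? A self-contained sketch — the naive string matcher stands in for whatever similarity measure `client.drift_detection()` actually uses:

```python
def coverage_gaps(golden_questions, production_questions, match):
    """Return production questions not covered by any golden question."""
    return [q for q in production_questions
            if not any(match(q, g) for g in golden_questions)]

golden = ["reset my password", "cancel my subscription"]
production = [
    "reset my password",
    "why was I double charged?",
    "cancel my subscription",
    "export my invoices",
]

# Exact-match stand-in; a real implementation would use semantic similarity.
gaps = coverage_gaps(golden, production, lambda q, g: q == g)
print(gaps)
```

The weekly ritual in the CTA is then mechanical: sort the gaps by frequency and promote the top five into the eval suite.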
6. Real-Time Categorical Dashboards
Audience: Ops teams running support/sales/assistant agents who need live quality signals, not weekly retros.
Promise: A Looker Studio dashboard refreshing every 5 minutes with tone, outcome, and escalation rates across all live sessions.
Proof: bq-agent-sdk categorical-eval --last=5m --persist --prompt-version=v1 on a Cloud Scheduler cron + categorical-views generating the 4 dedup views + a dashboard screenshot.
CTA: "Fork the deploy/streaming_evaluation/ template; provision a reservation; ship before your next oncall shift."
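The scheduling side is a one-liner. A sketch of the equivalent crontab entry behind the Cloud Scheduler job — the flags are the ones from the post; the log path is a placeholder:

```shell
# Every 5 minutes: score the last 5m of sessions and persist the results
*/5 * * * * bq-agent-sdk categorical-eval --last=5m --persist --prompt-version=v1 >> /var/log/agent-eval.log 2>&1
```

Pinning `--prompt-version` is what keeps the dashboard comparable over time; the dedup views generated by `categorical-views` handle re-runs of the same window.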
7. Long-Horizon Agent Memory
Audience: Agent builders moving from single-turn demos to agents that remember users across sessions.
Promise: Retrieve prior episodes, search them semantically, and budget tokens — all on traces the plugin already logs for you.
Proof: BigQueryMemoryService.get_session_context() → UserProfileBuilder.build_profile() → ContextManager.select_relevant_context() in one notebook, ending with a before/after of agent behavior with memory off vs. on.
CTA: "You already have the data. Add 30 lines and your agent stops forgetting."
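The token-budgeting step is the least obvious of the three, so it's worth a standalone sketch. This is a hypothetical illustration of the idea behind `ContextManager.select_relevant_context()` — greedy relevance-first selection under a token budget — not the SDK's actual algorithm or signature:

```python
def select_relevant_context(episodes, query_terms, token_budget, count_tokens):
    """Pick the most relevant prior episodes that fit inside a token budget."""
    scored = sorted(
        episodes,
        key=lambda ep: sum(term in ep["text"] for term in query_terms),
        reverse=True,
    )
    picked, used = [], 0
    for ep in scored:
        cost = count_tokens(ep["text"])
        if used + cost <= token_budget:
            picked.append(ep)
            used += cost
    return picked

episodes = [
    {"id": 1, "text": "user prefers aisle seats on flights"},
    {"id": 2, "text": "weather smalltalk"},
    {"id": 3, "text": "user books flights monthly for work"},
]
ctx = select_relevant_context(
    episodes, ["flights", "seats"], token_budget=12,
    count_tokens=lambda s: len(s.split()),  # crude word-count tokenizer
)
print([ep["id"] for ep in ctx])
```

The before/after demo in the notebook is then just this selection switched off (empty context) versus on.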
8. Trajectory Matching + pass@k
Audience: Eval-mature teams with curated golden sets who need regression tests for non-deterministic agents.
Promise: Prove that your agent still takes the right tool path N% of the time, across K trials per task.
Proof: BigQueryTraceEvaluator.evaluate_batch() with MatchType.IN_ORDER + TrialRunner(num_trials=10) producing pass@k and pass^k over an eval suite.
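The two metrics themselves are plain combinatorics over per-trial outcomes, independent of the SDK. A sketch using the standard estimators, where c of n trials passed and k trials are sampled without replacement (pass^k here uses the sampling-without-replacement estimator; other definitions exist):

```python
from math import comb

def pass_at_k(n, c, k):
    """P(at least one of k sampled trials passes), given c of n trials passed."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

def pass_pow_k(n, c, k):
    """P(all k sampled trials pass) — the stricter pass^k metric."""
    if c < k:
        return 0.0
    return comb(c, k) / comb(n, k)

# 10 trials per task, 7 of which followed the expected tool path
print(round(pass_at_k(10, 7, 3), 3), round(pass_pow_k(10, 7, 3), 3))
```

pass@k answers "can the agent do it at all"; pass^k answers "can it do it reliably" — the post's argument is that regression gates for non-deterministic agents need the second.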
9. Ontology + Binding + Property Graphs
Audience: Data platform teams who care about governance, reuse, and querying agent traces as a graph.
Promise: Author semantics once (ontology.yaml), bind to any warehouse (binding.yaml), materialize a BigQuery Property Graph, query it with GQL.
Proof: A 3-file walkthrough — finance.ontology.yaml + finance-bq-prod.binding.yaml + bq-agent-sdk ontology-build — ending with a GQL query traversing HOLDS / OWNS edges and a quick mention of OWL/TTL import.
CTA: "Start with one entity and one relationship. The rest is iteration."
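The CTA's "one entity and one relationship" starting point fits in a few lines. The YAML shape below is illustrative only — the actual schema is whatever `bq-agent-sdk ontology-build` expects — but it conveys the author-once idea:

```yaml
# finance.ontology.yaml — hypothetical minimal starting point
entities:
  - name: Customer
    keys: [customer_id]
  - name: Account
    keys: [account_id]
relationships:
  - name: OWNS
    from: Customer
    to: Account
```

The companion `binding.yaml` then maps `Customer` and `Account` to concrete warehouse tables, and the build step materializes the property graph those OWNS edges traverse.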
10. HITL Safety / World-Change Detection
Audience: Teams running autonomous agents that take real-world actions (writes, transactions, approvals).
Promise: Detect when the world has changed between an agent's decision and a human's approval — and fail closed when the check itself fails.
Proof: ContextGraphManager.detect_world_changes(session_id, current_state_fn) producing a WorldChangeReport with is_safe_to_approve=False on drift, and a second demo where the check itself fails closed.
CTA: "If your agent can spend money or send messages, this is table stakes. Wire it into your approval queue today."
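Fail-closed is the design decision worth demonstrating in isolation. A minimal sketch of the pattern — the snapshot fields and the wrapper itself are hypothetical illustrations of what `detect_world_changes()` guards against, not the SDK's API:

```python
def safe_to_approve(snapshot_at_decision, current_state_fn):
    """Fail closed: an error in the check itself blocks approval too."""
    try:
        current = current_state_fn()
    except Exception as err:
        return False, f"check failed, blocking approval: {err}"
    changed = sorted(
        k for k in snapshot_at_decision if current.get(k) != snapshot_at_decision[k]
    )
    if changed:
        return False, f"world changed since decision: {changed}"
    return True, "no drift detected"

# State the agent saw when it decided to send the payment.
decision_snapshot = {"balance": 1200, "recipient_verified": True}

# Demo 1: the world drifted between decision and approval.
ok, why = safe_to_approve(decision_snapshot,
                          lambda: {"balance": 900, "recipient_verified": True})
print(ok, why)

# Demo 2: the state service is down — the check fails closed.
def unavailable():
    raise TimeoutError("state service down")
print(safe_to_approve(decision_snapshot, unavailable))
```

Wired into an approval queue, a `False` from either path parks the action for a human instead of letting it through.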
Structural notes
CTA tiers: Posts 1–3 share a CTA ("install and try it"). Posts 4–6 share one ("evaluate your production traffic"). Posts 7–10 share one ("level up your agent architecture"). Link forward within each tier and backward across tiers.
Cross-promotion: Short integration posts ("BQ Agent Analytics + LangChain / ADK / dbt / Looker Studio") can link back into Tier 1 posts for SEO and partner amplification.
Post #3 note: Analyst views (re-angled) — recent ADK plugin versions ship automatic per-event-type views, so the post should lead with plugin-created views and position ViewManager (SDK.md §16) as the fallback for custom prefixes or older plugins.
Filed for tracking and community input. Happy to take suggestions on ordering, additional topics, or which posts maintainers want to co-author or amplify.
Why this ordering
- … (README.md §Key Features, SDK.md §2).
- bq-agent-sdk evaluate --exit-code is production-ready (cli.py:267).
- … ViewManager (SDK.md §16) as the fallback for custom prefixes or older plugins.
- client.insights() + drift is a lower-setup "aha" than streaming categorical dashboards, which require cron/job scheduling and dashboard plumbing.
- … BigQueryMemoryService.add_session_to_memory() rides on plugin-logged traces with no separate eval corpus needed; trajectory matching is more rigorous but carries higher adoption tax (curated golden trajectories).