
Feat: Runtime entity resolution primitives — OntologyRuntime, concept index, EntityResolver protocol (design proposal — feedback wanted) #58

@caohy1988


Status: Design proposal. Not yet implemented. Comments welcome — especially on the "Open questions" section at the bottom and on the agent-vs-SDK boundary.

Goal

Today the SDK's ontology pipeline stops at DDL compilation (gm compile emits CREATE PROPERTY GRAPH + table scaffolding). Runtime — the point where an agent receives a user/client input like format_ids: ["display_static"] or geo: ["San Francisco-Stockton-Modesto"] and needs to resolve it against a declared ontology — is left entirely to the application layer.

Feedback from a production user building agentic media buying on top of this SDK quantified the gap: ~85% of brief-validation value for their use case sits at runtime, not schema time. They implemented a 5-layer resolver (notation match → lexical → token-set equality → Jaccard → Levenshtein) on top of ~10K lines of TTL (274 SKOS concepts, 942 synonyms, 210 GAM DMA display names). It works — but every vertical building on the SDK will rewrite some version of this, and today there is no supported runtime surface for them to build against.

This issue proposes a small, opinion-light set of runtime primitives that make resolution implementable in application code without pushing domain-specific matching logic into the SDK core.

Guiding principle: SDK provides, agent decides

The SDK and the agent layer make different kinds of claims:

  • SDK: knows what's declared in the ontology. Entities, relationships, synonyms, notations, concept schemes, taxonomy structure. Stable, typed, queryable.
  • Agent: knows what's intended. Which matcher to try first, what confidence threshold is safe for this domain, how to phrase a "did you mean" suggestion, whether fuzzy match on a free-text company_name is acceptable or dangerous.

Consequences for the runtime:

  1. The SDK exposes read access over loaded ontologies (annotations, synonyms, scheme membership, taxonomy edges). No matching logic.
  2. The SDK optionally materializes an ontology-derived concept index into BigQuery so agents can do SQL-native fuzzy match using BQ's existing EDIT_DISTANCE / SOUNDEX / UDFs.
  3. The SDK defines an EntityResolver protocol and ships two trivial references (ExactMatchResolver, SynonymResolver). Anything beyond exact-match lives outside core.
  4. Domain-specific resolvers (advertising, healthcare, finance) live in contrib/ or user code, never in the runtime's required surface.

The SDK stays general. Verticals get a contract to build against instead of reaching into YAML or reconstructing structure from BQ tables.

Current gaps

  1. No runtime accessor over loaded ontologies. load_ontology() returns Pydantic models, but there's no shape-agnostic API like rt.synonyms("DMA") or rt.annotation("DMA", "skos:notation"). Agents parse the model directly, which couples them to schema details the SDK otherwise hides.

  2. Annotations are not queryable at runtime. Issue #57 (Feat: SKOS import support alongside OWL) proposes persisting SKOS annotations (skos:definition, skos:notation, skos:prefLabel, etc.) through import. Nothing today reads those annotations at runtime. They live in the YAML and die there.

  3. No concept index. Synonyms and notations are scattered across per-entity YAML nodes. Agents that want to do SQL-level matching have to flatten this themselves at query time on every request.

  4. No resolver interface. Every SDK user writes their own resolution entry point, with their own return type, with their own "did you mean" shape. No convention, no reuse.

Proposed primitives

1. OntologyRuntime — read accessor over loaded ontology + binding

Small, stateless, zero external dependencies at read time. Built on top of existing load_ontology() + load_binding().

from bigquery_agent_analytics import OntologyRuntime

rt = OntologyRuntime.load(
    ontology_path="ontology.yaml",
    binding_path="binding.yaml",
)

rt.entities()                              # list[str]
rt.entity("DMA")                           # Entity with annotations + synonyms
rt.synonyms("DMA")                         # ["Designated Market Area", ...]
rt.annotation("DMA", "skos:notation")      # "807"
rt.in_scheme("NielsenDMA")                 # list[Entity] — all concepts in scheme
rt.broader("RetailBanking")                # list[Entity] — skos:broader targets
rt.narrower("Banking")                     # inverse
rt.related("Account")                      # skos:related abstract-relationship targets

Design notes:

  • Only reads. Never mutates ontology or binding.
  • Covers both concrete and abstract (SKOS-derived) entities and relationships. Abstract elements are first-class at the runtime layer — they're the whole reason users care about SKOS at runtime.
  • Works against the annotations produced by issue #57's SKOS import without coupling to SKOS specifically. rt.annotation(name, key) treats skos:definition, owl:equivalentClass, or a user's custom annotation identically.

Identity rules (important after #57 lands):

  • Entities are name-addressed. rt.entity(name), rt.synonyms(name), rt.annotation(name, key) are singular lookups — entity names remain globally unique.
  • Relationships are traversal-first, not name-addressed. Issue #57 relaxes relationship uniqueness to (name, from, to) for abstract relationships, so a single skos_broader can repeat across endpoint pairs. A hypothetical rt.relationship(name) would be unsafe because it has no single answer to return.
  • All relationship accessors take an entity and traverse. rt.broader(entity), rt.narrower(entity), rt.related(entity) return the set of entities reachable from the given starting point via the named predicate. That's a well-defined question regardless of how many skos_broader edges exist in the ontology.
  • If a relationship-by-name accessor is ever added, its contract must be compound identity (rt.relationship(name, from, to) -> Relationship | None) or list-returning (rt.relationships(name) -> list[Relationship]). Never singular-by-name.
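
A toy sketch of the traversal-first contract, with abstract relationships stored as the (name, from, to) triples issue #57 gives them. The edge data is invented for illustration:

```python
# Edges as (relationship_name, from_entity, to_entity) triples -- the
# compound identity abstract relationships get under issue #57. Toy data.
EDGES = [
    ("skos_broader", "RetailBanking", "Banking"),
    ("skos_broader", "InvestmentBanking", "Banking"),
    ("skos_related", "Account", "Customer"),
]


def broader(entity: str, edges=EDGES) -> list[str]:
    """Entities reachable from `entity` via skos_broader -- well-defined even
    though the relationship name repeats across endpoint pairs."""
    return [to for (name, frm, to) in edges if name == "skos_broader" and frm == entity]


def narrower(entity: str, edges=EDGES) -> list[str]:
    """Inverse traversal: entities whose skos_broader target is `entity`."""
    return [frm for (name, frm, to) in edges if name == "skos_broader" and to == entity]
```

Note that both accessors return lists: a singular relationship-by-name lookup has no well-defined answer, but traversal from a fixed entity always does.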

2. Concept index materialization (opt-in)

At gm compile time, optionally emit a BigQuery sidecar table:

CREATE TABLE `{dataset}.ontology_concept_index` (
  entity_name STRING NOT NULL,
  label STRING NOT NULL,         -- for label_kind='notation', this holds the notation value
  label_kind STRING NOT NULL,    -- 'name' | 'pref' | 'alt' | 'hidden' | 'synonym' | 'notation'
  notation STRING,               -- per-entity notation for display; repeats across rows of the same entity
  scheme STRING,                 -- concept scheme this row's entity belongs to;
                                 -- NULL means "entity is not a member of any scheme"
  language STRING,               -- ISO-639 tag; NULL means unspecified or N/A (notation rows)
  is_abstract BOOL NOT NULL,     -- TRUE for SKOS-derived informational entities
  compile_id STRING NOT NULL     -- pair-consistency tag; see "Provenance and compatibility contract"
);

notation is a first-class row kind. For every entity that has a skos:notation, the compiler emits a row with label_kind='notation' and label=<notation value> — so resolvers searching by label naturally catch notation matches without a separate OR notation = @input predicate. The notation column is kept as per-entity metadata that repeats across all rows of the same entity, for display convenience (a caller with a winning match can read the entity's notation directly from the candidate row without a separate lookup).

Row multiplicity contract:

  • One row per (entity_name, label, label_kind, language, scheme) membership tuple. A SKOS concept can legally belong to multiple skos:inScheme schemes (a DMA concept may be in both NielsenDMA and CensusMSA, a banking concept may be in both BankingTaxonomy and FinancialProductsTaxonomy). This is denormalized — a concept in 3 schemes × 5 labels produces 15 rows. Intentional; see below.
  • Entities that aren't members of any scheme produce rows with scheme IS NULL. They're still in the index; entity= resolution finds them, scheme= resolution skips them.
  • notation is per-entity (not per-scheme), so it repeats across membership rows for the same entity. Callers selecting a single notation per entity use DISTINCT notation or aggregate.

Why denormalized rather than ARRAY<STRING> scheme or a separate membership table:

  • WHERE scheme = @x stays a trivial clustered lookup — critical for the common scheme=<name> resolver path.
  • Predicate push-down into BQ clustering is straightforward; the clustering key (scheme, entity_name) stays usable.
  • ARRAY<STRING> forces WHERE @x IN UNNEST(scheme) on every scheme-scoped query, which is less indexable and harder for less-experienced SQL callers to write correctly.
  • A separate membership table adds a join to every resolver query, defeats the "one-table SQL lookup" simplicity that motivates the index.
  • Row multiplication is bounded: even for pathological multi-scheme ontologies, row count is linear in (concepts × labels × schemes), which stays tractable at BQ scale.

Agents do fuzzy match in SQL:

-- exact, scheme-scoped (the common case)
SELECT DISTINCT entity_name
FROM ontology_concept_index
WHERE scheme = @scheme AND LOWER(label) = LOWER(@input);

-- fuzzy fallback with BQ native functions
SELECT entity_name, MIN(EDIT_DISTANCE(LOWER(label), LOWER(@input))) AS dist
FROM ontology_concept_index
WHERE scheme = @scheme
  AND EDIT_DISTANCE(LOWER(label), LOWER(@input)) <= 3
GROUP BY entity_name
ORDER BY dist ASC
LIMIT 5;

The DISTINCT/GROUP BY on entity_name is how callers collapse the denormalized rows back to one result per matched concept.
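
The collapse can be illustrated in pure Python over an in-memory stand-in for the index rows (toy data; the real lookup is the SQL above). Note that the notation row is caught by an ordinary label search, as described earlier:

```python
# In-memory stand-in for ontology_concept_index rows (toy data):
# (entity_name, label, label_kind, scheme)
ROWS = [
    ("DMA", "DMA", "name", "NielsenDMA"),
    ("DMA", "Designated Market Area", "synonym", "NielsenDMA"),
    ("DMA", "807", "notation", "NielsenDMA"),
    ("BayAreaMetro", "Bay Area Metro", "pref", "NielsenDMA"),
    ("BayAreaMetro", "Bay Area Metro", "pref", "CensusMSA"),
]


def resolve_exact(rows, value: str, scheme: str) -> list[str]:
    """Scheme-scoped, case-insensitive exact match, collapsed to one result
    per entity -- the Python analogue of SELECT DISTINCT entity_name."""
    seen: set[str] = set()
    out: list[str] = []
    for entity, label, _kind, row_scheme in rows:
        if row_scheme == scheme and label.lower() == value.lower() and entity not in seen:
            seen.add(entity)
            out.append(entity)
    return out
```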

Matches the SDK's agent-native ethos: any action an agent takes in SQL is something a user or another tool can also take. No new Python-only runtime, no new service, no new matcher implementation to maintain.

Opt-in. v1 ships with a CLI flag only: gm compile --emit-concept-index. Default off for users who don't need it. A binding-side toggle (index: concept_index block on Binding) was considered but deferred — it requires schema and loader changes in bigquery_ontology.binding_models + binding_loader.py that are worth scoping as their own change once the CLI behavior is settled. If v2 adds it, the explicit precedence rule will be: CLI flag overrides binding setting; binding setting serves as the project default when the CLI flag is absent.

Index population contract

The existing DDL compiler (src/bigquery_ontology/graph_ddl_compiler.py) only emits schema SQL — CREATE TABLE / CREATE PROPERTY GRAPH. A concept index needs rows, which is a new kind of output. This subsection names who writes those rows and when.

Who writes the rows: the ontology compiler itself, in the same gm compile invocation that emits the DDL. The index is a deterministic function of both the ontology YAML and the binding — see "What's in the index" below. Treating it as a separate build step creates two sources of truth and a refresh-skew class of bugs that the SDK shouldn't inherit.

What's in the index (scope relative to binding): compile_concept_index(ontology, binding) takes both inputs because the index respects the binding's subset semantics. Since a binding may legally realize only a subset of the declared ontology (binding_models.py:147), the compiler needs a rule for which entities participate in the index. The rule is:

  • All abstract entities from the ontology, regardless of binding — they're informational-only and never bound by construction (issue #57's binding rejection rule). Their value is precisely in being available for runtime resolution even when the agent's BQ tables don't materialize them.
  • Only concrete entities that are bound in this binding. Concrete + unbound entities are deliberately excluded from this deployment's runtime surface; including them would let a resolver return matches the agent then can't query. That's worse than a miss.

In short: abstract: always. Concrete: iff bound. This matches the SDK-level invariant from the adapter design ("every element in GraphSpec is bindable and has data") while preserving the taxonomy-browse value that abstract SKOS entities add at runtime.

Consequence: two different bindings over the same ontology produce different indexes. A narrow deployment binding only Account and Customer emits a smaller index than a wide deployment binding all 40 concrete entities, but both share the same abstract skos_Banking / skos_FinancialProduct / etc. nodes. Abstract relationships between abstract entities are always in scope; abstract relationships touching an unbound concrete entity are included (they're informational metadata, not runtime operations).

The is_abstract column in the index row lets resolvers filter at query time: a resolver that wants only runtime-materializable matches does WHERE NOT is_abstract; a resolver producing taxonomy-aware "did you mean" suggestions keeps both.
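
The inclusion rule ("abstract: always; concrete: iff bound") reduces to a one-liner. A sketch with a hypothetical (name, is_abstract) representation of the ontology's entities:

```python
def index_entities(ontology_entities, bound_names):
    """Which entities participate in the concept index: abstract entities
    always; concrete entities only if this binding binds them.
    `ontology_entities` is a hypothetical list of (name, is_abstract) pairs."""
    return [
        name
        for (name, is_abstract) in ontology_entities
        if is_abstract or name in bound_names
    ]
```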

Table naming contract: because two bindings against the same ontology produce legitimately different indexes, a single global table name is unsafe — the second compile would silently overwrite the first. The output table name is therefore a required parameter, not a fixed convention:

def compile_concept_index(
    ontology: Ontology,
    binding: Binding,
    *,
    output_table: str,   # required — fully-qualified `project.dataset.table`
) -> str: ...

CLI:

gm compile --emit-concept-index \
           --concept-index-table my-project.my_dataset.ontology_concept_index__retail

Both library and CLI error cleanly if the name is missing when --emit-concept-index is set. No silent global default. Users with a single binding per dataset pick any unique name they like (ontology_concept_index is fine); users with multiple bindings per dataset pick distinct names per binding (ontology_concept_index__retail, ontology_concept_index__investment_bank, etc.).

Why required rather than auto-derived:

  • Bindings do carry a binding: str identifier (binding_models.py:159), but it isn't a safe or stable source for a BQ table name: it's an identity tag for the binding document, not a deployment-unique BQ-legal identifier. Using it would couple operational naming to a field authors rename for non-operational reasons, and would collide across environments (dev/stage/prod) that share the same binding identity.
  • Hash-derived defaults like ontology_concept_index__{sha1(binding)[:8]} are collision-free but unreadable and change on every trivial binding edit — bad ergonomics for a table name that appears in user-written resolver SQL.
  • Explicit naming forces the deployment-operator-level decision at compile time, where it belongs.

OntologyRuntime reads the index via the same name the caller passed at compile time — runtime construction takes a matching concept_index_table: str parameter (or reads it from configuration) so lookups target the right table. The name is not stored on the ontology or binding model; it's a runtime/deployment concern.

Provenance and compatibility contract: because the table name is caller-supplied and binding-scoped, nothing in the data columns alone would catch a mismatched wiring like OntologyRuntime.from_models(ontology_A, binding_B, concept_index_table=table_C) where table_C was actually compiled from a different (ontology, binding) pair. Plausible-but-wrong matches are worse than no matches — the agent gets confident answers against stale or unrelated data.

The compiler therefore emits a sibling metadata table named {output_table}__meta, written in the same gm compile invocation. One row per compile:

CREATE OR REPLACE TABLE `{output_table}__meta` AS
SELECT * FROM UNNEST([
  STRUCT(
    'retail' AS ontology_name,                         -- from Ontology.name
    'sha256:abc123...' AS ontology_fingerprint,        -- see "Fingerprint algorithm" below
    'sha256:def456...' AS binding_fingerprint,         -- same algorithm, over Binding model
    'my-project' AS target_project,                    -- from Binding.target.project
    'my_dataset' AS target_dataset,                    -- from Binding.target.dataset
    'gm-1.2.0' AS compiler_version,                    -- version of bigquery_ontology that compiled
    'a1b2c3d4e5f6' AS compile_id                       -- pair-consistency tag; deterministic from inputs
  )
]);

Sibling rather than embedded columns so the bulk of the index (the label/notation rows) stays lean.

Fingerprint algorithm: fingerprints are SHA-256 hashes over a canonical serialization of the validated Ontology / Binding Pydantic models — not over raw YAML text. Concretely:

  1. Load YAML → validated model (existing load_ontology() / load_binding() path). Validation normalizes optional fields, default values, and type coercion.
  2. Serialize the validated model to a canonical JSON form: keys sorted lexicographically at every nesting level, no extra whitespace, UTF-8, stable encoding of None / booleans / numbers, lists preserved in declaration order (list order is semantically meaningful in the ontology model — e.g., key columns).
  3. Hash the resulting bytes with SHA-256, prefix with sha256:.

The same approach is used for both ontology and binding fingerprints, with one difference: ontology fingerprinting covers every field of the Ontology model. Binding fingerprinting covers every field of the Binding model except ephemeral annotations (if any are introduced later) — the binding's identity for the purpose of "does this index correspond to this binding" is its declared structure, not its documentation metadata.

Why model-based and not YAML-text-based:

  • Two semantically identical YAML documents with different formatting, comment placement, or emitter behavior must produce the same fingerprint. A strict verification gate that rejects non-semantic edits would be a constant source of false positives and would push operators to disable verification — worse than no verification.
  • Pydantic-validated models are already the canonical in-memory form the SDK works with (src/bigquery_agent_analytics/runtime_spec.py:199 and adjacent). Hashing at that layer matches the layer where the rest of the SDK's determinism lives.
  • The existing compile contract is already model-based (compile_graph(ontology, binding) -> str takes models, not YAML strings). Keeping fingerprint input at the same layer maintains consistency across compile output and runtime verification.

Two bindings produced from the same source YAML by different emitters (e.g., one with trailing newlines, one without) fingerprint identically. Two bindings that disagree on any declared field — entity names, target dataset, property types — fingerprint differently and correctly fail strict verification.

Canonicalization rules in brief (formal spec in the implementation):

  • Keys sorted at every nesting level (stable across Python dict iteration).
  • Model fields serialized via Pydantic's model_dump(mode="json", by_alias=False, exclude_none=False) so defaults materialize consistently.
  • Enum values serialized as their canonical string form, not member name.
  • None / missing-but-defaulted fields serialized as explicit null to distinguish "absent" from "defaulted."
  • List order preserved; no reordering of entity/relationship/property lists (order is semantically load-bearing).
  • Output encoded as UTF-8 JSON with separators=(",", ":") (no extra whitespace).
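
A sketch of the fingerprint under those rules, with a plain dict standing in for Pydantic's model_dump(mode="json") output. json.dumps with sort_keys=True sorts keys at every nesting level and preserves list order, matching the canonicalization above:

```python
import hashlib
import json


def fingerprint(model_dump: dict) -> str:
    """Canonical-JSON SHA-256 fingerprint sketch. `model_dump` stands in for
    Pydantic's model_dump(mode="json") output; sort_keys=True gives
    lexicographic key order at every nesting level, list order is preserved."""
    canonical = json.dumps(
        model_dump, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```

Two dumps that differ only in key order fingerprint identically; any declared-field difference changes the hash.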

OntologyRuntime runtime verification:

  • At construction, OntologyRuntime.load(...) / .from_models(...) computes the same fingerprints on the loaded Ontology and Binding models.
  • On first access to the concept index (lazy — construction doesn't hit BQ), the runtime reads the __meta sibling and compares fingerprints.
  • Mismatch raises ConceptIndexMismatchError with a clear message naming the expected vs actual fingerprints and the table name involved. The runtime refuses to return matches from an index that doesn't correspond to the loaded models.
  • Missing __meta sibling (e.g., a manually-created index or one compiled with an older toolchain) raises a distinct ConceptIndexProvenanceMissing — caller can explicitly opt out with OntologyRuntime(..., verify_concept_index="off") for read-only dashboards or interactive exploration.
  • Verification re-checks on a configurable TTL, not once-per-lifetime. See "Long-lived runtime verification" below.

Long-lived runtime verification (strict is strict for the whole lifetime, not just the first call). A naive "verify once then cache forever" contract would let a long-lived service sail past an index refresh that swapped in a different (ontology, binding) pair — returning matches against the new index while still believing it was verified. That defeats the "plausible-but-wrong matches are worse than no matches" argument behind the strict default.

The contract:

  • After the first successful verification, OntologyRuntime caches the expected compile_id, ontology_fingerprint, and binding_fingerprint on the instance.
  • On each resolve / validate call, the runtime checks whether the cached verification is still fresh under a configurable TTL (verify_ttl_seconds, default 60). If the cache is fresh, the call proceeds without a BQ round-trip.
  • If the cache is stale, the runtime re-runs the full pair-consistency check plus a full-fingerprint freshness check, not just a single-table sentinel. Concretely:
    1. SELECT DISTINCT compile_id FROM {output_table} LIMIT 2 returns exactly one value.
    2. SELECT compile_id, ontology_fingerprint, binding_fingerprint FROM {output_table}__meta LIMIT 1 — read compile_id and the full fingerprints.
    3. Verify: main.compile_id == meta.compile_id (pair consistency).
    4. Verify: meta.compile_id == cached.compile_id AND meta.ontology_fingerprint == cached.ontology_fingerprint AND meta.binding_fingerprint == cached.binding_fingerprint (full-fingerprint freshness).
  • Outcomes:
    • All checks hold → refresh the cache timestamp and proceed.
    • Pair consistent but any cached value differs from meta → raise ConceptIndexRefreshed. Service operator recreates OntologyRuntime with updated models; new instance's full fingerprint verification catches whether the new index matches or not.
    • Main and meta disagree (refresh in progress) → one-shot 2s retry, then raise ConceptIndexInconsistentPair. Same contract as first-load.

Why the sentinel must read both tables, not just meta. An earlier draft checked only meta.compile_id. That has a correctness hole: the inline refresh order is "main first, meta second," so during the swap window a reader could see the old meta compile_id (matches cache, accepted), then query the new main table, and serve data from the refreshed index under stale verification. Reading both tables on TTL re-check closes that window — main's compile_id is authoritative for "which compile does the data belong to," and the meta comparison catches inconsistent pairs.

Why the freshness check compares full fingerprints, not just compile_id. The compile_id column is a 12-hex-char truncation of sha256(ontology_fingerprint || binding_fingerprint || compiler_version) — 48 bits of entropy, chosen to keep the per-row compile_id column short (storage efficiency on a column that repeats across every data row). That's enough for first-pass pair consistency: two tables with different compile_ids definitely belong to different compiles, and the number of distinct compiles a single output_table sees over a realistic deployment lifetime stays far below the birthday bound at which 48-bit collisions become likely.

But "comfortably below" is not "zero," and a strict verification contract shouldn't rely on it. The meta row carries the full ontology_fingerprint and binding_fingerprint (SHA-256, 256 bits each) — storing those in a single-row meta table costs nothing. The TTL re-check therefore compares all three (compile_id + both full fingerprints) against the cache. A hypothetical 48-bit collision where a legitimately-different (ontology, binding) pair happens to share a 12-char prefix is caught because the full fingerprints won't match.

Pair consistency between the two tables still runs on the short compile_id — it only needs to detect "are these from the same compile or different compiles," and 48 bits is overkill for that single-dataset comparison. The strict freshness check runs on the full 256-bit fingerprints where the safety story demands it.

The two reads are still cheap. Main's SELECT DISTINCT compile_id FROM {output_table} LIMIT 2 reads at most two rows from a clustered column; meta reads exactly one row (and always has, just with more columns than before). Per-TTL-window cost remains negligible even at the default 60s.

Configuration surface on OntologyRuntime construction:

  • verify_ttl_seconds: int = 60 — default 60. Balance between correctness-staleness window and re-verification cost.
  • verify_ttl_seconds=0 — check on every call. Useful for low-QPS services where correctness matters more than cost.
  • verify_ttl_seconds=None — snapshot-bound: verify once on first use, never again. Explicit opt-in for services that coordinate refresh out-of-band (e.g., rolling-restart on recompile). Matches the old "verify once" behavior for callers who want it.

Why TTL rather than check-every-call by default: the pair re-check is cheap but not free, and for high-QPS resolver workloads it adds up. A 60s staleness window matches typical service-refresh cadences while keeping per-call cost bounded at O(1) with no BQ hit in the common case.

Pair-consistency contract (the two tables must agree on the same compile). Because {output_table} and {output_table}__meta are written as two separate CREATE OR REPLACE TABLE statements, a reader interleaved with a refresh could otherwise observe:

  • new meta + old data → strict verification would pass against stale data (plausible-but-wrong matches).
  • new data + old meta → strict verification would raise an incorrect mismatch.

To make the pair coherent without requiring DDL-level transactions (which BigQuery doesn't offer for CREATE OR REPLACE TABLE), both tables carry a compile_id tag that is derived deterministically from compile inputs — not a per-run UUID or a timestamped value:

compile_id = sha256(ontology_fingerprint || binding_fingerprint || compiler_version)[:12]

(The first 12 hex chars are enough to make accidental collisions vanishingly unlikely while keeping the column short.)
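
The derivation is a few lines. One caveat: the proposal doesn't pin down the byte-level encoding of the || concatenation (separator or none), so this sketch simply concatenates the strings:

```python
import hashlib


def compile_id(ontology_fp: str, binding_fp: str, compiler_version: str) -> str:
    """First 12 hex chars of sha256 over the concatenated full fingerprints
    and compiler version -- deterministic: same inputs, same tag."""
    material = (ontology_fp + binding_fp + compiler_version).encode("utf-8")
    return hashlib.sha256(material).hexdigest()[:12]
```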

  • compile_id STRING NOT NULL column on the main table — every row of {output_table} shares the same value.
  • compile_id field on the single __meta row — same value.
  • Write order: main table first, meta second. Readers never see "new meta promising data that doesn't exist yet."

Why deterministic rather than per-run:

  • Preserves the byte-identical output contract on compile_concept_index() (see Compiler output contract below). Two compiles of the same ontology + binding + compiler version produce character-identical SQL.
  • Pair consistency still works: interleaved compiles with different inputs produce different compile_ids and the runtime check catches the inconsistency. Interleaved compiles with identical inputs produce identical compile_ids and the data is also identical — the worst case is wasted work, not wrong data.
  • Callers auditing compile output in code review can diff it against the previous compile and see only the changes caused by ontology/binding edits, not a new UUID every run.

compiled_at is deliberately not in the emitted SQL. An earlier draft included a compiled_at TIMESTAMP field in the meta row; that's been removed to preserve byte-identical output. Operators who want compile timestamp visibility can read it from INFORMATION_SCHEMA.TABLES.creation_time on the __meta table, which BigQuery maintains automatically. The tradeoff is deliberate: runtime correctness (deterministic compile output, reviewable diffs) over embedded operator metadata that BQ already provides.

Runtime pair-consistency check (first concept-index access):

  1. SELECT * FROM {output_table}__meta LIMIT 1 → get expected_compile_id, expected fingerprints.
  2. SELECT DISTINCT compile_id FROM {output_table} LIMIT 2 → verify exactly one compile_id is present and it equals expected_compile_id.
  3. If compile_id mismatches or multiple distinct compile_ids are observed (which would indicate a broken compile), retry once after a short backoff (default 2 seconds) — handles the narrow interleaving window during normal refresh.
  4. If the retry also fails, raise ConceptIndexInconsistentPair with both observed compile_ids. This is distinct from ConceptIndexMismatchError (which is a wiring/fingerprint error, not a timing one) so callers can handle them differently.
  5. Once pair-consistency is established, fingerprint verification proceeds against the meta row.

The retry is deliberately one-shot and small: a legitimately long refresh window indicates operator misbehavior (concurrent compiles against the same table) and should fail loudly.
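
The check-plus-one-shot-retry loop can be sketched as follows, with the two reads injected as callables (real code would run the two SELECTs above against BQ):

```python
import time


def check_pair(read_main_ids, read_meta_id, retry_delay: float = 2.0, sleep=time.sleep) -> str:
    """Pair-consistency check per steps 1-4 above: main table and __meta must
    agree on a single compile_id; one short retry covers a refresh in flight.
    `read_main_ids` / `read_meta_id` stand in for the two SELECTs."""
    main_ids = meta_id = None
    for attempt in (0, 1):
        main_ids = read_main_ids()  # SELECT DISTINCT compile_id ... LIMIT 2
        meta_id = read_meta_id()    # SELECT compile_id FROM {output_table}__meta
        if len(main_ids) == 1 and main_ids[0] == meta_id:
            return meta_id
        if attempt == 0:
            sleep(retry_delay)      # one-shot backoff, then fail loudly
    # Stand-in for ConceptIndexInconsistentPair from the contract above.
    raise RuntimeError(f"inconsistent pair: main={main_ids} meta={meta_id}")
```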

Compatibility flag vocabulary:

  • verify_concept_index="strict" (default) — fingerprint mismatch, missing meta, or persistent pair inconsistency all raise.
  • verify_concept_index="missing_ok" — fingerprint mismatch and pair inconsistency raise, missing meta warns and proceeds.
  • verify_concept_index="off" — no verification, purely caller-managed. Intended for explicit "I know what I'm doing" paths.

Rejected alternatives for pair consistency:

  • Transactional multi-statement BEGIN TRANSACTION; ... COMMIT;. BigQuery's transaction support doesn't cover CREATE OR REPLACE TABLE — DDL is generally non-transactional. Not a viable primitive.
  • Shadow-version both tables + atomic pointer-table swap. Three tables (main, meta, pointer) adds significant operational complexity for a narrow window. The compile_id approach gets the same correctness property with one extra column.
  • BQ table OPTIONS(description=...) tagging. Requires INFORMATION_SCHEMA lookups, has its own freshness semantics, more tooling surface. compile_id in a data column is simpler to query from plain SQL.

Rejected alternatives for the provenance storage shape:

  • Embed provenance as repeated columns on every index row. Wastes storage proportional to the number of rows; makes diff-based review noisier; still requires runtime verification logic. Sibling table is strictly better. (The compile_id column is a deliberate exception — it's a single short fixed-length tag needed for pair consistency, not full provenance.)
  • Encode provenance in BQ table OPTIONS(description=...). BQ-native and elegant, but INFORMATION_SCHEMA queries have their own cost and tooling constraints, and the sibling table approach is easier to inspect from plain SQL (SELECT * FROM concept_index__meta).
  • Caller-managed in v1 with no verification. Considered and rejected — shipping a primitive that silently produces wrong results under a plausible operator mistake is a feature bug the SDK shouldn't ship with. v1 ships strict verification on by default, with documented escape hatches.

How the rows reach BQ (atomic-swap semantics for runtime readers): the compiler emits a single CREATE OR REPLACE TABLE ... AS SELECT ... statement. In BigQuery this is atomic — concurrent readers see either the previous table's rows or the new table's rows, never an empty intermediate state. This is the critical difference from a DELETE + INSERT pair, which would expose a window where the index is queryable but empty. For a runtime lookup primitive, that window is a correctness hazard, not just a performance one.

-- gated on --emit-concept-index, name passed via --concept-index-table
-- write order: main table first, __meta second. compile_id ties the pair.
CREATE OR REPLACE TABLE `{output_table}` AS
SELECT * FROM UNNEST([
  STRUCT('DMA' AS entity_name, 'DMA' AS label, 'name' AS label_kind,
         '807' AS notation, 'NielsenDMA' AS scheme,
         CAST(NULL AS STRING) AS language,
         FALSE AS is_abstract,
         'a1b2c3d4e5f6' AS compile_id),
  STRUCT('DMA', 'Designated Market Area', 'synonym',
         '807', 'NielsenDMA', 'en', FALSE, 'a1b2c3d4e5f6'),
  STRUCT('DMA', 'Marché de diffusion désigné', 'pref',
         '807', 'NielsenDMA', 'fr', FALSE, 'a1b2c3d4e5f6'),
  -- first-class notation row: label holds the notation value, label_kind='notation'
  STRUCT('DMA', '807', 'notation',
         '807', 'NielsenDMA', CAST(NULL AS STRING), FALSE, 'a1b2c3d4e5f6'),
  -- multi-scheme example: same entity appearing in two schemes
  STRUCT('BayAreaMetro', 'Bay Area Metro', 'pref',
         CAST(NULL AS STRING), 'NielsenDMA', 'en', FALSE, 'a1b2c3d4e5f6'),
  STRUCT('BayAreaMetro', 'Bay Area Metro', 'pref',
         CAST(NULL AS STRING), 'CensusMSA', 'en', FALSE, 'a1b2c3d4e5f6'),
  -- abstract SKOS concept: informational, no scheme membership
  STRUCT('skos_Banking', 'Banking', 'pref',
         CAST(NULL AS STRING), CAST(NULL AS STRING), 'en', TRUE, 'a1b2c3d4e5f6'),
  ...
]);

The separate CREATE TABLE scaffold is not emitted — CREATE OR REPLACE TABLE creates the table on first run and atomically replaces it on every subsequent run. This collapses "ensure table exists" and "populate rows" into one statement, eliminating the empty-table intermediate state entirely.

For the motivating ontology sizes (Yahoo's YAMO example: 274 SKOS concepts × multiple labels ≈ ~1K-10K rows), inline UNNEST(ARRAY<STRUCT<...>>) is well within BigQuery's query-text limits. For ontologies above ~50K rows, the compiler emits a shadow-table swap pattern for both tables in the pair — the pair-consistency contract applies to the shadow path too:

-- both tables get a shadow; suffix is "_shadow" on each production name
CREATE OR REPLACE TABLE `{output_table}_shadow` (...);
INSERT INTO `{output_table}_shadow` VALUES (...);  -- batched, includes compile_id column
CREATE OR REPLACE TABLE `{output_table}__meta_shadow` AS
  SELECT * FROM UNNEST([STRUCT(... 'a1b2c3d4e5f6' AS compile_id)]);

-- swap order: data first, then meta (matches the inline-path write order)
DROP TABLE IF EXISTS `{output_table}`;
ALTER TABLE `{output_table}_shadow` RENAME TO <short name from output_table>;
DROP TABLE IF EXISTS `{output_table}__meta`;
ALTER TABLE `{output_table}__meta_shadow` RENAME TO <short name from output_table>__meta;

Two distinct non-atomicity windows exist on this path:

  1. Table-existence window: between DROP and RENAME on each table, that table name does not resolve. Readers get BigQuery's "table not found" error, which they must tolerate as transient.
  2. Pair-inconsistency window: between "main renamed" and "meta renamed", the main table carries the new compile_id while the meta row still carries the old one. Readers in this window see compile_id disagreement → ConceptIndexInconsistentPair on the pair-consistency check.

The pair-inconsistency window on the shadow path can exceed the inline path's one-shot 2-second retry budget, because large-ontology rename operations take longer than small-ontology CREATE OR REPLACE TABLE statements. This means strict verification will raise during a legitimate shadow-path refresh if it happens to sample during the swap. That's by-design, not a defect: strict verification correctly rejects an inconsistent pair even when the inconsistency is transient. The alternative (silently serving old data against a new meta, or vice versa) is the failure mode strict verification exists to prevent.

Operational contract for the shadow path:

  • Treat shadow-path refreshes as offline/admin operations. Pause reader traffic (or accept ConceptIndexInconsistentPair exceptions) during gm compile runs that hit the shadow path.
  • If traffic cannot be paused, the caller has two options, neither of which involves missing_ok (which per the verification-mode contract above still raises on pair inconsistency — transient or not):
    • Increase verify_ttl_seconds so the pair re-check samples less frequently. Reduces the probability of landing inside a swap window at the cost of a longer staleness tolerance.
    • Catch ConceptIndexInconsistentPair at the application layer and retry the call after a short delay. Cleanest at the service-mesh level where transient 5xx handling already exists.
  • For services where neither is acceptable: bind the main + meta pair under a higher-level indirection (a separate {output_table}__current pointer table that callers resolve through). Not shipped in v1 — out of scope as a third level of indirection, tracked as follow-up work if real users hit this constraint.
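The catch-and-retry option from the contract above can be sketched as a thin application-layer wrapper. This is a hypothetical helper (resolve_with_retry is not part of the proposal); ConceptIndexInconsistentPair stands in for the SDK exception of the same name, and the attempt/delay values are illustrative:

```python
import time

class ConceptIndexInconsistentPair(Exception):
    """Stand-in for the SDK exception of the same name."""

def resolve_with_retry(resolver, value, *, scheme, attempts=3, delay_s=2.0):
    # Retry a resolve() call that may sample inside a shadow-path swap window.
    # Each retry waits long enough for a typical main+meta rename pair to finish.
    for attempt in range(attempts):
        try:
            return resolver.resolve(value, scheme=scheme)
        except ConceptIndexInconsistentPair:
            if attempt == attempts - 1:
                raise  # persistent inconsistency: surface it, don't mask it
            time.sleep(delay_s)
```

As noted above, the same pattern is often cleanest at the service-mesh level where transient-failure handling already exists; this sketch is the in-process equivalent.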

This limitation is specific to the shadow path. The inline-UNNEST path (the default for ontologies under 50K rows, covering the motivating use cases including Yahoo's YAMO) remains fully atomic per-statement and doesn't exhibit either window.

When refresh happens: on every gm compile run with --emit-concept-index. No incremental build. If the user edits ontology YAML, they re-run compile, same as any other DDL change. This matches the compile model users already have for schema changes and avoids adding a second refresh command.

Alternatives considered and rejected:

  • Separate gm build-concept-index step. Adds a command users have to remember and introduces drift between "DDL is up to date" and "index is up to date." Two invocations for one conceptual change.
  • Runtime lazy-build. Rebuild the index in memory on OntologyRuntime.load() and optionally push to BQ. Surfaces inconsistent state when multiple agent instances load simultaneously and makes the "query the index in SQL" path unreliable until someone has pushed.
  • Streaming incremental updates. Possible future work for ontologies with externally-sourced concept rolls. Out of scope for the initial primitive.

Failure modes:

  • Inline CREATE OR REPLACE TABLE path: if the statement fails (quota, permissions, query text too large), gm compile errors with a message naming the concept index. Because CREATE OR REPLACE is atomic, there is no half-written state — the previous table (if any) remains queryable, or no table exists at all. The user can re-run compile without cleanup.
  • Shadow-table swap path: failure mid-swap can leave the pair in an inconsistent state (main renamed but meta not, or either table dropped but not renamed). gm compile retry detects the orphaned _shadow table(s) and resumes from the swap step. Runtime readers during the orphaned window get either "table not found" or ConceptIndexInconsistentPair — both expected transient conditions on the shadow path, tolerated via the operational contract above (pause traffic during refresh, or accept transient failures).

3. EntityResolver Protocol + two reference implementations

Interface-only in core:

from typing import Optional, Protocol
from dataclasses import dataclass

@dataclass
class Candidate:
    entity_name: str         # unique per Candidate in ResolveResult.candidates
    label: str               # the winning label that produced this match
    label_kind: str          # 'name' | 'pref' | 'alt' | 'hidden' | 'synonym' | 'notation'
    scheme: Optional[str]    # scheme the winning match came through (None = entity-scoped or no scheme)
    confidence: float
    reason: str              # 'exact' | 'notation' | 'synonym' | 'fuzzy' | 'none'

@dataclass
class ResolveResult:
    match: Optional[str]           # resolved entity_name (None = no match)
    confidence: float              # 0.0 - 1.0; 1.0 = exact
    candidates: list[Candidate]    # top-k "did you mean" suggestions, one per entity
    reason: str                    # why `match` resolved (or 'none')

class EntityResolver(Protocol):
    def resolve(
        self,
        value: str,
        *,
        scheme: str | None = None,
        entity: str | None = None,
        limit: int = 5,
    ) -> ResolveResult: ...

scheme and entity are mutually exclusive — see "Scope semantics" in the Library API impact section below. Exactly one must be provided.

Candidate dedup contract (important once the index is denormalized per (entity_name, label, label_kind, language, scheme)):

  • ResolveResult.candidates contains at most one entry per entity_name. The denormalized index naturally produces multiple matching rows for the same entity (same entity, different label or different scheme). For an agent-facing "did you mean" list, duplicates are noise — the agent wants a list of distinct concepts, each annotated with the best evidence for why it matched.
  • limit=N means N distinct entities, not N raw rows. Resolvers do the dedup before truncating.
  • Winning-label rule when the same entity matches through multiple rows: pick the row with the highest confidence under the resolver's matching rule. Ties broken by label_kind priority, in this order: name > pref > alt > hidden > synonym > notation. Further ties broken by lexicographic label order for determinism.
  • Candidate.label / Candidate.label_kind / Candidate.scheme / Candidate.reason reflect the winning row. The other rows that also matched are discarded — callers wanting the full provenance use the concept index directly via SQL.
  • reason values are resolver-defined but drawn from a shared vocabulary so callers can branch on them without ambiguity: exact (name match), notation (notation match), synonym (any label other than name), fuzzy (non-exact match produced by a fuzzy resolver), none (no match found — only valid on the ResolveResult.reason, not on Candidate.reason).

SDK ships two references in core:

  • ExactMatchResolver — O(1) lookup against name + skos:notation. Confidence is 1.0 or 0.0. Good for notation-heavy inputs (Nielsen DMA codes, Google Ads Criteria IDs).
  • SynonymResolver — extends ExactMatchResolver by also matching against prefLabel / altLabel / hiddenLabel / synonyms. Still exact on each label; still confidence 1.0 or 0.0.

Everything above exact-match — token-set equality, Jaccard, Levenshtein, phonetic, weighted ensembles — lives in user code or contrib/ packages. Verticals pick (or write) a resolver tuned for their domain.

4. validate_against_ontology — small validation helper

Not resolution — just pass/fail against the declared ontology. Return shape is bounded by design so it stays useful on large concept schemes (IAB Taxonomy, Nielsen DMAs, SNOMED excerpts) where the candidate universe is hundreds to tens of thousands of entries:

rt.validate(
    {"format_ids": ["display_static", "display_banner"]},
    scheme="AdFormat",       # see "Scope semantics" below
    sample_limit=10,         # default — cap on known_values_sample
)
# → ValidationResult(
#     valid=["display_banner"],
#     invalid=["display_static"],
#     known_value_count=47,
#     known_values_sample=["display_banner", "display_native", ...],  # up to sample_limit
#     candidates=None,   # populated only when composed with a resolver
# )

Agents combine validate_against_ontology with a resolver to produce "did you mean." The SDK doesn't match; it only knows what exists.

Design notes on the return shape:

  • known_value_count is always the full count. Tells the caller whether the sample is representative.
  • known_values_sample is capped at sample_limit (default 10). Enough for a "did you mean" hint without bloating every validation miss on a 10K-concept scheme. Callers who genuinely need the full set use rt.in_scheme(...) or rt.entities() — that's what those accessors are for.
  • candidates stays None unless the caller composes validation with a resolver. Keeps validate pure set-membership; keeps ranking logic in resolver-land. No double-duty.
  • Sample order is not specified by the contract — callers should not rely on alphabetical or any other ordering. If deterministic ordering matters for a specific use, pass a sorted known_values_sample through a resolver that ranks.
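The bounded return shape sketches as a small dataclass plus a pure set-membership check. Field names follow the example above; validate_values is a hypothetical standalone helper (the proposal puts this logic on OntologyRuntime), and the exact types are an assumption:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidationResult:
    valid: list[str]                   # inputs found in the scoped value set
    invalid: list[str]                 # inputs not found
    known_value_count: int             # full size of the scoped value set
    known_values_sample: list[str]     # at most sample_limit entries, order unspecified
    candidates: Optional[list] = None  # populated only when composed with a resolver

def validate_values(values: list[str], known: set[str], sample_limit: int = 10) -> ValidationResult:
    # Pure set-membership: no matching, no ranking, no resolver coupling.
    valid = [v for v in values if v in known]
    invalid = [v for v in values if v not in known]
    sample = list(known)[:sample_limit]
    return ValidationResult(valid, invalid, len(known), sample)
```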

Library API impact

This section pins down the parts of the proposal that touch existing public APIs, so they're clear before implementation starts.

Compiler output contract

The existing bigquery_ontology.compile_graph(ontology, binding) -> str is documented to be deterministic — "same inputs → byte-identical text." That contract is preserved. Concept-index emission does not modify compile_graph().

Instead, a new sibling function ships alongside:

# existing, unchanged
def compile_graph(ontology: Ontology, binding: Binding) -> str: ...

# new, additive
def compile_concept_index(
    ontology: Ontology,
    binding: Binding,
    *,
    output_table: str,   # required — see "Table naming contract" above
) -> str: ...

Both return deterministic strings. compile_concept_index() extends the compile_graph() byte-identical contract in the same spirit: same inputs → byte-identical DML text, including row order.

"Same inputs" for compile_concept_index() = (ontology, binding, output_table, compiler_version). Everything in the emitted SQL is derived from those four values:

  • compile_id is sha256(ontology_fingerprint || binding_fingerprint || compiler_version)[:12] — deterministic.
  • No per-run timestamps, UUIDs, or process identifiers appear in the emitted SQL. Compile timestamps are recoverable from INFORMATION_SCHEMA.TABLES.creation_time on the emitted tables.
  • Row order is determined by the sort key below, applied before SQL generation.

Rows are sorted by a stable key before SQL generation:

(scheme, entity_name, label_kind, language, label, notation, is_abstract)

with NULLs ordered last consistently per column. is_abstract is last because it's determined by entity_name — included only for defensive stability if the invariant ever loosens. This sort order guarantees that two invocations of compile_concept_index() on the same ontology + binding emit character-identical SQL — critical for diffing compile output in code review, caching compiled artifacts, and verifying that ontology edits produced only the expected row changes.
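The NULLs-last ordering can be sketched as a per-column key transform. sort_rows and _nulls_last are hypothetical names; the row shape is assumed to match the seven-column sort key above:

```python
def _nulls_last(value):
    # Wrap every cell so None sorts after any real value, consistently per column.
    return (1, "") if value is None else (0, value)

def sort_rows(rows):
    """rows: (scheme, entity_name, label_kind, language, label, notation,
    is_abstract) tuples. Stable, deterministic order for byte-identical SQL."""
    return sorted(rows, key=lambda r: tuple(_nulls_last(c) for c in r))
```

Because the key is total and deterministic, re-sorting the same rows always yields the same order, which is the property the byte-identical contract needs.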

Library callers who want only DDL keep calling compile_graph() as today; callers who want the concept index call compile_concept_index() for a separate DML script. The CLI layer composes the two:

# CLI behavior for `gm compile --emit-concept-index`
sql_parts = [compile_graph(ont, binding)]
if args.emit_concept_index:
    sql_parts.append(
        compile_concept_index(ont, binding, output_table=args.concept_index_table)
    )
print("\n\n-- concept index --\n\n".join(sql_parts))

Why a sibling and not a composed option:

  • Preserves the byte-identical contract on compile_graph().
  • No breaking change to existing callers.
  • Each function has one job; easier to test, version, and reason about.
  • CLI callers with shell-orchestrated pipelines can write DDL and DML to separate files if they want — composition stays a caller concern.

Rejected alternatives:

  • compile_graph(..., emit_concept_index=True) returning concatenated DDL+DML — breaks the byte-identical contract for one config mode and creates a function whose return value depends on a flag.
  • Return an object (CompileResult(ddl=..., dml=...)) — breaks every existing caller of compile_graph() that treats the return as a string.
  • CLI-only (no library-layer API for the index DML) — forces library users to reimplement concept-index generation themselves, defeats the point of the primitive.

OntologyRuntime construction: paths and models

The example in section 1 shows OntologyRuntime.load(ontology_path=..., binding_path=...). But existing SDK code already carries validated Ontology and Binding models around in memory (e.g., src/bigquery_agent_analytics/runtime_spec.py:199 passes models directly). Forcing callers to reparse YAML or round-trip through disk would be a step backward.

Two classmethods cover both cases:

class OntologyRuntime:
    @classmethod
    def load(
        cls,
        ontology_path: str | Path,
        binding_path: str | Path,
    ) -> "OntologyRuntime":
        """Load from YAML files on disk."""
        ...

    @classmethod
    def from_models(
        cls,
        ontology: Ontology,
        binding: Binding,
    ) -> "OntologyRuntime":
        """Wrap already-validated models."""
        ...

load() is the convenience path for one-off scripts and the CLI. from_models() is the integration path for the SDK's existing flows — runtime_spec, ontology_orchestrator, adapters downstream of load_ontology() can all wrap without touching disk again.

Internal implementation: load() calls load_ontology() + load_binding() then delegates to from_models(). Zero code duplication.

Scope semantics: scheme vs entity

Resolvers and validate() need an explicit target set. Two mutually-exclusive named parameters, no polymorphism:

# Scheme-scoped: resolve/validate against all members of a concept scheme.
# Most common case — this is what the motivating examples (AdFormat, DMA, IAB) want.
rt.validate({"dma": ["Nielsen 807"]}, scheme="NielsenDMA")
resolver.resolve("San Francisco-Oakland", scheme="NielsenDMA")

# Entity-scoped: resolve/validate against a single named entity. Identity check only.
# Rare — used when you want "is this exactly this one entity?" rather than
# "is this a member of a taxonomy?"
rt.validate({"customer_id": ["C-42"]}, entity="Customer")

Rules:

  • Exactly one of scheme or entity must be provided. Passing both or neither is an error with a clear message.
  • scheme=<name> resolves against the set {e : e.in_scheme(name) or (name == e.name and e.is_abstract_scheme_root)}. This is the motivating case. Works for both explicit SKOS concept schemes and abstract entities that act as taxonomy roots.
  • entity=<name> resolves against the singleton set {e : e.name == name}. Identity check. Returns match iff the input exactly matches the entity's name, notation, or a declared label/synonym.
  • Narrower-closure scoping (e.g., "all narrower-than some abstract node") is explicitly deferred. When the need surfaces, it'll come back as scope=Scope.narrower_closure(name) or similar, without changing the meaning of scheme and entity.
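The exactly-one rule sketches as a small guard shared by resolvers and validate(). check_scope is a hypothetical helper and the error message is illustrative:

```python
def check_scope(*, scheme=None, entity=None):
    # Exactly one of scheme= / entity= must be provided; both or neither is an error.
    if (scheme is None) == (entity is None):
        raise ValueError(
            "Provide exactly one of scheme= or entity= "
            f"(got scheme={scheme!r}, entity={entity!r})."
        )
    return ("scheme", scheme) if scheme is not None else ("entity", entity)
```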

Why not polymorphic ("entity= means scheme-scoped if it's a scheme, entity-scoped otherwise"):

  • Two implementers following the spec would return different answers for the same call, depending on their interpretation of the ontology's structure.
  • Ontology authors who later change an entity from concrete to abstract-scheme-root would silently change the semantics of every entity= call targeting it.
  • Callers would need ontology knowledge to predict what a given entity= call does — defeats the point of a stable API.

Explicit parameters keep the contract boring and predictable.

Non-goals

  • Ship a general string-matching library. BigQuery already provides EDIT_DISTANCE and SOUNDEX built in, and Jaccard is a short UDF away. If the concept index is materialized, users get these for free. Don't wrap.
  • Ship the 5-layer resolver in core. Token-set equality thresholds, Jaccard coefficients, Levenshtein cutoffs — all domain-tuned. Advertising's tuning for DMAs is not the right tuning for SNOMED or legal-entity names. The feedback author's resolver is valuable as a reference for their domain and belongs in contrib/ or a separate package.
  • Promise a <50ms SLA. Latency is a function of index size and resolver choice, both of which vary by user. The SDK can guarantee the primitive shapes; it can't guarantee the performance of every application that uses them.
  • Provide a concept-scheme browser UI. Out of scope — this is an analytics SDK, not an ontology editor.
  • Take a position on "did you mean" phrasing. The SDK returns structured candidates; the agent composes user-facing copy.

How this lands on top of existing code

Using the current SDK's module boundaries:

| Piece | Belongs in | Notes |
| --- | --- | --- |
| OntologyRuntime class with load() + from_models() classmethods | bigquery_agent_analytics/ontology_runtime.py (new) | Wraps load_ontology + load_binding from bigquery_ontology. Pure Python, no BQ calls. Both construction paths share one implementation. |
| compile_graph() (existing) | bigquery_ontology/graph_ddl_compiler.py | Unchanged. Preserves byte-identical contract. |
| compile_concept_index() (new sibling) | bigquery_ontology/graph_ddl_compiler.py (new function) | Separate deterministic DML emitter. CLI composes with compile_graph() when --emit-concept-index is set. |
| EntityResolver Protocol + references | bigquery_agent_analytics/entity_resolver.py (new) | Core SDK layer. Protocol + two implementations. Both accept scheme= or entity= (mutually exclusive). |
| validate_against_ontology | Method on OntologyRuntime | Same scheme= / entity= scope parameters. |
| Domain packs and layered resolvers | bigquery_ontology/contrib/ or external packages | Advertising, healthcare, finance. Never in core. |

Changes to existing modules are limited but not zero. Most of the proposal is additive (new files, new functions, new classmethods). Two concrete edits to existing code are needed:

  • src/bigquery_ontology/cli.py:299 — the existing compile command gains --emit-concept-index and --concept-index-table <name> flags. When --emit-concept-index is set, the command composes the existing compile_graph() output with compile_concept_index(..., output_table=...). Without the flag, the command's behavior is byte-identical to today.
  • src/bigquery_ontology/graph_ddl_compiler.py — adds the new compile_concept_index() function in the same module. compile_graph() itself is not modified.

The runtime accessor (OntologyRuntime) reads the same Ontology/Binding models already loaded today — that path is purely additive in the SDK package.

Ties to issue #57 (SKOS import)

This proposal depends on issue #57 landing first, because the concept index's value comes almost entirely from SKOS annotations (skos:notation, skos:prefLabel, skos:altLabel, skos:broader) being preserved through import. Without #57, the concept index is a thin wrapper over entity names and existing synonyms — useful but not transformative.

Specifically:

  • skos:notation in annotations → notation column in concept index → L1 code match becomes trivial
  • skos:prefLabel / altLabel / hiddenLabel → rows in concept index with label_kind discriminator → L2 lexical becomes trivial
  • skos_broader abstract relationships → rt.broader() traversal → taxonomy-aware "did you mean a parent or sibling"
  • Abstract entities with skos_ prefix → rt.in_scheme() enumerates all concepts in a taxonomy → agent can present the scheme to the LLM as context

Open questions — feedback wanted

  1. Is OntologyRuntime the right wrapper, or should the accessors live as methods on Ontology / Binding directly? Pro-wrapper: keeps bigquery_ontology pure-data and the runtime layer in bigquery_agent_analytics. Pro-direct: fewer classes to learn. Proposal leans wrapper — the accessor layer is SDK-runtime concern, not ontology-package concern.

  2. Should the concept index be opt-in or opt-out? Pro-opt-in: users who don't need it don't pay storage. Pro-opt-out: users discover the primitive because it just exists. Proposal leans opt-in: no silent BQ table creation.

  3. Should OntologyRuntime cache the concept index in memory for pure-Python access, or always go to BQ? Pro-memory: fast, no BQ cost, works offline. Pro-BQ-only: always consistent with DDL, scales to ontologies with 100K+ concepts. Proposal: pure-Python by default for ontologies under some size threshold; explicit BQ-backed resolver for large ones.

  4. Does EntityResolver need an async variant? Resolution against a BQ-backed index is I/O. Proposal: ship sync; add async later if users ask.

  5. Should the SDK ship a richer FuzzyResolver reference (just exact + prefix, not full 5-layer) so users have a middle option? Proposal: no — either exact or bring-your-own. Avoids the "SDK partially solves fuzzy matching" trap where the reference becomes everyone's default despite being domain-unaware.

  6. Should the Protocol be typing.Protocol or an ABC? Protocol allows duck typing; ABC forces inheritance. Proposal: Protocol — matches modern typing conventions and doesn't force users to inherit.

  7. Should rt.validate() also return a nearest field when values are invalid? Would require calling a resolver inside validate, coupling the two. Proposal: no — keep validate pure set-membership, let callers compose it with a resolver.

  8. Concept index: do we need a per-row score or priority for when multiple labels map to the same entity? Some verticals (IAB) prefer one label over another as the "canonical" display form. Proposal: defer — label_kind (name vs pref vs alt) already lets callers prioritize. Add score if needed.

  9. Is contrib/ the right home for domain resolvers, or should they be separate packages? Pro-contrib: easy discovery, versioned together. Pro-separate: community can ship without depending on SDK releases. Proposal: contrib for reference implementations (advertising, healthcare); external packages for user-owned domains.

  10. Should narrower-closure scoping ship in v1? The current proposal settled on two explicit parameters — scheme= for concept-scheme membership and entity= for single-entity identity. A third mode (narrower-closure: "resolve against all entities narrower-than some abstract node") is deferred. Advertising taxonomies nest (IAB Tier 1 → Tier 2), and a caller may want to resolve against the subtree under a specific abstract node rather than a flat scheme. Proposal: ship scheme= and entity= only in v1; add scope=Scope.narrower_closure(name) in v2 if real callers need it. For most cases, scheme membership plus rt.narrower(entity) traversal covers the need without a new API.


Related:

Please comment if you have opinions, real-world resolver implementations you'd like to see supported, or disagreements about where the SDK/agent boundary should sit.


Final design decisions — detailed

After twelve rounds of review the design is frozen. In-repo implementation plan at docs/implementation_plan_concept_index_runtime.md. This section is the design-level recap, split by package.

Ontology package (bigquery_ontology) — changes

New files (all under src/bigquery_ontology/):

  • _fingerprint.py (internal — underscore prefix). Single source of truth for model fingerprinting and the compile_id pair-consistency tag. Two functions: fingerprint_model(model) -> "sha256:<64 hex>" and compile_id(ont_fp, bnd_fp, compiler_version) -> "<12 hex>". Contract pinned in docstring (W1): model_dump(mode="json", by_alias=False, exclude_none=False)json.dumps(sort_keys=True, separators=(",",":"), ensure_ascii=False) → SHA-256. Not re-exported; both packages import via from bigquery_ontology._fingerprint import .... Landed in PR feat(ontology): A1 — internal _fingerprint module for concept-index provenance #71.

  • concept_index.py (module importable but not re-exported in v1). Row builder. Function: build_rows(ontology, binding) -> list[ConceptIndexRow]. Applies the "abstract always included, concrete iff bound" rule. Emits one row per (entity_name, label, label_kind, language, scheme) membership tuple, plus one notation row per skos:notation. Sorts deterministically by (scheme, entity_name, label_kind, language, label, notation, is_abstract) with NULLs last. Package-level re-export may be added later; kept out of the root for v1 to avoid growing semver surface ahead of need.
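The pinned W1 contract makes _fingerprint.py almost mechanical. A sketch using the function names and signatures from the bullet above — any object exposing a pydantic-style model_dump works, and the contract details (canonical JSON, SHA-256, 12-hex truncation) are taken directly from the text:

```python
import hashlib
import json

def fingerprint_model(model) -> str:
    # W1 contract: model_dump(mode="json", by_alias=False, exclude_none=False)
    # -> json.dumps(sort_keys=True, separators=(",",":"), ensure_ascii=False)
    # -> SHA-256, prefixed "sha256:".
    payload = model.model_dump(mode="json", by_alias=False, exclude_none=False)
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def compile_id(ont_fp: str, bnd_fp: str, compiler_version: str) -> str:
    # 12-hex (48-bit) pair-consistency tag; deterministic from its three inputs.
    tag = hashlib.sha256((ont_fp + bnd_fp + compiler_version).encode("utf-8"))
    return tag.hexdigest()[:12]
```

Determinism follows from sort_keys plus the fixed separators: two models with identical dumped payloads always produce identical fingerprints, regardless of dict insertion order.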

Modified files:

  • graph_ddl_compiler.py — gains a new public function compile_concept_index(ontology, binding, *, output_table) -> str alongside the existing compile_graph(). compile_graph() contract is preserved byte-identically; the existing function body is not touched. compile_concept_index() emits two statements by default: CREATE OR REPLACE TABLE {output_table} AS SELECT * FROM UNNEST([STRUCT(...), ...]) for the main index and a matching CREATE OR REPLACE TABLE {output_table}__meta AS SELECT * FROM UNNEST([STRUCT(...)]) for the meta sibling. Shadow-swap fallback activates at > 50K rows. Every row in both tables carries the same compile_id; the meta row additionally carries full ontology_fingerprint and binding_fingerprint.

  • cli.py:299 (the compile command) — gains two new flags: --emit-concept-index (boolean) and --concept-index-table <fqn> (required when --emit-concept-index is set — no silent global default). When both flags are absent, command output is byte-identical to today. No other CLI flags change.

  • __init__.py — adds from .graph_ddl_compiler import compile_concept_index so the new public function is importable as from bigquery_ontology import compile_concept_index, matching the existing compile_graph re-export. No other exports change. _fingerprint stays unexported.

Unchanged:

New CLI surface summary:

gm compile \
  --ontology ontology.yaml \
  --binding binding.yaml \
  --emit-concept-index \
  --concept-index-table my-proj.my_ds.ontology_concept_index

Produces: the existing CREATE PROPERTY GRAPH DDL + two concept-index tables (ontology_concept_index and ontology_concept_index__meta). Re-running the same command produces byte-identical SQL — the compile_id is deterministic from inputs (no timestamps, no UUIDs).

Version bump: Minor — new public function (compile_concept_index) and new CLI flags. Existing API byte-identical.


SDK package (bigquery_agent_analytics) — changes

New files (all under src/bigquery_agent_analytics/):

  • ontology_runtime.py. Hosts OntologyRuntime (the read accessor wrapper), the verification machinery (first-call + TTL re-check), and all four exception classes (ConceptIndexMismatchError, ConceptIndexProvenanceMissing, ConceptIndexInconsistentPair, ConceptIndexRefreshed). OntologyRuntime exposes two constructors — .load(ontology_path, binding_path, ...) and .from_models(ontology, binding, ...) — both routing through one shared implementation.

  • entity_resolver.py. Hosts the EntityResolver Protocol (not ABC — duck-typed for modern typing), the Candidate and ResolveResult dataclasses, and two reference implementations: ExactMatchResolver (name + notation) and SynonymResolver (extends exact with label-based match). Candidate dedup: one candidate per entity, winning-label priority (name > pref > alt > hidden > synonym > notation, lexicographic tiebreaker), limit=N returns N distinct entities.

Modified files:

  • __init__.py — adds to the existing try/except re-export block (same pattern as Client, CodeEvaluator, etc.):
    • OntologyRuntime — from .ontology_runtime
    • EntityResolver, ExactMatchResolver, SynonymResolver, Candidate, ResolveResult — from .entity_resolver
    • ConceptIndexMismatchError, ConceptIndexProvenanceMissing, ConceptIndexInconsistentPair, ConceptIndexRefreshed — from .ontology_runtime

Unchanged:

  • All other SDK modules. The runtime accessor layer is strictly additive.

Read accessors on OntologyRuntime (pure-Python, no BQ round-trip):

| Method | Returns | Notes |
| --- | --- | --- |
| entities() | list[str] | Names of concrete + abstract entities |
| entity(name) | Entity | With annotations, synonyms, abstract flag |
| synonyms(name) | list[str] | Pref + alt + hidden labels |
| annotation(name, key) | str \| None | E.g. skos:notation, skos:definition |
| in_scheme(scheme_name) | list[Entity] | Concepts in a skos:ConceptScheme |
| broader(name) | list[Entity] | skos:broader traversal |
| narrower(name) | list[Entity] | Inverse of broader |
| related(name) | list[Entity] | skos:related traversal |

Identity rules: entities are name-addressed (singular lookup); relationships are traversal-first, not name-addressed — a single skos_broader can repeat across endpoint pairs after #62's relaxed uniqueness, so a hypothetical rt.relationship(name) would have no single answer.

Validation accessor:

| Method | Returns | Notes |
| --- | --- | --- |
| validate_against_ontology(values, *, scheme=None, entity=None, sample_limit=20) | ValidationResult | scheme= and entity= are mutually exclusive; neither or both = ValueError. Bounded output via known_value_count + known_values_sample. candidates is None unless a resolver is explicitly composed by the caller. |

Verification configuration (on construction):

| Parameter | Default | Notes |
|---|---|---|
| `verify_concept_index` | `"strict"` | `"strict"` raises on any provenance issue; `"missing_ok"` tolerates missing meta; `"off"` disables verification entirely (for read-only dashboards) |
| `verify_ttl_seconds` | `60` | `0` = re-check on every call; `None` = snapshot-bound (verify once, never re-check) |

Verification lifecycle:

  1. Construction — OntologyRuntime.load(...) / .from_models(...) computes local ontology_fingerprint and binding_fingerprint (both full SHA-256). No BQ round-trip.
  2. First concept-index access (lazy — not on construction) — reads the __meta sibling, compares fingerprints. Mismatch → ConceptIndexMismatchError. Missing meta → ConceptIndexProvenanceMissing.
  3. TTL re-check (on each resolve / validate call past the TTL window) — runs two queries:
    • SELECT DISTINCT compile_id FROM {output_table} LIMIT 2 — asserts exactly one value (pair consistency).
    • SELECT compile_id, ontology_fingerprint, binding_fingerprint FROM {output_table}__meta LIMIT 1 — full-fingerprint freshness.
    Main/meta disagreement triggers a one-shot retry after 2s; persistent disagreement raises ConceptIndexInconsistentPair. Fingerprints that drift from the cached values raise ConceptIndexRefreshed.

Reading both tables with full fingerprints during the TTL re-check is a W2 watchpoint in the plan: simplifying to a single-table sentinel or a short-compile-id-only comparison would reintroduce either the meta/main race or the 48-bit collision hole.
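The pair-consistency half of the lifecycle can be sketched as pure control flow. This is a hedged illustration of the check order described above — the callables, dict keys, and `retry_sleep` hook are assumptions standing in for the two BQ queries and the 2s wait:

```python
class ConceptIndexInconsistentPair(Exception):
    """Main and meta tables persistently disagree on compile_id."""

class ConceptIndexRefreshed(Exception):
    """Fingerprints in the meta table drifted from the cached values."""

def check_pair(read_main_compile_ids, read_meta_row, cached_fingerprints,
               retry_sleep=lambda: None):
    # 1. Main table must carry exactly one distinct compile_id.
    main_ids = read_main_compile_ids()
    if len(main_ids) != 1:
        raise ConceptIndexInconsistentPair("multiple compile_ids in main table")
    meta = read_meta_row()
    # 2. Main/meta disagreement -> one-shot retry (2s in the real design).
    if meta["compile_id"] != main_ids[0]:
        retry_sleep()
        main_ids, meta = read_main_compile_ids(), read_meta_row()
        if len(main_ids) != 1 or meta["compile_id"] != main_ids[0]:
            raise ConceptIndexInconsistentPair("main/meta disagree after retry")
    # 3. Full-fingerprint freshness against the local cache.
    fp = (meta["ontology_fingerprint"], meta["binding_fingerprint"])
    if fp != cached_fingerprints:
        raise ConceptIndexRefreshed("fingerprints drifted from cache")
```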

Resolver surface:

```python
from bigquery_agent_analytics import (
    OntologyRuntime,
    ExactMatchResolver,
    SynonymResolver,
)

rt = OntologyRuntime.load(
    ontology_path="ontology.yaml",
    binding_path="binding.yaml",
    concept_index_table="my-proj.my_ds.ontology_concept_index",
    verify_concept_index="strict",    # default
    verify_ttl_seconds=60,            # default
)

resolver = SynonymResolver(runtime=rt)
result = resolver.resolve(
    input_value="Consumer Banking",
    scheme="BankingTaxonomy",         # scheme= XOR entity=
    limit=5,
)
# result.candidates: list[Candidate] with entity_name, matched_label, label_kind, scheme
```

Both reference resolvers query the concept index via BigQuery: ExactMatchResolver issues a WHERE label = @input lookup, and SynonymResolver layers the label_kind preference ordering on top of the same lookup.
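A hedged sketch of the parameterized SQL those two resolvers might emit. The column names follow this issue's vocabulary, but the exact concept-index schema is an assumption; the `@input` placeholder is a standard BigQuery named query parameter:

```python
def exact_match_sql(table: str) -> str:
    # ExactMatchResolver: parameterized equality lookup against the index.
    return (
        f"SELECT entity_name, label, label_kind, scheme\n"
        f"FROM `{table}`\n"
        f"WHERE label = @input"
    )

def synonym_sql(table: str) -> str:
    # SynonymResolver: same lookup, ordered by the label_kind preference
    # used for candidate dedup (name > pref > alt > hidden > synonym > notation).
    return exact_match_sql(table) + (
        "\nORDER BY CASE label_kind"
        " WHEN 'name' THEN 0 WHEN 'pref' THEN 1 WHEN 'alt' THEN 2"
        " WHEN 'hidden' THEN 3 WHEN 'synonym' THEN 4 ELSE 5 END"
    )
```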

Version bump: Minor — new public API surface (OntologyRuntime, four resolver-related classes, four exception types). No existing behavior changes.

Existing user code: No deprecation. Users with their own resolution layers continue unaffected until they opt into the SDK primitive.


Sequencing (from the plan)

PR stack, in merge order:

  1. A1 — _fingerprint.py (feat(ontology): A1 — internal _fingerprint module for concept-index provenance #71, open).
  2. A2 — concept_index.py row builder.
  3. A3–A5 — compile_concept_index + inline-UNNEST SQL emission.
  4. A7 — CLI flags.
  5. A8 (partial) — docs/ontology/concept-index.md.
  6. B1–B7 — SDK read accessors + resolver Protocol + reference implementations (verification off as intermediate default).
  7. C1–C6 — verification layer (strict default on) + full shadow-swap implementation + all four exception types.
  8. Phase 4 — integration tests, examples/concept_index_quickstart.py, full docs.
  9. Phase 5 — contrib/ scaffolding (Yahoo advertising resolver, when contributed).

Each PR leaves main shippable.
