Status: Design proposal. Not yet implemented. Comments welcome — especially on the "Open questions" section at the bottom.
Goal
Many real-world ontologies mix OWL (formal classes and properties) with SKOS (human-readable labels, taxonomies, cross-vocabulary mappings). The current gm import-owl importer recognizes SKOS only partially: skos:altLabel and skos:prefLabel populate synonyms, and everything else (skos:definition, skos:broader, skos:scopeNote, skos:notation, skos:exactMatch, standalone skos:Concept resources, etc.) is silently dropped.
This issue proposes first-class SKOS support in the same gm import-owl command, so users importing FIBO, SNOMED excerpts, library vocabularies, and other hybrid OWL+SKOS sources don't lose descriptive metadata or taxonomy structure.
Audit of current behavior
On main (src/bigquery_ontology/owl_importer.py):
- skos:prefLabel, skos:altLabel → synonyms (lines 213–218)
- All other SKOS predicates → silently dropped (no code path reads them)
- Standalone skos:Concept (no owl:Class typing) → not imported (entity iteration is g.subjects(RDF.type, OWL.Class) only, line 350)
- Design doc docs/ontology/owl-import.md:168-170 claims literal annotations are preserved as annotations: {iri: value}, but the implementation only covers a fixed OWL allowlist, so SKOS literals fall through unhandled
- No tests or fixtures cover SKOS input
Guiding principle: informational by default
SKOS and OWL make different kinds of claims, and the importer should preserve the distinction:
- OWL makes formal claims. rdfs:subClassOf means every instance of the child is an instance of the parent. Drives inheritance, keys, substitutability.
- SKOS makes informational claims. skos:broader means "this is a narrower topic" and is explicitly not subsumption per the W3C SKOS Primer. Drives documentation, browsing, search.
Consequences for the importer:
- OWL constructs flow into structural fields (extends, from/to, keys, properties).
- SKOS graph-shaped predicates (skos:broader, skos:related, skos:*Match) flow into abstract relationships: edges declared in the ontology but not expected to be bound to BigQuery tables.
- SKOS literal predicates (skos:definition, skos:notation, skos:scopeNote, etc.) flow into annotations.
- Nothing from SKOS flows into extends or description. The ontology never claims inheritance or human-readable descriptions that the source author didn't assert via OWL/RDFS core vocabulary.
Proposed model extensions
Two small, symmetric additions:
1. abstract: bool = False on Entity and Relationship
An abstract entity is declared but not bound to a BigQuery table. Primary key not required. gm validate does not demand one; gm bind rejects bindings that target it.
An abstract relationship is declared but not bound. It describes the graph without pretending to be backed by rows.
Rule: a concrete relationship cannot have an abstract endpoint (nothing to bind). An abstract relationship may have any combination of endpoints.
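The endpoint rule could be checked with a small validation pass. This is only a sketch under assumptions: the Entity and Relationship dataclasses below are simplified stand-ins for the real model classes, the field name from_ is a placeholder, and check_endpoints is a hypothetical helper, not an existing function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    abstract: bool = False  # proposed flag; defaults to concrete


@dataclass(frozen=True)
class Relationship:
    name: str
    from_: str  # `from` is a Python keyword, hence the trailing underscore
    to: str
    abstract: bool = False


def check_endpoints(entities, relationships):
    """Return one error string per abstract endpoint on a concrete relationship."""
    abstract_names = {e.name for e in entities if e.abstract}
    errors = []
    for rel in relationships:
        if rel.abstract:
            continue  # abstract relationships may connect any endpoints
        for endpoint in (rel.from_, rel.to):
            if endpoint in abstract_names:
                errors.append(
                    f"concrete relationship {rel.name!r} "
                    f"has abstract endpoint {endpoint!r}"
                )
    return errors


entities = [Entity("Account"), Entity("skos_FinancialProduct", abstract=True)]
ok = [Relationship("skos_broader", "Account", "skos_FinancialProduct", abstract=True)]
bad = [Relationship("classified_as", "Account", "skos_FinancialProduct")]

print(check_endpoints(entities, ok))   # abstract edge: nothing to report
print(check_endpoints(entities, bad))  # concrete edge into an abstract entity
```

The relationship name classified_as is made up for the failing case; nothing in the proposal defines it.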
2. Relaxed name uniqueness for abstract relationships
Today, relationship names must be globally unique in the ontology (ontology_loader.py:80, rule 1 in ontology.md). This prevents emitting multiple skos_broader edges with different endpoints.
Proposed change, scoped narrowly:
- Concrete relationships: (name,) must be unique. Unchanged.
- Abstract relationships: (name, from, to) must be unique. A single predicate like skos_broader can repeat across endpoint pairs.
Why this scoping is safe:
- compile_graph only emits DDL for elements that have bindings. Abstract relationships have no bindings, so they never reach the DDL emitter. Edge-table naming in CREATE PROPERTY GRAPH is unaffected.
- Binding resolution targets concrete relationships, whose names are still globally unique.
- GQL queries over the generated property graph reference concrete relationship names, still unique.
In short, the relaxation only affects the informational/documentation surface of the ontology. Structural, bindable, and DDL-facing surfaces are untouched.
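To make the scoped rule concrete, here is one way the check could look. It is a sketch, not the code in ontology_loader.py: uniqueness_errors is a hypothetical function, and relationships are plain dicts shaped like the YAML in this proposal. It does not decide whether an abstract relationship may reuse a concrete relationship's name, since the text above leaves that open.

```python
from collections import Counter


def uniqueness_errors(relationships):
    """Apply (name,) uniqueness to concrete and (name, from, to) to abstract."""
    concrete = Counter()
    abstract = Counter()
    for rel in relationships:
        if rel.get("abstract", False):
            abstract[(rel["name"], rel["from"], rel["to"])] += 1
        else:
            concrete[rel["name"]] += 1

    errors = []
    for name, count in concrete.items():
        if count > 1:
            errors.append(f"concrete relationship name {name!r} used {count} times")
    for (name, src, dst), count in abstract.items():
        if count > 1:
            errors.append(f"abstract relationship {name!r} repeated for {src} -> {dst}")
    return errors


taxonomy = [
    {"name": "skos_broader", "abstract": True,
     "from": "skos_RetailBanking", "to": "skos_Banking"},
    {"name": "skos_broader", "abstract": True,
     "from": "skos_InvestmentBanking", "to": "skos_Banking"},
]

print(uniqueness_errors(taxonomy))                 # distinct endpoint pairs
print(uniqueness_errors(taxonomy + taxonomy[:1]))  # same (name, from, to) twice
```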
Naming convention
Provenance is encoded in names so that a reader can tell at a glance which elements are SKOS-derived (informational) versus OWL-derived (structural):
- Pure SKOS entity (resource typed only as skos:Concept): name prefixed skos_ → skos_Banking, skos_RetailBanking.
- Mixed OWL+SKOS entity (same resource typed as both): name unprefixed → Account. OWL won the structure, so it wins the name.
- SKOS relationship (always abstract, derived from skos:broader, skos:related, skos:*Match): name prefixed skos_ → skos_broader, skos_related, skos_exactMatch.
- OWL relationship: unprefixed. Unchanged.
- Annotation keys: keep the skos: colon-style prefix (e.g., skos:definition, skos:notation). This matches the existing owl: convention in annotations and preserves RDF provenance for potential round-trip tooling.
The rule reduces to: skos_ name prefix = "this element is purely SKOS-sourced and informational." Unprefixed = has OWL structure (or is a core ontology element) and can be trusted as formal.
A note on the dual prefix form. SKOS provenance appears as skos_ in names and skos: in annotation keys. The split isn't stylistic — it reflects the identifier/metadata boundary: names flow into YAML parse keys, BigQuery labels, GQL syntax, and Python/SQL code, where colons are unsafe or already syntactic. Annotation keys are free-form map keys where colons parse cleanly and where owl: already sets the convention. Any tool bridging RDF and code hits the same boundary (JSON-LD contexts, rdflib's CURIE-to-attribute translation, protobuf-to-Python name generation). Learn the split once and it stays out of the way.
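The naming rules above fit in a few lines. This is a sketch with hypothetical helper names (entity_name, relationship_name, annotation_key); it assumes the importer already has each resource's local name and its set of rdf:type IRIs. Only the two namespace IRIs are standard.

```python
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
SKOS_CONCEPT = SKOS_NS + "Concept"


def entity_name(local_name, rdf_types):
    """Name an entity from its IRI local name and its rdf:type IRIs."""
    if OWL_CLASS in rdf_types:
        return local_name            # OWL supplies the structure and the name
    if SKOS_CONCEPT in rdf_types:
        return f"skos_{local_name}"  # pure SKOS: informational, prefixed
    raise ValueError(f"{local_name} is neither an owl:Class nor a skos:Concept")


def _skos_local(predicate_iri):
    """Strip the SKOS namespace, rejecting predicates from other vocabularies."""
    if not predicate_iri.startswith(SKOS_NS):
        raise ValueError(f"not a SKOS predicate: {predicate_iri}")
    return predicate_iri[len(SKOS_NS):]


def relationship_name(predicate_iri):
    return "skos_" + _skos_local(predicate_iri)  # underscore: identifier-safe


def annotation_key(predicate_iri):
    return "skos:" + _skos_local(predicate_iri)  # colon: free-form map key


print(entity_name("Banking", {SKOS_CONCEPT}))             # skos_Banking
print(entity_name("Account", {OWL_CLASS, SKOS_CONCEPT}))  # Account
print(relationship_name(SKOS_NS + "broader"))             # skos_broader
print(annotation_key(SKOS_NS + "definition"))             # skos:definition
```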
Mapping table (SKOS additions)
| SKOS construct | Ontology equivalent | Notes |
|---|---|---|
| skos:Concept (no owl:Class) | Abstract entity with skos_ name prefix | Keys not required |
| owl:Class + skos:Concept | Concrete entity, unprefixed | OWL provides structure |
| skos:ConceptScheme | Ontology-level annotation | |
| skos:prefLabel | synonyms (when ≠ name) | Label data |
| skos:altLabel, skos:hiddenLabel | synonyms | Label data |
| skos:definition | Annotation skos:definition | Not a label; preserved with provenance |
| skos:broader, skos:narrower | Abstract relationship skos_broader (normalized: narrower → broader) | |
| skos:related | Abstract relationship skos_related | |
| skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch, skos:relatedMatch | Abstract relationship skos_exactMatch etc. if target is an imported entity; annotation if external IRI | |
| skos:notation, skos:scopeNote, skos:example, skos:historyNote, skos:editorialNote, skos:changeNote | Annotation (prefix preserved) | |
| skos:inScheme, skos:topConceptOf | Annotation (prefix preserved) | |
Rule of thumb: the predicate's object is another imported entity → relationship; otherwise → annotation.
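The rule of thumb, including the narrower → broader normalization, could be expressed roughly as follows. This is a sketch: route is a hypothetical function, predicates are passed as SKOS local names, and objects are plain strings. A real importer would tell IRIs from literals by RDF term type rather than by set membership.

```python
GRAPH_PREDICATES = {
    "broader", "narrower", "related",
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def route(subject, predicate, obj, imported):
    """Decide whether one SKOS statement becomes a relationship or an annotation.

    subject:   name of the entity the statement is about
    predicate: SKOS local name, e.g. "broader"
    obj:       entity name, external IRI, or literal text
    imported:  set of entity names present in the import
    """
    if predicate in GRAPH_PREDICATES and obj in imported:
        if predicate == "narrower":
            # "A narrower B" says the same thing as "B broader A".
            return {"kind": "relationship", "name": "skos_broader",
                    "from": obj, "to": subject}
        return {"kind": "relationship", "name": f"skos_{predicate}",
                "from": subject, "to": obj}
    # Literals and targets outside the import stay as annotations.
    return {"kind": "annotation", "entity": subject,
            "key": f"skos:{predicate}", "value": obj}


imported = {"Account", "skos_Banking", "skos_RetailBanking"}
print(route("skos_RetailBanking", "broader", "skos_Banking", imported))
print(route("skos_Banking", "narrower", "skos_RetailBanking", imported))
print(route("Account", "exactMatch", "http://fibo.org/ontology/FBC/Account", imported))
print(route("skos_RetailBanking", "notation", "RB", imported))
```

The first two calls yield the same skos_broader edge, which is the point of normalizing: a source that states the hierarchy in either direction produces one relationship shape.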
Label and description handling
SKOS never populates description directly. The rules are one-to-one per predicate, no fallback chain:
- rdfs:label → description (unchanged from today)
- rdfs:comment → description (appended) (unchanged)
- skos:prefLabel → synonyms if different from the entity name; otherwise elided
- skos:altLabel, skos:hiddenLabel → synonyms
- skos:definition, skos:scopeNote, etc. → annotations with skos: prefix
Consequence: a pure-SKOS file produces entities with empty description. The definition is available in annotations.skos:definition if anyone needs it. This is intentional — promoting a skos:definition into description is the same kind of structural smuggling we rejected for skos:broader → extends.
Multilingual labels are selected by --language (default en). Non-selected languages are preserved as language-suffixed annotations (e.g., skos:prefLabel@fr: Chat).
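A sketch of the label rules combined with language selection. Several points here are assumptions rather than settled behavior: split_labels is a hypothetical function, untagged literals are treated as always selected, language tags are matched by exact string, prefLabel is compared to the name by exact equality (no case or whitespace normalization), and a repeated predicate in the same non-selected language overwrites the earlier value.

```python
SYNONYM_PREDICATES = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}


def split_labels(name, labels, language="en"):
    """Split SKOS labels into synonyms and language-suffixed annotations.

    name:   entity name to compare skos:prefLabel against
    labels: iterable of (predicate, text, lang); lang is None when untagged
    """
    synonyms = []
    annotations = {}
    for predicate, text, lang in labels:
        if predicate not in SYNONYM_PREDICATES:
            continue  # other predicates are handled elsewhere
        if lang is not None and lang != language:
            annotations[f"{predicate}@{lang}"] = text  # keep, but out of synonyms
            continue
        if predicate == "skos:prefLabel" and text == name:
            continue  # elided: it only repeats the entity name
        if text not in synonyms:
            synonyms.append(text)
    return synonyms, annotations


labels = [
    ("skos:prefLabel", "Cat", "en"),
    ("skos:prefLabel", "Chat", "fr"),
    ("skos:altLabel", "Feline", "en"),
]
print(split_labels("Cat", labels))
# (['Feline'], {'skos:prefLabel@fr': 'Chat'})
```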
Examples
Example 1 — Mixed OWL + SKOS
Input:
    :Account a owl:Class, skos:Concept ;
        rdfs:label "Account" ;
        skos:altLabel "Acct"@en ;
        skos:definition "A record of financial transactions."@en ;
        skos:related :Ledger ;
        skos:exactMatch <http://fibo.org/ontology/FBC/Account> ;
        owl:hasKey ( :account_id ) .

    :Ledger a owl:Class ;
        rdfs:label "Ledger" ;
        owl:hasKey ( :ledger_id ) .
Output:
    entities:
      - name: Account
        description: Account
        synonyms: [Acct]
        keys:
          primary: [account_id]
        properties:
          - name: account_id
            type: string
        annotations:
          skos:definition: A record of financial transactions.
          skos:exactMatch: http://fibo.org/ontology/FBC/Account
      - name: Ledger
        description: Ledger
        keys:
          primary: [ledger_id]
        properties:
          - name: ledger_id
            type: string
    relationships:
      - name: skos_related
        abstract: true
        from: Account
        to: Ledger
Concrete entities (OWL built them, names unprefixed), abstract skos_related relationship (SKOS is informational), literal annotations for the definition and cross-vocab match.
Example 2 — Pure SKOS taxonomy
Input:
    :Banking a skos:Concept ;
        skos:prefLabel "Banking"@en ;
        skos:definition "Activities of financial institutions."@en .

    :RetailBanking a skos:Concept ;
        skos:prefLabel "Retail Banking"@en ;
        skos:altLabel "Consumer Banking"@en ;
        skos:broader :Banking ;
        skos:notation "RB" .

    :InvestmentBanking a skos:Concept ;
        skos:prefLabel "Investment Banking"@en ;
        skos:broader :Banking .
Output:
    entities:
      - name: skos_Banking
        abstract: true
        annotations:
          skos:definition: Activities of financial institutions.
      - name: skos_InvestmentBanking
        abstract: true
      - name: skos_RetailBanking
        abstract: true
        synonyms: [Consumer Banking]
        annotations:
          skos:notation: RB
    relationships:
      - name: skos_broader
        abstract: true
        from: skos_InvestmentBanking
        to: skos_Banking
      - name: skos_broader
        abstract: true
        from: skos_RetailBanking
        to: skos_Banking
All entities are prefixed (pure SKOS, informational), all relationships are skos_broader with different endpoints (legal under the scoped uniqueness relaxation), and description is empty on every entity — the SKOS source offered no rdfs:label or rdfs:comment. A stderr hint suggests the user consider representing the taxonomy as a dimension column instead of entity types, but the import proceeds regardless.
Example 3 — Cross-kind reference
A concrete OWL class pointing to a pure-SKOS concept via skos:broader:
    :Account a owl:Class ;
        rdfs:label "Account" ;
        skos:broader :FinancialProduct ;
        owl:hasKey ( :account_id ) .

    :FinancialProduct a skos:Concept ;
        skos:prefLabel "Financial Product"@en .
Output:
    entities:
      - name: Account
        description: Account
        keys:
          primary: [account_id]
        properties:
          - name: account_id
            type: string
      - name: skos_FinancialProduct
        abstract: true
    relationships:
      - name: skos_broader
        abstract: true
        from: Account
        to: skos_FinancialProduct
Concrete entity, abstract entity, abstract relationship. The name prefix travels with the element wherever it's referenced, so provenance is visible at every site.
CLI additions
Only one new flag:
--language <tag> — language tag used to select labels and notes (default en).
Everything else is default behavior. Pure SKOS concepts are imported as abstract entities, SKOS graph predicates become abstract relationships, SKOS literals become annotations. No flag gates SKOS support on or off, and no flag converts SKOS claims into structural claims — consistent with the "informational by default" principle. If you want formal subsumption, write rdfs:subClassOf in your source.
The command itself stays gm import-owl — same input pipeline, same namespace filter.
Drop summary additions
The existing stderr drop summary gets new lines for:
- SKOS predicates seen and mapped to annotations
- Labels and notes discarded by --language selection
- skos:*Match targets outside included namespaces, preserved as annotations
- Taxonomy-shaped imports (all-abstract entities) with a hint to consider dimension columns
Fix for an existing bug
docs/ontology/owl-import.md:168-170 states that unknown literal-valued predicates should be preserved as annotations: {iri: value}. The code does not implement this generically — only a fixed OWL allowlist is handled. As part of this work, a generic literal-annotation pass will be added, closing the silent-drop gap for any RDF predicate (SKOS, Dublin Core, custom).
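The generic pass might look something like the sketch below. It is illustrative only: literal_annotations is a hypothetical function, triples are pre-flattened tuples rather than rdflib terms, predicates are written as CURIEs for readability where the design doc specifies IRIs as keys, and a repeated predicate simply overwrites the earlier value. The sample values are made up.

```python
def literal_annotations(triples, handled):
    """Collect literal-valued statements whose predicate has no dedicated mapping.

    triples: iterable of (subject, predicate, value, is_literal) tuples
    handled: predicates that already populate a dedicated field
    """
    result = {}
    for subject, predicate, value, is_literal in triples:
        if not is_literal:
            continue  # IRI-valued statements are not literal annotations
        if predicate in handled:
            continue  # already consumed by a dedicated mapping
        result.setdefault(subject, {})[predicate] = value
    return result


triples = [
    ("Account", "rdfs:label", "Account", True),
    ("Account", "skos:definition", "A record of financial transactions.", True),
    ("Account", "dcterms:creator", "Finance team", True),  # illustrative value
    ("Account", "skos:related", "Ledger", False),
]
print(literal_annotations(triples, handled={"rdfs:label"}))
```

Because the pass keys on "literal and not otherwise handled" rather than on a vocabulary allowlist, SKOS, Dublin Core, and custom predicates all take the same path.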
Open questions — feedback wanted
- Is "informational by default" the right stance? Or should skos:broader map to extends by default, since hybrid ontology authors sometimes conflate the two? (Proposal leans conservative: SKOS never maps to structural fields.)
- Is abstract: true the right name and shape? Alternatives considered: bindable: false, informational: true, splitting entities into two types. The abstract name parallels OO semantics most closely.
- Relaxing relationship uniqueness to (name, from, to) for abstract relationships only is required to emit multiple skos_broader edges without synthetic naming. Analysis above argues this doesn't affect DDL, binding, or GQL. Anyone see a downstream assumption this would break?
- Is the skos_ name prefix the right choice? Alternative: no prefix, trust abstract: true alone to signal informational status. Pro-prefix: greppability, collision safety, provenance visible at every reference site. Pro-no-prefix: cleaner names in GQL queries and bindings, less visual noise, redundant with abstract: true.

  Why there isn't a --preserve-skos-prefix flag. A flag feels cheap but fragments the tool's output: teams on the same project end up with different conventions in committed YAML, and downstream tooling (diffs, audit scripts, code review) has to handle both. We'd rather pick one default based on feedback here than punt the decision to every user. If this thread surfaces a roughly even split between prefix and no-prefix camps, a flag becomes justified and we'll add it; until then, post-import renaming via sd/sed covers the escape hatch.
- Should skos:prefLabel ever populate description when rdfs:label is absent? Current proposal says no: pure-SKOS entities get empty description and the label lives in synonyms. A one-level fallback would be simpler than the earlier four-way proposal and would prevent awkward-looking empty fields. Open to either.
- Multilingual handling currently proposes one selected language populates standard fields and others go to language-suffixed annotations (e.g., skos:prefLabel@fr). Alternative: all languages in synonyms with @lang tags preserved. Preference?
- Scope: should Dublin Core (dc:title, dc:description, dcterms:creator, etc.) be handled in the same pass, since it shows up in many of the same files? Or deferred to a follow-up?
- Naming of CLI flag. --language is open to bikeshedding.
- Should we ever support an opt-in flag to promote SKOS to structural semantics? The current proposal says no: SKOS is informational, full stop. A future --promote-skos-broader flag was considered and dropped for simplicity. Revisit if real users request it.
- Should there be a flag to opt out of SKOS import entirely? For users who want OWL-only behavior, --no-skos could suppress all SKOS processing. Dropped from the initial proposal to keep surface minimal.
Please comment if you have opinions or real-world SKOS+OWL ontologies you'd like to see handled.