
Feat: SKOS import support alongside OWL (design proposal — feedback wanted) #57

@libei


Status: Design proposal. Not yet implemented. Comments welcome — especially on the "Open questions" section at the bottom.

Goal

Many real-world ontologies mix OWL (formal classes and properties) with SKOS (human-readable labels, taxonomies, cross-vocabulary mappings). The current gm import-owl importer recognizes SKOS only partially: skos:altLabel and skos:prefLabel populate synonyms, and everything else (skos:definition, skos:broader, skos:scopeNote, skos:notation, skos:exactMatch, standalone skos:Concept resources, etc.) is silently dropped.

This issue proposes first-class SKOS support in the same gm import-owl command, so users importing FIBO, SNOMED excerpts, library vocabularies, and other hybrid OWL+SKOS sources don't lose descriptive metadata or taxonomy structure.

Audit of current behavior

On main (src/bigquery_ontology/owl_importer.py):

  • skos:prefLabel, skos:altLabel → synonyms (lines 213–218)
  • All other SKOS predicates → silently dropped (no code path reads them)
  • Standalone skos:Concept (no owl:Class typing) → not imported (entity iteration is g.subjects(RDF.type, OWL.Class) only, line 350)
  • Design doc docs/ontology/owl-import.md:168-170 claims literal annotations are preserved as annotations: {iri: value}, but the implementation only covers a fixed OWL allowlist — SKOS literals fall through unhandled
  • No tests or fixtures cover SKOS input
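To make the entity-iteration gap concrete, here is a minimal sketch of how discovery could widen from g.subjects(RDF.type, OWL.Class) to also pick up skos:Concept. So that it runs standalone, it works over plain (subject, predicate, object) tuples of CURIE strings instead of an rdflib Graph. discover_entities is a hypothetical name, not existing importer code.

```python
from typing import Iterable

Triple = tuple[str, str, str]


def discover_entities(triples: Iterable[Triple]) -> dict[str, bool]:
    """Map each entity subject to True if abstract (pure SKOS), else False.

    Hypothetical helper: anything typed owl:Class is concrete, even when it is
    also a skos:Concept; anything typed only skos:Concept is abstract.
    """
    owl_classes: set[str] = set()
    skos_concepts: set[str] = set()
    for subject, predicate, obj in triples:
        if predicate != "rdf:type":
            continue
        if obj == "owl:Class":
            owl_classes.add(subject)
        elif obj == "skos:Concept":
            skos_concepts.add(subject)
    entities = {s: False for s in sorted(owl_classes)}
    for s in sorted(skos_concepts - owl_classes):
        entities[s] = True
    return entities


triples = [
    (":Account", "rdf:type", "owl:Class"),
    (":Account", "rdf:type", "skos:Concept"),
    (":Banking", "rdf:type", "skos:Concept"),
]
print(discover_entities(triples))  # {':Account': False, ':Banking': True}
```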

Guiding principle: informational by default

SKOS and OWL make different kinds of claims, and the importer should preserve the distinction:

  • OWL makes formal claims. rdfs:subClassOf means every instance of the child is an instance of the parent. Drives inheritance, keys, substitutability.
  • SKOS makes informational claims. :A skos:broader :B means "A is a narrower topic than B", which the W3C SKOS Primer explicitly distinguishes from subsumption. Drives documentation, browsing, search.

Consequences for the importer:

  1. OWL constructs flow into structural fields (extends, from/to, keys, properties).
  2. SKOS graph-shaped predicates (skos:broader, skos:related, skos:*Match) flow into abstract relationships — edges declared in the ontology but not expected to be bound to BigQuery tables.
  3. SKOS literal predicates (skos:definition, skos:notation, skos:scopeNote, etc.) flow into annotations.
  4. Nothing from SKOS flows into extends or description. The ontology never claims inheritance or human-readable descriptions that the source author didn't assert via OWL/RDFS core vocabulary.

Proposed model extensions

Two small, symmetric additions:

1. abstract: bool = False on Entity and Relationship

An abstract entity is declared but not bound to a BigQuery table. Primary key not required. gm validate does not demand one; gm bind rejects bindings that target it.

An abstract relationship is declared but not bound. It describes the graph without pretending to be backed by rows.

Rule: a concrete relationship cannot have an abstract endpoint (nothing to bind). An abstract relationship may have any combination of endpoints.
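A minimal sketch of the endpoint rule, assuming simplified dataclasses in place of the real Entity and Relationship models (from_ stands in for from, which is a Python keyword). check_endpoints and the holds_product relationship are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    abstract: bool = False  # declared but never bound to a BigQuery table


@dataclass
class Relationship:
    name: str
    from_: str  # "from" is a reserved word in Python
    to: str
    abstract: bool = False


def check_endpoints(entities: list[Entity], relationships: list[Relationship]) -> list[str]:
    """Report concrete relationships that touch an abstract entity."""
    abstract_names = {e.name for e in entities if e.abstract}
    errors = []
    for rel in relationships:
        if rel.abstract:
            continue  # abstract relationships may connect any endpoints
        for endpoint in (rel.from_, rel.to):
            if endpoint in abstract_names:
                errors.append(f"{rel.name}: concrete relationship has abstract endpoint {endpoint}")
    return errors


entities = [Entity("Account"), Entity("skos_FinancialProduct", abstract=True)]
relationships = [
    Relationship("skos_broader", "Account", "skos_FinancialProduct", abstract=True),
    Relationship("holds_product", "Account", "skos_FinancialProduct"),  # hypothetical
]
print(check_endpoints(entities, relationships))
```

Only holds_product is reported: it is concrete, so it would need a binding, but one of its endpoints has no table to bind to.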

2. Relaxed name uniqueness for abstract relationships

Today, relationship names must be globally unique in the ontology (ontology_loader.py:80, rule 1 in ontology.md). This prevents emitting multiple skos_broader edges with different endpoints.

Proposed change, scoped narrowly:

  • Concrete relationships: (name,) must be unique. Unchanged.
  • Abstract relationships: (name, from, to) must be unique. A single predicate like skos_broader can repeat across endpoint pairs.
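The scoped uniqueness rule amounts to choosing a different key per relationship kind. A minimal sketch, assuming a simplified Rel tuple; uniqueness_key, duplicate_keys, and the owns relationship are hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Rel(NamedTuple):
    name: str
    src: str
    dst: str
    abstract: bool = False


def uniqueness_key(rel: Rel) -> tuple[str, ...]:
    # Concrete: name alone. Abstract: (name, from, to).
    return (rel.name, rel.src, rel.dst) if rel.abstract else (rel.name,)


def duplicate_keys(rels: list[Rel]) -> list[tuple[str, ...]]:
    counts = Counter(uniqueness_key(r) for r in rels)
    return sorted(key for key, n in counts.items() if n > 1)


rels = [
    Rel("skos_broader", "skos_RetailBanking", "skos_Banking", abstract=True),
    Rel("skos_broader", "skos_InvestmentBanking", "skos_Banking", abstract=True),
    Rel("owns", "Customer", "Account"),  # hypothetical concrete relationship
    Rel("owns", "Customer", "Ledger"),   # same name: still rejected
]
print(duplicate_keys(rels))  # [('owns',)]
```

Because concrete keys have one element and abstract keys three, this sketch never flags a concrete and an abstract relationship that share a name; the proposal does not yet say whether that combination should be allowed.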

Why this scoping is safe:

  • compile_graph only emits DDL for elements that have bindings. Abstract relationships have no bindings, so they never reach the DDL emitter. Edge-table naming in CREATE PROPERTY GRAPH is unaffected.
  • Binding resolution targets concrete relationships, whose names are still globally unique.
  • GQL queries over the generated property graph reference concrete relationship names, still unique.

In short, the relaxation only affects the informational/documentation surface of the ontology. Structural, bindable, and DDL-facing surfaces are untouched.
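The safety argument rests on abstract relationships never reaching the DDL emitter. A minimal sketch of that filter, assuming relationships arrive as (name, abstract) pairs and bindings as a dict from concrete relationship name to table; ddl_relationships is hypothetical and does not mirror compile_graph's actual signature.

```python
def ddl_relationships(relationships: list[tuple[str, bool]], bindings: dict[str, str]) -> list[str]:
    """Return the relationship names that would be emitted as edge tables."""
    emitted = []
    for name, abstract in relationships:
        if abstract:
            continue  # no binding can exist (gm bind rejects it), so nothing to emit
        if name in bindings:
            emitted.append(name)
    # Only concrete names survive; the validator already keeps those unique.
    return emitted


relationships = [("skos_broader", True), ("skos_broader", True), ("owns", False)]
bindings = {"owns": "my_dataset.owns"}  # hypothetical binding
print(ddl_relationships(relationships, bindings))  # ['owns']
```

The repeated skos_broader entries are dropped before any naming decision is made, which is why edge-table names in CREATE PROPERTY GRAPH are unaffected.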

Naming convention

Provenance is encoded in names so that a reader can tell at a glance which elements are SKOS-derived (informational) versus OWL-derived (structural):

  • Pure SKOS entity (resource typed only as skos:Concept): name prefixed skos_ (e.g., skos_Banking, skos_RetailBanking).
  • Mixed OWL+SKOS entity (same resource typed as both): name unprefixed (e.g., Account). OWL won the structure, so it wins the name.
  • SKOS relationship (always abstract, derived from skos:broader, skos:related, skos:*Match): name prefixed skos_ (e.g., skos_broader, skos_related, skos_exactMatch).
  • OWL relationship: unprefixed. Unchanged.
  • Annotation keys: keep the skos: colon-style prefix (e.g., skos:definition, skos:notation). This matches the existing owl: convention in annotations and preserves RDF provenance for potential round-trip tooling.

The rule reduces to: skos_ name prefix = "this element is purely SKOS-sourced and informational." Unprefixed = has OWL structure (or is a core ontology element) and can be trusted as formal.
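The entity half of the naming rule is small enough to state as code. A minimal sketch, assuming the caller already has the resource's local name and its rdf:type values as CURIEs; entity_name and is_informational are hypothetical helpers.

```python
def entity_name(local_name: str, types: set[str]) -> str:
    """Apply the proposed prefix rule to an imported resource."""
    if "owl:Class" in types:
        return local_name  # OWL supplied the structure, so it keeps the plain name
    if "skos:Concept" in types:
        return "skos_" + local_name  # purely SKOS-sourced and informational
    raise ValueError(f"{local_name!r} is neither an owl:Class nor a skos:Concept")


def is_informational(name: str) -> bool:
    """The reader-facing rule: a skos_ prefix marks a purely SKOS-sourced element."""
    return name.startswith("skos_")


print(entity_name("Account", {"owl:Class", "skos:Concept"}))  # Account
print(entity_name("Banking", {"skos:Concept"}))               # skos_Banking
print(is_informational("skos_broader"))                       # True
```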

A note on the dual prefix form. SKOS provenance appears as skos_ in names and skos: in annotation keys. The split isn't stylistic — it reflects the identifier/metadata boundary: names flow into YAML parse keys, BigQuery labels, GQL syntax, and Python/SQL code, where colons are unsafe or already syntactic. Annotation keys are free-form map keys where colons parse cleanly and where owl: already sets the convention. Any tool bridging RDF and code hits the same boundary (JSON-LD contexts, rdflib's CURIE-to-attribute translation, protobuf-to-Python name generation). Learn the split once and it stays out of the way.
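The boundary can be checked mechanically. A minimal sketch using a simplified identifier pattern (letter or underscore, then letters, digits, underscores), which approximates unquoted BigQuery identifiers but ignores quoting and GQL-specific details; as_identifier and is_plain_identifier are hypothetical helpers.

```python
import re

# Simplified unquoted-identifier rule; real BigQuery/GQL rules also allow
# quoted identifiers and differ in details.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def as_identifier(curie: str) -> str:
    """Name form: the CURIE colon becomes an underscore."""
    return curie.replace(":", "_")


def is_plain_identifier(text: str) -> bool:
    return PLAIN_IDENTIFIER.fullmatch(text) is not None


predicate = "skos:broader"
print(is_plain_identifier(predicate))                 # False: the colon is not allowed
print(is_plain_identifier(as_identifier(predicate)))  # True: skos_broader

# Annotation keys are ordinary map keys, so the colon form is kept as-is.
annotations = {"skos:definition": "Activities of financial institutions."}
```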

Mapping table (SKOS additions)

| SKOS construct | Ontology equivalent | Notes |
|---|---|---|
| skos:Concept (no owl:Class) | Abstract entity with skos_ name prefix | Keys not required |
| owl:Class + skos:Concept | Concrete entity, unprefixed | OWL provides structure |
| skos:ConceptScheme | Ontology-level annotation | |
| skos:prefLabel | synonyms (when ≠ name) | Label data |
| skos:altLabel, skos:hiddenLabel | synonyms | Label data |
| skos:definition | Annotation skos:definition | Not a label; preserved with provenance |
| skos:broader, skos:narrower | Abstract relationship skos_broader | Normalized: narrower → broader |
| skos:related | Abstract relationship skos_related | |
| skos:exactMatch, skos:closeMatch, skos:broadMatch, skos:narrowMatch, skos:relatedMatch | Abstract relationship skos_exactMatch, etc. | Relationship if target is an imported entity; annotation if external IRI |
| skos:notation, skos:scopeNote, skos:example, skos:historyNote, skos:editorialNote, skos:changeNote | Annotation (prefix preserved) | |
| skos:inScheme, skos:topConceptOf | Annotation (prefix preserved) | |

Rule of thumb: the predicate's object is another imported entity → relationship; otherwise → annotation.
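The table and the rule of thumb reduce to a small routing function. A minimal sketch, assuming objects arrive as strings and the caller passes the set of imported entity IRIs; a real implementation would tell literals from IRIs by rdflib node type rather than by set membership. route_skos, Edge, and Annotation are hypothetical names, and label predicates are left to the label pass.

```python
from dataclasses import dataclass
from typing import Optional, Union

LABEL_PREDICATES = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}
GRAPH_PREDICATES = {
    "skos:broader", "skos:narrower", "skos:related",
    "skos:exactMatch", "skos:closeMatch", "skos:broadMatch",
    "skos:narrowMatch", "skos:relatedMatch",
}


@dataclass(frozen=True)
class Edge:  # abstract relationship
    name: str
    src: str
    dst: str


@dataclass(frozen=True)
class Annotation:
    subject: str
    key: str  # keeps the skos: colon prefix
    value: str


def route_skos(subject: str, predicate: str, obj: str,
               imported: set[str]) -> Optional[Union[Edge, Annotation]]:
    if predicate in LABEL_PREDICATES:
        return None  # synonyms are handled by the label pass
    if predicate in GRAPH_PREDICATES and obj in imported:
        if predicate == "skos:narrower":
            # Normalize: "A narrower B" is stored as "B broader A".
            return Edge("skos_broader", obj, subject)
        return Edge("skos_" + predicate.split(":", 1)[1], subject, obj)
    # Literal values, external IRIs, and non-graph predicates become annotations.
    return Annotation(subject, predicate, obj)


imported = {":Account", ":Ledger", ":Banking", ":RetailBanking"}
print(route_skos(":Account", "skos:related", ":Ledger", imported))
print(route_skos(":Account", "skos:exactMatch",
                 "http://fibo.org/ontology/FBC/Account", imported))
print(route_skos(":Banking", "skos:narrower", ":RetailBanking", imported))
```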

Label and description handling

SKOS never populates description directly. The rules are one-to-one per predicate, no fallback chain:

  • rdfs:label → description (unchanged from today)
  • rdfs:comment → description, appended (unchanged)
  • skos:prefLabel → synonyms if different from the entity name; otherwise elided
  • skos:altLabel, skos:hiddenLabel → synonyms
  • skos:definition, skos:scopeNote, etc. → annotations with skos: prefix

Consequence: a pure-SKOS file produces entities with empty description. The definition is available in annotations.skos:definition if anyone needs it. This is intentional: promoting a skos:definition into description is the same kind of structural smuggling we rejected for skos:broader → extends.
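The one-to-one rules can be sketched as a single function. Assumptions, all hypothetical: values arrive already filtered by --language as a dict from predicate CURIE to a list of literal strings; description parts are joined with a space (the existing importer's separator may differ); and "different from the entity name" is read as a comparison against the local name that ignores case, whitespace, and underscores. That last reading is what keeps "Retail Banking" out of synonyms in Example 2; a strict string comparison would add it.

```python
import re

LABEL_SYNONYMS = ("skos:altLabel", "skos:hiddenLabel")
NON_ANNOTATION = {"rdfs:label", "rdfs:comment", "skos:prefLabel", *LABEL_SYNONYMS}


def _normalize(text: str) -> str:
    # Assumed comparison: case-insensitive, ignoring whitespace and underscores.
    return re.sub(r"[\s_]+", "", text).lower()


def apply_label_rules(local_name: str, values: dict[str, list[str]]) -> dict:
    """values holds literal-valued predicates only (graph edges are routed elsewhere)."""
    # rdfs:label, then rdfs:comment appended: unchanged from today.
    description = " ".join(values.get("rdfs:label", []) + values.get("rdfs:comment", []))
    # skos:prefLabel becomes a synonym only when it differs from the name.
    synonyms = [v for v in values.get("skos:prefLabel", [])
                if _normalize(v) != _normalize(local_name)]
    for predicate in LABEL_SYNONYMS:
        synonyms.extend(values.get(predicate, []))
    # Every other SKOS literal (definition, notation, notes) becomes an annotation.
    annotations = {p: (vs[0] if len(vs) == 1 else vs)
                   for p, vs in values.items()
                   if p.startswith("skos:") and p not in NON_ANNOTATION}
    return {"description": description, "synonyms": synonyms, "annotations": annotations}


print(apply_label_rules("RetailBanking", {
    "skos:prefLabel": ["Retail Banking"],
    "skos:altLabel": ["Consumer Banking"],
    "skos:notation": ["RB"],
}))
# {'description': '', 'synonyms': ['Consumer Banking'], 'annotations': {'skos:notation': 'RB'}}
```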

Multilingual labels are selected by --language (default en). Non-selected languages are preserved as language-suffixed annotations (e.g., skos:prefLabel@fr: Chat).
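A minimal sketch of the language split for one predicate. Assumptions, hypothetical: untagged literals (such as skos:notation "RB") always count as selected, and tags match exactly and case-insensitively, so en-GB does not match en. select_language is not an existing function.

```python
from typing import Optional, Union


def select_language(predicate: str, values: list[tuple[str, Optional[str]]],
                    language: str = "en") -> tuple[list[str], dict[str, Union[str, list[str]]]]:
    """Split one predicate's values into selected texts and '@lang' annotations."""
    selected: list[str] = []
    others: dict[str, list[str]] = {}
    for text, lang in values:
        if lang is None or lang.lower() == language.lower():
            selected.append(text)
        else:
            others.setdefault(f"{predicate}@{lang}", []).append(text)
    # Single values are stored as scalars, matching skos:prefLabel@fr: Chat.
    annotations = {k: v[0] if len(v) == 1 else v for k, v in others.items()}
    return selected, annotations


selected, extra = select_language("skos:prefLabel", [("Cat", "en"), ("Chat", "fr")])
print(selected)  # ['Cat']
print(extra)     # {'skos:prefLabel@fr': 'Chat'}
```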

Examples

Example 1 — Mixed OWL + SKOS

Input:

```turtle
:Account a owl:Class, skos:Concept ;
    rdfs:label "Account" ;
    skos:altLabel "Acct"@en ;
    skos:definition "A record of financial transactions."@en ;
    skos:related :Ledger ;
    skos:exactMatch <http://fibo.org/ontology/FBC/Account> ;
    owl:hasKey ( :account_id ) .

:Ledger a owl:Class ;
    rdfs:label "Ledger" ;
    owl:hasKey ( :ledger_id ) .
```

Output:

```yaml
entities:
  - name: Account
    description: Account
    synonyms: [Acct]
    keys:
      primary: [account_id]
    properties:
      - name: account_id
        type: string
    annotations:
      skos:definition: A record of financial transactions.
      skos:exactMatch: http://fibo.org/ontology/FBC/Account

  - name: Ledger
    description: Ledger
    keys:
      primary: [ledger_id]
    properties:
      - name: ledger_id
        type: string

relationships:
  - name: skos_related
    abstract: true
    from: Account
    to: Ledger
```

Concrete entities (OWL built them, names unprefixed), abstract skos_related relationship (SKOS is informational), literal annotations for the definition and cross-vocab match.

Example 2 — Pure SKOS taxonomy

Input:

```turtle
:Banking a skos:Concept ;
    skos:prefLabel "Banking"@en ;
    skos:definition "Activities of financial institutions."@en .

:RetailBanking a skos:Concept ;
    skos:prefLabel "Retail Banking"@en ;
    skos:altLabel "Consumer Banking"@en ;
    skos:broader :Banking ;
    skos:notation "RB" .

:InvestmentBanking a skos:Concept ;
    skos:prefLabel "Investment Banking"@en ;
    skos:broader :Banking .
```

Output:

```yaml
entities:
  - name: skos_Banking
    abstract: true
    annotations:
      skos:definition: Activities of financial institutions.

  - name: skos_InvestmentBanking
    abstract: true

  - name: skos_RetailBanking
    abstract: true
    synonyms: [Consumer Banking]
    annotations:
      skos:notation: RB

relationships:
  - name: skos_broader
    abstract: true
    from: skos_InvestmentBanking
    to: skos_Banking

  - name: skos_broader
    abstract: true
    from: skos_RetailBanking
    to: skos_Banking
```

All entities are prefixed (pure SKOS, informational), all relationships are skos_broader with different endpoints (legal under the scoped uniqueness relaxation), and description is empty on every entity — the SKOS source offered no rdfs:label or rdfs:comment. A stderr hint suggests the user consider representing the taxonomy as a dimension column instead of entity types, but the import proceeds regardless.

Example 3 — Cross-kind reference

A concrete OWL class pointing to a pure-SKOS concept via skos:broader:

```turtle
:Account a owl:Class ;
    rdfs:label "Account" ;
    skos:broader :FinancialProduct ;
    owl:hasKey ( :account_id ) .

:FinancialProduct a skos:Concept ;
    skos:prefLabel "Financial Product"@en .
```

Output:

```yaml
entities:
  - name: Account
    description: Account
    keys:
      primary: [account_id]
    properties:
      - name: account_id
        type: string

  - name: skos_FinancialProduct
    abstract: true

relationships:
  - name: skos_broader
    abstract: true
    from: Account
    to: skos_FinancialProduct
```

Concrete entity, abstract entity, abstract relationship. The name prefix travels with the element wherever it's referenced, so provenance is visible at every site.

CLI additions

Only one new flag:

  • --language <tag> — language tag used to select labels and notes (default en).

Everything else is default behavior. Pure SKOS concepts are imported as abstract entities, SKOS graph predicates become abstract relationships, SKOS literals become annotations. No flag gates SKOS support on or off, and no flag converts SKOS claims into structural claims — consistent with the "informational by default" principle. If you want formal subsumption, write rdfs:subClassOf in your source.

The command itself stays gm import-owl — same input pipeline, same namespace filter.
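For illustration only, the single new option expressed with argparse. The real gm CLI may use a different framework, and the positional source argument here is a hypothetical stand-in for whatever inputs import-owl already takes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the existing `gm import-owl` parser; only the
    # proposed --language option is shown.
    parser = argparse.ArgumentParser(prog="gm import-owl")
    parser.add_argument("source", help="OWL/SKOS input file")
    parser.add_argument("--language", default="en", metavar="TAG",
                        help="language tag used to select labels and notes (default: en)")
    return parser


args = build_parser().parse_args(["fibo.ttl"])
print(args.language)  # en
```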

Drop summary additions

The existing stderr drop summary gets new lines for:

  • SKOS predicates seen and mapped to annotations
  • Labels and notes in languages not selected by --language (moved to language-suffixed annotations)
  • skos:*Match targets outside included namespaces, preserved as annotations
  • Taxonomy-shaped imports (all-abstract entities) with a hint to consider dimension columns
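A minimal sketch of how the new summary lines could be accumulated during import and printed to stderr at the end. SkosDropSummary and the exact message wording are hypothetical, not the existing drop-summary format.

```python
import sys
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional, TextIO


@dataclass
class SkosDropSummary:
    annotated: Counter = field(default_factory=Counter)  # SKOS predicate -> value count
    language_suffixed: int = 0  # values kept as '<predicate>@<lang>' annotations
    external_matches: int = 0   # skos:*Match targets outside included namespaces
    entities: int = 0
    abstract_entities: int = 0

    def lines(self) -> list[str]:
        out = [f"{pred}: {n} value(s) mapped to annotations"
               for pred, n in sorted(self.annotated.items())]
        if self.language_suffixed:
            out.append(f"{self.language_suffixed} label(s)/note(s) outside --language "
                       "kept as language-suffixed annotations")
        if self.external_matches:
            out.append(f"{self.external_matches} skos:*Match target(s) outside included "
                       "namespaces kept as annotations")
        if self.entities and self.abstract_entities == self.entities:
            out.append("hint: every imported entity is abstract (taxonomy-shaped); "
                       "consider a dimension column instead of entity types")
        return out

    def emit(self, stream: Optional[TextIO] = None) -> None:
        target = sys.stderr if stream is None else stream
        for line in self.lines():
            print(line, file=target)


# Counts matching Example 2: one definition, one notation, three abstract entities.
summary = SkosDropSummary(entities=3, abstract_entities=3)
summary.annotated.update(["skos:definition", "skos:notation"])
summary.emit()
```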

Fix for an existing bug

docs/ontology/owl-import.md:168-170 states that unknown literal-valued predicates should be preserved as annotations: {iri: value}. The code does not implement this generically — only a fixed OWL allowlist is handled. As part of this work, a generic literal-annotation pass will be added, closing the silent-drop gap for any RDF predicate (SKOS, Dublin Core, custom).
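A minimal sketch of the generic pass, assuming literals are wrapped in a small Lit type (a stand-in for rdflib.Literal) while IRIs stay plain strings, and that earlier passes report which predicates they already consumed. Repeated values become lists; language selection is left to the separate --language step. literal_annotations, Lit, and the ex:tag predicate are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Union


@dataclass(frozen=True)
class Lit:
    """Stand-in for an RDF literal; bare strings are treated as IRIs."""
    value: str
    lang: Optional[str] = None


Node = Union[str, Lit]


def literal_annotations(triples: Iterable[tuple[str, str, Node]],
                        handled: set[str]) -> dict[str, dict[str, object]]:
    """Collect literal-valued triples that no earlier pass consumed."""
    result: dict[str, dict[str, object]] = {}
    for subject, predicate, obj in triples:
        if predicate in handled or not isinstance(obj, Lit):
            continue  # already mapped, or an IRI (relationship territory)
        bucket = result.setdefault(subject, {})
        if predicate not in bucket:
            bucket[predicate] = obj.value
        elif isinstance(bucket[predicate], list):
            bucket[predicate].append(obj.value)
        else:
            bucket[predicate] = [bucket[predicate], obj.value]
    return result


triples = [
    (":Account", "rdfs:label", Lit("Account", "en")),      # handled elsewhere
    (":Account", "dcterms:creator", Lit("Finance team")),  # example value
    (":Account", "skos:related", ":Ledger"),               # IRI: skipped
]
print(literal_annotations(triples, handled={"rdfs:label"}))
# {':Account': {'dcterms:creator': 'Finance team'}}
```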

Open questions — feedback wanted

  1. Is "informational by default" the right stance? Or should skos:broader map to extends by default, since hybrid ontology authors sometimes conflate the two? (Proposal leans conservative: SKOS never maps to structural fields.)

  2. Is abstract: true the right name and shape? Alternatives considered: bindable: false, informational: true, splitting entities into two types. The abstract name parallels OO semantics most closely.

  3. Relaxing relationship uniqueness to (name, from, to) for abstract relationships only is required to emit multiple skos_broader edges without synthetic naming. Analysis above argues this doesn't affect DDL, binding, or GQL. Anyone see a downstream assumption this would break?

  4. Is the skos_ name prefix the right choice? Alternative: no prefix, trust abstract: true alone to signal informational status. Pro-prefix: greppability, collision safety, provenance visible at every reference site. Pro-no-prefix: cleaner names in GQL queries and bindings, less visual noise, redundant with abstract: true.

    Why there isn't a --preserve-skos-prefix flag. A flag feels cheap but fragments the tool's output: teams on the same project end up with different conventions in committed YAML, and downstream tooling (diffs, audit scripts, code review) has to handle both. We'd rather pick one default based on feedback here than punt the decision to every user. If this thread surfaces a roughly even split between prefix and no-prefix camps, a flag becomes justified and we'll add it; until then, post-import renaming via sd/sed serves as the escape hatch.

  5. Should skos:prefLabel ever populate description when rdfs:label is absent? Current proposal says no: pure-SKOS entities get empty description and the label lives in synonyms. A one-level fallback would be simpler than the earlier four-way proposal and would prevent awkward-looking empty fields. Open to either.

  6. Multilingual handling currently proposes one selected language populates standard fields and others go to language-suffixed annotations (e.g., skos:prefLabel@fr). Alternative: all languages in synonyms with @lang tags preserved. Preference?

  7. Scope: should Dublin Core (dc:title, dc:description, dcterms:creator, etc.) be handled in the same pass, since it shows up in many of the same files? Or deferred to a follow-up?

  8. Naming of CLI flag. --language — open to bikeshedding.

  9. Should we ever support an opt-in flag to promote SKOS to structural semantics? The current proposal says no: SKOS is informational, full stop. A future --promote-skos-broader flag was considered and dropped for simplicity. Revisit if real users request it.

  10. Should there be a flag to opt out of SKOS import entirely? For users who want OWL-only behavior, --no-skos could suppress all SKOS processing. Dropped from the initial proposal to keep surface minimal.

Please comment if you have opinions or real-world SKOS+OWL ontologies you'd like to see handled.
