Morpheus Graph DSL Reference

Morpheus exposes seven high-level DSL endpoints that cover common graph analytics patterns. All endpoints accept and return JSON. Any request body may include a database field to target a specific named database. Alternatively, the database can be selected with the X-Database request header or in the URL path (see Database Selection).


Table of Contents

  1. Common Building Blocks
  2. Query DSL — POST /v1/dsl/query
  3. Subgraph DSL — POST /v1/dsl/graph/subgraph
  4. MST DSL — POST /v1/dsl/graph/mst
  5. Shortest Path DSL — POST /v1/dsl/graph/path
  6. Spreading Activation DSL — POST /v1/dsl/graph/activation
  7. Similarity DSL — POST /v1/dsl/graph/similarity
  8. GraphRAG Context DSL — POST /v1/dsl/graph/context
  9. Typed Value Encoding
  10. Database Selection

Common Building Blocks

Predicate

Predicates appear in match (pre-traversal seed filter) and where (post-traversal row filter). They compose recursively.

// Simple comparison
{ "field": "age", "op": ">", "value": 30 }

// Boolean AND
{ "and": [
    { "field": "status", "op": "=", "value": "active" },
    { "field": "score",  "op": ">=", "value": 0.8 }
]}

// Boolean OR
{ "or": [
    { "field": "role", "op": "=", "value": "admin" },
    { "field": "role", "op": "=", "value": "owner" }
]}

// Negation
{ "not": { "field": "deleted", "op": "is_not_null" } }

Comparison operators:

op           Meaning
=            Equal
!=           Not equal
>            Greater than
>=           Greater than or equal
<            Less than
<=           Less than or equal
contains     String/list contains value
in           Value is in the provided array
is_null      Field is absent / null
is_not_null  Field is present and not null

value is omitted for is_null / is_not_null.

Field paths use dot-notation to reference a traversal alias:

  • "name" — field name on the root vertex (no alias prefix)
  • "friend.name" — field name on the vertex bound to alias friend
  • "edge_alias.weight" — field on an edge bound via edge_as

Only a single dot is supported (no deeper nesting).
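The following Python sketch is one way to read these predicate semantics. It is not the engine's implementation, and `evaluate` is a hypothetical helper name. It rests on three assumptions:

- Each row is a flat dict keyed by the same field paths the DSL uses ("name", "friend.name").
- A missing key behaves like null.
- Comparisons against a null field evaluate to false.

```python
from typing import Any

_MISSING = object()


def _lookup(row: dict[str, Any], path: str) -> Any:
    # Assumption: rows are flat dicts keyed by DSL paths like "name" or
    # "friend.name"; only a single dot is allowed, as the reference states.
    if path.count(".") > 1:
        raise ValueError(f"nested field path not supported: {path}")
    return row.get(path, _MISSING)


def evaluate(pred: dict[str, Any], row: dict[str, Any]) -> bool:
    """Recursively evaluate a Morpheus-style predicate against one row (sketch)."""
    if "and" in pred:
        return all(evaluate(p, row) for p in pred["and"])
    if "or" in pred:
        return any(evaluate(p, row) for p in pred["or"])
    if "not" in pred:
        return not evaluate(pred["not"], row)

    op = pred["op"]
    actual = _lookup(row, pred["field"])
    present = actual is not _MISSING and actual is not None
    if op == "is_null":
        return not present
    if op == "is_not_null":
        return present
    if not present:
        return False  # assumption: any comparison against null is false

    expected = pred["value"]
    if op == "=":
        return actual == expected
    if op == "!=":
        return actual != expected
    if op == ">":
        return actual > expected
    if op == ">=":
        return actual >= expected
    if op == "<":
        return actual < expected
    if op == "<=":
        return actual <= expected
    if op == "contains":
        return expected in actual  # substring for strings, membership for lists
    if op == "in":
        return actual in expected
    raise ValueError(f"unknown op: {op}")


row = {"status": "active", "score": 0.92, "friend.name": "Bob"}
print(evaluate({"field": "friend.name", "op": "in", "value": ["Bob", "Carol"]}, row))  # True
```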


SearchClause

Full-text and/or semantic search against an indexed field.

{
    "field": "content",         // must have a fulltext or embedding index
    "text": "transformer model",
    "top_k": 50,                // optional, defaults to engine default
    "mode": "hybrid"            // optional: "fulltext" | "semantic" | "hybrid"
}

When mode is omitted, the engine picks it from the field's indexes. It uses hybrid if the field has both a fulltext and an embedding index, semantic if it has only an embedding index, and fulltext if it has only a fulltext index.

Hybrid search uses Reciprocal Rank Fusion (RRF) to merge the two result lists.
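For readers unfamiliar with RRF, here is a minimal Python sketch of the standard formula: each ID scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used in the RRF literature. This reference does not say which k the engine uses, so treat it as an assumption. `rrf_merge` is a hypothetical name.

```python
from collections import defaultdict
from typing import Optional


def rrf_merge(
    rankings: list[list[str]], k: int = 60, top_k: Optional[int] = None
) -> list[tuple[str, float]]:
    """Fuse several best-first ID rankings with Reciprocal Rank Fusion (sketch)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Assumption: k = 60; the engine's actual constant is not documented here.
            scores[doc_id] += 1.0 / (k + rank)
    merged = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return merged[:top_k] if top_k is not None else merged


fulltext_hits = ["a", "b", "c"]   # best first
semantic_hits = ["c", "a", "d"]
print([doc for doc, _ in rrf_merge([fulltext_hits, semantic_hits])])
```

RRF uses only ranks, never raw scores. That is why it can merge BM25-style fulltext scores and cosine similarities that live on unrelated scales.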


TraverseDirection

"out"         — follow edges where the current vertex is the source
"in"          — follow edges where the current vertex is the target
"undirected"  — follow edges in either role

Edge Response Semantics

All graph DSL edge lists use the same contract:

  • cell_id is the base58 edge cell ID when the traversed edge has a backing body cell; otherwise it is null.
  • from / to describe the edge itself, using the stored edge orientation.
  • direction describes how that edge was traversed relative to the current vertex:
    • "out" — traversed from source to target
    • "in" — traversed from target to source
    • "none" — undirected edge
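The direction label can be sketched in a few lines of Python. The `directed` flag and the function name are hypothetical and stand in for however the engine knows that an edge is undirected. Note that from / to in the response keep the stored orientation; only direction depends on which endpoint the traversal started from.

```python
def traversal_direction(current: str, edge_from: str, edge_to: str, directed: bool = True) -> str:
    """Label how a stored edge was traversed relative to the current vertex (sketch)."""
    if not directed:
        return "none"  # hypothetical flag: undirected edge, no orientation to report
    if current == edge_from:
        return "out"   # walked source -> target
    if current == edge_to:
        return "in"    # walked target -> source
    raise ValueError("current vertex is not an endpoint of this edge")
```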

Field Projection (select / edge_select)

All graph DSL operations that return vertices or edges support optional field-projection fields at the top level of their request body:

Field        Applies to        Description
select       vertices / nodes  List of field names to include in each vertex/node object
edge_select  edges             List of field names to include in each edge object

Semantics:

  • Omitted (null / not present) — all fields are returned.
  • Empty array ([]) — all fields are returned (same as omitted).
  • Non-empty array — only the listed field names are included; unlisted fields are dropped.
// Return only "name" from vertices and "weight" from edges:
{
    "from": "Person",
    "expand": { "edges": ["Knows"] },
    "select": ["name"],
    "edge_select": ["weight"]
}

select applies to every vertex/node object in the response (seeds, context vertices, path nodes, activated nodes, similar nodes). edge_select applies to every edge object. Operations that do not produce edges (activation, similarity) ignore edge_select. cell_id is not part of fields, so it is unaffected by edge_select and still appears as either a base58 string or null.
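A minimal Python sketch of these projection rules, applied to a single vertex or edge object, follows. The helper names are hypothetical. The handling of names in select that the object does not have (they are skipped, not emitted as null) is an assumption this reference does not settle.

```python
from typing import Any, Optional


def project(fields: dict[str, Any], select: Optional[list[str]]) -> dict[str, Any]:
    """Apply select / edge_select to one "fields" object (sketch)."""
    if not select:  # None (omitted) and [] both mean "return every field"
        return dict(fields)
    wanted = set(select)
    # Assumption: names in select that the object lacks are simply skipped.
    return {name: value for name, value in fields.items() if name in wanted}


def project_edge(edge: dict[str, Any], edge_select: Optional[list[str]]) -> dict[str, Any]:
    # cell_id, from, to, direction and schema sit outside "fields",
    # so edge_select never removes them.
    return {**edge, "fields": project(edge.get("fields", {}), edge_select)}
```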


Query DSL — POST /v1/dsl/query

The general-purpose relational query over vertex/edge data. Supports filtering, fulltext/semantic search, multi-hop graph traversal, projection, aggregation, ordering, and pagination.

Request

{
    "database": "my_db",           // optional

    // Root vertex schema to start from.
    "from": "Person",

    // Pre-seed filter applied to the root schema before expansion.
    "match": { "field": "active", "op": "=", "value": true },

    // Full-text / semantic / hybrid search to rank seed vertices.
    "search": {
        "field": "bio",
        "text": "machine learning researcher",
        "top_k": 100
    },

    // Zero or more traversal steps, executed in order.
    "traverse": [
        {
            "from_alias": null,        // optional — which alias to expand from (default: root)
            "edges": ["AuthoredBy"],   // edge schema names; [] = all schemas
            "direction": "out",        // "out" | "in" | "undirected"
            "as": "paper",             // alias for the destination vertices
            "edge_as": "authorship",   // optional alias for the edge cell itself
            "max_depth": 1             // optional BFS depth (default 1)
        },
        {
            "from_alias": "paper",
            "edges": ["CitedBy"],
            "direction": "out",
            "as": "cited_paper"
        }
    ],

    // Post-traversal filter applied to fully-expanded rows.
    "where": { "field": "paper.year", "op": ">=", "value": 2020 },

    // Fields to project into the result rows. Omit for all root fields.
    "select": ["name", "paper.title", "paper.year", "authorship.weight"],

    // Grouping keys for aggregation.
    "group_by": ["name"],

    // Aggregate expressions.
    "aggregate": [
        { "fn": "count", "as": "paper_count" },
        { "fn": "avg",   "field": "paper.year", "as": "avg_year" }
    ],

    // Sort order. Can reference projected fields or aggregate aliases.
    "order_by": [
        { "field": "paper_count", "direction": "desc" },
        { "field": "name",        "direction": "asc"  }
    ],

    "limit": 20,
    "offset": 0
}

Traverse step fields

Field       Type      Default   Description
from_alias  string    root      Alias of the vertex to expand from
edges       string[]  all       Edge schema names to follow; empty = all schemas
edge        string    none      Legacy single-edge name; prefer edges
direction   string    "out"     Traversal direction
as          string    required  Alias bound to destination vertices
edge_as     string    none      Alias bound to the edge cell (allows projecting edge fields)
max_depth   usize     1         Maximum BFS depth from the current alias

Aggregate functions

fn     field required  Description
count  No              Count of rows (per group when group_by is set)
sum    Yes             Sum of a numeric field
avg    Yes             Average of a numeric field
min    Yes             Minimum
max    Yes             Maximum

When aggregate is present, group_by should list all non-aggregate projected keys. order_by may reference both projected fields and aggregate aliases.
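The grouping behavior can be illustrated with a small in-memory Python sketch. It is not how the engine executes aggregation. It assumes that rows are flat dicts keyed by field path after traversal, that null values are skipped by sum/avg/min/max, and that avg over no values yields None. `aggregate_rows` is a hypothetical name.

```python
from collections import defaultdict
from typing import Any


def aggregate_rows(
    rows: list[dict[str, Any]], group_by: list[str], aggregates: list[dict[str, str]]
) -> list[dict[str, Any]]:
    """Group flat rows by key fields and compute count/sum/avg/min/max (sketch)."""
    groups: dict[tuple, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row.get(key) for key in group_by)].append(row)

    results = []
    for key_values, members in groups.items():
        out = dict(zip(group_by, key_values))
        for agg in aggregates:
            fn, alias = agg["fn"], agg["as"]
            if fn == "count":
                out[alias] = len(members)
                continue
            # Assumption: nulls are ignored by the value-based aggregates.
            values = [m[agg["field"]] for m in members if m.get(agg["field"]) is not None]
            if fn == "sum":
                out[alias] = sum(values)
            elif fn == "avg":
                out[alias] = sum(values) / len(values) if values else None
            elif fn == "min":
                out[alias] = min(values, default=None)
            elif fn == "max":
                out[alias] = max(values, default=None)
            else:
                raise ValueError(f"unknown aggregate fn: {fn}")
        results.append(out)
    return results
```

Applying order_by afterwards is an ordinary sort on the aggregate alias. For the "Grouped aggregation" example later in this section, that is `sorted(result, key=lambda r: r["total_spent"], reverse=True)[:10]`.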

Response

{
    "mode": "graph_traversal",   // or "neb_direct" for queries with no traverse steps
    "rows": [
        {
            "id": "3mFx...",     // base58 cell ID of the root vertex (omitted in aggregate mode)
            "columns": [
                { "name": "name",        "value": "Alice"   },
                { "name": "paper.title", "value": "Attention Is All You Need" },
                { "name": "paper_count", "value": 7         }
            ]
        }
    ],
    // Only present when a search clause was used. Maps field name → id → score.
    "hit_table": {
        "bio": { "3mFx...": 0.92 }
    }
}

Query explain — POST /v1/dsl/query/explain

Same request body; returns the binding and execution plan without running the query.

Examples

Simple filter:

{
    "from": "Product",
    "match": { "field": "in_stock", "op": "=", "value": true },
    "select": ["name", "price"],
    "order_by": [{ "field": "price", "direction": "asc" }],
    "limit": 10
}

Semantic search + traversal:

{
    "from": "Document",
    "search": { "field": "embedding", "text": "knowledge graphs" },
    "traverse": [{ "edges": ["LinkedTo"], "direction": "out", "as": "related" }],
    "select": ["title", "related.title"],
    "limit": 20
}

Grouped aggregation:

{
    "from": "Order",
    "group_by": ["customer_id"],
    "aggregate": [
        { "fn": "count", "as": "total_orders" },
        { "fn": "sum",   "field": "amount", "as": "total_spent" }
    ],
    "order_by": [{ "field": "total_spent", "direction": "desc" }],
    "limit": 10
}

Subgraph DSL — POST /v1/dsl/graph/subgraph

Extracts all vertices and edges reachable from a set of seed vertices within a bounded BFS horizon.

Request

{
    "database": "my_db",

    // Root vertex schema.
    "from": "Person",

    // Filter / search to select seed vertices (same as query DSL).
    "match": { "field": "name", "op": "=", "value": "Alice" },
    "search": null,

    "expand": {
        "edges": ["Knows", "WorksWith"],  // edge schemas; [] = all
        "depth": 2,                        // BFS depth (default 1)

        // Optional Lisp predicates for pruning during BFS.
        "vertex_filter": null,
        "edge_filter": null,

        "max_vertices": 500,               // hard cap on returned vertices (default 1000)
        "include_edges": true              // whether to include edge data (default true)
    },

    // Field projection: only return these fields per vertex. Omit or [] = all fields.
    "select": ["name", "age"],

    // Field projection for edge bodies. Omit or [] = all fields.
    "edge_select": ["since"]
}

Expand clause fields

Field          Type      Default  Description
edges          string[]  all      Edge schemas to traverse
direction      string    all      Traversal direction (all directions when omitted)
depth          usize     1        BFS depth
vertex_filter  string    none     Lisp expression; vertex is pruned when it evaluates false
edge_filter    string    none     Lisp expression; edge is pruned when it evaluates false
max_vertices   usize     1000     Maximum vertices in response
include_edges  bool      true     Include edge records in response

Top-level projection fields

Field        Type      Default  Description
select       string[]  all      Vertex fields to include; null or [] = all fields
edge_select  string[]  all      Edge body fields to include; null or [] = all fields

Response

{
    "vertices": [
        {
            "id": "3mFx...",
            "schema": "Person",
            "fields": { "name": "Alice", "age": 32 }
        }
    ],
    "edges": [
        {
            "from": "3mFx...",
            "to":   "7kLp...",
            "direction": "out",
            "schema": "Knows",
            "fields": { "since": 2019 }
        }
    ]
}

Seed vertices are included in vertices. When include_edges is false, edges is an empty array.
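The expand semantics can be sketched as a bounded BFS over an in-memory adjacency list. This is an illustration only. The graph shape is hypothetical, the Lisp vertex_filter / edge_filter are replaced by Python callables, and direction handling is left out: the adjacency list is assumed to already hold the edges in the direction you want to follow. A pruned vertex is neither returned nor expanded.

```python
from collections import deque
from typing import Any, Callable, Optional

# Hypothetical in-memory graph: vertex id -> list of (edge schema, neighbor id, edge fields).
Adjacency = dict[str, list[tuple[str, str, dict[str, Any]]]]


def expand_subgraph(
    adj: Adjacency,
    seeds: list[str],
    edges: Optional[list[str]] = None,  # None or [] = all schemas
    depth: int = 1,
    max_vertices: int = 1000,
    include_edges: bool = True,
    vertex_filter: Optional[Callable[[str], bool]] = None,          # stand-in for Lisp
    edge_filter: Optional[Callable[[dict[str, Any]], bool]] = None,  # stand-in for Lisp
) -> dict[str, list]:
    allowed = set(edges) if edges else None
    vertices: list[str] = []
    seen: set[str] = set()
    found_edges: list[dict[str, Any]] = []
    queue = deque()

    for seed in seeds:  # seeds are part of the result
        if seed not in seen and len(vertices) < max_vertices:
            vertices.append(seed)
            seen.add(seed)
            queue.append((seed, 0))

    while queue:
        vertex, level = queue.popleft()
        if level >= depth:
            continue  # reached the BFS horizon: keep the vertex, do not expand it
        for schema, neighbor, fields in adj.get(vertex, []):
            if allowed is not None and schema not in allowed:
                continue
            edge = {"from": vertex, "to": neighbor, "schema": schema, "fields": fields}
            if edge_filter is not None and not edge_filter(edge):
                continue
            if neighbor not in seen:
                if len(vertices) >= max_vertices:
                    continue  # hard cap reached
                if vertex_filter is not None and not vertex_filter(neighbor):
                    continue  # pruned vertex: not returned, not expanded
                vertices.append(neighbor)
                seen.add(neighbor)
                queue.append((neighbor, level + 1))
            if include_edges:
                found_edges.append(edge)
    return {"vertices": vertices, "edges": found_edges if include_edges else []}
```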

Example

{
    "from": "Article",
    "search": { "field": "content", "text": "neural network" },
    "expand": {
        "edges": ["References"],
        "direction": "out",
        "depth": 2,
        "max_vertices": 200
    }
}

MST DSL — POST /v1/dsl/graph/mst

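Builds a rooted tree (arborescence) subgraph from one or more seed vertices using the GAS SSSP-with-parent program. Despite the endpoint name, the current implementation does not compute a minimum spanning tree or a minimum-cost arborescence. It returns the rooted shortest-path forest induced by the configured traversal weights, in which each reached vertex keeps the edge on its cheapest path from a seed. A shortest-path forest minimizes each vertex's distance from its seed rather than the total edge weight of the tree, so the two results can differ.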

Request

{
    "database": "my_db",

    // Seed vertex schema.
    "from": "Person",

    // Seed selection predicate.
    "match": { "field": "_cell_id", "op": "=", "value": "3mFx..." },

    "via": {
        "edges": ["Knows", "Mentors"],
        "direction": "out",
        "max_depth": 6,
        "weight": {
            "schema": {
                "Knows": 1.0,
                "Mentors": 0.5
            }
        }
    },

    "select": ["name"],
    "edge_select": ["since"]
}

Response

Returns the same shape as subgraph:

{
    "vertices": [
        {
            "id": "3mFx...",
            "schema": "Person",
            "fields": { "name": "Alice" }
        }
    ],
    "edges": [
        {
            "from": "3mFx...",
            "to": "7nPq...",
            "direction": "out",
            "schema": "Mentors",
            "fields": { "since": 2021 }
        }
    ]
}

Notes

  • mst is rooted: it starts from the seed set resolved by from + match.
  • For a single seed, the result is a rooted shortest-path arborescence.
  • For multiple seeds, the result is the union of the rooted shortest-path forest.
  • Seed filtering happens before GAS starts: use match (and root search, when present) to choose the starting vertices.
  • via.vertex_filter is evaluated during GAS traversal on the current frontier vertex before its outgoing edges are expanded. A reached vertex can still appear in the result even if it is later prevented from expanding further.
  • via.edge_filter is evaluated during GAS traversal on each candidate edge before that edge is exposed to scatter.
  • via.max_visited is still rejected for this endpoint because the current GAS execution path does not support a traversal-wide visit budget.
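The idea behind SSSP-with-parent can be sketched sequentially as a multi-source Dijkstra that records each vertex's parent edge. Morpheus runs a distributed GAS program instead; this Python version only illustrates the resulting forest. It assumes non-negative weights and treats a schema as traversable only if it appears in the hypothetical `schema_weights` map. It omits max_depth and the vertex/edge filters.

```python
import heapq
from typing import Any


def shortest_path_forest(
    adj: dict[str, list[tuple[str, str]]],  # hypothetical: vertex -> [(edge schema, neighbor)]
    seeds: list[str],
    schema_weights: dict[str, float],       # assumption: only these schemas are traversable
) -> dict[str, Any]:
    """Multi-source Dijkstra that records each vertex's parent edge (sketch)."""
    dist = {seed: 0.0 for seed in seeds}
    parent: dict[str, tuple[str, str]] = {}  # child -> (parent, edge schema)
    heap = [(0.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        d, vertex = heapq.heappop(heap)
        if d > dist[vertex]:
            continue  # stale queue entry
        for schema, neighbor in adj.get(vertex, []):
            if schema not in schema_weights:
                continue
            candidate = d + schema_weights[schema]  # weights must be non-negative
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                parent[neighbor] = (vertex, schema)
                heapq.heappush(heap, (candidate, neighbor))
    edges = [{"from": p, "to": c, "schema": s} for c, (p, s) in parent.items()]
    return {"vertices": list(dist), "edges": edges, "cost": dist}
```

Consider A -X-> B, A -X-> C and B -Y-> C with X = 1.0 and Y = 0.1. The forest keeps A -> C (distance 1.0) rather than B -> C (distance 1.1). A minimum-cost arborescence would instead choose A -> B, B -> C, with total weight 1.1 versus 2.0.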

Shortest Path DSL — POST /v1/dsl/graph/path

Finds a shortest path between two vertex sets using distributed bidirectional BFS.

Request

{
    "database": "my_db",

    // Start vertex set: all vertices of `schema` that satisfy `match`/`search`.
    "from": {
        "schema": "Person",
        "match": { "field": "name", "op": "=", "value": "Alice" }
    },

    // Goal vertex set.
    "to": {
        "schema": "Person",
        "match": { "field": "name", "op": "=", "value": "Bob" }
    },

    "via": {
        "edges": [],             // [] = all edge schemas
        "direction": "undirected",
        "max_depth": 10,         // maximum path length (default 10)
        "max_visited": 100000    // BFS budget (default 100 000)
    },

    // Field projection: only return these fields per path vertex. Omit or [] = all fields.
    "select": ["name"],

    // Field projection for path edge bodies. Omit or [] = all fields.
    "edge_select": ["weight"]
}

from / to each accept match (predicate) and/or search (semantic/fulltext) to identify the endpoint vertices.

Response

{
    "found": true,
    "hops": 3,
    "path": [
        { "id": "3mFx...", "schema": "Person", "fields": { "name": "Alice" } },
        { "id": "9qRz...", "schema": "Person", "fields": { "name": "Carol" } },
        { "id": "2pNw...", "schema": "Company","fields": { "name": "ACME" }  },
        { "id": "7kLp...", "schema": "Person", "fields": { "name": "Bob" }   }
    ],
    "edges": [
        { "from": "3mFx...", "to": "9qRz...", "direction": "out",  "schema": "Knows",    "fields": {} },
        { "from": "9qRz...", "to": "2pNw...", "direction": "out",  "schema": "WorksAt",  "fields": {} },
        { "from": "2pNw...", "to": "7kLp...", "direction": "out",  "schema": "Employs",  "fields": {} }
    ]
}

When found is false, hops is 0 and both path and edges are empty. path and edges are ordered from start to goal; edges[i] connects path[i] and path[i+1].
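The following is a sequential Python sketch of bidirectional BFS between two vertex sets, using the unweighted case. Morpheus distributes this search. The sketch assumes an undirected adjacency list (matching "direction": "undirected"), and `bidirectional_bfs` is a hypothetical name. It always grows the smaller frontier, caps the number of levels at max_depth, and gives up once max_visited vertices are known. When the frontiers first meet, the resulting path is a shortest one, because the two visited sets were disjoint up to that level.

```python
from typing import Optional


def _join(meet: str, fwd: dict, bwd: dict) -> list[str]:
    # Walk back from the meeting vertex to a start, then forward to a goal.
    path = []
    v: Optional[str] = meet
    while v is not None:
        path.append(v)
        v = fwd[v]
    path.reverse()
    v = bwd[meet]
    while v is not None:
        path.append(v)
        v = bwd[v]
    return path


def bidirectional_bfs(adj: dict[str, list[str]], starts, goals,
                      max_depth: int = 10, max_visited: int = 100_000) -> Optional[list[str]]:
    starts, goals = set(starts), set(goals)
    shared = starts & goals
    if shared:
        return [min(shared)]  # zero-hop path
    fwd = dict.fromkeys(starts)  # vertex -> predecessor toward a start (None at starts)
    bwd = dict.fromkeys(goals)   # vertex -> successor toward a goal (None at goals)
    fwd_frontier, bwd_frontier = list(starts), list(goals)
    levels = 0
    while fwd_frontier and bwd_frontier and levels < max_depth:
        forward = len(fwd_frontier) <= len(bwd_frontier)  # grow the smaller side
        frontier, mine, other = (fwd_frontier, fwd, bwd) if forward else (bwd_frontier, bwd, fwd)
        levels += 1
        next_frontier = []
        for vertex in frontier:
            for neighbor in adj.get(vertex, []):
                if neighbor in mine:
                    continue
                mine[neighbor] = vertex
                if neighbor in other:
                    return _join(neighbor, fwd, bwd)  # frontiers met
                if len(fwd) + len(bwd) > max_visited:
                    return None  # visit budget exhausted
                next_frontier.append(neighbor)
        if forward:
            fwd_frontier = next_frontier
        else:
            bwd_frontier = next_frontier
    return None
```

Mapping the result onto the response shape: found is `path is not None`, and hops is `len(path) - 1`.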

Example

{
    "from": { "schema": "Airport", "match": { "field": "iata", "op": "=", "value": "LAX" } },
    "to":   { "schema": "Airport", "match": { "field": "iata", "op": "=", "value": "SYD" } },
    "via":  { "edges": ["Route"], "direction": "out", "max_depth": 6 }
}

Weighted shortest path example

The via.weight clause supports the same composed traversal weighting used by other weighted graph algorithms. In the example below, each step costs

edge_cost + ln(degree(Route, out))

and the search minimizes the sum of these step costs along the path. Each step therefore pays the edge body's cost field plus a logarithmic penalty for expanding from a high-degree airport, where the degree is that of the step's source vertex.
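The arithmetic behind that formula, written out in Python, follows. The function names are hypothetical. They mirror `(+ edge_weight (ln (degree Route out)))` for one step and the summed cost that a shortest-path search minimizes.

```python
import math


def step_cost(edge_cost: float, source_out_degree: int) -> float:
    """One step of (+ edge_weight (ln (degree Route out))) (sketch)."""
    if source_out_degree < 1:
        raise ValueError("a vertex being expanded has at least one outgoing Route edge")
    return edge_cost + math.log(source_out_degree)  # math.log is the natural log


def path_cost(steps: list[tuple[float, int]]) -> float:
    """Total cost of a path given (edge cost, source out-degree) per step."""
    return sum(step_cost(cost, degree) for cost, degree in steps)
```

Because ln grows slowly (ln 100 is about 4.6), the hub penalty changes the chosen route only when edge costs are on a comparable scale. With costs in the hundreds, it acts mainly as a tie-breaker.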

{
    "from": { "schema": "Airport", "match": { "field": "iata", "op": "=", "value": "LAX" } },
    "to":   { "schema": "Airport", "match": { "field": "iata", "op": "=", "value": "SYD" } },
    "via": {
        "edges": ["Route"],
        "direction": "out",
        "max_depth": 6,
        "weight": {
            "field": "cost",
            "formula": "(+ edge_weight (ln (degree Route out)))"
        }
    },
    "select": ["iata", "name"],
    "edge_select": ["cost", "airline"]
}

Available inputs inside via.weight.formula include:

  • edge_weight — resolved cost from expr, field, or schema
  • (degree edge_schema direction) — source-vertex degree for that schema and direction

direction may be in, out, undirected, or any. The shorthand (degree Route) defaults to any.

Source vertex weight example

When you provide via.weight.source_expr, it is evaluated against the current source vertex cell for each traversal step. The resulting scalar is exposed to the formula as both source_weight and source_vertex_weight.

{
    "from": { "schema": "Airport", "match": { "field": "iata", "op": "=", "value": "LAX" } },
    "to":   { "schema": "Airport", "match": { "field": "iata", "op": "=", "value": "SYD" } },
    "via": {
        "edges": ["Route"],
        "direction": "out",
        "max_depth": 6,
        "weight": {
            "field": "cost",
            "source_expr": "hub_penalty",
            "formula": "(+ edge_weight source_weight)"
        }
    }
}

In that example, each step pays the edge body's cost plus the hub_penalty read from the current source airport vertex.

Other formula inputs are also available when configured:

  • schema_weight — per-edge-schema baseline cost
  • lookup_weight — resolved lookup-backed cost
  • source_weight / source_vertex_weight — resolved source_expr value

Spreading Activation DSL — POST /v1/dsl/graph/activation

Propagates numerical activation scores outward from seed vertices through the graph, with per-hop decay. Useful for ranked relevance propagation, influence scoring, and recommendation.

Request

{
    "database": "my_db",

    // Seed schema.
    "from": "Concept",

    // Seed selection.
    "match": null,
    "search": { "field": "name", "text": "graph neural network" },

    "via": {
        "edges": ["RelatedTo", "BroaderThan"],
        "direction": "undirected",

        // Per-schema base weights (default 1.0 for unlisted schemas).
        "schema_weights": {
            "RelatedTo":  1.0,
            "BroaderThan": 0.5
        },

        // Optional: read weight from this edge body field (overrides schema_weights).
        "weight_field": "strength",

        // Optional: Lisp expression to compute weight (takes precedence over weight_field).
        // `schema_weight` is available as a pre-resolved variable.
        "weight_expr": "(* schema_weight 0.9)",

        "max_depth": 4,
        "max_visited": 50000
    },

    // Per-hop decay multiplier applied to activation before propagation (default 0.85).
    "decay": 0.85,

    // Prune vertices with activation below this threshold (default 0.0).
    "threshold": 0.05,

    // Return at most this many vertices, ordered by activation descending.
    "top_k": 50,

    // Field projection: only return these fields per activated node. Omit or [] = all fields.
    "select": ["name", "category"]
}

Weight resolution order (highest priority first)

  1. weight_expr — Lisp expression evaluated against the edge body cell
  2. weight_field — numeric field read from the edge body
  3. schema_weights — per-schema map lookup
  4. 1.0 — default

Seed vertices start with activation 1.0. Each hop multiplies the incoming activation by edge_weight × decay. The final activation of a vertex is the maximum over all incoming paths. Seed vertices are excluded from the result.
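Here is a sequential Python sketch of the propagation rule above. It follows the documented weight resolution order, multiplies by edge_weight × decay per hop, keeps the maximum over incoming paths, prunes below threshold, and excludes seeds. Several details are assumptions:

- weight_expr is replaced by a Python callable that receives the edge body and schema_weight.
- The adjacency list is assumed to already contain edges in the requested direction(s).
- max_visited is omitted.
- A weight_field missing from an edge body falls back to schema_weights.

```python
from typing import Any, Callable, Optional

Edge = dict[str, Any]  # hypothetical shape: {"to": id, "schema": name, "fields": {...}}


def resolve_weight(
    edge: Edge,
    schema_weights: dict[str, float],
    weight_field: Optional[str] = None,
    weight_expr: Optional[Callable[[dict[str, Any], float], float]] = None,
) -> float:
    """Priority: weight_expr > weight_field > schema_weights > 1.0."""
    schema_weight = schema_weights.get(edge["schema"], 1.0)
    if weight_expr is not None:
        return weight_expr(edge["fields"], schema_weight)  # stand-in for the Lisp expression
    if weight_field is not None and weight_field in edge["fields"]:
        return float(edge["fields"][weight_field])
    return schema_weight


def spread_activation(
    adj: dict[str, list[Edge]],
    seeds: list[str],
    schema_weights: dict[str, float],
    decay: float = 0.85,
    threshold: float = 0.0,
    max_depth: int = 4,
    top_k: int = 50,
    **weight_opts: Any,
) -> list[tuple[str, float]]:
    seed_set = set(seeds)
    activation = {seed: 1.0 for seed in seeds}  # seeds start at 1.0
    frontier = dict(activation)
    for _ in range(max_depth):
        next_frontier: dict[str, float] = {}
        for vertex, value in frontier.items():
            for edge in adj.get(vertex, []):
                passed = value * resolve_weight(edge, schema_weights, **weight_opts) * decay
                if passed < threshold:
                    continue  # pruned below threshold
                target = edge["to"]
                if passed > activation.get(target, 0.0):  # keep the max over paths
                    activation[target] = passed
                    next_frontier[target] = passed  # only improved vertices re-propagate
        if not next_frontier:
            break
        frontier = next_frontier
    ranked = [(v, a) for v, a in activation.items() if v not in seed_set]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```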

Response

{
    "nodes": [
        { "id": "5kJm...", "schema": "Concept", "activation": 0.72, "fields": { "name": "GNN" } },
        { "id": "8nPq...", "schema": "Concept", "activation": 0.61, "fields": { "name": "Message Passing" } }
    ]
}

nodes is sorted by activation descending and capped at top_k.

Example

{
    "from": "Tag",
    "match": { "field": "name", "op": "=", "value": "rust" },
    "via": {
        "edges": ["CoOccursWith"],
        "direction": "undirected",
        "schema_weights": { "CoOccursWith": 1.0 }
    },
    "decay": 0.8,
    "threshold": 0.02,
    "top_k": 30
}

Similarity DSL — POST /v1/dsl/graph/similarity

Finds vertices structurally similar to an anchor set by counting shared graph neighbors. Two scoring metrics are available: raw common-neighbor count and Jaccard coefficient.

Request

{
    "database": "my_db",

    // Anchor vertex schema.
    "from": "User",

    // Select anchors by predicate or semantic search.
    "match": { "field": "user_id", "op": "=", "value": "alice" },
    "search": null,

    "via": {
        "edges": ["Purchased", "Viewed"],   // edge schemas defining "neighborhood"
        "direction": "out",

        // Neighborhood expansion depth. Depth 1 (default) is fast and recommended
        // for dense graphs. Increase for sparse graphs.
        "depth": 1,

        // Keep expanding (up to `depth`) if candidates < this. Omit to always stop at `depth`.
        "min_candidates": 100,

        // Hard cap on candidate set size before scoring (default 10 000).
        "max_candidates": 5000
    },

    // Scoring metric: "common_neighbors" | "jaccard" (default "jaccard").
    "metric": "jaccard",

    // Return at most this many vertices (default 20).
    "top_k": 20,

    // Exclude vertices scoring below this (default 0.0).
    "threshold": 0.01,

    // Field projection: only return these fields per similar node. Omit or [] = all fields.
    "select": ["name", "joined"]
}

Via clause fields

Field           Type      Default  Description
edges           string[]  all      Edge schemas defining the neighborhood
direction       string    all      Traversal direction when building the neighborhood
depth           usize     1        Hops for neighborhood expansion
min_candidates  usize     none     Adaptive expansion target candidate count
max_candidates  usize     10 000   Hard cap on candidates before scoring

Metrics

Metric             Score range  Description
common_neighbors   [0, ∞)       Raw count of shared neighbors
jaccard (default)  [0, 1]       |N(A) ∩ N(c)| / |N(A) ∪ N(c)|, where N(A) is the anchor neighborhood and N(c) the candidate's

Response

{
    "nodes": [
        {
            "id": "4fTt...",
            "schema": "User",
            "score": 0.41,
            "common_neighbors": 13,
            "fields": { "name": "Bob" }
        }
    ]
}

nodes is sorted by score descending. The anchor vertices themselves are excluded.
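The two metrics can be sketched in Python for the depth-1 case. Candidates are vertices that share at least one out-neighbor with the anchor set, and anchors are excluded. The adjacency shape, the function name, and the tie-break by ID are assumptions for illustration. Adaptive expansion (min_candidates / max_candidates) is omitted.

```python
from collections import defaultdict
from typing import Any


def similar_vertices(
    adj: dict[str, list[str]],  # hypothetical: vertex -> out-neighbors
    anchors: list[str],
    metric: str = "jaccard",
    top_k: int = 20,
    threshold: float = 0.0,
) -> list[dict[str, Any]]:
    """Score candidates by neighbors shared with the anchor set (depth-1 sketch)."""
    anchor_set = set(anchors)
    reverse: dict[str, set[str]] = defaultdict(set)  # neighbor -> vertices pointing at it
    for vertex, neighbors in adj.items():
        for n in neighbors:
            reverse[n].add(vertex)

    anchor_nbrs: set[str] = set()
    for a in anchor_set:
        anchor_nbrs.update(adj.get(a, ()))

    candidates: set[str] = set()
    for n in anchor_nbrs:
        candidates |= reverse[n]
    candidates -= anchor_set  # anchors are excluded from the result

    results = []
    for c in candidates:
        c_nbrs = set(adj.get(c, ()))
        common = len(anchor_nbrs & c_nbrs)
        if metric == "common_neighbors":
            score = float(common)
        elif metric == "jaccard":
            union = len(anchor_nbrs | c_nbrs)
            score = common / union if union else 0.0
        else:
            raise ValueError(f"unknown metric: {metric}")
        if score >= threshold:  # scores below threshold are excluded
            results.append({"id": c, "score": score, "common_neighbors": common})
    results.sort(key=lambda r: (-r["score"], r["id"]))  # assumption: ties broken by id
    return results[:top_k]
```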

Example

{
    "from": "Movie",
    "match": { "field": "title", "op": "=", "value": "Inception" },
    "via": {
        "edges": ["WatchedBy"],
        "direction": "in",
        "min_candidates": 200
    },
    "metric": "jaccard",
    "top_k": 10
}

GraphRAG Context DSL — POST /v1/dsl/graph/context

Designed for Retrieval-Augmented Generation pipelines. Performs a semantic/hybrid seed search and then expands the neighborhood of the seed vertices to gather a rich context subgraph.

Request

{
    "database": "my_db",

    // Vertex schema that has an embedding-indexed field.
    "from": "Chunk",

    // Optional pre-filter applied before the seed search.
    "match": { "field": "doc_id", "op": "=", "value": "doc-42" },

    // Semantic / hybrid search to locate seed chunks.
    "search": { "field": "embedding", "text": "attention mechanism transformers", "top_k": 5 },

    // Graph expansion from the found seeds (same as SubgraphRequest.expand).
    "expand": {
        "edges": ["NextChunk", "LinkedEntity"],
        "direction": "out",
        "depth": 2,
        "max_vertices": 100
    },

    // Field projection for seed and context vertices. Omit or [] = all fields.
    "select": ["text", "doc_id"],

    // Field projection for expansion edges. Omit or [] = all fields.
    "edge_select": ["type"]
}

The expand clause is identical to the one in the Subgraph DSL.

Response

{
    // Seed vertices from the search, sorted by similarity score descending.
    "seeds": [
        {
            "id": "6hYw...",
            "schema": "Chunk",
            "score": 0.94,
            "fields": { "text": "The attention mechanism...", "doc_id": "doc-42" }
        }
    ],

    // Additional vertices discovered by graph expansion (seeds excluded).
    "vertices": [
        { "id": "1cZa...", "schema": "Chunk",  "fields": { "text": "..." } },
        { "id": "9bXe...", "schema": "Entity", "fields": { "name": "Transformer" } }
    ],

    // Edges discovered during expansion.
    "edges": [
        { "from": "6hYw...", "to": "1cZa...", "direction": "out", "schema": "NextChunk", "fields": {} }
    ]
}

seeds are excluded from vertices. Together seeds + vertices + edges form a local subgraph that can be serialized as context for a language model.
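One possible way to serialize a context response for a prompt is sketched in Python below. The text layout, the S1/V1 labels, and the `text_field` default are hypothetical formatting choices, not something the endpoint prescribes.

```python
from typing import Any


def context_to_prompt(resp: dict[str, Any], text_field: str = "text") -> str:
    """Flatten a /v1/dsl/graph/context response into plain text (sketch)."""
    names: dict[str, str] = {}  # vertex id -> short label used in the prompt
    lines = ["Relevant passages:"]
    for i, seed in enumerate(resp.get("seeds", []), start=1):
        names[seed["id"]] = f"S{i}"
        body = seed["fields"].get(text_field, "")
        lines.append(f"[S{i}] (score {seed['score']:.2f}) {body}")
    lines.append("Related context:")
    for i, vertex in enumerate(resp.get("vertices", []), start=1):
        names[vertex["id"]] = f"V{i}"
        summary = ", ".join(f"{k}={v}" for k, v in vertex["fields"].items())
        lines.append(f"[V{i}] {vertex['schema']}: {summary}")
    lines.append("Relationships:")
    for edge in resp.get("edges", []):
        src = names.get(edge["from"], edge["from"])
        dst = names.get(edge["to"], edge["to"])
        lines.append(f"{src} -{edge['schema']}-> {dst}")
    return "\n".join(lines)
```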

Example

{
    "from": "Passage",
    "search": { "field": "embed", "text": "how does HNSW work", "top_k": 3 },
    "expand": {
        "edges": ["SeeAlso"],
        "direction": "undirected",
        "depth": 1,
        "max_vertices": 50
    }
}

Typed Value Encoding

Morpheus stores strongly-typed values. When the engine cannot represent a value as a plain JSON scalar, it emits a tagged object:

{ "$type": "u64",  "value": 1234567890123 }
{ "$type": "i64",  "value": -1 }
{ "$type": "f32",  "value": 3.14 }
{ "$type": "bytes","value": "aGVsbG8=" }   // base64

Plain JSON true/false, null, strings, and numbers within the safe double-precision range are emitted as-is.

Cell IDs are always base58-encoded strings (e.g., "3mFxKpQ...").
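A client-side decoder for these envelopes might look like the following Python sketch. It handles only the four tags shown above and raises on anything else, since other tags may exist. It assumes the "value" of an integer tag may be given as a number or a numeric string; `int()` accepts both. A plain object that legitimately has a "$type" field would be misread, which is a limitation of this sketch.

```python
import base64
from typing import Any


def decode_value(value: Any) -> Any:
    """Turn Morpheus typed-value envelopes back into Python values (sketch)."""
    if isinstance(value, dict) and "$type" in value:
        kind, raw = value["$type"], value["value"]
        if kind in ("u64", "i64"):
            return int(raw)  # assumption: raw may be a number or a numeric string
        if kind == "f32":
            return float(raw)
        if kind == "bytes":
            return base64.b64decode(raw)
        raise ValueError(f"unhandled $type: {kind}")
    if isinstance(value, dict):
        return {k: decode_value(v) for k, v in value.items()}
    if isinstance(value, list):
        return [decode_value(v) for v in value]
    return value  # plain JSON scalars pass through unchanged
```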


Database Selection

All endpoints accept an optional database field in the request body:

{ "database": "my_db", "from": "Person", ... }

When omitted, the server uses the default database. The database can also be selected per request with the X-Database HTTP header, which has the same effect.

Alternatively, the database name may be embedded in the URL path:

POST /v1/db/{database}/dsl/query
POST /v1/db/{database}/dsl/graph/subgraph
POST /v1/db/{database}/dsl/graph/path
POST /v1/db/{database}/dsl/graph/activation
POST /v1/db/{database}/dsl/graph/context
POST /v1/db/{database}/dsl/graph/similarity

When the database is specified in the URL, the database field in the request body is ignored.
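The following Python sketch builds a DSL request with each of the three database-selection styles, using only the standard library. The base URL and the helper name are hypothetical. The /v1/... paths and the X-Database header come from this reference. The function builds the request but does not send it.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_dsl_request(
    base_url: str,            # hypothetical, e.g. "http://localhost:8080"
    endpoint: str,            # e.g. "dsl/query" or "dsl/graph/path"
    body: dict[str, Any],
    database: Optional[str] = None,
    select_by: str = "body",  # "body" | "header" | "path"
) -> urllib.request.Request:
    """Build (without sending) a POST to a Morpheus DSL endpoint (sketch)."""
    payload = dict(body)
    headers = {"Content-Type": "application/json"}
    path = f"/v1/{endpoint}"
    if database is not None:
        if select_by == "body":
            payload["database"] = database
        elif select_by == "header":
            headers["X-Database"] = database
        elif select_by == "path":
            # A database named in the URL overrides any "database" body field.
            path = f"/v1/db/{urllib.parse.quote(database, safe='')}/{endpoint}"
        else:
            raise ValueError(f"unknown select_by: {select_by}")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending the request and parsing the reply is then `with urllib.request.urlopen(req) as resp: result = json.load(resp)`.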