eclipse-rdf4j
diff --git a/.codex/skills/query-plan-snapshot-cli/SKILL.md b/.codex/skills/query-plan-snapshot-cli/SKILL.md
Lines changed: 38 additions & 0 deletions
diff --git a/.codex/skills/query-plan-snapshot-cli/references/workflow.md b/.codex/skills/query-plan-snapshot-cli/references/workflow.md
Lines changed: 54 additions & 0 deletions
diff --git a/.codex/skills/query-plan-snapshot-cli/scripts/interpret_query_plan_regression.py b/.codex/skills/query-plan-snapshot-cli/scripts/interpret_query_plan_regression.py
Lines changed: 150 additions & 0 deletions
diff --git a/.codex/skills/query-plan-snapshot-cli/scripts/run_query_plan_snapshot.sh b/.codex/skills/query-plan-snapshot-cli/scripts/run_query_plan_snapshot.sh
Lines changed: 87 additions & 0 deletions
diff --git a/testsuites/benchmark/src/main/resources/plan/cli/lmdb/lmdb-engineering-q0-96ba43f4c495a7432edbc82600d3d9b1ffe357298316ab64c84b5a2c7909fc52-20260217-140843352-f07eec27.json b/testsuites/benchmark/src/main/resources/plan/cli/lmdb/lmdb-engineering-q0-96ba43f4c495a7432edbc82600d3d9b1ffe357298316ab64c84b5a2c7909fc52-20260217-140843352-f07eec27.json
Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,38 @@
+---
+name: query-plan-snapshot-cli
+description: Use QueryPlanSnapshotCli to capture and compare RDF4J query plans, then assess likely performance improvements/regressions from execution verification and semantic plan diffs. Trigger when users ask about optimizer impact, query-plan drift, join algorithm changes, or query performance regressions in testsuites/benchmark.
+---
+
+# query-plan-snapshot-cli
+
+Use this skill to run reproducible query-plan captures and classify likely regression/improvement signals.
+
+## Fast workflow
+
+1. Capture a baseline run on the main/reference commit.
+2. Capture a candidate run on the changed commit, with the same query selector and `--query-id`.
+3. Produce a semantic diff (`--compare-existing`).
+4. Interpret the runtime delta and the diff together.
+
+## Commands
+
+Use the wrapper script (it always runs the pre-install build and can tee output to a log with `--log`):
+
+- Baseline:
+  - `./.codex/skills/query-plan-snapshot-cli/scripts/run_query_plan_snapshot.sh --log /tmp/qps-baseline.log -- --store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0`
+- Candidate:
+  - `./.codex/skills/query-plan-snapshot-cli/scripts/run_query_plan_snapshot.sh --log /tmp/qps-candidate.log -- --store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0 --compare-latest --diff-mode structure+estimates`
+- Compare existing snapshots explicitly:
+  - `mvn -o -Dmaven.repo.local=.m2_repo -pl testsuites/benchmark -DskipTests exec:java@query-plan-snapshot -Dexec.args="--compare-existing --query-id med-q0 --compare-indices 1,0 --no-interactive --diff-mode structure+estimates" | tee /tmp/qps-compare.log`
+- Summarize improvement/regression signal:
+  - `python3 ./.codex/skills/query-plan-snapshot-cli/scripts/interpret_query_plan_regression.py --baseline-log /tmp/qps-baseline.log --candidate-log /tmp/qps-candidate.log --comparison-log /tmp/qps-compare.log`
+
+## Interpretation rule-of-thumb
+
+- `averageMillis` down by 10% or more with unchanged `resultCount`: improvement signal.
+- `averageMillis` up by 10% or more with unchanged `resultCount`: regression signal.
+- `actualResultSizes=diff`: semantic/data-shape risk; treat any performance conclusion as low confidence.
+- `joinAlgorithms=diff` or `structure=diff`: optimizer behavior changed; correlate with the runtime delta.
+- `estimates=diff` only: cost-model/statistics shift; confirm with repeated runs before drawing conclusions.
+
+See `references/workflow.md` for detailed reading patterns, the confidence ladder, and common mistakes.
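The rules of thumb in SKILL.md can be expressed as a set of independent checks. A minimal Python sketch, assuming the ±10% cut-off that `interpret_query_plan_regression.py` uses for its runtime signal; the function `rule_of_thumb_hints` is hypothetical, and passing the diff markers as a set of field names is an assumed input shape (the CLI prints them as text).

```python
from typing import List, Set


def rule_of_thumb_hints(
    base_avg_ms: int,
    cand_avg_ms: int,
    base_count: int,
    cand_count: int,
    diff_fields: Set[str],
    threshold_pct: float = 10.0,  # assumed: same cut-off as the summary script
) -> List[str]:
    """Return the SKILL.md rule-of-thumb hints that apply to one run pair.

    diff_fields holds the semantic-diff field names reported as "diff",
    e.g. {"structure", "joinAlgorithms"} (a hypothetical input shape).
    """
    hints: List[str] = []
    # Runtime signals only count when resultCount is unchanged.
    if base_avg_ms > 0 and base_count == cand_count:
        delta_pct = (cand_avg_ms - base_avg_ms) / base_avg_ms * 100.0
        if delta_pct <= -threshold_pct:
            hints.append("improvement signal")
        elif delta_pct >= threshold_pct:
            hints.append("regression signal")
    if "actualResultSizes" in diff_fields:
        hints.append("semantic/data-shape risk; low-confidence perf conclusion")
    if diff_fields & {"joinAlgorithms", "structure"}:
        hints.append("optimizer behavior changed; correlate with runtime delta")
    if diff_fields == {"estimates"}:
        hints.append("model/statistics shift; validate with repeated runs")
    return hints


# 200 ms -> 250 ms (+25%), same result count, join strategy changed.
print(rule_of_thumb_hints(200, 250, 42, 42, {"joinAlgorithms"}))
# prints ['regression signal', 'optimizer behavior changed; correlate with runtime delta']
```

Returning a list of hints rather than one verdict keeps each rule independent, since several can apply at once; `interpret_query_plan_regression.py` instead collapses the signals into a single verdict.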
@@ -0,0 +1,54 @@
+# QueryPlanSnapshotCli workflow
+
+## Goal
+
+Read optimizer/query-plan changes as performance signals without mixing in unrelated variables.
+
+## Guardrails
+
+- Use the same store, theme, and query selector for baseline and candidate.
+- Use the same `--query-id` so both snapshots are easy to look up and compare.
+- Keep JVM/system-property flags identical unless intentionally testing a flag.
+- Always refresh build artifacts first:
+  - `mvn -T 1C -o -Dmaven.repo.local=.m2_repo -Pquick clean install | tail -200`
+
+## Minimal run pair
+
+1. Baseline:
+
+`./.codex/skills/query-plan-snapshot-cli/scripts/run_query_plan_snapshot.sh --log /tmp/qps-baseline.log -- --store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0`
+
+2. Candidate:
+
+`./.codex/skills/query-plan-snapshot-cli/scripts/run_query_plan_snapshot.sh --log /tmp/qps-candidate.log -- --store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0 --compare-latest --diff-mode structure+estimates`
+
+3. Explicit `--compare-existing` run (stable, reproducible diff text):
+
+`mvn -o -Dmaven.repo.local=.m2_repo -pl testsuites/benchmark -DskipTests exec:java@query-plan-snapshot -Dexec.args="--compare-existing --query-id med-q0 --compare-indices 1,0 --no-interactive --diff-mode structure+estimates" | tee /tmp/qps-compare.log`
+
+4. Regression/improvement summary:
+
+`python3 ./.codex/skills/query-plan-snapshot-cli/scripts/interpret_query_plan_regression.py --baseline-log /tmp/qps-baseline.log --candidate-log /tmp/qps-candidate.log --comparison-log /tmp/qps-compare.log`
+
+## Reading semantic diff fields
+
+- `structure=diff`: operator tree changed.
+- `joinAlgorithms=diff`: join strategy changed; often high-impact for runtime.
+- `actualResultSizes=diff`: result-size flow changed; can indicate data-shape or semantic shifts.
+- `estimates=diff`: estimates changed (cost model or statistics shift). On its own, not enough to claim a runtime regression.
+
+## Confidence ladder
+
+- High confidence regression:
+  - `averageMillis` up >= 10% and `structure`/`joinAlgorithms` diff.
+- Medium confidence regression:
+  - `averageMillis` up >= 10% with no `structure`/`joinAlgorithms` diff, or no semantic diff file available.
+- Low confidence / inconclusive:
+  - Runtime neutral but semantic diff exists, or result counts changed.
+
+## Common mistakes
+
+- Comparing different query IDs or different query text.
+- Forgetting pre-install (`-Pquick clean install`) before CLI run.
+- Treating estimate-only diffs as hard regressions.
+- Ignoring `resultCount` mismatch in execution verification.
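The guardrails in workflow.md (same store, theme, query selector, and `--query-id`) can be checked mechanically before comparing logs. A minimal Python sketch; it assumes the selector flags are exactly `--store`, `--theme`, `--query-index`, and `--query-id` as used in the examples, each followed by one value, and ignores every other flag (such as `--compare-latest` or `--diff-mode`). The helper names are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed selector flags, each followed by exactly one value.
SELECTOR_FLAGS = ("--store", "--theme", "--query-index", "--query-id")


def selector_values(cli_args: Sequence[str]) -> Dict[str, str]:
    """Map each selector flag to the value that follows it."""
    values: Dict[str, str] = {}
    for i, arg in enumerate(cli_args[:-1]):  # the last arg cannot have a value
        if arg in SELECTOR_FLAGS:
            values[arg] = cli_args[i + 1]
    return values


def guardrail_mismatches(
    baseline: Sequence[str], candidate: Sequence[str]
) -> List[str]:
    """Describe every selector flag whose value differs between the two runs."""
    base, cand = selector_values(baseline), selector_values(candidate)
    return [
        f"{flag}: baseline={base.get(flag)!r}, candidate={cand.get(flag)!r}"
        for flag in SELECTOR_FLAGS
        if base.get(flag) != cand.get(flag)
    ]


baseline_args = "--store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0".split()
# Comparison-only flags on the candidate are ignored by the check.
candidate_args = baseline_args + ["--compare-latest", "--diff-mode", "structure+estimates"]
print(guardrail_mismatches(baseline_args, candidate_args))  # prints []
```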
@@ -0,0 +1,150 @@
+#!/usr/bin/env python3
+"""Summarize likely query-plan performance regression/improvement signals."""
+
+from __future__ import annotations
+
+import argparse
+import re
+from pathlib import Path
+from typing import Dict, List, Optional
+
+EXECUTION_LINE = re.compile(
+    r"runs=(?P<runs>\d+),\s*"
+    r"totalMillis=(?P<total>\d+),\s*"
+    r"averageMillis=(?P<avg>\d+),\s*"
+    r"resultCount=(?P<results>\d+),\s*"
+    r"softLimitMillis=(?P<soft_limit>\d+),\s*"
+    r"softLimitReached=(?P<soft_reached>true|false),\s*"
+    r"maxRunsReached=(?P<max_reached>true|false)"
+)
+
+DIFF_LINE = re.compile(
+    r"^\s*(?P<level>unoptimized|optimized|executed):\s+"
+    r".*structure=(?P<structure>[^,]+),\s*"
+    r"joinAlgorithms=(?P<joins>[^,]+),\s*"
+    r"actualResultSizes=(?P<actual>[^,]+),\s*"
+    r"estimates=(?P<estimates>[^,\s]+)"
+)
+
+
+def parse_execution_metrics(path: Path) -> Dict[str, int]:
+    text = path.read_text(encoding="utf-8", errors="replace")
+    matches = list(EXECUTION_LINE.finditer(text))
+    if not matches:
+        raise ValueError(f"No execution verification line found in {path}")
+    last = matches[-1]
+    return {
+        "runs": int(last.group("runs")),
+        "total": int(last.group("total")),
+        "avg": int(last.group("avg")),
+        "results": int(last.group("results")),
+    }
+
+
+def parse_semantic_diff(path: Optional[Path]) -> List[Dict[str, str]]:
+    if path is None:
+        return []
+    text = path.read_text(encoding="utf-8", errors="replace")
+    rows: List[Dict[str, str]] = []
+    for line in text.splitlines():
+        match = DIFF_LINE.search(line)
+        if not match:
+            continue
+        rows.append(
+            {
+                "level": match.group("level"),
+                "structure": match.group("structure").strip(),
+                "joins": match.group("joins").strip(),
+                "actual": match.group("actual").strip(),
+                "estimates": match.group("estimates").strip(),
+            }
+        )
+    return rows
+
+
+def runtime_classification(delta_percent: Optional[float]) -> str:
+    if delta_percent is None:
+        return "unknown"
+    if delta_percent <= -10.0:
+        return "improvement"
+    if delta_percent >= 10.0:
+        return "regression"
+    return "neutral"
+
+
+def find_diff(rows: List[Dict[str, str]], key: str) -> bool:
+    return any(row[key] == "diff" for row in rows)
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument("--baseline-log", required=True, type=Path)
+    parser.add_argument("--candidate-log", required=True, type=Path)
+    parser.add_argument("--comparison-log", type=Path)
+    args = parser.parse_args()
+
+    baseline = parse_execution_metrics(args.baseline_log)
+    candidate = parse_execution_metrics(args.candidate_log)
+    semantic_rows = parse_semantic_diff(args.comparison_log)
+
+    avg_base = baseline["avg"]
+    avg_candidate = candidate["avg"]
+    delta_percent: Optional[float]
+    if avg_base == 0:
+        delta_percent = None
+    else:
+        delta_percent = ((avg_candidate - avg_base) / avg_base) * 100.0
+
+    runtime_signal = runtime_classification(delta_percent)
+    result_count_changed = baseline["results"] != candidate["results"]
+
+    structure_changed = find_diff(semantic_rows, "structure")
+    joins_changed = find_diff(semantic_rows, "joins")
+    actual_changed = find_diff(semantic_rows, "actual")
+    estimates_changed = find_diff(semantic_rows, "estimates")
+
+    if result_count_changed:
+        verdict = "semantic regression risk: result count changed; runtime delta not comparable"
+    elif runtime_signal == "regression" and (structure_changed or joins_changed or actual_changed):
+        verdict = "likely performance regression with plan-shape change"
+    elif runtime_signal == "improvement" and (structure_changed or joins_changed):
+        verdict = "likely performance improvement with optimizer-plan change"
+    elif runtime_signal == "regression":
+        verdict = "possible performance regression (no plan-shape diff evidence)"
+    elif runtime_signal == "improvement":
+        verdict = "possible performance improvement"
+    elif structure_changed or joins_changed or actual_changed or estimates_changed:
+        verdict = "plan changed but runtime signal neutral"
+    else:
+        verdict = "no clear regression/improvement signal"
+
+    print("QueryPlanSnapshotCli regression summary")
+    print(f"- baseline avgMillis: {avg_base}")
+    print(f"- candidate avgMillis: {avg_candidate}")
+    if delta_percent is None:
+        print("- delta: n/a (baseline averageMillis=0)")
+    else:
+        print(f"- delta: {delta_percent:+.2f}%")
+    print(f"- baseline resultCount: {baseline['results']}")
+    print(f"- candidate resultCount: {candidate['results']}")
+    print(f"- runtime signal: {runtime_signal}")
+
+    if semantic_rows:
+        print("- semantic diff:")
+        for row in semantic_rows:
+            print(
+                "  "
+                f"{row['level']}: structure={row['structure']}, "
+                f"joinAlgorithms={row['joins']}, "
+                f"actualResultSizes={row['actual']}, "
+                f"estimates={row['estimates']}"
+            )
+    else:
+        print("- semantic diff: " + ("not provided" if args.comparison_log is None else "no diff lines matched"))
+
+    print(f"- verdict: {verdict}")
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
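To see which log lines the summary script picks up, the sketch below applies a verbatim copy of its `EXECUTION_LINE` pattern and its ±10% classification to synthetic lines. The sample lines were written to satisfy the regex and are not captured CLI output. They illustrate two properties of the pattern: the last matching line in a log wins, and `averageMillis` must be an integer, so a line with fractional milliseconds would not match at all.

```python
import re
from typing import Optional

# Copied from interpret_query_plan_regression.py.
EXECUTION_LINE = re.compile(
    r"runs=(?P<runs>\d+),\s*"
    r"totalMillis=(?P<total>\d+),\s*"
    r"averageMillis=(?P<avg>\d+),\s*"
    r"resultCount=(?P<results>\d+),\s*"
    r"softLimitMillis=(?P<soft_limit>\d+),\s*"
    r"softLimitReached=(?P<soft_reached>true|false),\s*"
    r"maxRunsReached=(?P<max_reached>true|false)"
)


def last_average_millis(log_text: str) -> Optional[int]:
    """Like parse_execution_metrics: the last matching line in the log wins."""
    matches = list(EXECUTION_LINE.finditer(log_text))
    return int(matches[-1].group("avg")) if matches else None


def classify(base_avg: int, cand_avg: int) -> str:
    """Same thresholds as runtime_classification(): +/-10% of the baseline."""
    if base_avg == 0:
        return "unknown"
    delta = (cand_avg - base_avg) / base_avg * 100.0
    if delta <= -10.0:
        return "improvement"
    if delta >= 10.0:
        return "regression"
    return "neutral"


# Synthetic log lines (hypothetical values) shaped to satisfy the regex.
baseline_log = (
    "runs=5, totalMillis=500, averageMillis=100, resultCount=12, "
    "softLimitMillis=10000, softLimitReached=false, maxRunsReached=true\n"
)
candidate_log = baseline_log.replace(
    "totalMillis=500, averageMillis=100", "totalMillis=540, averageMillis=108"
)

base = last_average_millis(baseline_log)
cand = last_average_millis(candidate_log)
print(base, cand, classify(base, cand))  # prints 100 108 neutral
# A fractional average does not match the integer-only pattern.
print(last_average_millis(candidate_log.replace("averageMillis=108", "averageMillis=108.4")))  # prints None
```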
@@ -0,0 +1,87 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'USAGE'
+Usage:
+  run_query_plan_snapshot.sh [--log <path>] [--online] -- <QueryPlanSnapshotCli args>
+
+Examples:
+  run_query_plan_snapshot.sh --log /tmp/qps.log -- \
+    --store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0
+
+Notes:
+  - Always runs root install first: mvn -T 1C [-o] -Dmaven.repo.local=.m2_repo -Pquick install
+  - Run from the repository root; pass QueryPlanSnapshotCli args after '--'
+USAGE
+}
+
+log_file=""
+offline_flag="-o"
+
+while [[ $# -gt 0 ]]; do
+  case "$1" in
+  --log)
+    [[ $# -ge 2 ]] || {
+      echo "Missing value for --log" >&2
+      exit 2
+    }
+    log_file="$2"
+    shift 2
+    ;;
+  --online)
+    offline_flag=""
+    shift
+    ;;
+  --help|-h)
+    usage
+    exit 0
+    ;;
+  --)
+    shift
+    break
+    ;;
+  *)
+    echo "Unknown wrapper option: $1" >&2
+    usage
+    exit 2
+    ;;
+  esac
+done
+
+if [[ $# -eq 0 ]]; then
+  echo "No QueryPlanSnapshotCli args provided. Pass args after '--'." >&2
+  usage
+  exit 2
+fi
+
+raw_cli_args=("$@")
+printf -v cli_args '%q ' "${raw_cli_args[@]}"
+cli_args="${cli_args% }"
+
+install_cmd=(mvn -T 1C)
+if [[ -n "$offline_flag" ]]; then
+  install_cmd+=("$offline_flag")
+fi
+install_cmd+=(-Dmaven.repo.local=.m2_repo -Pquick install)
+
+cli_cmd=(mvn)
+if [[ -n "$offline_flag" ]]; then
+  cli_cmd+=("$offline_flag")
+fi
+cli_cmd+=(-Dmaven.repo.local=.m2_repo -pl testsuites/benchmark -DskipTests exec:java@query-plan-snapshot)
+cli_cmd+=(-Dexec.args="$cli_args")
+
+echo ">>> Refreshing artifacts"
+"${install_cmd[@]}" | tail -200
+
+echo ">>> Running QueryPlanSnapshotCli"
+echo ">>> args: $cli_args"
+
+if [[ -n "$log_file" ]]; then
+  mkdir -p "$(dirname "$log_file")"
+  "${cli_cmd[@]}" | tee "$log_file"
+  echo ">>> log: $log_file"
+else
+  "${cli_cmd[@]}"
+fi
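To drive captures from Python (for example, looping over several query indices), the wrapper's command assembly can be mirrored as below. This is a hypothetical helper, not part of the skill. It only builds the two argv lists that the script assembles (its `install_cmd` uses `-Pquick install`), without the `tail -200` or `tee` plumbing. It builds `-Dexec.args` with `shlex.join`, whereas the wrapper uses bash `printf %q`. For plain flag/value arguments the two produce the same string, but they may differ for values containing spaces or shell metacharacters. How exec-maven-plugin tokenizes such quoted values is not verified here.

```python
import shlex
from typing import List, Sequence, Tuple


def build_commands(
    cli_args: Sequence[str], offline: bool = True
) -> Tuple[List[str], List[str]]:
    """Return (install_cmd, cli_cmd) argv lists like run_query_plan_snapshot.sh builds."""
    if not cli_args:
        raise ValueError("No QueryPlanSnapshotCli args provided")
    offline_flag = ["-o"] if offline else []  # the wrapper's --online drops -o
    install_cmd = ["mvn", "-T", "1C", *offline_flag,
                   "-Dmaven.repo.local=.m2_repo", "-Pquick", "install"]
    cli_cmd = ["mvn", *offline_flag, "-Dmaven.repo.local=.m2_repo",
               "-pl", "testsuites/benchmark", "-DskipTests",
               "exec:java@query-plan-snapshot",
               # shlex.join quotes like a POSIX shell; the wrapper uses printf %q.
               "-Dexec.args=" + shlex.join(cli_args)]
    return install_cmd, cli_cmd


install_cmd, cli_cmd = build_commands(
    ["--store", "memory", "--theme", "MEDICAL_RECORDS",
     "--query-index", "0", "--query-id", "med-q0"]
)
print(cli_cmd[-1])
# prints -Dexec.args=--store memory --theme MEDICAL_RECORDS --query-index 0 --query-id med-q0
# To execute from the repository root, e.g.:
#   subprocess.run(install_cmd, check=True); subprocess.run(cli_cmd, check=True)
```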