From 22e07755a3519b694863f5fac2ee380d0a3704ae Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?H=C3=A5vard=20M=2E=20Ottestad?=
Date: Sat, 13 Sep 2025 17:56:49 +0200
Subject: [PATCH] GH-5418 first attempt at adding an AGENTS.md file (#5419)

---
 AGENTS.md | 453 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 453 insertions(+)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000000..1d4c5634a20
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,453 @@
+# Codex Agent Playbook
+
+Welcome, AI Agent! Your persistence, curiosity, and craftsmanship make a difference. Take your time, work methodically, validate thoroughly, and iterate. This repository is large and tests can take time; that’s expected and supported.
+
+> **Timebox:** Aim to complete each autonomous run in **15–30 minutes**.
+
+## Purpose & Contract
+- **Bold goal:** deliver correct, minimal, well‑tested changes with clear handoff. No monkey‑patching or band‑aid fixes: always fix the underlying problem at its source.
+- **Bias to action:** when inputs are ambiguous, choose a reasonable path, state assumptions, and proceed.
+- **Ask only when blocked or irreversible:** escalate only if truly blocked (permissions, missing deps, conflicting requirements) or if a choice is high‑risk/irreversible.
+- **Definition of Done**
+  - Code formatted and imports sorted.
+  - Compiles with a quick profile / targeted modules.
+  - Relevant module tests pass; failures triaged or crisply explained.
+  - Only necessary files changed; headers correct for new files.
+  - Clear final summary: what changed, why, where, how verified, next steps.
+
+### No Monkey‑Patching or Band‑Aid Fixes (Non‑Negotiable)
+
+This repository requires durable, root‑cause fixes. Superficial changes that mask symptoms, mute tests, or add ad‑hoc toggles are not acceptable.
+
+What this means in practice
+- Find and fix the root cause in the correct layer/module.
+- Add or adjust targeted tests that fail before the fix and pass after.
+- Keep changes minimal and surgical; do not widen APIs/configs to “make tests green”.
+- Maintain consistency with existing style and architecture; prefer refactoring over hacks.
+
+Strictly avoid
+- Sleeping/timeouts to hide race conditions or flakiness.
+- Broad catch‑and‑ignore or logging‑and‑continue of exceptions.
+- Muting, deleting, or weakening assertions in tests to pass builds.
+- Reflection or internal state manipulation to bypass proper interfaces.
+- Feature flags/toggles that disable validation or logic instead of fixing it.
+- Changing public APIs or configs without necessity and clear rationale tied to the root cause.
+
+Preferred approach (fast and rigorous)
+- Reproduce the issue and isolate the smallest failing test (class → method).
+- Trace to the true source; fix it in the right module.
+- Add focused tests covering the behavior and any critical edge cases.
+- Run tight, targeted verifies for the impacted module(s) and broaden scope only if needed.
+
+Review bar and enforcement
+- Treat this policy as a blocking requirement. Changes that resemble workarounds will be rejected.
+- Your final handoff must demonstrate: failing test before the fix, explanation of the root cause, minimal fix at source, and passing targeted tests after.
+
+## Environment
+- **JDK:** 11 (minimum). The project builds and runs on Java 11+.
+- **Maven default:** run **offline** using `-o` whenever possible.
+- **Network:** only when needed to fetch missing deps/plugins; then rerun the exact command **without** `-o` once, and return to offline.
+- **Large project:** some module test suites can take **5–10 minutes**. Be patient, but bias toward **targeted** runs to keep momentum.
+
+### Maven `-am` usage (house rule)
+
+`-am` (also-make) pulls in required upstream modules. That’s helpful for **compiles**, but hazardous for **tests**: Maven will advance included modules to the same lifecycle phase and run **their** tests too.
+
+**Rule of thumb**
+- ✅ Use `-am` **only** for compile/verify with tests skipped (e.g. `-Pquick`):
+  - `mvn -o -pl -am -Pquick verify`
+- ❌ Do **not** use `-am` with `verify` when tests are enabled.
+
+**Two-step pattern (fast + safe)**
+1) **Compile deps fast (skip tests):**
+   `mvn -o -pl -am -Pquick verify`
+2) **Run tests:**
+   `mvn -o -pl verify | tail -500`
+
+It is illegal to `-am` when running tests!
+It is illegal to `-q` when running tests!
+
+## Quick Start (First 10 Minutes)
+1. **Discover**
+   - List modules: inspect root `pom.xml` (aggregator) and the module tree (see “Maven Module Overview” below).
+   - Search fast with ripgrep: `rg -n ""`
+2. **Build sanity (fast, skip tests)**
+   - **Preferred:** `mvn -o -Pquick install | tail -200`
+   - **Alternative:** `mvn -o -Pquick verify | tail -200`
+3. **Format (Java, imports, XML)**
+   - `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format`
+4. **Targeted tests (tight loops)**
+   - By module (incl. deps): `mvn -o -pl verify | tail -500`
+   - Single class: `mvn -o -pl -Dtest=ClassName verify | tail -500`
+   - Single method: `mvn -o -pl -Dtest=ClassName#method verify | tail -500`
+5. **Inspect failures**
+   - **Unit (Surefire):** `/target/surefire-reports/`
+   - **IT (Failsafe):** `/target/failsafe-reports/`
+
+It is illegal to `-am` when running tests!
+It is illegal to `-q` when running tests!
+
+
+## Working Loop
+- **Plan**
+  - Break task into **small, verifiable steps**; keep one step in progress.
+  - Announce a short preamble before long actions (builds/tests).
+  - Decide and proceed autonomously; document assumptions inline.
+- **Change**
+  - Make minimal, surgical edits. Keep style and structure consistent.
+- **Format**
+  - `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format`
+- **Compile (fast)**
+  - **Iterate locally:** `mvn -o -pl -am -Pquick verify | tail -500`
+- **Test**
+  - Start with the smallest scope that exercises your change (class → module).
+  - For integration‑impacted changes, run module `verify` (includes ITs).
+- **Triage**
+  - Read reports; fix root cause; expand scope **only when needed**.
+- **Iterate**
+  - Keep moving without waiting for permission between steps. Escalate only at blocking points.
+  - Repeat until **Definition of Done** is satisfied.
+
+It is illegal to `-am` when running tests!
+It is illegal to `-q` when running tests!
+
+## Planning & Progress
+- **Living plan:** update as you learn; one active step at a time (5–7 words each).
+- **Progress updates:** one crisp sentence when switching steps or after long runs.
+- **Decide early:** if scope is unclear, pick the most reasonable option, note the assumption, and continue.
+- **Escalate sparingly:** ask only if options diverge significantly in cost/impact or you are blocked (permissions, network policy, missing secrets).
+- **Checkpoint cadence:** inform to maintain visibility; do **not** block on approvals unless required.
+
+## Testing Strategy
+- **Prefer module tests you touched:** `-pl `
+- **Narrow further** to a class/method for tight loops; then broaden to the module.
+- **Expand scope** when:
+  - Your change crosses module boundaries, or
+  - Neighbor module failures indicate integration impact.
+- **Read reports**
+  - Surefire (unit): `target/surefire-reports/`
+  - Failsafe (IT): `target/failsafe-reports/`
+- **Helpful flags**
+  - `-Dtest=Class#method` (unit selection)
+  - `-Dit.test=ITClass#method` (integration selection)
+  - `-DtrimStackTrace=false` (full traces)
+  - `-DskipITs` (focus on unit tests)
+  - `-DfailIfNoTests=false` (when selecting a class that has no tests on some platforms)
+
+## Assertions: Make invariants explicit
+
+Assertions are executable claims about what must be true. They’re the fastest way to surface “impossible” states and to localize bugs at the line that crossed a boundary it had no business crossing. Use them both as **temporary tripwires** during investigation and as **permanent contracts** once an invariant is known to matter.
+
+**Two useful flavors**
+
+- **Temporary tripwires (debug asserts):** Add while hunting a failing test or weird behavior. Keep them cheap, contextual, and local to the suspect path. Remove after the mystery is solved **or** convert to permanent checks if the invariant is genuinely important.
+- **Permanent contracts:** Encode **preconditions** (valid inputs), **postconditions** (valid outputs), and **invariants** (state that must always hold). These stay and prevent regressions.
+
+**Where to add assertions**
+
+- At **module boundaries** and **after parsing/external calls** (validate assumptions about returned/decoded data).
+- Around **state transitions** (illegal transitions should fail loudly).
+- In **concurrency hotspots** (e.g., “lock must be held”, “no concurrent mutation”).
+- Before/after **caching, batching, or memoization** (keys, sizes, ordering, monotonicity).
+- For **exhaustive enums** in `switch` statements (treat unexpected values as hard errors).
+
+**How to write good assertions**
+
+- One fact per assert. Fail **fast**, fail **usefully**.
+- Include **stable context** in the message (ids, sizes, states) so the failure is self‑explanatory.
+- Avoid side effects in the condition or message. Assertions may be disabled in some runtimes.
+- Keep them **cheap**: no I/O, heavy allocations, or deep logging in the message.
+- Don’t use asserts for **user‑facing validation**. Raise exceptions for expected bad inputs.
+
+**Java specifics**
+
+- **Enable VM assertions in tests.** Tests must run with `-ea` so `assert` is active.
+- Use **`assert`** for debug‑only invariants that “cannot happen.” Use **exceptions** for runtime guarantees:
+  - Preconditions: `IllegalArgumentException` / `Objects.requireNonNull` (or Guava `Preconditions` if present).
+  - Invariants: `IllegalStateException`.
+- Prefer treating unexpected enum values as **hard errors** rather than adding a quiet `default` path.
+
+**Concrete examples**
+
+Precondition (permanent)
+```java
+void setPort(int port) {
+  if (port < 1 || port > 65_535) {
+    throw new IllegalArgumentException("port out of range: " + port);
+  }
+  this.port = port;
+}
+```
+
+Invariant (permanent)
+```java
+void advance(State next) {
+  if (!allowedTransitions.get(state).contains(next)) {
+    throw new IllegalStateException("Illegal transition " + state + " → " + next);
+  }
+  state = next;
+}
+```
+
+Debug tripwire (temporary; remove or convert later)
+```java
+// Narrow a flaky failure around ordering
+assert isSorted(results) : "unsorted results, size=" + results.size() + " ids=" + ids(results);
+```
+
+Unreachable (hard error)
+```java
+switch (kind) {
+  case A: return handleA();
+  case B: return handleB();
+  default:
+    throw new IllegalStateException("Unhandled kind: " + kind);
+}
+```
+
+Concurrency assumption
+```java
+synchronized void put(String k, String v) {
+  assert Thread.holdsLock(this) : "put must hold instance monitor";
+  // ...
+}
+```
+
+
+House rule: Asserts are allowed and encouraged. Removing or weakening an assertion to “make it pass” is strictly forbidden: fix the cause, not the guardrail.
+
+
+## Triage Playbook
+- **Missing dep/plugin offline**
+  - Remedy: **rerun the exact command without `-o`** once to fetch; then return offline.
+- **Compilation errors**
+  - Fix imports, generics, visibility; re‑run quick verify (skip tests) in the **module**.
+- **Flaky/slow tests**
+  - Run the specific failing test; read its report; stabilize root cause before broad runs.
+- **Formatting failures**
+  - Run formatter/import/XML sort; re‑verify.
+- **License header missing**
+  - Add header for **new** files only (see “Source File Headers”); **do not** change years on existing files.
+
+## Code Formatting
+- **Always run before finalizing:**
+  - `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format`
+- **Style:** no wildcard imports; 120‑char width; curly braces always; LF line endings.
+- **Tip:** formatting/import sort may be validated during `verify`. Running the commands proactively avoids CI/style failures.
+
+## Source File Headers
+Use this exact header for **new Java files only** (replace `${year}` with current year):
+
+```
+/*******************************************************************************
+ * Copyright (c) ${year} Eclipse RDF4J contributors.
+ *
+ * All rights reserved. This program and the accompanying materials
+ * are made available under the terms of the Eclipse Distribution License v1.0
+ * which accompanies this distribution, and is available at
+ * http://www.eclipse.org/org/documents/edl-v10.php.
+ *
+ * SPDX-License-Identifier: BSD-3-Clause
+ *******************************************************************************/
+ ```
+
+Use this exact header. Be very precise.
+
+Do **not** modify existing headers’ years.
+
+## Pre‑Commit Checklist
+- **Format:** `mvn -o -q -T 2C formatter:format impsort:sort xml-format:xml-format`
+- **Compile (fast path):** `mvn -o -Pquick verify | tail -200`
+- **Tests (targeted):** `mvn -o -pl verify | tail -500` (broaden scope if needed)
+- **Reports:** zero new failures in `target/surefire-reports/` or `target/failsafe-reports/`, or explain precisely.
+
+## Navigation & Search
+- Fast file search: `rg --files`
+- Fast content search: `rg -n ""`
+- Read big files in chunks:
+  - `sed -n '1,200p' path/to/File.java`
+  - `sed -n '201,400p' path/to/File.java`
+
+## Autonomy Rules (Act > Ask)
+- **Default:** act with assumptions. Document assumptions in your plan and final answer.
+- **Keep going:** chain steps without waiting for permission; send short progress updates before long actions.
+- **Ask only when:**
+  - Blocked by sandbox/approvals/network policy or missing secrets.
+  - The decision is destructive/irreversible, repo‑wide, or impacts public APIs.
+  - Adding dependencies, changing build profiles, or altering licensing.
+- **Prefer reversible moves:** take the smallest local change that unblocks progress; validate with targeted tests before expanding scope.
+- **Choose defaults**
+  - **Tests:** start with `-pl `, then `-Dtest=Class#method` / `-Dit.test=ITClass#method`.
+  - **Build:** use `-o` quick/profiled commands; briefly drop `-o` to fetch missing deps, then return offline.
+  - **Formatting:** run formatter/impsort/xml‑format proactively before verify.
+  - **Reports:** read surefire/failsafe locally; expand scope only when necessary.
+- **Error handling**
+  - On compile/test failure: fix root cause locally, rerun targeted tests, then broaden.
+  - On flaky tests: rerun class/method; stabilize cause before repo‑wide runs.
+  - On formatting/license issues: apply prescribed commands/headers immediately.
+- **Communication**
+  - **Preambles:** 1–2 sentences grouping upcoming actions.
+  - **Updates:** inform to maintain visibility; do **not** request permission unless in “Ask only when” above.
+
+## Answer Template (Use This)
+- **What changed:** summary of approach and rationale.
+- **Files touched:** list file paths.
+- **Commands run:** key build/test commands.
+- **Verification:** which tests passed, where you checked reports.
+- **Assumptions:** key assumptions and autonomous decisions you made.
+- **Limitations:** anything left or risky edge cases.
+- **Next steps:** optional suggestions for follow‑ups.
+
+## Running Tests
+- By module:
+  - `mvn -o -pl core/sail/shacl verify | tail -500`
+- Entire repo:
+  - `mvn -o verify` (long; only when appropriate)
+- Useful flags:
+  - `-Dtest=ClassName`
+  - `-Dtest=ClassName#method`
+  - `-Dit.test=ITClass#method`
+  - `-DtrimStackTrace=false`
+
+## Build
+- **Build without tests (fast path):**
+  - `mvn -o -Pquick verify`
+- **Verify with tests:**
+  - Targeted module(s): `mvn -o -pl verify`
+  - Entire repo: `mvn -o verify` (use only when appropriate)
+- **When offline fails due to missing deps:**
+  - Re‑run the **exact** command **without** `-o` once to fetch, then return to `-o`.
+
+## Maven Module Overview
+
+The project is organised as a multi-module Maven build. The diagram below lists
+all modules and submodules with a short description for each.
+
+```
+rdf4j: root project
+├── assembly-descriptors: RDF4J: Assembly Descriptors
+├── core: Core modules for RDF4J
+   ├── common: RDF4J common: shared classes
+   │   ├── annotation: RDF4J common annotation classes
+   │   ├── exception: RDF4J common exception classes
+   │   ├── io: RDF4J common IO classes
+   │   ├── iterator: RDF4J common iterators
+   │   ├── order: Order of vars and statements
+   │   ├── text: RDF4J common text classes
+   │   ├── transaction: RDF4J common transaction classes
+   │   └── xml: RDF4J common XML classes
+   ├── model-api: RDF model interfaces.
+   ├── model-vocabulary: Well-Known RDF vocabularies.
+   ├── model: RDF model implementations.
+   ├── sparqlbuilder: A fluent SPARQL query builder
+   ├── rio: Rio (RDF I/O) is an API for parsers and writers of various RDF file formats.
+   │   ├── api: Rio API.
+   │   ├── languages: Rio Language handler implementations.
+   │   ├── datatypes: Rio Datatype handler implementations.
+   │   ├── binary: Rio parser and writer implementation for the binary RDF file format.
+   │   ├── hdt: Experimental Rio parser and writer implementation for the HDT file format.
+   │   ├── jsonld-legacy: Rio parser and writer implementation for the JSON-LD file format.
+   │   ├── jsonld: Rio parser and writer implementation for the JSON-LD file format.
+   │   ├── n3: Rio writer implementation for the N3 file format.
+   │   ├── nquads: Rio parser and writer implementation for the N-Quads file format.
+   │   ├── ntriples: Rio parser and writer implementation for the N-Triples file format.
+   │   ├── rdfjson: Rio parser and writer implementation for the RDF/JSON file format.
+   │   ├── rdfxml: Rio parser and writer implementation for the RDF/XML file format.
+   │   ├── trix: Rio parser and writer implementation for the TriX file format.
+   │   ├── turtle: Rio parser and writer implementation for the Turtle file format.
+   │   └── trig: Rio parser and writer implementation for the TriG file format.
+   ├── queryresultio: Query result IO API and implementations.
+   │   ├── api: Query result IO API
+   │   ├── binary: Query result parser and writer implementation for RDF4J's binary query results format.
+   │   ├── sparqljson: Query result writer implementation for the SPARQL Query Results JSON Format.
+   │   ├── sparqlxml: Query result parser and writer implementation for the SPARQL Query Results XML Format.
+   │   └── text: Query result parser and writer implementation for RDF4J's plain text boolean query results format.
+   ├── query: Query interfaces and implementations
+   ├── queryalgebra: Query algebra model and evaluation.
+   │   ├── model: A generic query algebra for RDF queries.
+   │   ├── evaluation: Evaluation strategy API and implementations for the query algebra model.
+   │   └── geosparql: Query algebra implementations to support the evaluation of GeoSPARQL.
+   ├── queryparser: Query parser API and implementations.
+   │   ├── api: Query language parsers API.
+   │   └── sparql: Query language parser implementation for SPARQL.
+   ├── http: Client and protocol for repository communication over HTTP.
+   │   ├── protocol: HTTP protocol (REST-style)
+   │   └── client: Client functionality for communicating with an RDF4J server over HTTP.
+   ├── queryrender: Query Render and Builder tools
+   ├── repository: Repository API and implementations.
+   │   ├── api: API for interacting with repositories of RDF data.
+   │   ├── manager: Repository manager
+   │   ├── sail: Repository that uses a Sail stack.
+   │   ├── dataset: Implementation that loads all referenced datasets into a wrapped repository
+   │   ├── event: Implementation that notifies listeners of events on a wrapped repository
+   │   ├── http: "Virtual" repository that communicates with a (remote) repository over the HTTP protocol.
+   │   ├── contextaware: Implementation that allows default values to be set on a wrapped repository
+   │   └── sparql: The SPARQL Repository provides a RDF4J Repository interface to any SPARQL end-point.
+   ├── sail: Sail API and implementations.
+   │   ├── api: RDF Storage And Inference Layer ("Sail") API.
+   │   ├── base: RDF Storage And Inference Layer ("Sail") API.
+   │   ├── inferencer: Stackable Sail implementation that adds RDF Schema inferencing to an RDF store.
+   │   ├── memory: Sail implementation that stores data in main memory, optionally using a dump-restore file for persistence.
+   │   ├── nativerdf: Sail implementation that stores data directly to disk in dedicated file formats.
+   │   ├── model: Sail implementation of Model.
+   │   ├── shacl: Stacked Sail with SHACL validation capabilities
+   │   ├── lmdb: Sail implementation that stores data to disk using LMDB.
+   │   ├── lucene-api: StackableSail API offering full-text search on literals, based on Apache Lucene.
+   │   ├── lucene: StackableSail implementation offering full-text search on literals, based on Apache Lucene.
+   │   ├── solr: StackableSail implementation offering full-text search on literals, based on Solr.
+   │   ├── elasticsearch: StackableSail implementation offering full-text search on literals, based on Elastic Search.
+   │   ├── elasticsearch-store: Store for utilizing Elasticsearch as a triplestore.
+   │   └── extensible-store: Store that can be extended with a simple user-made backend.
+   ├── spin: SPARQL input notation interfaces and implementations
+   ├── client: Parent POM for all RDF4J parsers, APIs and client libraries
+   ├── storage: Parent POM for all RDF4J storage and inferencing libraries
+   └── collection-factory: Collection Factories that may be reused for RDF4J
+       ├── api: Evaluation
+       ├── mapdb: Evaluation
+       └── mapdb3: Evaluation
+├── tools: Server, Workbench, Console and other end-user tools for RDF4J.
+   ├── config: RDF4J application configuration classes
+   ├── console: Command line user interface to RDF4J repositories.
+   ├── federation: A federation engine for virtually integrating SPARQL endpoints
+   ├── server: HTTP server implementing a REST-style protocol
+   ├── server-spring: HTTP server implementing a REST-style protocol
+   ├── workbench: Workbench to interact with RDF4J servers.
+   ├── runtime: Runtime dependencies for an RDF4J application
+   └── runtime-osgi: OSGi Runtime dependencies for an RDF4J application
+├── spring-components: Components to use with Spring
+   ├── spring-boot-sparql-web: HTTP server component implementing only the SPARQL protocol
+   ├── rdf4j-spring: Spring integration for RDF4J
+   └── rdf4j-spring-demo: Demo of a spring-boot project using an RDF4J repo as its backend
+├── testsuites: Test suites for Eclipse RDF4J modules
+   ├── model: Reusable tests for Model API implementations
+   ├── rio: Test suite for Rio
+   ├── queryresultio: Reusable tests for QueryResultIO implementations
+   ├── sparql: Test suite for the SPARQL query language
+   ├── repository: Reusable tests for Repository API implementations
+   ├── sail: Reusable tests for Sail API implementations
+   ├── lucene: Generic tests for Lucene Sail implementations.
+   ├── geosparql: Test suite for the GeoSPARQL query language
+   └── benchmark: RDF4J: benchmarks
+├── compliance: Eclipse RDF4J compliance and integration tests
+   ├── repository: Compliance testing for the Repository API implementations
+   ├── rio: Tests for parsers and writers of various RDF file formats.
+   ├── model: RDF4J: Model compliance tests
+   ├── sparql: Tests for the SPARQL query language implementation
+   ├── lucene: Compliance Tests for LuceneSail.
+   ├── solr: Tests for Solr Sail.
+   ├── elasticsearch: Tests for Elasticsearch.
+   └── geosparql: Tests for the GeoSPARQL query language implementation
+├── examples: Examples and HowTos for use of RDF4J in Java
+├── bom: RDF4J Bill of Materials (BOM)
+└── assembly: Distribution bundle assembly
+```
+
+## Safety & Boundaries
+- Don’t commit or push unless explicitly asked.
+- Don’t add new dependencies without explicit approval.
+- Use approvals sparingly: request approval only for network fetches when offline fails, destructive operations, or repo‑wide impacts. Otherwise proceed locally and continue working.
+
+
+It is illegal to `-am` when running tests!
+It is illegal to `-q` when running tests!
+You must follow these rules and instructions exactly as stated.