Skip to content

GH-4950 LMDB: extensible ID scheme#4956

Merged
hmottestad merged 14 commits intoeclipse-rdf4j:developfrom
kenwenzel:GH-4950-lmdb-extensible-ids
Apr 23, 2026
Merged

GH-4950 LMDB: extensible ID scheme#4956
hmottestad merged 14 commits intoeclipse-rdf4j:developfrom
kenwenzel:GH-4950-lmdb-extensible-ids

Conversation

@kenwenzel
Copy link
Copy Markdown
Member

@kenwenzel kenwenzel commented Apr 17, 2024

Implement an extensible ID scheme for the LmdbStore that also allows to embed values into IDs.

GitHub issue resolved: #4950

Briefly describe the changes proposed in this PR:

Use a binary encoding for IDs that allows to represent the value type, if the value itself is embedded within the id and the embedded value itself.


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 474eadf to 6fd1d40 Compare April 18, 2024 06:11
@hmottestad
Copy link
Copy Markdown
Contributor

The failing integration tests: It's a test that is marked as disabled, but has somehow started to run again. Not sure how that happened.

@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 6fd1d40 to a53aae5 Compare April 22, 2024 08:16
@kenwenzel
Copy link
Copy Markdown
Member Author

@hmottestad It would be good to have this in 5.0.0 as it is a breaking change but I am not sure if I can finish the whole logic.

@hmottestad
Copy link
Copy Markdown
Contributor

Sorry. I've already published the M3 milestone build and we now have a feature freeze until the final release. The LMDB Store is experimental, so you should be able to make breaking changes for 5.1.0. Might be better too, since then users can upgrade to 5.0.0 and get all the fixes and enhancements without any major breaking changes.

@hmottestad
Copy link
Copy Markdown
Contributor

PS: The develop branch is now 5.1.0 so you can merge into the develop branch without affecting the 5.0.0 release.

@nguyenm100
Copy link
Copy Markdown
Contributor

rdf 1.2 is largely done and jena has an implementation. apache/jena#2805

might be a good idea to brush the dust off of this pr. how much is left to do @kenwenzel ?

@kenwenzel
Copy link
Copy Markdown
Member Author

@nguyenm100 Currently, only the ID scheme in the value store is implemented.
What is missing:

  • embedding of (numeric) values into IDs
  • handling of triple terms
  • indexes for triple terms as part of the triple store

@hmottestad Do we already have an issue for RDF 1.2 and/or can we just adapt the current RDF-star support?

@nguyenm100
Copy link
Copy Markdown
Contributor

Hey @kenwenzel is it possible to just do (1) embedding of values into IDs and merge this pr and then (2) and (3) with rdf 1.2 implementation? or is there some dependency from 1.2 for this to be usable?

@kenwenzel
Copy link
Copy Markdown
Member Author

@nguyenm100 Yes, that's perfectly possible.

@nguyenm100
Copy link
Copy Markdown
Contributor

Thanks @kenwenzel are you able to finish (1) and merge this pr? If so, and at that point does it make sense to pull the experimental designation from lmdbstore? (or does that force a major version upgrade?)

@kenwenzel
Copy link
Copy Markdown
Member Author

@nguyenm100 (1) is a bit of work - I think at least 2 or 3 days, but there is no direct dependency to RDF 1.2.

@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from a53aae5 to e615879 Compare June 4, 2025 08:18
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 5 times, most recently from 156de3d to 61a53a4 Compare June 17, 2025 13:47
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 9 times, most recently from 46b8fa7 to 3ec78e6 Compare June 23, 2025 13:00
@kenwenzel
Copy link
Copy Markdown
Member Author

kenwenzel commented Jun 24, 2025

@odysa @nguyenm100 @hmottestad Do we need to make this backwards compatible with the current ID scheme or do you think a breaking change is OK?

@nguyenm100
Copy link
Copy Markdown
Contributor

Hi Ken, what's the loe for backward compat? we have one client keeping a rolling weekly snapshot that I would need to chat with.. alternatively, we can introduce it an lmdb2 sail (is that what tdb did?)

@kenwenzel
Copy link
Copy Markdown
Member Author

@nguyenm100 It should be possible by extracting some constants and considering them in ValueStore and TripleStore. It might have a small performance impact.

@nguyenm100
Copy link
Copy Markdown
Contributor

Hey Ken, all good on our side. changes don't need to be backward compat for us. tx

@kenwenzel kenwenzel removed this from RDF4J STF Feb 25, 2026
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch 3 times, most recently from 002edfb to a06c3a9 Compare March 26, 2026 15:42
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 9404058 to 8faf37d Compare March 31, 2026 18:20
@kenwenzel kenwenzel self-assigned this Apr 17, 2026
@kenwenzel kenwenzel force-pushed the GH-4950-lmdb-extensible-ids branch from 8faf37d to a53a4ef Compare April 19, 2026 17:16
@kenwenzel
Copy link
Copy Markdown
Member Author

@hmottestad Can I merge this?

@hmottestad hmottestad merged commit 7806479 into eclipse-rdf4j:develop Apr 23, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants