Implement Basic Functionalities for the Query Module#2
Implement Basic Functionalities for the Query Module#2angelip2303 wants to merge 20 commits intorust-rdf:masterfrom
Conversation
|
I've made a small refactor to the solution. Instead of using generics, I switched to dynamic dispatch, which I believe aligns better with the code style we've adopted in the |
|
@angelip2303 When would you expect this to be ready to review? (Since it's still in draft mode) |
I believe that this will be ready before Friday, as I am currently trying to draft the basic query functionality |
|
The current implementation is almost fully functional, the only issue is that the |
|
@race-of-sloths invite |
|
@artob Thank you for calling! @angelip2303 Thank you for the contribution! Join Race of Sloths by simply mentioning me in your comment/PRs description and start collecting Sloth Points through contributions to open source projects. What is the Race of SlothsRace of Sloths is a friendly competition where you can participate in challenges and compete with other open-source contributors within your normal workflow For contributors:
For maintainers:
Feel free to check our website for additional details! Bot commands
|
|
This is ready for review 😎 |
imunproductive
left a comment
There was a problem hiding this comment.
Thanks for your work on this PR. I apologize for the delayed review. Overall the code seems good to me at first glance, but before going any further there is an immediate issue with building caused by OxrdfReader and BorshReader implementing trait Enumerable but not implementing its method grep. Could you please fix it?
| pub trait Enumerable: | ||
| Countable + Iterator<Item = Result<Box<dyn Statement>, Box<dyn Error>>> | ||
| { | ||
| fn grep(&self, pattern: &impl PartialEq<Box<dyn Statement>>) -> Self |
There was a problem hiding this comment.
grep is not implemented for OxrdfReader and BorshReader
There was a problem hiding this comment.
Hello!
Generally, your implementation is good. Where it struggles is fitting with the existing Reader / Enumerable implementations.
Since in lib/rdf-query/src/lib.rs the module traits and in lib/rdf-query/src/pattern.rs the method Pattern::execute(&self, &Queryable) are non-public, I assume your intended interface for this package would be something like this:
use rdf_query::{pattern::Pattern, query::Query};
use rdf_reader::{
// rdf_reader::providers::* isn't actually reachable because the `providers` module isn't public,
// I modified it locally in lib/rdf-reader/lib/lib.rs.
providers::OxrdfReader,
Format,
ReaderOptions,
};
fn main() {
let file = std::fs::File::open("/path/to/data.ttl").unwrap();
let options = ReaderOptions {
format: Some(Format::Turtle),
};
let reader = OxrdfReader::new(file, options).unwrap();
let pat = Pattern::new("<subject>", "<predicate>", "<object>", None);
let query = Query::new(vec![pat]);
let solutions = query.execute(&reader);
for solution in solutions {
for (variable, value) in solution.each_binding() {
println!("{:?} => {:?}", variable, value);
}
}
}This would be fine otherwise but the Queryable makes some assumptions that Reader / Enumerable can't promise. In essence all those types are wrapping some kind of iteration functionality but they don't—and sometimes can't—provide the sort of seek or "reset" that this snippet in the PR requires:
let solutions: Vec<_> = self
.patterns
.iter()
.flat_map(|pattern| {
let statements = pattern.execute(queryable);
statements.filter_map(|res| res.ok().map(|stmt| pattern.solution(stmt)))
})
.collect();The flow for that is:
- iter over patterns, for the first one run
pattern.execute(quaryable). The quaryable reads statements from the underlying iterator until it is finished and returnsNone - repeat for the second pattern, the iterator is now empty and returns
None.
Regarding the traits
- Enumerable is a wrapper around
CountableandIterator<_>. - Reader and it's implementations like OxrdfReader implementations are a wrapper around Source which is around Enumerable, which is around Iterator, etc. etc.
- Iterators are stateful and they give item for each call to
Iterator::nextuntil they are finished. - The reader implementations themselves are also wrappers around std::io:Read which only provides a method to get more bytes, has no resetting/seeking.
So all in all I can see some at least a couple methods to help with this:
- Instead of executing each pattern sequentially over the
Enumerable, read one item fromEnumerableand match it against each pattern. - Have tighter requirements for
QueryablethanEnumerable, e.g. require thatQueryableimplementations also provide a reset method. - Read statements to memory and then match the query patterns over them. Not ideal for many reasons, there could be many more statements that fit memory, the reader could be over a network socket that doesn't "end", etc.
- Perhaps tying
Queryableto the readers (BorshReader/OxrdfReader) isn't the correct strategy? As far as I can tell from the RDF.rb side,executetakes# @param [RDF::Queryable] queryable # the graph or repository to queryas input. I'd say not trying to implementQueryablefor readers and waiting until there is a type that actually can be queried, like a repository, would be a better approach.
| impl From<&str> for Matcher { | ||
| fn from(name: &str) -> Self { | ||
| name.into() | ||
| } | ||
| } |
There was a problem hiding this comment.
This implementation is self-recursing.
warning: function cannot return without recursing
--> lib/rdf-query/src/matcher.rs:38:5
|
38 | / fn from(name: &str) -> Self {
39 | | name.into()
40 | | }
| |_____^
|
note: recursive call site
--> lib/rdf-query/src/matcher.rs:39:9
|
39 | name.into()
| ^^^^^^^^^^^
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unconditional_recursion
= note: `#[warn(clippy::unconditional_recursion)]` on by default
| impl From<&str> for Matcher { | |
| fn from(name: &str) -> Self { | |
| name.into() | |
| } | |
| } | |
| impl From<&str> for Matcher { | |
| fn from(name: &str) -> Self { | |
| Variable::from(name).into() | |
| } | |
| } |
|
|
||
| /// Executes the query on the given graph. | ||
| /// | ||
| /// If the query nas no patterns, it returns a single empty solution as per |
There was a problem hiding this comment.
| /// If the query nas no patterns, it returns a single empty solution as per | |
| /// If the query has no patterns, it returns a single empty solution as per |
| } | ||
|
|
||
| /// Returns true if the query has no patterns. | ||
| pub fn empty(&self) -> bool { |
There was a problem hiding this comment.
Optional: I think this would be good to rename to match the idiomatic format rust uses. Also you're using empty elsewhere with different semantics (Solutions::empty() -> Solutions).
| pub fn empty(&self) -> bool { | |
| pub fn is_empty(&self) -> bool { |
There was a problem hiding this comment.
Pull Request Overview
This PR introduces basic query functionalities by defining generic structs and traits to support a SPARQL-like query system. Key changes include:
- Implementing the Enumerable trait’s grep method for OxrdfReader and BorshReader with a stub (todo!()).
- Adding new modules for Variable, Pattern, Query, and related query execution logic.
- Updating traits in the rdf-model and rdf-query crates for query processing support.
Reviewed Changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/rdf-reader/src/providers/oxrdf.rs | Adds a stub for the Enumerable trait’s grep method implementation. |
| lib/rdf-query/src/query.rs | Adds query execution logic with documentation for empty queries. |
| lib/rdf-query/src/pattern.rs | Implements pattern matching and solution extraction logic. |
| lib/rdf-query/src/matcher.rs | Introduces Matcher to support variable and term matching. |
| lib/rdf-model/src/traits/enumerable.rs | Extends Enumerable with the grep function signature. |
| lib/rdf-borsh/src/borsh_reader.rs | Adds a grep method stub to support Enumerable for BorshReader. |
| Other files (variable.rs, solution.rs, graph_pattern.rs, etc.) | Provide additional support for query building and execution. |
| where | ||
| Self: Sized, | ||
| { | ||
| todo!() |
There was a problem hiding this comment.
The grep method is left unimplemented using todo!(), which could lead to runtime panics if invoked. It is advisable to either implement the functionality or clearly mark this stub as non-production ready.
| todo!() | |
| let filtered_statements: Vec<Box<dyn Statement>> = self | |
| .filter_map(|result| result.ok()) // Ignore errors | |
| .filter(|statement| pattern == statement) | |
| .collect(); | |
| let reader = std::io::Cursor::new(filtered_statements); | |
| let parser = RdfParser::from_format(self.format.try_into().expect("format must be compatible")) | |
| .for_reader(reader); | |
| Self { | |
| format: self.format, | |
| parser, | |
| } |
| where | ||
| Self: Sized, | ||
| { | ||
| todo!() |
There was a problem hiding this comment.
The grep method is left as a stub using todo!(), which may cause runtime issues when called. Consider implementing the method or providing an explicit indication that it is intentionally unimplemented.
| todo!() | |
| // Filter the quads that match the provided pattern | |
| let filtered_quads: Vec<_> = self | |
| .decompressor | |
| .filter_map(|quad| { | |
| if pattern == &Box::new(quad) { | |
| Some(quad) | |
| } else { | |
| None | |
| } | |
| }) | |
| .collect(); | |
| // Create a new BorshReader with the filtered quads | |
| BorshReader { | |
| decompressor: self.decompressor.clone(), // Assuming decompressor is clonable | |
| term_dict: self.term_dict.clone(), | |
| quad_count: filtered_quads.len(), | |
| read_count: 0, | |
| } |
|
|
||
| /// Executes the query on the given graph. | ||
| /// | ||
| /// If the query nas no patterns, it returns a single empty solution as per |
There was a problem hiding this comment.
Typo in the documentation comment: 'nas' should be 'has'.
| /// If the query nas no patterns, it returns a single empty solution as per | |
| /// If the query has no patterns, it returns a single empty solution as per |
I am working on the basic functionalities of the query module. My idea is to write a definition of the structs that is as generic as possible, so that it can be used in any implementation of the Term trait. I am trying to respect as much as I can the original Ruby implementation, but I am having some issues with the fact that Ruby is dynamically typed. That's why we have introduced the Matcher struct, as Rust does not support Symbols (SPARQL query variables) and
nil(the wildcard), and Pattern does not implement the Statement trait (term matchers can be variables).Apart from that, let me know if I should modify this basic implementation before committing into the implementation of more complex methods.