Skip to content

Implement Basic Functionalities for the Query Module#2

Open
angelip2303 wants to merge 20 commits intorust-rdf:masterfrom
angelip2303:master
Open

Implement Basic Functionalities for the Query Module#2
angelip2303 wants to merge 20 commits intorust-rdf:masterfrom
angelip2303:master

Conversation

@angelip2303
Copy link
Copy Markdown
Collaborator

@angelip2303 angelip2303 commented Jan 20, 2025

I am working on the basic functionalities of the query module. My idea is to write a definition of the structs that is as generic as possible, so that it can be used in any implementation of the Term trait. I am trying to respect as much as I can the original Ruby implementation, but I am having some issues with the fact that Ruby is dynamically typed. That's why we have introduced the Matcher struct, as Rust does not support Symbols (SPARQL query variables) and nil (the wildcard), and Pattern does not implement the Statement trait (term matchers can be variables).

Apart from that, let me know if I should modify this basic implementation before committing into the implementation of more complex methods.

@angelip2303 angelip2303 added the enhancement New feature or request label Jan 20, 2025
@angelip2303 angelip2303 self-assigned this Jan 20, 2025
@angelip2303 angelip2303 requested a review from artob as a code owner January 20, 2025 12:27
@angelip2303 angelip2303 requested review from SamuelSarle and removed request for SamuelSarle January 20, 2025 15:02
@angelip2303
Copy link
Copy Markdown
Collaborator Author

I've made a small refactor to the solution. Instead of using generics, I switched to dynamic dispatch, which I believe aligns better with the code style we've adopted in the rdf-model. I've also moved the Queryable trait to the rdf-query crate to prevent cyclic dependencies.

@angelip2303 angelip2303 marked this pull request as draft January 21, 2025 14:47
@artob
Copy link
Copy Markdown
Member

artob commented Jan 22, 2025

@angelip2303 When would you expect this to be ready to review? (Since it's still in draft mode)

@angelip2303
Copy link
Copy Markdown
Collaborator Author

@angelip2303 When would you expect this to be ready to review? (Since it's still in draft mode)

I believe that this will be ready before Friday, as I am currently trying to draft the basic query functionality

@angelip2303
Copy link
Copy Markdown
Collaborator Author

The current implementation is almost fully functional, the only issue is that the execute method in Query does not create Solutions lazily, and that each concrete Solution stores a HeapTerm instead of a polymorphic Term. This last thing may not be an issue, but this design can be reconsidered.

@angelip2303 angelip2303 marked this pull request as ready for review January 23, 2025 15:30
@artob
Copy link
Copy Markdown
Member

artob commented Jan 27, 2025

@race-of-sloths invite

@race-of-sloths
Copy link
Copy Markdown

@artob Thank you for calling!

@angelip2303 Thank you for the contribution! Join Race of Sloths by simply mentioning me in your comment/PRs description and start collecting Sloth Points through contributions to open source projects.

What is the Race of Sloths

Race of Sloths is a friendly competition where you can participate in challenges and compete with other open-source contributors within your normal workflow

For contributors:

  • Tag @race-of-sloths inside your pull requests
  • Wait for the maintainer to review and score your pull request
  • Check out your position in the Leaderboard
  • Keep weekly and monthly streaks to reach higher positions
  • Boast your contributions with a dynamic picture of your Profile

For maintainers:

  • Score pull requests that participate in the Race of Sloths and receive a reward
  • Engage contributors with fair scoring and fast responses so they keep their streaks
  • Promote the Race to the point where the Race starts promoting you
  • Grow the community of your contributors

Feel free to check our website for additional details!

Bot commands
  • For contributors
    • Include a PR: @race-of-sloths include to enter the Race with your PR
  • For maintainers:
    • Invite contributor @race-of-sloths invite to invite the contributor to participate in a race or include it, if it's already a runner.
    • Assign points: @race-of-sloths score [1/2/3/5/8/13] to award points based on your assessment.
    • Reject this PR: @race-of-sloths exclude to send this PR back to the drawing board.
    • Exclude repo: @race-of-sloths pause to stop bot activity in this repo until @race-of-sloths unpause command is called

@angelip2303
Copy link
Copy Markdown
Collaborator Author

This is ready for review 😎

@artob artob requested a review from imunproductive February 14, 2025 20:50
@artob artob assigned imunproductive and unassigned angelip2303 Feb 14, 2025
Copy link
Copy Markdown

@imunproductive imunproductive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this PR. I apologize for the delayed review. Overall the code seems good to me at first glance, but before going any further there is an immediate issue with building caused by OxrdfReader and BorshReader implementing trait Enumerable but not implementing its method grep. Could you please fix it?

pub trait Enumerable:
Countable + Iterator<Item = Result<Box<dyn Statement>, Box<dyn Error>>>
{
fn grep(&self, pattern: &impl PartialEq<Box<dyn Statement>>) -> Self
Copy link
Copy Markdown

@imunproductive imunproductive Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep is not implemented for OxrdfReader and BorshReader

Copy link
Copy Markdown
Contributor

@SamuelSarle SamuelSarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello!
Generally, your implementation is good. Where it struggles is fitting with the existing Reader / Enumerable implementations.

Since in lib/rdf-query/src/lib.rs the module traits and in lib/rdf-query/src/pattern.rs the method Pattern::execute(&self, &Queryable) are non-public, I assume your intended interface for this package would be something like this:

use rdf_query::{pattern::Pattern, query::Query};
use rdf_reader::{
    // rdf_reader::providers::* isn't actually reachable because the `providers` module isn't public,
    // I modified it locally in lib/rdf-reader/lib/lib.rs.
    providers::OxrdfReader,

    Format,
    ReaderOptions,
};

fn main() {
    let file = std::fs::File::open("/path/to/data.ttl").unwrap();
    let options = ReaderOptions {
        format: Some(Format::Turtle),
    };
    let reader = OxrdfReader::new(file, options).unwrap();

    let pat = Pattern::new("<subject>", "<predicate>", "<object>", None);
    let query = Query::new(vec![pat]);

    let solutions = query.execute(&reader);

    for solution in solutions {
        for (variable, value) in solution.each_binding() {
            println!("{:?} => {:?}", variable, value);
        }
    }
}

This would be fine otherwise but the Queryable makes some assumptions that Reader / Enumerable can't promise. In essence all those types are wrapping some kind of iteration functionality but they don't—and sometimes can't—provide the sort of seek or "reset" that this snippet in the PR requires:

        let solutions: Vec<_> = self
            .patterns
            .iter()
            .flat_map(|pattern| {
                let statements = pattern.execute(queryable);
                statements.filter_map(|res| res.ok().map(|stmt| pattern.solution(stmt)))
            })
            .collect();

The flow for that is:

  1. iter over patterns, for the first one run pattern.execute(quaryable). The quaryable reads statements from the underlying iterator until it is finished and returns None
  2. repeat for the second pattern, the iterator is now empty and returns None.

Regarding the traits

  • Enumerable is a wrapper around Countable and Iterator<_>.
  • Reader and it's implementations like OxrdfReader implementations are a wrapper around Source which is around Enumerable, which is around Iterator, etc. etc.
  • Iterators are stateful and they give item for each call to Iterator::next until they are finished.
  • The reader implementations themselves are also wrappers around std::io:Read which only provides a method to get more bytes, has no resetting/seeking.

So all in all I can see some at least a couple methods to help with this:

  1. Instead of executing each pattern sequentially over the Enumerable, read one item from Enumerable and match it against each pattern.
  2. Have tighter requirements for Queryable than Enumerable, e.g. require that Queryable implementations also provide a reset method.
  3. Read statements to memory and then match the query patterns over them. Not ideal for many reasons, there could be many more statements that fit memory, the reader could be over a network socket that doesn't "end", etc.
  4. Perhaps tying Queryable to the readers (BorshReader/OxrdfReader) isn't the correct strategy? As far as I can tell from the RDF.rb side, execute takes # @param [RDF::Queryable] queryable # the graph or repository to query as input. I'd say not trying to implement Queryable for readers and waiting until there is a type that actually can be queried, like a repository, would be a better approach.

Comment on lines +37 to +41
impl From<&str> for Matcher {
fn from(name: &str) -> Self {
name.into()
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is self-recursing.

warning: function cannot return without recursing
  --> lib/rdf-query/src/matcher.rs:38:5
   |
38 | /     fn from(name: &str) -> Self {
39 | |         name.into()
40 | |     }
   | |_____^
   |
note: recursive call site
  --> lib/rdf-query/src/matcher.rs:39:9
   |
39 |         name.into()
   |         ^^^^^^^^^^^
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#unconditional_recursion
   = note: `#[warn(clippy::unconditional_recursion)]` on by default
Suggested change
impl From<&str> for Matcher {
fn from(name: &str) -> Self {
name.into()
}
}
impl From<&str> for Matcher {
fn from(name: &str) -> Self {
Variable::from(name).into()
}
}


/// Executes the query on the given graph.
///
/// If the query nas no patterns, it returns a single empty solution as per
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// If the query nas no patterns, it returns a single empty solution as per
/// If the query has no patterns, it returns a single empty solution as per

}

/// Returns true if the query has no patterns.
pub fn empty(&self) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional: I think this would be good to rename to match the idiomatic format rust uses. Also you're using empty elsewhere with different semantics (Solutions::empty() -> Solutions).

Suggested change
pub fn empty(&self) -> bool {
pub fn is_empty(&self) -> bool {

@artob artob requested a review from Copilot May 7, 2025 23:05
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces basic query functionalities by defining generic structs and traits to support a SPARQL-like query system. Key changes include:

  • Implementing the Enumerable trait’s grep method for OxrdfReader and BorshReader with a stub (todo!()).
  • Adding new modules for Variable, Pattern, Query, and related query execution logic.
  • Updating traits in the rdf-model and rdf-query crates for query processing support.

Reviewed Changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib/rdf-reader/src/providers/oxrdf.rs Adds a stub for the Enumerable trait’s grep method implementation.
lib/rdf-query/src/query.rs Adds query execution logic with documentation for empty queries.
lib/rdf-query/src/pattern.rs Implements pattern matching and solution extraction logic.
lib/rdf-query/src/matcher.rs Introduces Matcher to support variable and term matching.
lib/rdf-model/src/traits/enumerable.rs Extends Enumerable with the grep function signature.
lib/rdf-borsh/src/borsh_reader.rs Adds a grep method stub to support Enumerable for BorshReader.
Other files (variable.rs, solution.rs, graph_pattern.rs, etc.) Provide additional support for query building and execution.

where
Self: Sized,
{
todo!()
Copy link

Copilot AI May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grep method is left unimplemented using todo!(), which could lead to runtime panics if invoked. It is advisable to either implement the functionality or clearly mark this stub as non-production ready.

Suggested change
todo!()
let filtered_statements: Vec<Box<dyn Statement>> = self
.filter_map(|result| result.ok()) // Ignore errors
.filter(|statement| pattern == statement)
.collect();
let reader = std::io::Cursor::new(filtered_statements);
let parser = RdfParser::from_format(self.format.try_into().expect("format must be compatible"))
.for_reader(reader);
Self {
format: self.format,
parser,
}

Copilot uses AI. Check for mistakes.
where
Self: Sized,
{
todo!()
Copy link

Copilot AI May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grep method is left as a stub using todo!(), which may cause runtime issues when called. Consider implementing the method or providing an explicit indication that it is intentionally unimplemented.

Suggested change
todo!()
// Filter the quads that match the provided pattern
let filtered_quads: Vec<_> = self
.decompressor
.filter_map(|quad| {
if pattern == &Box::new(quad) {
Some(quad)
} else {
None
}
})
.collect();
// Create a new BorshReader with the filtered quads
BorshReader {
decompressor: self.decompressor.clone(), // Assuming decompressor is clonable
term_dict: self.term_dict.clone(),
quad_count: filtered_quads.len(),
read_count: 0,
}

Copilot uses AI. Check for mistakes.

/// Executes the query on the given graph.
///
/// If the query nas no patterns, it returns a single empty solution as per
Copy link

Copilot AI May 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the documentation comment: 'nas' should be 'has'.

Suggested change
/// If the query nas no patterns, it returns a single empty solution as per
/// If the query has no patterns, it returns a single empty solution as per

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants