Skip to content

fix: added field validation helper for aggregations#2768

Open
mdashti wants to merge 7 commits intomainfrom
paradedb/fix-agg-validation
Open

fix: added field validation helper for aggregations#2768
mdashti wants to merge 7 commits intomainfrom
paradedb/fix-agg-validation

Conversation

@mdashti
Copy link
Copy Markdown
Collaborator

@mdashti mdashti commented Dec 10, 2025

Ticket(s) Closed

What

Adds a public validate_aggregation_fields() helper function that validates all fields in an aggregation request exist in the schema and are configured as fast fields. This allows users to validate aggregations upfront before execution.

Why

Users were getting confusing empty results when aggregation field names had typos or didn't exist. However, some users (like Quickwit) need lenient behavior for schema evolution across different data segments.

This PR provides a validation helper that users can call explicitly when they want validation, while keeping the default aggregation behavior lenient for backward compatibility.

How

Added validate_aggregation_fields() function in agg_req.rs that:

  • Extracts all field names from the aggregation request using existing get_fast_field_names()
  • Validates each field exists in the schema (directly or as part of a JSON field)
  • Checks that each field is configured as a fast field
  • Returns FieldNotFound or SchemaError if validation fails

Users can call this before executing aggregations:

// Validate upfront (optional)
validate_aggregation_fields(&agg_req, segment_reader)?;

// Execute aggregation
let collector = AggregationCollector::from_aggs(agg_req, Default::default());
let results = searcher.search(&query, &collector)?;

The validation is completely optional - existing code continues to work unchanged with lenient behavior.

Tests

  • Added test_aggregation_field_validation_helper testing both invalid and valid fields
  • Includes working doctest example
  • All 166 existing aggregation tests pass unchanged
  • Fully backward compatible - no breaking changes

@PSeitz
Copy link
Copy Markdown
Collaborator

PSeitz commented Dec 10, 2025

The main difference is that we don't always have a fixed schema with JSON fields, which means a field could exist on one segment, but not another one.

Even if a requested field is not part of a JSON field, it would break use cases where you federate a query over different indices.

Adding some validation on top should be easy with the get_fast_field_names method.

@fulmicoton
Copy link
Copy Markdown
Collaborator

I thought I commented here, but can't find my comment :(.
Not sure what happened.

In quickwit, we make it possible to change your schema.
In that case, we check that the request for the current version of the schema, but we run the query over older data, with a best effort approach.

If the fields are not fast fields, the aggregation result will be empty.

So this is not a bug, but a feature.
Can we make your validation optional (and not the default)?

@mdashti
Copy link
Copy Markdown
Collaborator Author

mdashti commented Jan 15, 2026

@PSeitz @fulmicoton Thank you for the feedback! You're absolutely right - the strict enforcement would break important use cases like schema evolution.

I've updated the implementation to make field validation optional and opt-in:

Default Behavior (Unchanged)

  • strict_field_validation = false by default
  • Returns empty results for non-existent fields (current behavior)
  • Supports your use case of querying across splits with different schemas

Strict Mode (Opt-in)

  • strict_field_validation = true when explicitly enabled
  • Returns FieldNotFound error for non-existent fields
  • Useful for catching typos and configuration errors in development

Benefits

  1. Backward Compatible: No breaking changes - default behavior is unchanged
  2. Flexible: Users can choose strict or lenient based on their needs
  3. Explicit: The flag name makes it clear what's happening
  4. No Performance Impact: Validation only runs when explicitly enabled

Let me know if this works for your use case, or if you have other suggestions.

Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/mod.rs Outdated
Comment thread src/aggregation/mod.rs Outdated
Comment on lines +218 to +228
/// Create new aggregation context parameters with strict field validation enabled
pub fn with_strict_field_validation(
limits: AggregationLimitsGuard,
tokenizers: TokenizerManager,
) -> Self {
Self {
limits,
tokenizers,
strict_field_validation: true,
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create new aggregation context parameters with strict field validation enabled
pub fn with_strict_field_validation(
limits: AggregationLimitsGuard,
tokenizers: TokenizerManager,
) -> Self {
Self {
limits,
tokenizers,
strict_field_validation: true,
}
}

I don't think we need both a constructor and an accessor for this. This is too niche to require investing in syntactic sugar.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 90ceec6, but then changed the approach based on your last comment.

@fulmicoton-dd
Copy link
Copy Markdown
Collaborator

I added a few comments @mdashti.

The PR contains what looks like AI slop.
Using AI agent is ok, but make sure you do not push the burden of CR on us.

Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/accessor_helpers.rs Outdated
@PSeitz
Copy link
Copy Markdown
Collaborator

PSeitz commented Jan 15, 2026

I would move it outside of the aggregation. You can fetch the fields from the aggregation request and do a validation in a helper function

This reverts commit d5e0991549a05bf80f19f853f7689ad69f96e7e5.
I would move it outside of the aggregation. You can fetch the fields from the aggregation request and do a validation in a helper function
Copy link
Copy Markdown
Collaborator Author

@mdashti mdashti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fulmicoton Thanks for the comments. Your idea of providing a helper function is a good one. Changed the approach. Please take another look.

Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/mod.rs Outdated
Comment thread src/aggregation/mod.rs Outdated
Comment on lines +218 to +228
/// Create new aggregation context parameters with strict field validation enabled
pub fn with_strict_field_validation(
limits: AggregationLimitsGuard,
tokenizers: TokenizerManager,
) -> Self {
Self {
limits,
tokenizers,
strict_field_validation: true,
}
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 90ceec6, but then changed the approach based on your last comment.

Comment thread src/aggregation/accessor_helpers.rs Outdated
Comment thread src/aggregation/accessor_helpers.rs Outdated
@mdashti mdashti changed the title fix: added field validation to aggregations fix: added field validation helper for aggregations Jan 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

aggregations silently return empty results for invalid fields

4 participants