fix: added field validation helper for aggregations#2768
fix: added field validation helper for aggregations#2768
Conversation
|
The main difference is that we don't always have a fixed schema with JSON fields, which means a field could exist on one segment, but not another one. Even if a requested field is not part of a JSON field, it would break use cases where you federate a query over different indices. Adding some validation on top should be easy with the |
|
I thought I commented here, but can't find my comment :(. In quickwit, we make it possible to change your schema. If the fields are not fast fields, the aggregation result will be empty. So this is not a bug, but a feature. |
|
@PSeitz @fulmicoton Thank you for the feedback! You're absolutely right - the strict enforcement would break important use cases like schema evolution. I've updated the implementation to make field validation optional and opt-in: Default Behavior (Unchanged)
Strict Mode (Opt-in)
Benefits
Let me know if this works for your use case, or if you have other suggestions. |
| /// Create new aggregation context parameters with strict field validation enabled | ||
| pub fn with_strict_field_validation( | ||
| limits: AggregationLimitsGuard, | ||
| tokenizers: TokenizerManager, | ||
| ) -> Self { | ||
| Self { | ||
| limits, | ||
| tokenizers, | ||
| strict_field_validation: true, | ||
| } | ||
| } |
There was a problem hiding this comment.
| /// Create new aggregation context parameters with strict field validation enabled | |
| pub fn with_strict_field_validation( | |
| limits: AggregationLimitsGuard, | |
| tokenizers: TokenizerManager, | |
| ) -> Self { | |
| Self { | |
| limits, | |
| tokenizers, | |
| strict_field_validation: true, | |
| } | |
| } |
I don't think we need both a constructor and an accessor for this. This is too niche to require investing in syntactic sugar.
There was a problem hiding this comment.
Removed in 90ceec6, but then changed the approach based on your last comment.
|
I added a few comments @mdashti. The PR contains what looks like AI slop. |
|
I would move it outside of the aggregation. You can fetch the fields from the aggregation request and do a validation in a helper function |
This reverts commit d5e0991549a05bf80f19f853f7689ad69f96e7e5.
I would move it outside of the aggregation. You can fetch the fields from the aggregation request and do a validation in a helper function
mdashti
left a comment
There was a problem hiding this comment.
@fulmicoton Thanks for the comments. Your idea of providing a helper function is a good one. Changed the approach. Please take another look.
| /// Create new aggregation context parameters with strict field validation enabled | ||
| pub fn with_strict_field_validation( | ||
| limits: AggregationLimitsGuard, | ||
| tokenizers: TokenizerManager, | ||
| ) -> Self { | ||
| Self { | ||
| limits, | ||
| tokenizers, | ||
| strict_field_validation: true, | ||
| } | ||
| } |
There was a problem hiding this comment.
Removed in 90ceec6, but then changed the approach based on your last comment.
Ticket(s) Closed
What
Adds a public
validate_aggregation_fields()helper function that validates all fields in an aggregation request exist in the schema and are configured as fast fields. This allows users to validate aggregations upfront before execution.Why
Users were getting confusing empty results when aggregation field names had typos or didn't exist. However, some users (like Quickwit) need lenient behavior for schema evolution across different data segments.
This PR provides a validation helper that users can call explicitly when they want validation, while keeping the default aggregation behavior lenient for backward compatibility.
How
Added
validate_aggregation_fields()function inagg_req.rsthat:get_fast_field_names()FieldNotFoundorSchemaErrorif validation failsUsers can call this before executing aggregations:
The validation is completely optional - existing code continues to work unchanged with lenient behavior.
Tests
test_aggregation_field_validation_helpertesting both invalid and valid fields