[Feature Request] Maintain PassThroughCondition support past GE 2.0 #11688

@konnor-b

Description

Is your feature request related to a problem? Please describe.
With condition_parser and PassThroughCondition being phased out in GX Core 2.0, how will SQL-string row conditions be supported going forward? When used with Spark, users can apply built-in Spark SQL functions directly in the row_condition value.

The new GX Column condition object currently does not appear to support these functions. For example, suppose you have this Spark DataFrame:

+---+------+
| id| fruit|
+---+------+
|  1| apple|
|  2|banana|
|  3|cherry|
|  4|  date|
|  5| APPLE|
|  6|Banana|
|  7|Cherry|
+---+------+

You want to filter on just apple and banana entries in the expectation:

# Define the Expectation with a row_condition using SQL upper() function:
expectation = gx.expectations.ExpectColumnValuesToBeInSet(
    column="fruit",
    value_set=["apple", "banana"],
    row_condition="UPPER(fruit) in ('BANANA', 'APPLE')",
    condition_parser="spark"
)

This can easily be done with Spark's built-in UPPER function, which converts the DataFrame values to full uppercase before the comparison. However, with the new row_condition pattern, a GX Column object is used in the row_condition, and it does NOT appear to support built-in functions like UPPER, among many others.
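For clarity, here is the filtering behavior that the SQL row_condition expresses, sketched in plain Python over a list of dicts (illustrative only; this is neither Spark nor the GX API):

```python
# Plain-Python sketch of the case-insensitive filter expressed by the SQL
# row_condition "UPPER(fruit) in ('BANANA', 'APPLE')". Illustrative only.
rows = [
    {"id": 1, "fruit": "apple"},
    {"id": 2, "fruit": "banana"},
    {"id": 3, "fruit": "cherry"},
    {"id": 4, "fruit": "date"},
    {"id": 5, "fruit": "APPLE"},
    {"id": 6, "fruit": "Banana"},
    {"id": 7, "fruit": "Cherry"},
]

def row_condition(row):
    # Mirrors UPPER(fruit) IN ('BANANA', 'APPLE'): uppercase the value
    # before the membership test, so casing in the data does not matter.
    return row["fruit"].upper() in {"BANANA", "APPLE"}

filtered = [r["id"] for r in rows if row_condition(r)]
# ids 1, 2, 5, and 6 pass the condition
```

The uppercase conversion before the comparison is exactly the step that a Column-only condition object cannot express without function support.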

Without this support, end users are forced into the Column("...") row_condition pattern, which is less universal: string conditions are not supported, and Spark functions can no longer be used directly in the row_condition.

Describe the solution you'd like
Ideally, condition_parser and PassThroughCondition would remain supported past GX Core 2.0, or at the very least be marked as deprecated while retaining their current behavior, rather than being removed entirely.

Describe alternatives you've considered
We have developed a translation script to convert simple Spark syntax into GX Column objects. Its shortcomings show up when a value from the DataFrame must be modified before the comparison. We are also working on a contribution that would allow a DataFrame value to be transformed before the comparison is performed, so that support for further Spark functions (UPPER, DATEADD, TRIM, etc.) can be added as needed.
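To illustrate the kind of translation script described above, here is a minimal sketch that parses one supported condition shape, `UPPER(col) in (...)`, into a Python predicate. The function names, regex coverage, and predicate representation are all hypothetical; this is not the actual script nor part of the GX API:

```python
import re

# Hypothetical translator for one condition shape:
#   "UPPER(fruit) in ('BANANA', 'APPLE')"
# A real script would dispatch on many Spark functions (UPPER, TRIM, ...).
_UPPER_IN = re.compile(
    r"^\s*UPPER\((\w+)\)\s+in\s+\(([^)]*)\)\s*$", re.IGNORECASE
)

def translate_upper_in(condition):
    """Return (column_name, allowed_values) parsed from the condition string."""
    m = _UPPER_IN.match(condition)
    if m is None:
        raise ValueError(f"Unsupported condition: {condition!r}")
    column = m.group(1)
    # Split the value list and strip surrounding quotes/whitespace.
    values = {v.strip().strip("'\"") for v in m.group(2).split(",")}
    return column, values

def make_predicate(condition):
    """Build a row predicate that applies UPPER before the membership test."""
    column, values = translate_upper_in(condition)
    return lambda row: row[column].upper() in values

pred = make_predicate("UPPER(fruit) in ('BANANA', 'APPLE')")
```

The hard part, as noted above, is not parsing the string but expressing the value transformation (the UPPER step) in the target Column object, which is why the ability to modify a DataFrame value before comparison matters.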
