Spark Execution Engine: unexpected_index_column_names does not work with Nested Columns

**Describe the bug**
For flat (non-nested) columns, `unexpected_index_column_names` works as expected.
However, when it points to a **nested column** (for example `Data.evt.id`), it does not work: validation raises an error instead of returning `unexpected_index_list`.
The same problem appears when `domain_column_name_list` includes a nested column, which makes `unexpected_index_list` hard to use with nested data.


**To Reproduce**
Example configuration and schema:

* Checkpoint configuration:
    * Index column: `Data.evt.id`

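```python
import great_expectations as gx

# Assumes `context`, `checkpoint_name`, and `validation_definition` are already defined.
checkpoint = context.checkpoints.add(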
    gx.Checkpoint(
        name=checkpoint_name,
        validation_definitions=[validation_definition],
        actions=[gx.checkpoint.actions.UpdateDataDocsAction(name=checkpoint_name)],
        result_format={
            "result_format": "COMPLETE",
            "unexpected_index_column_names": [
                "Data.evt.id"
            ],
            "partial_unexpected_count": 0,
            "exclude_unexpected_values": False,
            "include_unexpected_rows": True,
            "return_unexpected_index_query": True,
        },
    )
)
```
* Suite configuration:
    * Validation column: `Data.evt.retry`

```json
{
  "id": "9f5cdaeb-0959-4b8b-af8b-52934a9234db",
  "type": "expect_column_values_to_be_in_set",
  "kwargs": {
    "column": "Data.evt.retry",
    "value_set": ["0", "1"]
  },
  "meta": {}
}
```

  

* Schema before `select()`:

  ```text
  root
   |-- Data: struct (nullable = true)
   |    |-- evt: struct (nullable = true)
   |    |    |-- id: string (nullable = true)
   |    |    |-- retry: string (nullable = true)
  ```

* Schema after `select(columns_to_keep)`:

  ```text
  root
   |-- id: string (nullable = true)
   |-- retry: string (nullable = true)
  ```

At this point the pruned DataFrame only has the top-level columns `id` and `retry`, so Great Expectations can no longer find the configured nested path (`Data.evt.id`) and cannot return `unexpected_index_list`.
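The renaming can be illustrated without a Spark session. The sketch below is a hypothetical pure-Python model, not Spark or Great Expectations code: `selected_column_names` mirrors the naming visible in the two schemas above (an un-aliased nested field becomes a column named after its leaf), and `check_index_columns` assumes the error comes from a simple name-membership check, which may not be exactly how Great Expectations performs it.

```python
# Hypothetical pure-Python model of the column naming seen in the schemas
# above. It is not Spark or Great Expectations code.


def selected_column_names(paths: list[str], alias_full_path: bool = False) -> list[str]:
    """Names of the columns a select() over dotted nested paths would produce.

    Un-aliased, a nested field such as "Data.evt.id" becomes a top-level column
    named after its leaf ("id"); aliased to its own path, it keeps the dotted name.
    """
    if alias_full_path:
        return list(paths)
    return [path.split(".")[-1] for path in paths]


def check_index_columns(index_columns: list[str], df_columns: list[str]) -> None:
    """Assumed name-membership check; the real Great Expectations check may differ."""
    for name in index_columns:
        if name not in df_columns:
            raise ValueError(
                f"The unexpected_index_column '{name}' does not exist in the DataFrame."
            )


columns_to_keep = ["Data.evt.id", "Data.evt.retry"]

flattened = selected_column_names(columns_to_keep)
print(flattened)  # ['id', 'retry']

try:
    check_index_columns(["Data.evt.id"], flattened)
except ValueError as err:
    print(err)  # the configured dotted path is not among the pruned column names

aliased = selected_column_names(columns_to_keep, alias_full_path=True)
check_index_columns(["Data.evt.id"], aliased)  # no error: the dotted name survives
```

In this model the failure is purely a naming mismatch: the data is still present under `id`, but the lookup uses the full path.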

*Stack trace excerpt (simplified):*

```python
raise gx_exceptions.InvalidMetricAccessorDomainKwargsKeyError(
    f"Error: The unexpected_index_column 'Data.evt.id' does not exist in Spark DataFrame."
)
```

**Expected behavior**

* Nested column paths should be preserved during `select(columns_to_keep)`.
* `unexpected_index_column_names` and `domain_column_name_list` should work properly with nested columns so that `unexpected_index_list` can be utilized.
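For comparison, here is roughly what the expected behavior would produce for the configuration above. This is a hypothetical pure-Python sketch over made-up sample rows stored as nested dicts; `get_path` and `unexpected_index_list` are illustrative helpers, not Great Expectations APIs, and the one-dict-per-failing-row shape keyed by full dotted paths is an assumption about how the nested case should look.

```python
from typing import Any


def get_path(record: dict[str, Any], path: str) -> Any:
    """Walk a dotted path such as "Data.evt.id" through nested dicts."""
    value: Any = record
    for part in path.split("."):
        value = value[part]
    return value


def unexpected_index_list(
    records: list[dict[str, Any]],
    column: str,
    value_set: set[str],
    index_columns: list[str],
) -> list[dict[str, Any]]:
    """Return one dict per failing row, keyed by the full dotted paths."""
    result = []
    for record in records:
        observed = get_path(record, column)
        if observed not in value_set:
            # Index columns first, then the validated column's observed value.
            entry = {name: get_path(record, name) for name in index_columns}
            entry[column] = observed
            result.append(entry)
    return result


# Hypothetical sample rows shaped like the nested schema in this issue.
rows = [
    {"Data": {"evt": {"id": "a1", "retry": "0"}}},
    {"Data": {"evt": {"id": "a2", "retry": "5"}}},
    {"Data": {"evt": {"id": "a3", "retry": "1"}}},
]
print(unexpected_index_list(rows, "Data.evt.retry", {"0", "1"}, ["Data.evt.id"]))
# [{'Data.evt.id': 'a2', 'Data.evt.retry': '5'}]
```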

**Environment (please complete the following information):**

* Operating System: macOS
* Great Expectations Version: 1.5.10
* Data Source: Spark

---

**Additional context**
The issue seems to be caused by schema pruning in `map_condition_auxilliary_methods.py`, where `select(columns_to_keep)` returns each selected nested field as a top-level column named after its leaf field (`Data.evt.id` becomes `id`), so the original nested path is lost.

- https://github.com/great-expectations/great_expectations/blob/264b4e0b8c3c34851d5d0b5722b485458590d298/great_expectations/expectations/metrics/map_metric_provider/map_condition_auxilliary_methods.py#L740

```python
# Prune the dataframe down only the columns we care about
filtered = filtered.select(columns_to_keep)
```

**Suggested Fix**

Preserve nested column paths during pruning by aliasing columns explicitly:

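```python
from pyspark.sql import functions as F

# Keep the full dotted path as the column name ("Data.evt.id" instead of "id").
aliased_cols = [F.col(c).alias(c) for c in columns_to_keep]
filtered = filtered.select(*aliased_cols)
```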
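One caveat with this aliasing approach: the pruned DataFrame then contains top-level columns whose names literally contain dots, and Spark would typically read an unquoted dotted string such as `F.col("Data.evt.id")` as a nested-field access. Any later lookup of those columns would therefore need the name wrapped in backticks. A minimal sketch of such a quoting helper follows; `quote_spark_column` is a hypothetical name, and it assumes Spark's rule that a backtick inside a quoted identifier is escaped by doubling it.

```python
def quote_spark_column(name: str) -> str:
    """Wrap a column name in backticks so Spark treats its dots literally.

    A backtick inside the name is doubled, following Spark's quoted-identifier
    escaping rule (an assumption worth verifying against your Spark version).
    """
    return "`" + name.replace("`", "``") + "`"


columns_to_keep = ["Data.evt.id", "Data.evt.retry"]
print([quote_spark_column(c) for c in columns_to_keep])
# ['`Data.evt.id`', '`Data.evt.retry`']

# In PySpark (not executed here), a later lookup on the aliased DataFrame
# would then look like: filtered.select(F.col(quote_spark_column("Data.evt.id")))
```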


Spark Execution Engine: unexpected_index_column_names does not work with Nested Columns #11381
