
FIX: silent data loss in MultiIndex __setitem__ with object-dtype level #65119

Open
Bahtya wants to merge 2 commits into pandas-dev:main from Bahtya:fix/multiindex-setitem-silent-drop

Bahtya commented Apr 8, 2026

Problem

df[key] = df[key] / x silently drops the assignment (no error, no warning) when all three of the following hold:

  1. DataFrame has a column MultiIndex
  2. Level 1 has object dtype due to mixed types (e.g. string + int)
  3. The top-level label has exactly one sub-column

import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0), ("earnings", 1), ("earnings", 2), ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)
df["prices"] = df["prices"] / 100  # silent no-op! values unchanged

This is a regression from 2.3.x → 3.0.x and causes silent data loss.
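
To check whether a given frame meets the three trigger conditions, a small sketch like the following may help. It rebuilds the frame from the report; the variable names `has_multiindex`, `level1_is_object`, and `n_subcolumns` are illustrative only and do not come from pandas.

```python
import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0), ("earnings", 1), ("earnings", 2), ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)

# Condition 1: the columns are a MultiIndex.
has_multiindex = isinstance(df.columns, pd.MultiIndex)

# Condition 2: level 1 mixes strings and ints, so it is stored as object dtype.
level1_is_object = df.columns.levels[1].dtype == object

# Condition 3: the top-level label selects exactly one sub-column.
n_subcolumns = df["prices"].shape[1]

print(has_multiindex, level1_is_object, n_subcolumns)
```

Only "prices" satisfies the third condition here; "info" and "earnings" each have two sub-columns.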

Root Cause

In _set_item_frame_value, the guard added for GH#62518/GH#61841 checks:

is_string_dtype(cols_droplevel.dtype) and not cols_droplevel.any()

When maybe_droplevels produces Index([0], dtype="object"):

  • is_string_dtype(object) → True
  • Index([0]).any() → False (because 0 is falsy)

This causes the early return to trigger incorrectly, silently discarding the assignment.
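
The two halves of the guard can be evaluated in isolation. The sketch below uses a hand-built `Index([0], dtype=object)` as a stand-in for the output of maybe_droplevels; it does not call the pandas internals, and the variable names are illustrative.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Stand-in for what maybe_droplevels yields for the single ("prices", 0) column.
cols_droplevel = pd.Index([0], dtype=object)

# Only the dtype is passed, so the values are never inspected:
# object dtype alone is enough to count as "string".
dtype_check = is_string_dtype(cols_droplevel.dtype)

# Truthiness check: the only label is the integer 0, which is falsy.
no_truthy_labels = not cols_droplevel.any()

# Both halves hold, so the original guard takes the early return.
old_guard_fires = bool(dtype_check and no_truthy_labels)
print(old_guard_fires)
```

Neither check looks at whether the labels are actually empty strings, which is what the guard was meant to detect.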

Fix

Replace not cols_droplevel.any() with an explicit empty-string check, (cols_droplevel == "").all(), combined with a length check so that an empty Index does not match:

and len(cols_droplevel) > 0
and (cols_droplevel == "").all()

This correctly identifies actual empty-string columns without being fooled by falsy integer values in object-dtype Indexes.
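
A side-by-side comparison of the two clauses on a few inputs, as a rough sketch. The helper names `old_check` and `new_check` are hypothetical, and the `is_string_dtype` half of the guard is left out so that only the replaced clause is compared.

```python
import pandas as pd


def old_check(idx: pd.Index) -> bool:
    # Truthiness-based: any all-falsy Index passes, including [0].
    return not idx.any()


def new_check(idx: pd.Index) -> bool:
    # Equality-based: passes only for a non-empty Index of empty strings.
    return len(idx) > 0 and bool((idx == "").all())


cases = {
    "integer zero": pd.Index([0], dtype=object),
    "empty string": pd.Index([""], dtype=object),
    "real label": pd.Index(["a"], dtype=object),
    "no labels": pd.Index([], dtype=object),
}

results = {name: (old_check(idx), new_check(idx)) for name, idx in cases.items()}
for name, (old, new) in results.items():
    print(f"{name}: old={old}, new={new}")
```

The two clauses disagree on the integer-zero case, which is the one from the bug report, and on the empty Index, where `.all()` alone would be vacuously true without the length check.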

Testing

Verified locally that the fix resolves the issue from the bug report:

# Before fix: prices values unchanged [4. 9. 14. 19.]
# After fix: prices values correctly divided [0.04 0.09 0.14 0.19]

Also confirmed the original GH#62518/GH#61841 cases are still protected since their columns genuinely contain only empty strings.

Fixes #65118

…type level

When setting a top-level column on a DataFrame with a MultiIndex where
level 1 has object dtype (mixed types), assignments to single-subcolumn
groups are silently dropped.

Root cause: In _set_item_frame_value, the guard for GH#62518/GH#61841
(avoiding reindex into empty-string columns) used
is_string_dtype(cols_droplevel.dtype) and not cols_droplevel.any().
For object-dtype Index containing integer 0, is_string_dtype returns True
and any() returns False (0 is falsy), causing the early return to
trigger incorrectly.

Fix: Replace not cols_droplevel.any() with (cols_droplevel == "").all()
to explicitly check for empty strings instead of relying on truthiness.

Fixes pandas-dev#65118

Signed-off-by: bahtya <bahtyar153@qq.com>

jbrockmendel (Member) commented

Pls add test

Bahtya (Author) commented Apr 11, 2026

Thanks for the review! I'll add a test for the silent data loss case in MultiIndex __setitem__ with object-dtype levels.

Bahtya (Author) commented Apr 17, 2026

Hi @jbrockmendel, thanks for the review! I'll add tests for this fix. Working on it now.

Bahtya (Author) commented Apr 18, 2026

Hi @jbrockmendel, I've added three test cases for this fix:

  1. test_multiindex_setitem_object_dtype_level_no_silent_drop — exact reproduction from the bug report (mixed string/int level values, df["prices"] = df["prices"] / 100)
  2. test_multiindex_setitem_object_dtype_level_single_subcolumn — minimal case with integer 0 as level value
  3. test_multiindex_setitem_object_dtype_level_falsy_values — ensures columns with empty string "" alongside falsy integer 0 are still writable

All tests pass locally. Ready for re-review!
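
The test bodies are not reproduced in this thread. As a rough idea of what the first one checks, here is a sketch built around a hypothetical helper, `assignment_took_effect`, which is not part of pandas or of this PR.

```python
import numpy as np
import pandas as pd


def assignment_took_effect(df: pd.DataFrame, key, divisor: float) -> bool:
    """Return True if df[key] = df[key] / divisor changed the stored values."""
    expected = df[key].to_numpy() / divisor
    df[key] = df[key] / divisor
    return bool(np.allclose(df[key].to_numpy(), expected))


# Frame from the bug report: level 1 mixes "M" with integers.
cols = pd.MultiIndex.from_tuples(
    [("info", "M"), ("info", 0), ("earnings", 1), ("earnings", 2), ("prices", 0)]
)
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=cols)

# A regression test would assert that this is True. According to the report,
# an affected 3.0.x build leaves the values unchanged, which makes it False.
print(assignment_took_effect(df, "prices", 100))
```

The helper computes the expected values before assigning, so a dropped assignment shows up as a mismatch rather than passing silently.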

Development

Successfully merging this pull request may close these issues.

BUG: Regression - df[key] = df[key] / x silently drops assignment on MultiIndex columns with mixed-dtype level and single sub-column
