BUG: fix read_parquet failing with TimedeltaIndex column (#59692)#65117

Open
tianlangqin wants to merge 2 commits into pandas-dev:main from hbd9577:fix-59692-read-parquet-timedelta-columns

tianlangqin (Contributor) commented Apr 8, 2026

When a DataFrame whose column labels form a TimedeltaIndex is written to Parquet with pyarrow and then read back, pyarrow reconstructs the column index by calling level.astype(dtype) with a unitless timedelta64 dtype. This raises a ValueError because pandas cannot tell which unit to use.
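
To make "unitless" concrete: NumPy accepts a timedelta64 dtype spelled without a resolution, and np.datetime_data reports its unit as "generic". pandas only supports the 's', 'ms', 'us' and 'ns' resolutions, so there is nothing for it to map such a dtype onto. A minimal NumPy-only illustration (not part of the PR):

```python
import numpy as np

# A bare "timedelta64" dtype has no resolution; NumPy calls its unit "generic".
unitless = np.dtype("timedelta64")
with_unit = np.dtype("timedelta64[ns]")

print(np.datetime_data(unitless))   # ('generic', 1)
print(np.datetime_data(with_unit))  # ('ns', 1)

# The two dtypes are distinct, so a cast to "timedelta64" names no target unit.
print(unitless == with_unit)        # False
```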

I fixed it by defaulting a unitless timedelta64 to timedelta64[ns] in ExtensionArray.astype before calling TimedeltaArray._from_sequence. I then added two short-circuit checks, in _astype_nansafe and TimedeltaArray.astype: if the source array already has a valid timedelta or datetime resolution and the requested target is unitless, the array is now returned as is instead of attempting an invalid conversion.
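
The two rules in the paragraph above can be sketched outside pandas with plain NumPy. This is a minimal illustration under stated assumptions, not the PR's diff: cast_to_timedelta and SUPPORTED_UNITS are hypothetical names, is_unitless here is a NumPy-only stand-in for pandas._libs.tslibs.is_unitless, and only the timedelta case is covered.

```python
import numpy as np

# Resolutions pandas supports for timedelta64/datetime64 data.
SUPPORTED_UNITS = {"s", "ms", "us", "ns"}


def is_unitless(dtype: np.dtype) -> bool:
    """Stand-in check: True for a bare timedelta64/datetime64 dtype."""
    return np.datetime_data(dtype)[0] == "generic"


def cast_to_timedelta(values: np.ndarray, dtype) -> np.ndarray:
    """Hypothetical helper mirroring the two rules described above.

    1. If ``values`` is already timedelta64 with a supported resolution and
       the requested dtype is unitless, return ``values`` unchanged.
    2. Otherwise, treat a unitless timedelta64 target as timedelta64[ns].
    """
    dtype = np.dtype(dtype)
    if dtype.kind != "m":
        raise TypeError(f"this sketch only handles timedelta64 targets, got {dtype}")

    if is_unitless(dtype):
        src = values.dtype
        if src.kind == "m" and np.datetime_data(src)[0] in SUPPORTED_UNITS:
            # Short circuit: keep the existing resolution, no conversion needed.
            return values
        # Default the missing unit to nanoseconds.
        dtype = np.dtype("timedelta64[ns]")

    return values.astype(dtype)


td = np.array([1, 2, 3], dtype="timedelta64[s]")
print(cast_to_timedelta(td, "timedelta64") is td)  # True (returned as is)

ints = np.array([1, 2, 3], dtype=np.int64)
print(cast_to_timedelta(ints, "timedelta64").dtype)  # timedelta64[ns]
```

Returning the input object itself in rule 1 is what makes the relaxed cast safe in this sketch: no values are reinterpreted, so nothing can be silently corrupted.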

The change in TimedeltaArray.astype relaxes the previous behavior, where idx.astype("timedelta64") always raised a ValueError. That check was introduced in #13149 to prevent silent data corruption. The concern does not apply here, because a cast from timedelta64[ns] to unitless timedelta64 now returns an identical array. Validation in the opposite direction remains unchanged.
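
For reference, this is the existing behavior the PR relaxes, assuming a pandas 2.x release without this change: casting a TimedeltaIndex to an explicit, supported unit works, while the unitless spelling raises ValueError. With the PR applied, the unitless cast would instead succeed and leave the values unchanged.

```python
import pandas as pd

idx = pd.to_timedelta([1, 2, 3], unit="s")  # a TimedeltaIndex

# Casting to an explicit, supported unit succeeds.
explicit = idx.astype("timedelta64[ns]")

# Casting to the unitless dtype is rejected on pandas versions without this PR.
raised = False
try:
    idx.astype("timedelta64")
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```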


from pandas._libs.tslibs import is_unitless

if is_unitless(dtype):

Member: We excluded support for this intentionally.

Contributor (author): Thanks for the review. Since the error originates from pyarrow calling level.astype(dtype) with a unitless dtype during column reconstruction, where would you suggest handling this instead?

Member: Where does this happen?

return DatetimeArray._from_sequence(self, dtype=dtype, copy=copy)

elif lib.is_np_dtype(dtype, "m"):
from pandas._libs.tslibs import is_unitless

Member: Import at the top.

Development

Successfully merging this pull request may close these issues.

BUG: pyarrow cannot read timedelta64[ns]

2 participants