ENH/BUG: Utilize _values_for_json in Series.to_json #65127

Julian-Harbeck wants to merge 14 commits into pandas-dev:main from

Conversation
A lot of errors seem to come from this block (lines 2614 to 2623 in a22ca70). Therefore the following tests are already failing during serialization:
From roundtrip:
@jbrockmendel do you think it makes sense to just apply xfail markers to the failing tests and then solve those in upcoming PRs? Or should the test be less strict, for example in regards to a roundtrip? That could also be split to a second test so that JSON serialization and roundtrip are tested separately. |
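If serialization and roundtrip were tested separately, a standalone roundtrip check could look roughly like this (a hedged sketch, not the actual base-class test; the dtype and orient choices are illustrative):

```python
from io import StringIO

import pandas as pd

# Illustrative roundtrip check: serialize, parse back, restore the
# original extension dtype, and compare against the starting Series.
ser = pd.Series([1, 2, 3], dtype="Int64", name="x")
payload = ser.to_json(orient="split")
restored = pd.read_json(
    StringIO(payload), orient="split", typ="series"
).astype(ser.dtype)
pd.testing.assert_series_equal(restored, ser)
```

Splitting it this way would let serialization-only failures and roundtrip-only failures be xfailed independently.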
i think its fine to add a
@jbrockmendel The RecursionError also happens on main, that is not changing with this PR, minimum example:

```
>>> import pandas as pd
>>> p_range = pd.period_range(start=pd.Period("2017Q1", freq="Q"), end=pd.Period("2017Q2", freq="Q"), freq="M")
>>> p_range
PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'], dtype='period[M]')
>>> p_series = pd.Series(p_range)
>>> p_series
0    2017-03
1    2017-04
2    2017-05
3    2017-06
dtype: period[M]
>>> p_series.to_json()
<stdin-26>:1: Pandas4Warning: Period.dayofweek is deprecated and will be removed in a future version. Use Period.day_of_week instead.
<stdin-26>:1: Pandas4Warning: Period.dayofyear is deprecated and will be removed in a future version. Use Period.day_of_year instead.
<stdin-26>:1: Pandas4Warning: Period.daysinmonth is deprecated and will be removed in a future version. Use Period.days_in_month instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    p_series.to_json()
    ~~~~~~~~~~~~~~~~^^
  File ".../python3.14/site-packages/pandas/core/generic.py", line 2639, in to_json
    return json.to_json(
    ~~~~~~~~~~~~^
        path_or_buf=path_or_buf,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<12 lines>...
        mode=mode,
        ^^^^^^^^^^
    )
    ^
  File ".../python3.14/site-packages/pandas/io/json/_json.py", line 241, in to_json
    ).write()
    ~~~~~^^
  File ".../python3.14/site-packages/pandas/io/json/_json.py", line 292, in write
    return ujson_dumps(
        self.obj_to_write,
    ...<6 lines>...
        indent=self.indent,
    )
OverflowError: Maximum recursion level reached
```

I guess that was just not known before because there was no test covering JSON serialization of ExtensionArrays?
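Until Period serialization is fixed, casting to strings first sidesteps the failing code path (a workaround sketch, not part of this PR):

```python
import pandas as pd

# Workaround sketch: converting the period values to strings before
# serialization avoids the Period attribute path in ujson that recurses.
p_series = pd.Series(pd.period_range("2017-03", periods=4, freq="M"))
print(p_series.astype(str).to_json())
# → {"0":"2017-03","1":"2017-04","2":"2017-05","3":"2017-06"}
```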
In that case an xfail seems reasonable |
```python
        ]
    )
    with pytest.raises(
        (NotImplementedError, ValueError, AssertionError), match=msg
```
should some of these (in particular the AssertionError cases) be xfails?
@jbrockmendel I tried using pytest.xfail inside of the TestArrowArray::test_values_for_json function, but then saw that this is forbidden by pre-commit. Any idea how to approach it? I cannot use @pytest.mark.xfail here because some ArrowDtypes pass and the expected error also depends on the dtype. Adding the mark in the parametrization of dtype would work, but this is pretty ugly in my opinion (through indirect=True we are passing this parametrized dtype instead of the normal dtype fixture to the data fixture):
```python
@pytest.mark.filterwarnings(
    "ignore:The default 'epoch' date format is deprecated:DeprecationWarning"
)
@pytest.mark.parametrize(
    "dtype",
    [
        pytest.param(
            pa_dtype,
            id=str(pa_dtype),
            marks=pytest.mark.xfail(
                raises=NotImplementedError,
                reason="as_unit not implemented for date",
            ),
        )
        for pa_dtype in tm.ALL_PYARROW_DTYPES
        if pa_dtype in [pa.date32(), pa.date64()]
    ]
    + [
        pytest.param(
            pa_dtype,
            id=str(pa_dtype),
            marks=pytest.mark.xfail(
                raises=ValueError, reason="year 51970 is out of range"
            ),
        )
        for pa_dtype in tm.ALL_PYARROW_DTYPES
        if pa_dtype
        in [
            pa.timestamp(unit="s", tz="US/Pacific"),
            pa.timestamp(unit="s", tz="US/Eastern"),
        ]
    ]
    + [
        pytest.param(
            pa_dtype,
            id=str(pa_dtype),
            marks=pytest.mark.xfail(
                raises=AssertionError, reason="Series are different"
            ),
        )
        for pa_dtype in tm.ALL_PYARROW_DTYPES
        if (ArrowDtype(pa_dtype).kind in "Mm" and "ms" not in str(pa_dtype))
        and (
            pa_dtype
            not in [
                pa.date32(),
                pa.date64(),
                pa.timestamp(unit="s", tz="US/Pacific"),
                pa.timestamp(unit="s", tz="US/Eastern"),
            ]
        )
    ]
    + [
        pytest.param(pa_dtype, id=str(pa_dtype))
        for pa_dtype in tm.ALL_PYARROW_DTYPES
        if not (
            (ArrowDtype(pa_dtype).kind in "Mm" and "ms" not in str(pa_dtype))
            or (pa_dtype in [pa.date32(), pa.date64()])
            or (
                pa_dtype
                in [
                    pa.timestamp(unit="s", tz="US/Pacific"),
                    pa.timestamp(unit="s", tz="US/Eastern"),
                ]
            )
        )
    ],
    indirect=True,
)
def test_values_for_json(self, data):
    # GH 65127
    # All datetime and duration ArrowDtypes with non default resolution of ms fail
    # on roundtrip. The date32 and date64 dtypes fail already in serialization due
    # to as_unit not implemented for them. Currently the json serialization relies
    # on the default 'epoch' format for datetimes, leading to the filtered
    # Pandas4Warning.
    super().test_values_for_json(data)
```
Closes: Utilize _values_for_json in Series.to_json #65047

Utilize _values_for_json in Series.to_json, making Series with otherwise not serializable EAs serializable (e.g. if EA.dtype.type is not serializable then likely also EA.__array__ is not serializable).

Also added the test_values_for_json test to BaseMethodsTests, which checks that:

- _values_for_json is a serializable np.ndarray.
- Series.to_json actually uses _values_for_json for EAs, by checking that Series(data).to_json() and Series(data._values_for_json()).to_json() return the same result. Another option would be to use a pytest-mocker, but that does not seem to currently be a dev dependency of pandas.
- A roundtrip followed by astype(data.dtype) gives the same Series. Not sure if this should be a part of this test, as it requires that the ExtensionArray can also be recovered from the JSON created by _values_for_json and a simple astype(ExtensionDtype). Although in my opinion that is exactly what you would want for JSON serialization and deserialization.

Maybe it would also be good to test the C code of objToJSON.c directly? But I am not sure how one would do that, as the get_values function is not exposed, so I would welcome some help here.