
ENH/BUG: Utilize _values_for_json in Series.to_json#65127

Open
Julian-Harbeck wants to merge 14 commits into pandas-dev:main from Julian-Harbeck:series-ea-values-for-json

@Julian-Harbeck
Contributor

Utilize _values_for_json in Series.to_json, making Series backed by otherwise non-serializable EAs serializable (e.g., if EAdtype.type is not serializable, then EA.__array__ is likely not serializable either).

Also added the test_values_for_json test to BaseMethodsTests:

  • Check that the result of _values_for_json is a serializable np.ndarray.
  • Check that Series.to_json actually uses _values_for_json for EAs by checking that Series(data).to_json() and Series(data._values_for_json()).to_json() return the same result. Another option would be to use the mocker fixture from pytest-mock, but that does not currently seem to be a dev dependency of pandas.
  • Check that a round trip through JSON, followed by converting back to data.dtype, gives the same Series. I am not sure whether this should be part of this test, as it requires that the ExtensionArray can also be recovered from the JSON created by _values_for_json with a simple astype(ExtensionDtype). In my opinion, though, that is exactly what you would want for JSON serialization and deserialization.
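
The dispatch described above can be modeled in a few lines of plain Python. This is a sketch under stated assumptions, not pandas code. Point, PointArray, and series_to_json are hypothetical names. The stdlib json module stands in for pandas' C serializer in objToJSON.c, and plain lists stand in for NumPy arrays. The final assertion mirrors the second check of the new test: serializing the array and serializing its _values_for_json() output should produce the same JSON.

```python
import json


class Point:
    """Hypothetical scalar type that json.dumps cannot serialize by itself."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointArray:
    """Hypothetical stand-in for an ExtensionArray holding Point scalars."""

    def __init__(self, points):
        self._points = list(points)

    def __iter__(self):
        # Plays the role of __array__: exposes the raw, non-serializable scalars.
        return iter(self._points)

    def _values_for_json(self):
        # JSON-friendly representation of each element.
        return [[p.x, p.y] for p in self._points]


def series_to_json(values):
    """Serialize like Series.to_json with orient="index", preferring the hook if present."""
    hook = getattr(values, "_values_for_json", None)
    data = hook() if callable(hook) else list(values)
    return json.dumps({str(i): v for i, v in enumerate(data)})


arr = PointArray([Point(1, 2), Point(3, 4)])
print(series_to_json(arr))  # {"0": [1, 2], "1": [3, 4]}
# Same idea as the new test: the array and its _values_for_json() serialize identically.
assert series_to_json(arr) == series_to_json(arr._values_for_json())
```

Without the hook, list(values) yields Point objects and json.dumps raises TypeError. That is the model's version of falling back to non-serializable scalars.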

Maybe it would also be good to test the C code in objToJSON.c directly? I am not sure how one would do that, since the get_values function is not exposed, so I would welcome some help here.

@Julian-Harbeck
Contributor Author

A lot of errors seem to come from a Pandas4Warning about the deprecation of the default 'epoch' date format:

pandas/pandas/core/generic.py

Lines 2614 to 2623 in a22ca70

elif date_format is None:
    date_format = "epoch"
    dtypes = self.dtypes if self.ndim == 2 else [self.dtype]
    if any(dtype.kind in "mM" for dtype in dtypes):
        warnings.warn(
            "The default 'epoch' date format is deprecated and will be removed "
            "in a future version, please use 'iso' date format instead.",
            Pandas4Warning,
            stacklevel=find_stack_level(),
        )

Therefore the following tests are already failing during serialization:

  • pandas/tests/extension/test_arrow.py::TestArrowArray::test_values_for_json for all timelike dtypes
  • pandas/tests/extension/test_datetime.py::TestDatetimeArray::test_values_for_json
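
In the excerpt above, the warning is raised only when date_format is None, so passing a format explicitly avoids it. The sketch below is a user-side illustration, not part of this PR. It assumes pandas is installed. It turns every warning into an error to show that an explicit date_format="iso" serializes datetime data without going through the deprecated default.

```python
import json
import warnings

import pandas as pd

ser = pd.Series(pd.to_datetime(["2020-01-01", "2020-01-02"]))

with warnings.catch_warnings():
    # Escalate warnings to exceptions. In versions that deprecate the 'epoch'
    # default, calling to_json() without date_format would raise here.
    warnings.simplefilter("error")
    iso_json = ser.to_json(date_format="iso")

# ISO 8601 strings instead of integer epoch timestamps.
print(json.loads(iso_json))
```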

dtype.type is not JSON serializable and no _values_for_json method is implemented, which here leads to OverflowError: Maximum recursion level reached:

  • pandas/tests/extension/test_period.py::TestPeriodArray::test_values_for_json[D]
  • pandas/tests/extension/test_period.py::TestPeriodArray::test_values_for_json[2D]

Failing on the roundtrip:

  • pandas/tests/extension/decimal/test_decimal.py::TestDecimalArray::test_values_for_json: Cannot create a Decimal from the dictionary created during JSON serialization.
  • pandas/tests/extension/json/test_json.py::TestJSONArray::test_values_for_json: The issue seems to be recursive serialization, which is not covered by the simple deserialization in the test.
  • pandas/tests/extension/test_interval.py::TestIntervalArray::test_values_for_json: Cannot create an Interval from the dictionary created during JSON serialization.
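
The Decimal and Interval failures above occur because the serialized form cannot be turned back into the scalar type. The stdlib-only sketch below illustrates the property the roundtrip check needs, using Decimal. It assumes a hypothetical _values_for_json that emits one string per element, which is not what DecimalArray currently does. Strings round-trip exactly, while routing the values through float does not.

```python
import json
from decimal import Decimal

values = [Decimal("0.1"), Decimal("-2.50")]

# Hypothetical string-based representation: str() preserves every digit.
as_strings = json.loads(json.dumps([str(v) for v in values]))
restored = [Decimal(s) for s in as_strings]

# Float-based representation: 0.1 is not exactly representable in binary.
as_floats = json.loads(json.dumps([float(v) for v in values]))
lossy = [Decimal(f) for f in as_floats]

print(restored == values)  # True
print(lossy == values)     # False
```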

@Julian-Harbeck added the Enhancement, IO JSON, and ExtensionArray labels on Apr 9, 2026
@Julian-Harbeck
Contributor Author

@jbrockmendel do you think it makes sense to just apply xfail markers to the failing tests and then solve those in upcoming PRs? Or should the test be less strict, for example with regard to the roundtrip? That could also be split into a second test so that JSON serialization and the roundtrip are tested separately.

@jbrockmendel
Member

I think it's fine to add a @pytest.mark.filterwarnings for the epoch thing. I'd be pretty uncomfortable merging with the RecursionError, though.
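
A pytest filterwarnings marker string such as "ignore:The default 'epoch' date format is deprecated:DeprecationWarning" maps onto the action, message, and category fields of the standard warnings filter. The stdlib-only sketch below shows that a filter on DeprecationWarning also suppresses any subclass of it. EpochWarning and captured_warnings are hypothetical names. The sketch assumes, as such a filter relies on, that pandas' Pandas4Warning derives from DeprecationWarning.

```python
import warnings


class EpochWarning(DeprecationWarning):
    """Hypothetical stand-in; assumes Pandas4Warning subclasses DeprecationWarning."""


def emit_epoch_warning():
    # Same message text as the warning raised by to_json for datetime-like dtypes.
    warnings.warn(
        "The default 'epoch' date format is deprecated and will be removed "
        "in a future version, please use 'iso' date format instead.",
        EpochWarning,
        stacklevel=2,
    )


def captured_warnings(apply_filter):
    """Return the messages recorded while emitting the warning once."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        if apply_filter:
            # action:message:category, as in the pytest marker string;
            # the message is a regex matched against the start of the text.
            warnings.filterwarnings(
                "ignore",
                message="The default 'epoch' date format is deprecated",
                category=DeprecationWarning,
            )
        emit_epoch_warning()
    return [str(w.message) for w in caught]


print(len(captured_warnings(False)))  # 1
print(len(captured_warnings(True)))   # 0
```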

@Julian-Harbeck
Contributor Author

@jbrockmendel The RecursionError also happens on main; it is not introduced by this PR. Minimal example:

>>> import pandas as pd

>>> p_range = pd.period_range(start=pd.Period("2017Q1", freq="Q"), end=pd.Period("2017Q2", freq="Q"), freq="M")
>>> p_range
PeriodIndex(['2017-03', '2017-04', '2017-05', '2017-06'], dtype='period[M]')

>>> p_series = pd.Series(p_range)
>>> p_series
0    2017-03
1    2017-04
2    2017-05
3    2017-06
dtype: period[M]

>>> p_series.to_json()
<stdin-26>:1: Pandas4Warning: Period.dayofweek is deprecated and will be removed in a future version. Use Period.day_of_week instead.
<stdin-26>:1: Pandas4Warning: Period.dayofyear is deprecated and will be removed in a future version. Use Period.day_of_year instead.
<stdin-26>:1: Pandas4Warning: Period.daysinmonth is deprecated and will be removed in a future version. Use Period.days_in_month instead.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    p_series.to_json()
    ~~~~~~~~~~~~~~~~^^
  File ".../python3.14/site-packages/pandas/core/generic.py", line 2639, in to_json
    return json.to_json(
           ~~~~~~~~~~~~^
        path_or_buf=path_or_buf,
        ^^^^^^^^^^^^^^^^^^^^^^^^
    ...<12 lines>...
        mode=mode,
        ^^^^^^^^^^
    )
    ^
  File ".../python3.14/site-packages/pandas/io/json/_json.py", line 241, in to_json
    ).write()
      ~~~~~^^
  File ".../python3.14/site-packages/pandas/io/json/_json.py", line 292, in write
    return ujson_dumps(
        self.obj_to_write,
    ...<6 lines>...
        indent=self.indent,
    )
OverflowError: Maximum recursion level reached

I guess that was just not known before because there was no test covering JSON serialization of ExtensionArrays?
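
Until PeriodArray serializes directly, one user-side workaround is to convert the periods to strings before calling to_json. This is a sketch of that workaround, not of the fix in this PR. It assumes pandas is installed and that string output (e.g. "2017-03") is acceptable to whatever consumes the JSON.

```python
import json

import pandas as pd

p_series = pd.Series(pd.period_range("2017-03", periods=4, freq="M"))

# str(Period) yields e.g. "2017-03", which serializes as a plain JSON string.
as_json = p_series.astype(str).to_json()
print(json.loads(as_json))
```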

@jbrockmendel
Member

In that case, an xfail seems reasonable.

@Julian-Harbeck
Contributor Author

The message of commit 21f625 should have been "Overwrite test_values_for_json in JSONArray".

Review thread on pandas/tests/extension/base/methods.py (outdated)
Review thread on pandas/tests/extension/decimal/test_decimal.py (outdated)
Review thread on pandas/tests/extension/test_arrow.py (outdated)
]
)
with pytest.raises(
(NotImplementedError, ValueError, AssertionError), match=msg
Member

Should some of these (in particular the AssertionError cases) be xfails?

Contributor Author

@jbrockmendel I tried using pytest.xfail inside the TestArrowArray::test_values_for_json function, but then saw that this is forbidden by pre-commit. Any idea how to approach it? I cannot use @pytest.mark.xfail here because some ArrowDtypes pass and the error also depends on the dtype. Adding the marks in the parametrization of dtype would work, but this is pretty ugly in my opinion (through indirect=True, this parametrized dtype is passed to the data fixture instead of the normal dtype fixture):

    @pytest.mark.filterwarnings(
        "ignore:The default 'epoch' date format is deprecated:DeprecationWarning"
    )
    @pytest.mark.parametrize(
        "dtype",
        [
            pytest.param(
                pa_dtype,
                id=str(pa_dtype),
                marks=pytest.mark.xfail(
                    raises=NotImplementedError,
                    reason="as_unit not implemented for date",
                ),
            )
            for pa_dtype in tm.ALL_PYARROW_DTYPES
            if pa_dtype in [pa.date32(), pa.date64()]
        ]
        + [
            pytest.param(
                pa_dtype,
                id=str(pa_dtype),
                marks=pytest.mark.xfail(
                    raises=ValueError, reason="year 51970 is out of range"
                ),
            )
            for pa_dtype in tm.ALL_PYARROW_DTYPES
            if pa_dtype
            in [
                pa.timestamp(unit="s", tz="US/Pacific"),
                pa.timestamp(unit="s", tz="US/Eastern"),
            ]
        ]
        + [
            pytest.param(
                pa_dtype,
                id=str(pa_dtype),
                marks=pytest.mark.xfail(
                    raises=AssertionError, reason="Series are different"
                ),
            )
            for pa_dtype in tm.ALL_PYARROW_DTYPES
            if (ArrowDtype(pa_dtype).kind in "Mm" and "ms" not in str(pa_dtype))
            and (
                pa_dtype
                not in [
                    pa.date32(),
                    pa.date64(),
                    pa.timestamp(unit="s", tz="US/Pacific"),
                    pa.timestamp(unit="s", tz="US/Eastern"),
                ]
            )
        ]
        + [
            pytest.param(pa_dtype, id=str(pa_dtype))
            for pa_dtype in tm.ALL_PYARROW_DTYPES
            if not (
                (ArrowDtype(pa_dtype).kind in "Mm" and "ms" not in str(pa_dtype))
                or (pa_dtype in [pa.date32(), pa.date64()])
                or (
                    pa_dtype
                    in [
                        pa.timestamp(unit="s", tz="US/Pacific"),
                        pa.timestamp(unit="s", tz="US/Eastern"),
                    ]
                )
            )
        ],
        indirect=True,
    )
    def test_values_for_json(self, data):
        # GH 65127
        # All datetime and duration ArrowDtypes with non default resolution of ms fail
        # on roundtrip. The date32 and date64 dtypes fail already in serialization due
        # to as_unit not implemented for them. Currently the json serialization relies
        # on the default 'epoch' format for datetimes, leading to the filtered
        # Pandas4Warning.
        super().test_values_for_json(data)
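
One possible way to shrink the parametrization above is to keep the expected failures in a single ordered rule table and take each dtype's xfail from the first matching rule. This removes the repeated "not in [...]" exclusions. The sketch below is a hypothetical refactor, not code from this PR. XFAIL_RULES and xfail_spec are illustrative names. It matches on str(pa_dtype), assuming pyarrow's string forms such as "date32[day]" and "timestamp[s, tz=US/Pacific]", so it runs without pyarrow. It approximates the ArrowDtype(pa_dtype).kind in "Mm" check with a timestamp/duration prefix test.

```python
# Hypothetical rule table: (predicate on str(pa_dtype), expected exception, reason).
# The first matching rule wins, so later rules need no explicit exclusions.
TZ_SECOND_TIMESTAMPS = {
    "timestamp[s, tz=US/Pacific]",
    "timestamp[s, tz=US/Eastern]",
}

XFAIL_RULES = [
    (
        lambda s: s.startswith(("date32", "date64")),
        NotImplementedError,
        "as_unit not implemented for date",
    ),
    (
        lambda s: s in TZ_SECOND_TIMESTAMPS,
        ValueError,
        "year 51970 is out of range",
    ),
    (
        # Approximates: ArrowDtype(pa_dtype).kind in "Mm" and "ms" not in str(pa_dtype)
        lambda s: s.startswith(("timestamp", "duration")) and "ms" not in s,
        AssertionError,
        "Series are different",
    ),
]


def xfail_spec(dtype_str):
    """Return (exception, reason) for the first matching rule, or None if it should pass."""
    for matches, exc, reason in XFAIL_RULES:
        if matches(dtype_str):
            return exc, reason
    return None


for name in ["date64[ms]", "timestamp[s, tz=US/Pacific]", "duration[us]", "int64"]:
    print(name, "->", xfail_spec(name))
```

In the test module, a non-None result could become pytest.param(pa_dtype, id=str(pa_dtype), marks=pytest.mark.xfail(raises=exc, reason=reason)), and None a plain pytest.param(pa_dtype, id=str(pa_dtype)).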

Development

Successfully merging this pull request may close these issues.

ENH/BUG: Utilize _values_for_json in Series.to_json

2 participants