Skip to content

Feat invalid Unicode surrogate crash in JSON serialization#15403

Merged
loicmathieu merged 3 commits intokestra-io:developfrom
harsh21234i:fix/invalid-unicode-surrogate-crash
Apr 13, 2026
Merged

Feat invalid Unicode surrogate crash in JSON serialization#15403
loicmathieu merged 3 commits intokestra-io:developfrom
harsh21234i:fix/invalid-unicode-surrogate-crash

Conversation

@harsh21234i
Copy link
Copy Markdown
Contributor

Added support for handling invalid UTF-16 surrogate sequences during JSON serialization.

This prevents crashes when serializing malformed Unicode strings (e.g., PostgreSQL jsonb rejection).
Fixes #14806

- Add SurrogateStrippingStringSerializer
- Integrate serializer in JacksonMapper
- Add comprehensive tests covering edge cases and issue kestra-io#14806
@github-project-automation github-project-automation Bot moved this to To review in Pull Requests Apr 7, 2026
@MilosPaunovic MilosPaunovic added area/backend Needs backend code changes kind/external Pull requests raised by community contributors labels Apr 7, 2026
Copy link
Copy Markdown
Member

@loicmathieu loicmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the rational but I'm not fan.
Doing this is highly opinionated and would incure a performance cost of reading the whome JSON byte array.

I would prefer that we fail gracely if there is an unsupported character sequence, we already handled that in the queue and he executor I think (we catch the Postgres specific error message and fail the exec).

A better approach may also to have validation whenever possible to avoid such un-processable strings to be set (like in inputs).

@harsh21234i
Copy link
Copy Markdown
Contributor Author

harsh21234i commented Apr 12, 2026

Hello @loicmathieu ,
I updated the PR to remove the global surrogate sanitization approach and keep the fix on the existing Postgres queue failure path.

I also tightened the queue-side detection so unsupported Unicode payload failures are still converted to UnsupportedMessageException, including lone surrogate cases, and kept regression coverage on the Postgres queue tests without relying on one exact Postgres error string.

I couldn’t fully verify the Postgres test locally because my local test environment does not currently have Postgres available on localhost:5432, but the code compiles and the change is now targeted to the existing recoverable failure path rather than global string replacement.

@harsh21234i harsh21234i force-pushed the fix/invalid-unicode-surrogate-crash branch from ed13d6f to 80370c2 Compare April 12, 2026 05:41
@harsh21234i
Copy link
Copy Markdown
Contributor Author

Backend tests themselves passed; the failing job is in the reporting step after Gradle and looks related to fork PR permissions ("Resource not accessible
by integration", "Missing organization token"), so it seems unrelated to the code change in this PR.

Copy link
Copy Markdown
Member

@loicmathieu loicmathieu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@loicmathieu loicmathieu merged commit fc1b3aa into kestra-io:develop Apr 13, 2026
8 of 9 checks passed
@github-project-automation github-project-automation Bot moved this from To review to Done in Pull Requests Apr 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/backend Needs backend code changes kind/external Pull requests raised by community contributors

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

invalid unicode sequence crashes kestra executor pod

3 participants