
Improve speed via Stream.resource and faster encoding #5

Open
sax wants to merge 2 commits into activesphere:master from sax:faster-stream


@sax
Contributor

sax commented Apr 29, 2026

Collapsing Stream.flat_map + Stream.transform into a single Stream.resource improves the throughput of XML generation by about 35% in my benchmarks. Reworking the escaping, encoding, and validation of tags to favor ASCII, pass safe input through without additional allocation, and use binary accumulators improves my benchmark by roughly another 15%.
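
For readers unfamiliar with the shape of this change, here is a minimal sketch of collapsing a two-stage pipeline into one Stream.resource. It is not the code from this PR: `StreamSketch`, `render/1`, and the byte-counting state are hypothetical stand-ins, and `one_stage/1` assumes the input is already a list (handling arbitrary enumerables is where the suspend machinery comes in). It only shows that both forms emit the same chunks, not that one is faster.

```elixir
defmodule StreamSketch do
  # Hypothetical per-row printer; a stand-in, not XmlStream's API.
  def render(row), do: ["<row>", Integer.to_string(row), "</row>"]

  # Two chained stages: expand each row into chunks, then thread
  # some state (here a running byte count) through every chunk.
  def two_stage(rows) do
    rows
    |> Stream.flat_map(&render/1)
    |> Stream.transform(0, fn chunk, bytes ->
      {[chunk], bytes + byte_size(chunk)}
    end)
  end

  # One stage: Stream.resource pulls a row, renders it, and updates
  # the state in a single step. Assumes `rows` is a list.
  def one_stage(rows) when is_list(rows) do
    Stream.resource(
      fn -> {rows, 0} end,
      fn
        {[], _bytes} = state ->
          {:halt, state}

        {[row | rest], bytes} ->
          chunks = render(row)
          {chunks, {rest, bytes + IO.iodata_length(chunks)}}
      end,
      fn _state -> :ok end
    )
  end
end
```

Both `StreamSketch.two_stage([1, 2, 3])` and `StreamSketch.one_stage([1, 2, 3])` are lazy and yield the same list of chunks when passed to `Enum.to_list/1`.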

@ananthakumaran
Contributor

Could you rebase onto master? I just pushed a fix for the Ubuntu 20 runner issue.

sax added 2 commits April 28, 2026 19:50
Rewrite escaping, encoding, and name validation to skip work on the
common case: ASCII fast paths, zero-allocation pass-through for
already-safe input, writable-binary accumulators in place of
per-codepoint iolist construction, and direct dispatch in place of
String.Chars protocol.
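
As an illustration of the techniques named in this commit message (not the commit's actual code), the sketch below scans for the first character that needs escaping, returns the input untouched when there is none, and otherwise appends to a binary accumulator. `EscapeSketch` and its function names are hypothetical, and the set of escaped characters is an assumption.

```elixir
defmodule EscapeSketch do
  # Hypothetical escaper. Scanning byte by byte is safe for UTF-8
  # input because every escaped character is ASCII and no byte of a
  # multi-byte UTF-8 sequence falls in the ASCII range.
  def escape(bin) when is_binary(bin) do
    case scan(bin, 0) do
      :safe ->
        # Pass-through: the original binary is returned as-is.
        bin

      {:unsafe, offset} ->
        <<prefix::binary-size(offset), rest::binary>> = bin
        escape_rest(rest, prefix)
    end
  end

  # Find the offset of the first byte that needs escaping, if any.
  defp scan(<<c, _::binary>>, offset) when c in [?&, ?<, ?>, ?\", ?\'],
    do: {:unsafe, offset}

  defp scan(<<_, rest::binary>>, offset), do: scan(rest, offset + 1)
  defp scan(<<>>, _offset), do: :safe

  # Append to a binary accumulator instead of building an iolist.
  defp escape_rest(<<?&, rest::binary>>, acc), do: escape_rest(rest, <<acc::binary, "&amp;">>)
  defp escape_rest(<<?<, rest::binary>>, acc), do: escape_rest(rest, <<acc::binary, "&lt;">>)
  defp escape_rest(<<?>, rest::binary>>, acc), do: escape_rest(rest, <<acc::binary, "&gt;">>)
  defp escape_rest(<<?\", rest::binary>>, acc), do: escape_rest(rest, <<acc::binary, "&quot;">>)
  defp escape_rest(<<?\', rest::binary>>, acc), do: escape_rest(rest, <<acc::binary, "&apos;">>)
  defp escape_rest(<<c, rest::binary>>, acc), do: escape_rest(rest, <<acc::binary, c>>)
  defp escape_rest(<<>>, acc), do: acc
end
```

For example, `EscapeSketch.escape("plain")` returns the input unchanged, while `EscapeSketch.escape("a<b")` takes the accumulator path.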
@sax
Contributor Author

sax commented Apr 29, 2026

Just rebased.

@sax
Contributor Author

sax commented Apr 29, 2026

Just in case it's helpful, here is the benchmark I used: https://github.com/synchronal/exceed/blob/main/benchmark/exceed.exs. Note that my benchmarks include zipping (because they generate Excel files) as well as stream generation.

@ananthakumaran
Contributor

ananthakumaran commented Apr 29, 2026

  1. I do see a significant improvement on your benchmark from the first change. I assume this has to do with how Stream.flat_map works. Does it suspend too much? I will spend some more time here.

  2. I don't see any performance difference when I apply the printer.ex changes on their own (without the flat_map change). The first two runs use the existing printer.ex (master); the second two runs are from your branch (with only the printer.ex change).

[Image: screenshot of the benchmark runs]

@sax
Contributor Author

sax commented Apr 29, 2026

Sorry for the delay, was just finishing my (US Pacific Time) day when you messaged.

On my M2 Mac Mini I'm seeing the following with Elixir 1.19.5-otp-28 and OTP 28.5:

  • HEAD of master: 32k - 34k rows/sec
  • Collapsing the two stream operations into one: 46k - 48k rows/sec
  • Including the printer.ex changes: 59k - 65k rows/sec

When I switch to Elixir 1.19.5-otp-27 and OTP 27.3.4.11 things do change:

  • HEAD of master: 32k - 34k rows/sec
  • Collapsing the two stream operations into one: 52k - 54k rows/sec
  • Including the printer.ex changes: 52k - 56k rows/sec

I'm seeing this pretty consistently.
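
For anyone wanting to reproduce a rows/sec figure locally, a rough sketch of one way to measure it follows. This is not the linked benchmark script: `ThroughputSketch` and the toy workload are hypothetical, and a single `:timer.tc/1` run is far noisier than a proper benchmark harness.

```elixir
defmodule ThroughputSketch do
  # Hypothetical helper: times `fun` over `row_count` rows and
  # converts the elapsed microseconds into rows per second.
  def rows_per_second(row_count, fun) when row_count > 0 do
    {micros, _result} = :timer.tc(fn -> fun.(row_count) end)
    # Guard against a zero reading on very fast runs.
    row_count * 1_000_000 / max(micros, 1)
  end
end

# Toy workload standing in for "generate N rows and consume the stream".
rate =
  ThroughputSketch.rows_per_second(10_000, fn n ->
    1..n |> Stream.map(&Integer.to_string/1) |> Stream.run()
  end)

IO.puts("#{round(rate)} rows/sec")
```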

@sax
Contributor Author

sax commented Apr 30, 2026

Oh! A while ago I pulled some overrides of escape_binary into Exceed.Xml, since I was finding performance improvements over the one in XmlStream. To get a real benchmark of the XmlStream changes, we need to replace that escape_binary override with a call into the library.

@ananthakumaran
Contributor

I am OK with merging the changes in the printer.ex file that are mostly straightforward transformations. But there are a couple of bigger changes that take simple, straightforward code and introduce a lot of new code, and I am having a very hard time understanding what it does and why the performance increases.

As for escape_binary, the current code already seems to be binary-optimized: running ERL_COMPILER_OPTIONS=bin_opt_info mix compile --force doesn't point out any issues. I am OK with converting the return value to a binary instead of IO data if it provides a measurable performance gain. The proposed changes are much more complicated. I assume this is a pattern used by other libraries like Jason? Still, I don't understand why and how it improves things (if at all).
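
To make the "binary instead of IO data" option concrete, here is a small sketch of the two return shapes. Both functions are hypothetical and escape only `<` and `&`; neither is the current XmlStream code nor the proposed change. The point is only that the two shapes carry the same bytes, so the choice is about allocation patterns rather than output.

```elixir
defmodule ReturnShapeSketch do
  # Returns iodata: one list cell per byte or entity.
  def escape_iodata(<<?<, rest::binary>>), do: ["&lt;" | escape_iodata(rest)]
  def escape_iodata(<<?&, rest::binary>>), do: ["&amp;" | escape_iodata(rest)]
  def escape_iodata(<<c, rest::binary>>), do: [c | escape_iodata(rest)]
  def escape_iodata(<<>>), do: []

  # Same rules, but appending to a binary accumulator and
  # returning a flat binary.
  def escape_to_binary(bin, acc \\ <<>>)
  def escape_to_binary(<<?<, rest::binary>>, acc), do: escape_to_binary(rest, <<acc::binary, "&lt;">>)
  def escape_to_binary(<<?&, rest::binary>>, acc), do: escape_to_binary(rest, <<acc::binary, "&amp;">>)
  def escape_to_binary(<<c, rest::binary>>, acc), do: escape_to_binary(rest, <<acc::binary, c>>)
  def escape_to_binary(<<>>, acc), do: acc
end
```

Flattening the first result with `IO.iodata_to_binary/1` gives the same binary as the second function, so any difference between them would have to show up in a benchmark.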

I do see that the changes related to Stream.flat_map result in about 2x performance, but again, what used to be a simple use of the Stream library is now converted into something that's much more difficult to read and understand. I can't find any public documentation for :suspended, etc., and it seems to depend on private/undocumented parts of Stream/Enumerable.
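
For reference, the `{:suspend, acc}` command and the `{:suspended, acc, continuation}` result appear in the Enumerable protocol's type specs (`Enumerable.acc/0`, `Enumerable.result/0`, `Enumerable.continuation/0`). The sketch below shows the mechanism in isolation by pulling one element at a time from an enumerable. `SuspendSketch` is hypothetical and is not the code from this PR.

```elixir
defmodule SuspendSketch do
  # Start a suspended reduction. Passing {:suspend, acc} makes the
  # enumerable hand back a continuation before consuming anything.
  def start(enum) do
    {:suspended, nil, cont} =
      Enumerable.reduce(enum, {:suspend, nil}, fn item, _acc ->
        # Suspend again after every element, carrying it as the acc.
        {:suspend, item}
      end)

    cont
  end

  # Resume the reduction for exactly one element.
  def next(cont) do
    case cont.({:cont, nil}) do
      {:suspended, item, next_cont} -> {:ok, item, next_cont}
      {:done, _acc} -> :done
      {:halted, _acc} -> :done
    end
  end
end
```

Usage, assuming a list input:

```elixir
cont = SuspendSketch.start([1, 2])
{:ok, 1, cont} = SuspendSketch.next(cont)
{:ok, 2, cont} = SuspendSketch.next(cont)
:done = SuspendSketch.next(cont)
```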

@sax
Contributor Author

sax commented May 2, 2026

Ok, I'll see if I can get another version together with just the stream changes, in a way that is more readable. I think the benefit is in the combination of reducing stream operations and utilizing pattern matching in function heads.

I also find it strange how dependent these are on specific Elixir/Erlang versions. I would like to figure out the reason behind that.
