Improve speed via Stream.resource and faster encoding #5
sax wants to merge 2 commits into activesphere:master
|
Could you rebase on master? I just pushed a fix for the Ubuntu 20 runner issue. |
Rewrite escaping, encoding, and name validation to skip work in the common case: ASCII fast paths, zero-allocation pass-through for already-safe input, writable-binary accumulators in place of per-codepoint iolist construction, and direct dispatch in place of the String.Chars protocol.
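To illustrate the pass-through and binary-accumulator ideas described above, here is a minimal sketch. It is not the code from this PR: the module name EscapeSketch and its helpers are hypothetical, and it assumes the five standard XML entities are the only characters that need escaping. Scanning byte by byte is safe for UTF-8 input because every escaped character is ASCII.

```elixir
defmodule EscapeSketch do
  # Hypothetical module for illustration only; not the library's implementation.

  # Scan for the first byte that needs escaping, keeping the original binary around.
  def escape(bin) when is_binary(bin), do: scan(bin, bin, 0)

  # No special byte found: return the input itself, so nothing new is built.
  defp scan(<<>>, original, _pos), do: original

  defp scan(<<c, _::binary>>, original, pos) when c in [?&, ?<, ?>, ?", ?'] do
    # Copy the safe prefix once, then continue in the accumulating loop.
    <<prefix::binary-size(pos), rest::binary>> = original
    slow(rest, prefix)
  end

  defp scan(<<_c, rest::binary>>, original, pos), do: scan(rest, original, pos + 1)

  # Appending to a binary accumulator lets the runtime grow it in place
  # instead of building one iolist node per character.
  defp slow(<<>>, acc), do: acc
  defp slow("&" <> rest, acc), do: slow(rest, <<acc::binary, "&amp;">>)
  defp slow("<" <> rest, acc), do: slow(rest, <<acc::binary, "&lt;">>)
  defp slow(">" <> rest, acc), do: slow(rest, <<acc::binary, "&gt;">>)
  defp slow("\"" <> rest, acc), do: slow(rest, <<acc::binary, "&quot;">>)
  defp slow("'" <> rest, acc), do: slow(rest, <<acc::binary, "&apos;">>)
  defp slow(<<c, rest::binary>>, acc), do: slow(rest, <<acc::binary, c>>)
end
```

Input with nothing to escape goes through scan only and comes back unchanged; the accumulating loop only runs once a special character has been seen.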
|
Just rebased. |
|
Just in case it's helpful, here is the benchmark I used: https://github.com/synchronal/exceed/blob/main/benchmark/exceed.exs. Because it generates Excel files, my benchmarks include zipping as well as stream generation. |
|
Sorry for the delay, was just finishing my (US Pacific Time) day when you messaged. On my M2 Mac Mini I'm seeing the following with Elixir 1.19.5-otp-28 and OTP 28.5:
When I switch to Elixir 1.19.5-otp-27 and OTP 27.3.4.11 things do change:
I'm seeing this pretty consistently. |
|
Oh! I've pulled some overrides of |
|
I am OK with merging some of the changes that are mostly straightforward transformations in escape_binary, the current code seems to be already binary optimized. Running I do see the changes related to Stream.flat_map result in about 2x performance, but again, what used to be simple Stream library usage is now converted into something that's much more difficult to read and understand. I can't find any public documentation for |
|
Ok, I'll see if I can get another version together with just the stream changes, in a way that is more readable. I think the benefit is in the combination of reducing stream operations and utilizing pattern matching in function heads. I also find it strange how dependent these are on specific Elixir/Erlang versions. I would like to figure out the reason behind that. |
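As a rough illustration of "pattern matching in function heads" applied to value encoding, the sketch below dispatches on guards for the common types and falls back to to_string/1 (the String.Chars protocol) for everything else. The module name EncodeSketch is hypothetical and this is not the PR's code; it only assumes that most values are binaries, integers, floats, or atoms.

```elixir
defmodule EncodeSketch do
  # Hypothetical module for illustration only.

  # Common cases are resolved by clause selection, without a protocol lookup.
  def encode(value) when is_binary(value), do: value
  def encode(value) when is_integer(value), do: Integer.to_string(value)
  def encode(value) when is_float(value), do: Float.to_string(value)

  # to_string(nil) is "", while Atom.to_string(nil) is "nil", so nil
  # needs its own clause to keep the protocol's behaviour.
  def encode(nil), do: ""
  def encode(value) when is_atom(value), do: Atom.to_string(value)

  # Anything else (dates, charlists, custom structs) still uses String.Chars.
  def encode(value), do: to_string(value)
end
```

The clause order matters: the nil clause has to come before the generic atom clause, since nil is itself an atom.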

Collapsing Stream.flat_map + Stream.transform into a single Stream.resource improves the throughput of XML generation by about 35% in my benchmarks. Optimizing the escaping, encoding, and validation of tags for ASCII input, passing safe input through without additional allocation, and using binary accumulators improves my benchmark by another ~15%.
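A minimal sketch of what collapsing two stream stages into one Stream.resource can look like. This is an assumption-laden illustration, not the PR's implementation: the module StreamSketch, the render/2 helper, and the row/cell markup are hypothetical, and the collapsed version only handles a list of rows rather than an arbitrary enumerable.

```elixir
defmodule StreamSketch do
  # Hypothetical module for illustration only.

  # Baseline: one stage numbers the rows, a second stage expands each row into chunks.
  def composed(rows) do
    rows
    |> Stream.transform(1, fn row, index -> {[{index, row}], index + 1} end)
    |> Stream.flat_map(fn {index, row} -> render(index, row) end)
  end

  # Collapsed: a single Stream.resource carries the remaining rows and the
  # counter in its accumulator and emits the rendered chunks directly.
  def collapsed(rows) when is_list(rows) do
    Stream.resource(
      fn -> {rows, 1} end,
      &next/1,
      fn _state -> :ok end
    )
  end

  # Function heads pick between "finished" and "emit the next row".
  defp next({[], _index}), do: {:halt, :done}
  defp next({[row | rest], index}), do: {render(index, row), {rest, index + 1}}

  # Returns a list of chunks; each chunk becomes one element of the stream.
  defp render(index, cells) do
    open = "<row r=\"" <> Integer.to_string(index) <> "\">"
    [open] ++ Enum.map(cells, fn cell -> "<c>" <> cell <> "</c>" end) ++ ["</row>"]
  end
end
```

Both functions produce the same chunks; the collapsed one simply has fewer intermediate stream layers between the source rows and the consumer, which is the kind of saving the figures above refer to.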