
Add streaming options and improve performance for large data writes#2288

Open
AdamDrewsTR wants to merge 6 commits into qax-os:master from AdamDrewsTR:large-data-perf

Conversation


@AdamDrewsTR AdamDrewsTR commented Apr 7, 2026

Description

This PR optimizes performance and memory usage of the streaming write path (StreamWriter), the WriteTo/WriteToBuffer output pipeline, and the ZIP compression layer. Profiling against the mzimmerman/excelizetest benchmark suite identified four hot spots — ColumnNumberToName, CoordinatesToCellName, SetRow, and writeCell — which together accounted for the majority of CPU time and nearly all heap allocations per row.

Summary of improvements

| Area | Key metric |
| --- | --- |
| SetRow hot path | 68–79% faster, 94–99% fewer allocations |
| Full pipeline (SetRow + WriteTo) | 67–72% faster, 51–87% less memory |
| XML-escaped strings | 79% faster, 81% less memory |
| Peak memory (50K×100) | 162 MB → 43 MB (−73%) |
| Allocations (50K×100) | 15.1M → 153K (−99%) |
| ZIP compression | klauspost/compress: ~2× faster than stdlib |

Changes

lib.go — precomputed column names

  • Added columnNames: a package-level precomputed lookup table of all 16 384 column name strings (A–XFD), initialized once at startup via an IIFE. ColumnNumberToName now returns a slice element instead of allocating a new string on every call.
  • Optimised CoordinatesToCellName to early-return on the common (non-absolute) path, avoiding concatenation with an empty sign variable.
  • Updated readXML to use the new bufferedWriter.Bytes() method instead of accessing .buf directly.
  • Replaced archive/zip import with github.com/klauspost/compress/zip.
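The precomputed-table approach can be sketched as follows. This is a minimal standalone reconstruction under stated assumptions, not the PR's exact code; `columnNames` and `columnNumberToName` here are illustrative:

```go
package main

import "fmt"

// columnNames: all 16384 Excel column names (A..XFD), built once at
// package init so lookups return a shared string with no per-call
// allocation. Index 0 is unused so column 1 maps to "A".
var columnNames = func() [16385]string {
	var names [16385]string
	for n := 1; n <= 16384; n++ {
		var buf [3]byte
		i, col := len(buf), n
		for col > 0 {
			i--
			col-- // shift to 0-based so A..Z act as base-26 digits
			buf[i] = byte('A' + col%26)
			col /= 26
		}
		names[n] = string(buf[i:])
	}
	return names
}()

// columnNumberToName returns a slice element from the table instead of
// formatting a new string on every call.
func columnNumberToName(n int) string {
	if n < 1 || n > 16384 {
		return ""
	}
	return columnNames[n]
}

func main() {
	fmt.Println(columnNumberToName(1), columnNumberToName(27), columnNumberToName(16384))
}
```

The table costs roughly 16K small strings once at startup, which is what makes the hot-path lookup allocation-free.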

stream.go — hot-loop optimizations

  • SetRow rewrite: precomputes rowStr once per row (previously per cell via CoordinatesToCellName), reuses a single xlsxC struct across the inner loop (fields zeroed per iteration instead of allocating a new struct), and reads column names directly from the columnNames table.
  • writeNumericCell: zero-allocation fast path for int, int8–int64, uint–uint64, float32, float64, and bool. Writes the complete <c> element directly to the buffer using strconv.Append* into a [24]byte scratch field, bypassing xlsxC entirely.
  • writeStringCell: zero-allocation fast path for string and []byte. Writes inline-string XML directly, bypassing xlsxC/xlsxSI/trimCellValue/bstrMarshal/xml.EscapeText. Handles xml:space="preserve" for leading/trailing whitespace. Falls back to slow path for _xHHHH_ escape patterns or strings exceeding TotalCellChars.
  • writeEscaped: custom XML escaper that scans for <>&"\r; fast-path writes directly when no special chars are found, slow-path does character-by-character replacement (still zero-alloc).
  • writeCellStart: helper to deduplicate the <c r="…" opening across fast and slow paths.
  • writeCell: now takes *xlsxC by pointer plus pre-split colName/rowStr, eliminating a ~184-byte struct copy and a string concatenation per cell. Skips xml.EscapeText for numeric/boolean <v> values (digits, ., -, +, E are always safe).
  • setCellValFunc: inlines all integer type cases directly, removing a redundant second type-switch dispatch through setCellIntFunc.
  • marshalAttrs: writes directly to *bufferedWriter (no intermediate strings.Builder). Row option validation split into validateRowOpts so XML is only written after validation passes.
  • parseRowOpts: returns RowOpts by value instead of *RowOpts.
  • streamCellStyle / colStyles []int: column style lookup is now O(1) via a cached slice built once in writeSheetData, replacing a per-cell O(N) linear scan of worksheet.Cols.

stream.go — bufferedWriter memory architecture

  • Two-phase architecture: below the threshold, all writes go to an in-memory bytes.Buffer. Once the threshold is crossed, the buffer is drained to a temp file exactly once and all subsequent writes flow through a fixed-size bufio.Writer wrapping the file. This bounds peak heap usage to approximately StreamingChunkSize + bioSize regardless of total data size, compared to the previous approach which re-grew a new bytes.Buffer to the threshold size on every flush cycle.
  • Sync() is now a no-op when bio != nil: bufio.Writer flushes internally when its buffer is full, and forcing a flush on every SetRow call (the previous behavior) negated all batching benefit.
  • New methods: Bytes(), Reset(), CopyTo(w io.Writer), WriteInt(int64), WriteUint(uint64), WriteFloat(float64, ...).
  • CopyTo: uses a 256 KiB buffered reader to minimize Pread syscalls when copying from temp files (reduces ~3000 syscalls to ~400 for a 100 MB worksheet).
  • scratch [24]byte: used by WriteInt, WriteUint, WriteFloat to format numbers without heap allocation.

file.go — streaming WriteTo & compression

  • WriteTo rewrite: non-encrypted path now streams the ZIP directly to w via a countWriter wrapper — no intermediate bytes.Buffer. Encrypted path delegates to new writeToWithEncryption, which writes ZIP to a temp file, applies ZIP64 LFH fixup, reads back, encrypts, and writes to w.
  • WriteToBuffer: now calls configureZipCompression and only performs ZIP64 LFH fixup when len(f.zip64Entries) > 0.
  • writeToZip: replaces stream.rawData.Reader() + io.Copy with stream.rawData.CopyTo(fi) (uses the new efficient copy path).
  • writeZip64LFHFile: performs ZIP64 local file header fixup on a temp file (chunk-based, 1 MB reads) instead of requiring an in-memory buffer.

excelize.go — options & compression

  • New type Compression int with three constants: CompressionDefault, CompressionNone, CompressionBestSpeed.
  • New fields on Options: StreamingChunkSize int, StreamingBufSize int, Compression Compression. All are zero-by-default (zero → use package constants), so existing callers are completely unaffected. StreamingChunkSize: -1 keeps all data in memory (never spills to disk).
  • configureZipCompression: registers a custom flate.NewWriter compressor on *zip.Writer based on the Compression option.
  • Replaced archive/zip import with github.com/klauspost/compress/zip and github.com/klauspost/compress/flate.
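The zero-by-default behavior can be sketched as a small resolution step. This is a hypothetical reconstruction: the names mirror the PR description, but the chunk-size default value (16 MB here) is an assumed placeholder, since only StreamingBufSizeDefault (128 KiB) is stated:

```go
package main

import "fmt"

// Compression selects the ZIP compression strategy, per the PR description.
type Compression int

const (
	CompressionDefault Compression = iota
	CompressionNone
	CompressionBestSpeed
)

const (
	streamingChunkSizeDefault = 16 << 20  // assumed placeholder value
	streamingBufSizeDefault   = 128 << 10 // StreamingBufSizeDefault from the PR
)

// Options sketches the three new fields; real callers embed these in
// excelize.Options.
type Options struct {
	StreamingChunkSize int
	StreamingBufSize   int
	Compression        Compression
}

// resolve maps zero values to package defaults, so existing callers that
// never set these fields see no behavior change. A StreamingChunkSize of
// -1 is passed through, meaning "keep all data in memory, never spill".
func resolve(o Options) (chunk, buf int) {
	chunk, buf = o.StreamingChunkSize, o.StreamingBufSize
	if chunk == 0 {
		chunk = streamingChunkSizeDefault
	}
	if buf == 0 {
		buf = streamingBufSizeDefault
	}
	return
}

func main() {
	c, b := resolve(Options{}) // zero value: defaults apply
	fmt.Println(c, b)
}
```

Using the zero value as "default" (rather than a sentinel like -1 for "off") is what keeps the Options struct backward compatible for existing callers.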

templates.go

  • Added StreamingBufSizeDefault = 128 << 10 (128 KiB). Value determined empirically via BenchmarkBioSizeSweep and TestBioSizeIOProfile.

go.mod

  • Added github.com/klauspost/compress v1.18.5 — a high-performance, pure-Go drop-in replacement for archive/zip and compress/flate.

Related Issue

Fixes #876 — High memory when writing 1 million number of rows

Motivation and Context

User-reported profiling showed that generating large worksheets via StreamWriter was dominated by per-cell allocations in the column-name conversion functions and by unbounded bytes.Buffer growth in the write buffer. For a 100-column × 50 000-row sheet (~150 MB of XML), the previous code allocated 162 MB peak and made 15.1 million allocations. The WriteTo path then buffered the entire compressed ZIP in a bytes.Buffer before writing, adding another 50–200 MB of peak memory on top.

How Has This Been Tested

All existing tests pass (go test ./...).

New tests

| Test | What it covers |
| --- | --- |
| TestStreamingWriteTo | Verifies WriteTo streams correctly without password; round-trips 100×10 sheet |
| TestCompressionOption | Generates 500×20 sheet at Default/None/BestSpeed; asserts size ordering; validates all are readable XLSX |
| TestWriteToBufferCompression | Verifies WriteToBuffer respects CompressionNone |
| TestWriteToWithPassword | Round-trips encrypted file via WriteTo with password |
| TestWriteToWithPasswordAndCompression | Combines password encryption with CompressionBestSpeed |
| TestBioSizeIOProfile | Instruments write-syscall counts and bytes at 10 bufio.Writer sizes (4 KiB – 4 MiB) to project performance on different storage tiers |
| BenchmarkBioSizeSweep | Measures ns/op and B/op across 10 bufio.Writer sizes for a 50K×100 sheet |
| BenchmarkStringCellClean/Special | Measures writeEscaped fast path vs slow path |
| BenchmarkCompressionLevels | 50K×20 string sheet at Default/BestSpeed/None, each with disk-spill and in-memory variants (6 sub-benchmarks) |
| BenchmarkStreamWriterLarge/Huge | 10K×50 and 50K×100 integer-cell benchmarks for regression tracking |
| BenchmarkExcelize* (9 sizes) | Full pipeline (build data + SetRow + WriteTo) adapted from mzimmerman/excelizetest |

Benchmark results

Platform: Apple M1 Pro, macOS, Go 1.24, arm64
Methodology: go test -run=^$ -bench=... -benchmem -count=3, median of 3 runs

Streaming write path (SetRow + Flush + Close, no WriteTo)

| Benchmark | master ns/op | PR ns/op | Δ CPU | master B/op | PR B/op | Δ Mem | master allocs | PR allocs | Δ Allocs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| StreamWriter (100×10) | 238.7 µs | 64.2 µs | −73% | 101.9 KB | 86.1 KB | −16% | 2.3K | 147 | −94% |
| StreamWriterLarge (50K×10) | 74.9 ms | 24.2 ms | −68% | 55.0 MB | 42.4 MB | −23% | 1.65M | 33.3K | −98% |
| StreamWriterHuge (50K×100) | 851.9 ms | 271.1 ms | −68% | 162.2 MB | 43.2 MB | −73% | 15.14M | 153.3K | −99% |
| StringCellClean (50K×10) | 181.1 ms | 61.2 ms | −66% | 163.6 MB | 42.5 MB | −74% | 3.65M | 33.3K | −99% |
| StringCellSpecial (50K×10) | 337.7 ms | 70.7 ms | −79% | 223.8 MB | 42.5 MB | −81% | 5.15M | 33.3K | −99% |

Full pipeline (build string data + SetRow + WriteTo to buffer)

| Benchmark | master ns/op | PR ns/op | Δ CPU | master B/op | PR B/op | Δ Mem | master allocs | PR allocs | Δ Allocs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Excelize 1K×10 | 9.1 ms | 2.7 ms | −70% | 4.9 MB | 2.1 MB | −57% | 86.0K | 16.9K | −80% |
| Excelize 10K×10 | 66.1 ms | 21.5 ms | −67% | 47.4 MB | 23.3 MB | −51% | 833.0K | 133.9K | −84% |
| Excelize 100K×100 | 7.17 s | 2.03 s | −72% | 2.54 GB | 339.7 MB | −87% | 80.29M | 10.30M | −87% |

Compression options (PR only, 50K×20 string rows, full WriteTo)

| Mode | Time | Memory | Allocs |
| --- | --- | --- | --- |
| Default (temp file) | 383.5 ms | 68.7 MB | 1.15M |
| BestSpeed (temp file) | 285.9 ms | 83.7 MB | 1.15M |
| None (temp file) | 249.7 ms | 213.6 MB | 1.15M |
| Default (in-memory) | 271.3 ms | 194.2 MB | 1.15M |
| BestSpeed (in-memory) | 254.1 ms | 209.1 MB | 1.15M |
| None (in-memory) | 178.8 ms | 339.1 MB | 1.15M |

| Metric | master (sum) | PR (sum) | Δ |
| --- | --- | --- | --- |
| CPU time | 8.69 s | 2.48 s | −71% |
| Memory | 3.20 GB | 536 MB | −83% |
| Allocations | 106.8M | 10.7M | −90% |

The most realistic scenario (the Excelize 100K×100 full pipeline) shows:

  • 7.17 s → 2.03 s (−72% CPU)
  • 2.54 GB → 340 MB (−87% memory)
  • 80.3M → 10.3M allocs (−87%)

Key takeaways

  • StreamWriterHuge (50K×100): −68% CPU, −73% memory (162 MB → 43 MB), −99% allocs (15.1M → 153K)
  • Excelize 100K×100 full pipeline: 7.17 s → 2.03 s (−72%), 2.54 GB → 340 MB (−87%), 80M → 10M allocs (−87%)
  • XML-escaped strings (50K×10): −79% CPU, −81% memory — the writeEscaped zero-alloc path eliminates per-character allocations
  • Allocation reduction is the biggest win across the board: 94–99% fewer allocs in all streaming benchmarks
  • Memory is now bounded: peak ≈ StreamingChunkSize + bioSize regardless of total data size

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@AdamDrewsTR AdamDrewsTR marked this pull request as draft April 7, 2026 22:12
@AdamDrewsTR AdamDrewsTR marked this pull request as ready for review April 8, 2026 18:07
@xuri xuri added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 9, 2026
@AdamDrewsTR
Author

++ Add support for sheet statistics and shared strings in streaming writes

  • Implement GetSheetStats and CalculateSheetStats methods to retrieve
    statistics for worksheets, including row count, column count, cell count,
    and maximum cell reference.
  • Enhance StreamWriter to support shared strings, allowing for more efficient
    storage of string values in Excel files.
  • Update dimension placeholder during streaming to reflect actual dimensions
    after data is written.
  • Add tests for sheet statistics and shared strings functionality.

@dolmen
Contributor

dolmen commented Apr 19, 2026

AI slop.

Submit smaller PRs, focused on one change.

Contributor

@dolmen dolmen left a comment


Not reviewable. Split!

@ChronosMasterOfAllTime

> AI slop.
>
> Submit smaller PRs, focused on one change.

@dolmen

That's not constructive feedback and doesn't help anyone. What are some areas of concern that you see? Yes it's clear AI was used in this PR, but it would be helpful to understand if you identified any non-idiomatic patterns or design concerns. How would you split this PR?

@AdamDrewsTR
Author

> AI slop.
>
> Submit smaller PRs, focused on one change.

That is fine. We'll keep using our much more performant fork. Feel free to close.

@ChronosMasterOfAllTime

ChronosMasterOfAllTime commented Apr 20, 2026

Here is what a more linear, separated series of PRs for these enhancements could look like, though I worry it would create PR fatigue, since each PR in the chain depends on the previous one.

First PR: Optimize Streaming Write Performance for Large Data Sets

Second PR: Add Compression Options and Optimize Output Pipeline

Third PR: Add Fast Read Mode, Shared Strings, and Sheet Statistics

