
Add streaming options and improve performance for large data writes#2288

Open
AdamDrewsTR wants to merge 6 commits into qax-os:master from AdamDrewsTR:large-data-perf

Conversation


@AdamDrewsTR AdamDrewsTR commented Apr 7, 2026

Description

This PR optimizes performance and memory usage of the streaming write path (StreamWriter), the WriteTo/WriteToBuffer output pipeline, and the ZIP compression layer. Profiling against the mzimmerman/excelizetest benchmark suite identified four hot spots — ColumnNumberToName, CoordinatesToCellName, SetRow, and writeCell — which together accounted for the majority of CPU time and nearly all heap allocations per row.

Summary of improvements

| Area | Key metric |
| --- | --- |
| SetRow hot path | 68–79% faster, 94–99% fewer allocations |
| Full pipeline (SetRow + WriteTo) | 67–72% faster, 51–87% less memory |
| XML-escaped strings | 79% faster, 81% less memory |
| Peak memory (50K×100) | 162 MB → 43 MB (−73%) |
| Allocations (50K×100) | 15.1M → 153K (−99%) |
| ZIP compression | klauspost/compress: ~2× faster than stdlib |

Changes

lib.go — precomputed column names

  • Added columnNames: a package-level precomputed lookup table of all 16 384 column name strings (A–XFD), initialized once at startup via an IIFE. ColumnNumberToName now returns a slice element instead of allocating a new string on every call.
  • Optimised CoordinatesToCellName to early-return on the common (non-absolute) path, avoiding concatenation with an empty sign variable.
  • Updated readXML to use the new bufferedWriter.Bytes() method instead of accessing .buf directly.
  • Replaced archive/zip import with github.com/klauspost/compress/zip.
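The precomputed-table approach can be sketched as follows. This is a minimal standalone reconstruction under stated assumptions, not the PR's exact code; `columnNames` and `columnNumberToName` here are illustrative:

```go
package main

import "fmt"

// columnNames: all 16384 Excel column names (A..XFD), built once at
// package init so lookups return a shared string with no per-call
// allocation. Index 0 is unused so column 1 maps to "A".
var columnNames = func() [16385]string {
	var names [16385]string
	for n := 1; n <= 16384; n++ {
		var buf [3]byte
		i, col := len(buf), n
		for col > 0 {
			i--
			col-- // shift to 0-based so A..Z act as base-26 digits
			buf[i] = byte('A' + col%26)
			col /= 26
		}
		names[n] = string(buf[i:])
	}
	return names
}()

// columnNumberToName returns a slice element from the table instead of
// formatting a new string on every call.
func columnNumberToName(n int) string {
	if n < 1 || n > 16384 {
		return ""
	}
	return columnNames[n]
}

func main() {
	fmt.Println(columnNumberToName(1), columnNumberToName(27), columnNumberToName(16384))
}
```

The table costs roughly 16K small strings once at startup, which is what makes the hot-path lookup allocation-free.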

stream.go — hot-loop optimizations

  • SetRow rewrite: precomputes rowStr once per row (previously per cell via CoordinatesToCellName), reuses a single xlsxC struct across the inner loop (fields zeroed per iteration instead of allocating a new struct), and reads column names directly from the columnNames table.
  • writeNumericCell: zero-allocation fast path for int, int8–int64, uint–uint64, float32, float64, and bool. Writes the complete <c> element directly to the buffer using strconv.Append* into a [24]byte scratch field, bypassing xlsxC entirely.
  • writeStringCell: zero-allocation fast path for string and []byte. Writes inline-string XML directly, bypassing xlsxC/xlsxSI/trimCellValue/bstrMarshal/xml.EscapeText. Handles xml:space="preserve" for leading/trailing whitespace. Falls back to slow path for _xHHHH_ escape patterns or strings exceeding TotalCellChars.
  • writeEscaped: custom XML escaper that scans for <>&"\r; fast-path writes directly when no special chars are found, slow-path does character-by-character replacement (still zero-alloc).
  • writeCellStart: helper to deduplicate the <c r="…" opening across fast and slow paths.
  • writeCell: now takes *xlsxC by pointer plus pre-split colName/rowStr, eliminating a ~184-byte struct copy and a string concatenation per cell. Skips xml.EscapeText for numeric/boolean <v> values (digits, ., -, +, E are always safe).
  • setCellValFunc: inlines all integer type cases directly, removing a redundant second type-switch dispatch through setCellIntFunc.
  • marshalAttrs: writes directly to *bufferedWriter (no intermediate strings.Builder). Row option validation split into validateRowOpts so XML is only written after validation passes.
  • parseRowOpts: returns RowOpts by value instead of *RowOpts.
  • streamCellStyle / colStyles []int: column style lookup is now O(1) via a cached slice built once in writeSheetData, replacing a per-cell O(N) linear scan of worksheet.Cols.

stream.go — bufferedWriter memory architecture

  • Two-phase architecture: below the threshold, all writes go to an in-memory bytes.Buffer. Once the threshold is crossed, the buffer is drained to a temp file exactly once and all subsequent writes flow through a fixed-size bufio.Writer wrapping the file. This bounds peak heap usage to approximately StreamingChunkSize + bioSize regardless of total data size, compared to the previous approach which re-grew a new bytes.Buffer to the threshold size on every flush cycle.
  • Sync() is now a no-op when bio != nil: bufio.Writer flushes internally when its buffer is full, and forcing a flush on every SetRow call (the previous behavior) negated all batching benefit.
  • New methods: Bytes(), Reset(), CopyTo(w io.Writer), WriteInt(int64), WriteUint(uint64), WriteFloat(float64, ...).
  • CopyTo: uses a 256 KiB buffered reader to minimize Pread syscalls when copying from temp files (reduces ~3000 syscalls to ~400 for a 100 MB worksheet).
  • scratch [24]byte: used by WriteInt, WriteUint, WriteFloat to format numbers without heap allocation.

file.go — streaming WriteTo & compression

  • WriteTo rewrite: non-encrypted path now streams the ZIP directly to w via a countWriter wrapper — no intermediate bytes.Buffer. Encrypted path delegates to new writeToWithEncryption, which writes ZIP to a temp file, applies ZIP64 LFH fixup, reads back, encrypts, and writes to w.
  • WriteToBuffer: now calls configureZipCompression and only performs ZIP64 LFH fixup when len(f.zip64Entries) > 0.
  • writeToZip: replaces stream.rawData.Reader() + io.Copy with stream.rawData.CopyTo(fi) (uses the new efficient copy path).
  • writeZip64LFHFile: performs ZIP64 local file header fixup on a temp file (chunk-based, 1 MB reads) instead of requiring an in-memory buffer.

excelize.go — options & compression

  • New type Compression int with three constants: CompressionDefault, CompressionNone, CompressionBestSpeed.
  • New fields on Options: StreamingChunkSize int, StreamingBufSize int, Compression Compression. All are zero-by-default (zero → use package constants), so existing callers are completely unaffected. StreamingChunkSize: -1 keeps all data in memory (never spills to disk).
  • configureZipCompression: registers a custom flate.NewWriter compressor on *zip.Writer based on the Compression option.
  • Replaced archive/zip import with github.com/klauspost/compress/zip and github.com/klauspost/compress/flate.
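The zero-by-default behavior can be sketched as a small resolution step. This is a hypothetical reconstruction: the names mirror the PR description, but the chunk-size default value (16 MB here) is an assumed placeholder, since only StreamingBufSizeDefault (128 KiB) is stated:

```go
package main

import "fmt"

// Compression selects the ZIP compression strategy, per the PR description.
type Compression int

const (
	CompressionDefault Compression = iota
	CompressionNone
	CompressionBestSpeed
)

const (
	streamingChunkSizeDefault = 16 << 20  // assumed placeholder value
	streamingBufSizeDefault   = 128 << 10 // StreamingBufSizeDefault from the PR
)

// Options sketches the three new fields; real callers embed these in
// excelize.Options.
type Options struct {
	StreamingChunkSize int
	StreamingBufSize   int
	Compression        Compression
}

// resolve maps zero values to package defaults, so existing callers that
// never set these fields see no behavior change. A StreamingChunkSize of
// -1 is passed through, meaning "keep all data in memory, never spill".
func resolve(o Options) (chunk, buf int) {
	chunk, buf = o.StreamingChunkSize, o.StreamingBufSize
	if chunk == 0 {
		chunk = streamingChunkSizeDefault
	}
	if buf == 0 {
		buf = streamingBufSizeDefault
	}
	return
}

func main() {
	c, b := resolve(Options{}) // zero value: defaults apply
	fmt.Println(c, b)
}
```

Using the zero value as "default" (rather than a sentinel like -1 for "off") is what keeps the Options struct backward compatible for existing callers.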

templates.go

  • Added StreamingBufSizeDefault = 128 << 10 (128 KiB). Value determined empirically via BenchmarkBioSizeSweep and TestBioSizeIOProfile.

go.mod

  • Added github.com/klauspost/compress v1.18.5 — a high-performance, pure-Go drop-in replacement for archive/zip and compress/flate.

Related Issue

Fixes #876 — High memory when writing 1 million number of rows

Motivation and Context

User-reported profiling showed that generating large worksheets via StreamWriter was dominated by per-cell allocations in the column-name conversion functions and by unbounded bytes.Buffer growth in the write buffer. For a 100-column × 50 000-row sheet (~150 MB of XML), the previous code allocated 162 MB peak and made 15.1 million allocations. The WriteTo path then buffered the entire compressed ZIP in a bytes.Buffer before writing, adding another 50–200 MB of peak memory on top.

How Has This Been Tested

All existing tests pass (go test ./...).

New tests

| Test | What it covers |
| --- | --- |
| TestStreamingWriteTo | Verifies WriteTo streams correctly without password; round-trips 100×10 sheet |
| TestCompressionOption | Generates 500×20 sheet at Default/None/BestSpeed; asserts size ordering; validates all are readable XLSX |
| TestWriteToBufferCompression | Verifies WriteToBuffer respects CompressionNone |
| TestWriteToWithPassword | Round-trips encrypted file via WriteTo with password |
| TestWriteToWithPasswordAndCompression | Combines password encryption with CompressionBestSpeed |
| TestBioSizeIOProfile | Instruments write-syscall counts and bytes at 10 bufio.Writer sizes (4 KiB – 4 MiB) to project performance on different storage tiers |
| BenchmarkBioSizeSweep | Measures ns/op and B/op across 10 bufio.Writer sizes for a 50K×100 sheet |
| BenchmarkStringCellClean/Special | Measures writeEscaped fast path vs slow path |
| BenchmarkCompressionLevels | 50K×20 string sheet at Default/BestSpeed/None, each with disk-spill and in-memory variants (6 sub-benchmarks) |
| BenchmarkStreamWriterLarge/Huge | 10K×50 and 50K×100 integer-cell benchmarks for regression tracking |
| BenchmarkExcelize* (9 sizes) | Full pipeline (build data + SetRow + WriteTo) adapted from mzimmerman/excelizetest |

Benchmark results

Platform: Apple M1 Pro, macOS, Go 1.24, arm64
Methodology: go test -run=^$ -bench=... -benchmem -count=3, median of 3 runs

Streaming write path (SetRow + Flush + Close, no WriteTo)

| Benchmark | master ns/op | PR ns/op | Δ CPU | master B/op | PR B/op | Δ Mem | master allocs | PR allocs | Δ Allocs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| StreamWriter (100×10) | 238.7 µs | 64.2 µs | −73% | 101.9 KB | 86.1 KB | −16% | 2.3K | 147 | −94% |
| StreamWriterLarge (50K×10) | 74.9 ms | 24.2 ms | −68% | 55.0 MB | 42.4 MB | −23% | 1.65M | 33.3K | −98% |
| StreamWriterHuge (50K×100) | 851.9 ms | 271.1 ms | −68% | 162.2 MB | 43.2 MB | −73% | 15.14M | 153.3K | −99% |
| StringCellClean (50K×10) | 181.1 ms | 61.2 ms | −66% | 163.6 MB | 42.5 MB | −74% | 3.65M | 33.3K | −99% |
| StringCellSpecial (50K×10) | 337.7 ms | 70.7 ms | −79% | 223.8 MB | 42.5 MB | −81% | 5.15M | 33.3K | −99% |

Full pipeline (build string data + SetRow + WriteTo to buffer)

| Benchmark | master ns/op | PR ns/op | Δ CPU | master B/op | PR B/op | Δ Mem | master allocs | PR allocs | Δ Allocs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Excelize 1K×10 | 9.1 ms | 2.7 ms | −70% | 4.9 MB | 2.1 MB | −57% | 86.0K | 16.9K | −80% |
| Excelize 10K×10 | 66.1 ms | 21.5 ms | −67% | 47.4 MB | 23.3 MB | −51% | 833.0K | 133.9K | −84% |
| Excelize 100K×100 | 7.17 s | 2.03 s | −72% | 2.54 GB | 339.7 MB | −87% | 80.29M | 10.30M | −87% |

Compression options (PR only, 50K×20 string rows, full WriteTo)

| Mode | Time | Memory | Allocs |
| --- | --- | --- | --- |
| Default (temp file) | 383.5 ms | 68.7 MB | 1.15M |
| BestSpeed (temp file) | 285.9 ms | 83.7 MB | 1.15M |
| None (temp file) | 249.7 ms | 213.6 MB | 1.15M |
| Default (in-memory) | 271.3 ms | 194.2 MB | 1.15M |
| BestSpeed (in-memory) | 254.1 ms | 209.1 MB | 1.15M |
| None (in-memory) | 178.8 ms | 339.1 MB | 1.15M |

| Metric | master (sum) | PR (sum) | Δ |
| --- | --- | --- | --- |
| CPU time | 8.69 s | 2.48 s | −71% |
| Memory | 3.20 GB | 536 MB | −83% |
| Allocations | 106.8M | 10.7M | −90% |

The most realistic scenario (the Excelize 100K×100 full pipeline) shows:

  • 7.17 s → 2.03 s (−72% CPU)
  • 2.54 GB → 340 MB (−87% memory)
  • 80.3M → 10.3M allocs (−87%)

Key takeaways

  • StreamWriterHuge (50K×100): −68% CPU, −73% memory (162 MB → 43 MB), −99% allocs (15.1M → 153K)
  • Excelize 100K×100 full pipeline: 7.17 s → 2.03 s (−72%), 2.54 GB → 340 MB (−87%), 80M → 10M allocs (−87%)
  • XML-escaped strings (50K×10): −79% CPU, −81% memory — the writeEscaped zero-alloc path eliminates per-character allocations
  • Allocation reduction is the biggest win across the board: 94–99% fewer allocs in all streaming benchmarks
  • Memory is now bounded: peak ≈ StreamingChunkSize + bioSize regardless of total data size

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@AdamDrewsTR AdamDrewsTR marked this pull request as draft April 7, 2026 22:12
@AdamDrewsTR AdamDrewsTR marked this pull request as ready for review April 8, 2026 18:07
@xuri xuri added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Apr 9, 2026
@AdamDrewsTR
Author

++ Add support for sheet statistics and shared strings in streaming writes

  • Implement GetSheetStats and CalculateSheetStats methods to retrieve
    statistics for worksheets, including row count, column count, cell count,
    and maximum cell reference.
  • Enhance StreamWriter to support shared strings, allowing for more efficient
    storage of string values in Excel files.
  • Update dimension placeholder during streaming to reflect actual dimensions
    after data is written.
  • Add tests for sheet statistics and shared strings functionality.

@dolmen
Contributor

dolmen commented Apr 19, 2026

AI slop.

Submit smaller PRs, focused on one change.

Contributor

@dolmen dolmen left a comment


Not reviewable. Split!

@ChronosMasterOfAllTime

> AI slop.
>
> Submit smaller PRs, focused on one change.

@dolmen

That's not constructive feedback and doesn't help anyone. What are some areas of concern that you see? Yes it's clear AI was used in this PR, but it would be helpful to understand if you identified any non-idiomatic patterns or design concerns. How would you split this PR?

@AdamDrewsTR
Author

> AI slop.
>
> Submit smaller PRs, focused on one change.

That is fine. We'll keep using our much more performant fork. Feel free to close.

@ChronosMasterOfAllTime

ChronosMasterOfAllTime commented Apr 20, 2026

Here is what a more linear, separated series of PRs for these enhancements could look like, though I worry it would create PR fatigue, since each PR in the chain depends on the previous one.

First PR: Optimize Streaming Write Performance for Large Data Sets

Second PR: Add Compression Options and Optimize Output Pipeline

Third PR: Add Fast Read Mode, Shared Strings, and Sheet Statistics

