Add streaming options and improve performance for large data writes#2288
AdamDrewsTR wants to merge 6 commits into qax-os:master from
Conversation
Force-pushed from c00076b to 2a613f4
- Implement GetSheetStats and CalculateSheetStats methods to retrieve statistics for worksheets, including row count, column count, cell count, and maximum cell reference.
- Enhance StreamWriter to support shared strings, allowing for more efficient storage of string values in Excel files.
- Update the dimension placeholder during streaming to reflect actual dimensions after data is written.
- Add tests for sheet statistics and shared-strings functionality.
Add support for sheet statistics and shared strings in streaming writes
AI slop. Submit smaller PRs, focused on one change.
That's not constructive feedback and doesn't help anyone. What are some areas of concern that you see? Yes, it's clear AI was used in this PR, but it would be helpful to understand whether you identified any non-idiomatic patterns or design concerns. How would you split this PR?
That is fine. We'll keep using our much more performant fork. Feel free to close.
This is what a linear, more separated set of PRs for these enhancements would look like. I am worried it will create PR fatigue, and each PR in the chain depends on the previous one.
- First PR: Optimize Streaming Write Performance for Large Data Sets
- Second PR: Add Compression Options and Optimize Output Pipeline
- Third PR: Add Fast Read Mode, Shared Strings, and Sheet Statistics
Description
Performance and memory optimization of the streaming write path (StreamWriter), the WriteTo/WriteToBuffer output pipeline, and the ZIP compression layer. Profiling against the mzimmerman/excelizetest benchmark suite identified four hot spots (ColumnNumberToName, CoordinatesToCellName, SetRow, and writeCell) which together accounted for the majority of CPU time and nearly all heap allocations per row.

Summary of improvements
Changes
lib.go — precomputed column names
- columnNames: a package-level precomputed lookup table of all 16,384 column name strings (A–XFD), initialized once at startup via an IIFE. ColumnNumberToName now returns a slice element instead of allocating a new string on every call.
- Updated CoordinatesToCellName to early-return on the common (non-absolute) path, avoiding concatenation with an empty sign variable.
- Updated readXML to use the new bufferedWriter.Bytes() method instead of accessing .buf directly.
- Replaced the archive/zip import with github.com/klauspost/compress/zip.
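The precomputed-table idea can be sketched as follows. This is a simplified illustration of the columnNames approach described above, not excelize's actual code; the function name columnName is hypothetical.

```go
package main

// columnNames holds all 16,384 Excel column names ("A".."XFD"),
// built exactly once at package init time via an IIFE, so later
// lookups are a slice index with zero allocations.
var columnNames = func() [16384]string {
	var names [16384]string
	for i := 0; i < 16384; i++ {
		n, buf := i+1, [3]byte{}
		j := 3
		for n > 0 {
			j--
			n-- // shift to 0-based so 1..26 map to 'A'..'Z'
			buf[j] = byte('A' + n%26)
			n /= 26
		}
		names[i] = string(buf[j:])
	}
	return names
}()

// columnName is a ColumnNumberToName-style lookup: O(1), no
// per-call string allocation (col is 1-based).
func columnName(col int) string {
	return columnNames[col-1]
}
```

The trade-off is a fixed ~200 KB of strings resident at startup in exchange for removing an allocation from the hottest per-cell path.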
stream.go — hot-loop optimizations

- SetRow rewrite: precomputes rowStr once per row (previously per cell via CoordinatesToCellName), reuses a single xlsxC struct across the inner loop (fields zeroed per iteration instead of allocating a new struct), and reads column names directly from the columnNames table.
- writeNumericCell: zero-allocation fast path for int, int8–int64, uint–uint64, float32, float64, and bool. Writes the complete <c> element directly to the buffer using strconv.Append* into a [24]byte scratch field, bypassing xlsxC entirely.
- writeStringCell: zero-allocation fast path for string and []byte. Writes inline-string XML directly, bypassing xlsxC/xlsxSI/trimCellValue/bstrMarshal/xml.EscapeText. Handles xml:space="preserve" for leading/trailing whitespace. Falls back to the slow path for _xHHHH_ escape patterns or strings exceeding TotalCellChars.
- writeEscaped: custom XML escaper that scans for <>&"\r; the fast path writes directly when no special characters are found, and the slow path does character-by-character replacement (still zero-alloc).
- writeCellStart: helper to deduplicate the <c r="…" opening across fast and slow paths.
- writeCell: now takes *xlsxC by pointer plus pre-split colName/rowStr, eliminating a ~184-byte struct copy and a string concatenation per cell. Skips xml.EscapeText for numeric/boolean <v> values (digits, ., -, +, E are always safe).
- setCellValFunc: inlines all integer type cases directly, removing a redundant second type-switch dispatch through setCellIntFunc.
- marshalAttrs: writes directly to *bufferedWriter (no intermediate strings.Builder). Row option validation split into validateRowOpts so XML is only written after validation passes.
- parseRowOpts: returns RowOpts by value instead of *RowOpts.
- streamCellStyle / colStyles []int: column style lookup is now O(1) via a cached slice built once in writeSheetData, replacing a per-cell O(N) linear scan of worksheet.Cols.
stream.go — bufferedWriter memory architecture

- Writes accumulate in an in-memory bytes.Buffer. Once the threshold is crossed, the buffer is drained to a temp file exactly once, and all subsequent writes flow through a fixed-size bufio.Writer wrapping the file. This bounds peak heap usage to approximately StreamingChunkSize + bioSize regardless of total data size, compared to the previous approach, which re-grew a new bytes.Buffer to the threshold size on every flush cycle.
- Sync() is now a no-op when bio != nil: bufio.Writer flushes internally when its buffer is full, and forcing a flush on every SetRow call (the previous behavior) negated all batching benefit.
- New methods: Bytes(), Reset(), CopyTo(w io.Writer), WriteInt(int64), WriteUint(uint64), WriteFloat(float64, ...).
- CopyTo: uses a 256 KiB buffered reader to minimize Pread syscalls when copying from temp files (reduces ~3,000 syscalls to ~400 for a 100 MB worksheet).
- scratch [24]byte: used by WriteInt, WriteUint, and WriteFloat to format numbers without heap allocation.
file.go — streaming WriteTo & compression

- WriteTo rewrite: the non-encrypted path now streams the ZIP directly to w via a countWriter wrapper (no intermediate bytes.Buffer). The encrypted path delegates to the new writeToWithEncryption, which writes the ZIP to a temp file, applies the ZIP64 LFH fixup, reads it back, encrypts, and writes to w.
- WriteToBuffer: now calls configureZipCompression and only performs the ZIP64 LFH fixup when len(f.zip64Entries) > 0.
- writeToZip: replaces stream.rawData.Reader() + io.Copy with stream.rawData.CopyTo(fi) (uses the new efficient copy path).
- writeZip64LFHFile: performs the ZIP64 local file header fixup on a temp file (chunk-based, 1 MB reads) instead of requiring an in-memory buffer.
excelize.go — options & compression

- New Compression int type with three constants: CompressionDefault, CompressionNone, CompressionBestSpeed.
- New Options fields: StreamingChunkSize int, StreamingBufSize int, Compression Compression. All are zero-by-default (zero → use package constants), so existing callers are completely unaffected. StreamingChunkSize: -1 keeps all data in memory (never spills to disk).
- configureZipCompression: registers a custom flate.NewWriter compressor on *zip.Writer based on the Compression option.
- Replaced the archive/zip import with github.com/klauspost/compress/zip and github.com/klauspost/compress/flate.
templates.go

- StreamingBufSizeDefault = 128 << 10 (128 KiB). Value determined empirically via BenchmarkBioSizeSweep and TestBioSizeIOProfile.
go.mod

- Added github.com/klauspost/compress v1.18.5 — a high-performance, pure-Go drop-in replacement for archive/zip and compress/flate.
Related Issue

Fixes #876 — High memory when writing 1 million rows
Motivation and Context
User-reported profiling showed that generating large worksheets via StreamWriter was dominated by per-cell allocations in the column-name conversion functions and by unbounded bytes.Buffer growth in the write buffer. For a 100-column × 50,000-row sheet (~150 MB of XML), the previous code allocated 162 MB peak and made 15.1 million allocations. The WriteTo path then buffered the entire compressed ZIP in a bytes.Buffer before writing, adding another 50–200 MB of peak memory on top.

How Has This Been Tested
All existing tests pass (go test ./...).

New tests
- TestStreamingWriteTo
- TestCompressionOption
- TestWriteToBufferCompression: WriteToBuffer respects CompressionNone
- TestWriteToWithPassword
- TestWriteToWithPasswordAndCompression: encrypted write with CompressionBestSpeed
- TestBioSizeIOProfile: sweeps bufio.Writer sizes (4 KiB – 4 MiB) to project performance on different storage tiers
- BenchmarkBioSizeSweep: sweeps bufio.Writer sizes for a 50K×100 sheet
- BenchmarkStringCellClean/Special: writeEscaped fast path vs slow path
- BenchmarkCompressionLevels
- BenchmarkStreamWriterLarge/Huge
- BenchmarkExcelize* (9 sizes)

Benchmark results
Platform: Apple M1 Pro, macOS, Go 1.24, arm64
Methodology: go test -run=^$ -bench=... -benchmem -count=3, median of 3 runs

Streaming write path (SetRow + Flush + Close, no WriteTo)
Full pipeline (build string data + SetRow + WriteTo to buffer)
Compression options (PR only, 50K×20 string rows, full WriteTo)
Real-world scenario (the Excelize 100K×100 full pipeline):
Key takeaways
- The writeEscaped zero-alloc path eliminates per-character allocations
- Peak heap usage is bounded to approximately StreamingChunkSize + bioSize regardless of total data size

Types of changes
Checklist