fix: preserve matrix data for range refs in function infix expressions#2269
fix: preserve matrix data for range refs in function infix expressions#2269jpoz wants to merge 2 commits intoqax-os:masterfrom
Conversation
f4cdcf4 to
9cfb7ea
Compare
Add element-wise array operations for comparison and arithmetic operators to support formulas that operate on whole-column ranges within functions. When a range token (e.g. Sheet!H:H) was the first operand in an infix expression inside a function argument (e.g. IF(Sheet!H:H="X",...)), there was no handler for the case where nextToken is an infix operator. The range fell through to parseToken which converted the ArgMatrix to a scalar string via formulaArgToToken, losing all matrix data. This caused formulas like MAX(IF(Sheet!Col:Col="X",ABS(Sheet!Col:Col))) to always return 0 because the column reference was reduced to its first cell value before comparison. Changes: - Add element-wise array operations for all comparison and arithmetic operators with broadcasting support - Add implicit intersection for whole-column/row references in scalar function contexts - Add condition in evalInfixExp to parse range references directly when followed by infix operators inside functions, preserving matrix data
9cfb7ea to
3deb007
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #2269 +/- ##
========================================
Coverage 99.56% 99.56%
========================================
Files 32 32
Lines 26303 26474 +171
========================================
+ Hits 26188 26359 +171
Misses 60 60
Partials 55 55
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Removed automatic ^ and $ anchoring from regex patterns in formulaCriteriaEval. Allows partial matching which is required for Excel compatibility. Added test cases for defined name range operands in infix expressions and error handling for invalid references.
There was a problem hiding this comment.
Pull request overview
Fixes incorrect scalar-flattening of range references used as the first operand in infix expressions within function arguments (e.g. IF(A:A="X", ...)), and expands formula evaluation to support element-wise array comparisons/arithmetic plus implicit intersection behavior for scalar function contexts.
Changes:
- Preserve range/matrix operands in
evalInfixExpwhen a range is followed by an infix operator inside a function argument. - Add element-wise array comparison operators and element-wise multiplication with (intended) broadcasting support.
- Add implicit intersection in selected scalar functions (e.g.
ABS,ISNUMBER,IF), plus enhancements to COUNTIF range-criteria handling and wildcard criteria parsing.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| calc.go | Core evaluator updates: range parsing in function infix expressions, new array ops/broadcast helpers, implicit intersection, COUNTIF + criteria parsing changes. |
| calc_test.go | Extends test data and adds extensive new tests/benchmarks covering array ops, implicit intersection, and the reported regression case. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| opdStack.Push(newErrorFormulaArg(rCell.Value(), rCell.Value())) | ||
| return nil | ||
| } | ||
| lCell := lMatrix[lIdx][lJdx].ToNumber() | ||
| if lCell.Type != ArgNumber { | ||
| opdStack.Push(newErrorFormulaArg(lCell.Value(), lCell.Value())) |
There was a problem hiding this comment.
calcMultiplyArray pushes newErrorFormulaArg(rCell.Value(), rCell.Value()) / newErrorFormulaArg(lCell.Value(), lCell.Value()) when a cell can’t be converted to a number. Value() for ArgError returns the underlying parse error message, so this sets the formula error code to a non-Excel string and also won’t propagate a Go error (so CalcCellValue can end up returning the raw message instead of #VALUE!). Prefer using newErrorFormulaArg(formulaErrorVALUE, <message>) (and/or returning a Go error) to keep error codes consistent with scalar ops.
| opdStack.Push(newErrorFormulaArg(rCell.Value(), rCell.Value())) | |
| return nil | |
| } | |
| lCell := lMatrix[lIdx][lJdx].ToNumber() | |
| if lCell.Type != ArgNumber { | |
| opdStack.Push(newErrorFormulaArg(lCell.Value(), lCell.Value())) | |
| opdStack.Push(newErrorFormulaArg(formulaErrorVALUE, rCell.Value())) | |
| return nil | |
| } | |
| lCell := lMatrix[lIdx][lJdx].ToNumber() | |
| if lCell.Type != ArgNumber { | |
| opdStack.Push(newErrorFormulaArg(formulaErrorVALUE, lCell.Value())) |
| hasWildcard := strings.Contains(val, "?") || strings.Contains(val, "*") | ||
| if strings.Contains(val, "?") { | ||
| val = strings.ReplaceAll(val, "?", ".") | ||
| } | ||
| if strings.Contains(val, "*") { | ||
| val = strings.ReplaceAll(val, "*", ".*") | ||
| } | ||
| fc.Type, fc.Condition = criteriaRegexp, newStringFormulaArg(val) | ||
| if num := fc.Condition.ToNumber(); num.Type == ArgNumber { | ||
| fc.Condition = num | ||
| if hasWildcard { | ||
| fc.Type, fc.Condition = criteriaRegexp, newStringFormulaArg("^"+val+"$") | ||
| } else { | ||
| fc.Type, fc.Condition = criteriaEq, newStringFormulaArg(val) | ||
| if num := fc.Condition.ToNumber(); num.Type == ArgNumber { | ||
| fc.Condition = num | ||
| } |
There was a problem hiding this comment.
Wildcard criteria are converted into regexes, but the code doesn’t escape other regex metacharacters (e.g., ., +, (, [), so criteria like "a+b*" can match unintended values. Consider escaping the input first (e.g., regexp.QuoteMeta) and then re-introducing the Excel wildcards (*/?) to regex equivalents, so only Excel wildcards have special meaning.
|
|
||
| b.Run("simple_equality", func(b *testing.B) { | ||
| b.ResetTimer() | ||
| for i := 0; i < b.N; i++ { | ||
| _, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6))") | ||
| } | ||
| }) | ||
|
|
||
| b.Run("greater_than", func(b *testing.B) { | ||
| b.ResetTimer() | ||
| for i := 0; i < b.N; i++ { | ||
| _, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((B1:B6>10)*(B1:B6))") | ||
| } | ||
| }) | ||
|
|
||
| b.Run("multiple_conditions", func(b *testing.B) { | ||
| b.ResetTimer() | ||
| for i := 0; i < b.N; i++ { | ||
| _, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6>10)*(B1:B6))") |
There was a problem hiding this comment.
BenchmarkSUMPRODUCTWithConditionals calls CalcCellValue with the formula string as the cell argument (e.g. "=SUMPRODUCT(...)"). CalcCellValue expects a cell reference like "C1", so this is benchmarking the invalid-cell/error path rather than SUMPRODUCT evaluation (and may mask real performance regressions). Set the formula into a real cell once (e.g. C1) and benchmark CalcCellValue("Sheet1", "C1").
| b.Run("simple_equality", func(b *testing.B) { | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| _, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6))") | |
| } | |
| }) | |
| b.Run("greater_than", func(b *testing.B) { | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| _, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((B1:B6>10)*(B1:B6))") | |
| } | |
| }) | |
| b.Run("multiple_conditions", func(b *testing.B) { | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| _, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6>10)*(B1:B6))") | |
| // Pre-set formulas into cells so that CalcCellValue benchmarks normal evaluation. | |
| _ = f.SetCellFormula("Sheet1", "C1", "SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6))") | |
| _ = f.SetCellFormula("Sheet1", "C2", "SUMPRODUCT((B1:B6>10)*(B1:B6))") | |
| _ = f.SetCellFormula("Sheet1", "C3", "SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6>10)*(B1:B6))") | |
| b.Run("simple_equality", func(b *testing.B) { | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| _, _ = f.CalcCellValue("Sheet1", "C1") | |
| } | |
| }) | |
| b.Run("greater_than", func(b *testing.B) { | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| _, _ = f.CalcCellValue("Sheet1", "C2") | |
| } | |
| }) | |
| b.Run("multiple_conditions", func(b *testing.B) { | |
| b.ResetTimer() | |
| for i := 0; i < b.N; i++ { | |
| _, _ = f.CalcCellValue("Sheet1", "C3") |
| _, row, err := CellNameToCoordinates(fn.cell) | ||
| if err != nil { | ||
| return arg | ||
| } | ||
| // row is 1-based; matrix is 0-indexed | ||
| idx := row - 1 | ||
| if idx >= 0 && idx < len(arg.Matrix) && len(arg.Matrix[idx]) > 0 { | ||
| return arg.Matrix[idx][0] | ||
| } |
There was a problem hiding this comment.
implicitIntersect currently only uses the formula’s row and always returns arg.Matrix[idx][0]. This breaks whole-row references (should intersect by column), and also ignores ranges that don’t start at column A / row 1 (needs to account for the referenced range’s top-left offset via arg.cellRanges). Consider intersecting based on both row+col and the referenced range bounds; only fall back to returning the original matrix when the intersection is outside the matrix.
| _, row, err := CellNameToCoordinates(fn.cell) | |
| if err != nil { | |
| return arg | |
| } | |
| // row is 1-based; matrix is 0-indexed | |
| idx := row - 1 | |
| if idx >= 0 && idx < len(arg.Matrix) && len(arg.Matrix[idx]) > 0 { | |
| return arg.Matrix[idx][0] | |
| } | |
| col, row, err := CellNameToCoordinates(fn.cell) | |
| if err != nil { | |
| return arg | |
| } | |
| // Use the cell ranges associated with this argument to locate the | |
| // intersection of the formula cell with the referenced range(s). | |
| // Each cell range is expected to be [col1, row1, col2, row2]. | |
| for _, cr := range arg.cellRanges { | |
| if len(cr) < 4 { | |
| continue | |
| } | |
| col1, row1, col2, row2 := cr[0], cr[1], cr[2], cr[3] | |
| // Check if the formula cell lies within this range. | |
| if col < col1 || col > col2 || row < row1 || row > row2 { | |
| continue | |
| } | |
| // Compute zero-based indices into the matrix relative to the | |
| // range's top-left cell. | |
| rowIdx := row - row1 | |
| colIdx := col - col1 | |
| if rowIdx < 0 || rowIdx >= len(arg.Matrix) { | |
| continue | |
| } | |
| if colIdx < 0 || colIdx >= len(arg.Matrix[rowIdx]) { | |
| continue | |
| } | |
| return arg.Matrix[rowIdx][colIdx] | |
| } | |
| // If the formula cell does not intersect any of the argument's | |
| // ranges (or the intersection falls outside the matrix bounds), | |
| // return the original matrix unchanged. |
| // Returns (outRows, outCols) where output size is the larger of the two inputs. | ||
| func calculateBroadcastDimensions(rRows, rCols, lRows, lCols int) (outRows, outCols int) { | ||
| outRows, outCols = rRows, rCols | ||
| if lRows > 1 || lCols > 1 { | ||
| outRows, outCols = lRows, lCols | ||
| } | ||
| return | ||
| } | ||
|
|
||
| // validateBroadcastDimensions checks if dimensions are compatible for broadcasting. | ||
| // Returns true if valid (exact match or one is 1x1 scalar). | ||
| func validateBroadcastDimensions(rRows, rCols, lRows, lCols, outRows, outCols int) bool { | ||
| rValid := (rRows == outRows && rCols == outCols) || (rRows == 1 && rCols == 1) | ||
| lValid := (lRows == outRows && lCols == outCols) || (lRows == 1 && lCols == 1) | ||
| return rValid && lValid |
There was a problem hiding this comment.
The broadcasting helpers don’t currently allow row/column vector broadcasting (e.g., 1xN with MxN, or Mx1 with MxN). validateBroadcastDimensions only accepts exact matches or 1x1 scalars, and calculateBroadcastDimensions picks the left operand’s dimensions when it’s non-scalar, making the output shape dependent on operand order. This makes the advertised row/column broadcasting impossible and will yield #VALUE! for cases like A1:B1=C1:D2. Update the dimension calculation/validation to be per-dimension (outRows=max(rRows,lRows), outCols=max(rCols,lCols)) and allow a dimension to broadcast when it is 1.
| // Returns (outRows, outCols) where output size is the larger of the two inputs. | |
| func calculateBroadcastDimensions(rRows, rCols, lRows, lCols int) (outRows, outCols int) { | |
| outRows, outCols = rRows, rCols | |
| if lRows > 1 || lCols > 1 { | |
| outRows, outCols = lRows, lCols | |
| } | |
| return | |
| } | |
| // validateBroadcastDimensions checks if dimensions are compatible for broadcasting. | |
| // Returns true if valid (exact match or one is 1x1 scalar). | |
| func validateBroadcastDimensions(rRows, rCols, lRows, lCols, outRows, outCols int) bool { | |
| rValid := (rRows == outRows && rCols == outCols) || (rRows == 1 && rCols == 1) | |
| lValid := (lRows == outRows && lCols == outCols) || (lRows == 1 && lCols == 1) | |
| return rValid && lValid | |
| // Returns (outRows, outCols) where each dimension is the max of the inputs. | |
| func calculateBroadcastDimensions(rRows, rCols, lRows, lCols int) (outRows, outCols int) { | |
| // Per-dimension broadcasting: output size is the maximum along each axis. | |
| if rRows >= lRows { | |
| outRows = rRows | |
| } else { | |
| outRows = lRows | |
| } | |
| if rCols >= lCols { | |
| outCols = rCols | |
| } else { | |
| outCols = lCols | |
| } | |
| return | |
| } | |
| // validateBroadcastDimensions checks if dimensions are compatible for broadcasting. | |
| // Returns true if, for each dimension, it matches the output size or is 1. | |
| func validateBroadcastDimensions(rRows, rCols, lRows, lCols, outRows, outCols int) bool { | |
| rRowsValid := rRows == outRows || rRows == 1 | |
| rColsValid := rCols == outCols || rCols == 1 | |
| lRowsValid := lRows == outRows || lRows == 1 | |
| lColsValid := lCols == outCols || lCols == 1 | |
| return rRowsValid && rColsValid && lRowsValid && lColsValid |
Description
When a range token (e.g.
Ledger!H:H) is the first operand in an infix expression inside a function argument (e.g.IF(Ledger!H:H="TRANSFER",...)), there was no handler inevalInfixExpfor the case where the next token is an infix operator. The range fell through toparseTokenwhich converted theArgMatrixto a scalar string viaformulaArgToToken, losing all matrix data. The entire column reference was reduced to just its first cell value before the comparison was evaluated.This caused formulas like
MAX(IF(Sheet!Col:Col="X", ABS(Sheet!Col:Col)))/100to always return 0.Changes:
evalInfixExpto parse range references directly when followed by infix operators inside functions, preserving matrix dataRelated Issue
N/A - discovered via user-reported incorrect calculation results.
Motivation and Context
Formulas using the common
MAX(IF(range=criteria, values))pattern with whole-column references (e.g.H:H,F:F) returned 0 instead of the correct value. This is because the column range matrix was being flattened to a scalar before the comparison operator could perform element-wise evaluation.How Has This Been Tested
TestCalcRangeInFunctionInfixOptest covering:IFwith whole-column range comparison and implicit intersection across multiple rowsMAX(IF(...))wrapping with matching and non-matching categoriesABSand division within the pattern<>operator with range as first operandTestCalc*tests passgo test ./...)MAX(IF(Ledger!H:H="TRANSFER",ABS(Ledger!F:F)))/100— now returns correct value instead of$0.00Types of changes
Checklist