Skip to content

fix: preserve matrix data for range refs in function infix expressions#2269

Open
jpoz wants to merge 2 commits intoqax-os:masterfrom
jpoz:jpoz/fix-range-in-function-infix-op
Open

fix: preserve matrix data for range refs in function infix expressions#2269
jpoz wants to merge 2 commits intoqax-os:masterfrom
jpoz:jpoz/fix-range-in-function-infix-op

Conversation

@jpoz
Copy link
Copy Markdown
Contributor

@jpoz jpoz commented Feb 28, 2026

Description

When a range token (e.g. Ledger!H:H) is the first operand in an infix expression inside a function argument (e.g. IF(Ledger!H:H="TRANSFER",...)), there was no handler in evalInfixExp for the case where the next token is an infix operator. The range fell through to parseToken which converted the ArgMatrix to a scalar string via formulaArgToToken, losing all matrix data. The entire column reference was reduced to just its first cell value before the comparison was evaluated.

This caused formulas like MAX(IF(Sheet!Col:Col="X", ABS(Sheet!Col:Col)))/100 to always return 0.

Changes:

  • Add element-wise array operations for all comparison and arithmetic operators with broadcasting support
  • Add implicit intersection for whole-column/row references in scalar function contexts
  • Add condition in evalInfixExp to parse range references directly when followed by infix operators inside functions, preserving matrix data

Related Issue

N/A - discovered via user-reported incorrect calculation results.

Motivation and Context

Formulas using the common MAX(IF(range=criteria, values)) pattern with whole-column references (e.g. H:H, F:F) returned 0 instead of the correct value. This is because the column range matrix was being flattened to a scalar before the comparison operator could perform element-wise evaluation.

How Has This Been Tested

  • Added TestCalcRangeInFunctionInfixOp test covering:
    • IF with whole-column range comparison and implicit intersection across multiple rows
    • MAX(IF(...)) wrapping with matching and non-matching categories
    • ABS and division within the pattern
    • <> operator with range as first operand
  • All existing TestCalc* tests pass
  • Full test suite passes (go test ./...)
  • Verified with an xlsx file containing MAX(IF(Ledger!H:H="TRANSFER",ABS(Ledger!F:F)))/100 — now returns correct value instead of $0.00

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@jpoz jpoz force-pushed the jpoz/fix-range-in-function-infix-op branch from f4cdcf4 to 9cfb7ea Compare February 28, 2026 01:39
Add element-wise array operations for comparison and arithmetic operators
to support formulas that operate on whole-column ranges within functions.

When a range token (e.g. Sheet!H:H) was the first operand in an infix
expression inside a function argument (e.g. IF(Sheet!H:H="X",...)), there
was no handler for the case where nextToken is an infix operator. The range
fell through to parseToken which converted the ArgMatrix to a scalar string
via formulaArgToToken, losing all matrix data.

This caused formulas like MAX(IF(Sheet!Col:Col="X",ABS(Sheet!Col:Col)))
to always return 0 because the column reference was reduced to its first
cell value before comparison.

Changes:
- Add element-wise array operations for all comparison and arithmetic
  operators with broadcasting support
- Add implicit intersection for whole-column/row references in scalar
  function contexts
- Add condition in evalInfixExp to parse range references directly when
  followed by infix operators inside functions, preserving matrix data
@jpoz jpoz force-pushed the jpoz/fix-range-in-function-infix-op branch from 9cfb7ea to 3deb007 Compare February 28, 2026 01:49
@xuri xuri added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Feb 28, 2026
@codecov
Copy link
Copy Markdown

codecov bot commented Feb 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.56%. Comparing base (3e9bc14) to head (5247e5b).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff            @@
##           master    #2269    +/-   ##
========================================
  Coverage   99.56%   99.56%            
========================================
  Files          32       32            
  Lines       26303    26474   +171     
========================================
+ Hits        26188    26359   +171     
  Misses         60       60            
  Partials       55       55            
Flag Coverage Δ
unittests 99.56% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Removed automatic ^ and $ anchoring from regex patterns in
formulaCriteriaEval. Allows partial matching which is required
for Excel compatibility.

Added test cases for defined name range operands in infix
expressions and error handling for invalid references.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect scalar-flattening of range references used as the first operand in infix expressions within function arguments (e.g. IF(A:A="X", ...)), and expands formula evaluation to support element-wise array comparisons/arithmetic plus implicit intersection behavior for scalar function contexts.

Changes:

  • Preserve range/matrix operands in evalInfixExp when a range is followed by an infix operator inside a function argument.
  • Add element-wise array comparison operators and element-wise multiplication with (intended) broadcasting support.
  • Add implicit intersection in selected scalar functions (e.g. ABS, ISNUMBER, IF), plus enhancements to COUNTIF range-criteria handling and wildcard criteria parsing.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
calc.go Core evaluator updates: range parsing in function infix expressions, new array ops/broadcast helpers, implicit intersection, COUNTIF + criteria parsing changes.
calc_test.go Extends test data and adds extensive new tests/benchmarks covering array ops, implicit intersection, and the reported regression case.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread calc.go
Comment on lines +1547 to +1552
opdStack.Push(newErrorFormulaArg(rCell.Value(), rCell.Value()))
return nil
}
lCell := lMatrix[lIdx][lJdx].ToNumber()
if lCell.Type != ArgNumber {
opdStack.Push(newErrorFormulaArg(lCell.Value(), lCell.Value()))
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calcMultiplyArray pushes newErrorFormulaArg(rCell.Value(), rCell.Value()) / newErrorFormulaArg(lCell.Value(), lCell.Value()) when a cell can’t be converted to a number. Value() for ArgError returns the underlying parse error message, so this sets the formula error code to a non-Excel string and also won’t propagate a Go error (so CalcCellValue can end up returning the raw message instead of #VALUE!). Prefer using newErrorFormulaArg(formulaErrorVALUE, <message>) (and/or returning a Go error) to keep error codes consistent with scalar ops.

Suggested change
opdStack.Push(newErrorFormulaArg(rCell.Value(), rCell.Value()))
return nil
}
lCell := lMatrix[lIdx][lJdx].ToNumber()
if lCell.Type != ArgNumber {
opdStack.Push(newErrorFormulaArg(lCell.Value(), lCell.Value()))
opdStack.Push(newErrorFormulaArg(formulaErrorVALUE, rCell.Value()))
return nil
}
lCell := lMatrix[lIdx][lJdx].ToNumber()
if lCell.Type != ArgNumber {
opdStack.Push(newErrorFormulaArg(formulaErrorVALUE, lCell.Value()))

Copilot uses AI. Check for mistakes.
Comment thread calc.go
Comment on lines +2166 to +2179
hasWildcard := strings.Contains(val, "?") || strings.Contains(val, "*")
if strings.Contains(val, "?") {
val = strings.ReplaceAll(val, "?", ".")
}
if strings.Contains(val, "*") {
val = strings.ReplaceAll(val, "*", ".*")
}
fc.Type, fc.Condition = criteriaRegexp, newStringFormulaArg(val)
if num := fc.Condition.ToNumber(); num.Type == ArgNumber {
fc.Condition = num
if hasWildcard {
fc.Type, fc.Condition = criteriaRegexp, newStringFormulaArg("^"+val+"$")
} else {
fc.Type, fc.Condition = criteriaEq, newStringFormulaArg(val)
if num := fc.Condition.ToNumber(); num.Type == ArgNumber {
fc.Condition = num
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wildcard criteria are converted into regexes, but the code doesn’t escape other regex metacharacters (e.g., ., +, (, [), so criteria like "a+b*" can match unintended values. Consider escaping the input first (e.g., regexp.QuoteMeta) and then re-introducing the Excel wildcards (*/?) to regex equivalents, so only Excel wildcards have special meaning.

Copilot uses AI. Check for mistakes.
Comment thread calc_test.go
Comment on lines +8711 to +8729

b.Run("simple_equality", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6))")
}
})

b.Run("greater_than", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((B1:B6>10)*(B1:B6))")
}
})

b.Run("multiple_conditions", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6>10)*(B1:B6))")
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BenchmarkSUMPRODUCTWithConditionals calls CalcCellValue with the formula string as the cell argument (e.g. "=SUMPRODUCT(...)"). CalcCellValue expects a cell reference like "C1", so this is benchmarking the invalid-cell/error path rather than SUMPRODUCT evaluation (and may mask real performance regressions). Set the formula into a real cell once (e.g. C1) and benchmark CalcCellValue("Sheet1", "C1").

Suggested change
b.Run("simple_equality", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6))")
}
})
b.Run("greater_than", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((B1:B6>10)*(B1:B6))")
}
})
b.Run("multiple_conditions", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "=SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6>10)*(B1:B6))")
// Pre-set formulas into cells so that CalcCellValue benchmarks normal evaluation.
_ = f.SetCellFormula("Sheet1", "C1", "SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6))")
_ = f.SetCellFormula("Sheet1", "C2", "SUMPRODUCT((B1:B6>10)*(B1:B6))")
_ = f.SetCellFormula("Sheet1", "C3", "SUMPRODUCT((A1:A6=\"Apple\")*(B1:B6>10)*(B1:B6))")
b.Run("simple_equality", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "C1")
}
})
b.Run("greater_than", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "C2")
}
})
b.Run("multiple_conditions", func(b *testing.B) {
b.ResetTimer()
for i := 0; i < b.N; i++ {
_, _ = f.CalcCellValue("Sheet1", "C3")

Copilot uses AI. Check for mistakes.
Comment thread calc.go
Comment on lines +389 to +397
_, row, err := CellNameToCoordinates(fn.cell)
if err != nil {
return arg
}
// row is 1-based; matrix is 0-indexed
idx := row - 1
if idx >= 0 && idx < len(arg.Matrix) && len(arg.Matrix[idx]) > 0 {
return arg.Matrix[idx][0]
}
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implicitIntersect currently only uses the formula’s row and always returns arg.Matrix[idx][0]. This breaks whole-row references (should intersect by column), and also ignores ranges that don’t start at column A / row 1 (needs to account for the referenced range’s top-left offset via arg.cellRanges). Consider intersecting based on both row+col and the referenced range bounds; only fall back to returning the original matrix when the intersection is outside the matrix.

Suggested change
_, row, err := CellNameToCoordinates(fn.cell)
if err != nil {
return arg
}
// row is 1-based; matrix is 0-indexed
idx := row - 1
if idx >= 0 && idx < len(arg.Matrix) && len(arg.Matrix[idx]) > 0 {
return arg.Matrix[idx][0]
}
col, row, err := CellNameToCoordinates(fn.cell)
if err != nil {
return arg
}
// Use the cell ranges associated with this argument to locate the
// intersection of the formula cell with the referenced range(s).
// Each cell range is expected to be [col1, row1, col2, row2].
for _, cr := range arg.cellRanges {
if len(cr) < 4 {
continue
}
col1, row1, col2, row2 := cr[0], cr[1], cr[2], cr[3]
// Check if the formula cell lies within this range.
if col < col1 || col > col2 || row < row1 || row > row2 {
continue
}
// Compute zero-based indices into the matrix relative to the
// range's top-left cell.
rowIdx := row - row1
colIdx := col - col1
if rowIdx < 0 || rowIdx >= len(arg.Matrix) {
continue
}
if colIdx < 0 || colIdx >= len(arg.Matrix[rowIdx]) {
continue
}
return arg.Matrix[rowIdx][colIdx]
}
// If the formula cell does not intersect any of the argument's
// ranges (or the intersection falls outside the matrix bounds),
// return the original matrix unchanged.

Copilot uses AI. Check for mistakes.
Comment thread calc.go
Comment on lines +1344 to +1358
// Returns (outRows, outCols) where output size is the larger of the two inputs.
func calculateBroadcastDimensions(rRows, rCols, lRows, lCols int) (outRows, outCols int) {
outRows, outCols = rRows, rCols
if lRows > 1 || lCols > 1 {
outRows, outCols = lRows, lCols
}
return
}

// validateBroadcastDimensions checks if dimensions are compatible for broadcasting.
// Returns true if valid (exact match or one is 1x1 scalar).
func validateBroadcastDimensions(rRows, rCols, lRows, lCols, outRows, outCols int) bool {
rValid := (rRows == outRows && rCols == outCols) || (rRows == 1 && rCols == 1)
lValid := (lRows == outRows && lCols == outCols) || (lRows == 1 && lCols == 1)
return rValid && lValid
Copy link

Copilot AI Mar 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The broadcasting helpers don’t currently allow row/column vector broadcasting (e.g., 1xN with MxN, or Mx1 with MxN). validateBroadcastDimensions only accepts exact matches or 1x1 scalars, and calculateBroadcastDimensions picks the left operand’s dimensions when it’s non-scalar, making the output shape dependent on operand order. This makes the advertised row/column broadcasting impossible and will yield #VALUE! for cases like A1:B1=C1:D2. Update the dimension calculation/validation to be per-dimension (outRows=max(rRows,lRows), outCols=max(rCols,lCols)) and allow a dimension to broadcast when it is 1.

Suggested change
// Returns (outRows, outCols) where output size is the larger of the two inputs.
func calculateBroadcastDimensions(rRows, rCols, lRows, lCols int) (outRows, outCols int) {
outRows, outCols = rRows, rCols
if lRows > 1 || lCols > 1 {
outRows, outCols = lRows, lCols
}
return
}
// validateBroadcastDimensions checks if dimensions are compatible for broadcasting.
// Returns true if valid (exact match or one is 1x1 scalar).
func validateBroadcastDimensions(rRows, rCols, lRows, lCols, outRows, outCols int) bool {
rValid := (rRows == outRows && rCols == outCols) || (rRows == 1 && rCols == 1)
lValid := (lRows == outRows && lCols == outCols) || (lRows == 1 && lCols == 1)
return rValid && lValid
// Returns (outRows, outCols) where each dimension is the max of the inputs.
func calculateBroadcastDimensions(rRows, rCols, lRows, lCols int) (outRows, outCols int) {
// Per-dimension broadcasting: output size is the maximum along each axis.
if rRows >= lRows {
outRows = rRows
} else {
outRows = lRows
}
if rCols >= lCols {
outCols = rCols
} else {
outCols = lCols
}
return
}
// validateBroadcastDimensions checks if dimensions are compatible for broadcasting.
// Returns true if, for each dimension, it matches the output size or is 1.
func validateBroadcastDimensions(rRows, rCols, lRows, lCols, outRows, outCols int) bool {
rRowsValid := rRows == outRows || rRows == 1
rColsValid := rCols == outCols || rCols == 1
lRowsValid := lRows == outRows || lRows == 1
lColsValid := lCols == outCols || lCols == 1
return rRowsValid && rColsValid && lRowsValid && lColsValid

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants