Refactor tensor class in C++ unit tests by timmoon10 · Pull Request #2962 · NVIDIA/TransformerEngine

timmoon10 · 2026-05-06T00:19:22Z

Description

The tensor wrapper in the C++ unit tests has become unwieldy, with complicated interactions between recipes and memory management. This has recently resulted in bugs where we accidently didn't allocate a required buffer (#2943). This PR disentangles the memory management from the recipe logic by adding a simple RAII class to manage GPU and CPU buffers. I've also added more explicit checks, e.g. when we assume a tensor is a single FP32.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Add class to manage GPU buffer, CPU buffer, and memory transfers between them.
Remove memory management logic from tensor class in C++ tests.
Add checks to accessors that make implicit assumptions on buffer size and dtype.

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Refactor test tensor wrapper by removing recipe-specific logic whenever possible. Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

- Fix syntax error in switch case (:: -> :) - Fix double-underscore typo in variable name - Fix wrong buffer passed to set_amax_columnwise - Fix unique_ptr assignment from raw pointer (use reset()) - Remove dead duplicate NVTE_MXFP8_1D_SCALING branch in get_scales() - Rename cpu_data -> cpu_buffer to match Buffer class API - Remove const from Tensor::to_cpu/from_cpu and their callers, since both methods write to the CPU buffer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>

Signed-off-by: Tim Moon <tmoon@nvidia.com>

CPU and GPU types are inconsistent, so the type checks cause too many problems. Signed-off-by: Tim Moon <tmoon@nvidia.com>

greptile-apps · 2026-05-06T00:26:48Z

Greptile Summary

This PR replaces ad-hoc cudaMalloc/cudaFree calls and scattered CPU mirror pointers in the C++ test Tensor class with a compact inner Buffer RAII class that owns matched CPU and GPU allocations and exposes typed accessors. Recipe-specific allocation logic is reorganised into a switch in the constructor, removing a previously unreachable duplicate MXFP8 block and the exception-based scaling-mode probe.

Tensor::Buffer: new inner class allocating both CPU and GPU buffers at construction, with to_cpu()/from_cpu() helpers and a GPUDeleter functor.
Tensor member migration: old raw pointer fields replaced by optional<Buffer> and shared_ptr<Buffer>; ~Tensor simplified to = default.
Operator test fixes: all 13 affected test files now cache ref_scale before launching the GPU kernel to avoid post-kernel reads of the scale buffer.

Confidence Score: 5/5

Safe to merge — pure test-infrastructure refactor with no production code affected and no correctness regressions found.

All allocations and transfers are covered by the new RAII Buffer class; NVTE_CHECK guards make failures loud. The duplicate MXFP8 block removal is safe because get_scales still handles NVTE_MXFP8_1D_SCALING in its first branch. The only findings are minor redundant cudaMemcpy calls and a removed dtype guard, neither of which causes incorrect test results.

No files require special attention beyond the minor observations on the shared scale_inv double-sync and missing type guards on rowwise_cpu_scale_inv_ptr.

Important Files Changed

Filename	Overview
tests/cpp/test_common.h	Adds inner `Buffer` RAII class; migrates Tensor fields to `optional<Buffer>` / `shared_ptr<Buffer>`; accessors strengthened with `NVTE_CHECK` guards.
tests/cpp/test_common.cu	Implements `Buffer` and `Tensor` memory methods; removes duplicate MXFP8 block from `get_scales`; replaces exception-based scaling-mode probe with explicit `switch`.
tests/cpp/operator/test_cast.cu	Captures `ref_scale` before the kernel runs and guards `.scale()` call with `isFp8Type`.
tests/cpp_distributed/test_comm_gemm.cu	Guards `set_scale`/`set_scale_inv` calls behind `isFp8Type` check.
tests/cpp/operator/test_cast_current_scaling.cu	Initialises `ref_scale = 1.0` to fix prior UB where the variable was passed uninitialised in the non-FP8 path.
tests/cpp/operator/test_normalization.cu	Replaces post-kernel `z.scale()` call with pre-captured `ref_scale`.

Class Diagram

%%{init: {'theme': 'neutral'}}%%
classDiagram
    class Tensor {
        -TensorWrapper tensor_
        -optional~Buffer~ data_rowwise_
        -optional~Buffer~ data_columnwise_
        -shared_ptr~Buffer~ scale_inv_rowwise_
        -shared_ptr~Buffer~ scale_inv_columnwise_
        -optional~Buffer~ amax_rowwise_
        -optional~Buffer~ amax_columnwise_
        -optional~Buffer~ scale_
        +to_cpu()
        +from_cpu()
        +fill_uniform_rowwise_scale_inv()
        +fill_uniform_columnwise_scale_inv()
    }
    class Buffer {
        -unique_ptr cpu_buffer_
        -unique_ptr gpu_buffer_
        -size_t bytes_
        +to_cpu()
        +from_cpu()
        +cpu_buffer~T~() T*
        +gpu_buffer~T~() T*
    }
    class GPUDeleter {
        +operator()(void*)
    }
    Tensor *-- Buffer : owns
    Buffer *-- GPUDeleter : uses for cudaFree

_{Reviews (6): Last reviewed commit: "Typo" | Re-trigger Greptile}

@greptile-apps

Also adopt review suggestions from @greptile-apps. Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2026-05-06T00:48:35Z

+  // Fill scales
+  if (t->scaling_mode() == NVTE_DELAYED_TENSOR_SCALING) {
+    if (isFp8Type(t->dtype())) {
+      // FP8 tensor scale is set to 1
+      t->set_scale_inv(1.0);
+    }
+  } else {
+    // Block scales are filled randomly
+    t->fill_uniform_rowwise_scale_inv();
+    t->fill_uniform_columnwise_scale_inv();
+  }


This is weird, but it approximates the previous behavior.

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

timmoon10 · 2026-05-06T00:58:51Z

/te-ci core L1

Oleg-Goncharov

LGTM, this looks much cleaner now, but the cast+transpose current scaling tests are failing with a segmentation fault.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 · 2026-05-07T04:09:55Z

/te-ci core L1

timmoon10 · 2026-05-07T20:19:37Z


    return {ret_rowwise, ret_colwise};
  }
-  if (scaling_mode == NVTE_MXFP8_1D_SCALING) {


We handle MXFP8 earlier in this function, so this if-statement was redundant.

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

timmoon10 and others added 7 commits April 30, 2026 01:51

Fix bug in NVFP4 quantize test where we set scale instead of amax

45aa776

Refactor test tensor wrapper by removing recipe-specific logic whenever possible. Signed-off-by: Tim Moon <tmoon@nvidia.com>

Only get fp32 scale when tensor is expected to have fp32 scale

1acb3c3

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Create dedicated class for managing GPU/CPU buffers

97945bc

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Merge branch 'main' into tmoon/refactor-cpp-test-tensor

6ff82c0

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Debug compilation errors

6d3e1db

Signed-off-by: Tim Moon <tmoon@nvidia.com>

Remove type check when accessing raw pointers

e324ae2

CPU and GPU types are inconsistent, so the type checks cause too many problems. Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 requested a review from Oleg-Goncharov May 6, 2026 00:19

timmoon10 added the refactor label May 6, 2026

greptile-apps Bot reviewed May 6, 2026

View reviewed changes

Comment thread tests/cpp/test_common.h

Comment thread tests/cpp/test_common.cu Outdated

Debug distributed C++ tests

346e92a

Also adopt review suggestions from @greptile-apps. Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 commented May 6, 2026

View reviewed changes

Comment thread tests/cpp/test_common.cu Outdated

Remove unused header

181cfab

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

timmoon10 commented May 6, 2026

View reviewed changes

Comment thread tests/cpp/test_common.h Outdated

Copy-paste error

b193fa7

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

Oleg-Goncharov reviewed May 6, 2026

View reviewed changes

Use shared buffer for FP8 row-wise scale-inv and col-wise scale-inv

67b730f

Signed-off-by: Tim Moon <tmoon@nvidia.com>

greptile-apps Bot reviewed May 7, 2026

View reviewed changes

Comment thread tests/cpp/test_common.cu

Merge branch 'main' into tmoon/refactor-cpp-test-tensor

2b4082c

greptile-apps Bot reviewed May 7, 2026

View reviewed changes

Comment thread tests/cpp/test_common.cu

timmoon10 commented May 7, 2026

View reviewed changes

Comment thread tests/cpp/test_common.cu Outdated

Typo

605fe29

Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor tensor class in C++ unit tests#2962

Refactor tensor class in C++ unit tests#2962
timmoon10 wants to merge 13 commits intoNVIDIA:mainfrom
timmoon10:tmoon/refactor-cpp-test-tensor

timmoon10 commented May 6, 2026

Uh oh!

greptile-apps Bot commented May 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

timmoon10 May 6, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

timmoon10 commented May 6, 2026

Uh oh!

Oleg-Goncharov left a comment

Uh oh!

Uh oh!

timmoon10 commented May 7, 2026

Uh oh!

Uh oh!

timmoon10 May 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timmoon10 commented May 6, 2026

Description

Type of change

Changes

Checklist:

Uh oh!

greptile-apps Bot commented May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Class Diagram

Uh oh!

Uh oh!

Uh oh!

timmoon10 May 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

timmoon10 commented May 6, 2026

Uh oh!

Oleg-Goncharov left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

timmoon10 commented May 7, 2026

Uh oh!

Uh oh!

timmoon10 May 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps Bot commented May 6, 2026 •

edited

Loading

timmoon10 May 6, 2026 •

edited

Loading

timmoon10 May 7, 2026 •

edited

Loading