Refactor tensor class in C++ unit tests#2962
Conversation
Refactor test tensor wrapper by removing recipe-specific logic whenever possible. Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
- Fix syntax error in switch case (:: -> :) - Fix double-underscore typo in variable name - Fix wrong buffer passed to set_amax_columnwise - Fix unique_ptr assignment from raw pointer (use reset()) - Remove dead duplicate NVTE_MXFP8_1D_SCALING branch in get_scales() - Rename cpu_data -> cpu_buffer to match Buffer class API - Remove const from Tensor::to_cpu/from_cpu and their callers, since both methods write to the CPU buffer Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
CPU and GPU types are inconsistent, so the type checks cause too many problems. Signed-off-by: Tim Moon <tmoon@nvidia.com>
Greptile SummaryThis PR replaces ad-hoc
Confidence Score: 5/5Safe to merge — pure test-infrastructure refactor with no production code affected and no correctness regressions found. All allocations and transfers are covered by the new RAII Buffer class; NVTE_CHECK guards make failures loud. The duplicate MXFP8 block removal is safe because get_scales still handles NVTE_MXFP8_1D_SCALING in its first branch. The only findings are minor redundant cudaMemcpy calls and a removed dtype guard, neither of which causes incorrect test results. No files require special attention beyond the minor observations on the shared scale_inv double-sync and missing type guards on rowwise_cpu_scale_inv_ptr. Important Files Changed
Class Diagram%%{init: {'theme': 'neutral'}}%%
classDiagram
class Tensor {
-TensorWrapper tensor_
-optional~Buffer~ data_rowwise_
-optional~Buffer~ data_columnwise_
-shared_ptr~Buffer~ scale_inv_rowwise_
-shared_ptr~Buffer~ scale_inv_columnwise_
-optional~Buffer~ amax_rowwise_
-optional~Buffer~ amax_columnwise_
-optional~Buffer~ scale_
+to_cpu()
+from_cpu()
+fill_uniform_rowwise_scale_inv()
+fill_uniform_columnwise_scale_inv()
}
class Buffer {
-unique_ptr cpu_buffer_
-unique_ptr gpu_buffer_
-size_t bytes_
+to_cpu()
+from_cpu()
+cpu_buffer~T~() T*
+gpu_buffer~T~() T*
}
class GPUDeleter {
+operator()(void*)
}
Tensor *-- Buffer : owns
Buffer *-- GPUDeleter : uses for cudaFree
Reviews (6): Last reviewed commit: "Typo" | Re-trigger Greptile |
Also adopt review suggestions from @greptile-apps. Signed-off-by: Tim Moon <tmoon@nvidia.com>
| // Fill scales | ||
| if (t->scaling_mode() == NVTE_DELAYED_TENSOR_SCALING) { | ||
| if (isFp8Type(t->dtype())) { | ||
| // FP8 tensor scale is set to 1 | ||
| t->set_scale_inv(1.0); | ||
| } | ||
| } else { | ||
| // Block scales are filled randomly | ||
| t->fill_uniform_rowwise_scale_inv(); | ||
| t->fill_uniform_columnwise_scale_inv(); | ||
| } |
There was a problem hiding this comment.
This is weird, but it approximates the previous behavior.
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
|
/te-ci core L1 |
Oleg-Goncharov
left a comment
There was a problem hiding this comment.
LGTM, this looks much cleaner now, but the cast+transpose current scaling tests are failing with a segmentation fault.
Signed-off-by: Tim Moon <tmoon@nvidia.com>
|
/te-ci core L1 |
|
|
||
| return {ret_rowwise, ret_colwise}; | ||
| } | ||
| if (scaling_mode == NVTE_MXFP8_1D_SCALING) { |
There was a problem hiding this comment.
We handle MXFP8 earlier in this function, so this if-statement was redundant.
Description
The tensor wrapper in the C++ unit tests has become unwieldy, with complicated interactions between recipes and memory management. This has recently resulted in bugs where we accidently didn't allocate a required buffer (#2943). This PR disentangles the memory management from the recipe logic by adding a simple RAII class to manage GPU and CPU buffers. I've also added more explicit checks, e.g. when we assume a tensor is a single FP32.
Type of change
Changes
Checklist: