Summary
The oneCCL group API expects all operations issued inside a group to complete before group_end() returns. As a result, the following pattern, in which a send and a recv are issued in two separate groups, is not supported:
    group_start();
    send(send_buf_ptr, sendcount, ...);
    group_end();
    group_start();
    recv(recv_buf_ptr, sendcount, ...);
    group_end();
However, the above pattern is supported by NCCL.
Reproducer
See internal MLSL-3958.
Affected projects
XCCL backend for TorchComms.