
fix: avoid inactive backend probing in tests #592

Open
voltjia wants to merge 1 commit into master from fix/issue-75-cuda-platform-probing

Conversation

@voltjia (Collaborator) commented May 7, 2026

Summary

  • Updates tests/conftest.py so skip checks use concrete platform selectors from --devices when they are provided.
  • Falls back to the active torch device selector, such as cuda, instead of probing every CUDA-like platform backend from Python.
  • Keeps the fix Python-only; this PR intentionally does not change .ci configuration.
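A minimal sketch of the revised selection logic described above (the helper name `resolve_device_selectors` is hypothetical; the actual code in `tests/conftest.py` may be shaped differently):

```python
def resolve_device_selectors(devices_option, torch_device_type):
    """Pick concrete platform selectors from `--devices` when provided;
    otherwise fall back to the active torch device selector itself."""
    if devices_option:
        # CI passed concrete platforms, e.g. ["nvidia"]; use them as-is.
        return list(devices_option)

    # Pass the torch device name (e.g. "cuda") straight through so the
    # pybind layer resolves the backend compiled into the current wheel.
    return [torch_device_type]
```

The key point is that no branch fans a torch device type out into multiple sibling platform names.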

Motivation

Closes #75

The regression is caused by `skip_op_without_platform_impl` mapping one torch device type to every platform sharing that type. For example, `cuda` was expanded to NVIDIA, MetaX, and Iluvatar, so a test running on one active CUDA-like backend could still call `active_implementation_indices()` for inactive sibling backends and abort in backend dispatch before pytest could skip the case.
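Schematically, the difference between the old and new behavior looks like this (the platform-name mapping and helper names are illustrative, not the exact source):

```python
# Illustrative fan-out table; the real platform names come from the build.
CUDA_LIKE_PLATFORMS = {"cuda": ["nvidia", "metax", "iluvatar"]}

def buggy_platforms_for(device_type):
    # Bug: one torch device type fans out to every sibling platform, so an
    # inactive backend gets probed and aborts before pytest can skip.
    return CUDA_LIKE_PLATFORMS.get(device_type, [device_type])

def fixed_platforms_for(device_type):
    # Fix: only the selector actually in play is consulted; no fan-out.
    return [device_type]
```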

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | Yes | 4151 passed, 1375 skipped in 280.52s (0:04:40) | Direct container run with `python -m pip install .[dev] && python -m pytest`; no #75 crash keywords. |
| Iluvatar | Yes | 5795 passed, 1447 skipped in 274.80s (0:04:34) | Direct container run on GPU 6 with `python -m pip install .[dev] --no-build-isolation && python -m pytest`; no #75 crash keywords. |
| MetaX | Yes | 5795 passed, 1447 skipped in 341.72s (0:05:41) | Direct container run with `python -m pip install .[dev] --no-build-isolation && python -m pytest`; no #75 crash keywords. |
| Cambricon | Yes | 12 failed, 3061 passed, 3857 skipped in 897.49s (0:14:57) | Failures are `tests/test_add.py` int16 generation failures from `RuntimeError: "random_" not implemented for 'Short'`; no #75 crash keywords. |
| Moore | Yes | 300 failed, 5459 passed, 1483 skipped in 516.71s (0:08:36) | Direct container run on GPU 6 with `MUSA_VISIBLE_DEVICES=6`, `python -m pip install .[dev] --no-build-isolation && python -m pytest`; failures are concentrated in `tests/test_gemm.py`; no #75 crash keywords. |
| Ascend | Yes | 3828 passed, 138 skipped in 435.82s (0:07:15) | Direct container run with `python -m pip install .[dev] && python -m pytest`; pytest summary completed, wrapper status was `test=137` after summary; no #75 crash keywords. |
Validation details
Local checks:
- git diff --check origin/master..HEAD
- python3 -m py_compile tests/conftest.py
- uvx ruff check tests/conftest.py
- uvx ruff format --check tests/conftest.py

Targeted regression checks:
- NVIDIA, no --devices: tests/test_cast.py::test_cast[cuda-input_dtype0-out_dtype0-0.001-0.001-shape0-None-None] -> 1 skipped in 3.68s
- NVIDIA, --devices nvidia: same test -> 1 skipped in 2.96s
- NVIDIA, CPU Matmul no --devices: tests/test_matmul.py::test_matmul[cpu-dtype0-0.01-0.01-False-False-a_shape0-b_shape0-c_shape0] -> 1 skipped in 3.87s

Additional Moore investigation:
- `MTHREADS_VISIBLE_DEVICES=6` alone does not restrict `torch_musa`; `torch.musa.device_count()` still reports 8 devices.
- `MUSA_VISIBLE_DEVICES=6` restricts `torch_musa` to one visible device and avoids the earlier apparent hang.
- Moore `tests/test_add.py` on GPU 6 with `MUSA_VISIBLE_DEVICES=6`: `324 passed, 108 skipped in 20.96s`.
- Moore full `pytest -n 1` on GPU 6 with `MUSA_VISIBLE_DEVICES=6`: `300 failed, 5459 passed, 1483 skipped in 550.14s (0:09:10)`.
- Moore full `pytest` without `-n` on GPU 6 with `MUSA_VISIBLE_DEVICES=6`: `300 failed, 5459 passed, 1483 skipped in 516.71s (0:08:36)`.
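The working Moore invocation can be sketched as follows (GPU index as reported above; the full install-and-test command is left as a comment since it requires the Moore container):

```shell
# torch_musa honored only MUSA_VISIBLE_DEVICES for device visibility in
# this environment; MTHREADS_VISIBLE_DEVICES alone did not restrict it.
export MUSA_VISIBLE_DEVICES=6

# The subsequent run then sees a single visible device:
#   python -m pip install .[dev] --no-build-isolation && python -m pytest
echo "MUSA_VISIBLE_DEVICES=$MUSA_VISIBLE_DEVICES"
```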

Benchmark / Performance Impact

N/A. This change only affects pytest skip-selection logic and does not alter operator implementations or runtime kernels.

Notes for Reviewers

The important behavior is that Python no longer expands cuda into all CUDA-like platform names before asking each operator for active implementations. If CI passes a concrete platform through --devices, that concrete platform is still used. If CI or a local run uses the torch device name, the selector is passed through to the pybind layer so it can resolve the backend compiled into the current wheel.

Iluvatar and Moore both now complete the full pytest run with a final summary. The earlier Moore hang was traced to the runner setting `MTHREADS_VISIBLE_DEVICES` without `MUSA_VISIBLE_DEVICES`; in this environment, torch_musa honored only `MUSA_VISIBLE_DEVICES` for device visibility.


Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Small PR is a single squashable commit; or, for a large PR, every commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • N/A — no public API changes.

General Code Hygiene (applies to all languages)

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences — capitalized first letter, terminal punctuation — unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

N/A — no C++ files changed.

Python Specific (if Python files changed)

  • Code is PEP 8 compliant; ruff check passes cleanly on CI (see .github/workflows/ruff.yml).
  • ruff format --check passes cleanly — if not, run ruff format and commit the result.
  • Comments are complete English sentences, starting with a capital letter and ending with punctuation; Markdown backticks are used for code references (CONTRIBUTING.md §Python).
  • Framework-specific conventions (e.g. lowercase pytest.skip messages without terminal period) are honored where applicable (CONTRIBUTING.md §Python).
  • No blank line between the function signature and the body when there is no docstring or comment (CONTRIBUTING.md §Python).
  • A blank line is present before and after if, for, and similar control-flow statements (CONTRIBUTING.md §Python).
  • A blank line appears before each return, except when it directly follows a control-flow statement (CONTRIBUTING.md §Python).
  • Docstrings (if any) follow PEP 257 (CONTRIBUTING.md §Python).
  • Type hints are added / kept consistent with the surrounding code.

Testing

  • pytest was run locally on every supported platform that this PR can affect, and the results are recorded in the "Test Results" table above (CONTRIBUTING.md §Pull Requests).
  • All supported platforms were tested and recorded in the table.
  • N/A — no new operator functionality was added.
  • Tests use pytest.mark.parametrize correctly: dependent parameters share one decorator (e.g. @pytest.mark.parametrize("dtype, rtol, atol", …)), independent parameters use separate decorators ordered by parameter declaration.
  • N/A — no new Payload-returning tests were added.
  • Default dtype / device parameterization is relied on, or overridden with an explicit pytest.mark.parametrize when necessary.
  • N/A — no new flaky test was added.
  • N/A — existing parametrized tests reproduce the bug on master; this PR fixes the test harness path without adding a new test file.
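For reference, the parametrize convention mentioned above looks like this (dtype, tolerances, and shapes are illustrative only):

```python
import pytest

# Dependent parameters share one decorator; independent parameters get
# separate decorators, ordered by parameter declaration.
@pytest.mark.parametrize("shape", [(2, 3), (4, 4)])
@pytest.mark.parametrize("dtype, rtol, atol", [("float32", 1e-3, 1e-3)])
def test_example(dtype, rtol, atol, shape):
    assert rtol > 0 and atol > 0 and len(shape) == 2
```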

Build, CI, and Tooling

  • The project builds cleanly from a fresh directory with pip install .[dev] on at least one affected platform.
  • N/A — no CMake changes were made, so compile_commands.json regeneration was not required for this PR.
  • N/A — no new backend or device was added.
  • N/A — CUDA-like backend mutual exclusion was not changed.
  • N/A — no C++ files changed; ruff checks passed locally.
  • No new runtime dependency was added without updating pyproject.toml's [project.optional-dependencies] (or justified in the PR description).

Documentation

  • N/A — no user-facing behavior, build flags, or developer workflow were changed.
  • N/A — no new operator, dispatch helper, or public utility was added.
  • N/A — no breaking change.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A — no third-party code was added.
  • N/A — no pointer arithmetic, memory access, or C++ bounds handling was changed.

Co-authored-by: Jiacheng Huang <huangjiacheng0709@outlook.com>
@voltjia voltjia marked this pull request as ready for review May 7, 2026 16:35
@voltjia voltjia requested a review from a team May 7, 2026 16:35
@voltjia (Collaborator, Author) commented May 7, 2026

@zhangyue207 for initial review, @Ziminli for final review.

@voltjia voltjia requested review from Ziminli and zhangyue207 May 7, 2026 16:36


Development

Successfully merging this pull request may close these issues.

Incorrect pytest error classification and parallel execution issues

2 participants