feat: add Blackwell GPU (sm_120) CUDA support #401

Open
mvanhorn wants to merge 1 commit into jamiepine:main from mvanhorn:feat/386-blackwell-cuda-support

Conversation

@mvanhorn

@mvanhorn mvanhorn commented Apr 13, 2026

Adds TORCH_CUDA_ARCH_LIST to the CUDA build step in release.yml so that it includes 12.0+PTX for Blackwell GPU forward compatibility. Five reports (#386, #395, #396, #399, #400) from RTX 50-series users describe a 'no kernel image' error, which occurs because the pre-built PyTorch cu128 wheels don't include sm_120. Fixes #386. This contribution was developed with AI assistance (Claude Code).

Summary by CodeRabbit

  • Chores
    • Updated build configuration to support additional NVIDIA GPU architectures, expanding compatibility with newer GPU models.

Set TORCH_CUDA_ARCH_LIST in the CUDA build step to include 12.0+PTX
for forward compatibility with Blackwell GPUs (RTX 5070 Ti, 5080, etc).

Pre-built PyTorch cu128 wheels only ship native kernels for sm_80/86/89/90.
Without this, Blackwell GPU users get "no kernel image is available for
execution on the device" at runtime.

Fixes jamiepine#386
Related: jamiepine#395, jamiepine#396, jamiepine#399, jamiepine#400
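
The failure mode described in the commit message can be sketched as a small coverage check. This is a minimal illustration, not code from the PR: `parse_arch` and `kernel_support` are hypothetical helper names, and the rules assumed here are the usual CUDA ones (a native `sm_XY` binary runs on devices with the same major version and an equal or higher minor version, while `compute_XY` PTX can be JIT-compiled for any device of equal or higher capability).

```python
import re


def parse_arch(tag):
    """Split a tag such as 'sm_120' or 'compute_90' into (kind, major, minor, suffix)."""
    match = re.fullmatch(r"(sm|compute)_(\d+)(\d)([a-z]?)", tag)
    if match is None:
        raise ValueError(f"unrecognised architecture tag: {tag!r}")
    kind, major, minor, suffix = match.groups()
    return kind, int(major), int(minor), suffix


def kernel_support(arch_list, capability):
    """Classify how a device of the given (major, minor) capability is covered.

    Returns 'native' if a compiled binary matches, 'ptx-jit' if only embedded
    PTX can be JIT-compiled for the device, and 'none' otherwise (the
    "no kernel image is available" case).
    """
    major, minor = capability
    has_ptx = False
    for tag in arch_list:
        kind, a_major, a_minor, suffix = parse_arch(tag)
        if suffix:
            # Architecture-specific targets (e.g. sm_90a) only match exactly.
            if kind == "sm" and (a_major, a_minor) == (major, minor):
                return "native"
            continue
        if kind == "sm" and a_major == major and a_minor <= minor:
            return "native"
        if kind == "compute" and (a_major, a_minor) <= (major, minor):
            has_ptx = True
    return "ptx-jit" if has_ptx else "none"


# Architectures named in the commit message, checked against an sm_120 device.
print(kernel_support(["sm_80", "sm_86", "sm_89", "sm_90"], (12, 0)))
# Same list with the targets that 12.0+PTX is expected to add.
print(kernel_support(["sm_80", "sm_86", "sm_89", "sm_90", "sm_120", "compute_120"], (12, 0)))
```

On a machine with PyTorch installed, the inputs could presumably come from `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`, which makes it possible to confirm what a given build actually ships before and after this change.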
@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 34205f6d-9ff6-4b78-94ce-4b62e53a6ac2

📥 Commits

Reviewing files that changed from the base of the PR and between 75abbb0 and f9835d2.

📒 Files selected for processing (1)
  • .github/workflows/release.yml

📝 Walkthrough


The GitHub Actions release workflow for Windows now explicitly specifies CUDA GPU architectures (8.0;8.6;8.9;9.0;12.0+PTX) via the TORCH_CUDA_ARCH_LIST environment variable during the CUDA build step, enabling Blackwell GPU support through PTX forward compatibility.
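
To make the meaning of the value concrete, here is a rough sketch of how an architecture list such as `8.0;8.6;8.9;9.0;12.0+PTX` maps to compilation targets. It is an illustration of the convention only, not PyTorch's own parsing code; `expand_arch_list` is a hypothetical helper, and it assumes each entry yields one native `sm_` target, with `+PTX` additionally requesting embedded `compute_` PTX for that architecture.

```python
def expand_arch_list(value):
    """Expand a TORCH_CUDA_ARCH_LIST-style string into sm_/compute_ target names."""
    targets = []
    # Entries may be separated by semicolons or spaces.
    for entry in value.replace(" ", ";").split(";"):
        if not entry:
            continue
        wants_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if wants_ptx else entry
        major, minor = version.split(".")
        number = f"{int(major)}{int(minor)}"
        targets.append(f"sm_{number}")  # native binary for this architecture
        if wants_ptx:
            targets.append(f"compute_{number}")  # PTX kept for JIT on newer GPUs
    return targets


print(expand_arch_list("8.0;8.6;8.9;9.0;12.0+PTX"))
# ['sm_80', 'sm_86', 'sm_89', 'sm_90', 'sm_120', 'compute_120']
```

Only the last entry carries `+PTX`, so under these assumptions the build would include native code for all five architectures and PTX for 12.0 alone, which is the part that provides forward compatibility.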

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CUDA Architecture Configuration: .github/workflows/release.yml | Added TORCH_CUDA_ARCH_LIST environment variable to Windows CUDA build step, explicitly targeting compute capabilities 8.0, 8.6, 8.9, 9.0, and 12.0+PTX (Blackwell). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

  • #362: The change directly addresses the missing sm_120 kernel image by explicitly targeting Blackwell GPU architecture (12.0+PTX) during Windows CUDA compilation.

Poem

🐰 A rabbit hops through cuda streams,
Adding Blackwell to our dreams,
Twelve-point-oh with PTX in sight,
Windows builds now compile just right! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding Blackwell GPU (sm_120) CUDA support via the TORCH_CUDA_ARCH_LIST environment variable in the release workflow. |
| Linked Issues check | ✅ Passed | The PR successfully addresses the linked issue #386 by adding Blackwell GPU (sm_120) support through the TORCH_CUDA_ARCH_LIST environment variable (12.0+PTX) in the CUDA build configuration. |
| Out of Scope Changes check | ✅ Passed | The PR contains only the necessary change to the release.yml workflow for adding Blackwell CUDA support; no unrelated or out-of-scope modifications are present. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


Development

Successfully merging this pull request may close these issues.

Add compute capability 10.0 support for Blackwell GPUs (RTX 5070 Ti) in CUDA builds

1 participant