Pre-submission Checklist
GPU Hardware
Tested on Intel Arc A380 (dGPU) and Intel UHD 730 (iGPU) — both hang.
Does NOT reproduce on Intel Arc A770 or UHD 770 (different kernel version — see below).
OS / Kernel
- Hangs: Ubuntu 24.04.3, kernel 6.17.0-20-generic
- Works: Ubuntu 24.04.2, kernel 6.17.0-14-generic
OpenCL Runtime Version
Driver Version: 26.09.37435.1
intel-opencl-icd 26.09.37435.1-0
intel-igc-core-2 2.30.1
intel-igc-opencl-2 2.30.1
Identical on both machines.
Summary
clEnqueueSVMMemcpy called with a destination pointer that is not a valid SVM/USM allocation should return CL_INVALID_VALUE or CL_OUT_OF_RESOURCES. On kernel 6.17.0-20 it instead enters a kernel-mode CPU loop in i915_gem_object_userptr_submit_init that makes the process unkillable — SIGKILL from root has no effect. Recovery requires rebooting the host.
On kernel 6.17.0-14 (same OpenCL driver, same hardware family), the call correctly returns -5 (CL_OUT_OF_RESOURCES) immediately.
Three conditions required to trigger
- Invalid SVM destination pointer (e.g. 0xdeadbeef, not a valid clSVMAlloc or USM allocation)
- Large transfer size (tested at 4 GB)
- Source buffer must have committed physical pages (malloc + memset). Uncommitted pages (mmap MAP_NORESERVE) take a different code path that correctly returns -5.
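For clarity, the two source-buffer variants from the third condition can be sketched as below. This is only an illustration: the helper names `alloc_committed` and `alloc_uncommitted` are hypothetical and do not appear in the reproducer, and the exact mmap flags used in our earlier testing beyond MAP_NORESERVE are an assumption.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for MAP_ANONYMOUS / MAP_NORESERVE on glibc */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: heap buffer whose pages are all touched, so they are
   backed by physical memory (the variant that hangs on the affected kernel). */
void *alloc_committed(size_t bytes) {
    void *p = malloc(bytes);
    if (p) memset(p, 0xab, bytes);  /* writing every page commits it */
    return p;
}

/* Hypothetical helper: anonymous mapping that is never touched, so no page
   is committed (the variant that returned -5 in our tests).
   Release with munmap(p, bytes), not free(). */
void *alloc_uncommitted(size_t bytes) {
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

Swapping `alloc_committed` for `alloc_uncommitted` as the source of the copy is the only difference between the hanging and the passing runs.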
Minimal reproducer
#define CL_TARGET_OPENCL_VERSION 300
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char** argv) {
    int plat_idx = (argc >= 2) ? atoi(argv[1]) : 0;
    size_t mb = (argc >= 3) ? (size_t)atoll(argv[2]) : 4096;
    size_t bytes = mb * 1024ull * 1024ull;
    cl_int err;

    /* Pick the requested platform; bail out early if the index is invalid */
    cl_uint nplats = 0;
    clGetPlatformIDs(0, NULL, &nplats);
    cl_platform_id plats[8];
    if (nplats > 8) nplats = 8;
    if (plat_idx < 0 || (cl_uint)plat_idx >= nplats) {
        fprintf(stderr, "platform index %d out of range (%u found)\n", plat_idx, nplats);
        return 2;
    }
    clGetPlatformIDs(nplats, plats, NULL);

    cl_device_id dev;
    err = clGetDeviceIDs(plats[plat_idx], CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    if (err != CL_SUCCESS) {
        fprintf(stderr, "no GPU device on platform %d (%d)\n", plat_idx, (int)err);
        return 2;
    }
    char dname[256] = {0};
    clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof dname, dname, NULL);
    fprintf(stderr, "Device: %s, size: %zu MB\n", dname, mb);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueueWithProperties(ctx, dev, NULL, &err);

    /* Committed source buffer: must be malloc+memset, not mmap */
    char* h = (char*)malloc(bytes);
    if (!h) { fprintf(stderr, "malloc failed\n"); return 3; }
    memset(h, 0xab, bytes);

    void* d = (void*)0xdeadbeefULL; /* NOT a valid SVM allocation */
    fprintf(stderr, "clEnqueueSVMMemcpy dst=%p size=%zu\n", d, bytes);
    err = clEnqueueSVMMemcpy(q, CL_FALSE, d, h, bytes, 0, NULL, NULL);
    fprintf(stderr, "enqueue returned %d\n", (int)err);
    /* Expected: err != 0 (rejected).
       Observed on 6.17.0-20: process is already wedged above,
       never reaches this line. */
    if (err == CL_SUCCESS) {
        fprintf(stderr, "clFinish...\n");
        err = clFinish(q);
        fprintf(stderr, "clFinish returned %d\n", (int)err);
    }
    free(h);
    /* Exit 0 means the bad pointer was rejected, 1 means it was accepted */
    return (err != CL_SUCCESS) ? 0 : 1;
}
Build and run:
gcc -O2 cl_badptr.c -lOpenCL -o cl_badptr
# WARNING: on affected kernels this will wedge one CPU core until reboot
timeout --kill-after=10s 30s ./cl_badptr 0 4096
Results
| Machine | Kernel | GPU | Result |
| --- | --- | --- | --- |
| cupcake | 6.17.0-14-generic | Arc A770 | enqueue returned -5 (OK) |
| cupcake | 6.17.0-14-generic | UHD 770 | enqueue returned -5 (OK) |
| meatloaf | 6.17.0-20-generic | Arc A380 | HANG at clEnqueueSVMMemcpy (exit 137) |
| meatloaf | 6.17.0-20-generic | UHD 730 | HANG at clEnqueueSVMMemcpy (exit 137) |
With mmap(MAP_NORESERVE) instead of malloc+memset for the source, all four combinations pass (return -5). The committed-pages requirement points to the i915 userptr/scatter-gather DMA setup path.
Kernel call trace (dmesg hung_task watchdog, from earlier investigation)
intel_iommu_map_pages+0xe7/0x140
iommu_map_nosync+0x133/0x2b0
iommu_map_sg+0xc8/0x1b0
iommu_dma_map_sg+0x59a/0x630
? i915_gem_shrink+0x6af/0x7a0 [i915]
__dma_map_sg_attrs+0x13b/0x1b0
dma_map_sg_attrs+0xe/0x30
i915_gem_gtt_prepare_pages+0x55/0x90 [i915]
i915_gem_userptr_get_pages+0xf3/0x200 [i915]
____i915_gem_object_get_pages+0x23/0x70 [i915]
i915_gem_object_userptr_submit_init+0x38c/0x420 [i915]
eb_lookup_vmas+0x141/0x290 [i915]
Process state while wedged
State: R (running)
SigBlk: 0000000000000000
WCHAN: -
%CPU: 99.9
The task is on-CPU inside a kernel function and never reaches a point where pending signals are checked.
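For anyone who wants to confirm the same state on their machine, a minimal sketch of reading the State and SigBlk fields from /proc/<pid>/status follows. The helper name `print_task_state` is hypothetical; WCHAN and %CPU come from other sources and are not covered here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the "State:" and "SigBlk:" lines of
   /proc/<pid>/status to `out`. pid <= 0 means the calling process.
   Returns the number of lines printed, or -1 if the file cannot be opened. */
int print_task_state(long pid, FILE *out) {
    char path[64], line[256];
    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%ld/status", pid);
    else
        snprintf(path, sizeof path, "/proc/self/status");

    FILE *f = fopen(path, "r");
    if (!f) return -1;

    int printed = 0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "State:", 6) == 0 ||
            strncmp(line, "SigBlk:", 7) == 0) {
            fputs(line, out);
            printed++;
        }
    }
    fclose(f);
    return printed;
}
```

Calling `print_task_state(pid, stderr)` with the PID of the wedged reproducer only reads procfs and sends no signal to the task.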
Impact
- Wedged process does NOT poison the GPU for other workloads.
- Permanently consumes one CPU core and several GB of RSS.
- Only host reboot clears it.
Notes
- We discovered this via chipStar (a HIP-on-SPIR-V/OpenCL implementation). chipStar uses clEnqueueSVMMemcpy for hipMemcpy in its Intel USM allocation strategy. When a HIP application ignores a failed hipMalloc and proceeds to hipMemcpy with the invalid pointer, chipStar forwards it to clEnqueueSVMMemcpy, hitting this bug.
- The OpenCL runtime could also add defense-in-depth by validating SVM pointers before delegating to the kernel, but the root cause appears to be an i915 kernel regression between 6.17.0-14 and 6.17.0-20.
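As a rough illustration of the defense-in-depth idea, a layer that performs the allocations itself could keep a table of live ranges and reject destinations that fall outside all of them before calling clEnqueueSVMMemcpy. The sketch below is not taken from the OpenCL runtime or chipStar; every name in it (`svm_range`, `svm_track`, `svm_range_is_tracked`) is hypothetical, and it assumes the caller knows that every legitimate destination came from its own allocator, which is stricter than what the OpenCL API itself requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookkeeping: one entry per live SVM/USM allocation. */
typedef struct { uintptr_t base; size_t size; } svm_range;

#define MAX_RANGES 64
static svm_range g_ranges[MAX_RANGES];
static size_t g_nranges;

/* Record an allocation, e.g. right after a successful clSVMAlloc.
   Returns 0 on success, -1 if the table is full or the input is empty. */
int svm_track(const void *p, size_t size) {
    if (!p || size == 0 || g_nranges == MAX_RANGES) return -1;
    g_ranges[g_nranges].base = (uintptr_t)p;
    g_ranges[g_nranges].size = size;
    g_nranges++;
    return 0;
}

/* Return 1 if [p, p + len) lies entirely inside one tracked allocation.
   The comparisons are arranged so that they cannot overflow. */
int svm_range_is_tracked(const void *p, size_t len) {
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < g_nranges; i++) {
        uintptr_t base = g_ranges[i].base;
        size_t size = g_ranges[i].size;
        if (a >= base && a - base <= size && len <= size - (a - base))
            return 1;
    }
    return 0;
}
```

With this in place, a caller could return CL_INVALID_VALUE when `svm_range_is_tracked(dst, bytes)` is 0 instead of forwarding the pointer to the driver. Removal of freed ranges and thread safety are left out to keep the sketch short.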