To install cuML from source, ensure the following dependencies are met:
Note: The easiest way to set up a fully functional build environment is to use the conda environment files located in conda/environments/all_*.yaml. These files contain all the dependencies listed below except for clang-format (only needed for development/contributing) and UCX (only needed for optional multi-node operations). To create a development environment, see the recommended conda setup at the end of this section.
Hardware Needed to Run cuML: To run cuML code, you will need an NVIDIA GPU with the following minimum compute capability depending on your CUDA version:
- CUDA 12.x: compute capability 7.0 or higher (Volta™ architecture or newer)
- CUDA 13.x: compute capability 7.5 or higher (Turing™ architecture or newer)
Note that while a GPU is not required to build or develop cuML itself, it is necessary to execute and test GPU-accelerated functionality.
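The compute capability requirement above can be checked on a machine before building. The following is a minimal sketch, not part of cuML: the function names and the `CUDA_MAJOR` variable are hypothetical, and it assumes a driver recent enough for `nvidia-smi` to support the `compute_cap` query field.

```shell
# Minimum compute capability per CUDA major version, taken from the list above.
min_cc_for_cuda() {
    case "$1" in
        12) echo "7.0" ;;
        13) echo "7.5" ;;
        *)  echo "unknown" ;;
    esac
}

# Succeeds when compute capability $1 (e.g. "8.6") is at least $2 (e.g. "7.5").
cc_at_least() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        split(have, h, "."); split(need, n, ".")
        ok = (h[1] + 0 > n[1] + 0) || (h[1] + 0 == n[1] + 0 && h[2] + 0 >= n[2] + 0)
        exit !ok
    }'
}

# CUDA_MAJOR is a hypothetical variable for this sketch; set it to your toolkit's major version.
CUDA_MAJOR="${CUDA_MAJOR:-12}"
need="$(min_cc_for_cuda "$CUDA_MAJOR")"

if [ "$need" = "unknown" ]; then
    echo "no minimum compute capability listed for CUDA $CUDA_MAJOR.x"
elif command -v nvidia-smi >/dev/null 2>&1; then
    # Prints one line per GPU, e.g. "8.6"; older drivers may not support this field.
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null |
    while read -r cc; do
        if cc_at_least "$cc" "$need"; then
            echo "compute capability $cc: OK for CUDA $CUDA_MAJOR.x"
        else
            echo "compute capability $cc: below the required $need"
        fi
    done
else
    echo "nvidia-smi not found; cannot query a GPU on this machine"
fi
```

On a machine without a GPU the script only reports that `nvidia-smi` is missing, which is consistent with the note that a GPU is not needed to build cuML.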
Software Dependencies:
- CUDA Toolkit (>= 12.2) - must include development libraries (cudart, cublas, cusparse, cusolver, curand, cufft)
- gcc (>= 13.0)
- cmake (>= 3.30.4)
- ninja - build system used by default
- Python (>= 3.11 and <= 3.14)
- Cython (>= 3.2.2)
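A quick way to compare installed tools against the minimum versions listed above is sketched below. The helper names `version_ge` and `report` are hypothetical, the comparison relies on `sort -V` from GNU coreutils, and only the lower bounds are checked (the upper bound on Python is not).

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Report on one tool: $1 = name, $2 = minimum version, $3 = detected version
# (an empty string when the tool is missing or its version could not be read).
report() {
    name="$1"; minimum="$2"; found="$3"
    if [ -z "$found" ]; then
        echo "$name: not detected (need >= $minimum)"
    elif version_ge "$found" "$minimum"; then
        echo "$name: $found OK (need >= $minimum)"
    else
        echo "$name: $found is too old (need >= $minimum)"
    fi
}

# Each command prints a bare version number; errors are silenced so that a
# missing tool yields an empty string instead of aborting the script.
report gcc    13.0   "$(gcc -dumpfullversion 2>/dev/null)"
report cmake  3.30.4 "$(cmake --version 2>/dev/null | head -n 1 | awk '{print $3}')"
report python 3.11   "$(python3 -c 'import platform; print(platform.python_version())' 2>/dev/null)"
```

The same `report` call can be extended to other entries in the list, provided the tool has a command that prints a plain version number.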
RAPIDS Ecosystem Libraries:
These RAPIDS libraries must match the cuML version (e.g., all version 25.10 if building cuML 25.10):
C++ Libraries:
- librmm - RAPIDS Memory Manager (C++ library)
- libraft - RAPIDS CUDA accelerated algorithms (C++ library)
- libcuvs - CUDA Vector Search library
Python Packages:
- rmm - RAPIDS Memory Manager (Python package)
- pylibraft - RAPIDS CUDA accelerated algorithms (Python package)
- cuDF - GPU DataFrame library (Python package)
Python Build Dependencies:
- scikit-build-core
- rapids-build-backend
Python Runtime Dependencies:
For detailed version requirements of runtime dependencies (numpy, scikit-learn, scipy, joblib, numba, cupy, treelite, etc.), please see docs/source/supported_versions.rst.
Other External Libraries:
- treelite
- rapids-logger
Multi-GPU Support:
cuML has limited support for multi-GPU and multi-node operations. The following dependencies enable these features:
- NCCL (>= 2.19) - required for multi-GPU communication; provided automatically as a transitive dependency through libcuvs (listed above)
- UCX (>= 1.7) - optional; only required for multi-node operations (not needed for multi-GPU on a single node); must be explicitly enabled during build with WITH_UCX=ON (see Using Infiniband for MNMG)
For development only:
- clang-format (= 20.1.8) - enforces uniform C++ coding style; required for pre-commit hooks and CI checks. If you are using conda, the packages clang=20 and clang-tools=20 from the conda-forge channel should be sufficient. If not using conda, install the right version using your OS package manager.
It is recommended to use conda for environment/package management. If doing so, development environment .yaml files are located in conda/environments/all_*.yaml. These files contain most of the dependencies mentioned above. To create a development environment named cuml_dev, you can use the following commands (adjust the YAML filename to match your CUDA version and architecture):
conda create -n cuml_dev python=3.14
conda env update -n cuml_dev --file=conda/environments/all_cuda-131_arch-$(uname -m).yaml
conda activate cuml_dev
As a convenience, a build.sh script is provided to simplify the build process. The libraries will be installed to $INSTALL_PREFIX if set (e.g., export INSTALL_PREFIX=/install/path), otherwise to $CONDA_PREFIX.
$ ./build.sh # build the cuML libraries, tests, and python package, then
# install them to $INSTALL_PREFIX if set, otherwise $CONDA_PREFIX
For workflows that involve frequent switching among branches or between debug and release builds, it is recommended that you install ccache and make use of it by passing the --ccache flag to build.sh.
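The install-location rule for build.sh described above ($INSTALL_PREFIX if set, otherwise $CONDA_PREFIX) can be previewed before starting a long build. This is a sketch of that rule only, not the code build.sh actually runs; `resolve_install_prefix` is a hypothetical helper, and treating an empty variable the same as an unset one is an assumption.

```shell
# Mirror the documented rule: prefer INSTALL_PREFIX, fall back to CONDA_PREFIX.
# Assumption: an empty variable is treated the same as an unset one.
resolve_install_prefix() {
    if [ -n "${INSTALL_PREFIX:-}" ]; then
        echo "$INSTALL_PREFIX"
    elif [ -n "${CONDA_PREFIX:-}" ]; then
        echo "$CONDA_PREFIX"
    else
        echo "neither INSTALL_PREFIX nor CONDA_PREFIX is set" >&2
        return 1
    fi
}

if prefix="$(resolve_install_prefix)"; then
    echo "libraries are expected to be installed under: $prefix"
fi
```

Running this inside an activated conda environment without INSTALL_PREFIX exported should print the environment's own prefix.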
To build individual components, specify them as arguments to build.sh:
$ ./build.sh libcuml # build and install the cuML C++ libraries
$ ./build.sh cuml # build and install the cuML Python package
$ ./build.sh prims # build the ml-prims tests
$ ./build.sh bench # build the cuML C++ benchmark
$ ./build.sh prims-bench # build the ml-prims C++ benchmark
Other build.sh options:
$ ./build.sh clean # remove any prior build artifacts and configuration (start over)
$ ./build.sh libcuml -v # build and install libcuml with verbose output
$ ./build.sh libcuml -g # build and install libcuml for debug
$ PARALLEL_LEVEL=8 ./build.sh libcuml # build and install libcuml limiting parallel build jobs to 8 (ninja -j8)
$ ./build.sh libcuml -n # build libcuml but do not install
$ ./build.sh prims --allgpuarch # build the ML prims tests for all supported GPU architectures
$ ./build.sh cuml --singlegpu # build the cuML Python package without MNMG algorithms
$ ./build.sh --ccache # use ccache to cache compilations, speeding up subsequent builds
By default, Ninja is used as the cmake generator. To override this and use, e.g., make, define the CMAKE_GENERATOR environment variable accordingly:
CMAKE_GENERATOR='Unix Makefiles' ./build.sh
To run the C++ unit tests (optional), from the repo root:
$ cd cpp/build
$ ./test/ml # single-GPU algorithm tests
$ ./test/ml_mg # multi-GPU algorithm tests, if --singlegpu was not used
$ ./test/prims # ML Primitive function tests
If you want a list of the available C++ tests:
$ ./test/ml --gtest_list_tests # single-GPU algorithm tests
$ ./test/ml_mg --gtest_list_tests # multi-GPU algorithm tests
$ ./test/prims --gtest_list_tests # ML Primitive function tests
To run all Python tests, including multi-GPU algorithms, from the repo root:
$ cd python
$ pytest -v
To run only single-GPU algorithm tests:
$ pytest --ignore=cuml/tests/dask --ignore=cuml/tests/test_nccl.py
If you want a list of the available Python tests:
$ pytest cuml/tests --collect-only
Note: Some tests require xgboost. If running tests in conda devcontainers, you must install the xgboost conda package manually. See dependencies.yaml for version information.
Once dependencies are present, follow the steps below:
- Clone the repository:
$ git clone https://github.com/rapidsai/cuml.git
- Build and install libcuml (C++/CUDA library containing the cuML algorithms), starting from the repository root folder:
$ cd cpp
$ mkdir build && cd build
$ cmake ..
Note: If CUDA is not in your PATH, you may need to set CUDA_BIN_PATH before running cmake:
$ export CUDA_BIN_PATH=$CUDA_HOME # Default: /usr/local/cuda
If using a conda environment (recommended), configure cmake for libcuml:
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
Note: You may see the following warning depending on your cmake version and CMAKE_INSTALL_PREFIX. This warning can be safely ignored:
Cannot generate a safe runtime search path for target ml_test because files
in some directories may conflict with libraries in implicit directories:
To silence it, add -DCMAKE_IGNORE_PATH=$CONDA_PREFIX/lib to your cmake command.
To reduce compile times, you can specify GPU compute capabilities to compile for. For example, for Volta GPUs:
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CUDA_ARCHITECTURES="70"
Or for multiple architectures (e.g., Ampere and Hopper):
$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CUDA_ARCHITECTURES="80;86;90"
You may also wish to make use of ccache to reduce build times when switching among branches or between debug and release builds:
$ cmake .. -DUSE_CCACHE=ON
There are many options to configure the build process; see the customizing build section.
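The architecture values passed to CMAKE_CUDA_ARCHITECTURES above are compute capabilities with the dot removed (7.0 becomes 70, 8.6 becomes 86). The sketch below builds the semicolon-separated list from a set of compute capabilities; `cc_to_arch` and `arch_list` are hypothetical helper names, and the script only prints the resulting command rather than running cmake.

```shell
# Convert one compute capability such as "8.6" into the CMake value "86".
cc_to_arch() {
    printf '%s\n' "$1" | tr -d '.'
}

# Join several compute capabilities into a semicolon-separated architecture list.
arch_list() {
    out=""
    for cc in "$@"; do
        arch="$(cc_to_arch "$cc")"
        # Prepend the separator only when the list already has an entry.
        out="${out:+$out;}$arch"
    done
    printf '%s\n' "$out"
}

archs="$(arch_list 8.0 8.6 9.0)"
# Print the cmake command for review; \$CONDA_PREFIX is left unexpanded on purpose.
echo "cmake .. -DCMAKE_INSTALL_PREFIX=\$CONDA_PREFIX -DCMAKE_CUDA_ARCHITECTURES=\"$archs\""
```

Quoting the list matters when the command is eventually run, because an unquoted semicolon would be interpreted by the shell as a command separator.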
- Build and install libcuml:
$ make -j
$ make install
To run tests (optional):
$ ./test/ml # single-GPU algorithm tests
$ ./test/ml_mg # multi-GPU algorithm tests
$ ./test/prims # ML Primitive function tests
If you want a list of the available tests:
$ ./test/ml --gtest_list_tests # single-GPU algorithm tests
$ ./test/ml_mg --gtest_list_tests # multi-GPU algorithm tests
$ ./test/prims --gtest_list_tests # ML Primitive function tests
To run cuML C++ benchmarks (optional):
$ ./bench/sg_benchmark # single-GPU benchmarks
Use the --help option for more information.
To run ml-prims C++ benchmarks (optional):
$ ./bench/prims_benchmark # ml-prims benchmarks
Use the --help option for more information.
To build doxygen docs for all C/C++ source files:
$ make doc
- Build and install the cuml Python package:
From the repository root:
$ python -m pip install --no-build-isolation --no-deps --config-settings rapidsai.disable-cuda=true python/cuml
To run Python tests (optional):
$ cd python
$ pytest -v
To run only single-GPU algorithm tests:
$ pytest --ignore=cuml/tests/dask --ignore=cuml/tests/test_nccl.py
If you want a list of the available tests:
$ pytest cuml/tests --collect-only
cuML's cmake has the following configurable flags available:
| Flag | Possible Values | Default Value | Behavior |
|---|---|---|---|
| BUILD_CUML_CPP_LIBRARY | [ON, OFF] | ON | Enable/disable building the libcuml shared library. Setting this variable to OFF sets the variables BUILD_CUML_TESTS, BUILD_CUML_MG_TESTS and BUILD_CUML_EXAMPLES to OFF |
| BUILD_CUML_STD_COMMS | [ON, OFF] | ON | Enable/disable building the cuML NCCL+UCX communicator for running multi-node multi-GPU algorithms. UCX support can be enabled or disabled separately (see WITH_UCX below). BUILD_CUML_STD_COMMS and BUILD_CUML_MPI_COMMS are not mutually exclusive and can both be installed simultaneously. |
| WITH_UCX | [ON, OFF] | OFF | Enable/disable UCX support for the standard cuML communicator. Algorithms requiring point-to-point messaging will not work when this is disabled. This has no effect on the MPI communicator. |
| BUILD_CUML_MPI_COMMS | [ON, OFF] | OFF | Enable/disable building cuML MPI+NCCL communicator for running multi-node multi-GPU C++ tests. Note that BUILD_CUML_STD_COMMS and BUILD_CUML_MPI_COMMS are not mutually exclusive, and can both be installed simultaneously. |
| BUILD_CUML_TESTS | [ON, OFF] | ON | Enable/disable building cuML algorithm test executable ml_test. |
| BUILD_CUML_MG_TESTS | [ON, OFF] | ON | Enable/disable building cuML algorithm test executable ml_mg_test. |
| BUILD_PRIMS_TESTS | [ON, OFF] | ON | Enable/disable building the ML primitives test executable prims_test. |
| BUILD_CUML_EXAMPLES | [ON, OFF] | ON | Enable/disable building cuML C++ API usage examples. |
| BUILD_CUML_BENCH | [ON, OFF] | ON | Enable/disable building of cuML C++ benchmark. |
| CMAKE_CXX11_ABI | [ON, OFF] | ON | Enable/disable the GLIBCXX11 ABI |
| DETECT_CONDA_ENV | [ON, OFF] | ON | Use detection of conda environment for dependencies. If set to ON, and no value for CMAKE_INSTALL_PREFIX is passed, then it will assign it to $CONDA_PREFIX (to install in the active environment). |
| DISABLE_OPENMP | [ON, OFF] | OFF | Set to ON to disable OpenMP |
| CMAKE_CUDA_ARCHITECTURES | List of GPU architectures, semicolon-separated | Empty | List the GPU architectures to compile the GPU targets for. Set to "NATIVE" to auto detect GPU architecture of the system, set to "ALL" to compile for all RAPIDS supported archs. |
| KERNEL_INFO | [ON, OFF] | OFF | Enable/disable kernel resource usage info in nvcc. |
| LINE_INFO | [ON, OFF] | OFF | Enable/disable lineinfo in nvcc. |
| NVTX | [ON, OFF] | OFF | Enable/disable nvtx markers in libcuml. |
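As an illustration of how the flags in the table combine, the sketch below assembles a configuration that keeps the libcuml library but skips tests, examples, and benchmarks, and compiles only for the GPU in the build machine. The function name `lean_cmake_flags` is hypothetical; every flag name and value is taken from the table, and the script prints the command instead of running cmake.

```shell
# Flags for a lean build: keep libcuml, skip tests, examples, and benchmarks,
# and target only the locally detected GPU architecture.
lean_cmake_flags() {
    printf '%s\n' \
        -DBUILD_CUML_TESTS=OFF \
        -DBUILD_CUML_MG_TESTS=OFF \
        -DBUILD_PRIMS_TESTS=OFF \
        -DBUILD_CUML_EXAMPLES=OFF \
        -DBUILD_CUML_BENCH=OFF \
        -DCMAKE_CUDA_ARCHITECTURES=NATIVE
}

# Print the full configure command on one line for review.
echo "cmake .. $(lean_cmake_flags | tr '\n' ' ')"
```

A configuration like this trades test coverage for shorter compile times, so it is better suited to iterating on the library than to validating a change before submission.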