
cuML Build From Source Guide

Setting Up Your Build Environment

To install cuML from source, ensure the following dependencies are met:

Note: The easiest way to set up a fully functional build environment is to use the conda environment files located in conda/environments/all_*.yaml. These files contain all the dependencies listed below except for clang-format (only needed for development/contributing) and UCX (only needed for optional multi-node operations). To create a development environment, see the recommended conda setup at the end of this section.

Hardware Needed to Run cuML: To run cuML code, you will need an NVIDIA GPU with the following minimum compute capability depending on your CUDA version:

  • CUDA 12.x: compute capability 7.0 or higher (Volta™ architecture or newer)
  • CUDA 13.x: compute capability 7.5 or higher (Turing™ architecture or newer)

Note that while a GPU is not required to build or develop cuML itself, it is necessary to execute and test GPU-accelerated functionality.
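If you are unsure of your GPU's compute capability, recent NVIDIA drivers can report it directly (the compute_cap query field may be unavailable on older drivers):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv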

Software Dependencies:

  1. CUDA Toolkit (>= 12.2) - must include development libraries (cudart, cublas, cusparse, cusolver, curand, cufft)
  2. gcc (>= 13.0)
  3. cmake (>= 3.30.4)
  4. ninja - build system used by default
  5. Python (>= 3.11 and <= 3.14)
  6. Cython (>= 3.2.2)
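The minimum versions above can be checked from the shell. A minimal sketch using GNU sort's version-sort option (the helper name `version_ge` is illustrative, not part of cuML):

```shell
# version_ge A B: succeed (exit 0) when version A >= version B.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: compare installed tool versions against the required minimums.
version_ge 13.2.0 13.0   && echo "gcc 13.2.0 meets the >= 13.0 requirement"
version_ge 3.29.0 3.30.4 || echo "cmake 3.29.0 is below the 3.30.4 minimum"
```

In practice you would feed the helper the parsed output of, e.g., `cmake --version`; the snippet compares literal strings so the logic can be verified on any machine.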

RAPIDS Ecosystem Libraries:

These RAPIDS libraries must match the cuML version (e.g., all version 25.10 if building cuML 25.10):

C++ Libraries:

  • librmm - RAPIDS Memory Manager (C++ library)
  • libraft - RAPIDS CUDA accelerated algorithms (C++ library)
  • libcuvs - CUDA Vector Search library

Python Packages:

  • rmm - RAPIDS Memory Manager (Python package)
  • pylibraft - RAPIDS CUDA accelerated algorithms (Python package)
  • cuDF - GPU DataFrame library (Python package)

Python Build Dependencies:

  • scikit-build-core
  • rapids-build-backend

Python Runtime Dependencies:

For detailed version requirements of runtime dependencies (numpy, scikit-learn, scipy, joblib, numba, cupy, treelite, etc.), please see docs/source/supported_versions.rst.

Other External Libraries:

  • treelite
  • rapids-logger

Multi-GPU Support:

cuML has limited support for multi-GPU and multi-node operations. The following dependencies enable these features:

  • NCCL (>= 2.19) - required for multi-GPU communication; provided automatically as a transitive dependency through libcuvs (listed above)
  • UCX (>= 1.7) - optional; only required for multi-node operations (not needed for multi-GPU on a single node); must be explicitly enabled during build with WITH_UCX=ON (see Using Infiniband for MNMG)

For development only:

  • clang-format (= 20.1.8) - enforces uniform C++ coding style; required for pre-commit hooks and CI checks. If you are using conda, the clang=20 and clang-tools=20 packages from the conda-forge channel should be sufficient; otherwise, install the appropriate version with your OS package manager.

Recommended Conda Setup

It is recommended to use conda for environment/package management. If doing so, development environment .yaml files are located in conda/environments/all_*.yaml. These files contain most of the dependencies mentioned above. To create a development environment named cuml_dev, you can use the following commands (adjust the YAML filename to match your CUDA version and architecture):

conda create -n cuml_dev python=3.14
conda env update -n cuml_dev --file=conda/environments/all_cuda-131_arch-$(uname -m).yaml
conda activate cuml_dev

Installing from Source

Recommended Process

As a convenience, a build.sh script is provided to simplify the build process. The libraries will be installed to $INSTALL_PREFIX if set (e.g., export INSTALL_PREFIX=/install/path), otherwise to $CONDA_PREFIX.

$ ./build.sh                           # build the cuML libraries, tests, and python package, then
                                       # install them to $INSTALL_PREFIX if set, otherwise $CONDA_PREFIX

For workflows that involve frequent switching among branches or between debug and release builds, it is recommended to install ccache and pass the --ccache flag to build.sh.

To build individual components, specify them as arguments to build.sh:

$ ./build.sh libcuml                   # build and install the cuML C++ libraries
$ ./build.sh cuml                      # build and install the cuML Python package
$ ./build.sh prims                     # build the ml-prims tests
$ ./build.sh bench                     # build the cuML C++ benchmark
$ ./build.sh prims-bench               # build the ml-prims C++ benchmark

Other build.sh options:

$ ./build.sh clean                     # remove any prior build artifacts and configuration (start over)
$ ./build.sh libcuml -v                # build and install libcuml with verbose output
$ ./build.sh libcuml -g                # build and install libcuml for debug
$ PARALLEL_LEVEL=8 ./build.sh libcuml  # build and install libcuml limiting parallel build jobs to 8 (ninja -j8)
$ ./build.sh libcuml -n                # build libcuml but do not install
$ ./build.sh prims --allgpuarch        # build the ML prims tests for all supported GPU architectures
$ ./build.sh cuml --singlegpu          # build the cuML Python package without MNMG algorithms
$ ./build.sh --ccache                  # use ccache to cache compilations, speeding up subsequent builds

By default, Ninja is used as the cmake generator. To override this and use, e.g., make, define the CMAKE_GENERATOR environment variable accordingly:

CMAKE_GENERATOR='Unix Makefiles' ./build.sh

To run the C++ unit tests (optional), from the repo root:

$ cd cpp/build
$ ./test/ml # single-GPU algorithm tests
$ ./test/ml_mg # multi-GPU algorithm tests, if --singlegpu was not used
$ ./test/prims # ML Primitive function tests

If you want a list of the available C++ tests:

$ ./test/ml --gtest_list_tests # single-GPU algorithm tests
$ ./test/ml_mg --gtest_list_tests # multi-GPU algorithm tests
$ ./test/prims --gtest_list_tests # ML Primitive function tests

To run all Python tests, including multi-GPU algorithms, from the repo root:

$ cd python
$ pytest -v

To run only single-GPU algorithm tests:

$ pytest --ignore=cuml/tests/dask --ignore=cuml/tests/test_nccl.py

If you want a list of the available Python tests:

$ pytest cuml/tests --collect-only

Note: Some tests require xgboost. If running tests in conda devcontainers, you must install the xgboost conda package manually. See dependencies.yaml for version information.

Manual Process

Once dependencies are present, follow the steps below:

  1. Clone the repository:
$ git clone https://github.com/rapidsai/cuml.git
  2. Configure the libcuml build (libcuml is the C++/CUDA library containing the cuML algorithms), starting from the repository root folder:
$ cd cpp
$ mkdir build && cd build
$ cmake ..

Note: If CUDA is not in your PATH, you may need to set CUDA_BIN_PATH before running cmake:

$ export CUDA_BIN_PATH=$CUDA_HOME  # Default: /usr/local/cuda

If using a conda environment (recommended), configure cmake for libcuml:

$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX

Note: You may see the following warning depending on your cmake version and CMAKE_INSTALL_PREFIX. This warning can be safely ignored:

Cannot generate a safe runtime search path for target ml_test because files
in some directories may conflict with libraries in implicit directories:

To silence it, add -DCMAKE_IGNORE_PATH=$CONDA_PREFIX/lib to your cmake command.

To reduce compile times, you can specify GPU compute capabilities to compile for. For example, for Volta GPUs:

$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CUDA_ARCHITECTURES="70"

Or for multiple architectures (e.g., Ampere and Hopper):

$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_CUDA_ARCHITECTURES="80;86;90"

You may also wish to make use of ccache to reduce build times when switching among branches or between debug and release builds:

$ cmake .. -DUSE_CCACHE=ON

There are many options to configure the build process, see the customizing build section.

  3. Build and install libcuml:
$ make -j
$ make install

To run tests (optional):

$ ./test/ml # single-GPU algorithm tests
$ ./test/ml_mg # multi-GPU algorithm tests
$ ./test/prims # ML Primitive function tests

If you want a list of the available tests:

$ ./test/ml --gtest_list_tests # single-GPU algorithm tests
$ ./test/ml_mg --gtest_list_tests # multi-GPU algorithm tests
$ ./test/prims --gtest_list_tests # ML Primitive function tests

To run cuML C++ benchmarks (optional):

$ ./bench/sg_benchmark  # single-GPU benchmarks

Use the --help option for more information.

To run ml-prims C++ benchmarks (optional):

$ ./bench/prims_benchmark  # ml-prims benchmarks

Use the --help option for more information.

To build doxygen docs for all C/C++ source files:

$ make doc
  4. Build and install the cuml python package:

From the repository root:

$ python -m pip install --no-build-isolation --no-deps --config-settings rapidsai.disable-cuda=true python/cuml

To run Python tests (optional):

$ cd python
$ pytest -v

To run only single-GPU algorithm tests:

$ pytest --ignore=cuml/tests/dask --ignore=cuml/tests/test_nccl.py

If you want a list of the available tests:

$ pytest cuml/tests --collect-only

Custom Build Options

libcuml (C++ library)

cuML's cmake has the following configurable flags available:

| Flag | Possible Values | Default Value | Behavior |
|------|-----------------|---------------|----------|
| BUILD_CUML_CPP_LIBRARY | [ON, OFF] | ON | Enable/disable building the libcuml shared library. Setting this variable to OFF also sets BUILD_CUML_TESTS, BUILD_CUML_MG_TESTS, and BUILD_CUML_EXAMPLES to OFF. |
| BUILD_CUML_STD_COMMS | [ON, OFF] | ON | Enable/disable building the cuML NCCL+UCX communicator for running multi-node multi-GPU algorithms. UCX support can also be enabled/disabled separately (see WITH_UCX below). BUILD_CUML_STD_COMMS and BUILD_CUML_MPI_COMMS are not mutually exclusive and can both be enabled. |
| WITH_UCX | [ON, OFF] | OFF | Enable/disable UCX support for the standard cuML communicator. Algorithms requiring point-to-point messaging will not work when this is disabled. Has no effect on the MPI communicator. |
| BUILD_CUML_MPI_COMMS | [ON, OFF] | OFF | Enable/disable building the cuML MPI+NCCL communicator for running multi-node multi-GPU C++ tests. |
| BUILD_CUML_TESTS | [ON, OFF] | ON | Enable/disable building the cuML algorithm test executable ml_test. |
| BUILD_CUML_MG_TESTS | [ON, OFF] | ON | Enable/disable building the multi-GPU cuML algorithm test executable ml_mg_test. |
| BUILD_PRIMS_TESTS | [ON, OFF] | ON | Enable/disable building the ml-prims test executable prims_test. |
| BUILD_CUML_EXAMPLES | [ON, OFF] | ON | Enable/disable building cuML C++ API usage examples. |
| BUILD_CUML_BENCH | [ON, OFF] | ON | Enable/disable building the cuML C++ benchmark. |
| CMAKE_CXX11_ABI | [ON, OFF] | ON | Enable/disable the GLIBCXX11 ABI. |
| DETECT_CONDA_ENV | [ON, OFF] | ON | Use detection of the conda environment for dependencies. If ON and no value for CMAKE_INSTALL_PREFIX is passed, it is set to $CONDA_PREFIX (installing into the active environment). |
| DISABLE_OPENMP | [ON, OFF] | OFF | Set to ON to disable OpenMP. |
| CMAKE_CUDA_ARCHITECTURES | semicolon-separated list of GPU architectures | Empty | GPU architectures to compile the GPU targets for. Set to "NATIVE" to auto-detect the GPU architecture of the system, or to "ALL" to compile for all RAPIDS-supported archs. |
| KERNEL_INFO | [ON, OFF] | OFF | Enable/disable kernel resource usage info in nvcc. |
| LINE_INFO | [ON, OFF] | OFF | Enable/disable lineinfo in nvcc. |
| NVTX | [ON, OFF] | OFF | Enable/disable NVTX markers in libcuml. |
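These flags can be combined on a single configure line. As an illustration (a hypothetical combination, adjust to your needs), to skip the multi-GPU tests and examples while compiling only for the GPU in the current system:

$ cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
           -DBUILD_CUML_MG_TESTS=OFF \
           -DBUILD_CUML_EXAMPLES=OFF \
           -DCMAKE_CUDA_ARCHITECTURES="NATIVE"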