
CUDA

Compute Unified Device Architecture (CUDA) is a parallel computing platform and application programming interface (API) model created by NVIDIA. It's commonly used to accelerate computationally intensive problems and has been widely adopted for High Performance Computing (HPC) and Machine Learning (ML) applications.

Packages provided by NVIDIA which require CUDA are typically stored in CUDA package sets.

Nixpkgs provides a number of CUDA package sets, each based on a different CUDA release. Top-level attributes providing access to CUDA package sets follow these naming conventions:

  • cudaPackages_x_y: A major-minor-versioned package set for a specific CUDA release, where x and y are the major and minor versions of the CUDA release.
  • cudaPackages_x: A major-versioned alias to the latest widely supported major-minor-versioned package set within the x release series.
  • cudaPackages: An alias to the major-versioned alias for the latest widely supported CUDA release (i.e., the most recent suitable cudaPackages_x). The package set referenced by this alias is also referred to as the "default" CUDA package set.

Here are two examples to illustrate the naming conventions:

  • If cudaPackages_12_8 is the latest release in the 12.x series, but core libraries like OpenCV or ONNX Runtime fail to build with it, cudaPackages_12 may alias cudaPackages_12_6 instead of cudaPackages_12_8.
  • If cudaPackages_13_1 is the latest release, but core libraries like PyTorch or Torch Vision fail to build with it, cudaPackages may alias cudaPackages_12 instead of cudaPackages_13.

All CUDA package sets include common CUDA packages like libcublas, cudnn, tensorrt, and nccl.

Configuring Nixpkgs for CUDA

CUDA support is not enabled by default in Nixpkgs. To enable CUDA support, make sure Nixpkgs is imported with a configuration similar to the following:

{
  allowUnfreePredicate =
    let
      ensureList = x: if builtins.isList x then x else [ x ];
    in
    package:
    builtins.all (
      license:
      license.free
      || builtins.elem license.shortName [
        "CUDA EULA"
        "cuDNN EULA"
        "cuSPARSELt EULA"
        "cuTENSOR EULA"
        "NVidia OptiX EULA"
      ]
    ) (ensureList package.meta.license);
  cudaCapabilities = [ <target-architectures> ];
  cudaForwardCompat = true;
  cudaSupport = true;
}

The majority of CUDA packages are unfree, so either allowUnfreePredicate or allowUnfree should be set.
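
For instance, a minimal sketch of importing Nixpkgs with CUDA enabled might look like the following; allowUnfree is used here for brevity in place of the predicate above, and the Ada Lovelace capability is only an example target:

import <nixpkgs> {
  config = {
    allowUnfree = true;
    cudaSupport = true;
    cudaForwardCompat = true;
    cudaCapabilities = [ "8.9" ];
  };
}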

The cudaSupport configuration option is used by packages to conditionally enable CUDA-specific functionality. This configuration option is commonly used by packages which can be built with or without CUDA support.

The cudaCapabilities configuration option specifies a list of CUDA capabilities. Packages may use this option to control device code generation to take advantage of architecture-specific functionality, speed up compile times by producing less device code, or slim package closures. As an example, one can build for Ada Lovelace GPUs with cudaCapabilities = [ "8.9" ];. If cudaCapabilities is not provided, the default value is calculated per-package set, derived from a list of GPUs supported by that version of CUDA. Please consult supported GPUs for specific cards. Library maintainers should consult NVCC Docs and its release notes.

::: {.caution} Certain CUDA capabilities are not targeted by default, including capabilities belonging to the Jetson family of devices (like 8.7, which corresponds to the Jetson Orin) or non-baseline feature sets (like 9.0a, which corresponds to the Hopper-exclusive feature set). If you need to target these capabilities, you must explicitly set cudaCapabilities to include them. :::

The cudaForwardCompat boolean configuration option determines whether PTX support for future hardware is enabled.

Configuring CUDA package sets

CUDA package sets are created by callPackage-ing pkgs/top-level/cuda-packages.nix with an explicit argument for cudaMajorMinorVersion, a string of the form "<major>.<minor>" (e.g., "12.2"), which informs the CUDA package set tooling which version of CUDA to use. The majority of the CUDA package set tooling is available through the top-level attribute set _cuda, a fixed-point which exists apart from any instance of the CUDA package set.
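
For illustration, a versioned package set could be wired up roughly as follows; this is a sketch, and the exact expression in pkgs/top-level/all-packages.nix may differ:

# In pkgs/top-level/all-packages.nix (illustrative).
cudaPackages_12_2 = callPackage ./cuda-packages.nix {
  cudaMajorMinorVersion = "12.2";
};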

::: {.caution} The cudaMajorMinorVersion and _cuda attributes are not part of the CUDA package set fixed-point, but are instead provided by callPackage from the top-level in the construction of the package set. As such, they must be modified via the package set's override attribute. :::

::: {.caution} As indicated by the underscore prefix, _cuda is an implementation detail and no guarantees are provided with respect to its stability or API. The _cuda attribute set is exposed only to ease creation or modification of CUDA package sets by expert, out-of-tree users. :::

::: {.note} The _cuda attribute set fixed-point should be modified through its extend attribute. :::

The _cuda.fixups attribute set is a mapping from package name to a callPackage-able expression which will be provided to overrideAttrs on the result of our generic builder.

::: {.caution} Fixups are chosen from _cuda.fixups by pname. As a result, packages with multiple versions (e.g., cudnn, cudnn_8_9, etc.) all share a single fixup function (i.e., _cuda.fixups.cudnn, which is pkgs/development/cuda-modules/fixups/cudnn.nix). :::

As an example, you can change the fixup function used for cuDNN for only the default CUDA package set with this overlay:

final: prev: {
  cudaPackages = prev.cudaPackages.override (prevAttrs: {
    _cuda = prevAttrs._cuda.extend (
      _: prevAttrs': {
        fixups = prevAttrs'.fixups // {
          cudnn = <your-fixup-function>;
        };
      }
    );
  });
}

Extending CUDA package sets

CUDA package sets are scopes, so they provide the usual overrideScope attribute for overriding package attributes (see the note about cudaMajorMinorVersion and _cuda in Configuring CUDA package sets).
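
For example, a package attribute can be overridden for a single package set with overrideScope; the pinned cudnn_8_9 attribute below is only illustrative:

final: prev: {
  cudaPackages = prev.cudaPackages.overrideScope (
    cudaFinal: cudaPrev: {
      # Pin cuDNN to an older versioned attribute (illustrative).
      cudnn = cudaPrev.cudnn_8_9;
    }
  );
}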

Inspired by pythonPackagesExtensions, the _cuda.extensions attribute is a list of extensions applied to every version of the CUDA package set, allowing all of them to be modified without having to know what they are or enumerate and override each one explicitly. As an example, disabling cuda_compat across all CUDA package sets can be accomplished with this overlay:

final: prev: {
  _cuda = prev._cuda.extend (
    _: prevAttrs: {
      extensions = prevAttrs.extensions ++ [ (_: _: { cuda_compat = null; }) ];
    }
  );
}

Using cudaPackages

::: {.caution} A non-trivial amount of CUDA package discoverability and usability relies on the various setup hooks used by a CUDA package set. As a result, users will likely encounter issues trying to perform builds within a devShell without manually invoking phases. :::

Nixpkgs makes CUDA package sets available under a number of attributes. While versioned package sets are available (e.g., cudaPackages_12_2), it is recommended to use the unversioned cudaPackages attribute, which is an alias to the latest version, as versioned attributes are periodically removed.

To use one or more CUDA packages in an expression, give the expression a cudaPackages parameter, and in case CUDA support is optional, add config and cudaSupport parameters:

{
  config,
  cudaSupport ? config.cudaSupport,
  cudaPackages,
}:
<package-expression>

In your package's derivation arguments, it is strongly recommended that the following are set:

{
  __structuredAttrs = true;
  strictDeps = true;
}

These settings ensure that the CUDA setup hooks function as intended.
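
Putting these pieces together, a minimal sketch of a package with optional CUDA support might look like the following; the package name, source, and the specific CUDA libraries used are illustrative:

{
  lib,
  stdenv,
  config,
  cudaSupport ? config.cudaSupport,
  cudaPackages,
}:

stdenv.mkDerivation {
  pname = "mypkg";
  version = "1.0";
  src = ./.;

  __structuredAttrs = true;
  strictDeps = true;

  # Build-time CUDA tooling (the compiler and its setup hooks).
  nativeBuildInputs = lib.optionals cudaSupport [
    cudaPackages.cuda_nvcc
  ];

  # CUDA libraries the package links against.
  buildInputs = lib.optionals cudaSupport [
    cudaPackages.cuda_cudart
    cudaPackages.libcublas
  ];
}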

When using callPackage, you can choose to pass in a different variant, e.g. when a package requires a specific version of CUDA:

{
  # Assuming ./mypkg.nix contains the package expression shown above.
  mypkg = callPackage ./mypkg.nix { cudaPackages = cudaPackages_12_2; };
}

::: {.caution} Overriding the CUDA package set used by a package may cause inconsistencies, since the override does not affect dependencies of the package. As a result, it is easy to end up with a package which uses a different CUDA package set than its dependencies. If at all possible, it is recommended to change the default CUDA package set globally, to ensure a consistent environment. :::
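
If you do need a different default everywhere, one possible sketch is a Nixpkgs overlay that replaces the default alias, so every package taking a cudaPackages argument picks up the same set:

final: prev: {
  # Use CUDA 12.2 as the default package set (illustrative choice of version).
  cudaPackages = final.cudaPackages_12_2;
}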

Running Docker or Podman containers with CUDA support

It is possible to run Docker or Podman containers with CUDA support. The recommended mechanism for doing so is the NVIDIA Container Toolkit.

The NVIDIA Container Toolkit can be enabled in NixOS as follows:

{
  hardware.nvidia-container-toolkit.enable = true;
}

This will automatically enable a service that generates a CDI specification (located at /var/run/cdi/nvidia-container-toolkit.json) based on the auto-detected hardware of your machine. You can check this service by running:

$ systemctl status nvidia-container-toolkit-cdi-generator.service

::: {.note} Depending on what settings you had already enabled in your system, you might need to restart your machine in order for the NVIDIA Container Toolkit to generate a valid CDI specification for your machine. :::

Once a valid CDI specification has been generated for your machine at boot time, both Podman and Docker (version 25 and later) will use it if you provide them with the --device flag:

$ podman run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)
$ docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)

You can list all the identifiers that have been generated for your auto-detected hardware by inspecting the contents of the /var/run/cdi/nvidia-container-toolkit.json file:

$ nix run nixpkgs#jq -- -r '.devices[].name' < /var/run/cdi/nvidia-container-toolkit.json
0
1
all

Specifying what devices to expose to the container

You can choose which devices are exposed to your containers by using the identifiers from the generated CDI specification, as follows:

$ podman run --rm -it --device=nvidia.com/gpu=0 ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)

You can repeat the --device argument as many times as necessary if you have multiple GPUs and want to pick which ones to expose to the container:

$ podman run --rm -it --device=nvidia.com/gpu=0 --device=nvidia.com/gpu=1 ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: <REDACTED>)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: <REDACTED>)

::: {.note} By default, the NVIDIA Container Toolkit will use the GPU index to identify specific devices. You can change the way to identify what devices to expose by using the hardware.nvidia-container-toolkit.device-name-strategy NixOS attribute. :::
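
For example, a sketch of selecting a different strategy in NixOS (assuming the uuid strategy is available in your NixOS release):

{
  hardware.nvidia-container-toolkit = {
    enable = true;
    # Identify devices by UUID instead of by index (assumed to be a supported value).
    device-name-strategy = "uuid";
  };
}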

Using docker-compose

It's possible to expose GPUs to a docker-compose environment as well, with a docker-compose.yaml file like the following:

services:
  some-service:
    image: ubuntu:latest
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
          - driver: cdi
            device_ids:
            - nvidia.com/gpu=all

In the same manner, you can pick specific devices that will be exposed to the container:

services:
  some-service:
    image: ubuntu:latest
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
          - driver: cdi
            device_ids:
            - nvidia.com/gpu=0
            - nvidia.com/gpu=1

Contributing

::: {.warning} This section of the docs is still very much in progress. Feedback is welcome in GitHub Issues tagging @NixOS/cuda-maintainers or on Matrix. :::

Package set maintenance

The CUDA Toolkit is a suite of CUDA libraries and software meant to provide a development environment for CUDA-accelerated applications. Until the release of CUDA 11.4, NVIDIA had only made the CUDA Toolkit available as a multi-gigabyte runfile installer, which we provide through the cudaPackages.cudatoolkit attribute. From CUDA 11.4 and onwards, NVIDIA has also provided CUDA redistributables (“CUDA-redist”): individually packaged CUDA Toolkit components meant to facilitate redistribution and inclusion in downstream projects. These packages are available in the cudaPackages package set.

All new projects should use the CUDA redistributables available in cudaPackages in place of cudaPackages.cudatoolkit, as they are much easier to maintain and update.

Updating redistributables

  1. Go to NVIDIA's index of CUDA redistributables: https://developer.download.nvidia.com/compute/cuda/redist/

  2. Make a note of the new version of CUDA available.

  3. Run

    nix run github:connorbaker/cuda-redist-find-features -- \
       download-manifests \
       --log-level DEBUG \
       --version <newest CUDA version> \
       https://developer.download.nvidia.com/compute/cuda/redist \
       ./pkgs/development/cuda-modules/cuda/manifests
    

    This will download a copy of the manifest for the new version of CUDA.

  4. Run

    nix run github:connorbaker/cuda-redist-find-features -- \
       process-manifests \
       --log-level DEBUG \
       --version <newest CUDA version> \
       https://developer.download.nvidia.com/compute/cuda/redist \
       ./pkgs/development/cuda-modules/cuda/manifests
    

    This will generate a redistrib_features_<newest CUDA version>.json file in the same directory as the manifest.

  5. Update the cudaVersionMap attribute set in pkgs/development/cuda-modules/cuda/extension.nix.

Updating cuTensor

  1. Repeat the steps in Updating redistributables above with the following changes:
    • Use the index of cuTensor redistributables: https://developer.download.nvidia.com/compute/cutensor/redist
    • Use the newest version of cuTensor available instead of the newest version of CUDA.
    • Use pkgs/development/cuda-modules/cutensor/manifests instead of pkgs/development/cuda-modules/cuda/manifests.
    • Skip the step of updating cudaVersionMap in pkgs/development/cuda-modules/cuda/extension.nix.

Updating supported compilers and GPUs

  1. Update nvccCompatibilities in pkgs/development/cuda-modules/_cuda/data/nvcc.nix to include the newest release of NVCC, as well as any newly supported host compilers.
  2. Update cudaCapabilityToInfo in pkgs/development/cuda-modules/_cuda/data/cuda.nix to include any new GPUs supported by the new release of CUDA.

Updating the CUDA Toolkit runfile installer

::: {.warning} While the CUDA Toolkit runfile installer is still available in Nixpkgs as the cudaPackages.cudatoolkit attribute, its use is not recommended, and it should be considered deprecated. Please migrate to the CUDA redistributables provided by the cudaPackages package set.

To ensure packages relying on the CUDA Toolkit runfile installer continue to build, it will continue to be updated until a migration path is available. :::

  1. Go to NVIDIA's CUDA Toolkit runfile installer download page: https://developer.nvidia.com/cuda-downloads

  2. Select the appropriate OS, architecture, distribution, version, and installer type.

    • For example: Linux, x86_64, Ubuntu, 22.04, runfile (local)
    • NOTE: Typically, we use the Ubuntu runfile. It is unclear if the runfile for other distributions will work.
  3. Take the link provided by the installer instructions on the webpage after selecting the installer type and get its hash by running:

    nix store prefetch-file --hash-type sha256 <link>
    
  4. Update pkgs/development/cuda-modules/cudatoolkit/releases.nix to include the release.

Updating the CUDA package set

  1. Include a new cudaPackages_<major>_<minor> package set in pkgs/top-level/all-packages.nix.

    • NOTE: Changing the default CUDA package set should occur in a separate PR, allowing time for additional testing.
  2. Successfully build the closure of the new package set, updating pkgs/development/cuda-modules/cuda/overrides.nix as needed. Below are some common failures:

| Unable to ... | During ... | Reason | Solution | Note |
| --- | --- | --- | --- | --- |
| Find headers | configurePhase or buildPhase | Missing dependency on a dev output | Add the missing dependency | The dev output typically contains the headers |
| Find libraries | configurePhase | Missing dependency on a dev output | Add the missing dependency | The dev output typically contains CMake configuration files |
| Find libraries | buildPhase or patchelf | Missing dependency on a lib or static output | Add the missing dependency | The lib or static output typically contains the libraries |

Failure to run the resulting binary is typically the most challenging to diagnose, as it may involve a combination of the aforementioned issues. This type of failure typically occurs when a library attempts to load or open a library it depends on that it does not declare in its DT_NEEDED section. Try the following debugging steps:

  1. First ensure that dependencies are patched with autoAddDriverRunpath (a sketch follows this list).
  2. Failing that, try running the application with nixGL or a similar wrapper tool.
  3. If that works, it likely means that the application is attempting to load a library that is not in the RPATH or RUNPATH of the binary.
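
For the first step, a minimal sketch of wiring autoAddDriverRunpath into a derivation; the package and its dependencies are hypothetical:

{
  stdenv,
  autoAddDriverRunpath,
  cudaPackages,
}:

stdenv.mkDerivation {
  pname = "mypkg";
  version = "1.0";
  src = ./.;

  # Adds the driver runpath to produced binaries so libcuda.so can be located at runtime.
  nativeBuildInputs = [ autoAddDriverRunpath ];

  buildInputs = [ cudaPackages.cuda_cudart ];
}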

Writing tests

::: {.caution} The existence of passthru.testers and passthru.tests should be considered an implementation detail -- they are not meant to be a public or stable interface. :::

In general, there are two attribute sets in passthru that are used to build and run tests for CUDA packages: passthru.testers and passthru.tests. Each attribute set may contain an attribute set named cuda, which contains CUDA-specific derivations. The cuda attribute set is used to separate CUDA-specific derivations from those which support multiple implementations (e.g., OpenCL, ROCm, etc.) or have different licenses. For an example of such generic derivations, see the magma package.

::: {.note} Derivations are nested under the cuda attribute due to an OfBorg quirk: if evaluation fails (e.g., because of unfree licenses), the entire enclosing attribute set is discarded. This prevents other attributes in the set from being discovered, evaluated, or built. :::

passthru.testers

Attributes added to passthru.testers are derivations that produce an executable which runs a test (a sketch is provided at the end of this subsection). The produced executable should:

  • Take care to set up the environment, make temporary directories, and so on.
  • Be registered as the derivation's meta.mainProgram so that it can be run directly.

::: {.note} Testers which always require CUDA should be placed in passthru.testers.cuda, while those which are generic should be placed in passthru.testers. :::

The passthru.testers attribute set allows running tests outside the Nix sandbox. This is useful in several situations, since such a test:

  • Can be run on non-NixOS systems, when wrapped with utilities like nixGL or nix-gl-host.
  • Has network access patterns which are difficult or impossible to sandbox.
  • Is free to produce output which is not deterministic, such as timing information.
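
As a sketch, a CUDA-specific tester might be written with writeShellApplication, which produces a single executable and (in current Nixpkgs) sets meta.mainProgram accordingly; the package mypkg and its mypkg-saxpy binary are hypothetical:

{
  # In the package's derivation arguments, with writeShellApplication and mypkg in scope.
  passthru.testers.cuda.saxpy = writeShellApplication {
    name = "mypkg-test-saxpy";
    runtimeInputs = [ mypkg ];
    text = ''
      # Run a small CUDA workload from a scratch directory on the host GPU.
      tmpdir="$(mktemp -d)"
      trap 'rm -rf "$tmpdir"' EXIT
      cd "$tmpdir"
      mypkg-saxpy --iterations 10
    '';
  };
}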

passthru.tests

Attributes added to passthru.tests are derivations which run tests inside the Nix sandbox (a sketch is provided at the end of this section). Tests should:

  • Use the executables produced by passthru.testers, where possible, to avoid duplication of test logic.
  • Include requiredSystemFeatures = [ "cuda" ];, possibly conditioned on the value of cudaSupport if they are generic, to ensure that they are only run on systems exposing a CUDA-capable GPU.

::: {.note} Tests which always require CUDA should be placed in passthru.tests.cuda, while those which are generic should be placed in passthru.tests. :::

This is useful for tests which are deterministic (e.g., checking exit codes) and which can be provided with all necessary resources in the sandbox.
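
As a sketch, a sandboxed test might reuse the tester above and request a CUDA-capable builder; it assumes the tester derivation from the previous sketch is in scope as saxpyTester (for example via a let binding or the finalAttrs pattern):

{
  passthru.tests.cuda.saxpy = runCommand "mypkg-test-saxpy-sandboxed" {
    # Reuse the executable produced by the tester above.
    nativeBuildInputs = [ saxpyTester ];
    # Only schedule on builders that expose a CUDA-capable GPU.
    requiredSystemFeatures = [ "cuda" ];
  } ''
    mypkg-test-saxpy
    touch $out
  '';
}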