Enabling GPU support for Kubernetes nodes

To enable GPU support for your Kubernetes cluster, you need to deploy the NVIDIA device plugin for Kubernetes.

This guide deploys the latest version of the NVIDIA driver. If you need a specific version, you can build your own nvidia-driver-installer container image by following the instructions in the nvidia-driver-installer GitHub repository.

To deploy the NVIDIA device plugin for Kubernetes

  1. Disable SELinux on Kubernetes worker nodes with GPUs by applying the selinux_mode=disabled label during worker group creation.
  2. Deploy Node Feature Discovery (NFD), a Kubernetes add-on for detecting hardware features and system configuration, to automatically discover GPU devices on your Kubernetes nodes and add the required labels:

    # kubectl apply -f https://raw.githubusercontent.com/virtuozzo/nvidia-driver-installer/main/daemonsets/node-feature-discovery.yaml
  3. Deploy the NVIDIA device plugin for Kubernetes:

    # kubectl apply -f https://raw.githubusercontent.com/virtuozzo/nvidia-driver-installer/main/daemonsets/nvidia-gpu-driver.yaml

    This DaemonSet automatically distributes pods to all of your worker nodes with the required labels. For more details, refer to the official guide. You can verify the labels discovered by NFD and the GPU resources advertised by the device plugin as shown after this procedure.
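
Once both daemon sets are running, you can confirm that node discovery and the device plugin are working. The commands below are a sketch: the exact NFD label names depend on the configuration shipped in the manifests above, and <node-name> is a placeholder for one of your GPU worker nodes. Nodes detected by NFD carry labels with the feature.node.kubernetes.io/ prefix, and once the device plugin registers the GPUs, the node reports an allocatable nvidia.com/gpu resource:

    # kubectl get node <node-name> --show-labels | tr ',' '\n' | grep feature.node.kubernetes.io
    # kubectl describe node <node-name> | grep -i nvidia.com/gpu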

You can check that the plugin is installed correctly as follows:

  1. Run a test pod:

    # kubectl apply -f https://raw.githubusercontent.com/virtuozzo/nvidia-driver-installer/main/tests/gpupod.yaml
  2. Check the pod logs:

    # kubectl logs gpu-pod
    [Vector addition of 50000 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch with 196 blocks of 256 threads
    Copy output data from the CUDA device to the host memory
    Test PASSED
    Done
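
The test manifest above follows the standard pattern for GPU workloads: the container requests GPUs through the nvidia.com/gpu resource limit, and Kubernetes schedules the pod onto a node where the device plugin has advertised that resource. Below is a minimal sketch of such a manifest; the pod name, container name, and image are examples only, so replace them with your own workload:

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-gpu-pod                 # example name
    spec:
      restartPolicy: Never
      containers:
      - name: cuda-workload            # example container
        image: nvidia/cuda:12.2.0-base-ubuntu22.04   # example image; use your CUDA application image
        command: ["nvidia-smi"]        # prints the GPUs visible inside the container
        resources:
          limits:
            nvidia.com/gpu: 1          # request one GPU from the device plugin

Once you finish testing, you can remove the test pod with kubectl delete pod gpu-pod.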