Enabling GPU support for Kubernetes nodes
To enable GPU support for your Kubernetes cluster, you need to deploy the NVIDIA device plugin for Kubernetes.
This guide describes how to deploy the latest version of the driver. If you need a specific version, you can build your own nvidia-driver-installer container image by following the instructions on this GitHub page.
To deploy the NVIDIA device plugin for Kubernetes
- Disable SELinux on Kubernetes worker nodes with GPUs by using the selinux_mode=disabled label during worker group creation.
- Deploy Node Feature Discovery (NFD), a Kubernetes add-on for detecting hardware features and system configuration, to automatically discover GPU devices on your Kubernetes nodes and add the required labels:
# kubectl apply -f https://raw.githubusercontent.com/virtuozzo/nvidia-driver-installer/main/daemonsets/node-feature-discovery.yaml
- Deploy the NVIDIA device plugin for Kubernetes:
# kubectl apply -f https://raw.githubusercontent.com/virtuozzo/nvidia-driver-installer/main/daemonsets/nvidia-gpu-driver.yaml
This DaemonSet will automatically distribute pods to all of your worker nodes with the required labels. For more details, refer to the official guide.
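Once both daemon sets are running, you can confirm that the setup took effect before running workloads. The commands below are a minimal verification sketch: they assume that NFD adds labels under its usual feature.node.kubernetes.io prefix and that the device plugin registers GPUs under the standard nvidia.com/gpu resource name; <gpu-node-name> is a placeholder for one of your GPU worker nodes.
# kubectl get nodes --show-labels | grep feature.node.kubernetes.io
# kubectl describe node <gpu-node-name> | grep nvidia.com/gpu
The node description should show a non-zero nvidia.com/gpu count in both the Capacity and Allocatable sections once the driver and plugin pods have finished starting.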
To check that the plugin is installed correctly:
- Run a test pod:
# kubectl apply -f https://raw.githubusercontent.com/virtuozzo/nvidia-driver-installer/main/tests/gpupod.yaml
- Check the pod logs:
# kubectl logs gpu-pod
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
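After the test pod passes, your own workloads can request GPUs through the standard Kubernetes resource model. The following manifest is a minimal sketch rather than the contents of the gpupod.yaml used above: the pod name my-gpu-pod and the image placeholder are illustrative, and it assumes the device plugin exposes GPUs as the nvidia.com/gpu resource:
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  restartPolicy: OnFailure
  containers:
  - name: gpu-container
    # Replace the placeholder with a GPU-enabled (for example, CUDA-based) image.
    image: <your-gpu-enabled-image>
    resources:
      limits:
        # Request one GPU; the scheduler places the pod on a node that advertises nvidia.com/gpu.
        nvidia.com/gpu: 1
Save the manifest, for example as my-gpu-pod.yaml, and apply it:
# kubectl apply -f my-gpu-pod.yaml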