Preparing nodes for GPU virtualization
Before configuring GPU virtualization, you need to check whether your NVIDIA graphics card supports SR-IOV. The SR-IOV technology enables splitting a single physical device (physical function) into several virtual devices (virtual functions).
- Legacy GPUs are based on the NVIDIA Tesla architecture and have no SR-IOV support. For such GPUs, virtualization is performed by creating a mediated device (mdev) over the physical function.
- Modern GPUs are based on the NVIDIA Ampere architecture or newer and support SR-IOV. For such GPUs, virtualization is performed by creating an mdev over the virtual function.
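A quick way to tell the two cases apart is to look for the SR-IOV capability in the card's verbose lspci output. The sketch below runs the check against a captured capability line so it is self-contained; on a real node, replace the sample with the output of `lspci -s <pci_address> -vvv` (run as root).

```shell
# Sample capability line as reported by an SR-IOV-capable device; on a real
# node, use: lspci -s <pci_address> -vvv | grep 'Single Root I/O Virtualization'
caps='Capabilities: [bcc v1] Single Root I/O Virtualization (SR-IOV)'
if printf '%s\n' "$caps" | grep -q 'Single Root I/O Virtualization'; then
    msg="SR-IOV supported: create the mdev over a virtual function"
else
    msg="no SR-IOV: create the mdev over the physical function"
fi
echo "$msg"
```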
For vGPU to work, enable it on the node by installing the NVIDIA kernel module, and then enable IOMMU. If you are using a modern GPU that is based on the NVIDIA Ampere architecture or newer, you need to enable the virtual functions for the GPU. For more details, refer to the official NVIDIA documentation.
Note that if you want to virtualize a GPU that was previously detached from the node for GPU passthrough, you need to additionally modify the GRUB configuration file.
To obtain the GPU PCI address
List all graphics cards on the node and obtain their PCI addresses:
# lspci -D | grep NVIDIA
0000:01:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
0000:81:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
In the command output, 0000:01:00.0 and 0000:81:00.0 are the PCI addresses of the graphics cards.
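If you need the addresses in a script for later steps, the first field of each output line is the PCI address. A minimal sketch over a captured sample (on a real node, pipe `lspci -D | grep NVIDIA` directly):

```shell
# Captured sample of `lspci -D | grep NVIDIA`; pipe the real command instead.
sample='0000:01:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
0000:81:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)'
# The PCI address is the first whitespace-separated field.
addrs=$(printf '%s\n' "$sample" | awk '{print $1}')
printf '%s\n' "$addrs"
```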
To enable vGPU on a node
- On the node with the physical GPU, do one of the following:
  - If the physical GPU is attached to the node, blacklist the Nouveau driver:
    # rmmod nouveau
    # echo -e "blacklist nouveau\noptions nouveau modeset=0" > /usr/lib/modprobe.d/nouveau.conf
    # echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/nouveau.conf
  - If the physical GPU is detached from the node:
    - In the /etc/default/grub file, locate the GRUB_CMDLINE_LINUX line, and then delete pci-stub.ids=<gpu_vid>:<gpu_pid>. For example, for a GPU with the VID and PID 10de:1eb8, delete pci-stub.ids=10de:1eb8, and check the resulting file:
      # cat /etc/default/grub | grep CMDLINE
      GRUB_CMDLINE_LINUX="crashkernel=auto tcache.enabled=0 quiet iommu=pt rd.driver.blacklist=nouveau nouveau.modeset=0"
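The edit can also be done non-interactively. The sketch below applies the same deletion with sed to a temporary copy of the file; 10de:1eb8 is the example VID and PID from above, and on a real node you would target /etc/default/grub itself (back it up first).

```shell
# Work on a temporary copy; on a real node, set GRUB_FILE=/etc/default/grub.
GRUB_FILE=$(mktemp)
printf '%s\n' 'GRUB_CMDLINE_LINUX="crashkernel=auto quiet pci-stub.ids=10de:1eb8 iommu=pt"' > "$GRUB_FILE"
# Delete the pci-stub.ids=<vid>:<pid> entry, including the leading space.
sed -i 's/ *pci-stub\.ids=[0-9a-fA-F]\{1,\}:[0-9a-fA-F]\{1,\}//' "$GRUB_FILE"
cat "$GRUB_FILE"
```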
    - Regenerate the GRUB configuration file.
      - On a BIOS-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2.cfg --update-bls-cmdline
      - On a UEFI-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg --update-bls-cmdline
    - Reboot the node to apply the changes:
      # reboot
- Install the vGPU NVIDIA driver:
  - Install the kernel-devel and dkms packages:
    # dnf install kernel-devel dkms
  - Enable and start the dkms service:
    # systemctl enable dkms.service
    # systemctl start dkms.service
  - Install the vGPU KVM kernel module from the NVIDIA GRID package with the --dkms option:
    # bash NVIDIA-Linux-x86_64-xxx.xx.xx-vgpu-kvm*.run --dkms
  - Re-create the Linux boot image:
    # dracut -f
- Enable IOMMU on the node:
  - Run the pci-helper.py enable-iommu script:
    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
    The script works for both Intel and AMD processors.
  - Reboot the node to apply the changes:
    # reboot
  - Check that IOMMU is successfully enabled in the dmesg output:
    # dmesg | grep -e DMAR -e IOMMU
    [ 0.000000] DMAR: IOMMU enabled
- [For modern GPUs with SR-IOV support] Enable the virtual functions for your GPU:
  # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py nvidia-sriov-mgr --enable
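Once the virtual functions are enabled, the GPU's sysfs device directory gains one virtfnN symlink per virtual function. The sketch below counts them in a sample directory listing so it is self-contained; on a real node, list /sys/bus/pci/devices/<pci_address>/ instead.

```shell
# Sample directory listing; on a real node, use:
#   ls /sys/bus/pci/devices/<pci_address>/
listing='config driver virtfn0 virtfn1 virtfn2 vendor'
# Each virtfnN entry is a symlink to one virtual function's PCI device.
count=$(printf '%s\n' "$listing" | tr ' ' '\n' | grep -c '^virtfn')
echo "$count virtual functions enabled"
```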
To check that vGPU is enabled
- [For legacy GPUs without SR-IOV support] Check the /sys/bus/pci/devices/<pci_address>/mdev_supported_types directory. For example, for the GPU with the PCI address 0000:01:00.0, run:
  # ls /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types
  nvidia-222  nvidia-223  nvidia-224  nvidia-225  nvidia-226  nvidia-227
  nvidia-228  nvidia-229  nvidia-230  nvidia-231  nvidia-232  nvidia-233
  nvidia-234  nvidia-252  nvidia-319  nvidia-320  nvidia-321
  For a vGPU-enabled card, the directory contains a list of supported vGPU types. A vGPU type is a vGPU configuration that defines the vRAM size, maximum resolution, maximum number of supported vGPUs, and other parameters.
- [For modern GPUs with SR-IOV support] Check supported vGPU types and the number of available instances per vGPU type. For example, for the GPU with the PCI address 0000:c1:00.0, run:
  # cd /sys/bus/pci/devices/0000:c1:00.0/virtfn0/mdev_supported_types
  # grep -vR --include=available_instances 0
  nvidia-568/available_instances:1
  nvidia-558/available_instances:1
  nvidia-556/available_instances:1
  In the command output, the supported types are nvidia-568, nvidia-558, and nvidia-556, and each virtual function can host one instance.
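Each type directory under mdev_supported_types also exposes human-readable attributes through the kernel's mdev sysfs interface, notably name and available_instances. The loop below summarizes them per type; it builds a mock tree with placeholder type names so the sketch runs anywhere, and on a real node you would point MDEV_BASE at the mdev_supported_types path shown above.

```shell
# Mock mdev_supported_types tree with placeholder names; on a real node:
#   MDEV_BASE=/sys/bus/pci/devices/<pci_address>/mdev_supported_types
# (or the virtfn0/mdev_supported_types path for SR-IOV GPUs).
MDEV_BASE=$(mktemp -d)
for t in nvidia-556 nvidia-558; do
    mkdir "$MDEV_BASE/$t"
    echo "GRID-example-$t" > "$MDEV_BASE/$t/name"
    echo 1 > "$MDEV_BASE/$t/available_instances"
done

# Print each vGPU type with its name and remaining instance count.
summary=$(for type_dir in "$MDEV_BASE"/nvidia-*; do
    printf '%s: %s (%s available)\n' \
        "$(basename "$type_dir")" \
        "$(cat "$type_dir/name")" \
        "$(cat "$type_dir/available_instances")"
done)
printf '%s\n' "$summary"
```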