Preparing nodes for GPU virtualization
To use vGPU, enable it on the node by installing the NVIDIA vGPU kernel module. If the GPU that you want to virtualize was previously detached from the node for GPU passthrough, you must also modify the GRUB configuration file.
Prerequisites
- To authorize further OpenStack commands, the OpenStack command-line client must be configured, as outlined in Connecting to OpenStack command-line interface.
To enable vGPU on a node
- On the node with the physical GPU, do one of the following:
  - If the physical GPU is attached to the node, blacklist the Nouveau driver:
    # rmmod nouveau
    # echo -e "blacklist nouveau\noptions nouveau modeset=0" > /usr/lib/modprobe.d/nouveau.conf
    # echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/nouveau.conf
  - If the physical GPU is detached from the node:
    - In the /etc/default/grub file, locate the GRUB_CMDLINE_LINUX line, and then delete pci-stub.ids=<gpu_vid>:<gpu_pid>. For example, for a GPU with the VID and PID 10de:1eb8, delete pci-stub.ids=10de:1eb8, and then check the resulting file:
      # cat /etc/sysconfig/grub | grep CMDLINE
      GRUB_CMDLINE_LINUX="crashkernel=auto tcache.enabled=0 quiet iommu=pt rd.driver.blacklist=nouveau nouveau.modeset=0"
    - Regenerate the GRUB configuration file:
      - On a BIOS-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2.cfg
      - On a UEFI-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg
    - Reboot the node to apply the changes:
      # reboot
- Install the vGPU KVM kernel module from the NVIDIA GRID package:
  # bash NVIDIA-Linux-x86_64-460.73.02-vgpu-kvm.run
- Recreate the Linux boot image by running:
  # dracut -f
- Reboot the node to finish the module installation:
  # reboot
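For the case of a previously detached GPU, the edit of the GRUB_CMDLINE_LINUX line can be scripted. The following is a minimal sketch rather than a supported tool: remove_pci_stub_ids is a hypothetical helper name, and the sketch assumes GNU sed and a pci-stub.ids parameter that lists exactly one <gpu_vid>:<gpu_pid> pair. It keeps a .bak copy of the file and leaves the line unchanged if the parameter is not present in that form.

```shell
# Hypothetical helper (not part of any package): removes
# pci-stub.ids=<vid>:<pid> from the GRUB_CMDLINE_LINUX line of a file.
# Usage: remove_pci_stub_ids <grub_file> <vid:pid>
remove_pci_stub_ids() {
    local grub_file="$1" ids="$2"

    # Keep a backup copy next to the original before editing in place.
    cp -p "$grub_file" "${grub_file}.bak" || return 1

    # Touch only the GRUB_CMDLINE_LINUX line. The parameter is removed
    # together with one preceding space (if any); the character that
    # follows it (a space or the closing quote) is put back.
    sed -i -E "/^GRUB_CMDLINE_LINUX=/ s/ ?pci-stub\.ids=${ids}( |\")/\1/" "$grub_file"
}
```

For example, `remove_pci_stub_ids /etc/default/grub 10de:1eb8` would correspond to the manual edit described above. The GRUB configuration file still has to be regenerated and the node rebooted afterwards.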
To check that a GPU card is vGPU enabled
List all graphics cards on the node and obtain their PCI addresses:
# lspci | grep VGA
03:00.0 VGA compatible controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
In this output, 03:00.0 is the graphics card's PCI address.
Check that the graphics card is vGPU enabled:
# ls /sys/bus/pci/devices/0000\:03:00.0/mdev_supported_types
nvidia-222 nvidia-223 nvidia-224 nvidia-225 nvidia-226 nvidia-227 nvidia-228 nvidia-229 nvidia-230
nvidia-231 nvidia-232 nvidia-233 nvidia-234 nvidia-252 nvidia-319 nvidia-320 nvidia-321
For a vGPU-enabled card, the directory contains a list of supported vGPU types. A vGPU type is a vGPU configuration that defines the vRAM size, maximum resolution, maximum number of supported vGPUs, and other parameters.
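To see more than the type IDs, each type directory can be read in a loop. The sketch below is illustrative only: list_vgpu_types is a hypothetical helper name, and it relies on the name and available_instances attributes that the Linux mediated device (mdev) sysfs interface exposes under each type directory. The name attribute is optional in that interface, so a missing value is printed as "?".

```shell
# Hypothetical helper: prints one line per supported vGPU type in the form
#   <type id> <TAB> <type name> <TAB> <available instances>
# Usage: list_vgpu_types <path to an mdev_supported_types directory>
list_vgpu_types() {
    local types_dir="$1" type_dir
    local id name avail

    for type_dir in "$types_dir"/*/; do
        # Skip the unexpanded pattern if the directory has no entries.
        [ -d "$type_dir" ] || continue

        id=$(basename "$type_dir")
        # Either attribute may be absent; fall back to "?" in that case.
        name=$(cat "$type_dir/name" 2>/dev/null)
        avail=$(cat "$type_dir/available_instances" 2>/dev/null)

        printf '%s\t%s\t%s\n' "$id" "${name:-?}" "${avail:-?}"
    done
}
```

For the card from the example above, the call would be `list_vgpu_types /sys/bus/pci/devices/0000:03:00.0/mdev_supported_types`. If the function prints nothing, the directory is missing or empty, which suggests that the card is not vGPU enabled.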
To check that a node has vGPU resources for allocation
List resource providers in the compute cluster to obtain their IDs. For example:
# openstack --insecure resource provider list
+--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid                                 | name                                    | generation | root_provider_uuid                   | parent_provider_uuid                 |
+--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
| 359cccf7-9c64-4edc-a35d-f4673e485a04 | node001.vstoragedomain_pci_0000_03_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
| b8443d1b-b941-4bf5-ab4b-2dc7c64ac7d1 | node001.vstoragedomain_pci_0000_81_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
| 4936695a-4711-425a-b0e4-fdab5e4688d6 | node001.vstoragedomain                  |        823 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | None                                 |
+--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
In this output, the resource provider with the ID 4936695a-4711-425a-b0e4-fdab5e4688d6 has two child resource providers for two physical GPUs with PCI addresses 0000_03_00_0 and 0000_81_00_0.
Use the obtained ID of a child resource provider to list its inventory. For example:
# openstack --insecure resource provider inventory list 359cccf7-9c64-4edc-a35d-f4673e485a04
+----------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+----------------+------------------+----------+----------+-----------+----------+-------+
| VGPU           |              1.0 |        8 |        0 |         1 |        1 |     8 |
+----------------+------------------+----------+----------+-----------+----------+-------+
The child resource provider has vGPU resources that can be allocated to virtual machines.
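The two lookups above can be combined so that every GPU child resource provider is checked in one pass. This is a hedged sketch, not a product command: provider_name_to_pci and list_vgpu_inventories are hypothetical helper names, the name pattern <node>_pci_<domain>_<bus>_<slot>_<function> is inferred from the example output above, and the -f value and -c options are assumed to be available as the standard output formatting options of the OpenStack command-line client.

```shell
# Hypothetical helper: converts a child resource provider name such as
# node001.vstoragedomain_pci_0000_03_00_0 to the PCI address 0000:03:00.0.
# Returns a non-zero status for names that do not match the pattern.
provider_name_to_pci() {
    local pci
    pci=$(printf '%s\n' "$1" |
        sed -n -E 's/^.*_pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/p')
    [ -n "$pci" ] && printf '%s\n' "$pci"
}

# Hypothetical helper: prints the inventory of every resource provider
# whose name refers to a PCI device. Requires a configured openstack client.
list_vgpu_inventories() {
    local uuid name pci
    openstack --insecure resource provider list -f value -c uuid -c name |
    while read -r uuid name; do
        # Skip the root provider and any other non-PCI provider.
        pci=$(provider_name_to_pci "$name") || continue
        echo "GPU at $pci (resource provider $uuid):"
        openstack --insecure resource provider inventory list "$uuid"
    done
}
```

Running `list_vgpu_inventories` on a node with a configured client would print one inventory table per physical GPU. A GPU whose table has no VGPU row has no vGPU resources to allocate.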