Preparing nodes for GPU virtualization

To use vGPU, enable it on the node by installing the NVIDIA vGPU KVM kernel module. If you want to virtualize a GPU that was previously detached from the node for GPU passthrough, you must also modify the GRUB configuration file.

Prerequisites

To enable vGPU on a node

  1. On the node with the physical GPU, do one of the following:

  2. Install the vGPU KVM kernel module from the NVIDIA GRID package:

    # bash NVIDIA-Linux-x86_64-460.73.02-vgpu-kvm.run

  3. Recreate the Linux boot image by running:

    # dracut -f

  4. Reboot the node to finish the module installation:

    # reboot
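
After the reboot, it can be useful to confirm that the module was actually loaded. The following is a minimal sketch, assuming the vGPU KVM package registers a kernel module named nvidia_vgpu_vfio (the name used in NVIDIA's vGPU documentation for KVM hosts; it may differ between driver releases). The helper name vgpu_module_loaded is hypothetical; it reads lsmod output from standard input.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the lsmod output on stdin lists the
# nvidia_vgpu_vfio module. The module name is an assumption (see above).
vgpu_module_loaded() {
    grep -q '^nvidia_vgpu_vfio[[:space:]]'
}

# Usage on the node (after the reboot):
#   lsmod | vgpu_module_loaded && echo "vGPU kernel module is loaded"
```

If the check fails, the module installation most likely did not complete, and the steps above should be reviewed before continuing.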

To check that a GPU card is vGPU enabled

List all graphics cards on the node and obtain their PCI addresses:

# lspci -D | grep NVIDIA
0000:03:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
0000:81:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

0000:03:00.0 and 0000:81:00.0 are the PCI addresses of the graphics cards.
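
When a node has several cards, the addresses can be extracted from the listing instead of being copied by hand. This is a small sketch; the helper name nvidia_pci_addresses is hypothetical, and it assumes the PCI address is the first field of each lspci -D line, as in the output above.

```shell
#!/bin/bash
# Hypothetical helper: print the first field (the PCI address) of every
# line on stdin that mentions NVIDIA.
nvidia_pci_addresses() {
    awk '/NVIDIA/ { print $1 }'
}

# Usage on the node:
#   lspci -D | nvidia_pci_addresses
```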

Check that the graphics card is vGPU enabled:

# ls /sys/bus/pci/devices/0000\:03\:00.0/mdev_supported_types
nvidia-222  nvidia-223  nvidia-224  nvidia-225  nvidia-226  nvidia-227  nvidia-228  nvidia-229  nvidia-230  nvidia-231
nvidia-232  nvidia-233  nvidia-234  nvidia-252  nvidia-319  nvidia-320  nvidia-321

For a vGPU-enabled card, the directory contains a list of supported vGPU types. A vGPU type is a vGPU configuration that defines the vRAM size, maximum resolution, maximum number of supported vGPUs, and other parameters.
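
The type identifiers alone do not say much about each configuration. The sketch below prints, for every type, its human-readable name and the number of instances that can still be created. It relies on the name and available_instances files that the Linux mediated device (mdev) sysfs interface provides in each type directory; the helper name list_vgpu_types is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: for each vGPU type under the given
# mdev_supported_types directory, print a tab-separated line:
#   <type ID>  <type name>  <available instances>
list_vgpu_types() {
    local types_dir=$1 type_dir
    for type_dir in "$types_dir"/*/; do
        # Skip entries without a readable "name" file (also covers the
        # case where the glob matched nothing).
        [ -r "${type_dir}name" ] || continue
        printf '%s\t%s\t%s\n' \
            "$(basename "$type_dir")" \
            "$(cat "${type_dir}name")" \
            "$(cat "${type_dir}available_instances")"
    done
}

# Usage on the node:
#   list_vgpu_types /sys/bus/pci/devices/0000:03:00.0/mdev_supported_types
```

If the function prints nothing, the directory is missing or empty, which suggests that the card is not vGPU enabled.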

To check that a node has vGPU resources for allocation

List resource providers in the compute cluster to obtain their IDs. For example:

# openstack --insecure resource provider list
+--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
| uuid                                 | name                                    | generation | root_provider_uuid                   | parent_provider_uuid                 |
+--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
| 359cccf7-9c64-4edc-a35d-f4673e485a04 | node001.vstoragedomain_pci_0000_03_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
| b8443d1b-b941-4bf5-ab4b-2dc7c64ac7d1 | node001.vstoragedomain_pci_0000_81_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
| 4936695a-4711-425a-b0e4-fdab5e4688d6 | node001.vstoragedomain                  |        823 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | None                                 |
+--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+

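In this output, the root resource provider node001.vstoragedomain, with the ID 4936695a-4711-425a-b0e4-fdab5e4688d6, has two child resource providers, one for each physical GPU. The suffixes 0000_03_00_0 and 0000_81_00_0 in their names correspond to the PCI addresses 0000:03:00.0 and 0000:81:00.0.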
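
To match a child resource provider to a physical card, its name can be converted back to a PCI address. The sketch below assumes the naming pattern seen in the sample output, that is, a _pci_ marker followed by domain, bus, slot, and function separated by underscores; this pattern is inferred from the example and is not a documented guarantee. The helper name provider_to_pci is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn a child provider name such as
# "node001.vstoragedomain_pci_0000_03_00_0" into "0000:03:00.0".
# Prints nothing if the name does not end with the assumed pattern.
provider_to_pci() {
    printf '%s\n' "$1" |
        sed -nE 's/.*_pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/p'
}

# Usage:
#   provider_to_pci node001.vstoragedomain_pci_0000_03_00_0
```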

Use the obtained ID of a child resource provider to list its inventory. For example:

# openstack --insecure resource provider inventory list 359cccf7-9c64-4edc-a35d-f4673e485a04
+----------------+------------------+----------+----------+-----------+----------+-------+
| resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
+----------------+------------------+----------+----------+-----------+----------+-------+
| VGPU           |              1.0 |        8 |        0 |         1 |        1 |     8 |
+----------------+------------------+----------+----------+-----------+----------+-------+

In this output, the child resource provider reports a total of 8 VGPU resources, none of which are reserved, so it has vGPU resources that can be allocated to virtual machines.
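
The number of allocatable units can be derived from the inventory fields. The sketch below assumes the capacity rule described in the OpenStack Placement documentation, (total - reserved) * allocation_ratio, truncated to an integer. The helper name vgpu_capacity is hypothetical, and the values are passed in by hand rather than read from the CLI.

```shell
#!/bin/bash
# Hypothetical helper: compute allocatable units from the inventory fields
# total, reserved, and allocation_ratio as (total - reserved) * ratio,
# truncated to an integer.
vgpu_capacity() {
    awk -v total="$1" -v reserved="$2" -v ratio="$3" \
        'BEGIN { printf "%d\n", (total - reserved) * ratio }'
}

# Usage with the values from the sample output (total, reserved, ratio):
#   vgpu_capacity 8 0 1.0
```

For the sample inventory, this gives 8, which matches the total because nothing is reserved and the allocation ratio is 1.0.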