Configuring GPU virtualization

Before configuring GPU virtualization, you need to check whether your NVIDIA graphics card supports SR-IOV. The SR-IOV technology enables splitting a single physical device (physical function) into several virtual devices (virtual functions).

  • Legacy GPUs are based on the NVIDIA Tesla architecture and have no SR-IOV support. For such GPUs, virtualization is performed by creating a mediated device (mdev) over the physical function.
  • Modern GPUs are based on the NVIDIA Ampere architecture or newer and support SR-IOV. For such GPUs, virtualization is performed by creating an mdev over a virtual function.

For more details, refer to the official NVIDIA documentation.

Limitations

  • Virtual machines with attached vGPUs cannot be suspended or live migrated.

Prerequisites

Procedure overview

  1. Prepare a compute node for GPU virtualization.
  2. Reconfigure the compute cluster to enable vGPU support.
  3. Change the vGPU type for a physical GPU.
  4. Check that the node has vGPU resources for allocation.
  5. Create a virtual machine with an attached vGPU.
  6. Verify the attached GPU in the virtual machine.

To enable vGPU on a node

  1. List all graphics cards on the node and obtain their PCI domain addresses:

    # vinfra node gpu list
    +---------------+---------------+---------+----------+--------------------+-----------+------------------------+-----------+------------+-----------------+-------------+--------------------+
    | id            | node_id       | host    | status   | vendor             | vendor_id | device                 | device_id | alias      | mode            | pci_address | pci_domain_address |
    +---------------+---------------+---------+----------+--------------------+-----------+------------------------+-----------+------------+-----------------+-------------+--------------------+
    | 1269b15e<...> | c3b2321a<...> | node001 | attached | NVIDIA Corporation | 10de      | TU104GL [Tesla T4]     | 1eb8      |            |                 | 01:00.0     | 0000:01:00.0       |
    | be6c7558<...> | c3b2321a<...> | node001 | attached | NVIDIA Corporation | 10de      | TU104GL [Tesla T4]     | 1eb8      |            |                 | 81:00.0     | 0000:81:00.0       |
    +---------------+---------------+---------+----------+--------------------+-----------+------------------------+-----------+------------+-----------------+-------------+--------------------+

    In the command output, 0000:01:00.0 and 0000:81:00.0 are the PCI domain addresses of the graphics cards.
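
    On a node with many devices, the addresses can also be pulled out programmatically. A minimal sketch that extracts the pci_domain_address column (the 13th "|"-separated field) from saved command output; two sample rows are embedded here in place of a live vinfra call:

```shell
# Extract the pci_domain_address column from saved "vinfra node gpu list"
# output. The heredoc rows below stand in for real command output.
addrs=$(awk -F'|' '/NVIDIA/ { gsub(/ /, "", $13); print $13 }' <<'EOF'
| 1269b15e | c3b2321a | node001 | attached | NVIDIA Corporation | 10de | TU104GL [Tesla T4] | 1eb8 |  |  | 01:00.0 | 0000:01:00.0 |
| be6c7558 | c3b2321a | node001 | attached | NVIDIA Corporation | 10de | TU104GL [Tesla T4] | 1eb8 |  |  | 81:00.0 | 0000:81:00.0 |
EOF
)
echo "$addrs"
```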

  2. Obtain the NVIDIA vGPU software package (NVIDIA GRID) for your GPU model, and copy it to the node with the physical GPU.

  3. Install the NVIDIA vGPU driver:

    1. Install the kernel-devel and dkms packages:

      # dnf install kernel-devel dkms
      
    2. Enable and start the dkms service:

      # systemctl enable dkms.service 
      # systemctl start dkms.service
    3. Install the vGPU KVM kernel module from the NVIDIA GRID package with the --dkms option, where xxx.xx.xx is the driver version:

      # bash NVIDIA-Linux-x86_64-xxx.xx.xx-vgpu-kvm*.run --dkms
    4. Re-create the Linux boot image by running:

      # dracut -f
  4. Enable IOMMU on the node by running the pci-helper.py enable-iommu script and reboot the node to apply the changes:

    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
    # reboot

    The script works for both Intel and AMD processors.

  5. Verify that IOMMU is enabled in the dmesg output:

    # dmesg | grep -e DMAR -e IOMMU
    [    0.000000] DMAR: IOMMU enabled
  6. [For modern GPUs with SR-IOV support] Enable the virtual functions for your GPU:

    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py nvidia-sriov-mgr --enable
  7. Verify that vGPU is enabled on the node:

    • [For legacy GPUs without SR-IOV support] Check the /sys/bus/pci/devices/<pci_address>/mdev_supported_types directory. For example, for the GPU with the PCI address 0000:01:00.0, run:

      # ls /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types
      nvidia-222  nvidia-223  nvidia-224  nvidia-225  nvidia-226  nvidia-227  nvidia-228  nvidia-229  nvidia-230  nvidia-231
      nvidia-232  nvidia-233  nvidia-234  nvidia-252  nvidia-319  nvidia-320  nvidia-321

      For a vGPU-enabled card, the directory contains a list of supported vGPU types. A vGPU type is a vGPU configuration that defines the vRAM size, maximum resolution, maximum number of supported vGPUs, and other parameters.

    • [For modern GPUs with SR-IOV support] Check supported vGPU types and the number of available instances per vGPU type. For example, for the GPU with the PCI address 0000:c1:00.0, run:

      # cd /sys/bus/pci/devices/0000:c1:00.0/virtfn0/mdev_supported_types
      # grep -vR --include=available_instances 0
      nvidia-568/available_instances:1
      nvidia-558/available_instances:1
      nvidia-556/available_instances:1

      In the command output, the supported types are nvidia-568, nvidia-558, and nvidia-556, and each virtual function can host one vGPU instance.
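
    The per-type sysfs attributes can be combined into a readable summary. The sketch below builds a fake mdev_supported_types tree so it is self-contained (the type names and instance counts are made up for illustration); on a real node you would point TYPES at /sys/bus/pci/devices/<pci_address>/mdev_supported_types, or at the virtfn0 subdirectory for SR-IOV GPUs:

```shell
# Print each supported vGPU type with its name and free instance count.
# The directory tree is faked here; real nodes expose the same
# name/available_instances attributes under mdev_supported_types.
TYPES=$(mktemp -d)
mkdir -p "$TYPES/nvidia-224" "$TYPES/nvidia-232"
echo "GRID T4-4Q" > "$TYPES/nvidia-224/name"
echo 4            > "$TYPES/nvidia-224/available_instances"
echo "GRID T4-8Q" > "$TYPES/nvidia-232/name"
echo 2            > "$TYPES/nvidia-232/available_instances"

listing=$(for t in "$TYPES"/*; do
    printf '%s  %s  (%s free)\n' \
        "$(basename "$t")" "$(cat "$t/name")" "$(cat "$t/available_instances")"
done)
echo "$listing"
rm -r "$TYPES"
```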

To enable vGPU support for the compute cluster

  1. Create a configuration file in the YAML format. For example:

    # cat << EOF > pci-passthrough.yaml
    - node_id: 1d6481c2-1fd5-406b-a0c7-330f24bd0e3d
      devices:
        - device_type: pgpu
          device: "0000:01:00.0"
          vgpu_type: nvidia-224
        - device_type: pgpu
          device: "0000:81:00.0"
          vgpu_type: nvidia-232
    EOF

    In this example:

    • node_id is the UUID of the compute node that hosts a physical GPU
    • pgpu is the device type for a physical GPU that will be virtualized
    • "0000:01:00.0" and "0000:81:00.0" are the PCI domain addresses of physical GPUs
    • nvidia-224 and nvidia-232 are the vGPU types that will be enabled for the physical GPUs

    If a compute node has multiple graphics cards, it can be configured for both GPU passthrough and virtualization.
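
    A quick way to catch copy-paste mistakes before applying the file is to confirm that every pgpu entry carries a vgpu_type. A minimal sketch that recreates the example file from above and compares the counts:

```shell
# Recreate the example configuration and verify that the number of
# pgpu entries matches the number of vgpu_type lines.
cat << 'EOF' > pci-passthrough.yaml
- node_id: 1d6481c2-1fd5-406b-a0c7-330f24bd0e3d
  devices:
    - device_type: pgpu
      device: "0000:01:00.0"
      vgpu_type: nvidia-224
    - device_type: pgpu
      device: "0000:81:00.0"
      vgpu_type: nvidia-232
EOF
pgpus=$(grep -c 'device_type: pgpu' pci-passthrough.yaml)
types=$(grep -c 'vgpu_type:' pci-passthrough.yaml)
echo "pgpu entries: $pgpus, vgpu_type lines: $types"
```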

  2. Reconfigure the compute cluster by using this configuration file:

    # vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
    +---------+--------------------------------------+
    | Field   | Value                                |
    +---------+--------------------------------------+
    | task_id | 89c8a6c4-f480-424e-ab44-c2f4e2976eb9 |
    +---------+--------------------------------------+
  3. Check the status of the task:

    # vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9

To change the vGPU type for a physical GPU

  1. Ensure that the compute cluster has no virtual machines with the current vGPU type.
  2. Modify the configuration file. For example, replace nvidia-224 with nvidia-231 in the vgpu_type field:

    - device_type: pgpu
      device: "0000:01:00.0"
      vgpu_type: nvidia-231
  3. Pass the configuration file to the vinfra service compute set command. For example:

    # vinfra service compute set --pci-passthrough-config config.yaml
  4. Reboot the node with the physical GPU to apply changes:

    # reboot

To check that a node has vGPU resources for allocation

  1. List resource providers in the compute cluster to obtain their IDs. For example:

    # openstack --insecure resource provider list
    +--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
    | uuid                                 | name                                    | generation | root_provider_uuid                   | parent_provider_uuid                 |
    +--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
    | 359cccf7-9c64-4edc-a35d-f4673e485a04 | node001.vstoragedomain_pci_0000_01_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
    | b8443d1b-b941-4bf5-ab4b-2dc7c64ac7d1 | node001.vstoragedomain_pci_0000_81_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
    | 4936695a-4711-425a-b0e4-fdab5e4688d6 | node001.vstoragedomain                  |        823 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | None                                 |
    +--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+

    In this output, the resource provider with the ID 4936695a-4711-425a-b0e4-fdab5e4688d6 has two child resource providers for two physical GPUs with PCI addresses 0000_01_00_0 and 0000_81_00_0.

  2. Use the obtained ID of a child resource provider to list its inventory. For example:

    # openstack --insecure resource provider inventory list 359cccf7-9c64-4edc-a35d-f4673e485a04
    +----------------+------------------+----------+----------+-----------+----------+-------+
    | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
    +----------------+------------------+----------+----------+-----------+----------+-------+
    | VGPU           |              1.0 |        8 |        0 |         1 |        1 |     8 |
    +----------------+------------------+----------+----------+-----------+----------+-------+

    The child resource provider has vGPU resources that can be allocated to virtual machines.
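
    Free capacity is total minus reserved (allocation_ratio is 1.0 for vGPUs, so no overcommit applies). A sketch that computes it from a saved inventory row; the sample row below stands in for live openstack output:

```shell
# Compute free vGPU capacity from a saved "resource provider inventory
# list" row: the 8th "|"-separated field is total, the 5th is reserved.
free=$(awk -F'|' '/VGPU/ { gsub(/ /, ""); print $8 - $5 }' <<'EOF'
| VGPU           |              1.0 |        8 |        0 |         1 |        1 |     8 |
EOF
)
echo "free vGPUs: $free"
```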

To create a virtual machine with a vGPU

If you want to use one or multiple vGPU types in the compute cluster, create flavors that request these virtual GPUs, and then create virtual machines with these flavors:

  1. Create a flavor, as described in Creating flavors, specifying the GPU aliases from the pci-passthrough.yaml file and the number of vGPUs to use. You can create one flavor with multiple GPU aliases or a separate flavor for each alias. For example:

    • To create the vgpu-flavor flavor with both aliases nvidia-224 and nvidia-232, run:

      # vinfra service compute flavor create vgpu-flavor --vcpus 16 --ram 65536 --gpu nvidia-224:2 --gpu nvidia-232:1 --public
      
    • To create two different flavors: vgpu224-flavor with the nvidia-224 alias and vgpu232-flavor with the nvidia-232 alias, run:

      # vinfra service compute flavor create vgpu224-flavor --vcpus 16 --ram 65536 --gpu nvidia-224:2 --public
      # vinfra service compute flavor create vgpu232-flavor --vcpus 16 --ram 65536 --gpu nvidia-232:1 --public
  2. Create a virtual machine, as described in Creating virtual machines, specifying the prepared flavors. For example, to create the VM vgpu224-vm with the vgpu224-flavor flavor and the VM vgpu232-vm with the vgpu232-flavor flavor, run:

    # vinfra service compute server create vgpu224-vm --network <network_name> --volume source=image,id=<image_id>,size=64 --flavor vgpu224-flavor
    # vinfra service compute server create vgpu232-vm --network <network_name> --volume source=image,id=<image_id>,size=64 --flavor vgpu232-flavor

To check the GPU in a virtual machine

  1. Log in to the VM via SSH:

    # ssh <username>@<vm_ip_address>
  2. Install the NVIDIA drivers:

    # sudo apt update && sudo apt install -y nvidia-driver-470 nvidia-utils-470
  3. Check the GPU by running:

    # nvidia-smi

    The GPU should be recognized and operational.