Configuring GPU passthrough

Limitations

  • Virtual machines with attached physical GPUs cannot be live migrated.

Procedure overview

  1. Prepare a compute node for GPU passthrough.
  2. Reconfigure the compute cluster to enable GPU passthrough.
  3. Create a virtual machine with an attached physical GPU.
  4. Verify the attached GPU in the virtual machine.

To prepare a node for GPU passthrough

  1. List all graphics cards on a node and obtain their vendor IDs (VID), product IDs (PID), and PCI domain addresses:

    # vinfra node gpu list
    +---------------+---------------+---------+----------+--------------------+-----------+------------------------+-----------+------------+-----------------+-------------+--------------------+
    | id            | node_id       | host    | status   | vendor             | vendor_id | device                 | device_id | alias      | mode            | pci_address | pci_domain_address |
    +---------------+---------------+---------+----------+--------------------+-----------+------------------------+-----------+------------+-----------------+-------------+--------------------+
    | 1269b15e<...> | c3b2321a<...> | node001 | attached | NVIDIA Corporation | 10de      | TU104GL [Tesla T4]     | 1eb8      |            |                 | 01:00.0     | 0000:01:00.0       |
    | be6c7558<...> | c3b2321a<...> | node001 | attached | NVIDIA Corporation | 10de      | TU104GL [Tesla T4]     | 1eb8      |            |                 | 81:00.0     | 0000:81:00.0       |
    +---------------+---------------+---------+----------+--------------------+-----------+------------------------+-----------+------------+-----------------+-------------+--------------------+

    Here, 10de:1eb8 is the VID and PID pair of the graphics cards; 0000:01:00.0 and 0000:81:00.0 are their PCI domain addresses.
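As a cross-check, the same VID and PID pair appears in square brackets in the output of lspci -nn on the node. A minimal parsing sketch (the sample line below is illustrative, not captured from a real node):

```shell
# On the node, you would run: lspci -nn | grep -i nvidia
# Sample output line (illustrative, matching the Tesla T4 above):
line='01:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8]'
# The trailing bracketed pair is <VID>:<PID>
vid_pid=$(printf '%s\n' "$line" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]$' | tr -d '[]')
echo "$vid_pid"   # 10de:1eb8
```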

  2. Detach the graphics card from the node:

    • To detach all graphics cards that share a VID and PID, run the pci-helper.py detach script with the <VID>:<PID> pair. For NVIDIA graphics cards, additionally blacklist the Nouveau driver by adding the --blacklist-nouveau option. For example:

      # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py detach 10de:1eb8 --blacklist-nouveau

      The command detaches all of the graphics cards with the VID and PID 10de:1eb8 from the node and prevents the Nouveau driver from loading.

    • To detach a particular graphics card, use its PCI domain address with the pci-helper.py bind-to-stub script, which binds the pci-stub driver to the GPU at that address. For example:

      # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py bind-to-stub 0000:01:00.0

      The command detaches the graphics card with the PCI domain address 0000:01:00.0 from the node by binding it to the pci-stub driver.
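Either way, you can confirm on the node which kernel driver now claims the card. The lspci -k command prints a "Kernel driver in use" line; after bind-to-stub it should read pci-stub rather than nouveau or nvidia. A small parsing sketch (the sample line is illustrative):

```shell
# On the node, you would run: lspci -k -s 0000:01:00.0
# Helper that extracts the bound driver from lspci -k output:
driver_in_use() {
    grep 'Kernel driver in use' | awk -F': ' '{print $2}'
}
# Sample output line (illustrative):
sample='Kernel driver in use: pci-stub'
printf '%s\n' "$sample" | driver_in_use   # pci-stub
```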

  3. Enable IOMMU on the node by running the pci-helper.py enable-iommu script and reboot the node to apply the changes:

    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
    # reboot

    The script works for both Intel and AMD processors.

  4. Verify that IOMMU is enabled in the dmesg output:

    # dmesg | grep -e DMAR -e IOMMU
    [    0.000000] DMAR: IOMMU enabled
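As an additional check (standard Linux sysfs behavior, not specific to this product), an active IOMMU populates /sys/kernel/iommu_groups with one subdirectory per group:

```shell
# Returns success if the given IOMMU groups directory exists and is non-empty.
# On a node with IOMMU enabled, /sys/kernel/iommu_groups holds one
# subdirectory per IOMMU group.
iommu_active() {
    dir=${1:-/sys/kernel/iommu_groups}
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
# Usage on the node:
#   iommu_active && echo "IOMMU groups present"
```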

To enable GPU passthrough for the compute cluster

  1. Create a configuration file in the YAML format. For example:

    # cat << EOF > pci-passthrough.yaml
    - node_id: c3b2321a-7c12-8456-42ce-8005ff937e12
      devices:
        - device_type: generic
          device: 10de:1eb8
          alias: gpu
    EOF

    In this example:

    • node_id is the UUID of the compute node that hosts a physical GPU
    • generic is the device type for a physical GPU that will be passed through
    • 10de:1eb8 is the VID and PID of a physical GPU
    • gpu is an arbitrary name that will be used as an alias for a physical GPU

    If a compute node has multiple graphics cards, it can be configured for both GPU passthrough and GPU virtualization at the same time, using different cards for each.

  2. Reconfigure the compute cluster by using this configuration file:

    # vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
    +---------+--------------------------------------+
    | Field   | Value                                |
    +---------+--------------------------------------+
    | task_id | 89c8a6c4-f480-424e-ab44-c2f4e2976eb9 |
    +---------+--------------------------------------+
  3. Check the status of the task:

    # vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9
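The configuration task completes asynchronously. If you want to block until it finishes, a generic polling helper can wrap the command above (this assumes the task state appears as the word "running" in the vinfra task show output; verify against your version):

```shell
# Poll a status command until its output no longer contains "running".
# $@ is the command to run, e.g.: wait_done vinfra task show <task_id>
wait_done() {
    while "$@" | grep -q 'running'; do
        sleep 5
    done
}
# Usage on the management node (task ID from the previous step):
#   wait_done vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9
```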

To create a virtual machine with an attached physical GPU

  1. Create a flavor, as described in Creating flavors, specifying the GPU alias from the pci-passthrough.yaml file and the number of GPUs to attach. For example, to create the gpu-flavor flavor with 8 vCPUs, 16 GiB of RAM, and 2 passthrough GPUs, run:

    # vinfra service compute flavor create gpu-flavor --ram 16384 --vcpus 8 --gpu gpu:2 --public
  2. Create a virtual machine, as described in Creating virtual machines, specifying gpu-flavor. For example, to create the VM gpu-vm, run:

    # vinfra service compute server create gpu-vm --network <network_name> --volume source=image,id=<image_id>,size=64 --flavor gpu-flavor

To check the GPU in a virtual machine

  1. Log in to the VM via SSH:

    # ssh <username>@<vm_ip_address>
  2. Install the NVIDIA drivers. For example, on Ubuntu:

    # sudo apt update && sudo apt install -y nvidia-driver-470 nvidia-utils-470
  3. Check the GPU by running:

    # nvidia-smi

    The GPU should be recognized and operational.
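Since the gpu-flavor flavor in this walkthrough requested two GPUs (--gpu gpu:2), you can also count the devices that nvidia-smi reports inside the VM. A hedged sketch (the sample output below is illustrative):

```shell
# Inside the VM, you would run: nvidia-smi -L | wc -l
# Sample nvidia-smi -L output (illustrative UUIDs):
sample='GPU 0: Tesla T4 (UUID: GPU-aaaa)
GPU 1: Tesla T4 (UUID: GPU-bbbb)'
count=$(printf '%s\n' "$sample" | wc -l)
# The count should match the number of GPUs requested in the flavor (2 here).
[ "$count" -eq 2 ] && echo "GPU count matches the flavor"
```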