Configuring GPU passthrough
Limitations
- Virtual machines with attached physical GPUs cannot be live migrated.
Prerequisites
- To authorize further OpenStack commands, the OpenStack command-line client must be configured, as outlined in Connecting to OpenStack command-line interface.
Procedure overview
- Prepare a compute node for GPU passthrough.
- Reconfigure the compute cluster to enable GPU passthrough.
- Create a virtual machine with an attached physical GPU.
- Verify the attached GPU in the virtual machine.
To prepare a node for GPU passthrough
- List all graphics cards on a node and obtain their VID and PID:
  # lspci -nnD | grep NVIDIA
  0000:01:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
  0000:81:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
  Here, [10de:1eb8] is the VID and PID of the graphics cards; 0000:01:00.0 and 0000:81:00.0 are their PCI addresses.
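  If the node also carries graphics cards from other vendors, a broader, generic lspci filter (not specific to this product) lists every display device together with its VID, PID, and PCI address:
  # lspci -nnD | grep -Ei 'vga|3d controller'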
- Detach the graphics card from the node (the result can be verified after the reboot, as shown at the end of this procedure):
  - To detach multiple graphics cards with the same VID and PID, run the pci-helper.py detach script. For NVIDIA graphics cards, additionally blacklist the Nouveau driver. For example:
    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py detach 10de:1eb8 --blacklist-nouveau
    The command detaches all of the graphics cards with the VID and PID 10de:1eb8 from the node and prevents the Nouveau driver from loading.
  - To detach a particular graphics card, use its PCI address with the pci-helper.py bind-to-stub script, which assigns the pci-stub driver to the GPU at that PCI address. For example:
    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py bind-to-stub 0000:01:00.0
    The command detaches the graphics card with the PCI address 0000:01:00.0 from the node.
  - If you have multiple graphics cards detached from the node with pci-helper.py detach but want to use only one of them for GPU passthrough, revert the detachment, and then detach that one card with pci-helper.py bind-to-stub. The other graphics cards on the node can then be used as vGPUs.
    To revert multiple GPU detachment:
    - In the /etc/default/grub file, locate the GRUB_CMDLINE_LINUX line, and then delete pci-stub.ids=<gpu_vid>:<gpu_pid> rd.driver.blacklist=nouveau nouveau.modeset=0. The resulting file may look as follows:
      # cat /etc/default/grub | grep CMDLINE
      GRUB_CMDLINE_LINUX="crashkernel=auto tcache.enabled=0 quiet iommu=pt"
    - Regenerate the GRUB configuration file.
      - On a BIOS-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2.cfg --update-bls-cmdline
      - On a UEFI-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg --update-bls-cmdline
    - Delete the /etc/modprobe.d/blacklist-nouveau.conf file.
    - Re-create the Linux boot image by running:
      # dracut -f
    - Reboot the node to apply the changes:
      # reboot
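      After the node is back up, you can confirm that the stub and blacklist parameters are gone from the running kernel command line (a generic Linux check, not specific to this product); the pci-stub.ids, rd.driver.blacklist, and nouveau.modeset entries should no longer appear in the output of:
      # cat /proc/cmdline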
- Enable IOMMU on the node by running the pci-helper.py enable-iommu script, and then reboot the node to apply the changes:
  # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
  # reboot
  The script works for both Intel and AMD processors.
- Verify that IOMMU is enabled in the dmesg output:
  # dmesg | grep -e DMAR -e IOMMU
  [    0.000000] DMAR: IOMMU enabled
  On AMD processors, look for AMD-Vi messages instead of DMAR.
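  Optionally, you can also check which kernel driver now claims a detached card (the PCI address comes from the earlier lspci example); for a card prepared for passthrough, the Nouveau driver should no longer be in use, and pci-stub is typically listed instead:
  # lspci -nnk -s 0000:01:00.0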
To enable GPU passthrough for the compute cluster
- Create a configuration file in the YAML format. For example:
  # cat << EOF > pci-passthrough.yaml
  - node_id: c3b2321a-7c12-8456-42ce-8005ff937e12
    devices:
    - device_type: generic
      device: 10de:1eb8
      alias: gpu
  EOF
  In this example:
  - node_id is the UUID of the compute node that hosts a physical GPU (see below for how to obtain it)
  - generic is the device type for a physical GPU that will be passed through
  - 10de:1eb8 is the VID and PID of a physical GPU
  - gpu is an arbitrary name that will be used as an alias for a physical GPU
  If a compute node has multiple graphics cards, it can be configured for both GPU passthrough and virtualization.
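  To find the UUID to use for node_id, you can list the cluster nodes with the vinfra command-line tool (this assumes the vinfra client is configured on the management node), for example:
  # vinfra node list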
- Reconfigure the compute cluster by using this configuration file:
  # vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
  +---------+--------------------------------------+
  | Field   | Value                                |
  +---------+--------------------------------------+
  | task_id | 89c8a6c4-f480-424e-ab44-c2f4e2976eb9 |
  +---------+--------------------------------------+
- Check the status of the task:
  # vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9
If the compute configuration fails
Check whether the following error appears in /var/log/vstorage-ui-backend/ansible.log:
2021-09-23 16:42:59,796 p=32130 u=vstoradmin | fatal: [32c8461b-92ec-48c3-ae02-4d12194acd02]: FAILED! => {"changed": true, "cmd": "echo 4 > /sys/class/net/enp103s0f1/device/sriov_numvfs", "delta": "0:00:00.127417", "end": "2021-09-23 19:42:59.784281", "msg": "non-zero return code", "rc": 1, "start": "2021-09-23 19:42:59.656864", "stderr": "/bin/sh: line 0: echo: write error: Cannot allocate memory", "stderr_lines": ["/bin/sh: line 0: echo: write error: Cannot allocate memory"], "stdout": "", "stdout_lines": []}
In this case, run the pci-helper.py enable-iommu script with the --pci-realloc option, and reboot the node:
# /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu --pci-realloc
# reboot
When the node is up again, repeat the vinfra service compute set command.
To create a virtual machine with an attached physical GPU
- Create a flavor with the pci_passthrough property, specifying the GPU alias from the pci-passthrough.yaml file and the number of GPUs to use. For example, to create the gpu-flavor flavor with 8 vCPUs and 16 GiB of RAM, run:
  # openstack --insecure flavor create --ram 16384 --vcpus 8 --property "pci_passthrough:alias"="gpu:1" --public gpu-flavor
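  You can confirm that the alias was applied by checking that pci_passthrough:alias appears among the flavor's properties:
  # openstack --insecure flavor show gpu-flavor -c properties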
- Some drivers may require hiding the hypervisor signature. To do this, add the hide_hypervisor_id property to the flavor:
  # openstack --insecure flavor set gpu-flavor --property hide_hypervisor_id=true
- Create a boot volume from an image (for example, Ubuntu):
  # openstack --insecure volume create --size 20 --image ubuntu gpu-boot-volume
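  Before using the volume, you can check that it has reached the available state:
  # openstack --insecure volume show gpu-boot-volume -c status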
- Create a virtual machine specifying gpu-flavor and gpu-boot-volume. For example, to create the gpu-vm VM, run:
  # openstack --insecure server create --flavor gpu-flavor --volume gpu-boot-volume --network <network_name> gpu-vm
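  You can then check that the VM reaches the ACTIVE state:
  # openstack --insecure server show gpu-vm -c status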
To check the GPU in a virtual machine
- Log in to the VM via SSH:
  # ssh <username>@<vm_ip_address>
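  Before installing the drivers, you can confirm that the passed-through GPU is visible on the guest's PCI bus (this assumes the pciutils package is available in the guest image):
  # lspci -nn | grep -i nvidia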
- Install the NVIDIA drivers:
  # sudo apt update && sudo apt install -y nvidia-driver-470 nvidia-utils-470
- Check the GPU by running:
  # nvidia-smi
  The GPU should be recognized and operational.
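  For a more compact check, nvidia-smi can also be asked to report only selected fields, for example:
  # nvidia-smi --query-gpu=name,driver_version --format=csv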