Configuring GPU passthrough

Limitations

  • Virtual machines with attached physical GPUs cannot be live migrated.

Prerequisites

Procedure overview

  1. Prepare a compute node for GPU passthrough.
  2. Reconfigure the compute cluster to enable GPU passthrough.
  3. Create a virtual machine with an attached physical GPU.
  4. Verify the attached GPU in the virtual machine.

To prepare a node for GPU passthrough

  1. List all graphics cards on a node and obtain their VID and PID:

    # lspci -nnD | grep NVIDIA
    0000:01:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)
    0000:81:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)

    [10de:1eb8] is the VID (vendor ID) and PID (product ID) of the graphics cards; 0000:01:00.0 and 0000:81:00.0 are their PCI addresses.
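
For scripting the later steps, the VID:PID pair and the PCI address can be extracted from the lspci output. A minimal sketch, using a sample line captured from the output above (in practice, pipe the live lspci -nnD output instead of the here-string):

```shell
# Sample line from `lspci -nnD | grep NVIDIA`; replace with live output in practice.
line='0000:01:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)'

# The PCI address is the first whitespace-separated field.
pci_addr=$(printf '%s\n' "$line" | awk '{print $1}')

# The VID:PID is the bracketed pair that contains a colon, e.g. [10de:1eb8];
# class codes such as [0302] do not match because they lack the colon.
vid_pid=$(printf '%s\n' "$line" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tr -d '[]')

echo "$pci_addr"   # 0000:01:00.0
echo "$vid_pid"    # 10de:1eb8
```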

  2. Detach the graphics card from the node:

    • To detach multiple graphics cards with the same VID and PID, run the pci-helper.py detach script. For NVIDIA graphics cards, additionally blacklist the Nouveau driver. For example:

      # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py detach 10de:1eb8 --blacklist-nouveau

      The command detaches all of the graphics cards with the VID and PID 10de:1eb8 from the node and prevents the Nouveau driver from loading.

    • To detach a particular graphics card, run the pci-helper.py bind-to-stub script with the card's PCI address. The script assigns the pci-stub driver to the GPU at that address. For example:

      # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py bind-to-stub 0000:01:00.0

      The command detaches the graphics card with the PCI address 0000:01:00.0 from the node by binding it to the pci-stub driver.

  3. Enable IOMMU on the node by running the pci-helper.py enable-iommu script and reboot the node to apply the changes:

    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
    # reboot

    The script works for both Intel and AMD processors.
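
Conceptually, enabling IOMMU comes down to adding the vendor-specific kernel boot parameter. The sketch below only illustrates that vendor-to-parameter mapping; the actual flags that pci-helper.py sets may differ:

```shell
# Map a CPU vendor string (as reported in /proc/cpuinfo) to the kernel
# boot parameter that enables IOMMU for that vendor. Illustrative only:
# the parameters actually set by pci-helper.py may differ.
iommu_param() {
  case "$1" in
    GenuineIntel) echo "intel_iommu=on" ;;
    AuthenticAMD) echo "amd_iommu=on" ;;
    *)            echo "" ;;
  esac
}

# Detect the local CPU vendor and print the matching parameter.
vendor=$(awk -F': *' '/vendor_id/ {print $2; exit}' /proc/cpuinfo)
iommu_param "$vendor"
```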

  4. Verify that IOMMU is enabled in the dmesg output:

    # dmesg | grep -e DMAR -e IOMMU
    [    0.000000] DMAR: IOMMU enabled

To enable GPU passthrough for the compute cluster

  1. Create a configuration file in the YAML format. For example:

    # cat << EOF > pci-passthrough.yaml
    - node_id: c3b2321a-7c12-8456-42ce-8005ff937e12
      devices:
        - device_type: generic
          device: 10de:1eb8
          alias: gpu
    EOF

    In this example:

    • node_id is the UUID of the compute node that hosts a physical GPU
    • generic is the device type for a physical GPU that will be passed through
    • 10de:1eb8 is the VID and PID of a physical GPU
    • gpu is an arbitrary name that will be used as an alias for a physical GPU

    If a compute node has multiple graphics cards, it can be configured for both GPU passthrough and GPU virtualization at the same time, with different cards serving each purpose.
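
Because the node UUID and device IDs differ per deployment, the configuration file can be templated from shell variables. A sketch, using the illustrative values from the example above (the node UUID can be looked up with, for example, vinfra node list):

```shell
# Illustrative values from the example above; substitute your own
# node UUID and the VID:PID obtained from lspci.
NODE_ID="c3b2321a-7c12-8456-42ce-8005ff937e12"
DEVICE="10de:1eb8"
ALIAS="gpu"

# The unquoted EOF delimiter lets the shell expand the variables.
cat > pci-passthrough.yaml << EOF
- node_id: ${NODE_ID}
  devices:
    - device_type: generic
      device: ${DEVICE}
      alias: ${ALIAS}
EOF
```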

  2. Reconfigure the compute cluster by using this configuration file:

    # vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
    +---------+--------------------------------------+
    | Field   | Value                                |
    +---------+--------------------------------------+
    | task_id | 89c8a6c4-f480-424e-ab44-c2f4e2976eb9 |
    +---------+--------------------------------------+
  3. Check the status of the task:

    # vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9

To create a virtual machine with an attached physical GPU

  1. Create a flavor with the pci_passthrough property specifying the GPU alias from the pci-passthrough.yaml file and the number of GPUs to use. For example, to create the gpu-flavor flavor with 8 vCPUs and 16 GiB of RAM, run:

    # openstack --insecure flavor create --ram 16384 --vcpus 8 --property "pci_passthrough:alias"="gpu:1" --public gpu-flavor
  2. Some drivers require the hypervisor signature to be hidden. To hide it, add the hide_hypervisor_id property to the flavor:

    # openstack --insecure flavor set gpu-flavor --property hide_hypervisor_id=true
  3. Create a boot volume from an image (for example, Ubuntu):

    # openstack --insecure volume create --size 20 --image ubuntu gpu-boot-volume
    
  4. Create a virtual machine specifying gpu-flavor and gpu-boot-volume. For example, to create the VM gpu-vm, run:

    # openstack --insecure server create --flavor gpu-flavor --volume gpu-boot-volume --network <network_name> gpu-vm

To check the GPU in a virtual machine

  1. Log in to the VM via SSH:

    # ssh <username>@<vm_ip_address>
  2. Install the NVIDIA drivers:

    # sudo apt update && sudo apt install -y nvidia-driver-470 nvidia-utils-470
  3. Check the GPU by running:

    # nvidia-smi

    The GPU should be recognized and operational.