Configuring GPU virtualization

Before configuring GPU virtualization, you need to check whether your NVIDIA graphics card supports SR-IOV. The SR-IOV technology enables splitting a single physical device (physical function) into several virtual devices (virtual functions).

  • Legacy GPUs are based on the NVIDIA Tesla architecture and have no SR-IOV support. For such GPUs, virtualization is performed by creating a mediated device (mdev) over the physical function.
  • Modern GPUs are based on the NVIDIA Ampere architecture or newer and support SR-IOV. For such GPUs, virtualization is performed by creating an mdev over a virtual function.

For more details, refer to the official NVIDIA documentation.
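
In both cases, the mediated device is exposed through the standard Linux VFIO mdev interface in sysfs. The commands below only illustrate that mechanism: the PCI address, vGPU type, and UUID are placeholders, and in this product the mediated devices are created for you when a virtual machine requests a vGPU, so you do not normally run these commands yourself.

  # TYPE_DIR=/sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/nvidia-224   # for SR-IOV GPUs, the path goes through a virtual function (.../virtfn0/mdev_supported_types/...)
  # UUID=$(uuidgen)
  # echo "$UUID" > "$TYPE_DIR/create"              # create a mediated device of this type
  # ls /sys/bus/mdev/devices/                      # the new mdev appears here
  # echo 1 > "/sys/bus/mdev/devices/$UUID/remove"  # remove it again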

Limitations

  • Virtual machines with attached vGPUs cannot be suspended or live migrated.

Prerequisites

Procedure overview

  1. Prepare a compute node for GPU virtualization.
  2. Reconfigure the compute cluster to enable vGPU support.
  3. Change the vGPU type for a physical GPU.
  4. Check that the node has vGPU resources for allocation.
  5. Create a virtual machine with an attached vGPU.
  6. Verify the attached GPU in the virtual machine.

To enable vGPU on a node

  1. List all graphics cards on the node and obtain their PCI addresses:

    # lspci -D | grep NVIDIA
    0000:01:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
    0000:81:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

    In the command output, 0000:01:00.0 and 0000:81:00.0 are the PCI addresses of the graphics cards.

  2. On the node with the physical GPU, do one of the following:

  3. Install the vGPU NVIDIA driver:

    1. Install the kernel-devel and dkms packages:

      # dnf install kernel-devel dkms
      
    2. Enable and start the dkms service:

      # systemctl enable dkms.service 
      # systemctl start dkms.service
    3. Install the vGPU KVM kernel module from the NVIDIA GRID package with the --dkms option:

      # bash NVIDIA-Linux-x86_64-xxx.xx.xx-vgpu-kvm*.run --dkms
    4. Re-create the Linux boot image by running:

      # dracut -f
  4. Enable IOMMU on the node by running the pci-helper.py enable-iommu script and reboot the node to apply the changes:

    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
    # reboot

    The script works for both Intel and AMD processors.

  5. Verify that IOMMU is enabled in the dmesg output:

    # dmesg | grep -e DMAR -e IOMMU
    [    0.000000] DMAR: IOMMU enabled
  6. [For modern GPUs with SR-IOV support] Enable the virtual functions for your GPU:

    # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py nvidia-sriov-mgr --enable
  7. Verify that vGPU is enabled on the node:

    • [For legacy GPUs without SR-IOV support] Check the /sys/bus/pci/devices/<pci_address>/mdev_supported_types directory. For example, for the GPU with the PCI address 0000:01:00.0, run:

      # ls /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types
      nvidia-222  nvidia-223  nvidia-224  nvidia-225  nvidia-226  nvidia-227  nvidia-228  nvidia-229  nvidia-230  nvidia-231
      nvidia-232  nvidia-233  nvidia-234  nvidia-252  nvidia-319  nvidia-320  nvidia-321

      For a vGPU-enabled card, the directory contains a list of supported vGPU types. A vGPU type is a vGPU configuration that defines the vRAM size, maximum resolution, maximum number of supported vGPUs, and other parameters. A quick way to read these details for a specific type is sketched after this procedure.

    • [For modern GPUs with SR-IOV support] Check supported vGPU types and the number of available instances per vGPU type. For example, for the GPU with the PCI address 0000:c1:00.0, run:

      # cd /sys/bus/pci/devices/0000:c1:00.0/virtfn0/mdev_supported_types
      # grep -vR --include=available_instances 0
      nvidia-568/available_instances:1
      nvidia-558/available_instances:1
      nvidia-556/available_instances:1

      In the command output, the supported types are nvidia-568, nvidia-558, and nvidia-556, and each virtual function can host one instance.
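
As an optional extra check after completing this procedure, you can read the human-readable details of a vGPU type directly from sysfs and look at the host driver's view of the card. The PCI address 0000:01:00.0 and the type nvidia-224 below come from the example output above, and the nvidia-smi vgpu subcommand becomes available once the vGPU host driver from step 3 is installed:

  # cat /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/nvidia-224/name          # human-readable vGPU type name
  # cat /sys/bus/pci/devices/0000:01:00.0/mdev_supported_types/nvidia-224/description   # vRAM size, resolution, and instance limits
  # nvidia-smi vgpu                                                                     # host-side list of vGPU-capable GPUs and running vGPUs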

To enable vGPU support for the compute cluster

  1. Create a configuration file in the YAML format. For example:

    # cat << EOF > pci-passthrough.yaml
    - node_id: 1d6481c2-1fd5-406b-a0c7-330f24bd0e3d
      devices:
        - device_type: pgpu
          device: "0000:01:00.0"
          vgpu_type: nvidia-224
        - device_type: pgpu
          device: "0000:81:00.0"
          vgpu_type: nvidia-232
    EOF

    In this example:

    • node_id is the UUID of the compute node that hosts a physical GPU
    • pgpu is the device type for a physical GPU that will be virtualized
    • "0000:01:00.0" and "0000:81:00.0" are the PCI addresses of the physical GPUs
    • nvidia-224 and nvidia-232 are the vGPU types that will be enabled for the physical GPUs

    If a compute node has multiple graphics cards, it can be configured for both GPU passthrough and virtualization. Because the file is a YAML list, it can also describe several nodes at once, as sketched after this procedure.

  2. Reconfigure the compute cluster by using this configuration file:

    # vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
    +---------+--------------------------------------+
    | Field   | Value                                |
    +---------+--------------------------------------+
    | task_id | 89c8a6c4-f480-424e-ab44-c2f4e2976eb9 |
    +---------+--------------------------------------+
  3. Check the status of the task:

    # vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9
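
A minimal sketch of a configuration file that covers two compute nodes is shown below. The second node UUID and its PCI address are placeholders; substitute the UUIDs and addresses of your own nodes and GPUs.

  # cat << EOF > pci-passthrough.yaml
  - node_id: 1d6481c2-1fd5-406b-a0c7-330f24bd0e3d
    devices:
      - device_type: pgpu
        device: "0000:01:00.0"
        vgpu_type: nvidia-224
  - node_id: 00000000-0000-4000-8000-000000000002
    devices:
      - device_type: pgpu
        device: "0000:85:00.0"
        vgpu_type: nvidia-232
  EOF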

To change the vGPU type for a physical GPU

  1. Ensure that the compute cluster has no virtual machines with the current vGPU type (one way to check this is sketched after this procedure).
  2. Modify the configuration file. For example, replace nvidia-224 with nvidia-231 in the vgpu_type field:

    - device_type: pgpu
      device: "0000:01:00.0"
      vgpu_type: nvidia-231
  3. Pass the configuration file to the vinfra service compute set command. For example:

    # vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
  4. Reboot the node with the physical GPU to apply changes:

    # reboot
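
As a sketch of the check mentioned in step 1, you can ask the placement service how many vGPUs are currently allocated from the GPU's resource providers; the usage reported for the VGPU resource class should be 0 before you change the type. The resource provider ID is obtained as described in the next procedure, and resource provider usage show is a standard command of the OpenStack placement CLI:

  # openstack --insecure resource provider usage show <resource_provider_id>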

To check that a node has vGPU resources for allocation

  1. List resource providers in the compute cluster to obtain their IDs. For example:

    # openstack --insecure resource provider list
    +--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
    | uuid                                 | name                                    | generation | root_provider_uuid                   | parent_provider_uuid                 |
    +--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
    | 359cccf7-9c64-4edc-a35d-f4673e485a04 | node001.vstoragedomain_pci_0000_01_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
    | b8443d1b-b941-4bf5-ab4b-2dc7c64ac7d1 | node001.vstoragedomain_pci_0000_81_00_0 |          1 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | 4936695a-4711-425a-b0e4-fdab5e4688d6 |
    | 4936695a-4711-425a-b0e4-fdab5e4688d6 | node001.vstoragedomain                  |        823 | 4936695a-4711-425a-b0e4-fdab5e4688d6 | None                                 |
    +--------------------------------------+-----------------------------------------+------------+--------------------------------------+--------------------------------------+
    

    In this output, the resource provider with the ID 4936695a-4711-425a-b0e4-fdab5e4688d6 has two child resource providers for the two physical GPUs with PCI addresses 0000:01:00.0 and 0000:81:00.0.

  2. Use the obtained ID of a child resource provider to list its inventory. For example:

    # openstack --insecure resource provider inventory list 359cccf7-9c64-4edc-a35d-f4673e485a04
    +----------------+------------------+----------+----------+-----------+----------+-------+
    | resource_class | allocation_ratio | max_unit | reserved | step_size | min_unit | total |
    +----------------+------------------+----------+----------+-----------+----------+-------+
    | VGPU           |              1.0 |        8 |        0 |         1 |        1 |     8 |
    +----------------+------------------+----------+----------+-----------+----------+-------+

    The child resource provider has vGPU resources that can be allocated to virtual machines.
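
To see how much of this inventory is already consumed, or whether a vGPU can still be scheduled somewhere in the cluster, you can also query the placement service. Both commands below are standard OpenStack placement CLI commands; the UUID is the child resource provider from the example above:

  # openstack --insecure resource provider usage show 359cccf7-9c64-4edc-a35d-f4673e485a04    # current VGPU usage on this GPU
  # openstack --insecure allocation candidate list --resource VGPU=1                          # providers that can still serve one vGPU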

To create a virtual machine with a vGPU

  • If you use only one vGPU type in the compute cluster, you need to create a flavor that requests one virtual GPU, and then create virtual machines with this flavor:

    1. Create a flavor with the resources property specifying the number of vGPUs to use. For example, to create the vgpu-flavor flavor with 2 vCPUs and 4 GiB of RAM, run:

      # openstack --insecure flavor create --ram 4096 --vcpus 2 --property resources:VGPU=1 --public vgpu-flavor
    2. Some drivers may require hiding the hypervisor signature. To do this, add the hide_hypervisor_id property to the flavor:

      # openstack --insecure flavor set vgpu-flavor --property hide_hypervisor_id=true
    3. Create a boot volume from an image (for example, Ubuntu):

      # openstack --insecure volume create --size 20 --image ubuntu vgpu-boot-volume
      
    4. Create a virtual machine specifying vgpu-flavor and vgpu-boot-volume. For example, to create the VM vgpu-vm, run:

      # openstack --insecure server create --flavor vgpu-flavor --volume vgpu-boot-volume --network <network_name> vgpu-vm

    The created virtual machine will have a virtual GPU of the type that is configured in the compute cluster.

  • If you want to use multiple vGPU types in the compute cluster, you need to manually create CUSTOM_NVIDIA_XXX traits, assign them to corresponding vGPU resource providers, and then proceed to create flavors and virtual machines with the assigned traits:

    1. List resource providers in the compute cluster to obtain their IDs. For example:

      # openstack --insecure resource provider list
      +--------------------------------------+-----------------------------------------+------------+--------------------+----------------------+
      | uuid                                 | name                                    | generation | root_provider_uuid | parent_provider_uuid |
      +--------------------------------------+-----------------------------------------+------------+--------------------+----------------------+
      | 7d2ef259-42df-4ef8-8eaa-66c3b7448fc3 | node001.vstoragedomain_pci_0000_85_00_0 |         62 | 1f08c319-f270-<…>  | 1f08c319-f270-<…>    |
      | 94a84fc6-2f28-46d5-93e1-e588e347dd3b | node001.vstoragedomain_pci_0000_10_00_0 |         38 | 1f08c319-f270-<…>  | 1f08c319-f270-<…>    |
      | 41c177e3-6998-4e56-8d29-f98f72fef910 | node002.vstoragedomain_pci_0000_85_00_0 |         13 | 9dbc8c64-0048-<…>  | 9dbc8c64-0048-<…>    |
      | 7fd1d10f-9ceb-4cd1-acec-a1254755211b | node002.vstoragedomain_pci_0000_10_00_0 |         13 | 9dbc8c64-0048-<…>  | 9dbc8c64-0048-<…>    |
      +--------------------------------------+-----------------------------------------+------------+--------------------+----------------------+
    2. Create custom traits that correspond to different GPU types. For example, to create the traits CUSTOM_NVIDIA_231 and CUSTOM_NVIDIA_232, run:

      # openstack --insecure trait create CUSTOM_NVIDIA_231
      # openstack --insecure trait create CUSTOM_NVIDIA_232
    3. Add the corresponding trait to the resource provider matching the GPU. For example:

      # openstack --insecure resource provider trait set --trait CUSTOM_NVIDIA_231 7d2ef259-42df-4ef8-8eaa-66c3b7448fc3
      +-------------------+
      | name              |
      +-------------------+
      | CUSTOM_NVIDIA_231 |
      +-------------------+
      # openstack --insecure resource provider trait set --trait CUSTOM_NVIDIA_231 94a84fc6-2f28-46d5-93e1-e588e347dd3b
      +-------------------+
      | name              |
      +-------------------+
      | CUSTOM_NVIDIA_231 |
      +-------------------+

      Now, the trait CUSTOM_NVIDIA_231 is assigned to the vGPU resource providers of the node node001. To assign the trait CUSTOM_NVIDIA_232 to the vGPU resource providers of the node node002, run:

      # openstack --insecure resource provider trait set --trait CUSTOM_NVIDIA_232 41c177e3-6998-4e56-8d29-f98f72fef910
      +-------------------+
      | name              |
      +-------------------+
      | CUSTOM_NVIDIA_232 |
      +-------------------+
      # openstack --insecure resource provider trait set --trait CUSTOM_NVIDIA_232 7fd1d10f-9ceb-4cd1-acec-a1254755211b
      +-------------------+
      | name              |
      +-------------------+
      | CUSTOM_NVIDIA_232 |
      +-------------------+
    4. Create flavors with the resources property specifying the number of vGPUs to use. For example, to create the vgpu231-flavor flavor with 2 vCPUs and 4 GiB of RAM and the vgpu232-flavor flavor with 4 vCPUs and 8 GiB of RAM, run:

      # openstack --insecure flavor create --ram 4096 --vcpus 2 --property resources:VGPU=1 --public vgpu231-flavor
      # openstack --insecure flavor create --ram 8192 --vcpus 4 --property resources:VGPU=1 --public vgpu232-flavor
    5. Add the requested traits to your flavors. For example, to add the traits CUSTOM_NVIDIA_231 and CUSTOM_NVIDIA_232 to the flavors vgpu231-flavor and vgpu232-flavor, run:

      # openstack --insecure flavor set --property trait:CUSTOM_NVIDIA_231=required vgpu231-flavor
      # openstack --insecure flavor set --property trait:CUSTOM_NVIDIA_232=required vgpu232-flavor
    6. Create a boot volume from an image (for example, Ubuntu):

      # openstack --insecure volume create --size 20 --image ubuntu vgpu-boot-volume
      
    7. Create virtual machines specifying the prepared flavors and vgpu-boot-volume. For example, to create the VM vgpu231-vm with the vgpu231-flavor flavor and the VM vgpu232-vm with the vgpu232-flavor flavor, run:

      # openstack --insecure server create --volume vgpu-boot-volume --flavor vgpu231-flavor --network <network_name> vgpu231-vm
      # openstack --insecure server create --volume vgpu-boot-volume --flavor vgpu232-flavor --network <network_name> vgpu232-vm

    The created virtual machines will have virtual GPUs of different types that are configured in the compute cluster.
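
If you want to confirm which physical GPU a machine was actually scheduled to, you can inspect its placement allocations. This is only a sketch: vgpu231-vm is the example VM created above, and resource provider allocation show is a standard OpenStack placement CLI command that takes the server UUID as the consumer ID:

  # SERVER_ID=$(openstack --insecure server show vgpu231-vm -f value -c id)
  # openstack --insecure resource provider allocation show $SERVER_ID    # VGPU=1 is listed against the GPU's child resource provider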

To check the GPU in a virtual machine

  1. Log in to the VM via SSH:

    # ssh <username>@<vm_ip_address>
  2. Install the NVIDIA drivers:

    # sudo apt update && sudo apt install -y nvidia-driver-470 nvidia-utils-470
  3. Check the GPU by running:

    # nvidia-smi

    The GPU should be recognized and operational.
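
As a supplementary check, you can also confirm inside the VM that the virtual GPU is visible at the PCI level and reported by the driver. These are generic Linux commands:

  # lspci | grep -i nvidia    # the vGPU appears as an NVIDIA controller
  # nvidia-smi -L             # lists the GPU by name and UUID once the driver is loaded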