Enabling PCI passthrough and vGPU support

To enable PCI passthrough and vGPU support for the compute cluster, create a configuration file in YAML format, and then use it to reconfigure the compute cluster.

To create the PCI passthrough and vGPU configuration file

Specify the identifier (node_id) of a compute node that hosts PCI devices, and then add the host devices that you want to pass through or virtualize (for help identifying devices, see the lookup commands after this list):

  • To create virtual functions for a network adapter, add these lines:

    - device_type: sriov
      device: enp2s0
      physical_network: sriovnet
      num_vfs: 8

    where:

    • sriov is the device type for a network adapter
    • enp2s0 is the device name of a network adapter
    • sriovnet is an arbitrary name that will be used as an alias for a network adapter
    • num_vfs is the number of virtual functions to create for a network adapter

    The maximum number of virtual functions supported by a PCI device is specified in the /sys/class/net/<device_name>/device/sriov_totalvfs file. For example:

    # cat /sys/class/net/enp2s0/device/sriov_totalvfs
    63
  • To enable GPU passthrough, add these lines:

    - device_type: generic
      device: 1b36:0100
      alias: gpu

    where:

    • generic is the device type for a physical GPU that will be passed through
    • 1b36:0100 is the vendor and product ID (VID:PID) pair of a physical GPU
    • gpu is an arbitrary name that will be used as an alias for a physical GPU
  • To enable vGPU, add these lines:

    - device_type: pgpu
      device: "0000:03:00.0"
      vgpu_type: nvidia-224

    where:

    • pgpu is the device type for a physical GPU that will be virtualized
    • "0000:03:00.0" is the PCI address of a physical GPU
    • nvidia-224 is the vGPU type that will be enabled for a physical GPU
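
If you are not sure which device names, identifiers, or addresses to use, standard Linux tools can help you look them up. The commands below are a sketch; the sample output is illustrative and will differ on your hardware. The sysfs glob lists SR-IOV-capable network adapters, and lspci shows the VID:PID pair and PCI address of each GPU:

# ls /sys/class/net/*/device/sriov_totalvfs
/sys/class/net/enp2s0/device/sriov_totalvfs
# lspci -nn | grep -i nvidia
03:00.0 3D controller [0302]: NVIDIA Corporation TU104GL [Tesla T4] [10de:1eb8] (rev a1)

In the lspci output, 03:00.0 is the PCI address (written as "0000:03:00.0" in the configuration file, with the PCI domain prefix), and [10de:1eb8] is the VID:PID pair.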

The entire configuration file may look as follows:

# cat config.yaml
- node_id: c3b2321a-7c12-8456-42ce-8005ff937e12
  devices:
    - device_type: sriov
      device: enp2s0
      physical_network: sriovnet
      num_vfs: 8
    - device_type: generic
      device: 1b36:0100
      alias: gpu
    - device_type: pgpu
      device: "0000:01:00.0"
      vgpu_type: nvidia-232
- node_id: 1d6481c2-1fd5-406b-a0c7-330f24bd0e3d
  devices:
    - device_type: generic
      device: 10de:1eb8
      alias: gpu
    - device_type: pgpu
      device: "0000:03:00.0"
      vgpu_type: nvidia-224
    - device_type: pgpu
      device: "0000:81:00.0"
      vgpu_type: nvidia-228
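
Because the file is parsed as YAML, you can sanity-check its syntax before applying it. This optional step assumes that Python 3 with the PyYAML module is available on the node:

# python3 -c 'import yaml; yaml.safe_load(open("config.yaml")); print("OK")'
OK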

To configure the compute cluster for PCI passthrough and vGPU support

Pass the configuration file to the vinfra service compute set command. For example:

# vinfra service compute set --pci-passthrough-config config.yaml
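
After the command completes, you can verify on the node that the requested virtual functions were created. This is an optional check; the device name follows the earlier example:

# cat /sys/class/net/enp2s0/device/sriov_numvfs
8
# lspci | grep -i "virtual function"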

If the compute configuration fails

Check whether the following error appears in /var/log/vstorage-ui-backend/ansible.log:

2021-09-23 16:42:59,796 p=32130 u=vstoradmin | fatal: [32c8461b-92ec-48c3-ae02-
4d12194acd02]: FAILED! => {"changed": true, "cmd": "echo 4 > /sys/class/net/
enp103s0f1/device/sriov_numvfs", "delta": "0:00:00.127417", "end": "2021-09-23 
19:42:59.784281", "msg": "non-zero return code", "rc": 1, "start": "2021-09-23 
19:42:59.656864", "stderr": "/bin/sh: line 0: echo: write error: Cannot allocate 
memory", "stderr_lines": ["/bin/sh: line 0: echo: write error: Cannot allocate memory"], 
"stdout": "", "stdout_lines": []}

In this case, run the pci-helper.py script, and then reboot the node:

# /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu --pci-realloc
# reboot

When the node is up again, repeat the vinfra service compute set command.
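
To confirm that the script took effect, you can inspect the kernel command line after the reboot. This assumes that pci-helper.py enables the IOMMU through kernel boot parameters, which its options suggest; the exact output depends on your platform:

# grep -o -E 'intel_iommu=[^ ]+|amd_iommu=[^ ]+|pci=realloc' /proc/cmdline
intel_iommu=on
pci=realloc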