Configuring SR-IOV support
Limitations
- Virtual machines with attached PCI devices cannot be live migrated.
Prerequisites
- To authorize further OpenStack commands, the OpenStack command-line client must be configured, as outlined in Connecting to OpenStack command-line interface.
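As an optional check, not part of the official prerequisite, you can confirm that the client is configured and can authenticate by requesting a token:
# openstack token issue
If the command returns a token without errors, the client is ready to use.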
Procedure overview
- Prepare a compute node for SR-IOV support.
- Reconfigure the compute cluster to enable SR-IOV support.
- Create a virtual machine with an SR-IOV network port.
To prepare a node for SR-IOV
- List all network adapters on a node and obtain their VID and PID:
# lspci -nnD | grep Ethernet
0000:00:03.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
0000:00:04.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
In this output, [15b3:1017] is the VID and PID of the network adapter.
- Check that the chosen network adapter supports SR-IOV by using its VID and PID:
# lspci -vv -d 15b3:1017 | grep SR-IOV
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
Capabilities: [180 v1] Single Root I/O Virtualization (SR-IOV)
- Enable IOMMU on the node by running the pci-helper.py enable-iommu script, and then reboot the node to apply the changes:
# /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
# reboot
The script works for both Intel and AMD processors.
- Verify that IOMMU is enabled in the dmesg output:
# dmesg | grep -e DMAR -e IOMMU
[ 0.000000] DMAR: IOMMU enabled
- [For NVIDIA Mellanox network adapters] Enable SR-IOV in firmware:
- Download Mellanox Firmware Tools (MFT) from the official website and extract the archive on the node. For example:
# wget https://www.mellanox.com/downloads/MFT/mft-4.17.0-106-x86_64-rpm.tgz
# tar -xvzf mft-4.17.0-106-x86_64-rpm.tgz
- Install the package, and then start Mellanox Software Tools (MST):
# yum install rpm-build
# ./mft-4.17.0-106-x86_64-rpm/install.sh
# mst start
- Determine the MST device path:
# mst status
- Query the current configuration:
# mlxconfig -d /dev/mst/mt4119_pciconf0 q
...
Configurations:
...
         NUM_OF_VFS          4           # Number of activated VFs
         SRIOV_EN            True(1)     # SR-IOV is enabled
...
- Set the desired values, if necessary. For example, to increase the number of virtual functions to 8, run:
# mlxconfig -d /dev/mst/mt4119_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=8
- Reboot the node to apply the changes.
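After completing these steps, you can optionally double-check that the node is ready before reconfiguring the compute cluster. This is only a sanity check, not part of the official procedure; enp2s0 stands for your adapter's device name:
# ls /sys/kernel/iommu_groups
# cat /sys/class/net/enp2s0/device/sriov_totalvfs
A non-empty list of IOMMU groups and a non-zero sriov_totalvfs value indicate that IOMMU is active and that the adapter can expose virtual functions.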
To enable SR-IOV support for the compute cluster
- Create a configuration file in the YAML format. For example:
# cat << EOF > pci-passthrough.yaml
- node_id: c3b2321a-7c12-8456-42ce-8005ff937e12
  devices:
  - device_type: sriov
    device: enp2s0
    physical_network: sriovnet
    num_vfs: 8
EOF
In this example:
- node_id is the UUID of the compute node that hosts a network adapter
- sriov is the device type for a network adapter
- enp2s0 is the device name of a network adapter
- sriovnet is an arbitrary name that will be used as an alias for a network adapter
- num_vfs is the number of virtual functions to create for a network adapter
The maximum number of virtual functions supported by a PCI device is specified in the /sys/class/net/<device_name>/device/sriov_totalvfs file. For example:
# cat /sys/class/net/enp2s0/device/sriov_totalvfs
63
- Reconfigure the compute cluster by using this configuration file:
# vinfra service compute set --pci-passthrough-config pci-passthrough.yaml
+---------+--------------------------------------+
| Field   | Value                                |
+---------+--------------------------------------+
| task_id | 89c8a6c4-f480-424e-ab44-c2f4e2976eb9 |
+---------+--------------------------------------+
- Check the status of the task:
# vinfra task show 89c8a6c4-f480-424e-ab44-c2f4e2976eb9
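If the task completes successfully, you can optionally verify on the compute node that the virtual functions have been created. The following sanity check assumes the example device enp2s0 and the num_vfs value from the configuration file above:
# cat /sys/class/net/enp2s0/device/sriov_numvfs
# lspci -nnD | grep "Virtual Function"
The first command should print the requested number of virtual functions, and the second should list one PCI device per virtual function.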
If the compute configuration fails
Check whether the following error appears in /var/log/vstorage-ui-backend/ansible.log:
2021-09-23 16:42:59,796 p=32130 u=vstoradmin | fatal: [32c8461b-92ec-48c3-ae02-4d12194acd02]: FAILED! => {"changed": true, "cmd": "echo 4 > /sys/class/net/enp103s0f1/device/sriov_numvfs", "delta": "0:00:00.127417", "end": "2021-09-23 19:42:59.784281", "msg": "non-zero return code", "rc": 1, "start": "2021-09-23 19:42:59.656864", "stderr": "/bin/sh: line 0: echo: write error: Cannot allocate memory", "stderr_lines": ["/bin/sh: line 0: echo: write error: Cannot allocate memory"], "stdout": "", "stdout_lines": []}
In this case, run the pci-helper.py script with the --pci-realloc option, and reboot the node:
# /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu --pci-realloc
# reboot
When the node is up again, repeat the vinfra service compute set command.
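If you want to double-check that PCI reallocation took effect after the reboot, you can look for the corresponding kernel parameter. This assumes that the --pci-realloc option works by adding pci=realloc to the kernel command line, which is an assumption rather than behavior documented here:
# grep -o pci=realloc /proc/cmdline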
To create a virtual machine with an SR-IOV network port
- Create a physical compute network, specifying the network adapter alias from the pci-passthrough.yaml file and the default vNIC type direct. You also need to disable the built-in DHCP server and specify the desired IP address range. For example, to create the sriov-network network with the 10.10.10.0/24 CIDR, run:
# vinfra service compute network create sriov-network --physical-network sriovnet --default-vnic-type direct \
--no-dhcp --cidr 10.10.10.0/24
- Create a virtual machine, specifying the new network. For example, to create the VM sriov-vm from the centos7 template and with the large flavor, run:
# vinfra service compute server create sriov-vm --network id=sriov-network --volume source=image,size=11,id=centos7 --flavor large
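Once the VM is created, you can optionally confirm that its network port uses the direct vNIC type, for example with the OpenStack command-line client mentioned in the prerequisites. This is a sketch; the port ID in the second command comes from the output of the first:
# openstack port list --server sriov-vm
# openstack port show <port_ID> -c binding_vnic_type
A binding_vnic_type value of direct means the port is backed by an SR-IOV virtual function.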
If the VM creation fails
Check whether the following error appears in /var/log/hci/nova/nova-compute.log:
2021-08-27 17:56:21.349 6 ERROR nova.compute.manager [instance: 9fb738bf-afe5-40ef-943c-22e43696bfd9] libvirtError: internal error: qemu unexpectedly closed the monitor: 2021-08-27T14:56:20.294985Z qemu-kvm: -device vfio-pci,host=01:00.3,id=hostdev0,bus=pci.0,addr=0x6: vfio error: 0000:01:00.3: group 1 is not viable
2021-08-27 17:56:21.349 6 ERROR nova.compute.manager [instance: 9fb738bf-afe5-40ef-943c-22e43696bfd9] Please ensure all devices within the iommu_group are bound to their vfio bus driver.
In this case, the physical and virtual functions of the network adapter might belong to the same IOMMU group. You can check this by using the virsh nodedev-dumpxml command and specifying the device names of the physical and virtual functions. For example:
# virsh nodedev-dumpxml pci_0000_00_03_0 | grep iommuGroup
    <iommuGroup number='1'>
    </iommuGroup>
# virsh nodedev-dumpxml pci_0000_00_03_1 | grep iommuGroup
    <iommuGroup number='1'>
    </iommuGroup>
The device names have the format pci_0000_<bus_number>_<device_number>_<function_number>. These numbers can be obtained via the lspci command:
# lspci -nn | grep Ethernet
00:03.0 Ethernet controller [0200]: Mellanox Technologies MT27800 Family [ConnectX-5] [15b3:1017]
...
In this output, 00 is the bus number, 03 is the device number, and 0 is the function number.
If the physical and virtual functions belong to the same IOMMU group, you need to detach the physical function from the node by running the pci-helper.py script and specifying its VID and PID. For example:
# /usr/libexec/vstorage-ui-agent/bin/pci-helper.py detach 15b3:1017
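As an alternative to virsh, the IOMMU group of a PCI device can also be read directly from sysfs. This sketch uses the example PCI addresses from the lspci output above; your bus, device, and function numbers will differ:
# readlink /sys/bus/pci/devices/0000:00:03.0/iommu_group
# readlink /sys/bus/pci/devices/0000:00:03.1/iommu_group
Each symlink points to /sys/kernel/iommu_groups/<group_number>, so two devices that resolve to the same number share an IOMMU group.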