Preparing nodes for GPU virtualization
For vGPU to work, enable it on the node by installing the NVIDIA kernel module, and then enable IOMMU. If, however, you want to virtualize a GPU that was previously detached from the node for GPU passthrough, you also need to modify the GRUB configuration file.
To enable vGPU on a node
- On the node with the physical GPU, do one of the following:
  - If the physical GPU is attached to the node, blacklist the Nouveau driver:
    # rmmod nouveau
    # echo -e "blacklist nouveau\noptions nouveau modeset=0" > /usr/lib/modprobe.d/nouveau.conf
    # echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/nouveau.conf
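    To confirm that the Nouveau driver is no longer loaded, you can list the loaded kernel modules. This is an optional sanity check rather than part of the required procedure; the command should produce no output:
    # lsmod | grep nouveau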
  - If the physical GPU is detached from the node:
    - In the /etc/default/grub file, locate the GRUB_CMDLINE_LINUX line, and then delete pci-stub.ids=<gpu_vid>:<gpu_pid>. For example, for a GPU with the VID and PID 10de:1eb8, delete pci-stub.ids=10de:1eb8, and check the resulting file:
      # cat /etc/default/grub | grep CMDLINE
      GRUB_CMDLINE_LINUX="crashkernel=auto tcache.enabled=0 quiet iommu=pt rd.driver.blacklist=nouveau nouveau.modeset=0"
    - Regenerate the GRUB configuration file.
      - On a BIOS-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2.cfg --update-bls-cmdline
      - On a UEFI-based system, run:
        # /usr/sbin/grub2-mkconfig -o /etc/grub2-efi.cfg --update-bls-cmdline
    - Reboot the node to apply the changes:
      # reboot
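    If you removed a pci-stub.ids= entry, you can verify after the reboot that it is gone from the running kernel command line. This is an optional, illustrative check; the command should produce no output:
    # grep pci-stub /proc/cmdline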
- Install the kernel-devel and dkms packages:
  # dnf install kernel-devel dkms
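  DKMS builds kernel modules against the installed kernel headers, so the kernel-devel package should match the running kernel. One way to compare the two versions (exact package naming may vary by distribution):
  # uname -r
  # rpm -q kernel-devel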
- Enable and start the dkms service:
  # systemctl enable dkms.service
  # systemctl start dkms.service
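  You can optionally confirm that the service is enabled (output may vary by distribution):
  # systemctl is-enabled dkms.service
  enabled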
- Install the vGPU KVM kernel module from the NVIDIA GRID package with the --dkms option:
  # bash NVIDIA-Linux-x86_64-xxx.xx.xx-vgpu-kvm*.run --dkms
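  After the installer finishes, you can check that the module has been registered with DKMS and built for the running kernel. The exact module name and version shown depend on the GRID package you installed:
  # dkms status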
- Re-create the Linux boot image by running:
  # dracut -f
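  If you want to confirm that the regenerated boot image picks up the Nouveau blacklist, you can inspect it with lsinitrd. Whether the modprobe.d files are bundled depends on your dracut configuration, so treat this as an illustrative check only:
  # lsinitrd | grep nouveau.conf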
- Run the pci-helper.py enable-iommu script to enable IOMMU on the node:
  # /usr/libexec/vstorage-ui-agent/bin/pci-helper.py enable-iommu
  The script works for both Intel and AMD processors.
- Reboot the node to apply the changes:
  # reboot
  You can check that IOMMU is successfully enabled in the dmesg output:
  # dmesg | grep -e DMAR -e IOMMU
  [ 0.000000] DMAR: IOMMU enabled
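  On AMD processors, the boot log reports the IOMMU as AMD-Vi rather than DMAR, so a broader filter may be needed; the variant below is an illustration. You can also confirm that IOMMU groups were created in sysfs (the directory should not be empty):
  # dmesg | grep -i -e DMAR -e IOMMU -e AMD-Vi
  # ls /sys/kernel/iommu_groups/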
To check that a GPU card is vGPU enabled
List all graphics cards on the node and obtain their PCI addresses:
# lspci -D | grep NVIDIA
0000:01:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
0000:81:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
Here, 0000:01:00.0 and 0000:81:00.0 are the PCI addresses of the graphics cards.
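If you also need the numeric vendor and device IDs, for example, to cross-check a pci-stub.ids=<gpu_vid>:<gpu_pid> entry, lspci can print them alongside the device names. This is an optional variant of the command above:
# lspci -Dnn | grep NVIDIA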
Check that the graphics card is vGPU enabled:
# ls /sys/bus/pci/devices/0000\:01:00.0/mdev_supported_types
nvidia-222 nvidia-223 nvidia-224 nvidia-225 nvidia-226 nvidia-227 nvidia-228 nvidia-229 nvidia-230 nvidia-231 nvidia-232 nvidia-233 nvidia-234 nvidia-252 nvidia-319 nvidia-320 nvidia-321
For a vGPU-enabled card, the directory contains a list of supported vGPU types. A vGPU type is a vGPU configuration that defines the vRAM size, maximum resolution, maximum number of supported vGPUs, and other parameters.
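Each vGPU type directory exposes standard mediated device attributes, such as a human-readable name, a description, and the number of instances that can still be created. The example below inspects one of the types from the sample output above; the type name nvidia-222 is only an illustration and will differ on other cards:
# cat /sys/bus/pci/devices/0000\:01:00.0/mdev_supported_types/nvidia-222/name
# cat /sys/bus/pci/devices/0000\:01:00.0/mdev_supported_types/nvidia-222/description
# cat /sys/bus/pci/devices/0000\:01:00.0/mdev_supported_types/nvidia-222/available_instances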