Managing GPU aliases
GPU aliases represent physical GPUs or vGPU types configured in the compute cluster. Before an alias can be used for GPU metrics, it must be discovered on the cluster nodes and synchronized to the database.
Limitations
- An alias can be deleted only if its status is "unavailable".
Prerequisites
- GPU passthrough and/or vGPUs are configured in the compute cluster, as described in Configuring GPU passthrough and Configuring GPU virtualization.
To synchronize GPU aliases on nodes
Use the following command:
vinfra service compute gpu-alias sync [--node-ids <node-id> [<node-id> ...]]
--node-ids <node-id> [<node-id> ...] - IDs of the nodes to synchronize from (optional). If not provided, aliases are synchronized from all nodes.
For example, to synchronize GPU aliases on all cluster nodes, run:
# vinfra service compute gpu-alias sync
+---------------------+--------------+
| Field               | Value        |
+---------------------+--------------+
| discovered_aliases  | - gpu        |
| errors              | []           |
| existing_aliases    | - gpu        |
| new_aliases         | []           |
| unavailable_aliases | - nvidia-232 |
|                     | - nvidia-319 |
+---------------------+--------------+
To list GPU aliases
Use the following command:
vinfra service compute gpu-alias list
For example:
# vinfra service compute gpu-alias list
+------------+-------------+------------+
| alias      | status      | metrics    |
+------------+-------------+------------+
| nvidia-232 | unavailable |            |
| gpu        | available   | gpu.gpu    |
| nvidia-319 | unavailable | gpu.nvidia |
+------------+-------------+------------+
The output lists each alias with its status and the metric associated with it, if any.
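Because only aliases with the "unavailable" status can be deleted, it can be handy to extract just those names from the list. The following is a minimal sketch, not part of vinfra: the function name unavailable_aliases is hypothetical, and the parsing assumes the table layout shown in the example above (alias in the first column, status in the second).

```shell
# Hypothetical helper: print the names of aliases whose status is "unavailable".
# Reads the table printed by `vinfra service compute gpu-alias list` on stdin.
unavailable_aliases() {
    awk -F'|' '
        # Border lines contain no "|" and are skipped; data rows split into
        # at least 4 fields: "", alias, status, metrics, ...
        NF >= 4 {
            alias = $2; status = $3
            # Trim the padding spaces around each cell.
            gsub(/^[ \t]+|[ \t]+$/, "", alias)
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if (status == "unavailable") print alias
        }
    '
}
```

A possible way to use it is to pipe the list command into the function: vinfra service compute gpu-alias list | unavailable_aliases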
To delete a GPU alias
Use the following command:
vinfra service compute gpu-alias delete <alias>
<alias> - Name of the GPU alias to delete
For example, to delete the alias nvidia-232, run:
# vinfra service compute gpu-alias delete nvidia-232
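Since an alias can be deleted only when its status is "unavailable", a script may want to check the status before attempting the deletion. The sketch below is one way this could be done, under stated assumptions: delete_gpu_alias_if_unavailable is a hypothetical wrapper, not a vinfra command, and it assumes that gpu-alias list prints the table layout shown earlier in this section.

```shell
# Hypothetical wrapper: delete a GPU alias only if `gpu-alias list`
# reports its status as "unavailable".
delete_gpu_alias_if_unavailable() {
    target=$1
    # Look up the status cell of the row whose alias cell matches the target.
    status=$(vinfra service compute gpu-alias list | awk -F'|' -v name="$target" '
        NF >= 4 {
            alias = $2; st = $3
            gsub(/^[ \t]+|[ \t]+$/, "", alias)
            gsub(/^[ \t]+|[ \t]+$/, "", st)
            if (alias == name) print st
        }')
    if [ "$status" = "unavailable" ]; then
        vinfra service compute gpu-alias delete "$target"
    else
        # Refuse to delete aliases that are still available or do not exist.
        echo "Not deleting '$target': status is '${status:-not found}'" >&2
        return 1
    fi
}
```

For example, calling delete_gpu_alias_if_unavailable nvidia-232 would run the delete command only if the alias is currently listed as unavailable, and would return a non-zero exit code otherwise.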