Managing GPU aliases

GPU aliases represent physical GPUs or vGPU types configured in the compute cluster. Before you can use these aliases for GPU metrics, they must be discovered on the cluster nodes and synchronized to the database.

Limitations

  • An alias can be deleted only if its status is "unavailable".

Prerequisites

To synchronize GPU aliases on nodes

Use the following command:

vinfra service compute gpu-alias sync [--node-ids <node-id> [<node-id> ...]]
--node-ids <node-id> [<node-id> ...]
Specific node IDs to synchronize from (optional). If not provided, aliases are synchronized from all nodes.

For example, to synchronize GPU aliases on all cluster nodes, run:

# vinfra service compute gpu-alias sync
+---------------------+--------------+
| Field               | Value        |
+---------------------+--------------+
| discovered_aliases  | - gpu        |
| errors              | []           |
| existing_aliases    | - gpu        |
| new_aliases         | []           |
| unavailable_aliases | - nvidia-232 |
|                     | - nvidia-319 |
+---------------------+--------------+
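The unavailable_aliases field above spans two table rows, which makes the output awkward to filter with a plain grep. The following is a minimal sketch of one way to extract the items of a single field, assuming the table layout is exactly as in the example output. The function name sync_field is hypothetical and is not part of vinfra.

```shell
# Hypothetical helper, not part of vinfra.
# Reads the "gpu-alias sync" table on stdin and prints the list items
# of the field named in $1, one per line.
sync_field() {
    awk -F'|' -v want="$1" '
        NF >= 3 {
            field = $2; value = $3
            # Trim the padding around both cells.
            gsub(/^[ \t]+|[ \t]+$/, "", field)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            # A row with an empty Field cell continues the previous field.
            if (field != "") current = field
            # List items look like "- name"; "[]" marks an empty list.
            if (current == want && value ~ /^- /) print substr(value, 3)
        }
    '
}
```

It could be used as: vinfra service compute gpu-alias sync | sync_field unavailable_aliases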

To list GPU aliases

Use the following command:

vinfra service compute gpu-alias list

For example:

# vinfra service compute gpu-alias list
+------------+-------------+------------+
| alias      | status      | metrics    |
+------------+-------------+------------+
| nvidia-232 | unavailable |            |
| gpu        | available   | gpu.gpu    |
| nvidia-319 | unavailable | gpu.nvidia |
+------------+-------------+------------+

The output lists every alias together with its status (available or unavailable) and the metric associated with it, if any.

To delete a GPU alias

Use the following command:

vinfra service compute gpu-alias delete <alias>
<alias>
GPU alias name to delete

For example, to delete the alias nvidia-232, run:

# vinfra service compute gpu-alias delete nvidia-232
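Because only aliases with the "unavailable" status can be deleted, a script may want to check the status before it attempts the deletion. The sketch below shows one possible guard. It assumes the list output has the table layout shown in the list example, and the function name delete_if_unavailable is hypothetical.

```shell
# Hypothetical wrapper, not part of vinfra.
# Deletes the alias named in $1 only if "gpu-alias list" reports it
# as "unavailable"; otherwise prints a message and returns non-zero.
delete_if_unavailable() {
    target=$1
    # Look up the status cell of the row whose alias cell equals $target.
    status=$(vinfra service compute gpu-alias list |
        awk -F'|' -v target="$target" '
            NF >= 4 {
                name = $2; st = $3
                gsub(/^[ \t]+|[ \t]+$/, "", name)
                gsub(/^[ \t]+|[ \t]+$/, "", st)
                if (name == target) print st
            }')
    if [ "$status" = "unavailable" ]; then
        vinfra service compute gpu-alias delete "$target"
    else
        echo "not deleting $target: status is '${status:-unknown}'" >&2
        return 1
    fi
}
```

It could be used as: delete_if_unavailable nvidia-232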