Monitoring the compute cluster

After you create the compute cluster, you can monitor its status and statistics. You can also monitor individual compute nodes, virtual machines, and load balancers.

To view the compute cluster status

Click the cluster name at the bottom of the left menu. The cluster status can be one of the following:

Healthy
All compute cluster components and nodes operate normally.
Configuring
The compute cluster configuration (the default CPU model for VMs or the number of compute nodes) is changing.
Warning
The compute cluster operates normally but some issues have been detected.
Critical
The compute cluster has encountered a critical problem and is not operational.
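
If you feed the cluster status into an external monitoring script, the four statuses above can be mapped to severity levels. The following is a minimal sketch: the Nagios-style exit codes, the STATUS_EXIT_CODES table, and the exit_code_for function are hypothetical names chosen for illustration and are not part of the product.

```python
# Hypothetical mapping from the cluster status shown in the admin panel
# to Nagios-style exit codes. The codes are an assumption for illustration,
# not documented product behavior.
STATUS_EXIT_CODES = {
    "Healthy": 0,      # all components and nodes operate normally
    "Configuring": 0,  # a configuration change is in progress
    "Warning": 1,      # operational, but some issues have been detected
    "Critical": 2,     # critical problem, cluster is not operational
}
UNKNOWN = 3  # any status string not listed above


def exit_code_for(status: str) -> int:
    """Return the exit code for a status string, or UNKNOWN if unrecognized."""
    return STATUS_EXIT_CODES.get(status.strip().capitalize(), UNKNOWN)
```

Treating Configuring as non-alerting is a design choice: the status describes a change in progress, not a fault, so you may prefer a different code if configuration changes should be reviewed.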

To view the compute cluster statistics

Admin panel

Go to the Compute > Overview screen, which displays charts with the compute cluster statistics.

Command-line interface

Use the following command:

# vinfra service compute stat
+----------------+----------------------------------------------+
| Field          | Value                                        |
+----------------+----------------------------------------------+
| backup_plans   | count: 1                                     |
|                | scheduled: 1                                 |
| backups        | available: 1                                 |
|                | count: 1                                     |
| compute        | block_capacity: 2147483648                   |
|                | block_usage: 543162368                       |
|                | cpu_allocation_ratio: 8                      |
|                | cpu_usage: 0.07                              |
|                | ram_allocation_ratio: 1.0                    |
|                | vcpus: 2                                     |
|                | vcpus_free: 38                               |
|                | vm_mem_capacity: 48200712192                 |
|                | vm_mem_free: 47126970368                     |
|                | vm_mem_reserved: 1073741824                  |
|                | vm_mem_usage: 201162752                      |
| datetime       | 2025-02-24T15:07:33.576963                   |
| fenced         | physical_cpu_cores: 0                        |
|                | physical_cpu_usage: 0                        |
|                | physical_mem_total: 0                        |
|                | reserved_memory: 0                           |
|                | vcpus: 0                                     |
|                | vm_mem_capacity: 0                           |
| floating_ips   | active: 1                                    |
|                | count: 1                                     |
| images         | active: 1                                    |
|                | count: 1                                     |
| k8s_clusters   | count: 0                                     |
| load_balancers | count: 0                                     |
| networks       | active: 6                                    |
|                | count: 6                                     |
| physical       | block_capacity: 807980261376                 |
|                | block_free: 804296740864                     |
|                | cpu_cores: 12                                |
|                | cpu_usage: 10.99                             |
|                | mem_total: 74789638144                       |
|                | vcpus_total: 96                              |
| ports          | active: 19                                   |
|                | count: 20                                    |
|                | n/a: 1                                       |
| reserved       | cpus: 7                                      |
|                | memory: 26588925952                          |
|                | vcpus: 56                                    |
| routers        | active: 3                                    |
|                | count: 3                                     |
| servers        | active: 1                                    |
|                | count: 2                                     |
|                | error: 0                                     |
|                | in_progress: 0                               |
|                | running: 1                                   |
|                | shutoff: 1                                   |
|                | stopped: 1                                   |
|                | top:                                         |
|                |   disk:                                      |
|                |   - id: 6347a196-62aa-4f20-8b48-435e2c2a5bb9 |
|                |     name: cirros                             |
|                |     size: 274726912                          |
|                |   - id: 784bfe4c-5bae-4811-ad59-c52d5d62c66b |
|                |     name: test                               |
|                |     size: 268435456                          |
|                |   memory:                                    |
|                |   - id: 784bfe4c-5bae-4811-ad59-c52d5d62c66b |
|                |     name: test                               |
|                |     size: 201162752                          |
|                |   - id: 6347a196-62aa-4f20-8b48-435e2c2a5bb9 |
|                |     name: cirros                             |
|                |     size: 0                                  |
|                |   vcpus:                                     |
|                |   - count: 0.01                              |
|                |     id: 784bfe4c-5bae-4811-ad59-c52d5d62c66b |
|                |     name: test                               |
|                |   - count: 0                                 |
|                |     id: 6347a196-62aa-4f20-8b48-435e2c2a5bb9 |
|                |     name: cirros                             |
| snapshots      | available: 1                                 |
|                | count: 1                                     |
| stacks         | count: 0                                     |
| volumes        | available: 2                                 |
|                | count: 4                                     |
|                | in-use: 2                                    |
| vpns           | count: 0                                     |
+----------------+----------------------------------------------+
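
Several fields in the sample output above appear to be derived from others. The following sketch recomputes them in Python to show how the numbers fit together. The relationships are inferred from this one sample and are an assumption, not a documented formula. The sample dictionary copies values from the output by hand, and the derive function is a hypothetical helper.

```python
# Values copied by hand from the sample `vinfra service compute stat` output.
sample = {
    "compute": {
        "cpu_allocation_ratio": 8,
        "vcpus": 2,
        "vm_mem_reserved": 1073741824,
    },
    "physical": {"cpu_cores": 12, "mem_total": 74789638144},
    "reserved": {"memory": 26588925952, "vcpus": 56},
}


def derive(stats: dict) -> dict:
    """Recompute derived fields from raw ones (relationships inferred from the sample)."""
    compute = stats["compute"]
    physical = stats["physical"]
    reserved = stats["reserved"]
    # Physical cores multiplied by the CPU allocation ratio.
    vcpus_total = physical["cpu_cores"] * compute["cpu_allocation_ratio"]
    # Physical memory minus the memory reserved for the system.
    vm_mem_capacity = physical["mem_total"] - reserved["memory"]
    return {
        "vcpus_total": vcpus_total,
        "vcpus_free": vcpus_total - reserved["vcpus"] - compute["vcpus"],
        "vm_mem_capacity": vm_mem_capacity,
        "vm_mem_free": vm_mem_capacity - compute["vm_mem_reserved"],
    }


derived = derive(sample)
```

With the sample values, the result matches the vcpus_total, vcpus_free, vm_mem_capacity, and vm_mem_free fields reported in the output.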

To view more details about the compute cluster

Go to the Monitoring > Dashboard screen, and then click Grafana dashboard. A separate browser tab will open with preconfigured Grafana dashboards.

The Compute service status dashboard shows the status of the compute services and agents on all compute nodes. You can sort the displayed services by hostname, service name, or service status.

On the Compute resource details dashboard, you can monitor all existing virtual objects in the compute cluster by status over time.

For detailed monitoring of the compute resource allocation, use the Compute resource allocation dashboard. The charts on this dashboard show resource quotas, allocation usage, and allocation ratio over time. You can view the statistics for all domains and projects, or filter the data by a specific domain or project.

The Compute vCPU/RAM allocation and overcommitment ratio dashboard helps identify discrepancies between the expected and actual resource usage across compute nodes. It displays the vCPU and RAM allocation dynamics per node, as reported by the Placement service and by the hypervisor. It also shows the total amount of resources reserved for system services, as well as the percentage of vCPUs and RAM used by or allocated to VMs relative to the total available resources, excluding system reservations.
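
The percentage described above can be illustrated with a short calculation. This is a sketch under assumptions: the allocation_percent function is hypothetical, the input numbers are borrowed from the sample command-line output earlier in this section, and the dashboard itself may derive its values from other sources.

```python
def allocation_percent(allocated: float, total: float, system_reserved: float) -> float:
    """Percentage of a resource allocated to VMs, relative to the total
    amount minus system reservations."""
    available = total - system_reserved
    if available <= 0:
        raise ValueError("no resources left after system reservations")
    return 100.0 * allocated / available


# vCPUs: 2 allocated, 96 in total, 56 reserved for the system.
vcpu_pct = allocation_percent(allocated=2, total=96, system_reserved=56)

# RAM in bytes: 1 GiB reserved for VMs, about 69.7 GiB in total,
# about 24.8 GiB reserved for the system.
ram_pct = allocation_percent(
    allocated=1073741824, total=74789638144, system_reserved=26588925952
)
```

Excluding system reservations from the denominator makes the percentage reflect how much of the capacity actually available to VMs is taken, so it reads higher than a ratio against the raw totals.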

To monitor the compute API requests, use the Compute service API details dashboard. The charts on this dashboard show the rate of successful and failed requests, as well as the 95th and 99th percentiles of response time, over 10-minute intervals. You can filter the displayed requests by compute service. The most important charts here are the failed request rate and the response time. If you see spikes on them, check the status of the corresponding services.
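
To make the percentile charts easier to interpret, the following sketch computes the 95th and 99th percentiles and the failure rate for one interval of requests. It uses the nearest-rank method and treats HTTP 5xx codes as failures. Both choices are assumptions for illustration, the sample data is made up, and the dashboard may calculate its values differently.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def failure_rate(status_codes: list[int]) -> float:
    """Share of failed requests, counting HTTP 5xx as failures (an assumption)."""
    if not status_codes:
        return 0.0
    failed = sum(1 for code in status_codes if code >= 500)
    return failed / len(status_codes)


# Hypothetical response times, in seconds, for one 10-minute interval.
response_times = [0.1] * 94 + [0.5] * 4 + [2.0, 9.0]
p95 = percentile(response_times, 95)
p99 = percentile(response_times, 99)
```

In this made-up interval, most requests take 0.1 seconds, yet the 99th percentile is much higher than the 95th because of two slow outliers. A spike that shows only on the 99th percentile chart therefore points to a small number of very slow requests, not to a general slowdown.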

The Compute RPC dashboard displays details on Remote Procedure Call (RPC) requests across the compute services.

The RabbitMQ nodes, RabbitMQ messages, and RabbitMQ clients dashboards are intended for the support team to troubleshoot the RabbitMQ cluster.

The PostgreSQL overview dashboard shows the PostgreSQL database size and replication status, as well as other database details.

To see a detailed description of a chart, click the i icon in its left corner.