Compute alerts

Based on the metrics described in Compute metrics, the compute alerts are generated and displayed in the admin panel.

Compute service alerts

Keystone API service is down

OpenStack service API upstream is down

All OpenStack service API upstreams are down

OpenStack Cinder Scheduler is down

OpenStack Cinder Volume agent is down

OpenStack Neutron L3 agent is down

OpenStack Neutron OpenvSwitch agent is down

OpenStack Neutron Metadata agent is down

OpenStack Neutron DHCP agent is down

OpenStack Nova Compute is down

OpenStack Nova Conductor is down

OpenStack Nova Scheduler is down

OpenStack Octavia Provisioning Worker v1 is down

OpenStack Octavia Provisioning Worker v2 is down

OpenStack Octavia Housekeeping service is down

OpenStack Octavia HealthManager service is down

High request error rate for OpenStack API requests detected

Compute cluster alerts

Compute cluster has failed

Cluster is running out of vCPU resources

Cluster is out of vCPU resources

Cluster is running out of memory

Cluster is out of memory

Virtual machine error

Virtual machine state mismatch

Volume attachment details mismatch

Virtual network port check failed

Compute node alerts

Node is running out of vCPU resources: Node <node> with ID <id> has reached 80% of the vCPU allocation limit.

The compute node may soon experience the lack of vCPU resources that will lead to inability to accommodate new virtual machines. To avoid this, check the distribution of VMs in the compute cluster, and then migrate the VMs from the specified node to less loaded compute nodes.
Node is out of vCPU resources: Node <node> with ID <id> has reached 95% of the vCPU allocation limit.

The compute node will soon experience the lack of vCPU resources that will lead to inability to accommodate new virtual machines. To avoid this, check the distribution of VMs in the compute cluster, and then migrate the VMs from the specified node to less loaded compute nodes.
Node is running out of memory: Node <node> with ID <id> has reached 80% of the memory allocation limit.

The compute node may soon experience the lack of RAM resources that will lead to inability to accommodate new virtual machines. To avoid this, check the distribution of VMs in the compute cluster, and then migrate the VMs from the specified node to less loaded compute nodes.
Node is out of memory: Node <node> with ID <id> has reached 95% of the memory allocation limit.

The compute node will soon experience the lack of RAM resources that will lead to inability to accommodate new virtual machines. To avoid this, check the distribution of VMs in the compute cluster, and then migrate the VMs from the specified node to less loaded compute nodes.
Node had a fenced state for 1 hour: For the last 2 hours node <node> with ID <id> had a fenced state at least for 1 hour.

Domain quota alerts

Domain is out of vCPU resources: Domain <name> has reached <value_80<95>% of the vCPU allocation limit.

The domain will soon experience the lack of vCPU resources that will lead to inability to create new virtual machines. To avoid this, add more vCPUs to the domain quota.
Domain is out of vCPU resources: Domain <name> has reached <value_>=95>% of the vCPU allocation limit.

The domain will soon experience the lack of vCPU resources that will lead to inability to create new virtual machines. To avoid this, add more vCPUs to the domain quota.
Domain is out of memory: Domain <name> has reached <value_80<95>% of the memory allocation limit.

The domain will soon experience the lack of RAM resources that will lead to inability to create new virtual machines. To avoid this, add more RAM to the domain quota.
Domain is out of memory: Domain <name> has reached <value_>=95>% of the memory allocation limit.

The domain will soon experience the lack of RAM resources that will lead to inability to create new virtual machines. To avoid this, add more RAM to the domain quota.
Domain is out of storage policy space: Domain <name> has reached <value_80<95>% of the <policy_name> storage policy allocation limit.

The domain will soon experience the lack of storage policy space that will lead to inability to create new compute volumes with this storage policy. To avoid this, add more storage space to the domain quota.
Domain is out of storage policy space: Domain <name> has reached <value_>=95>% of the <policy_name> storage policy allocation limit.

The domain will soon experience the lack of storage policy space that will lead to inability to create new compute volumes with this storage policy. To avoid this, add more storage space to the domain quota.

Project quota alerts

Project is out of vCPU resources: Project <name> has reached 95% of the vCPU allocation limit.

The project will soon experience the lack of vCPU resources that will lead to inability to create new virtual machines. To avoid this, add more vCPUs to the project quota.
Project is out of memory: Project <name> has reached 95% of the memory allocation limit.

The project will soon experience the lack of RAM resources that will lead to inability to create new virtual machines. To avoid this, add more RAM to the project quota.
Project is out of floating IP addresses: Project <name> has reached 95% of the floating IP address allocation limit.

The project will soon experience the lack of floating IP addresses that will lead to inability to assign them to virtual machines. To avoid this, add more floating IPs to the project quota.
Network is out of IP addresses: Network <name> with ID <id> in project <name> has reached 95% of the IP address allocation limit.

The network will soon experience the lack of IP addresses that will lead to inability to connect new virtual machines to this network. To avoid this, add more allocation pools to the network.
Project is out of storage policy space: Project <name> has reached 95% of the <policy_name> storage policy allocation limit.

The project will soon experience the lack of storage policy space that will lead to inability to create new compute volumes with this storage policy. To avoid this, add more storage space to the project quota.

Other alerts

Libvirt service is down

Docker service is down

RabbitMQ node is down

RabbitMQ split brain detected

PostgreSQL database size is greater than 30 GB

PostgreSQL database "<name>" on node "<hostname>" is greater than 30 GB in size. Verify that deleted entries are archived or contact the technical support.

PostgreSQL database uses more than 50% of node root partition

PostgreSQL databases on node "<hostname>" with ID "<id>" use more than 50% of node root partition. Verify that deleted entries are archived or contact the technical support.

Kafka SSL CA certificate will expire in less than 30 days

Kafka SSL CA certificate will expire in <number> days. Please renew the certificate.

Kafka SSL CA certificate has expired

Kafka SSL CA certificate has expired. Please renew the certificate.

Kafka SSL client certificate will expire in less than 30 days

Kafka SSL client certificate will expire in <number> days. Please renew the certificate.

Kafka SSL client certificate has expired

Kafka SSL client certificate has expired. Please renew the certificate.