Compute alerts
The compute alerts listed below are generated from the metrics described in Compute metrics and are displayed in the admin panel.
Compute service alerts
OpenStack Cinder API is down
Description: OpenStack Block Storage (Cinder) API service is down.
Remediation:
- Ensure that the cinder_api container is up on the management node by running:
  # docker ps --all | grep cinder_api
- If the container is down, start it by running:
  # docker start cinder_api
- Check the service log at /var/log/hci/cinder/cinder-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

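The check-and-restart pattern above is the same for every container-based service alert in this section. As a minimal sketch, assuming root access on the node that raised the alert (the container name and log path come from the alert entry):

# docker ps --all | grep cinder_api
# docker start cinder_api
# docker ps | grep cinder_api
# tail -n 100 /var/log/hci/cinder/cinder-api.log

Running docker ps without --all after the restart confirms that the container stayed up; if it exits again, the tail command shows the most recent log entries to include in a support request.
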
OpenStack Cinder Scheduler is down
Description: OpenStack Block Storage (Cinder) Scheduler agent is down on host <hostname>.
Remediation:
- Ensure that the cinder_scheduler container is up on the specified node by running:
  # docker ps --all | grep cinder_scheduler
- If the container is down, start it by running:
  # docker start cinder_scheduler
- Check the service log at /var/log/hci/cinder/cinder-scheduler.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Cinder Volume agent is down
Description: OpenStack Block Storage (Cinder) Volume agent is down on host <hostname>.
Remediation:
- Ensure that the cinder_volume container is up on the specified node by running:
  # docker ps --all | grep cinder_volume
- If the container is down, start it by running:
  # docker start cinder_volume
- Check the service log at /var/log/hci/cinder/cinder-volume.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Glance API is down
Description: OpenStack Image (Glance) API service is down.
Remediation:
- Ensure that the glance_api container is up on the management node by running:
  # docker ps --all | grep glance_api
- If the container is down, start it by running:
  # docker start glance_api
- Check the service log at /var/log/hci/glance/glance-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Heat API is down
Description: OpenStack Orchestration API service (Heat) is down.
Remediation:
- Ensure that the heat_api container is up on the management node by running:
  # docker ps --all | grep heat_api
- If the container is down, start it by running:
  # docker start heat_api
- Check the service log at /var/log/hci/heat/heat-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Magnum API is down
Description: OpenStack Container API service (Magnum) is down.
Remediation:
- Ensure that the magnum_api container is up on the management node by running:
  # docker ps --all | grep magnum_api
- If the container is down, start it by running:
  # docker start magnum_api
- Check the service log at /var/log/hci/magnum/magnum-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Neutron API is down
Description: OpenStack Networking API service (Neutron) is down.
Remediation:
- Ensure that the neutron_server container is up on the management node by running:
  # docker ps --all | grep neutron_server
- If the container is down, start it by running:
  # docker start neutron_server
- Check the service log at /var/log/hci/neutron/neutron-server.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Neutron L3 agent is down
Description: OpenStack Networking (Neutron) L3 agent is down on host <hostname>.
Remediation:
- Ensure that the neutron_l3_agent container is up on the specified node by running:
  # docker ps --all | grep neutron_l3_agent
- If the container is down, start it by running:
  # docker start neutron_l3_agent
- Check the service log at /var/log/hci/neutron/neutron-l3-agent.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Neutron OpenvSwitch agent is down
Description: OpenStack Networking (Neutron) OpenvSwitch agent is down on host <hostname>.
Remediation:
- Ensure that the neutron_openvswitch_agent container is up on the specified node by running:
  # docker ps --all | grep neutron_openvswitch_agent
- If the container is down, start it by running:
  # docker start neutron_openvswitch_agent
- Check the service log at /var/log/hci/neutron/neutron-openvswitch-agent.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Neutron Metadata agent is down
Description: OpenStack Networking (Neutron) Metadata agent is down on host <hostname>.
Remediation:
- Ensure that the neutron_metadata_agent container is up on the specified node by running:
  # docker ps --all | grep neutron_metadata_agent
- If the container is down, start it by running:
  # docker start neutron_metadata_agent
- Check the service log at /var/log/hci/neutron/neutron-metadata-agent.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Neutron DHCP agent is down
Description: OpenStack Networking (Neutron) DHCP agent is down on host <hostname>.
Remediation:
- Ensure that the neutron_dhcp_agent container is up on the specified node by running:
  # docker ps --all | grep neutron_dhcp_agent
- If the container is down, start it by running:
  # docker start neutron_dhcp_agent
- Check the service log at /var/log/hci/neutron/neutron-dhcp-agent.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Nova API is down
Description: OpenStack Compute (Nova) API service is down.
Remediation:
- Ensure that the nova_api container is up on the management node by running:
  # docker ps --all | grep nova_api
- If the container is down, start it by running:
  # docker start nova_api
- Check the service log at /var/log/hci/nova/nova-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Nova Compute is down
Description: OpenStack Compute (Nova) agent is down on host <hostname>.
Remediation:
- Ensure that the nova_compute container is up on the specified node by running:
  # docker ps --all | grep nova_compute
- If the container is down, start it by running:
  # docker start nova_compute
- Check the service log at /var/log/hci/nova/nova-compute.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Nova Conductor is down
Description: OpenStack Compute (Nova) Conductor agent is down on host <hostname>.
Remediation:
- Ensure that the nova_conductor container is up on the specified node by running:
  # docker ps --all | grep nova_conductor
- If the container is down, start it by running:
  # docker start nova_conductor
- Check the service log at /var/log/hci/nova/nova-conductor.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Nova Scheduler is down
Description: OpenStack Compute (Nova) Scheduler agent is down on host <hostname>.
Remediation:
- Ensure that the nova_scheduler container is up on the specified node by running:
  # docker ps --all | grep nova_scheduler
- If the container is down, start it by running:
  # docker start nova_scheduler
- Check the service log at /var/log/hci/nova/nova-scheduler.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Octavia API is down
Description: OpenStack Load Balancer API service (Octavia) is down.
Remediation:
- Ensure that the octavia_api container is up on the management node by running:
  # docker ps --all | grep octavia_api
- If the container is down, start it by running:
  # docker start octavia_api
- Check the service log at /var/log/hci/octavia/octavia-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

OpenStack Placement API is down
Description: OpenStack Placement API service is down.
Remediation:
- Ensure that the placement_api container is up on the management node by running:
  # docker ps --all | grep placement_api
- If the container is down, start it by running:
  # docker start placement_api
- Check the service log at /var/log/hci/placement/placement-api.log.
- If you cannot troubleshoot the problem, contact the technical support team.

High request error rate for OpenStack API requests detected
Description: A request error rate of more than 5% has been detected for <object_id> over the last hour. Check the <object_id> resource usage.
Remediation:
- Check the status of the affected compute services (see the sketch below).
- If any services are down, bring them back up.
- If you cannot troubleshoot the problem, contact the technical support team.

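Besides restarting individual containers, you can check which compute services and agents are reported as up or down. A minimal sketch, assuming the OpenStack command-line client is installed and configured with admin credentials on the management node (this setup is deployment-specific):

# openstack compute service list
# openstack network agent list
# openstack volume service list

Services or agents shown with a down state point to the container to check and restart, following the matching entry above.
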
Compute cluster alerts
Compute cluster has failed
Description: Compute cluster has failed. Unable to manage virtual machines.
Remediation:
- Go to the Monitoring > Dashboard screen, and then click Grafana dashboard.
- Open the Compute service status dashboard and identify the failed service.
- Depending on the service, follow the instructions in the Compute service alerts section.

Cluster is running out of vCPU resources
Description: Cluster has reached 80% of the vCPU allocation limit.
Remediation: The compute cluster may soon run out of vCPU resources and become unable to accommodate new virtual machines. To avoid this, add more compute nodes or return fenced nodes, if any, to operation.

Cluster is out of vCPU resources
Description: Cluster has reached 95% of the vCPU allocation limit.
Remediation: The compute cluster will soon run out of vCPU resources and become unable to accommodate new virtual machines. To avoid this, add more compute nodes or return fenced nodes, if any, to operation.

Cluster is running out of memory
Description: Cluster has reached 80% of the memory allocation limit.
Remediation: The compute cluster may soon run out of RAM and become unable to accommodate new virtual machines. To avoid this, add more compute nodes or return fenced nodes, if any, to operation.

Cluster is out of memory
Description: Cluster has reached 95% of the memory allocation limit.
Remediation: The compute cluster will soon run out of RAM and become unable to accommodate new virtual machines. To avoid this, add more compute nodes or return fenced nodes, if any, to operation.

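To see how close the cluster is to its vCPU and memory allocation limits before adding nodes, a minimal sketch using the OpenStack CLI (assuming it is configured with admin credentials; the exact columns and the availability of the stats command vary between client releases):

# openstack hypervisor list --long
# openstack hypervisor stats show

These figures show allocated versus total resources per node and for the cluster; note that the allocation limit used by the alerts may also account for overcommitment ratios, so the percentages will not necessarily match exactly.
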
Virtual machine error
Description: Virtual machine <name> with ID <id> is in the 'Error' state.
Remediation:
- Examine the VM history on the History tab of the VM right pane and reset the VM state, as described in Troubleshooting virtual machines.
- If you cannot troubleshoot the problem, contact the technical support team.

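If you prefer the command line over the admin panel, a minimal sketch for inspecting the failure, assuming the OpenStack CLI is configured with admin credentials (<id> is the VM ID from the alert):

# openstack server show <id>
# openstack server set --state active <id>

The fault field in the server show output usually explains why the VM entered the 'Error' state. Resetting the state only clears the flag and does not fix the underlying problem, so follow Troubleshooting virtual machines for the actual recovery steps.
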
Virtual machine state mismatch
Description: The state of virtual machine <name> with ID <id> differs between the Nova database and the libvirt configuration.
Remediation: Do not try to migrate the VM or reset its state. Contact the technical support team.

Volume attachment details mismatch
Description: Attachment details for the volume with ID <id> differ between the Nova and libvirt databases.
Remediation: Do not try to migrate the VM or reset its state. Contact the technical support team.

Compute node alerts
Node is running out of vCPU resources
Description: Node <node> with ID <id> has reached 80% of the vCPU allocation limit.
Remediation: The compute node may soon run out of vCPU resources and become unable to accommodate new virtual machines. To avoid this, check the distribution of VMs across the compute cluster, and then migrate VMs from the specified node to less loaded compute nodes.

Node is out of vCPU resources
Description: Node <node> with ID <id> has reached 95% of the vCPU allocation limit.
Remediation: The compute node will soon run out of vCPU resources and become unable to accommodate new virtual machines. To avoid this, check the distribution of VMs across the compute cluster, and then migrate VMs from the specified node to less loaded compute nodes.

Node is running out of memory
Description: Node <node> with ID <id> has reached 80% of the memory allocation limit.
Remediation: The compute node may soon run out of RAM and become unable to accommodate new virtual machines. To avoid this, check the distribution of VMs across the compute cluster, and then migrate VMs from the specified node to less loaded compute nodes.

Node is out of memory
Description: Node <node> with ID <id> has reached 95% of the memory allocation limit.
Remediation: The compute node will soon run out of RAM and become unable to accommodate new virtual machines. To avoid this, check the distribution of VMs across the compute cluster, and then migrate VMs from the specified node to less loaded compute nodes.

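To rebalance the load from the command line, a minimal sketch using the OpenStack CLI, assuming it is configured with admin credentials; <node> is the hostname from the alert, and <server_id> is a placeholder for a VM you choose to move:

# openstack server list --all-projects --host <node>
# openstack server migrate --live-migration <server_id>

The first command lists the VMs currently placed on the overloaded node; the second live-migrates one of them and lets the scheduler pick a less loaded destination. Flag names for live migration differ slightly between client releases.
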
Project quota alerts
Project is out of vCPU resources
Description: Project <name> has reached 95% of the vCPU allocation limit.
Remediation: The project will soon run out of vCPU resources and become unable to create new virtual machines. To avoid this, add more vCPUs to the project quota.

Project is out of memory
Description: Project <name> has reached 95% of the memory allocation limit.
Remediation: The project will soon run out of RAM and become unable to create new virtual machines. To avoid this, add more RAM to the project quota.

Project is out of floating IP addresses
Description: Project <name> has reached 95% of the floating IP address allocation limit.
Remediation: The project will soon run out of floating IP addresses and become unable to assign them to virtual machines. To avoid this, add more floating IP addresses to the project quota.

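If you manage project quotas from the command line rather than the admin panel, a minimal sketch, assuming the OpenStack CLI is configured with admin credentials; the numbers are examples only (cores are vCPUs, RAM is in MiB):

# openstack quota show <project>
# openstack quota set --cores 64 --ram 131072 <project>
# openstack quota set --floating-ips 20 <project>

The first command shows the current limits for the project named in the alert; the other two raise the vCPU, RAM, and floating IP limits respectively.
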
Network is out of IP addresses
Description: Network <name> with ID <id> in project <name> has reached 95% of the IP address allocation limit.
Remediation: The network will soon run out of IP addresses, making it impossible to connect new virtual machines to it. To avoid this, add more allocation pools to the network.

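Allocation pools belong to the network's subnet. A minimal sketch for extending them with the OpenStack CLI, assuming it is configured; the address range shown is only an example and must fit your subnet:

# openstack subnet list --network <network_id>
# openstack subnet show <subnet_id>
# openstack subnet set --allocation-pool start=192.168.10.100,end=192.168.10.200 <subnet_id>

The show command displays the existing allocation pools; the set command adds another pool to the subnet.
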
Project is out of storage policy space
Description: Project <name> has reached 95% of the <policy_name> storage policy allocation limit.
Remediation: The project will soon run out of storage policy space and become unable to create new compute volumes with this storage policy. To avoid this, add more storage space to the project quota.

Other alerts
Libvirt service is down
Description: Libvirt service is down on node <node> with ID <id>. Check the service state and start it. If the service cannot start, contact the technical support team.
Remediation: Start the libvirtd service on the specified node by running:
# systemctl start libvirtd.service

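A slightly fuller sketch of the same remediation, assuming root access on the specified node:

# systemctl status libvirtd.service
# systemctl start libvirtd.service
# virsh list --all

If the service starts cleanly, virsh list --all should again show the VMs on the node; if it fails to start, collect the systemctl status output for the support team.
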
Docker service is down
Description: Docker service is down on host <hostname>.
Remediation: Start the Docker service on the specified node by running:
# systemctl start docker.service

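As with libvirt, a minimal sketch, assuming root access on the host from the alert:

# systemctl status docker.service
# systemctl start docker.service
# journalctl -u docker.service -n 50
# docker ps --all

If Docker starts, docker ps --all shows whether the service containers came back up; containers that remain stopped can be started individually as described in the Compute service alerts section.
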
RabbitMQ node is down
Description: One or more nodes in the RabbitMQ cluster are down.
Remediation: Contact the technical support team.

RabbitMQ split brain detected
Description: RabbitMQ cluster has experienced a split brain due to a network partition.
Remediation: Contact the technical support team.

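Before contacting support, you can collect basic cluster diagnostics. A minimal sketch, assuming RabbitMQ runs in a container on the management node; the container name below is a placeholder, so look it up first with docker ps:

# docker ps --all | grep rabbitmq
# docker exec <rabbitmq_container> rabbitmqctl cluster_status

The cluster_status output lists the cluster members, which of them are currently running, and any detected network partitions, which is the information the support team will ask for.
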
PostgreSQL database size is greater than 30 GB
Description: PostgreSQL database "<name>" on node "<hostname>" is greater than 30 GB in size.
Remediation: Verify that deleted entries are archived, or contact the technical support team.

PostgreSQL database uses more than 50% of node root partition
Description: PostgreSQL databases on node "<hostname>" with ID "<id>" use more than 50% of the node root partition.
Remediation: Verify that deleted entries are archived, or contact the technical support team.

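To see which databases are consuming the space before archiving or contacting support, a minimal sketch, assuming PostgreSQL runs directly on the node under the postgres system user (in containerized deployments, run psql inside the database container instead):

# su - postgres -c "psql -c \"SELECT datname, pg_size_pretty(pg_database_size(datname)) FROM pg_database ORDER BY pg_database_size(datname) DESC;\""
# df -h /

The query lists all databases ordered by size, and df shows how much of the root partition remains.
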