Infrastructure alerts
The following infrastructure alerts are generated and displayed in the admin panel:
Title | Message | Severity |
---|---|---|
License alerts | ||
License is not loaded | License is not installed. | warning |
License expired | The license of cluster “<cluster_name>” has expired. Contact your reseller to update your license immediately! | critical |
Cluster alerts | ||
Cluster is out of space | Cluster has just <free_space> TB (<free_space_in_percent>%) of physical storage space left. You may want to free some space or add more storage capacity. | warning |
Cluster is out of space | Cluster “<cluster_name>” has run out of storage space allowed by license. No more data can be written. Please contact your reseller to update your license immediately! | warning |
Licensed storage capacity is low | Cluster has reached 80% of licensed storage capacity. | warning |
Licensed storage capacity is critically low | Cluster has reached 90% of licensed storage capacity. | critical |
Not enough cluster nodes | Cluster “<cluster_name>” has only {1,2} node(s) instead of the recommended minimum of 3. Add {2,1} or more nodes to the cluster. | warning |
High availability for the admin panel must be configured | Configure high availability for the admin panel in Settings > Management node. Otherwise the admin panel will be a single point of failure. | critical |
Management node backup does not exist | Management node backup is older than <number_of_days> days. | critical |
Management node backup does not exist | The last management node backup has failed, does not exist, or is too old. | critical |
Changes to the management database are not replicated | Changes to the management database are not replicated to the node "<hostname>" because it is offline. Check the node's state and connectivity. | critical |
Changes to the management database are not replicated | Changes to the management database are not replicated to the node "<hostname>". Please contact the technical support. | |
Cluster connectivity alerts | ||
Cluster network connectivity problem | All nodes have network connectivity problems: unstable connectivity via network "<network_name>" due to packet loss. | critical |
Cluster network connectivity problem | All nodes have network connectivity problems: no connectivity via network "<network_name>". | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: unstable connectivity via network "<network_name>" due to the loss of all MTU-sized packets. | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: unstable connectivity via network "<network_name>" due to the loss of some MTU-sized packets. | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: unstable connectivity via network "<network_name>" due to packet loss. | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: no connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>". | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: unstable connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>" due to the loss of all MTU-sized packets. | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: unstable connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>" due to packet loss. | critical |
Node network connectivity problem | Node "<hostname>" has network connectivity problems: unstable connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>" due to the loss of some MTU-sized packets. | critical |
MTU mismatch | Some interfaces have MTU that differs from other interfaces in the same network: network "<network_name>" interface@host "<iface>@<hostname>". | critical |
Node alerts | ||
Node is offline | Node “<hostname>” is offline. | warning |
Node got offline too many times | Node “<hostname>” got offline too many times last hour. | warning |
Kernel is outdated | Node “<hostname>” is not running the latest kernel. | warning |
OOM killer triggered | OOM killer has been triggered on node “<hostname>”. | warning |
Time is not synced | Time on node “<hostname>” differs from time on backend node by more than 5 seconds. | warning |
No Internet access | Cluster node <hostname> cannot reach the repository. Make sure that all cluster nodes have Internet access. | warning |
Incompatible hardware detected | Incompatible hardware detected on node "<hostname>": <hardware_list>. Using Mellanox and AMD may lead to data loss. Please double check that SR-IOV is properly enabled. | critical |
Swap space is running low | <swap_proportion>% of swap is used on node "<hostname>". | critical |
Node has high CPU usage | Node <hostname> has CPU usage higher than 90%. The current value is <value>%. | warning |
Node has high memory usage | Node <hostname> has memory usage higher than 95%. The current value is <value>%. | warning |
Node has high disk I/O usage | Disk /dev/<disk_name> on node <hostname> has I/O usage higher than 85%. The current value is <value>%. | warning |
Node has high receive packet loss rate | Node <hostname> has <value> receive packet loss rate reported by job <job_name>. | warning |
Node has high transmit packet loss rate | Node <hostname> has <value> transmit packet loss rate reported by job <job_name>. | warning |
Node has high receive packet error rate | Node <hostname> has <value> receive packet error rate reported by job <job_name>. | warning |
Node has high transmit packet error rate | Node <hostname> has <value> transmit packet error rate reported by job <job_name>. | warning |
Disk alerts | ||
S.M.A.R.T. warning | Disk “<disk_name>”(<serial>) on node “<hostname>” has failed a S.M.A.R.T. check. | critical |
Disk error | Disk “<disk_name>” (<serial>) failed on node “<hostname>”. | critical |
Disk is out of space | Root partition on node “<hostname>” is running out of space (less than 10% of free space). | warning |
Disk write cache is enabled | Disk write cache is enabled for disk “<disk_name>” on node “<hostname>”. Disable it to avoid potential data loss in case of a power outage. | warning |
Disk write cache status unknown | Cannot determine the status of write cache for disk “<disk_name>” on node “<hostname>”. | warning |
Software RAID is not fully synced | Software RAID <disk_name> on node <hostname> is <value>% synced. | warning |
Systemd service is flapping | Systemd service <service_name> on node <hostname> has changed its state more than 5 times in 5 minutes or 15 times in one hour. | critical |
Network alerts | ||
Network warning | Network interface “<iface_name>” has incorrect settings: <duplex> duplex and <speed> speed. | warning |
Network warning | Network interface “<iface_name>” on node “<hostname>” is missing important features (or has them disabled): “<feature_name>”. | warning |
Network warning | Network interface “<iface_name>” on node “<hostname>” is not in the full duplex mode. | warning |
Network warning | Network interface “<iface_name>” on node “<hostname>” has speed lower than the minimally required 1 Gbps. | warning |
Network warning | Network interface “<iface_name>” on node “<hostname>” has an undefined speed. | warning |
Network interface is flapping | Network interface <iface_name> on node <hostname> is flapping. | warning |
Network bond is not redundant | Network bond <iface_name> on node <hostname> is missing <number_of_ifaces> subordinate interface(s). | critical |
Update alerts | ||
Software updates exist | Software updates exist for the node <hostname>. Current version: <current_version>. Available version: <available_version>. | information |
Update check failed | Update check failed on the node <hostname>. Please check access to the update repository. | warning |
Multiple update checks failed | Update checks failed multiple times on the node <hostname>. Please check access to the update repository. | critical |
Update download failed | Update download failed on the node <hostname>. | critical |
Node update failed | Software update failed on the node <hostname>. | critical |
Update failed | Update failed for the management panel and compute API. | critical |
Cluster update failed | Update failed for the cluster. | critical |
Entering maintenance for update failed | Entering maintenance failed while updating the node <hostname>. | critical |
Service alerts | ||
Compute cluster has failed | Compute cluster has failed. Unable to manage virtual machines. | critical |
Certificate expiration | Acronis Backup Gateway certificate has expired. All backup operations have been stopped. Update the certificate on the Backup Gateway screen. | critical |
Certificate expiration | Acronis Backup Gateway certificate will expire soon. Update the certificate on the Backup Gateway screen. | warning |
Certificate expiration | Acronis Backup Gateway certificate will expire on "<expiration_date>". Update the certificate on the Backup Gateway screen. | |
Redundancy warning | iSCSI LUN <lun_id> of target group “<target_group>” is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that the LUN can survive host failures in addition to disk failures. | warning |
iSCSI major upgrade failed | iSCSI major upgrade failed. Will be retried… | critical |
NFS service has unavailable FS services | Some File services are not running on <node>. Check the service status in the command-line interface. | warning |
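
Several alerts in the table above are defined by numeric thresholds. As a rough illustration only, the sketch below shows how the licensed storage capacity thresholds (80% and 90%) and the systemd flapping rule (more than 5 state changes in 5 minutes or 15 in one hour) could be evaluated. The function names, the inclusive handling of "has reached", and the reading of "15 times in one hour" as "more than 15" are assumptions made for this example, not the product's actual implementation.

```python
from typing import List, Optional

# Thresholds copied from the alert table; all names here are hypothetical.
LICENSE_WARNING_PCT = 80.0
LICENSE_CRITICAL_PCT = 90.0


def licensed_capacity_severity(used_tb: float, licensed_tb: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for licensed capacity usage."""
    if licensed_tb <= 0:
        raise ValueError("licensed_tb must be positive")
    used_pct = 100.0 * used_tb / licensed_tb
    # Assumption: "has reached N%" is treated as used_pct >= N.
    if used_pct >= LICENSE_CRITICAL_PCT:
        return "critical"
    if used_pct >= LICENSE_WARNING_PCT:
        return "warning"
    return None


def is_service_flapping(change_times: List[float], now: float) -> bool:
    """Check the flapping rule against state-change timestamps (in seconds).

    Assumption: the rule fires on more than 5 changes within the last
    5 minutes, or more than 15 changes within the last hour.
    """
    last_5_min = sum(1 for t in change_times if 0 <= now - t <= 300)
    last_hour = sum(1 for t in change_times if 0 <= now - t <= 3600)
    return last_5_min > 5 or last_hour > 15
```

For example, under these assumptions a cluster using 85 TB of a 100 TB license would map to the "warning" severity, and a service with six state changes in the last minute would count as flapping.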