Core storage alerts

Based on the metrics listed in Core storage metrics, the following core storage alerts are generated and displayed in the admin panel:

| Title | Message | Severity |
|---|---|---|
| Metadata service alerts | | |
| Not enough metadata disks | Cluster “<cluster_name>” has only one MDS. There is only one disk with the metadata role at the moment. Losing this disk will completely destroy all cluster data irrespective of the redundancy schema. | critical |
| | Cluster “<cluster_name>” requires more disks with the metadata role. Losing one more MDS will halt cluster operation. | warning |
| Configuration warning | Node “<hostname>” has more than one metadata service located on it. It is recommended to have only one metadata service per node. Delete the extra metadata service(s) from this node and create them on other nodes instead. | warning |
| | Cluster “<cluster_name>” has four metadata services. This configuration slows down the cluster performance and does not improve its availability. For a cluster of four nodes, it is enough to configure three MDSes. Delete an extra MDS from one of the cluster nodes. | |
| | Cluster “<cluster_name>” has more than five metadata services. This configuration slows down the cluster performance and does not improve its availability. For a large cluster, it is enough to configure five MDSes. Delete extra MDSes from the cluster nodes. | |
| Service failed | Metadata service #<id> is in the “<status>” state. Node: <hostname>. Disk: <disk_name>. Disk serial: <disk_serial>. | warning |
| Metadata disk is out of space | Metadata disk on node “<hostname>” is running out of space. | warning |
| Metadata service has high CPU usage | Metadata service on <node> has CPU usage higher than 80%. The service may be overloaded. | warning |
| Metadata service has high commit latency | Metadata service on <node> has the 95th percentile latency higher than 1 second. | warning |
| Metadata service has critically high commit latency | Metadata service on <node> has the 95th percentile latency higher than 5 seconds. | critical |
| Cluster has unavailable metadata services | Some metadata services are offline or have failed. Check and restart them. | warning |
| Master metadata service changes too often | Master metadata service has changed more than once in 5 minutes. | warning |
| Chunk service alerts | | |
| Not enough disks with storage role | Cluster “<cluster_name>” has no disks with the storage role. | warning |
| | Cluster “<cluster_name>” has too few available CSes. | warning |
| Service failed | Storage service #<id> is in the “<status>” state. Node: <hostname>. Disk: <disk_name>. Disk serial: <disk_serial>. | warning |
| CS configuration is not optimal | CS#<cs_id> on tier <tier> has incorrect journalling settings. | warning |
| | Encryption is disabled for CS#<cs_id> on tier <tier> but is enabled for other CSes on the same tier. | warning |
| Storage disk is slow | Disk <disk_name> (CS#<cs_id>) on node <hostname> is slow and needs to be replaced. | warning |
| Disk cache settings are not optimal | Disk <disk_name> (CS#<cs_id>) on node <hostname> has cache settings different from other disks of the same tier. | warning |
| Cluster has slow chunk services | Some chunk services experience slowdown and degrade the cluster performance. | warning |
| Cluster has offline chunk services | Some chunk services are offline. Check and restart them. | warning |
| Cluster has failed chunk services | Some chunk services have failed. It may be caused by physical drive failure. | warning |
| Storage cluster alerts | | |
| Cluster is running out of physical space | There is little free physical space left on storage tier <tier>. | warning |
| Cluster is out of physical space | There is not enough free physical space on storage tier <tier>. | critical |
| Node has stuck I/O requests | Some I/O requests are stuck on <node>. | critical |
| Cluster has blocked or slow replication | Chunk replication is blocked or too slow. | critical |
| Node has failed map requests | Some map requests on <node> have failed. | critical |
| Cluster has too many chunks | There are too many chunks in the cluster, which slows down the metadata service. | warning |
| Cluster has critically high number of chunks | There are too many chunks in the cluster, which slows down the metadata service. | critical |
| Cluster has too many files | There are too many files in the cluster, which slows down the metadata service. | warning |
| Cluster has critically high number of files | There are too many files in the cluster, which slows down the metadata service. | critical |
| Cluster has failed mount points | Some mount points stopped working and need to be recovered. | critical |
| Cluster has unaligned I/O reads | I/O reads are not aligned. It may be caused by a wrongly formatted disk in a virtual machine. | information |
| CS journal is running out of space | CS journal has less than 20% of free space left. | warning |
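
Several of the alerts above state explicit numeric thresholds (MDS count, 80% CPU usage, 1-second and 5-second 95th percentile commit latency, master changes within 5 minutes, 20% free journal space). The following Python sketch shows one way those stated thresholds could be expressed as rules. It is an illustration only, not the product's implementation: the function names, parameter names, and input shapes are hypothetical, and it assumes that the critical latency alert replaces the warning one rather than firing alongside it.

```python
from typing import List, Tuple

Alert = Tuple[str, str]  # (alert title, severity)


def metadata_alerts(mds_total: int, max_mds_per_node: int,
                    cpu_percent: float, commit_p95_seconds: float,
                    master_changes_5m: int) -> List[Alert]:
    """Map hypothetical metric samples to metadata alert titles."""
    alerts: List[Alert] = []
    # A single MDS means losing its disk destroys all cluster data.
    if mds_total == 1:
        alerts.append(("Not enough metadata disks", "critical"))
    # More than one MDS on a node is listed as a warning. The four-MDS and
    # more-than-five-MDS cases list no severity of their own; "warning" is
    # assumed here.
    if max_mds_per_node > 1 or mds_total == 4 or mds_total > 5:
        alerts.append(("Configuration warning", "warning"))
    if cpu_percent > 80:
        alerts.append(("Metadata service has high CPU usage", "warning"))
    # Assumption: the critical alert supersedes the warning alert.
    if commit_p95_seconds > 5:
        alerts.append(("Metadata service has critically high commit latency",
                       "critical"))
    elif commit_p95_seconds > 1:
        alerts.append(("Metadata service has high commit latency", "warning"))
    if master_changes_5m > 1:
        alerts.append(("Master metadata service changes too often", "warning"))
    return alerts


def journal_alerts(free_fraction: float) -> List[Alert]:
    """Warn when the CS journal has less than 20% of free space left."""
    if free_fraction < 0.20:
        return [("CS journal is running out of space", "warning")]
    return []


if __name__ == "__main__":
    for title, severity in metadata_alerts(4, 1, 85.0, 2.0, 0):
        print(f"{severity}: {title}")
```

Alerts without a stated numeric threshold, such as "Cluster has too many chunks", are left out of the sketch because the limits they use are not given here.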