Core storage alerts – Virtuozzo Hybrid Infrastructure

Metadata service alerts

Not enough metadata disks

Cluster “<cluster_name>” has only one MDS. There is only one disk with the metadata role at the moment. Losing this disk will completely destroy all cluster data irrespective of the redundancy scheme.

critical

Cluster “<cluster_name>” requires more disks with the metadata role. Losing one more MDS will halt cluster operation.

warning

Configuration warning

Node “<hostname>” has more than one metadata service located on it. It is recommended to have only one metadata service per node. Delete the extra metadata service(s) from this node and create them on other nodes instead.

warning

Cluster “<cluster_name>” has four metadata services. This configuration slows down the cluster performance and does not improve its availability. For a cluster of four nodes, it is enough to configure three MDSes. Delete an extra MDS from one of the cluster nodes.

Cluster “<cluster_name>” has more than five metadata services. This configuration slows down the cluster performance and does not improve its availability. For a large cluster, it is enough to configure five MDSes. Delete extra MDSes from the cluster nodes.
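
The metadata service count rules above can be summarized in a short sketch. This is only an illustration, not the product's actual check: the function name `mds_layout_findings`, its input format (one hostname per metadata service), and the returned strings are hypothetical. The source does not state which MDS count triggers the "requires more disks with the metadata role" warning; the sketch assumes a count of two.

```python
from collections import Counter


def mds_layout_findings(mds_nodes):
    """Return a list of findings for a metadata service layout.

    mds_nodes: list of hostnames, one entry per metadata service (MDS).
    """
    findings = []
    total = len(mds_nodes)

    if total == 1:
        findings.append("only one MDS")
    elif total == 2:
        # Assumption: the "requires more disks" warning fires at two MDSes.
        findings.append("requires more metadata disks")

    if total == 4:
        findings.append("four MDSes: keep three")
    elif total > 5:
        findings.append("more than five MDSes: keep five")

    # More than one MDS on the same node is flagged per node.
    for host, count in sorted(Counter(mds_nodes).items()):
        if count > 1:
            findings.append(f"node {host} hosts {count} MDSes")

    return findings
```

For example, a layout with three metadata services on three different nodes produces no findings, while four services with two of them on the same node produces two.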

Service failed

Metadata service #<id> is in the “<status>” state. Node: <hostname>. Disk: <disk_name>. Disk serial: <disk_serial>.

warning

Metadata disk is out of space

Metadata disk on node “<hostname>” is running out of space.

warning

Metadata service has high CPU usage

Metadata service on <node> has CPU usage higher than 80%. The service may be overloaded.

warning

Metadata service has high commit latency

Metadata service on <node> has the 95th percentile latency higher than 1 second.

warning

Metadata service has critically high commit latency

Metadata service on <node> has the 95th percentile latency higher than 5 seconds.

critical
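
To make the two commit latency thresholds concrete, here is a minimal sketch that maps a set of latency samples to a severity. How the product computes the 95th percentile is not described in the source; the nearest-rank method used here, and the names `p95` and `commit_latency_severity`, are assumptions for illustration.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids float rounding issues.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def commit_latency_severity(samples_seconds):
    """Return 'critical', 'warning', or None for commit latency samples."""
    value = p95(samples_seconds)
    if value > 5.0:   # higher than 5 seconds
        return "critical"
    if value > 1.0:   # higher than 1 second
        return "warning"
    return None
```

Both comparisons are strict, matching the "higher than" wording of the alerts, so a 95th percentile of exactly 1 second raises nothing.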

Cluster has unavailable metadata services

Some metadata services are offline or have failed. Check and restart them.

warning

Master metadata service changes too often

Master metadata service has changed more than once in 5 minutes.

warning
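
The "more than once in 5 minutes" condition can be read as a sliding-window count. The sketch below assumes that master changes are available as a list of timestamps in seconds; the function name and this input format are hypothetical, and the product may evaluate the condition differently.

```python
def master_changes_too_often(change_times, now, window_seconds=300.0):
    """Return True if the master MDS changed more than once in the window.

    change_times: timestamps (in seconds) of master changes.
    now: current time on the same clock.
    """
    recent = [t for t in change_times if now - window_seconds < t <= now]
    return len(recent) > 1
```

A single change inside the window does not trigger the condition; two or more do.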

Chunk service alerts

Not enough disks with storage role

Cluster “<cluster_name>” has no disks with the storage role.

warning

Cluster “<cluster_name>” has too few available CSes.

warning

Service failed

Storage service #<id> is in the “<status>” state. Node: <hostname>. Disk: <disk_name>. Disk serial: <disk_serial>.

warning

CS configuration is not optimal

CS#<cs_id> on tier <tier> has incorrect journalling settings.

warning

Encryption is disabled for CS#<cs_id> on tier <tier> but is enabled for other CSes on the same tier.

warning

Storage disk is slow

Disk <disk_name> (CS#<cs_id>) on node <hostname> is slow and needs to be replaced.

warning

Disk cache settings are not optimal

Disk <disk_name> (CS#<cs_id>) on node <hostname> has cache settings different from other disks of the same tier.

warning

Cluster has slow chunk services

Some chunk services experience slowdown and degrade the cluster performance.

warning

Cluster has offline chunk services

Some chunk services are offline. Check and restart them.

warning

Cluster has failed chunk services

Some chunk services have failed. This may be caused by a physical drive failure.

warning

Storage cluster alerts

Cluster is running out of physical space

There is little free physical space left on storage tier <tier>.

warning

Cluster is out of physical space

There is not enough free physical space on storage tier <tier>.

critical

Node has stuck I/O requests

Some I/O requests are stuck on <node>.

critical

Cluster has blocked or slow replication

Chunk replication is blocked or too slow.

critical

Node has failed map requests

Some map requests on <node> have failed.

critical

Cluster has too many chunks

There are too many chunks in the cluster, which slows down the metadata service.

warning

Cluster has critically high number of chunks

There are too many chunks in the cluster, which slows down the metadata service.

critical

Cluster has too many files

There are too many files in the cluster, which slows down the metadata service.

warning

Cluster has critically high number of files

There are too many files in the cluster, which slows down the metadata service.

critical

Cluster has failed mount points

Some mount points stopped working and need to be recovered.

critical

Cluster has unaligned I/O reads

I/O reads are not aligned. This may be caused by an incorrectly formatted disk in a virtual machine.

information

CS journal is running out of space

CS journal has less than 20% of free space left.

warning
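
As an illustration of the 20% threshold, the following sketch checks free journal space using integer arithmetic. The function name and the byte-count inputs are assumptions; the source does not say how the product measures journal usage.

```python
def journal_low_on_space(free_bytes, total_bytes):
    """Return True if less than 20% of the journal is free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # Equivalent to free_bytes / total_bytes < 0.20, without float rounding.
    return free_bytes * 100 < 20 * total_bytes
```

With exactly 20% free the check returns False, since the alert text says "less than 20%".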