Infrastructure alerts – Virtuozzo Hybrid Infrastructure

License alerts

License is not loaded

License is not installed.

warning

License expired

The license of cluster “<cluster_name>” has expired. Contact your reseller to update your license immediately!

critical

Cluster alerts

Cluster is out of space

Cluster has just <free_space> TB (<free_space_in_percent>%) of physical storage space left. You may want to free some space or add more storage capacity.

warning

Cluster “<cluster_name>” has run out of storage space allowed by license. No more data can be written. Please contact your reseller to update your license immediately!

warning

Licensed storage capacity is low

Cluster has reached 80% of licensed storage capacity.

warning

Licensed storage capacity is critically low

Cluster has reached 90% of licensed storage capacity.

critical
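The two licensed-capacity alerts above amount to a threshold check on the share of licensed capacity in use. The following is a minimal sketch only, not the product's implementation: the function name `licensed_capacity_severity` and its parameters are hypothetical, and it assumes that "has reached" means greater than or equal to the threshold.

```python
from typing import Optional


def licensed_capacity_severity(used_tb: float, licensed_tb: float) -> Optional[str]:
    """Return the alert severity for licensed capacity usage, or None if no alert applies.

    Assumption: "has reached N%" is interpreted as usage >= N%.
    """
    if licensed_tb <= 0:
        raise ValueError("licensed_tb must be positive")
    used_percent = 100.0 * used_tb / licensed_tb
    if used_percent >= 90:
        return "critical"  # "Licensed storage capacity is critically low"
    if used_percent >= 80:
        return "warning"   # "Licensed storage capacity is low"
    return None
```

For example, under these assumptions a cluster using 85 TB of a 100 TB license would map to the warning alert.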

Not enough cluster nodes

Cluster “<cluster_name>” has only {1,2} node(s) instead of the recommended minimum of 3. Add {2,1} or more nodes to the cluster.

warning
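The `{1,2}` and `{2,1}` placeholders in the message above pair the current node count with the number of nodes still needed to reach the recommended minimum of 3. A small sketch of that arithmetic, with a hypothetical helper name:

```python
RECOMMENDED_MIN_NODES = 3  # recommended minimum stated in the alert text


def nodes_to_add(current_nodes: int) -> int:
    """Return how many nodes are missing to reach the recommended minimum."""
    return max(0, RECOMMENDED_MIN_NODES - current_nodes)
```

With 1 node the helper returns 2, and with 2 nodes it returns 1, which matches the two variants of the message.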

High availability for the admin panel must be configured

Configure high availability for the admin panel in Settings > Management node. Otherwise the admin panel will be a single point of failure.

critical

Management node backup does not exist

Management node backup is older than <number_of_days> days.

critical

The last management node backup has failed, does not exist, or is too old.

critical

Changes to the management database are not replicated

Changes to the management database are not replicated to the node "<hostname>" because it is offline. Check the node's state and connectivity.

critical

Changes to the management database are not replicated to the node "<hostname>". Please contact technical support.

Cluster connectivity alerts

Cluster network connectivity problem

All nodes have network connectivity problems: unstable connectivity via network "<network_name>" due to packet loss.

critical

All nodes have network connectivity problems: no connectivity via network "<network_name>".

critical

Node network connectivity problem

Node "<hostname>" has network connectivity problems: unstable connectivity via network "<network_name>" due to the loss of all MTU-sized packets.

critical

Node "<hostname>" has network connectivity problems: unstable connectivity via network "<network_name>" due to the loss of some MTU-sized packets.

critical

Node "<hostname>" has network connectivity problems: unstable connectivity via network "<network_name>" due to packet loss.

critical

Node "<hostname>" has network connectivity problems: no connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>".

critical

Node "<hostname>" has network connectivity problems: unstable connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>" due to the loss of all MTU-sized packets.

critical

Node "<hostname>" has network connectivity problems: unstable connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>" due to packet loss.

critical

Node "<hostname>" has network connectivity problems: unstable connectivity to node "<hostname>" with interface "<iface>" via interface "<iface>" due to the loss of some MTU-sized packets.

critical

MTU mismatch

Some interfaces have MTU that differs from other interfaces in the same network: network "<network_name>" interface@host "<iface>@<hostname>".

critical
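The MTU mismatch alert above compares interfaces that belong to the same network. The sketch below shows one way such a comparison could be done. It is an illustration under stated assumptions, not the product's logic: the function name and the input tuple layout are hypothetical, and "differs from other interfaces" is interpreted as differing from the most common MTU in that network (ties go to the value seen first).

```python
from collections import Counter, defaultdict


def find_mtu_mismatches(interfaces):
    """Group interfaces by network and report those with an uncommon MTU.

    interfaces: iterable of (network_name, iface, hostname, mtu) tuples.
    Returns a dict mapping network name to a list of "iface@hostname" strings.
    """
    by_network = defaultdict(list)
    for network, iface, hostname, mtu in interfaces:
        by_network[network].append((iface, hostname, mtu))

    mismatches = {}
    for network, members in by_network.items():
        counts = Counter(mtu for _, _, mtu in members)
        if len(counts) < 2:
            continue  # all interfaces in this network share one MTU
        # Assumption: the most common MTU is treated as the intended one.
        common_mtu = counts.most_common(1)[0][0]
        mismatches[network] = [
            f"{iface}@{hostname}"
            for iface, hostname, mtu in members
            if mtu != common_mtu
        ]
    return mismatches
```

The output uses the same `<iface>@<hostname>` form as the alert message, so each entry can be matched against the alert text directly.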

Node alerts

Node is offline

Node “<hostname>” is offline.

warning

Node got offline too many times

Node “<hostname>” got offline too many times last hour.

warning

Kernel is outdated

Node “<hostname>” is not running the latest kernel.

warning

OOM killer triggered

OOM killer has been triggered on node “<hostname>”.

warning

Time is not synced

Time on node “<hostname>” differs from time on backend node by more than 5 seconds.

warning
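The time synchronization alert above fires when the clock difference exceeds 5 seconds in either direction. A minimal sketch of that comparison, assuming both timestamps are available as `datetime` values (the function name is hypothetical):

```python
from datetime import datetime, timedelta

MAX_SKEW = timedelta(seconds=5)  # threshold stated in the alert text


def time_is_out_of_sync(node_time: datetime, backend_time: datetime) -> bool:
    """Return True if the node clock differs from the backend clock by more than 5 seconds."""
    return abs(node_time - backend_time) > MAX_SKEW
```

Taking the absolute difference covers both a node that runs ahead of the backend node and one that lags behind it.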

No Internet access

Cluster node <hostname> cannot reach the repository. Make sure that all cluster nodes have Internet access.

warning

Incompatible hardware detected

Incompatible hardware detected on node "<hostname>": <hardware_list>. Using Mellanox and AMD may lead to data loss. Please double check that SR-IOV is properly enabled.

critical

Swap space is running low

<swap_proportion>% of swap is used on node "<hostname>".

critical

Node has high CPU usage

Node <hostname> has CPU usage higher than 90%. The current value is <value>%.

warning

Node has high memory usage

Node <hostname> has memory usage higher than 95%. The current value is <value>%.

warning

Node has high disk I/O usage

Disk /dev/<disk_name> on node <hostname> has I/O usage higher than 85%. The current value is <value>%.

warning
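The three resource alerts above each compare a usage percentage against a fixed limit: 90% for CPU, 95% for memory, and 85% for disk I/O. The sketch below collects those limits in one place. The metric keys and the function name are hypothetical, and the sketch says nothing about how long a value must stay above the limit before the product raises an alert.

```python
# Limits taken from the alert messages; the dictionary keys are illustrative names.
THRESHOLDS = {"cpu": 90.0, "memory": 95.0, "disk_io": 85.0}


def exceeded_thresholds(usage):
    """Return the sorted names of metrics whose usage is higher than the limit.

    usage: dict mapping metric name to a usage percentage.
    """
    return sorted(
        name
        for name, limit in THRESHOLDS.items()
        if usage.get(name, 0.0) > limit
    )
```

The comparison is strict because the messages say "higher than", so a value exactly at the limit is not reported.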

Node has high receive packet loss rate

Node <hostname> has <value> receive packet loss rate reported by job <job_name>.

warning

Node has high transmit packet loss rate

Node <hostname> has <value> transmit packet loss rate reported by job <job_name>.

warning

Node has high receive packet error rate

Node <hostname> has <value> receive packet error rate reported by job <job_name>.

warning

Node has high transmit packet error rate

Node <hostname> has <value> transmit packet error rate reported by job <job_name>.

warning

Disk alerts

S.M.A.R.T. warning

Disk “<disk_name>” (<serial>) on node “<hostname>” has failed a S.M.A.R.T. check.

critical

Disk error

Disk “<disk_name>” (<serial>) failed on node “<hostname>”.

critical

Disk is out of space

Root partition on node “<hostname>” is running out of space (less than 10% of free space).

warning
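To check the root partition condition above by hand, the free-space percentage can be computed with Python's standard `shutil.disk_usage`. This is a sketch only; the helper names are hypothetical, and the product may measure free space differently (for example, with respect to reserved blocks).

```python
import shutil


def root_free_percent(path: str = "/") -> float:
    """Return the percentage of free space on the file system containing path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def root_is_low_on_space(free_percent: float, threshold: float = 10.0) -> bool:
    """Return True if free space is below the threshold (10% in the alert text)."""
    return free_percent < threshold
```

Calling `root_is_low_on_space(root_free_percent())` on a node gives a quick local answer to whether the root partition is below the 10% mark.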

Disk write cache is enabled

Disk write cache is enabled for disk “<disk_name>” on node “<hostname>”. Disable it to avoid potential data loss in case of a power outage.

warning

Disk write cache status unknown

Cannot determine the status of write cache for disk “<disk_name>” on node “<hostname>”.

warning

Software RAID is not fully synced

Software RAID <disk_name> on node <hostname> is <value>% synced.

warning

Systemd service is flapping

Systemd service <service_name> on node <hostname> has changed its state more than 5 times in 5 minutes or 15 times in one hour.

critical
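The flapping rule above combines two sliding windows: more than 5 state changes in 5 minutes, or more than 15 in one hour. The sketch below shows how such a rule could be evaluated over a list of state-change timestamps. It is illustrative only: the function name and the input format (a sorted list of timestamps in seconds) are assumptions.

```python
from bisect import bisect_right


def is_flapping(change_times, now):
    """Return True if the service changed state too often.

    change_times: sorted list of state-change timestamps, in seconds.
    now: the current timestamp, in seconds.
    """
    def changes_within(window_seconds):
        # Count changes t with now - window_seconds < t <= now.
        upper = bisect_right(change_times, now)
        lower = bisect_right(change_times, now - window_seconds)
        return upper - lower

    return changes_within(5 * 60) > 5 or changes_within(60 * 60) > 15
```

Six changes within one minute satisfy the short window, while sixteen changes spread evenly over 50 minutes satisfy only the hourly window. Either case is reported as flapping.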

Network alerts

Network warning

Network interface “<iface_name>” has incorrect settings: <duplex> duplex and <speed> speed.

warning

Network interface “<iface_name>” on node “<hostname>” is missing important features (or has them disabled): “<feature_name>”.

warning

Network interface “<iface_name>” on node “<hostname>” is not in the full duplex mode.

warning

Network interface “<iface_name>” on node “<hostname>” has speed lower than the minimally required 1 Gbps.

warning

Network interface “<iface_name>” on node “<hostname>” has an undefined speed.

warning
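The interface checks above come down to three conditions: the duplex mode must be full, the speed must be defined, and the speed must be at least 1 Gbps. A minimal sketch of these checks, assuming the duplex mode is available as a string and the speed as an integer in Mbps or `None` when undefined (both the function name and the input format are hypothetical):

```python
from typing import List, Optional

MIN_SPEED_MBPS = 1000  # 1 Gbps, the minimum named in the alert text


def interface_warnings(duplex: str, speed_mbps: Optional[int]) -> List[str]:
    """Return a list of warning descriptions for one network interface."""
    warnings = []
    if duplex.lower() != "full":
        warnings.append("not in the full duplex mode")
    if speed_mbps is None:
        warnings.append("undefined speed")
    elif speed_mbps < MIN_SPEED_MBPS:
        warnings.append("speed lower than 1 Gbps")
    return warnings
```

An interface reporting half duplex at 100 Mbps would produce two warnings here, one for duplex and one for speed.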

Network interface is flapping

Network interface <iface_name> on node <hostname> is flapping.

warning

Network bond is not redundant

Network bond <iface_name> on node <hostname> is missing <number_of_ifaces> subordinate interface(s).

critical

Update alerts

Software updates exist

Software updates exist for the node <hostname>. Current version: <current_version>. Available version: <available_version>.

information

Update check failed

Update check failed on the node <hostname>. Please check access to the update repository.

warning

Multiple update checks failed

Update checks failed multiple times on the node <hostname>. Please check access to the update repository.

critical

Update download failed

Update download failed on the node <hostname>.

critical

Node update failed

Software update failed on the node <hostname>.

critical

Update failed

Update failed for the management panel and compute API.

critical

Cluster update failed

Update failed for the cluster.

critical

Entering maintenance for update failed

Entering maintenance failed while updating the node <hostname>.

critical

Service alerts

Compute cluster has failed

Compute cluster has failed. Unable to manage virtual machines.

critical

Certificate expiration

Acronis Backup Gateway certificate has expired. All backup operations have been stopped. Update the certificate on the Backup Gateway screen.

critical

Acronis Backup Gateway certificate will expire soon. Update the certificate on the Backup Gateway screen.

warning

Acronis Backup Gateway certificate will expire on "<expiration_date>". Update the certificate on the Backup Gateway screen.

Redundancy warning

iSCSI LUN <lun_id> of target group “<target_group>” is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that the LUN can survive host failures in addition to disk failures.

warning

iSCSI major upgrade failed

iSCSI major upgrade failed. Will be retried…

critical

NFS service has unavailable FS services

Some File services are not running on <node>. Check the service status in the command-line interface.

warning