Object storage alerts

Object storage alerts are generated from the metrics described in Object storage metrics and are displayed in the admin panel.

S3 Gateway alerts

S3 cluster has unavailable S3 Gateway services: Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface.
S3 Gateway service has high GET request latency: S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second.
S3 Gateway service has critically high GET request latency: S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds.
S3 Gateway service has high cancel request rate: S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, requests timeouts, or a small limit for pending requests.
S3 Gateway service has critically high cancel request rate: S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, requests timeouts, or a small limit for pending requests.
S3 Gateway service has high CPU usage: S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded.
S3 Gateway service has critically high CPU usage: S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded.
S3 Gateway service has too many failed requests: S3 Gateway service (<service_id>) on <node> has a lot of failed requests with a server error (5XX status code).
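
The paired "high" and "critically high" thresholds listed above can be read as a two-level check per metric. The following is a minimal sketch of that logic, not the product's actual implementation: the metric keys, the `Threshold` type, and the `classify` function are hypothetical names introduced for illustration, and only the numeric limits come from the alert descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    high: float             # value above which the "high" alert applies
    critically_high: float  # value above which the "critically high" alert applies


# Limits taken from the S3 Gateway alert descriptions above.
# The dictionary keys are hypothetical metric names, not real metric identifiers.
GATEWAY_THRESHOLDS = {
    "get_latency_median_seconds": Threshold(high=1.0, critically_high=5.0),
    "cancel_request_rate_percent": Threshold(high=5.0, critically_high=30.0),
    "cpu_usage_percent": Threshold(high=75.0, critically_high=90.0),
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return the alert level for a metric value, or None if no alert applies.

    The alert texts say "higher than", so the comparison is strict:
    a value exactly equal to a limit does not cross it.
    """
    threshold = GATEWAY_THRESHOLDS[metric]
    if value > threshold.critically_high:
        return "critically high"
    if value > threshold.high:
        return "high"
    return None
```

For example, under these assumptions a median GET latency of 1.5 seconds falls into the "high" level, while a CPU usage of exactly 90% is still "high" rather than "critically high", because the limit must be exceeded.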

S3 Object service alerts

S3 cluster has unavailable object services: Some Object services are not running on <node>. Check the service status in the command-line interface.
Object service has high request latency: Object service (<service_id>) on <node> has the median request latency higher than 1 second.
Object service has critically high request latency: Object service (<service_id>) on <node> has the median request latency higher than 5 seconds.
Object service has high commit latency: Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
Object service has critically high commit latency: Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.

S3 Name service alerts

S3 cluster has unavailable name services: Some Name services are not running on <node>. Check the service status in the command-line interface.
Name service has high request latency: Name service (<service_id>) on <node> has the median request latency higher than 1 second.
Name service has critically high request latency: Name service (<service_id>) on <node> has the median request latency higher than 5 seconds.
Name service has high commit latency: Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
Name service has critically high commit latency: Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.

OSTOR agent alerts

Object storage agent is frozen for a long time: Object storage agent on <node> has the event loop inactive for more than 1 minute.
Object storage agent is offline: Object storage agent is offline on <node>.
Object storage agent is not connected to configuration service: Object storage agent failed to connect to the configuration service on <node>.

File service alerts

NFS service has unavailable FS services: Some File services are not running on <node>. Check the service status in the command-line interface.
NFS service failed to start: Object storage agent failed to start <service_name>(<service_id>) on <node>.
FSMDS failed to start: Object storage agent failed to start file service on <node>.
NFS service is experiencing some network problems: NFS service <service_name>, <service_id> on <hostname> has some RPC errors. Check your network configuration.
NFS service is experiencing many network problems: NFS service <service_name>, <service_id> on <hostname> has many RPC errors. Check your network configuration.

NDS service alerts

S3 NDS service has high notification processing error rate: S3 NDS service (<service_id>) on <node> has the notification processing error rate higher than 5%. It may be caused by connectivity issues, requests timeouts, or an S3 topics misconfiguration.
S3 NDS service has critically high notification processing error rate: S3 NDS service (<service_id>) on <node> has the notification processing error rate higher than 15%. It may be caused by connectivity issues, requests timeouts, or an S3 topics misconfiguration.
S3 NDS service has high notification deletion error rate: S3 NDS service (<service_id>) on <node> has the notification deletion error rate higher than 5%. It may be caused by a storage misconfiguration, storage performance degradation, or other storage issues.
S3 NDS service has high notification repetition rate: S3 NDS service (<service_id>) on <node> has the notification repetition rate higher than 5%. It may be caused by a storage misconfiguration or other storage issues.
S3 NDS service has too many staged unprocessed notifications: S3 NDS service (<service_id>) on <node> has a lot of unprocessed notifications staged on the storage. It may be caused by connectivity or storage issues.
S3 NDS service has too many messages in simultaneous processing: S3 NDS service (<service_id>) on <node> has a lot of notifications in simultaneous processing on the endpoint. It may be caused by connectivity issues or an S3 topics misconfiguration.

Other S3 cluster alerts

S3 cluster misconfiguration: The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational. To ensure high availability, update the S3 cluster configuration, as described in the Knowledge Base at https://support.virtuozzo.com/hc/en-us/articles/27536517316753-Virtuozzo-Hybrid-Infrastructure-Alert-S3-cluster-misconfiguration.
S3 redundancy warning: S3 is set to failure domain "disk" even though there are enough available nodes. It is recommended to set the failure domain to "host" so that S3 can survive host failures in addition to disk failures.
S3 node is in the automatic maintenance mode: S3 services have been evacuated from <hostname> because of too many failed S3 requests. Check the service logs.
S3 service is frozen for a long time: S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute.
S3 service failed to start: Object storage agent failed to start <service_name>(<service_id>) on <node>.
S3 cluster has unavailable Geo-replication services: Some Geo-replication services are not running on <node>. Check the service status in the command-line interface.
S3 cluster has too many open file descriptors: There are more than 9000 open file descriptors on <node>. Please contact the technical support.
S3 service is experiencing some network problems: S3 service <service_name>, <service_id> on <hostname> has some RPC errors. Check your network configuration.
S3 service is experiencing many network problems: S3 service <service_name>, <service_id> on <hostname> has many RPC errors. Check your network configuration.