Object storage alerts

Object storage alerts are generated from the metrics described in Object storage metrics and are displayed in the admin panel.

S3 Gateway alerts

S3 cluster has unavailable S3 Gateway services

Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface.

S3 Gateway service has high GET request latency

S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second.

S3 Gateway service has critically high GET request latency

S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds.

S3 Gateway service has high cancel request rate

S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a small limit for pending requests.

S3 Gateway service has critically high cancel request rate

S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, request timeouts, or a small limit for pending requests.

S3 Gateway service has high CPU usage

S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded.

S3 Gateway service has critically high CPU usage

S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded.

S3 Gateway service has too many failed requests

S3 Gateway service (<service_id>) on <node> has a lot of failed requests with a server error (5XX status code).
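The warning and critical thresholds listed above for the S3 Gateway can be summarized as a small lookup table. The following Python sketch is illustrative only: the metric keys and the `classify` helper are hypothetical names, not part of the product, and only the threshold values come from the alert descriptions.

```python
# Thresholds taken from the S3 Gateway alert descriptions above.
# The metric names are hypothetical labels chosen for this sketch.
THRESHOLDS = {
    "get_latency_seconds": (1.0, 5.0),   # median GET request latency
    "cancel_rate_percent": (5.0, 30.0),  # cancel request rate
    "cpu_usage_percent": (75.0, 90.0),   # CPU usage
}


def classify(metric: str, value: float) -> str:
    """Return 'critical', 'warning', or 'ok' for a metric value.

    An alert applies only when the value is strictly higher than
    the threshold, matching the "higher than" wording of the alerts.
    """
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

For example, under these assumptions `classify("cpu_usage_percent", 80.0)` falls between the two CPU thresholds and would correspond to the non-critical "high CPU usage" alert.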

S3 Object service alerts

S3 cluster has unavailable object services

Some Object services are not running on <node>. Check the service status in the command-line interface.

Object service has high request latency

Object service (<service_id>) on <node> has the median request latency higher than 1 second.

Object service has critically high request latency

Object service (<service_id>) on <node> has the median request latency higher than 5 seconds.

Object service has high commit latency

Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.

Object service has critically high commit latency

Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.

S3 Name service alerts

S3 cluster has unavailable name services

Some Name services are not running on <node>. Check the service status in the command-line interface.

Name service has high request latency

Name service (<service_id>) on <node> has the median request latency higher than 1 second.

Name service has critically high request latency

Name service (<service_id>) on <node> has the median request latency higher than 5 seconds.

Name service has high commit latency

Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.

Name service has critically high commit latency

Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
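The Object and Name service alerts above are based on the median latency, with different critical limits for requests (5 seconds) and commits (10 seconds). As a rough illustration, the Python sketch below evaluates a list of latency samples against these limits. How the product actually collects and aggregates samples is not described here, so the sample list and the `latency_severity` helper are assumptions made for this example.

```python
from statistics import median

# (warning, critical) limits in seconds, from the alert descriptions above.
REQUEST_LIMITS = (1.0, 5.0)
COMMIT_LIMITS = (1.0, 10.0)


def latency_severity(samples, limits):
    """Compare the median of latency samples (seconds) with the limits.

    `samples` is a hypothetical non-empty list of measurements;
    it is not a product data structure.
    """
    warning, critical = limits
    m = median(samples)
    if m > critical:
        return "critical"
    if m > warning:
        return "warning"
    return "ok"
```

Because the median is used, a single slow operation among many fast ones does not change the result in this sketch: the value has to be high for at least half of the samples.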

OSTOR agent alerts

Object storage agent is frozen for a long time

Object storage agent on <node> has the event loop inactive for more than 1 minute.

Object storage agent is offline

Object storage agent is offline on <node>.

Object storage agent is not connected to configuration service

Object storage agent failed to connect to the configuration service on <node>.

File service alerts

NFS service has unavailable FS services

Some File services are not running on <node>. Check the service status in the command-line interface.

NFS service failed to start

Object storage agent failed to start <service_name> (<service_id>) on <node>.

FS failed to start

Object storage agent failed to start file service on <node>.

NFS service is experiencing some network problems

NFS service <service_name>, <service_id> on <hostname> has some RPC errors. Check your network configuration.

NFS service is experiencing many network problems

NFS service <service_name>, <service_id> on <hostname> has many RPC errors. Check your network configuration.

NDS service alerts

S3 NDS service has high notification processing error rate

S3 NDS service (<service_id>) on <node> has the notification processing error rate higher than 5%. It may be caused by connectivity issues, request timeouts, or an S3 topics misconfiguration.

S3 NDS service has critically high notification processing error rate

S3 NDS service (<service_id>) on <node> has the notification processing error rate higher than 15%. It may be caused by connectivity issues, request timeouts, or an S3 topics misconfiguration.

S3 NDS service has high notification deletion error rate

S3 NDS service (<service_id>) on <node> has the notification deletion error rate higher than 5%. It may be caused by a storage misconfiguration, storage performance degradation, or other storage issues.

S3 NDS service has high notification repetition rate

S3 NDS service (<service_id>) on <node> has the notification repetition rate higher than 5%. It may be caused by a storage misconfiguration or other storage issues.

S3 NDS service has too many staged unprocessed notifications

S3 NDS service (<service_id>) on <node> has a lot of unprocessed notifications staged on the storage. It may be caused by connectivity or storage issues.

S3 NDS service has too many messages in simultaneous processing

S3 NDS service (<service_id>) on <node> has a lot of notifications in simultaneous processing on the endpoint. It may be caused by connectivity issues or an S3 topics misconfiguration.

Other S3 cluster alerts

S3 cluster misconfiguration

The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational.

Redundancy warning

S3 is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures in addition to disk failures.

S3 service is frozen for a long time

S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute.

S3 service failed to start

Object storage agent failed to start <service_name> (<service_id>) on <node>.

S3 cluster has unavailable Geo-replication services

Some Geo-replication services are not running on <node>. Check the service status in the command-line interface.

S3 cluster has too many open file descriptors

There are more than 9000 open file descriptors on <node>. Contact technical support.
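Before contacting support, it can be useful to see how many descriptors a given process holds. The Python sketch below counts the entries of a Linux `/proc/<pid>/fd` directory and compares the count with the 9000 limit from the alert. It is an assumption that the alert is related to per-process counts like this one; the exact scope of the product's check is not stated here, and the helper names are hypothetical.

```python
import os

FD_LIMIT = 9000  # threshold from the alert description


def count_open_fds(fd_dir: str = "/proc/self/fd") -> int:
    """Count entries in a Linux /proc/<pid>/fd directory.

    The default path inspects the current process; pass
    "/proc/<pid>/fd" to inspect another process (this may
    require root privileges).
    """
    return len(os.listdir(fd_dir))


def exceeds_fd_limit(count: int, limit: int = FD_LIMIT) -> bool:
    """Return True if the count is strictly higher than the limit."""
    return count > limit
```

For example, `exceeds_fd_limit(count_open_fds())` checks the Python process itself, which is only useful as a smoke test; for a real service, the PID would have to be looked up first.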

S3 service is experiencing some network problems

S3 service <service_name>, <service_id> on <hostname> has some RPC errors. Check your network configuration.

S3 service is experiencing many network problems

S3 service <service_name>, <service_id> on <hostname> has many RPC errors. Check your network configuration.