Object storage alerts

Object storage alerts are generated from the metrics described in Object storage metrics and are displayed in the admin panel.

S3 Gateway alerts

S3 cluster has unavailable S3 Gateway services

Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface.

S3 Gateway service has high GET request latency

S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second.

S3 Gateway service has critically high GET request latency

S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds.

S3 Gateway service has high cancel request rate

S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. This may be caused by connectivity issues, request timeouts, or a low limit on pending requests.

S3 Gateway service has critically high cancel request rate

S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. This may be caused by connectivity issues, request timeouts, or a low limit on pending requests.

S3 Gateway service has high CPU usage

S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded.

S3 Gateway service has critically high CPU usage

S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded.

S3 Gateway service has too many failed requests

S3 Gateway service (<service_id>) on <node> has many requests failing with a server error (5XX status code).
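
The warning and critical alerts above form threshold pairs: median GET latency 1 s / 5 s, cancel rate 5% / 30%, and CPU usage 75% / 90%. The sketch below shows how one sample could be classified against those pairs. It is not the product's alerting implementation; the names `GATEWAY_THRESHOLDS`, `severity`, `cancel_rate_pct`, and `check_gateway` are hypothetical, and the 5XX alert is left out because no threshold is stated for it. "Higher than" is read as strictly greater.

```python
from statistics import median

# Hypothetical (warning, critical) thresholds copied from the alert texts above;
# the real alerting rules may be defined differently.
GATEWAY_THRESHOLDS = {
    "get_latency_s": (1.0, 5.0),     # median GET request latency, seconds
    "cancel_rate_pct": (5.0, 30.0),  # share of cancelled requests, percent
    "cpu_pct": (75.0, 90.0),         # CPU usage, percent
}


def severity(value: float, warning: float, critical: float) -> str:
    """Return 'critical', 'warning', or 'ok' ('higher than' = strictly greater)."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


def cancel_rate_pct(cancelled: int, total: int) -> float:
    """Percentage of requests that were cancelled (0 if there were no requests)."""
    return 100.0 * cancelled / total if total else 0.0


def check_gateway(get_latencies_s, cancelled, total, cpu_pct):
    """Classify one hypothetical gateway sample against the thresholds."""
    metrics = {
        "get_latency_s": median(get_latencies_s),
        "cancel_rate_pct": cancel_rate_pct(cancelled, total),
        "cpu_pct": cpu_pct,
    }
    return {name: severity(value, *GATEWAY_THRESHOLDS[name])
            for name, value in metrics.items()}


if __name__ == "__main__":
    # Median latency 6.0 s, cancel rate 2%, CPU 95%.
    print(check_gateway([0.2, 0.3, 6.0, 7.0, 8.0], cancelled=2, total=100, cpu_pct=95.0))
    # {'get_latency_s': 'critical', 'cancel_rate_pct': 'ok', 'cpu_pct': 'critical'}
```

Because the alerts use the median rather than the mean, a few very slow requests do not raise the latency alert on their own. Most requests must exceed the threshold.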

S3 Object service alerts

S3 cluster has unavailable object services

Some Object services are not running on <node>. Check the service status in the command-line interface.

Object service has high request latency

Object service (<service_id>) on <node> has the median request latency higher than 1 second.

Object service has critically high request latency

Object service (<service_id>) on <node> has the median request latency higher than 5 seconds.

Object service has high commit latency

Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.

Object service has critically high commit latency

Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.

S3 Name service alerts

S3 cluster has unavailable name services

Some Name services are not running on <node>. Check the service status in the command-line interface.

Name service has high request latency

Name service (<service_id>) on <node> has the median request latency higher than 1 second.

Name service has critically high request latency

Name service (<service_id>) on <node> has the median request latency higher than 5 seconds.

Name service has high commit latency

Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.

Name service has critically high commit latency

Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
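
Object services and Name services use the same threshold pairs: median request latency 1 s / 5 s and median commit latency 1 s / 10 s. As a result, the same latency value can be critical for requests but only high for commits. The following sketch illustrates this under the assumption that raw latency samples are available. The names `LATENCY_THRESHOLDS_S` and `latency_alerts`, and the message format, are hypothetical and do not reproduce the product's alert messages.

```python
from statistics import median

# Hypothetical (warning, critical) thresholds in seconds, taken from the
# Object service and Name service alert descriptions above.
LATENCY_THRESHOLDS_S = {
    "request": (1.0, 5.0),
    "commit": (1.0, 10.0),
}


def latency_alerts(service: str, service_id: str, node: str,
                   request_s: list[float], commit_s: list[float]) -> list[str]:
    """Return messages for median request/commit latencies above the thresholds."""
    samples = {"request": request_s, "commit": commit_s}
    messages = []
    for kind, values in samples.items():
        if not values:
            continue  # no samples collected for this metric
        med = median(values)
        warning, critical = LATENCY_THRESHOLDS_S[kind]
        if med > critical:
            level = "critically high"
        elif med > warning:
            level = "high"
        else:
            continue  # median is within limits
        messages.append(f"{service} service ({service_id}) on {node}: "
                        f"{level} {kind} latency (median {med:.1f} s)")
    return messages


if __name__ == "__main__":
    # The same 7 s median is critical for requests but only high for commits.
    for msg in latency_alerts("Object", "os-1", "node01", [7.0], [7.0]):
        print(msg)
```

A commit latency alert typically points to slow underlying storage, which is why those alerts recommend checking the storage performance.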

OSTOR agent alerts

Object storage agent is frozen for a long time

The event loop of the object storage agent on <node> has been inactive for more than 1 minute.

Object storage agent is offline

Object storage agent is offline on <node>.

Object storage agent is not connected to configuration service

Object storage agent failed to connect to the configuration service on <node>.

File service alerts

NFS service has unavailable FS services

Some File services are not running on <node>. Check the service status in the command-line interface.

NFS service failed to start

Object storage agent failed to start <service_name> (<service_id>) on <node>.

FS failed to start

Object storage agent failed to start the file service on <node>.

Other S3 cluster alerts

S3 cluster misconfiguration

The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational.

Redundancy warning

S3 is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures in addition to disk failures.

S3 service is frozen for a long time

The event loop of S3 service (<service_name>, <service_id>) on <node> has been inactive for more than 1 minute.
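
This alert and the "Object storage agent is frozen for a long time" alert both fire when an event loop has made no progress for more than 1 minute. This document does not describe how the services detect that condition. A common approach is a heartbeat watchdog: the loop records a timestamp on each iteration, and a separate monitor compares it with the current time. The sketch below implements that generic technique. The `EventLoopWatchdog` class and its methods are hypothetical.

```python
import time

FREEZE_THRESHOLD_S = 60.0  # "inactive for more than 1 minute"


class EventLoopWatchdog:
    """Hypothetical freeze detector: the event loop calls beat() on every
    iteration, and a monitor outside the loop calls is_frozen() periodically."""

    def __init__(self, threshold_s: float = FREEZE_THRESHOLD_S, clock=time.monotonic):
        self.threshold_s = threshold_s
        self.clock = clock  # monotonic clock: unaffected by wall-clock changes
        self.last_beat = clock()

    def beat(self) -> None:
        """Record that the event loop has just made progress."""
        self.last_beat = self.clock()

    def is_frozen(self) -> bool:
        """True if the loop has made no progress for longer than the threshold."""
        return self.clock() - self.last_beat > self.threshold_s


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example runs instantly
    wd = EventLoopWatchdog(clock=lambda: now[0])
    now[0] = 59.0
    print(wd.is_frozen())  # False
    now[0] = 61.0
    print(wd.is_frozen())  # True
    wd.beat()
    print(wd.is_frozen())  # False
```

The check must run outside the monitored loop, such as in another thread or process, because a frozen loop cannot report its own state.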

S3 service failed to start

Object storage agent failed to start <service_name> (<service_id>) on <node>.

S3 cluster has unavailable Geo-replication services

Some Geo-replication services are not running on <node>. Check the service status in the command-line interface.

S3 cluster has too many open file descriptors

There are more than 9000 open file descriptors on <node>. Contact technical support.