Object storage alerts

Object storage alerts are generated from the metrics described in Object storage metrics and are displayed in the admin panel.

S3 Gateway alerts

S3 cluster has unavailable S3 Gateway services: Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface.
S3 Gateway service has high GET request latency: S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second.
S3 Gateway service has critically high GET request latency: S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds.
S3 Gateway service has high cancel request rate: S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests.
S3 Gateway service has critically high cancel request rate: S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests.
S3 Gateway service has high CPU usage: S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded.
S3 Gateway service has critically high CPU usage: S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded.
S3 Gateway service has too many failed requests: S3 Gateway service (<service_id>) on <node> has a lot of failed requests with a server error (5XX status code).
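
The two-level thresholds of the S3 Gateway alerts above can be summarized in a short sketch. It is an illustration only, not the product's alerting code: the metric keys and the function name gateway_severity are hypothetical, and only the numeric thresholds come from the alert descriptions.

```python
from typing import Optional

# (high, critical) thresholds copied from the alert descriptions.
# The dictionary keys are illustrative names, not real metric names.
GATEWAY_THRESHOLDS = {
    "get_latency_seconds": (1.0, 5.0),   # median GET request latency
    "cancel_rate_percent": (5.0, 30.0),  # cancel request rate
    "cpu_usage_percent": (75.0, 90.0),   # CPU usage of the service
}


def gateway_severity(metric: str, value: float) -> Optional[str]:
    """Return "critical", "high", or None for a single metric value."""
    high, critical = GATEWAY_THRESHOLDS[metric]
    # The alerts fire when the value is strictly higher than the threshold.
    if value > critical:
        return "critical"
    if value > high:
        return "high"
    return None
```

For example, gateway_severity("cpu_usage_percent", 80.0) falls between the two CPU thresholds and therefore maps to the "high" level.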

S3 cluster has unavailable object services: Some Object services are not running on <node>. Check the service status in the command-line interface.
Object service has high request latency: Object service (<service_id>) on <node> has the median request latency higher than 1 second.
Object service has critically high request latency: Object service (<service_id>) on <node> has the median request latency higher than 5 seconds.
Object service has high commit latency: Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
Object service has critically high commit latency: Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.

S3 cluster has unavailable name services: Some Name services are not running on <node>. Check the service status in the command-line interface.
Name service has high request latency: Name service (<service_id>) on <node> has the median request latency higher than 1 second.
Name service has critically high request latency: Name service (<service_id>) on <node> has the median request latency higher than 5 seconds.
Name service has high commit latency: Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
Name service has critically high commit latency: Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
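
The Object service and Name service latency alerts above compare a median against fixed thresholds, and the critical threshold differs between request latency (5 seconds) and commit latency (10 seconds). The following sketch shows that comparison under stated assumptions: the sample window, the function name latency_alert, and the "request"/"commit" keys are hypothetical, and the way the product actually collects and aggregates samples is not described here.

```python
from statistics import median
from typing import Optional, Sequence

# (high, critical) thresholds in seconds, from the alert descriptions.
LATENCY_THRESHOLDS = {
    "request": (1.0, 5.0),
    "commit": (1.0, 10.0),
}


def latency_alert(kind: str, samples: Sequence[float]) -> Optional[str]:
    """Classify the median of latency samples (in seconds) for one service."""
    if not samples:
        return None  # no data, nothing to evaluate
    high, critical = LATENCY_THRESHOLDS[kind]
    value = median(samples)
    if value > critical:
        return "critical"
    if value > high:
        return "high"
    return None
```

With the same samples, a median of 7 seconds is critically high for request latency but only high for commit latency.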

Object storage agent is frozen for a long time: Object storage agent on <node> has the event loop inactive for more than 1 minute.
Object storage agent is offline: Object storage agent is offline on <node>.
Object storage agent is not connected to configuration service: Object storage agent failed to connect to the configuration service on <node>.

NFS service has unavailable FS services: Some File services are not running on <node>. Check the service status in the command-line interface.
NFS service failed to start: Object storage agent failed to start <service_name>(<service_id>) on <node>.
FS failed to start: Object storage agent failed to start file service on <node>.

S3 cluster misconfiguration: The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational.
Redundancy warning: S3 is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures in addition to disk failures.
S3 service is frozen for a long time: S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute.
S3 service failed to start: Object storage agent failed to start <service_name>(<service_id>) on <node>.
S3 cluster has unavailable Geo-replication services: Some Geo-replication services are not running on <node>. Check the service status in the command-line interface.
S3 cluster has too many open file descriptors: There are more than 9000 open file descriptors on <node>. Contact technical support.
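
To get a rough idea of how close a Linux node is to the 9000 limit from the last alert, one option is to look at the kernel's count of allocated file handles. The sketch below is an approximation under a loud assumption: it reads the node-wide value from /proc/sys/fs/file-nr, whereas the alert may count descriptors differently (for example, per service). The function names are hypothetical.

```python
FD_ALERT_THRESHOLD = 9000  # from the alert description


def allocated_file_handles(file_nr_text: str) -> int:
    """Parse the first field of /proc/sys/fs/file-nr (allocated handles)."""
    return int(file_nr_text.split()[0])


def fd_alert(file_nr_text: str) -> bool:
    """Return True if the count is higher than the alert threshold."""
    return allocated_file_handles(file_nr_text) > FD_ALERT_THRESHOLD


def read_node_file_nr(path: str = "/proc/sys/fs/file-nr") -> str:
    """Read the kernel counter on a Linux node (assumed data source)."""
    with open(path) as f:
        return f.read()
```

On a node, fd_alert(read_node_file_nr()) returns True once the node-wide count is higher than 9000.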