Object storage alerts

Based on the metrics described in Object storage metrics, the following object storage alerts are generated and displayed in the admin panel:

Title | Message | Severity

S3 Gateway alerts
S3 cluster has unavailable S3 Gateway services | Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface. | warning
S3 Gateway service has high GET request latency | S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second. | warning
S3 Gateway service has critically high GET request latency | S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds. | critical
S3 Gateway service has high cancel request rate | S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests. | warning
S3 Gateway service has critically high cancel request rate | S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests. | critical
S3 Gateway service has high CPU usage | S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded. | warning
S3 Gateway service has critically high CPU usage | S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded. | critical
S3 Gateway service has too many failed requests | S3 Gateway service (<service_id>) on <node> has many requests that failed with a server error (5XX status code). | critical

S3 OS service alerts
S3 cluster has unavailable object services | Some Object services are not running on <node>. Check the service status in the command-line interface. | warning
Object service has high request latency | Object service (<service_id>) on <node> has the median request latency higher than 1 second. | warning
Object service has critically high request latency | Object service (<service_id>) on <node> has the median request latency higher than 5 seconds. | critical
Object service has high commit latency | Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance. | warning
Object service has critically high commit latency | Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance. | critical

S3 NS service alerts
S3 cluster has unavailable name services | Some Name services are not running on <node>. Check the service status in the command-line interface. | warning
Name service has high request latency | Name service (<service_id>) on <node> has the median request latency higher than 1 second. | warning
Name service has critically high request latency | Name service (<service_id>) on <node> has the median request latency higher than 5 seconds. | critical
Name service has high commit latency | Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance. | warning
Name service has critically high commit latency | Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance. | critical

OSTOR agent alerts
Object storage agent is frozen for a long time | Object storage agent on <node> has the event loop inactive for more than 1 minute. | critical
Object storage agent is offline | Object storage agent is offline on <node>. | warning
Object storage agent is not connected to configuration service | Object storage agent failed to connect to the configuration service on <node>. | warning

S3 cluster alerts
S3 cluster misconfiguration | The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational. | warning
Redundancy warning | S3 is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures in addition to disk failures. | warning
S3 service is frozen for a long time | S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute. | critical
S3 service failed to start | Object storage agent failed to start <service_name> (<service_id>) on <node>. | critical
S3 cluster has unavailable Geo-replication services | Some Geo-replication services are not running on <node>. Check the service status in the command-line interface. | warning

Other alerts
NFS service has unavailable FS services | Some File services are not running on <node>. Check the service status in the command-line interface. | warning
FS failed to start | Object storage agent failed to start file service on <node>. | critical
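Most latency, cancel rate, and CPU alerts above come in pairs: a warning threshold and a stricter critical threshold for the same metric. The following Python sketch illustrates that two-level mapping using the thresholds listed in the table. It is not the product's alerting implementation: the metric keys and the classify function are hypothetical names chosen for illustration, and the sketch assumes an alert fires only when the value is strictly higher than its threshold, matching the "higher than" wording of the messages.

```python
from typing import Optional

# (warning, critical) thresholds copied from the alert table above.
# The keys are hypothetical metric names used only for this illustration.
THRESHOLDS = {
    "s3_gateway_get_latency_median_s": (1.0, 5.0),
    "s3_gateway_cancel_rate_pct": (5.0, 30.0),
    "s3_gateway_cpu_usage_pct": (75.0, 90.0),
    "object_service_request_latency_median_s": (1.0, 5.0),
    "object_service_commit_latency_median_s": (1.0, 10.0),
    "name_service_request_latency_median_s": (1.0, 5.0),
    "name_service_commit_latency_median_s": (1.0, 10.0),
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a metric value.

    Assumption: a threshold is crossed only when the value is strictly
    higher than it. The critical threshold is checked first, so only the
    most severe level is returned.
    """
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return None


if __name__ == "__main__":
    print(classify("s3_gateway_cpu_usage_pct", 80.0))
    print(classify("object_service_commit_latency_median_s", 12.0))
    print(classify("s3_gateway_cancel_rate_pct", 5.0))
```

Under the strict-comparison assumption, a cancel rate of exactly 5% raises no alert, while 80% CPU usage maps to warning and a 12-second median commit latency maps to critical. The sketch reports only the most severe level; this page does not state whether the admin panel shows the warning and critical alerts side by side when both thresholds are exceeded.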