Object storage alerts
Based on the metrics described in Object storage metrics, the following object storage alerts are generated and displayed in the admin panel:
Title | Message | Severity |
---|---|---|
S3 Gateway alerts | ||
S3 cluster has unavailable S3 Gateway services | Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface. | warning |
S3 Gateway service has high GET request latency | S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second. | warning |
S3 Gateway service has critically high GET request latency | S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds. | critical |
S3 Gateway service has high cancel request rate | S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a small limit for pending requests. | warning |
S3 Gateway service has critically high cancel request rate | S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, request timeouts, or a small limit for pending requests. | critical |
S3 Gateway service has high CPU usage | S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded. | warning |
S3 Gateway service has critically high CPU usage | S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded. | critical |
S3 Gateway service has too many failed requests | S3 Gateway service (<service_id>) on <node> has a lot of failed requests with a server error (5XX status code). | critical |
S3 OS service alerts | ||
S3 cluster has unavailable object services | Some Object services are not running on <node>. Check the service status in the command-line interface. | warning |
Object service has high request latency | Object service (<service_id>) on <node> has the median request latency higher than 1 second. | warning |
Object service has critically high request latency | Object service (<service_id>) on <node> has the median request latency higher than 5 seconds. | critical |
Object service has high commit latency | Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance. | warning |
Object service has critically high commit latency | Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance. | critical |
S3 NS service alerts | ||
S3 cluster has unavailable name services | Some Name services are not running on <node>. Check the service status in the command-line interface. | warning |
Name service has high request latency | Name service (<service_id>) on <node> has the median request latency higher than 1 second. | warning |
Name service has critically high request latency | Name service (<service_id>) on <node> has the median request latency higher than 5 seconds. | critical |
Name service has high commit latency | Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance. | warning |
Name service has critically high commit latency | Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance. | critical |
OSTOR agent alerts | ||
Object storage agent is frozen for a long time | Object storage agent on <node> has the event loop inactive for more than 1 minute. | critical |
Object storage agent is offline | Object storage agent is offline on <node>. | warning |
Object storage agent is not connected to configuration service | Object storage agent failed to connect to the configuration service on <node>. | warning |
S3 cluster alerts | ||
S3 cluster misconfiguration | The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational. | warning |
Redundancy warning | S3 is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures in addition to disk failures. | warning |
S3 service is frozen for a long time | S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute. | critical |
S3 service failed to start | Object storage agent failed to start <service_name>(<service_id>) on <node>. | critical |
S3 cluster has unavailable Geo-replication services | Some Geo-replication services are not running on <node>. Check the service status in the command-line interface. | warning |
Other alerts | ||
NFS service has unavailable FS services | Some File services are not running on <node>. Check the service status in the command-line interface. | warning |
FS failed to start | Object storage agent failed to start file service on <node>. | critical |
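
Several of the alerts above come in warning/critical pairs that differ only in their threshold. The following is a minimal sketch of how such a pair maps a measured value to a severity. It is an illustration only and not the product's alerting implementation: the metric keys and the `classify` function are hypothetical names, and only the threshold values are taken from the table.

```python
from typing import Optional

# (warning, critical) thresholds copied from the table above.
# The dictionary keys are hypothetical labels chosen for this sketch;
# they are not the metric names used by the product.
THRESHOLDS = {
    "gateway_get_latency_seconds": (1.0, 5.0),
    "gateway_cancel_rate_percent": (5.0, 30.0),
    "gateway_cpu_usage_percent": (75.0, 90.0),
    "service_request_latency_seconds": (1.0, 5.0),
    "service_commit_latency_seconds": (1.0, 10.0),
}


def classify(metric: str, value: float) -> Optional[str]:
    """Return "critical", "warning", or None for a measured value.

    The alert messages say "higher than", so the comparison is strict:
    a value exactly equal to a threshold does not trigger that level.
    """
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return None


if __name__ == "__main__":
    # A median commit latency of 7 seconds exceeds the 1-second warning
    # threshold but not the 10-second critical one.
    print(classify("service_commit_latency_seconds", 7.0))
```

Checking the critical threshold first means a value that exceeds both thresholds is reported once, at the higher severity.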