Object storage alerts
Object storage alerts are generated from the metrics described in Object storage metrics and are displayed in the admin panel.
S3 Gateway alerts
-
S3 cluster has unavailable S3 Gateway services
-
Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface.
-
S3 Gateway service has high GET request latency
-
S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second.
-
S3 Gateway service has critically high GET request latency
-
S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds.
-
S3 Gateway service has high cancel request rate
-
S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests.
-
S3 Gateway service has critically high cancel request rate
-
S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests.
-
S3 Gateway service has high CPU usage
-
S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded.
-
S3 Gateway service has critically high CPU usage
-
S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded.
-
S3 Gateway service has too many failed requests
-
S3 Gateway service (<service_id>) on <node> has a lot of failed requests with a server error (5XX status code).
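The gateway thresholds above can be summarized in a short Python sketch. It is only an illustration of how the "high" and "critically high" levels relate to each other, not the product's alerting code. The function names, the dictionary keys, and the "high"/"critical" labels are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple

# (high, critical) thresholds copied from the alert descriptions above.
GET_LATENCY_SECONDS: Tuple[float, float] = (1.0, 5.0)
CANCEL_RATE_PERCENT: Tuple[float, float] = (5.0, 30.0)
CPU_USAGE_PERCENT: Tuple[float, float] = (75.0, 90.0)


def severity(value: float, thresholds: Tuple[float, float]) -> Optional[str]:
    """Return 'critical', 'high', or None for a value and its (high, critical) pair."""
    high, critical = thresholds
    # The alerts say "higher than", so the comparison is strict.
    if value > critical:
        return "critical"
    if value > high:
        return "high"
    return None


def cancel_rate_percent(cancelled: int, total: int) -> float:
    """Share of cancelled requests in percent; 0.0 when there were no requests."""
    return 100.0 * cancelled / total if total else 0.0


def gateway_alerts(median_get_latency_s: float, cancelled: int,
                   total: int, cpu_percent: float) -> Dict[str, str]:
    """Map each hypothetical check name to its level, skipping healthy checks."""
    checks = {
        "get_latency": severity(median_get_latency_s, GET_LATENCY_SECONDS),
        "cancel_rate": severity(cancel_rate_percent(cancelled, total),
                                CANCEL_RATE_PERCENT),
        "cpu_usage": severity(cpu_percent, CPU_USAGE_PERCENT),
    }
    return {name: level for name, level in checks.items() if level is not None}
```

For example, `gateway_alerts(6.0, 10, 100, 50.0)` reports a critical GET latency and a high cancel request rate, while CPU usage stays below both thresholds and is omitted.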
S3 Object service alerts
-
S3 cluster has unavailable object services
-
Some Object services are not running on <node>. Check the service status in the command-line interface.
-
Object service has high request latency
-
Object service (<service_id>) on <node> has the median request latency higher than 1 second.
-
Object service has critically high request latency
-
Object service (<service_id>) on <node> has the median request latency higher than 5 seconds.
-
Object service has high commit latency
-
Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
-
Object service has critically high commit latency
-
Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
S3 Name service alerts
-
S3 cluster has unavailable name services
-
Some Name services are not running on <node>. Check the service status in the command-line interface.
-
Name service has high request latency
-
Name service (<service_id>) on <node> has the median request latency higher than 1 second.
-
Name service has critically high request latency
-
Name service (<service_id>) on <node> has the median request latency higher than 5 seconds.
-
Name service has high commit latency
-
Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
-
Name service has critically high commit latency
-
Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
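The Object and Name service latency alerts above are defined on the median, not on the average. The sketch below illustrates the practical difference with made-up commit latency samples: a single very slow commit raises the mean considerably but leaves the median unchanged. The sample values, the function name, and the idea of evaluating a fixed batch of samples are assumptions for illustration; the actual sampling window is not described here.

```python
from statistics import mean, median
from typing import List, Optional

# Commit latency thresholds from the alert descriptions above, in seconds.
COMMIT_HIGH_S = 1.0
COMMIT_CRITICAL_S = 10.0


def commit_latency_level(samples_s: List[float]) -> Optional[str]:
    """Classify a batch of commit latency samples by their median."""
    mid = median(samples_s)
    if mid > COMMIT_CRITICAL_S:
        return "critical"
    if mid > COMMIT_HIGH_S:
        return "high"
    return None


# Hypothetical samples: four fast commits and one 30-second outlier.
commit_latencies_s = [0.2, 0.3, 0.25, 0.4, 30.0]

median_s = median(commit_latencies_s)  # 0.3 s, well below the 1 second threshold
mean_s = mean(commit_latencies_s)      # about 6.23 s, dominated by the outlier
```

With these samples, `commit_latency_level(commit_latencies_s)` returns `None`. A batch such as `[2.0, 3.0, 1.5]` has a median of 2 seconds and would be classified as "high".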
OSTOR agent alerts
-
Object storage agent is frozen for a long time
-
Object storage agent on <node> has the event loop inactive for more than 1 minute.
-
Object storage agent is offline
-
Object storage agent is offline on <node>.
-
Object storage agent is not connected to configuration service
-
Object storage agent failed to connect to the configuration service on <node>.
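The "frozen" condition above (an event loop inactive for more than 1 minute) can be pictured as a heartbeat check. The following Python sketch is a generic illustration of that idea, not the agent's implementation: the `EventLoopWatchdog` class and its methods are hypothetical, and the optional `now` argument exists only to make the logic easy to test.

```python
import time
from typing import Optional

# "Inactive for more than 1 minute", as stated in the alert description.
FREEZE_THRESHOLD_S = 60.0


class EventLoopWatchdog:
    """Remembers when the event loop last made progress."""

    def __init__(self, now: Optional[float] = None) -> None:
        self.last_tick = time.monotonic() if now is None else now

    def tick(self, now: Optional[float] = None) -> None:
        """Called by the event loop on every iteration."""
        self.last_tick = time.monotonic() if now is None else now

    def is_frozen(self, now: Optional[float] = None) -> bool:
        """True if the loop has not ticked for more than the threshold."""
        current = time.monotonic() if now is None else now
        return current - self.last_tick > FREEZE_THRESHOLD_S
```

A monotonic clock is used so that wall-clock adjustments on the node do not produce false positives.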
File service alerts
-
NFS service has unavailable FS services
-
Some File services are not running on <node>. Check the service status in the command-line interface.
-
NFS service failed to start
-
Object storage agent failed to start <service_name> (<service_id>) on <node>.
-
FS failed to start
-
Object storage agent failed to start file service on <node>.
Other S3 cluster alerts
-
S3 cluster misconfiguration
-
The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational.
-
Redundancy warning
-
S3 is set to failure domain “disk” even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures in addition to disk failures.
-
S3 service is frozen for a long time
-
S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute.
-
S3 service failed to start
-
Object storage agent failed to start <service_name> (<service_id>) on <node>.
-
S3 cluster has unavailable Geo-replication services
-
Some Geo-replication services are not running on <node>. Check the service status in the command-line interface.
-
S3 cluster has too many open file descriptors
-
There are more than 9000 open file descriptors on <node>. Contact technical support.
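Before contacting technical support, it can help to see how many file handles are in use on the node. The Python sketch below reads the Linux kernel counter in `/proc/sys/fs/file-nr`, whose first field is the number of allocated file handles for the whole system. This is an assumption about what is useful to check: how the alert itself counts descriptors (for example, per service or per node) is not described here, so the numbers may not match exactly. The function names are hypothetical.

```python
# Threshold from the alert description above.
FD_ALERT_THRESHOLD = 9000


def allocated_file_handles(file_nr_text: str) -> int:
    """Parse the contents of /proc/sys/fs/file-nr.

    The file holds three whitespace-separated numbers:
    allocated handles, allocated but unused handles, and the system maximum.
    """
    allocated, _unused, _maximum = file_nr_text.split()
    return int(allocated)


def read_allocated_file_handles(path: str = "/proc/sys/fs/file-nr") -> int:
    """Read the system-wide allocated file handle count (Linux only)."""
    with open(path) as f:
        return allocated_file_handles(f.read())


def too_many_open_fds(count: int) -> bool:
    """True if the count is above the alert threshold."""
    return count > FD_ALERT_THRESHOLD
```

On a Linux node, `too_many_open_fds(read_allocated_file_handles())` gives a quick system-wide indication.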