Object storage alerts
Object storage alerts are generated from the metrics described in Object storage metrics and are displayed in the admin panel.
S3 Gateway alerts
-
S3 cluster has unavailable S3 Gateway services
-
Some S3 Gateway services are not running on <node>. Check the service status in the command-line interface.
-
S3 Gateway service has high GET request latency
-
S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 1 second.
-
S3 Gateway service has critically high GET request latency
-
S3 Gateway service (<service_id>) on <node> has the median GET request latency higher than 5 seconds.
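Both GET latency alerts compare the median latency against a fixed threshold: above 1 second is a warning, and above 5 seconds is critical. The following Python sketch illustrates that two-level check. The function name, the constants, and the idea of evaluating a plain list of recent latencies are assumptions for illustration. This is not the product's alerting code.

```python
from statistics import median

# Documented thresholds for the S3 Gateway GET latency alerts, in seconds.
WARNING_S = 1.0
CRITICAL_S = 5.0


def get_latency_alert(latencies_s):
    """Return 'critical', 'warning', or None for a batch of GET latencies.

    Hypothetical helper: the alerts use the median, so a few slow
    outliers do not trigger an alert on their own.
    """
    if not latencies_s:
        return None  # no requests observed, nothing to evaluate
    m = median(latencies_s)
    if m > CRITICAL_S:  # "higher than 5 seconds"
        return "critical"
    if m > WARNING_S:  # "higher than 1 second"
        return "warning"
    return None
```

For example, `[0.2, 0.3, 12.0]` has a median of 0.3 seconds and raises no alert despite the 12-second outlier. By contrast, `[1.5, 2.0, 0.4]` has a median of 1.5 seconds and maps to a warning.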
-
S3 Gateway service has high cancel request rate
-
S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests.
-
S3 Gateway service has critically high cancel request rate
-
S3 Gateway service (<service_id>) on <node> has the cancel request rate higher than 30%. It may be caused by connectivity issues, request timeouts, or a low limit on pending requests.
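The cancel request rate is a percentage, so it can be read as canceled requests divided by all requests over the same period. The following Python sketch makes that assumption and maps the result to the documented 5% and 30% thresholds. The counters and the function name are hypothetical.

```python
def cancel_rate_alert(canceled, total):
    """Classify the share of canceled requests (hypothetical helper).

    Returns 'critical' above 30%, 'warning' above 5%, otherwise None.
    """
    if total <= 0:
        return None  # no traffic: avoid dividing by zero
    rate_pct = 100.0 * canceled / total
    if rate_pct > 30.0:
        return "critical"
    if rate_pct > 5.0:
        return "warning"
    return None
```

A rate exactly at a threshold does not trigger it, because both alerts say "higher than".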
-
S3 Gateway service has high CPU usage
-
S3 Gateway service (<service_id>) on <node> has CPU usage higher than 75%. The service may be overloaded.
-
S3 Gateway service has critically high CPU usage
-
S3 Gateway service (<service_id>) on <node> has CPU usage higher than 90%. The service may be overloaded.
-
S3 Gateway service has too many failed requests
-
S3 Gateway service (<service_id>) on <node> has many failed requests with server errors (5XX status codes).
S3 Object service alerts
-
S3 cluster has unavailable object services
-
Some Object services are not running on <node>. Check the service status in the command-line interface.
-
Object service has high request latency
-
Object service (<service_id>) on <node> has the median request latency higher than 1 second.
-
Object service has critically high request latency
-
Object service (<service_id>) on <node> has the median request latency higher than 5 seconds.
-
Object service has high commit latency
-
Object service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
-
Object service has critically high commit latency
-
Object service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
S3 Name service alerts
-
S3 cluster has unavailable name services
-
Some Name services are not running on <node>. Check the service status in the command-line interface.
-
Name service has high request latency
-
Name service (<service_id>) on <node> has the median request latency higher than 1 second.
-
Name service has critically high request latency
-
Name service (<service_id>) on <node> has the median request latency higher than 5 seconds.
-
Name service has high commit latency
-
Name service (<service_id>) on <node> has the median commit latency higher than 1 second. Check the storage performance.
-
Name service has critically high commit latency
-
Name service (<service_id>) on <node> has the median commit latency higher than 10 seconds. Check the storage performance.
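The Object service and Name service alerts use the same threshold pairs. For median request latency, the thresholds are 1 second (warning) and 5 seconds (critical). For median commit latency, they are 1 second (warning) and 10 seconds (critical). A table-driven Python sketch can express this so that one function covers every pair. The table keys, metric names, and function are hypothetical labels, not identifiers used by the product.

```python
from typing import Optional

# Hypothetical rule table: (service, metric) -> (warning, critical) in seconds.
# Values are the documented thresholds for median latencies.
THRESHOLDS = {
    ("object", "request_latency"): (1.0, 5.0),
    ("object", "commit_latency"): (1.0, 10.0),
    ("name", "request_latency"): (1.0, 5.0),
    ("name", "commit_latency"): (1.0, 10.0),
}


def classify(service: str, metric: str, median_s: float) -> Optional[str]:
    """Return 'critical', 'warning', or None for a median latency value."""
    warning, critical = THRESHOLDS[(service, metric)]
    if median_s > critical:
        return "critical"
    if median_s > warning:
        return "warning"
    return None
```

The same value can map to different levels. For example, a 7-second median is only a warning for commit latency but is critical for request latency.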
OSTOR agent alerts
-
Object storage agent is frozen for a long time
-
Object storage agent on <node> has the event loop inactive for more than 1 minute.
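How the agent measures event loop inactivity is not documented here. A common approach is a heartbeat: the loop records a timestamp on every iteration, and a separate checker compares it with the current time. The following Python sketch assumes that approach. The `LoopWatchdog` class and its methods are hypothetical.

```python
import time

FREEZE_THRESHOLD_S = 60.0  # "inactive for more than 1 minute"


class LoopWatchdog:
    """Hypothetical heartbeat watchdog for an event loop.

    The loop calls tick() on each iteration; a checker running outside
    the loop calls is_frozen() periodically.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_tick = clock()

    def tick(self):
        # Record that the loop made progress.
        self._last_tick = self._clock()

    def is_frozen(self):
        # Frozen if no tick has been recorded for more than the threshold.
        return self._clock() - self._last_tick > FREEZE_THRESHOLD_S
```

The sketch uses a monotonic clock so that wall-clock adjustments, such as NTP corrections, cannot cause false alarms. The clock is injectable so that the check can be tested without waiting a full minute.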
-
Object storage agent is offline
-
Object storage agent is offline on <node>.
-
Object storage agent is not connected to configuration service
-
Object storage agent failed to connect to the configuration service on <node>.
File service alerts
-
NFS service has unavailable FS services
-
Some File services are not running on <node>. Check the service status in the command-line interface.
-
NFS service failed to start
-
Object storage agent failed to start <service_name> (<service_id>) on <node>.
-
FS failed to start
-
Object storage agent failed to start file service on <node>.
-
NFS service is experiencing some network problems
-
NFS service <service_name>, <service_id> on <hostname> has some RPC errors. Check your network configuration.
-
NFS service is experiencing many network problems
-
NFS service <service_name>, <service_id> on <hostname> has many RPC errors. Check your network configuration.
NDS service alerts
-
S3 NDS service has high notification processing error rate
-
S3 NDS service (<service_id>) on <node> has the notification processing error rate higher than 5%. It may be caused by connectivity issues, request timeouts, or a misconfiguration of S3 topics.
-
S3 NDS service has critically high notification processing error rate
-
S3 NDS service (<service_id>) on <node> has the notification processing error rate higher than 15%. It may be caused by connectivity issues, request timeouts, or a misconfiguration of S3 topics.
-
S3 NDS service has high notification deletion error rate
-
S3 NDS service (<service_id>) on <node> has the notification deletion error rate higher than 5%. It may be caused by a storage misconfiguration, storage performance degradation, or other storage issues.
-
S3 NDS service has high notification repetition rate
-
S3 NDS service (<service_id>) on <node> has the notification repetition rate higher than 5%. It may be caused by a storage misconfiguration or other storage issues.
-
S3 NDS service has too many staged unprocessed notifications
-
S3 NDS service (<service_id>) on <node> has many unprocessed notifications staged in the storage. It may be caused by connectivity or storage issues.
-
S3 NDS service has too many messages in simultaneous processing
-
S3 NDS service (<service_id>) on <node> has many notifications being processed simultaneously on the endpoint. It may be caused by connectivity issues or a misconfiguration of S3 topics.
Other S3 cluster alerts
-
S3 cluster misconfiguration
-
The S3 cluster configuration is not highly available. If one S3 node fails, the entire S3 cluster may become non-operational.
-
Redundancy warning
-
S3 uses the “disk” failure domain even though <number_of_nodes> nodes are available. It is recommended to set the failure domain to “host” so that S3 can survive host failures as well as disk failures.
-
S3 service is frozen for a long time
-
S3 service (<service_name>, <service_id>) on <node> has the event loop inactive for more than 1 minute.
-
S3 service failed to start
-
Object storage agent failed to start <service_name> (<service_id>) on <node>.
-
S3 cluster has unavailable Geo-replication services
-
Some Geo-replication services are not running on <node>. Check the service status in the command-line interface.
-
S3 cluster has too many open file descriptors
-
There are more than 9000 open file descriptors on <node>. Contact technical support.
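This alert does not say which processes contribute to the count. On a Linux node, each entry in `/proc/<pid>/fd` is one open descriptor of that process. The following Python sketch uses this to sum descriptors across the processes you pass in and compares the total with the documented 9000 limit. Choosing which PIDs to include, such as the S3 service processes, is an assumption left to the reader, and the function names are hypothetical.

```python
import os

FD_LIMIT = 9000  # documented alert threshold


def count_open_fds(pids):
    """Sum open file descriptors across the given processes (Linux only).

    Processes that exit before they are read, or that we lack
    permission to inspect, are skipped.
    """
    total = 0
    for pid in pids:
        try:
            total += len(os.listdir(f"/proc/{pid}/fd"))
        except (FileNotFoundError, PermissionError):
            continue
    return total


def too_many_fds(count):
    """True if the count exceeds the documented limit."""
    return count > FD_LIMIT
```

Reading another user's `/proc/<pid>/fd` usually requires root, so run such a check with sufficient privileges. Otherwise the count silently undercounts.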
-
S3 service is experiencing some network problems
-
S3 service <service_name>, <service_id> on <hostname> has some RPC errors. Check your network configuration.
-
S3 service is experiencing many network problems
-
S3 service <service_name>, <service_id> on <hostname> has many RPC errors. Check your network configuration.