Object storage metrics

The metrics used to monitor object storage are defined by Prometheus recording rules, which are stored in the following files on any node in the cluster:

  • /var/lib/prometheus/rules/s3.rules
  • /var/lib/prometheus/rules/ostor.rules
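
To check which recorded series and alerts a rule file defines, the file can be scanned for `record:` and `alert:` keys. The Python sketch below assumes the files use the standard Prometheus rule-file YAML layout. The sample text, its expression, and the alert name are made up for illustration and are not copied from the product.

```python
import re

# Matches "record: <name>" or "alert: <name>" keys, with an optional
# leading "- " list marker, as used in standard Prometheus rule files.
RULE_KEY = re.compile(r"^\s*(?:-\s*)?(record|alert):\s*(\S+)\s*$")


def list_rule_names(text):
    """Return (recorded_series, alerts) named in Prometheus rule-file text."""
    names = {"record": [], "alert": []}
    for line in text.splitlines():
        match = RULE_KEY.match(line)
        if match:
            kind, name = match.groups()
            names[kind].append(name.strip("'\""))
    return names["record"], names["alert"]


# Hypothetical sample; real rules live in /var/lib/prometheus/rules/s3.rules.
SAMPLE = """\
groups:
  - name: example
    rules:
      - record: instance_vol_svc:ostor_s3gw_req:rate5m
        expr: rate(example_requests_total[5m])
      - alert: ExampleAlert
        expr: instance_vol_svc:ostor_s3gw_req:rate5m > 100
"""

if __name__ == "__main__":
    recorded, alerts = list_rule_names(SAMPLE)
    print("recorded series:", recorded)
    print("alerts:", alerts)
```

To inspect a real file, read its contents and pass them to `list_rule_names()` in place of `SAMPLE`.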

Metrics that are used to generate object storage alerts are referenced by the alerting rules in /var/lib/prometheus/alerts/s3.rules. These metrics are described in the following table:

Metric                                                        Description
instance_vol_svc:ostor_s3gw_req:rate5m                        Requests per second for a particular S3 gateway service, averaged over 5 minutes
instance_vol_svc:ostor_s3gw_req_cancelled:rate5m              Canceled requests per second for a particular S3 gateway service, averaged over 5 minutes
instance_vol_svc:ostor_req_server_err:rate5m                  Requests per second that failed with a server error (5XX status code) for a particular S3 gateway service, averaged over 5 minutes
instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m  GET request latency for a particular S3 gateway service over 5 minutes, for each histogram bucket
instance_vol_svc:ostor_commit_latency_us_bucket:rate5m        Commit latency for the object storage service over 5 minutes, for each histogram bucket
instance_vol_svc_req:ostor_os_req_latency_ms_bucket:rate5m    Request latency for a particular OS service over 5 minutes, for each histogram bucket
instance_vol_svc_req:ostor_ns_req_latency_ms_bucket:rate5m    Request latency for a particular NS service over 5 minutes, for each histogram bucket
pcs_process_inactive_seconds_total                            Total time a process has been inactive, in seconds
process_cpu_seconds_total                                     Total CPU time a process has used, in seconds
ostor_svc_start_failed_count_total                            Total number of failed attempts to start a service
ostor_svc_registry_cfg_failed_total                           Total number of failed attempts to connect to the configuration service
nds_staged_messages_count                                     Total number of unprocessed NDS notification messages staged on the storage
nds_endpoint_process_count                                    Number of NDS notification messages being processed simultaneously on the endpoint
instance_vol_svc:ostor_nds_total:rate5m                       NDS notification messages per second for a particular NDS service, averaged over 5 minutes
instance_vol_svc:ostor_nds_repeat_total:rate5m                Repeated NDS notification messages per second for a particular NDS service, averaged over 5 minutes
instance_vol_svc:ostor_nds_error_total:rate5m                 NDS notification processing errors per second for a particular NDS service, averaged over 5 minutes
instance_vol_svc:ostor_nds_delete_error_total:rate5m          NDS notification deletion errors per second for a particular NDS service, averaged over 5 minutes
rpc_errors_total                                              Number of RPC errors reported by the user-space part of storage
fused_kernel_rpc_errors_total                                 Number of RPC errors reported by the kernel part of storage
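
The latency metrics whose names contain `_bucket` appear to be Prometheus histogram series, so a percentile has to be derived from the per-bucket rates rather than read directly. The Python sketch below mirrors the linear interpolation that the PromQL `histogram_quantile()` function performs. It assumes that each series carries the standard `le` upper-bound label; the sample values are invented for illustration.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` maps each upper bound (the `le` label) to the cumulative
    rate or count of observations at or below that bound. It must include
    math.inf as the highest bound.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    bounds = sorted(buckets)
    if len(bounds) < 2 or bounds[-1] != math.inf:
        raise ValueError("need at least one finite bucket and a +Inf bucket")
    total = buckets[math.inf]
    if total == 0:
        return math.nan  # no observations in the window
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound in bounds:
        count = buckets[bound]
        if count >= rank and count > prev_count:
            if bound == math.inf:
                # The quantile falls into the overflow bucket, so report
                # the highest finite bound instead of infinity.
                return bounds[-2]
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return bounds[-2]  # only reached if the input is not cumulative


# Invented sample: upper bounds in milliseconds -> cumulative requests/second.
SAMPLE = {10: 2.0, 50: 8.0, 100: 9.5, math.inf: 10.0}

if __name__ == "__main__":
    print("median latency estimate, ms:", histogram_quantile(0.5, SAMPLE))
```

The estimate is only as precise as the bucket boundaries: any value inside a bucket is assumed to be evenly distributed between its lower and upper bound.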

Bucket and user size metrics

Metrics that report object storage usage per bucket and per user are not available by default. To collect these statistics, enable them by running the following command on an S3 node:

# ostor-ctl set-vol -V 0100000000000002 --enable-stat

The following metrics will appear in Prometheus:

  • account_control_buckets_size: Bucket size, in bytes
  • account_control_user_size: Total size of all user buckets, in bytes
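
Once the statistics are enabled, the sizes can be read through the Prometheus HTTP API and converted to readable units. The Python sketch below builds an instant-query URL and parses a response in the standard API format. The server address, the `bucket` label name, and the sample response are assumptions made for illustration; check the labels that your deployment actually exposes.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, metric):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': metric})}"


def format_bytes(size):
    """Render a byte count with binary units, for example 1536 as 1.5 KiB."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(size)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def parse_sizes(payload):
    """Return (labels, size_in_bytes) pairs from an instant-query response."""
    body = json.loads(payload)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', 'unknown error')}")
    rows = []
    for sample in body["data"]["result"]:
        labels = {k: v for k, v in sample["metric"].items() if k != "__name__"}
        # Sample values arrive as strings in the form [timestamp, "value"].
        rows.append((labels, int(float(sample["value"][1]))))
    return rows


# Hypothetical response; the "bucket" label name is an assumption.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "account_control_buckets_size",
                           "bucket": "example-bucket"},
                "value": [1700000000, "1536"],
            }
        ],
    },
})

if __name__ == "__main__":
    # Assumed address; replace it with the Prometheus endpoint of your cluster.
    print(build_query_url("http://localhost:9090", "account_control_buckets_size"))
    for labels, size in parse_sizes(SAMPLE_RESPONSE):
        print(labels, format_bytes(size))
```

The same functions work for `account_control_user_size`; only the metric name passed to `build_query_url()` changes.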