Object storage metrics
Metrics used for monitoring object storage are configured in the Prometheus recording rules and can be found in these files on any node in the cluster:
- /var/lib/prometheus/rules/s3.rules
- /var/lib/prometheus/rules/ostor.rules
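To see how a particular metric is derived, you can search these files for its name. The following is a minimal sketch, not a supplied tool: `find_rule` is a hypothetical helper name, and it assumes only that the rule files are plain text.

```shell
# Hypothetical helper: show where a metric is defined in the Prometheus rule files.
# With no file arguments, it searches the two recording-rule files named above.
find_rule() {
    metric="$1"
    shift
    if [ "$#" -eq 0 ]; then
        set -- /var/lib/prometheus/rules/s3.rules /var/lib/prometheus/rules/ostor.rules
    fi
    # -F: fixed-string match, -n: line numbers, -H: always print the file name
    grep -FnH -- "$metric" "$@"
}

# Example: find_rule instance_vol_svc:ostor_s3gw_req:rate5m
```

The helper returns a non-zero exit status when the metric name is not found, so it can also be used in a script condition.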
Metrics that are used to generate object storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/s3.rules. These metrics are described in the table:
Metric | Description |
---|---|
ostor_config_value | List of object storage configuration parameters |
instance_vol_svc:ostor_s3gw_req:rate5m | Number of all requests per second by an S3 gateway service over 5 minutes |
instance_vol_svc:ostor_s3gw_req_cancelled:rate5m | Number of canceled requests per second by an S3 gateway service over 5 minutes |
instance_vol_svc:ostor_req_server_err:rate5m | Number of failed requests with a server error (5XX status code) per second by an S3 gateway service over 5 minutes |
instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m | Current GET request latency by an S3 gateway service over 5 minutes, for each bucket |
instance_vol_svc:ostor_commit_latency_us_bucket:rate5m | Current commit latency by the Object storage service over 5 minutes, for each bucket |
instance_vol_svc_req:ostor_os_req_latency_ms_bucket:rate5m | Current request latency by an OS service over 5 minutes, for each bucket |
instance_vol_svc_req:ostor_ns_req_latency_ms_bucket:rate5m | Current request latency by an NS service over 5 minutes, for each bucket |
pcs_process_inactive_seconds_total | Total amount of time a process has been inactive |
process_cpu_seconds_total | Total amount of time a process has used CPU |
ostor_svc_start_failed_count_total | Total number of failed attempts to start a service |
ostor_svc_registry_cfg_failed_total | Total number of failed attempts to connect to the configuration service |
nds_staged_messages_count | Total number of unprocessed NDS notification messages that are staged on the storage |
nds_endpoint_process_count | Number of NDS notification messages that are being simultaneously processed on the endpoint |
instance_vol_svc:ostor_nds_total:rate5m | Number of NDS notification messages per second by an NDS service over 5 minutes |
instance_vol_svc:ostor_nds_repeat_total:rate5m | Number of repeated NDS notification messages per second by an NDS service over 5 minutes |
instance_vol_svc:ostor_nds_error_total:rate5m | Number of all NDS notification processing errors per second by an NDS service over 5 minutes |
instance_vol_svc:ostor_nds_delete_error_total:rate5m | Number of all NDS notification deletion errors per second by an NDS service over 5 minutes |
rpc_errors_total | Number of RPC errors reported by the user space part of storage |
fused_kernel_rpc_errors_total | Number of RPC errors reported by the kernel part of storage |
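The latency metrics in the table are reported per histogram bucket, so a single latency figure has to be derived from them. The sketch below builds a PromQL expression with the standard `histogram_quantile()` function and sends it to the Prometheus HTTP API. The helper names `latency_quantile_expr` and `prom_query`, the `PROM_URL` variable, and the default address `http://localhost:9090` are assumptions for illustration, as is the premise that the recording rules keep the `le` label on their series.

```shell
# Hypothetical helper: print a PromQL expression that estimates a latency
# quantile from one of the *_bucket recording rules listed in the table.
# Assumption: the rule keeps the histogram "le" label on its series.
latency_quantile_expr() {
    quantile="$1"   # for example, 0.95
    rule="$2"       # for example, instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m
    printf 'histogram_quantile(%s, sum by (le) (%s))\n' "$quantile" "$rule"
}

# Hypothetical helper: send an expression to the Prometheus HTTP API.
# The default address http://localhost:9090 is an assumption; adjust it.
prom_query() {
    curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" --data-urlencode "query=$1"
}

# Example:
# prom_query "$(latency_quantile_expr 0.95 instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m)"
```

Summing by `le` merges all services into one cluster-wide estimate. To keep a per-service breakdown, the grouping would need additional labels, whose names are not listed here.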
Bucket and user statistics
Metrics that report bucket and user statistics are not available by default. To collect them, do the following:
1. Obtain the storage cluster password. For example:

       # vinfra cluster password show
       +----------+---------+
       | Field    | Value   |
       +----------+---------+
       | id       | 1       |
       | name     | cluster |
       | password | LwWUsf  |
       +----------+---------+

2. Identify the S3 node that hosts the ACC service. For example:

       # ostor-ctl agent-status | grep ACC
       ACC 4000000000000026 ACTIVE 2 197 3ea066999d582815 192.168.128.226:41395

3. On the node from step 2, enable exporting bucket and user statistics to Prometheus by running:

       # ostor-ctl set-vol -V 0100000000000002 --enable-stat

   When prompted, specify the password from step 1.

4. Restart the ACC service to apply the changes:

       # systemctl stop ostor-agentd.service
       # systemctl restart ostor-agentd.service
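When scripting this procedure, the node address from step 2 can be extracted from the `ostor-ctl agent-status` output. The following is a minimal sketch: `acc_host` is a hypothetical helper name, and it assumes that the last column of the ACC line has the form ADDRESS:PORT, as in the sample output.

```shell
# Hypothetical helper: read "ostor-ctl agent-status" output on stdin and print
# the IP address from the ACC line.
# Assumption: the last column is ADDRESS:PORT, as in the sample output.
acc_host() {
    awk '$1 == "ACC" { split($NF, addr, ":"); print addr[1] }'
}

# Example: ostor-ctl agent-status | acc_host
```

Matching on the first column avoids the separate `grep ACC` call and ignores lines where "ACC" appears elsewhere.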
The following metrics will appear in Prometheus:
Metric | Description |
---|---|
account_control_buckets_size | Bucket size, in bytes |
account_control_user_size | Total size of all buckets owned by a user, in bytes |
account_control_s3_session_total | Total number of S3 sessions initiated by the service |
account_control_s3_session_errors_total | Total number of S3 session errors |
account_control_s3_client_total | Total number of S3 clients created by the service |
account_control_s3_client_errors_total | Total number of S3 client errors |
account_control_object_name_parsing_errors_total | Total number of storage object name parsing errors |
account_control_object_user_errors_total | Total number of object owner and user parsing errors |
account_control_object_upload_errors_total | Total number of S3 multipart object upload errors |
account_control_server_configuration_errors_total | Total number of server configuration errors |
account_control_bucket_creds_errors_total | Total number of invalid bucket credentials errors |
account_control_lifecycle_validation_errors_total | Total number of lifecycle object validation rule errors |
account_control_lifecycle_transport_errors_total | Total number of lifecycle gRPC transport errors |
account_control_s3_delete_bulk_total | Total number of bulk object deletions per bucket |
account_control_s3_delete_bulk_errors_total | Total number of bulk object deletion errors per bucket |
account_control_object_path_errors_total | Total number of object path parsing errors per bucket |
account_control_object_query_errors_total | Total number of object query errors per bucket |
account_control_bucket_read_errors_total | Total number of read errors per bucket |
account_control_bucket_lifecycle_errors_total | Total number of lifecycle errors per bucket |
account_control_bucket_actions_errors_total | Total number of action errors per bucket |
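Most of these metrics are running totals, so their raw values only grow. Assuming they are exposed as Prometheus counters (which the `_total` suffix suggests but this section does not state), a per-second rate is usually more informative. The sketch below prints such an expression with the standard PromQL `rate()` function; `counter_rate_expr` is a hypothetical helper name.

```shell
# Hypothetical helper: print a PromQL expression for the per-second rate of a
# counter over a time window (default: 5m).
# Assumption: the metric is a Prometheus counter.
counter_rate_expr() {
    metric="$1"
    window="${2:-5m}"
    printf 'rate(%s[%s])\n' "$metric" "$window"
}

# Example: counter_rate_expr account_control_s3_session_errors_total 1h
```

The printed expression can be pasted into the Prometheus expression browser. The size metrics, such as account_control_buckets_size, report current values and should be queried directly instead.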