Object storage metrics

Metrics used for monitoring object storage are configured in the Prometheus recording rules and can be found in these files on any node in the cluster:

/var/lib/prometheus/rules/s3.rules
/var/lib/prometheus/rules/ostor.rules
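To see which recorded metrics a rules file defines, one option is to extract its `record:` entries. The sketch below assumes the files follow the standard Prometheus rule-file YAML layout, where each recording rule has a `record:` key. The helper name `list_recorded_metrics` is hypothetical and not part of the product.

```shell
# List the metric names defined by recording rules in a Prometheus rules file.
# Assumes the standard rule-file YAML layout ("record: <name>" entries).
list_recorded_metrics() {
    # $1: path to a rules file
    sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*record:[[:space:]]*//p' "$1" \
        | tr -d "\"'" \
        | sort -u
}
```

For example, `list_recorded_metrics /var/lib/prometheus/rules/s3.rules` would print one recorded metric name per line.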

Metrics that are used to generate object storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/s3.rules. These metrics are described in the following table:

Metric	Description
`ostor_config_value`	List of object storage configuration parameters
`instance_vol_svc:ostor_s3gw_req:rate5m`	Number of all requests per second by an S3 gateway service over 5 minutes
`instance_vol_svc:ostor_s3gw_req_cancelled:rate5m`	Number of canceled requests per second by an S3 gateway service over 5 minutes
`instance_vol_svc:ostor_req_server_err:rate5m`	Number of failed requests with a server error (5XX status code) per second by an S3 gateway service over 5 minutes
`instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m`	Current GET request latency by an S3 gateway service over 5 minutes, for each bucket
`instance_vol_svc:ostor_commit_latency_us_bucket:rate5m`	Current commit latency by the Object storage service over 5 minutes, for each bucket
`instance_vol_svc_req:ostor_os_req_latency_ms_bucket:rate5m`	Current request latency by an OS service over 5 minutes, for each bucket
`instance_vol_svc_req:ostor_ns_req_latency_ms_bucket:rate5m`	Current request latency by an NS service over 5 minutes, for each bucket
`pcs_process_inactive_seconds_total`	Total amount of time a process has been inactive
`process_cpu_seconds_total`	Total amount of time a process has used CPU
`ostor_svc_start_failed_count_total`	Total number of failed attempts to start a service
`ostor_svc_registry_cfg_failed_total`	Total number of failed attempts to connect to the configuration service
`nds_staged_messages_count`	Total number of unprocessed NDS notification messages that are staged on the storage
`nds_endpoint_process_count`	Number of NDS notification messages that are being simultaneously processed on the endpoint
`instance_vol_svc:ostor_nds_total:rate5m`	Number of NDS notification messages per second by an NDS service over 5 minutes
`instance_vol_svc:ostor_nds_repeat_total:rate5m`	Number of repeated NDS notification messages per second by an NDS service over 5 minutes
`instance_vol_svc:ostor_nds_error_total:rate5m`	Number of all NDS notification processing errors per second by an NDS service over 5 minutes
`instance_vol_svc:ostor_nds_delete_error_total:rate5m`	Number of all NDS notification deletion errors per second by an NDS service over 5 minutes
`rpc_errors_total`	Number of RPC errors reported by the user space part of storage
`fused_kernel_rpc_errors_total`	Number of RPC errors reported by the kernel part of storage
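The recorded metrics from the table can be combined in PromQL and queried through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch, not a definitive recipe. It assumes that Prometheus is reachable at `http://localhost:9090` and that the recorded latency series keep the histogram `le` label. The function names are hypothetical.

```shell
# Base URL of the Prometheus server; an assumption, adjust for your cluster.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Share of S3 gateway requests that failed with a 5XX status, cluster-wide.
error_ratio_query() {
    printf '%s' 'sum(instance_vol_svc:ostor_req_server_err:rate5m) / sum(instance_vol_svc:ostor_s3gw_req:rate5m)'
}

# Approximate GET latency quantile (for example, 0.99), in milliseconds.
# Assumes the recorded series keeps the histogram "le" label.
get_latency_quantile_query() {
    printf 'histogram_quantile(%s, sum by (le) (instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m))' "$1"
}

# Run an instant query through the Prometheus HTTP API and print the JSON reply.
prom_query() {
    curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query "$(error_ratio_query)"` or `prom_query "$(get_latency_quantile_query 0.99)"`.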

Bucket and user statistics

Metrics that report bucket and user statistics are not available by default. To collect them, do the following:

1. Obtain the storage cluster password. For example:

# vinfra cluster password show
+----------+---------+
| Field    | Value   |
+----------+---------+
| id       | 1       |
| name     | cluster |
| password | LwWUsf  |
+----------+---------+

2. Identify the S3 node that hosts the ACC service. For example:

# ostor-ctl agent-status | grep ACC
 ACC   4000000000000026     ACTIVE      2      197   3ea066999d582815  192.168.128.226:41395

3. On the node from step 2, enable exporting bucket and user statistics to Prometheus by running:
```
# ostor-ctl set-vol -V 0100000000000002 --enable-stat
```
When prompted, specify the password from step 1.

4. Restart the ACC service to apply the changes:

# systemctl restart ostor-agentd.service
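When scripting the procedure above, the address of the node that hosts the ACC service can be extracted from the `ostor-ctl agent-status` output. This sketch assumes that the last column has the form `<address>:<port>`, as in the example output. The helper name `acc_node_address` is hypothetical.

```shell
# Print the IP address of the node hosting the ACC service.
# Reads "ostor-ctl agent-status" output on stdin; assumes the last column
# has the form <address>:<port>, as in the example output.
acc_node_address() {
    awk '$1 == "ACC" { addr = $NF; sub(/:[0-9]+$/, "", addr); print addr }'
}
```

For example, `ostor-ctl agent-status | acc_node_address` would print `192.168.128.226` for the output shown in the example.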

The following metrics will appear in Prometheus:

Metric	Description
`account_control_buckets_size`	Bucket size, in bytes
`account_control_user_size`	Total size of all user buckets, in bytes
`account_control_s3_session_total`	Total number of S3 sessions initiated by the service
`account_control_s3_session_errors_total`	Total number of S3 session errors
`account_control_s3_client_total`	Total number of S3 clients created by the service
`account_control_s3_client_errors_total`	Total number of S3 client errors
`account_control_object_name_parsing_errors_total`	Total number of storage object name parsing errors
`account_control_object_user_errors_total`	Total number of object owner and user parsing errors
`account_control_object_upload_errors_total`	Total number of S3 multipart object upload errors
`account_control_server_configuration_errors_total`	Total number of server configuration errors
`account_control_bucket_creds_errors_total`	Total number of invalid bucket credentials errors
`account_control_lifecycle_validation_errors_total`	Total number of lifecycle object validation rule errors
`account_control_lifecycle_transport_errors_total`	Total number of lifecycle gRPC transport errors
`account_control_s3_delete_bulk_total`	Total number of bulk object deletions per bucket
`account_control_s3_delete_bulk_errors_total`	Total number of bulk object delete errors per bucket
`account_control_object_path_errors_total`	Total number of object path parsing errors per bucket
`account_control_object_query_errors_total`	Total number of object query errors per bucket
`account_control_bucket_read_errors_total`	Total number of read errors per bucket
`account_control_bucket_lifecycle_errors_total`	Total number of lifecycle errors per bucket
`account_control_bucket_actions_errors_total`	Total number of action errors per bucket
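To check that the bucket and user statistics are being exported, you can query Prometheus for the `account_control_*` series. The sketch below is illustrative only. It assumes that Prometheus is reachable at `http://localhost:9090`, and the function names are hypothetical.

```shell
# Base URL of the Prometheus server; an assumption, adjust for your cluster.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Number of account_control_* series currently known to Prometheus;
# an empty result suggests that statistics export is not enabled yet.
stat_series_count_query() {
    printf '%s' 'count({__name__=~"account_control_.+"})'
}

# The N largest buckets, with sizes converted from bytes to GiB.
largest_buckets_query() {
    printf 'topk(%s, account_control_buckets_size) / 1024^3' "$1"
}

# Run an instant query through the Prometheus HTTP API and print the JSON reply.
prom_query() {
    curl -sG "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

For example, `prom_query "$(stat_series_count_query)"` or `prom_query "$(largest_buckets_query 5)"`.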