Object storage metrics

Metrics used for monitoring object storage are configured in the Prometheus recording rules and can be found in these files on any node in the cluster:

  • /var/lib/prometheus/rules/s3.rules
  • /var/lib/prometheus/rules/ostor.rules

Metrics that are used to generate object storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/s3.rules. These metrics are described in the table:

Metric Description
ostor_config_value List of object storage configuration parameters
instance_vol_svc:ostor_s3gw_req:rate5m Number of all requests per second by an S3 gateway service over 5 minutes
instance_vol_svc:ostor_s3gw_req_cancelled:rate5m Number of canceled requests per second by an S3 gateway service over 5 minutes
instance_vol_svc:ostor_req_server_err:rate5m Number of failed requests with a server error (5XX status code) per second by an S3 gateway service over 5 minutes
instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m GET request latency of an S3 gateway service over the last 5 minutes, in milliseconds, for each histogram bucket
instance_vol_svc:ostor_commit_latency_us_bucket:rate5m Commit latency of the object storage service over the last 5 minutes, in microseconds, for each histogram bucket
instance_vol_svc_req:ostor_os_req_latency_ms_bucket:rate5m Request latency of an object server (OS) service over the last 5 minutes, in milliseconds, for each histogram bucket
instance_vol_svc_req:ostor_ns_req_latency_ms_bucket:rate5m Request latency of a name server (NS) service over the last 5 minutes, in milliseconds, for each histogram bucket
pcs_process_inactive_seconds_total Total amount of time a process has been inactive
process_cpu_seconds_total Total amount of time a process has used CPU
ostor_svc_start_failed_count_total Total number of failed attempts to start a service
ostor_svc_registry_cfg_failed_total Total number of failed attempts to connect to the configuration service
nds_staged_messages_count Total number of unprocessed NDS notification messages that are staged on the storage
nds_endpoint_process_count Number of NDS notification messages that are being simultaneously processed on the endpoint
instance_vol_svc:ostor_nds_total:rate5m Number of NDS notification messages per second by an NDS service over 5 minutes
instance_vol_svc:ostor_nds_repeat_total:rate5m Number of repeated NDS notification messages per second by an NDS service over 5 minutes
instance_vol_svc:ostor_nds_error_total:rate5m Number of all NDS notification processing errors per second by an NDS service over 5 minutes
instance_vol_svc:ostor_nds_delete_error_total:rate5m Number of all NDS notification deletion errors per second by an NDS service over 5 minutes
rpc_errors_total Number of RPC errors reported by the user space part of storage
fused_kernel_rpc_errors_total Number of RPC errors reported by the kernel part of storage
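
The recording rules in the table can be queried through the standard Prometheus HTTP API. The sketch below only builds query URLs and does not contact a server. It rests on several assumptions: the Prometheus address is a placeholder, the latency rules are assumed to keep the histogram's le label, and the helper names are invented for illustration.

```python
from urllib.parse import urlencode

# Assumption: Prometheus listens on this address; adjust it for your cluster.
PROMETHEUS_URL = "http://localhost:9090"


def p95_latency_expr(bucket_rule):
    """Build a PromQL expression for the 95th percentile of a latency histogram.

    Assumes the recording rule keeps the histogram's `le` label.
    """
    return f"histogram_quantile(0.95, sum by (le) ({bucket_rule}))"


def server_error_ratio_expr():
    """Build a PromQL expression for the share of requests that failed with 5XX."""
    errors = "sum(instance_vol_svc:ostor_req_server_err:rate5m)"
    total = "sum(instance_vol_svc:ostor_s3gw_req:rate5m)"
    return f"{errors} / {total}"


def instant_query_url(expr, base_url=PROMETHEUS_URL):
    """Return the URL of a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


if __name__ == "__main__":
    rule = "instance_vol_svc:ostor_s3gw_get_req_latency_ms_bucket:rate5m"
    print(instant_query_url(p95_latency_expr(rule)))
    print(instant_query_url(server_error_ratio_expr()))
```

The printed URLs can be opened with any HTTP client that can reach Prometheus. The response is a JSON document with the query result.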

Bucket and user statistics

Metrics that report bucket and user statistics are not available by default. To collect them, do the following:

  1. Obtain the storage cluster password. For example:

    # vinfra cluster password show
    +----------+---------+
    | Field    | Value   |
    +----------+---------+
    | id       | 1       |
    | name     | cluster |
    | password | LwWUsf  |
    +----------+---------+
  2. Identify the S3 node that hosts the ACC service. For example:

    # ostor-ctl agent-status | grep ACC
     ACC   4000000000000026     ACTIVE      2      197   3ea066999d582815  192.168.128.226:41395
  3. On the node from step 2, enable exporting bucket and user statistics to Prometheus by running:

    # ostor-ctl set-vol -V 0100000000000002 --enable-stat

    When prompted, specify the password from step 1.

  4. Restart the ACC service to apply the changes:

    # systemctl restart ostor-agentd.service
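
When step 2 is scripted, the node address can be extracted from the agent-status output. The following is a minimal sketch that uses the output line from the example above. It assumes that the last column has the form <ip>:<port> and identifies the node that hosts the service. The function name acc_addresses is hypothetical.

```python
def acc_addresses(agent_status_output):
    """Return the IP addresses listed for ACC services.

    Assumption: the first column is the service type and the last
    column is "<ip>:<port>" of the node that hosts the service.
    """
    hosts = []
    for line in agent_status_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "ACC":
            # Split on the last colon to drop the port.
            host, _, _port = fields[-1].rpartition(":")
            hosts.append(host)
    return hosts


# Output line copied from the example in step 2.
SAMPLE_OUTPUT = (
    " ACC   4000000000000026     ACTIVE      2      197"
    "   3ea066999d582815  192.168.128.226:41395"
)

if __name__ == "__main__":
    print(acc_addresses(SAMPLE_OUTPUT))
```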

The following metrics will appear in Prometheus:

Metric Description
account_control_buckets_size Bucket size, in bytes
account_control_user_size Total size of all user buckets, in bytes
account_control_s3_session_total Total number of S3 sessions initiated by the service
account_control_s3_session_errors_total Total number of S3 session errors
account_control_s3_client_total Total number of S3 clients created by the service
account_control_s3_client_errors_total Total number of S3 client errors
account_control_object_name_parsing_errors_total Total number of storage object name parsing errors
account_control_object_user_errors_total Total number of object owner and user parsing errors
account_control_object_upload_errors_total Total number of S3 multipart object upload errors
account_control_server_configuration_errors_total Total number of server configuration errors
account_control_bucket_creds_errors_total Total number of invalid bucket credentials errors
account_control_lifecycle_validation_errors_total Total number of lifecycle rule validation errors
account_control_lifecycle_transport_errors_total Total number of lifecycle gRPC transport errors
account_control_s3_delete_bulk_total Total number of bulk object deletions per bucket
account_control_s3_delete_bulk_errors_total Total number of bulk object deletion errors per bucket
account_control_object_path_errors_total Total number of object path parsing errors per bucket
account_control_object_query_errors_total Total number of object query errors per bucket
account_control_bucket_read_errors_total Total number of read errors per bucket
account_control_bucket_lifecycle_errors_total Total number of lifecycle errors per bucket
account_control_bucket_actions_errors_total Total number of action errors per bucket
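
Once the statistics are exported, per-bucket values can be read from a Prometheus instant-query response. The sketch below ranks buckets by account_control_buckets_size. The response follows the standard Prometheus JSON format, but the bucket label name and all sample values are assumptions made for illustration. Check the actual labels in your Prometheus instance.

```python
import json

# Illustrative response in the Prometheus instant-query format.
# The "bucket" label name and all values are made up for this example.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {"metric": {"__name__": "account_control_buckets_size", "bucket": "backups"},
       "value": [1700000000, "1024"]},
      {"metric": {"__name__": "account_control_buckets_size", "bucket": "logs"},
       "value": [1700000000, "2048"]},
      {"metric": {"__name__": "account_control_buckets_size", "bucket": "tmp"},
       "value": [1700000000, "16"]}
    ]
  }
}
"""


def largest_buckets(response_text, label="bucket", top=3):
    """Return up to `top` (name, size_in_bytes) pairs, largest first."""
    payload = json.loads(response_text)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    rows = []
    for sample in payload["data"]["result"]:
        name = sample["metric"].get(label, "<unknown>")
        # Prometheus encodes sample values as strings.
        size = float(sample["value"][1])
        rows.append((name, size))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for name, size in largest_buckets(SAMPLE_RESPONSE):
        print(f"{name}: {size:.0f} bytes")
```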