Object storage metrics

Metrics used for monitoring object storage are configured in the Prometheus recording rules and can be found in these files on any node in the cluster:

  • /var/lib/prometheus/rules/s3.rules
  • /var/lib/prometheus/rules/ostor.rules

Metrics that are used to generate object storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/s3.rules. These metrics are described in the following table:

Metric Description
ostor_config_value List of object storage configuration parameters
ostor_s3gw_req:rate5m Number of all requests per second by an S3 gateway service over 5 minutes
ostor_s3gw_req_cancelled:rate5m Number of canceled requests per second by an S3 gateway service over 5 minutes
ostor_req_server_err:rate5m Number of failed requests with a server error (5XX status code) per second by an S3 gateway service over 5 minutes
ostor_s3gw_get_req_latency_ms_bucket:rate5m Current GET request latency by an S3 gateway service over 5 minutes, for each bucket
ostor_commit_latency_us_bucket:rate5m Current commit latency by the Object storage service over 5 minutes, for each bucket
ostor_os_req_latency_ms_bucket:rate5m Current request latency by an OS service over 5 minutes, for each bucket
ostor_ns_req_latency_ms_bucket:rate5m Current request latency by an NS service over 5 minutes, for each bucket
pcs_process_inactive_seconds_total Total amount of time a process has been inactive
process_cpu_seconds_total Total amount of time a process has used CPU
process_open_fds Number of open file descriptors
ostor_svc_start_failed_count_total Total number of failed attempts to start a service
ostor_svc_registry_cfg_failed_total Total number of failed attempts to connect to the configuration service
nds_staged_messages_count Total number of unprocessed NDS notification messages that are staged on the storage
nds_endpoint_process_count Number of NDS notification messages that are being simultaneously processed on the endpoint
ostor_nds_total:rate5m Number of NDS notification messages per second by an NDS service over 5 minutes
ostor_nds_repeat_total:rate5m Number of repeated NDS notification messages per second by an NDS service over 5 minutes
ostor_nds_error_total:rate5m Number of all NDS notification processing errors per second by an NDS service over 5 minutes
ostor_nds_delete_error_total:rate5m Number of all NDS notification deletion errors per second by an NDS service over 5 minutes
rpc_errors_total Number of RPC errors reported by the user space part of storage
fused_kernel_rpc_errors_total Number of RPC errors reported by the kernel part of storage
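
Because ostor_req_server_err:rate5m and ostor_s3gw_req:rate5m are both per-second rates over the same 5-minute window, dividing one by the other gives the share of requests that fail with a 5XX status. The following Python sketch shows that arithmetic and builds a query URL for the standard Prometheus HTTP API. The Prometheus address and the helper names are assumptions made for illustration; they are not part of the product.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable at this address; adjust for your cluster.
PROMETHEUS_URL = "http://localhost:9090"

# Share of 5XX responses, summed over all S3 gateway services.
ERROR_RATIO_EXPR = "sum(ostor_req_server_err:rate5m) / sum(ostor_s3gw_req:rate5m)"


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus instant-query endpoint (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def server_error_ratio(err_rate: float, req_rate: float) -> float:
    """Return the fraction of requests that ended with a server error.

    Both arguments are per-second rates. An idle gateway (no requests)
    yields 0.0 instead of a division error.
    """
    if req_rate <= 0:
        return 0.0
    return err_rate / req_rate


if __name__ == "__main__":
    print(instant_query_url(ERROR_RATIO_EXPR))
    # 0.5 failed requests per second out of 50 requests per second
    print(server_error_ratio(0.5, 50.0))
```

The expression in ERROR_RATIO_EXPR can also be pasted directly into the Prometheus expression browser.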

Bucket and user statistics

Metrics that report bucket and user statistics are not available by default. To collect them, do the following:

  1. Obtain the storage cluster password. For example:

    # vinfra cluster password show
    +----------+---------+
    | Field    | Value   |
    +----------+---------+
    | id       | 1       |
    | name     | cluster |
    | password | LwWUsf  |
    +----------+---------+

  2. Identify the S3 node that hosts the ACC service. For example:

    # ostor-ctl agent-status | grep ACC
     ACC   4000000000000026     ACTIVE      2      197   3ea066999d582815  192.168.128.226:41395

  3. On the node from step 2, enable exporting bucket and user statistics to Prometheus by running:

    # ostor-ctl set-vol -V 0100000000000002 --enable-stat

    When prompted, specify the password from step 1.

  4. Restart the ACC service to apply the changes:

    # systemctl restart ostor-agentd.service
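
If step 2 needs to be scripted, the ACC line can be picked out of the ostor-ctl agent-status output programmatically. The following Python sketch is a minimal illustration. It assumes, based only on the sample output above, that the first column is the service type, the second is the service ID, the third is the state, and the last is an address in host:port form. The function name is hypothetical.

```python
from typing import Optional


def find_acc_service(agent_status_output: str) -> Optional[dict]:
    """Return details of the first ACC line, or None if there is none.

    Column meanings are inferred from the sample output and are
    assumptions: type, ID, state, ..., host:port.
    """
    for line in agent_status_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "ACC":
            continue
        host, sep, port = fields[-1].rpartition(":")
        if not sep or not port.isdigit():
            continue  # last column is not in host:port form
        return {
            "id": fields[1],
            "state": fields[2],
            "host": host,
            "port": int(port),
        }
    return None


if __name__ == "__main__":
    sample = (
        " ACC   4000000000000026     ACTIVE      2      197"
        "   3ea066999d582815  192.168.128.226:41395"
    )
    print(find_acc_service(sample))
```

The host value identifies the node on which to run the commands from steps 3 and 4.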

The following metrics will appear in Prometheus:

Metric Description
account_control_buckets_size Bucket size, in bytes
account_control_user_size Total size of all user buckets, in bytes
account_control_s3_session_total Total number of S3 sessions initiated by the service
account_control_s3_session_errors_total Total number of S3 session errors
account_control_s3_client_total Total number of S3 clients created by the service
account_control_s3_client_errors_total Total number of S3 client errors
account_control_object_name_parsing_errors_total Total number of storage object name parsing errors
account_control_object_user_errors_total Total number of object owner and user parsing errors
account_control_object_upload_errors_total Total number of S3 multipart object upload errors
account_control_server_configuration_errors_total Total number of server configuration errors
account_control_bucket_creds_errors_total Total number of invalid bucket credentials errors
account_control_lifecycle_validation_errors_total Total number of lifecycle object validation rule errors
account_control_lifecycle_transport_errors_total Total number of lifecycle gRPC transport errors
account_control_s3_delete_bulk_total Total number of bulk object deletions per bucket
account_control_s3_delete_bulk_errors_total Total number of bulk object deletion errors per bucket
account_control_object_path_errors_total Total number of object path parsing errors per bucket
account_control_object_query_errors_total Total number of object query errors per bucket
account_control_bucket_read_errors_total Total number of read errors per bucket
account_control_bucket_lifecycle_errors_total Total number of lifecycle errors per bucket
account_control_bucket_actions_errors_total Total number of action errors per bucket
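
To confirm that the statistics are being exported, one option is to ask Prometheus for the metric names it knows about and check that the account_control_ metrics are among them. The following Python sketch uses the standard Prometheus endpoint /api/v1/label/__name__/values. The Prometheus address, the function names, and the short list of expected metrics are assumptions chosen for illustration.

```python
import json
from urllib.request import urlopen

# A few metrics from the table above; extend the tuple as needed.
EXPECTED = (
    "account_control_buckets_size",
    "account_control_user_size",
)


def metric_names_from_response(payload: dict) -> set:
    """Extract metric names from a /api/v1/label/__name__/values response."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected Prometheus response: {payload!r}")
    return set(payload.get("data", []))


def missing_metrics(payload: dict, expected=EXPECTED) -> list:
    """Return the expected metric names that Prometheus does not report yet."""
    names = metric_names_from_response(payload)
    return [name for name in expected if name not in names]


def fetch_metric_names(base_url: str = "http://localhost:9090") -> dict:
    """Fetch all metric names. The default address is an assumption."""
    url = f"{base_url}/api/v1/label/__name__/values"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(missing_metrics(fetch_metric_names()) or "all expected metrics found")
```

An empty result means all listed metrics are present. The metrics may take some time to appear after the service restarts.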

Stability metrics

Metric Description
gRPC-related metrics
grpc_client_started_total Total number of RPCs started on the client
grpc_client_handled_total Total number of RPCs completed on the client, grouped by status code
grpc_client_msg_sent_total Total number of messages sent by the client (streaming RPCs)
grpc_client_msg_received_total Total number of messages received by the client (streaming RPCs)
grpc_client_handling_seconds_bucket Histogram buckets of RPC handling durations on the client
grpc_client_handling_seconds_sum Cumulative duration of RPC handling on the client in seconds
grpc_client_handling_seconds_count Total number of RPCs observed on the client
grpc_server_started_total Total number of RPCs started on the server (incoming calls)
grpc_server_handled_total Total number of RPCs completed on the server, grouped by status code
grpc_server_msg_received_total Total number of messages received by the server (streaming RPCs)
grpc_server_msg_sent_total Total number of messages sent by the server (streaming RPCs)
grpc_server_handling_seconds_bucket Histogram buckets of RPC handling durations on the server
grpc_server_handling_seconds_sum Cumulative duration of RPC handling on the server in seconds
grpc_server_handling_seconds_count Total number of RPCs observed on the server
ostor_gr_events_total Total number of GR events processed
ostor_gr_processing_duration_seconds Duration of GR event processing in seconds
ostor_gr_event_failures_total Total number of failed GR events
ostor_gr_tasks Number of GR tasks currently in the queue
ostor_gr_active_tasks Number of GR tasks currently active
ostor_gr_delay_duration_seconds Duration of delay before GR tasks are executed, in seconds
ostor_gr_events Number of GR events pending replication
ostor_gr_bytes Number of GR bytes pending replication
NFS metrics
nfsv4_quota_ops Number of NFSv4 quota requests per operation
nfs_mdcache_entries Number of allocated NFS metadata cache entries
nfs_compound Number of NFS compound operations performed
nfs_read Number of read requests per export
nfs_write Number of write requests per export
File system metrics
ostorfs_opened_files Number of open file handles per volume
ostorfs_pending_ops Number of pending operations per volume
ostorfs_objects Number of allocated objects per volume
ostorfs_used_bytes Number of bytes used per volume
ostorfs_total_bytes Maximum volume size in bytes
ostorfs_used_files Number of files created per volume
ostorfs_read_bytes Number of bytes read per volume
ostorfs_write_bytes Number of bytes written per volume
ostorfs_op Total number of operations per volume, by operation type and status
ostorfs_op_latency_ms Operation latency per volume and per operation type, in milliseconds
ostor_fs_nr_files Number of files in the FS volume
ostor_fs_nr_pending Number of pending FS requests
ostor_fs_total_bytes Total number of bytes in the FS volume
ostor_fs_used_bytes Number of bytes used in the FS volume
ostor_fs_a3_size Size of Archive3
ostor_fs_a3_commits Number of Archive3 commits
ostor_fs_a3_commit_latency_ms Latency of Archive3 commits in milliseconds
ostor_fs_req_latency_ms Latency of FS requests by type in milliseconds
ostor_fs_req Number of FS operations by type
API-related metrics
ostor_api_writing Number of pending API messages
ostor_api_retries Number of retried API messages
ostor_api_msg Total number of API messages
ostor_api_msg_latency_ms Latency of API message processing in milliseconds
Limiter metrics
ostor_limiter_paused Indicates whether the limiter is currently paused
ostor_limiter_active_groups_count Number of active limiter groups
ostor_limiter_active_services_count Number of services currently limited
ostor_limiter_soft_limit_violations_total Total number of soft-limit violations
ostor_limiter_oom_events_total Total number of out-of-memory (OOM) events
ostor_limiter_oom_kills_total Total number of OOM-triggered process kills
ostor_limiter_processes_killed_total Total number of processes killed by the limiter
ostor_limiter_leak_checks_performed_total Total number of leak checks performed
ostor_limiter_aggressive_cleanups_total Total number of aggressive cleanup actions
ostor_limiter_timer_activations_total Total number of limiter timer activations
ostor_limiter_cgroup_operations_total Total number of cgroup operations performed
ostor_limiter_cgroup_operation_errors_total Total number of failed cgroup operations
ostor_limiter_group_soft_limit_violations Number of soft-limit violations per group
ostor_limiter_group_oom_events Number of OOM events per group
ostor_limiter_group_oom_kills Number of OOM-related process kills per group
ostor_limiter_group_processes_killed Number of processes killed by the limiter per group
ostor_limiter_group_timer_running Indicates whether the group timer is active
ostor_limiter_group_consecutive_violations Number of consecutive soft-limit violations per group
ostor_limiter_leak_check_duration_ms Duration of leak checks in milliseconds
ostor_limiter_limit_apply_duration_ms Duration of limit-apply operations in milliseconds
NS metrics
ostor_ns_nr_pending Number of pending NS requests
ostor_ns_nr_guards Number of active NS guards
ostor_ns_data_bytes Number of bytes used by an NS service
ostor_ns_gr_events Number of GR events created by an NS service
ostor_ns_gr_processed_events Number of GR events processed by an NS service
ostor_ns_committed_size Number of bytes written to the NS log
ostor_ns_blk_acquired Number of blocks acquired by an NS service
ostor_ns_blk_waits Number of block read waits
ostor_ns_req_latency_ms Latency of NS requests in milliseconds
ostor_ns_req Number of NS operations by type
ostor_ns_req_failed Number of failed NS operations
OS metrics
ostor_os_nr_pending Number of pending OS requests
ostor_os_data_bytes Number of bytes used by an OS service
ostor_os_read_bytes Number of bytes read by an OS service
ostor_os_write_bytes Number of bytes written by an OS service
ostor_os_blk_acquired Number of blocks acquired by an OS service
ostor_os_blk_waits Number of block read waits
ostor_os_req_latency_ms Latency of OS requests in milliseconds
ostor_os_req Number of OS operations by type
ostor_os_req_failed Number of failed OS operations
ostor_os_io_total Number of OS I/O operations by type
ostor_os_io_failed_total Number of failed OS I/O operations
ostor_os_io_latency Latency of OS I/O operations by type
S3 gateway metrics
ostor_s3gw_pending Number of pending S3 gateway requests
ostor_s3gw_req Number of incoming S3 gateway requests
ostor_s3gw_req_cancelled Number of canceled S3 requests
ostor_s3gw_req_failed Number of failed S3 requests
ostor_s3gw_req_err_client Number of client-error requests
ostor_s3gw_req_err_server Number of server-error requests
ostor_s3gw_fcgi_write_size Number of bytes written to FastCGI by an S3 gateway
ostor_s3gw_fcgi_read_size Number of bytes read from FastCGI by an S3 gateway
ostor_s3gw_fcgi_latency_ms Latency of FastCGI buffer requests in milliseconds
ostor_s3gw_sys_req Number of system requests
ostor_s3gw_put_req Number of PUT requests
ostor_s3gw_get_req Number of GET requests
ostor_s3gw_list_req Number of LIST requests
ostor_s3gw_delete_req Number of DELETE requests
ostor_s3gw_req_type Number of requests by type
ostor_s3gw_get_req_latency_ms GET last-byte latency in milliseconds, by size bucket
ostor_s3gw_put_req_latency_ms PUT last-byte latency in milliseconds, by size bucket
ostor_s3gw_req_type_latency_ms Total request latency in milliseconds by request type and object count
Other metrics
ostor_fw_processing_active Indicates whether the framework processing is currently active
ostor_fw_svc_inflight Number of inflight requests per remote service
blk_cache_blocks Number of cached blocks
blk_cache_commited Number of committed blocks
ostor_commits Total number of commit operations
ostor_commit_latency_us Commit latency in microseconds
blk_cache_checkpoint_latency Checkpoint latency by block count
incomplete_backkup_count Number of incomplete backups
auto_maintenance_status Status of auto-maintenance
ostor_lock_pending Number of pending lock records
ostor_tmp_pending Number of pending temporary records
ostor_tmp_process_latency_ms Latency of temporary record processing in milliseconds
ostor_lock_process_latency_ms Latency of lock record processing in milliseconds
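
Histogram metrics such as grpc_server_handling_seconds expose a _sum and a _count series, and the usual Prometheus way to derive an average duration is to divide the increase of _sum by the increase of _count over the same window. The following Python sketch shows that calculation for two scrapes. The function name and the sample values are hypothetical, and the PromQL string is the generic histogram pattern rather than a rule shipped with the product.

```python
from typing import Optional

# Generic PromQL pattern for the average RPC handling time over 5 minutes.
AVG_HANDLING_EXPR = (
    "rate(grpc_server_handling_seconds_sum[5m])"
    " / rate(grpc_server_handling_seconds_count[5m])"
)


def average_duration(
    sum_start: float,
    count_start: float,
    sum_end: float,
    count_end: float,
) -> Optional[float]:
    """Average seconds per RPC between two scrapes of _sum and _count.

    Returns None when no RPCs completed in the window or when the
    counter went backwards (for example, after a service restart).
    """
    handled = count_end - count_start
    if handled <= 0:
        return None
    return (sum_end - sum_start) / handled


if __name__ == "__main__":
    # Hypothetical scrapes: 60 RPCs took 3 seconds in total.
    print(average_duration(10.0, 100, 13.0, 160))
```

The same pattern applies to any metric in the table that provides both _sum and _count series.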