Object storage metrics
Metrics used for monitoring object storage are configured in the Prometheus recording rules and can be found in these files on any node in the cluster:
- /var/lib/prometheus/rules/s3.rules
- /var/lib/prometheus/rules/ostor.rules
Metrics that are used to generate object storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/s3.rules. These metrics are described in the table:
| Metric | Description |
|---|---|
| ostor_config_value | List of object storage configuration parameters |
| ostor_s3gw_req:rate5m | Number of all requests per second by an S3 gateway service over 5 minutes |
| ostor_s3gw_req_cancelled:rate5m | Number of canceled requests per second by an S3 gateway service over 5 minutes |
| ostor_req_server_err:rate5m | Number of failed requests with a server error (5XX status code) per second by an S3 gateway service over 5 minutes |
| ostor_s3gw_get_req_latency_ms_bucket:rate5m | Current GET request latency by an S3 gateway service over 5 minutes, for each histogram bucket |
| ostor_commit_latency_us_bucket:rate5m | Current commit latency by the Object storage service over 5 minutes, for each histogram bucket |
| ostor_os_req_latency_ms_bucket:rate5m | Current request latency by an OS service over 5 minutes, for each histogram bucket |
| ostor_ns_req_latency_ms_bucket:rate5m | Current request latency by an NS service over 5 minutes, for each histogram bucket |
| pcs_process_inactive_seconds_total | Total amount of time a process has been inactive |
| process_cpu_seconds_total | Total amount of time a process has used CPU |
| process_open_fds | Number of open file descriptors |
| ostor_svc_start_failed_count_total | Total number of failed attempts to start a service |
| ostor_svc_registry_cfg_failed_total | Total number of failed attempts to connect to the configuration service |
| nds_staged_messages_count | Total number of unprocessed NDS notification messages that are staged on the storage |
| nds_endpoint_process_count | Number of NDS notification messages that are being simultaneously processed on the endpoint |
| ostor_nds_total:rate5m | Number of NDS notification messages per second by an NDS service over 5 minutes |
| ostor_nds_repeat_total:rate5m | Number of repeated NDS notification messages per second by an NDS service over 5 minutes |
| ostor_nds_error_total:rate5m | Number of all NDS notification processing errors per second by an NDS service over 5 minutes |
| ostor_nds_delete_error_total:rate5m | Number of all NDS notification deletion errors per second by an NDS service over 5 minutes |
| rpc_errors_total | Number of RPC errors reported by the user space part of storage |
| fused_kernel_rpc_errors_total | Number of RPC errors reported by the kernel part of storage |
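
The `:rate5m` suffix in the metric names above matches the common Prometheus naming convention for recording rules that hold a per-second rate over a 5-minute window. The sketch below illustrates that idea on raw counter samples. It is a simplified model: it accounts for counter resets but, unlike PromQL `rate()`, does not extrapolate to the window boundaries. The function name `per_second_rate` and the sample values are made up for illustration; the actual rule expressions are in the rule files listed above.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)

def per_second_rate(samples: Sequence[Sample], window_s: float = 300.0) -> float:
    """Average per-second increase of a counter over the last `window_s` seconds."""
    if len(samples) < 2:
        return 0.0
    end_ts = samples[-1][0]
    in_window = [s for s in samples if s[0] >= end_ts - window_s]
    if len(in_window) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        # A drop means the counter was reset (for example, a service restart);
        # count the new value as the increase since the reset.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed if elapsed > 0 else 0.0

# Hypothetical request counter scraped every 60 seconds, with one reset.
samples = [(0, 1000), (60, 1600), (120, 2200), (180, 100), (240, 700), (300, 1300)]
print(per_second_rate(samples))
```

In this example the counter grows by 2500 requests in 300 seconds despite the reset, so the resulting rate is a little over 8 requests per second.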
Bucket and user statistics
Metrics that report bucket and user statistics are not available by default. To collect them, do the following:
1. Obtain the storage cluster password. For example:

       # vinfra cluster password show
       +----------+---------+
       | Field    | Value   |
       +----------+---------+
       | id       | 1       |
       | name     | cluster |
       | password | LwWUsf  |
       +----------+---------+

2. Identify the S3 node that hosts the ACC service. For example:

       # ostor-ctl agent-status | grep ACC
       ACC 4000000000000026 ACTIVE 2 197 3ea066999d582815 192.168.128.226:41395

3. On the node from step 2, enable exporting bucket and user statistics to Prometheus by running:

       # ostor-ctl set-vol -V 0100000000000002 --enable-stat

   When prompted, specify the password from step 1.

4. Restart the ACC service to apply the changes:

       # systemctl stop ostor-agentd.service
       # systemctl restart ostor-agentd.service
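
Step 2 can also be scripted. The sketch below picks the ACC line out of `ostor-ctl agent-status` output and extracts the node address. It assumes, based only on the example line above, that the service type is the first whitespace-separated field and that the `IP:port` address is the last one; it makes no assumptions about the other columns. The function name `find_acc_address` is hypothetical.

```python
from typing import Optional, Tuple

def find_acc_address(agent_status_output: str) -> Optional[Tuple[str, int]]:
    """Return (ip, port) of the first ACC service line, or None if absent."""
    for line in agent_status_output.splitlines():
        fields = line.split()
        # Assumption: first field is the service type, last field is IP:port.
        if len(fields) >= 2 and fields[0] == "ACC":
            ip, _, port = fields[-1].rpartition(":")
            if ip and port.isdigit():
                return ip, int(port)
    return None

# Example line copied from step 2 above.
output = "ACC 4000000000000026 ACTIVE 2 197 3ea066999d582815 192.168.128.226:41395"
print(find_acc_address(output))
```

On a real node, the input could come from `subprocess.run(["ostor-ctl", "agent-status"], capture_output=True, text=True).stdout`.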
After the ACC service restarts, the following metrics appear in Prometheus:
| Metric | Description |
|---|---|
| account_control_buckets_size | Bucket size, in bytes |
| account_control_user_size | Total size of all user buckets, in bytes |
| account_control_s3_session_total | Total number of S3 sessions initiated by the service |
| account_control_s3_session_errors_total | Total number of S3 session errors |
| account_control_s3_client_total | Total number of S3 clients created by the service |
| account_control_s3_client_errors_total | Total number of S3 client errors |
| account_control_object_name_parsing_errors_total | Total number of storage object name parsing errors |
| account_control_object_user_errors_total | Total number of object owner and user parsing errors |
| account_control_object_upload_errors_total | Total number of S3 multipart object upload errors |
| account_control_server_configuration_errors_total | Total number of server configuration errors |
| account_control_bucket_creds_errors_total | Total number of invalid bucket credentials errors |
| account_control_lifecycle_validation_errors_total | Total number of lifecycle object validation rule errors |
| account_control_lifecycle_transport_errors_total | Total number of lifecycle gRPC transport errors |
| account_control_s3_delete_bulk_total | Total number of bulk object deletions per bucket |
| account_control_s3_delete_bulk_errors_total | Total number of bulk object delete errors per bucket |
| account_control_object_path_errors_total | Total number of object path parsing errors per bucket |
| account_control_object_query_errors_total | Total number of object query errors per bucket |
| account_control_bucket_read_errors_total | Total number of read errors per bucket |
| account_control_bucket_lifecycle_errors_total | Total number of lifecycle errors per bucket |
| account_control_bucket_actions_errors_total | Total number of action errors per bucket |
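
Once the statistics are exported, they can be read through the standard Prometheus HTTP API (`/api/v1/query`). The sketch below builds an instant-query URL for `account_control_buckets_size` and sums the values from a response in the standard vector format. The Prometheus address `http://localhost:9090`, the `bucket` label name, and the sample response are assumptions made for illustration and are not taken from the product.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # assumption: adjust to your deployment

def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a Prometheus instant-query URL for the given PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"

def total_bytes(response_text: str) -> int:
    """Sum sample values from an instant-query response of type 'vector'."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed")
    # Each result carries value = [timestamp, "<number as string>"].
    return sum(int(float(r["value"][1])) for r in body["data"]["result"])

# Made-up response with two buckets; the 'bucket' label name is hypothetical.
sample = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"bucket": "photos"}, "value": [1700000000, "1048576"]},
        {"metric": {"bucket": "logs"}, "value": [1700000000, "524288"]},
    ]},
})
print(instant_query_url("account_control_buckets_size"))
print(total_bytes(sample))
```

To query a live server, the URL can be fetched with `urllib.request.urlopen(url).read()` and the decoded body passed to `total_bytes`.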
Stability metrics
| Metric | Description |
|---|---|
| gRPC-related metrics | |
| grpc_client_started_total | Total number of RPCs started on the client |
| grpc_client_handled_total | Total number of RPCs completed on the client, grouped by status code |
| grpc_client_msg_sent_total | Total number of messages sent by the client (streaming RPCs) |
| grpc_client_msg_received_total | Total number of messages received by the client (streaming RPCs) |
| grpc_client_handling_seconds_bucket | Histogram buckets of RPC handling durations on the client |
| grpc_client_handling_seconds_sum | Cumulative duration of RPC handling on the client in seconds |
| grpc_client_handling_seconds_count | Total number of RPCs observed on the client |
| grpc_server_started_total | Total number of RPCs started on the server (incoming calls) |
| grpc_server_handled_total | Total number of RPCs completed on the server, grouped by status code |
| grpc_server_msg_received_total | Total number of messages received by the server (streaming RPCs) |
| grpc_server_msg_sent_total | Total number of messages sent by the server (streaming RPCs) |
| grpc_server_handling_seconds_bucket | Histogram buckets of RPC handling durations on the server |
| grpc_server_handling_seconds_sum | Cumulative duration of RPC handling on the server in seconds |
| grpc_server_handling_seconds_count | Total number of RPCs observed on the server |
| ostor_gr_events_total | Total number of GR events processed |
| ostor_gr_processing_duration_seconds | Duration of GR event processing in seconds |
| ostor_gr_event_failures_total | Total number of failed GR events |
| ostor_gr_tasks | Number of GR tasks currently in the queue |
| ostor_gr_active_tasks | Number of GR tasks currently active |
| ostor_gr_delay_duration_seconds | Duration of delay before GR tasks are executed, in seconds |
| ostor_gr_events | Number of GR events pending replication |
| ostor_gr_bytes | Number of GR bytes pending replication |
| NFS metrics | |
| nfsv4_quota_ops | Number of NFSv4 quota requests per operation |
| nfs_mdcache_entries | Number of allocated NFS metadata cache entries |
| nfs_compound | Number of NFS compound operations performed |
| nfs_read | Number of read requests per export |
| nfs_write | Number of write requests per export |
| File system metrics | |
| ostorfs_opened_files | Number of open file handles per volume |
| ostorfs_pending_ops | Number of pending operations per volume |
| ostorfs_objects | Number of allocated objects per volume |
| ostorfs_used_bytes | Number of bytes used per volume |
| ostorfs_total_bytes | Maximum volume size in bytes |
| ostorfs_used_files | Number of files created per volume |
| ostorfs_read_bytes | Number of bytes read per volume |
| ostorfs_write_bytes | Number of bytes written per volume |
| ostorfs_op | Total number of operations per volume, by operation type and status |
| ostorfs_op_latency_ms | Operation latency per volume and per operation type, in milliseconds |
| ostor_fs_nr_files | Number of files in the FS volume |
| ostor_fs_nr_pending | Number of pending FS requests |
| ostor_fs_total_bytes | Total number of bytes in the FS volume |
| ostor_fs_used_bytes | Number of bytes used in the FS volume |
| ostor_fs_a3_size | Size of Archive3 |
| ostor_fs_a3_commits | Number of Archive3 commits |
| ostor_fs_a3_commit_latency_ms | Latency of Archive3 commits in milliseconds |
| ostor_fs_req_latency_ms | Latency of FS requests by type in milliseconds |
| ostor_fs_req | Number of FS operations by type |
| API-related metrics | |
| ostor_api_writing | Number of pending API messages |
| ostor_api_retries | Number of retried API messages |
| ostor_api_msg | Total number of API messages |
| ostor_api_msg_latency_ms | Latency of API message processing in milliseconds |
| Limiter metrics | |
| ostor_limiter_paused | Indicates whether the limiter is currently paused |
| ostor_limiter_active_groups_count | Number of active limiter groups |
| ostor_limiter_active_services_count | Number of services currently limited |
| ostor_limiter_soft_limit_violations_total | Total number of soft-limit violations |
| ostor_limiter_oom_events_total | Total number of out-of-memory (OOM) events |
| ostor_limiter_oom_kills_total | Total number of OOM-triggered process kills |
| ostor_limiter_processes_killed_total | Total number of processes killed by the limiter |
| ostor_limiter_leak_checks_performed_total | Total number of leak checks performed |
| ostor_limiter_aggressive_cleanups_total | Total number of aggressive cleanup actions |
| ostor_limiter_timer_activations_total | Total number of limiter timer activations |
| ostor_limiter_cgroup_operations_total | Total number of cgroup operations performed |
| ostor_limiter_cgroup_operation_errors_total | Total number of failed cgroup operations |
| ostor_limiter_group_soft_limit_violations | Number of soft-limit violations per group |
| ostor_limiter_group_oom_events | Number of OOM events per group |
| ostor_limiter_group_oom_kills | Number of OOM-related process kills per group |
| ostor_limiter_group_processes_killed | Number of processes killed by the limiter per group |
| ostor_limiter_group_timer_running | Indicates whether the group timer is active |
| ostor_limiter_group_consecutive_violations | Number of consecutive soft-limit violations per group |
| ostor_limiter_leak_check_duration_ms | Duration of leak checks in milliseconds |
| ostor_limiter_limit_apply_duration_ms | Duration of limit-apply operations in milliseconds |
| NS metrics | |
| ostor_ns_nr_pending | Number of pending NS requests |
| ostor_ns_nr_guards | Number of active NS guards |
| ostor_ns_data_bytes | Number of bytes used by an NS service |
| ostor_ns_gr_events | Number of GR events created by an NS service |
| ostor_ns_gr_processed_events | Number of GR events processed by an NS service |
| ostor_ns_committed_size | Number of bytes written to the NS log |
| ostor_ns_blk_acquired | Number of blocks acquired by an NS service |
| ostor_ns_blk_waits | Number of block read waits |
| ostor_ns_req_latency_ms | Latency of NS requests in milliseconds |
| ostor_ns_req | Number of NS operations by type |
| ostor_ns_req_failed | Number of failed NS operations |
| OS metrics | |
| ostor_os_nr_pending | Number of pending OS requests |
| ostor_os_data_bytes | Number of bytes used by an OS service |
| ostor_os_read_bytes | Number of bytes read by an OS service |
| ostor_os_write_bytes | Number of bytes written by an OS service |
| ostor_os_blk_acquired | Number of blocks acquired by an OS service |
| ostor_os_blk_waits | Number of block read waits |
| ostor_os_req_latency_ms | Latency of OS requests in milliseconds |
| ostor_os_req | Number of OS operations by type |
| ostor_os_req_failed | Number of failed OS operations |
| ostor_os_io_total | Number of OS I/O operations by type |
| ostor_os_io_failed_total | Number of failed OS I/O operations |
| ostor_os_io_latency | Latency of OS I/O operations by type |
| S3 gateway metrics | |
| ostor_s3gw_pending | Number of pending S3 gateway requests |
| ostor_s3gw_req | Number of incoming S3 gateway requests |
| ostor_s3gw_req_cancelled | Number of canceled S3 requests |
| ostor_s3gw_req_failed | Number of failed S3 requests |
| ostor_s3gw_req_err_client | Number of client-error requests |
| ostor_s3gw_req_err_server | Number of server-error requests |
| ostor_s3gw_fcgi_write_size | Number of bytes written to FastCGI by an S3 gateway |
| ostor_s3gw_fcgi_read_size | Number of bytes read from FastCGI by an S3 gateway |
| ostor_s3gw_fcgi_latency_ms | Latency of FastCGI buffer requests in milliseconds |
| ostor_s3gw_sys_req | Number of system requests |
| ostor_s3gw_put_req | Number of PUT requests |
| ostor_s3gw_get_req | Number of GET requests |
| ostor_s3gw_list_req | Number of LIST requests |
| ostor_s3gw_delete_req | Number of DELETE requests |
| ostor_s3gw_req_type | Number of requests by type |
| ostor_s3gw_get_req_latency_ms | GET last-byte latency in milliseconds, by size bucket |
| ostor_s3gw_put_req_latency_ms | PUT last-byte latency in milliseconds, by size bucket |
| ostor_s3gw_req_type_latency_ms | Total request latency in milliseconds by request type and object count |
| Other metrics | |
| ostor_fw_processing_active | Indicates whether the framework processing is currently active |
| ostor_fw_svc_inflight | Number of inflight requests per remote service |
| blk_cache_blocks | Number of cached blocks |
| blk_cache_commited | Number of committed blocks |
| ostor_commits | Total number of commit operations |
| ostor_commit_latency_us | Commit latency in microseconds |
| blk_cache_checkpoint_latency | Checkpoint latency by block count |
| incomplete_backkup_count | Number of incomplete backups |
| auto_maintenance_status | Status of auto-maintenance |
| ostor_lock_pending | Number of pending lock records |
| ostor_tmp_pending | Number of pending temporary records |
| ostor_tmp_process_latency_ms | Latency of temporary record processing in milliseconds |
| ostor_lock_process_latency_ms | Latency of lock record processing in milliseconds |
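
The `_bucket`, `_sum`, and `_count` series listed under the gRPC-related metrics follow the standard Prometheus histogram layout: cumulative bucket counters labeled by upper bound, a sum of observed values, and an observation count. The sketch below shows how a mean and an approximate quantile can be derived from such data, using linear interpolation inside a bucket in the same spirit as PromQL `histogram_quantile()`. The function name `estimate_quantile`, the bucket bounds, and the counts are invented for illustration.

```python
import math
from typing import List, Tuple

def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate into +Inf; report the last finite bound.
                return lower_bound
            # Interpolate linearly within the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound

# Invented cumulative counts for upper bounds of 0.1 s, 0.5 s, 1 s, and +Inf.
buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
handling_seconds_sum, handling_seconds_count = 23.0, 100.0

print(handling_seconds_sum / handling_seconds_count)  # mean handling time
print(estimate_quantile(0.95, buckets))
```

The estimate is only as precise as the bucket boundaries allow: every observation inside a bucket is treated as evenly spread between its lower and upper bound.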