Backup storage metrics
Metrics used for monitoring backup storage are configured in the Prometheus recording rules and can be found in the /var/lib/prometheus/rules/abgw.rules file on any node in the cluster. The most important of these metrics are described in the table:
Metric | Description |
---|---|
FES object counters | |
abgw_accounts | Number of accounts backup storage is currently working with (that is, the number of accounts with open backup archives) |
abgw_files | Number of currently open backup archives. Backup archives are open for reading and writing only during a backup operation. Other operations, such as restoring, browsing, and validation, open backup archives only for reading. |
abgw_conns[proto] | Number of current connections between backup storage and clients. The value is an array of counters, divided by backup storage protocol version (V1/V2). |
Connection counters | |
abgw_conns_total | Total number of connections between backup storage and clients since the service startup |
abgw_client_conns_cur[name] | Number of currently connected clients, divided by type |
abgw_client_conns_total[name] | Total number of clients since the service startup, divided by type |
Certificate errors and expiration times | |
abgw_verify_certs_errors_total[err] | Total number of certificate verification errors since the service startup, divided by error type |
abgw_next_certificate_expiration[path] | Expiration date of backup storage certificates |
abgw_cert_update_fail_total | Number of failed attempts to update the certificate revocation list. The list is required to correctly apply a new quota in Acronis Cyber Cloud, when the current customer certificate is revoked and a new certificate is requested. |
abgw_crl_download_fail_total | Number of failed attempts to download the certificate revocation list. The list is required to correctly apply a new quota in Acronis Cyber Cloud, when the current customer certificate is revoked and a new certificate is requested. |
Backup storage protocol V1 request histograms and counters | |
abgw_read_reqs_total | Number of read requests since the service startup |
abgw_write_reqs_total | Number of write requests since the service startup |
abgw_req_errs_total[req][err] | Array with request errors, divided by request type and error codes |
abgw_req_latency_ms[req] | Histogram with request latency |
Backup storage protocol V2 request histograms and counters | |
abgw_v2_ireq_errs_total[req][err] | Array with request errors, divided by request type and error codes |
abgw_v2_ireq_latency_ms[req][lat] | Histogram with request latency |
abgw_v2_ereq_errs_total[req][err] | Array with request errors, divided by request type and error codes |
abgw_v2_ereq_latency_ms[req][err] | Histogram with request latency |
Byte counters | |
abgw_read_bytes_total[proxied] | Number of bytes read from a disk since the service startup. The proxied parameter shows data read via a reverse proxy. |
abgw_write_bytes_total[proxied] | Number of bytes written to a disk since the service startup. The proxied parameter shows data written via a reverse proxy. |
abgw_write_rollback_bytes_total | Size of data overwritten by backup storage at a client's request when backup storage could not confirm to the client that the data had already been written. The metric is used only for the backup storage protocol V1 and legacy backup clients. |
File operation and I/O operation metrics | |
abgw_file_lookup_errs_total[err] | Number of failed attempts to open files or find already open files, divided by error codes |
abgw_fop_latency_ms_bucket[fop][proxied][err] | Histogram of file operation latency, divided by operation type (read, write, sync, stat, and other file operations), by whether the operation is proxied, and by error number |
abgw_iop_latency_ms_bucket[iop][proxied][err] | Histogram of I/O operation latency, divided by operation type, by whether the operation is proxied, and by error number |
abgw_io_limiting_failures_total[type] | Number of I/O requests to backup storage that failed due to poor performance of the underlying storage since the service startup, divided by type |
abgw_iop_wd_timeouts[iop] | Number of file operations that take longer than two minutes, divided by operation type |
Migration metrics | |
abgw_account_pull_errs_total[err] | Number of failed attempts by the destination backup storage to retrieve the account list from the source backup storage before the migration starts, divided by error code |
abgw_nr_files_to_pull | Number of files to migrate from the source backup storage to the destination backup storage (includes all files whose migration is not yet completed) |
abgw_pull_backlog_bytes | Number of bytes on the source backup storage that are not yet migrated to the destination backup storage |
abgw_pull_progress_bytes_total | Number of bytes already migrated from the source backup storage to the destination backup storage since the service startup |
abgw_file_migration_source_open_errs_total[err] | Number of failed attempts to open files for migration on the source backup storage since the service startup, divided by error code |
abgw_file_migration_source_read_errs_total[err] | Number of failed attempts to read files for migration on the source backup storage since the service startup, divided by error code |
Object storage and geo-replication metrics | |
abgw_push_backlog_bytes[ostor, replica] | Number of bytes still to be written to the object destination storage or, in case of geo-replication, to the secondary cluster |
abgw_push_progress_bytes_total[ostor, replica] | Number of bytes written to the object destination storage or, in case of geo-replication, to the secondary cluster. This metric helps to estimate the speed of data replication or copying. |
abgw_push_replica_errs_total[err] | Number of failed attempts to write files to the object destination storage or, in case of geo-replication, to the secondary cluster since the service startup, divided by error type |
abgw_replica_integrity_checks_fail_total | Number of corrupted replicas detected on the secondary cluster since the service startup |
abgw_file_replica_auto_errs_total[err] | Number of geo-replication errors for new files (created after geo-replication was configured) since the service startup, divided by error type |
abgw_file_replica_open_errs_total[err] | Number of failed attempts by the primary cluster to open files for writing on the secondary cluster since the service startup, divided by error code |
Object destination storage metrics | |
abgw_ostor_used_space_bytes | Space used on the object destination storage by all backup archives, including both data and unused space |
abgw_nr_ostor_sequence_mismatch_total | Number of files that backup storage failed to open because of a version mismatch on the object destination storage |
abgw_ostor_garbage_bytes | Size of unused space inside all backup archives that is not yet physically cleaned up on the object destination storage |
Container archive validation results | |
abgw_containers_validate_segments_fail_total | Number of archives with failed validation (segments) on the NFS and object destination storage |
abgw_containers_validate_trees_fail_total | Number of archives with failed validation (trees) on the NFS and object destination storage |
Other metrics | |
abgw_append_throttle_delay_ms_total | Total sum of delays, in milliseconds, injected since the service startup. The metric helps to understand whether throttling is enabled for backup storage. |
abgw_iop_ebusy | Number of I/O errors for open file operations since the service startup |
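Metrics that count events or bytes "since the service startup" start from zero when the backup storage service starts, so their values drop back to zero after every restart. To turn them into throughput figures, you typically compute a per-second rate that treats such drops as resets; in PromQL, the rate() function does this for you, for example rate(abgw_push_progress_bytes_total[5m]). The Python sketch below illustrates the same idea on raw (timestamp, value) samples, without the extrapolation that rate() performs, and combines the result with a backlog gauge such as abgw_push_backlog_bytes or abgw_pull_backlog_bytes to estimate the remaining replication or migration time. The function names are hypothetical, and the estimate assumes that the recent rate stays constant.

```python
from __future__ import annotations

import math


def counter_increase(samples: list[tuple[float, float]]) -> float:
    """Total increase of a counter over chronologically ordered (timestamp, value) samples.

    A value lower than the previous one is treated as a counter reset
    (for example, after the backup storage service restarts): the counter
    is assumed to have restarted from zero, so the new value itself is the
    increase since the reset.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase


def counter_rate(samples: list[tuple[float, float]]) -> float:
    """Average per-second increase over the sampled time window."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return counter_increase(samples) / elapsed


def eta_seconds(backlog_bytes: float, bytes_per_second: float) -> float:
    """Estimated time to drain a backlog at a constant rate; inf if there is no progress."""
    if bytes_per_second <= 0:
        return math.inf
    return backlog_bytes / bytes_per_second
```

For example, given samples of abgw_push_progress_bytes_total taken one minute apart, counter_rate() returns the average replication speed in bytes per second, and eta_seconds(current_backlog, rate) gives a rough time to completion.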
Histogram metrics with the "_bucket" suffix have corresponding metrics ending with "_sum" and "_count". For example, for I/O operation latency:
abgw_iop_latency_ms_bucket shows, for each latency bucket, the number of measurements less than or equal to the bucket's upper bound (the buckets are cumulative)
abgw_iop_latency_ms_count shows the total number of stored measurements of I/O operation latency
abgw_iop_latency_ms_sum shows the total sum of all measured I/O operation latencies
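Because the buckets are cumulative, percentiles and averages have to be derived from them rather than read directly. In PromQL, this is commonly done with histogram_quantile(0.95, sum by (le) (rate(abgw_iop_latency_ms_bucket[5m]))) for a percentile and rate(abgw_iop_latency_ms_sum[5m]) / rate(abgw_iop_latency_ms_count[5m]) for the average. The Python sketch below reproduces the linear interpolation that histogram_quantile() uses, assuming the buckets carry the standard Prometheus le (upper bound) label and end with a +Inf bucket. The function names and sample numbers are illustrative only.

```python
from __future__ import annotations

import math


def mean_from_sum_count(total_sum: float, count: float) -> float | None:
    """Average measured value: the _sum value divided by the _count value."""
    return total_sum / count if count > 0 else None


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Follows the linear interpolation of PromQL histogram_quantile(): the
    bucket with upper bound +Inf must be present, and its count equals the
    total number of measurements (the _count value).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("the last bucket must have the upper bound +Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # Latencies are non-negative, so the first bucket is assumed to start at 0.
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # The open-ended bucket has no upper limit to interpolate to,
                # so return the highest finite bucket boundary instead.
                return lower_bound
            width = count - lower_count
            if width <= 0:
                return lower_bound
            return lower_bound + (upper_bound - lower_bound) * (rank - lower_count) / width
        lower_bound, lower_count = upper_bound, count
    return lower_bound  # not reached: the +Inf bucket always satisfies count >= rank
```

For instance, with buckets [(10, 2), (50, 6), (100, 9), (inf, 10)], the sketch estimates the median at 40 ms and reports the 95th percentile as 100 ms, because that quantile falls into the open-ended bucket.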