Backup storage metrics

Metrics used for monitoring backup storage are configured in the Prometheus recording rules and can be found in the /var/lib/prometheus/rules/abgw.rules file on any node in the cluster. The most important of these metrics are described in the table:

Metric Description
FES object counters
abgw_accounts Number of accounts backup storage is currently working with (that is, number of accounts with open backup archives)
abgw_files Number of currently open backup archives. Backup archives are open for reading and writing only during a backup operation. Other operations, such as restoring, browsing, and validation, open backup archives only for reading.
abgw_conns[proto] Number of current connections between backup storage and clients. The value is an array of counters, divided by backup storage protocol (V1/V2).
Connection counters
abgw_conns_total Total number of connections between backup storage and clients since the service startup
abgw_client_conns_cur[name] Number of currently connected clients, divided by type
abgw_client_conns_total[name] Total number of clients since the service startup, divided by type
Certificate errors and expiration times
abgw_verify_certs_errors_total[err] Total number of certificate verification errors since the service startup, divided by error type
abgw_next_certificate_expiration[path] Expiration date of backup storage certificates
abgw_cert_update_fail_total Number of failed attempts to update the certificate revocation list. The list is required to correctly apply a new quota in Acronis Cyber Cloud, when the current customer certificate is revoked and a new certificate is requested.
abgw_crl_download_fail_total Number of failed attempts to download the certificate revocation list. The list is required to correctly apply a new quota in Acronis Cyber Cloud, when the current customer certificate is revoked and a new certificate is requested.
Backup storage protocol V1 request histograms and counters
abgw_read_reqs_total Number of read requests since the service startup
abgw_write_reqs_total Number of write requests since the service startup
abgw_req_errs_total[req][err] Array with request errors, divided by request type and error codes
abgw_req_latency_ms[req] Histogram with request latency
Backup storage protocol V2 request histograms and counters
abgw_v2_ireq_errs_total[req][err] Array with request errors, divided by request type and error codes
abgw_v2_ireq_latency_ms[req][lat] Histogram with request latency
abgw_v2_ereq_errs_total[req][err] Array with request errors, divided by request type and error codes
abgw_v2_ereq_latency_ms[req][err] Histogram with request latency
Byte counters
abgw_read_bytes_total[proxied] Number of bytes read from a disk since the service startup. The proxied parameter shows data read via a reverse proxy.
abgw_write_bytes_total[proxied] Number of bytes written to a disk since the service startup. The proxied parameter shows data written via a reverse proxy.
abgw_write_rollback_bytes_total Size of data overwritten by backup storage per client's request when backup storage could not confirm to the client that data was already written. The metric is used only for the backup storage protocol V1 and legacy backup clients.
File operation and I/O operation metrics
abgw_file_lookup_errs_total[err] Number of failed attempts to open files or find already open files, divided by error codes
abgw_fop_latency_ms_bucket[fop][proxied][err] Histogram with file operation latency, divided by operation type (read, write, sync, stat, and other file operations), proxied or not, and by error number
abgw_iop_latency_ms_bucket[iop][proxied][err] Histogram with I/O operation latency, divided by operation type, proxied or not, and by error number
abgw_io_limiting_failures_total[type] Number of failed I/O requests to backup storage since the service startup, due to poor performance of the underlying storage
abgw_iop_wd_timeouts[iop] Number of file operations that take more than two minutes, divided by operation type
Migration metrics
abgw_account_pull_errs_total[err] Number of failed attempts to retrieve the account list by the destination backup storage from the source backup storage before the migration start
abgw_nr_files_to_pull Number of files to migrate from the source backup storage to the destination backup storage (includes all files for which migration is not completed)
abgw_pull_backlog_bytes Number of bytes on the source backup storage that are not yet migrated to the destination backup storage
abgw_pull_progress_bytes_total Number of bytes on the destination backup storage that are already migrated from the source backup storage since the service startup
abgw_file_migration_source_open_errs_total[err] Number of failed attempts to open files for migration on the source backup storage since the service startup
abgw_file_migration_source_read_errs_total[err] Number of failed attempts to read files for migration on the source backup storage since the service startup
Object storage and geo-replication metrics
abgw_push_backlog_bytes[ostor, replica] Number of bytes to be written to the object destination storage, or to the secondary cluster in case of geo-replication
abgw_push_progress_bytes_total[ostor, replica] Number of bytes written to the object destination storage, or to the secondary cluster in case of geo-replication. This metric helps to understand the speed of data replication or copying.
abgw_push_replica_errs_total[err] Number of failed attempts to write files to the object destination storage, or to the secondary cluster in case of geo-replication, since the service startup, divided by error type
abgw_replica_integrity_checks_fail_total Number of corrupted replicas on the secondary cluster since the service startup
abgw_file_replica_auto_errs_total[err] Number of geo-replication errors for new files (created after configuring geo-replication) since the service startup, divided by error type
abgw_file_replica_open_errs_total[err] Number of failed attempts by the primary cluster to open files for writing on the secondary cluster since the service startup, divided by error code
Object destination storage metrics
abgw_ostor_used_space_bytes Space used by all backup archives, including data and unused space, on the object destination storage
abgw_nr_ostor_sequence_mismatch_total Number of files that backup storage failed to open because of a version mismatch on the object destination storage
abgw_ostor_garbage_bytes Unused space inside all backup archives that is not yet physically cleaned up on the object destination storage
Container archive validation results
abgw_containers_validate_segments_fail_total Number of archives with failed validation (segments) on the NFS and object destination storage
abgw_containers_validate_trees_fail_total Number of archives with failed validation (trees) on the NFS and object destination storage
Other metrics
abgw_append_throttle_delay_ms_total Total sum of delays injected since the service startup. The metric helps to understand if throttling is enabled for backup storage.
abgw_iop_ebusy Number of I/O errors for open file operations since the service startup
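
Many of the metrics above are cumulative counters ("since the service startup"), so they are usually more informative as a rate than as a raw value. The following is a minimal sketch of that idea, not part of backup storage or Prometheus: the helper names counter_rate and eta_seconds and all sample values are hypothetical. It computes the per-second increase of a counter such as abgw_pull_progress_bytes_total and uses it to roughly estimate how long the remaining abgw_pull_backlog_bytes would take to migrate.

```python
from typing import Optional


def counter_rate(prev: float, curr: float, interval_s: float) -> float:
    """Per-second increase of a cumulative counter between two samples.

    A counter that counts "since the service startup" starts again from zero
    when the service restarts, so a decrease is treated as a reset and the
    current value is taken as the whole increase.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    increase = curr - prev if curr >= prev else curr
    return increase / interval_s


def eta_seconds(backlog_bytes: float, rate_bytes_per_s: float) -> Optional[float]:
    """Rough time left to drain a backlog at the given rate (None if stalled)."""
    if rate_bytes_per_s <= 0:
        return None
    return backlog_bytes / rate_bytes_per_s


# Hypothetical samples of abgw_pull_progress_bytes_total taken 60 seconds apart
rate = counter_rate(prev=1_000_000_000, curr=1_600_000_000, interval_s=60)
# Hypothetical current value of abgw_pull_backlog_bytes
remaining = eta_seconds(backlog_bytes=30_000_000_000, rate_bytes_per_s=rate)
print(f"migration rate: {rate / 1e6:.1f} MB/s, estimated time left: {remaining:.0f} s")
```

The estimate is only as good as the assumption that the rate stays constant. A restart between the two samples makes the computed increase a lower bound.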

Histogram metrics with the "_bucket" suffix have corresponding metrics ending with "_sum" and "_count", for example:

  • abgw_iop_latency_ms_bucket shows the number of measurements for I/O operation latency per bucket
  • abgw_iop_latency_ms_count shows the number of stored measurements for I/O operation latency
  • abgw_iop_latency_ms_sum shows the total sum of all measurements for I/O operation latency
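
To illustrate how the three series of a histogram fit together, here is a minimal sketch. It assumes the usual Prometheus convention that bucket counts are cumulative and that each bucket is identified by its upper bound. The function names and the sample numbers are hypothetical. The average latency is the sum divided by the count, and a percentile can be estimated by interpolating linearly inside the bucket that contains it.

```python
import math
from typing import List, Tuple


def mean_latency_ms(latency_sum: float, latency_count: float) -> float:
    """Average latency: the _sum series divided by the _count series."""
    return latency_sum / latency_count if latency_count else math.nan


def estimate_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    Measurements are assumed to be spread evenly inside each bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            in_bucket = count - lower_count
            if math.isinf(upper_bound) or in_bucket == 0:
                # No finite upper bound to interpolate toward (or an empty
                # bucket): fall back to the highest bound seen so far.
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Hypothetical values of abgw_iop_latency_ms_bucket, _sum, and _count
buckets = [(10.0, 50.0), (50.0, 90.0), (100.0, 98.0), (math.inf, 100.0)]
print("mean latency, ms:", mean_latency_ms(2600.0, 100.0))
print("estimated p95, ms:", estimate_quantile(0.95, buckets))
```

The percentile is only an estimate: its precision is limited by the bucket boundaries, and measurements above the highest finite bound cannot be resolved further.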