Monitoring backup storage
After you create backup storage, you can monitor it on the Storage services > Backup storage > Overview screen. The charts show the following information:
- Nodes. The chart shows the number and availability of nodes in the backup storage cluster.
- Performance. The chart shows the read and write activity of backup storage services over time.
- Geo-replication. The chart shows the geo-replication speed and backlog, which is the amount of data waiting to be replicated. If the geo-replication backlog does not decrease over time, it means the data cannot be replicated fast enough. The reason may be insufficient network transfer speed, and you may need to check or upgrade your network.
- Append latency. The chart shows the time spent on processing requests from backup agents to the storage.
- Append throttle. If the chart is not empty, it means the underlying storage lacks free space and the backup storage is throttling user requests to slow down the data flow.
Two thresholds, soft and hard, are set on the percentage of used storage space. When the soft threshold is reached, backup storage starts to throttle write operations. Throttling intensity depends on consumed space and increases until the hard threshold is reached. When the used space passes the hard threshold, throttling works with maximum intensity. The thresholds depend on the backup destination and the number of nodes in the backup storage cluster:
| Backup destination | Number of backup nodes | Soft threshold | Hard threshold |
|---|---|---|---|
| Local cluster | 1 | 93% | 95% |
| Local cluster | 2+ | 90% | 92% |
| NFS | 1 | 93% | 95% |
| Public cloud | 1 | 88% | 90% |
| Public cloud | 2+ | 85% | 87% |

- Object storage. The chart shows the object storage speed and backlog, which is the amount of data waiting to be uploaded to public cloud. If the object storage backlog does not decrease over time, it means the data cannot be uploaded fast enough. The reason may be insufficient network transfer speed, and you may need to check or upgrade your network.
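The soft and hard thresholds listed for the Append throttle chart can be turned into a quick check of where a given used-space percentage falls. The following is a minimal sketch, not a product tool: the function names `abgw_thresholds` and `abgw_throttle_state` and the destination keywords `local`, `nfs`, and `cloud` are hypothetical, and treating a value exactly equal to a threshold as having reached it is an assumption. Only the percentages come from the thresholds above.

```shell
# Hypothetical helper: print "soft hard" (percent of used space) for a
# backup destination (local | nfs | cloud) and a number of backup nodes.
# Only combinations listed in the thresholds above are accepted.
abgw_thresholds() {
    dest=$1
    nodes=$2
    [ "$nodes" -ge 1 ] 2>/dev/null || { echo "invalid node count: $nodes" >&2; return 1; }
    case "$dest" in
        local) if [ "$nodes" -eq 1 ]; then echo "93 95"; else echo "90 92"; fi ;;
        nfs)   if [ "$nodes" -eq 1 ]; then echo "93 95"
               else echo "no thresholds listed for NFS with $nodes nodes" >&2; return 1; fi ;;
        cloud) if [ "$nodes" -eq 1 ]; then echo "88 90"; else echo "85 87"; fi ;;
        *)     echo "unknown destination: $dest" >&2; return 1 ;;
    esac
}

# Hypothetical helper: classify an integer used-space percentage.
# Prints "none", "throttling" (soft threshold reached), or "maximum"
# (hard threshold reached; the exact boundary handling is an assumption).
abgw_throttle_state() {
    limits=$(abgw_thresholds "$1" "$2") || return 1
    soft=${limits% *}
    hard=${limits#* }
    if [ "$3" -ge "$hard" ]; then
        echo "maximum"
    elif [ "$3" -ge "$soft" ]; then
        echo "throttling"
    else
        echo "none"
    fi
}

abgw_throttle_state local 1 94   # prints "throttling"
```

The sketch only reports which range the used space is in. It does not model how throttling intensity grows between the two thresholds, because the text does not specify that curve.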
You can also monitor backup storage nodes. To do this, go to Storage services > Backup storage > Nodes and click the required node. On the right pane, the Overview tab displays the performance statistics:
- CPU/RAM: CPU usage, in percent, and RAM usage, in GiB, over time
- Successful/Failed request rate: the number of successful and failed append requests per second
- Egress/Ingress request rate: the number of read and write requests per second
- Throughput: the amount of data read from or written to the backup storage per second
- Request latency: the time spent on processing requests
Advanced Backup Gateway monitoring via Grafana
For advanced monitoring of the Backup Gateway cluster, go to the Monitoring > Dashboard screen, and then click Grafana dashboard. A separate browser tab will open with preconfigured Grafana dashboards, two of which are dedicated to Acronis Backup Gateway. To see a detailed description for each chart, click the i icon on its left corner.
On the Acronis Backup Gateway dashboard, you need to pay attention to the following charts:
- Availability. Any time period during which the gateways have not been available will be highlighted in red. In this case, you will need to look into logs on the nodes with the failed service and report a problem. To see the Backup Gateway log, use the following command:
# zstdcat /var/log/vstorage/abgw.log.zst
- Migration/Replication throughput. The migration chart should be displayed during migration or if the cluster serves as master in a geo-replication configuration. The replication chart should mirror the ingress bandwidth chart.
- Migration/replication backlog. The migration chart should decrease over time. The replication chart should be near zero; high values indicate network issues.
- Rate limiting/ingress throttling. If the chart is not empty, it means the underlying storage lacks free space and the Backup Gateway is throttling user requests to slow down the data flow. Add more storage space to the cluster to solve the issue. For more information, refer to https://kb.acronis.com/content/62823.
- New client connections. A high rate of failed connections due to SSL certificate verification problems on the chart means that clients uploaded an invalid certificate chain.
- IO watchdog timeouts. If the chart is not empty, it means the underlying storage is not healthy and cannot deliver the required performance.
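When the Availability chart shows an outage, the full Backup Gateway log can be long. The following sketch narrows the output of the `zstdcat` command mentioned for that chart to the most recent lines that look like problems. The helper name `abgw_log_problems` is hypothetical, and the keywords `error` and `fail` are an assumption about how issues are worded in the log, not a documented log format.

```shell
# Hypothetical helper: read log text on stdin and print the last N lines
# (default 50) that contain "error" or "fail", ignoring case.
# The keywords are an assumption; adjust them to the actual log wording.
abgw_log_problems() {
    grep -iE 'error|fail' | tail -n "${1:-50}"
}

# Usage on a node with the failed service (path taken from the command above).
LOG=/var/log/vstorage/abgw.log.zst
if [ -r "$LOG" ]; then
    zstdcat "$LOG" | abgw_log_problems 100
fi
```

Filtering is only a convenience for a first look. When reporting a problem, the unfiltered log is likely to be more useful to the support team.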
To see the charts for a particular client request, file, and I/O operation, select them from the drop-down menus above. A high rate of failed requests or operations and high latencies on these charts indicate that the Backup Gateway is experiencing issues that need to be reported. For example, you can check charts for the “Append” request:
- The Append rate chart displays the backup data flow from backup agents to the storage in operations per second (one operation equals one big block of backup data; blocks can be of various sizes).
- The Append latency chart shows the time spent on processing requests and should average several tens of milliseconds with peak values below one second.
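The guidance for the Append latency chart can be expressed as a simple check over a series of latency samples. The sketch below assumes that the samples are already available as one value in milliseconds per line, for example copied from the chart by hand; how they are obtained is outside its scope. The function name `append_latency_check` is hypothetical, and the default average limit of 100 ms is an assumed reading of "several tens of milliseconds". The 1000 ms peak limit comes from the text.

```shell
# Hypothetical helper: read latency samples (milliseconds, one per line)
# on stdin, print the average and the peak, and return non-zero if either
# is at or above its limit.
#   $1  average limit in ms (assumed default: 100)
#   $2  peak limit in ms (default: 1000, that is, one second)
append_latency_check() {
    awk -v avg_limit="${1:-100}" -v peak_limit="${2:-1000}" '
        NF { sum += $1; n++; if ($1 > max) max = $1 }
        END {
            if (n == 0) { print "no samples"; exit 2 }
            avg = sum / n
            printf "avg=%.1f ms peak=%.1f ms\n", avg, max + 0
            if (avg >= avg_limit || max >= peak_limit) exit 1
        }'
}

printf '20\n35\n50\n' | append_latency_check   # prints "avg=35.0 ms peak=50.0 ms"
```

An exit status of 1 means the samples fall outside the expected range, and 2 means no samples were provided.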
The Acronis Backup Gateway Details dashboard is intended for low-level troubleshooting by the support team. To monitor a particular node, client request, file, and I/O operation, select them from the drop-down menus above. On this dashboard, make sure that the Event loop inactivity chart is empty. If it is not, the Backup Gateway is not healthy on this node and the issue needs to be reported.