Core storage metrics
Metrics used for monitoring core storage are configured in the Prometheus recording rules, which can be found in the following files on any node in the cluster:
- /var/lib/prometheus/rules/mdsd.rules
- /var/lib/prometheus/rules/csd.rules
- /var/lib/prometheus/rules/fused.rules
- /var/lib/prometheus/rules/rjournal.rules
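To check which metric names a given rules file defines, you can list its record: entries. The following Python snippet is a minimal sketch. It assumes the files use the standard Prometheus YAML rule-file layout, where each recording rule has a record: key followed by the metric name. It uses a regular expression instead of a YAML parser, so it needs only the standard library but may miss unusual formatting. The function name recorded_metrics is hypothetical.

```python
import re
from pathlib import Path

# Assumption: rules follow the standard Prometheus YAML layout, e.g.
#   - record: job:mdsd_fs_files:sum
#     expr: ...
RECORD_RE = re.compile(r"^\s*-?\s*record:\s*['\"]?([A-Za-z_:][A-Za-z0-9_:]*)['\"]?\s*$")


def recorded_metrics(text: str) -> list[str]:
    """Return the metric names defined by 'record:' entries, in file order."""
    names = []
    for line in text.splitlines():
        match = RECORD_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    # Path taken from the list above; run this on a cluster node.
    rules_file = Path("/var/lib/prometheus/rules/mdsd.rules")
    if rules_file.exists():
        for name in recorded_metrics(rules_file.read_text()):
            print(name)
```

The same function can be pointed at any of the four files listed above.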
 
Metrics that are used to generate core storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/pcs.rules. These metrics are described in the following table:
| Metric | Description |
|---|---|
| fused_stuck_reqs_30s | Number of I/O requests on a node that have been stuck for more than 30 seconds |
| fused_stuck_reqs_10s | Number of I/O requests on a node that have been stuck for more than 10 seconds |
| fused_maps_failed | Number of failed map requests on a node |
| fused_map_failures_total | Total number of failed map requests on a node |
| fused_unaligned_writes:rate5m | Number of unaligned write requests per second, averaged over 5 minutes |
| fused_writes:rate5m | Number of write requests per second, averaged over 5 minutes |
| fused_unaligned_reads:rate5m | Number of unaligned read requests per second, averaged over 5 minutes |
| fused_reads:rate5m | Number of read requests per second, averaged over 5 minutes |
| mdsd_cluster_replication_stuck_chunks | Number of chunks that block replication |
| mdsd_cluster_replication_touts_total | Total number of chunks that slow down replication |
| job:mdsd_fs_chunk_maps:sum | Number of chunks in the storage cluster |
| job:mdsd_fs_files:sum | Number of files in the storage cluster |
| master:mdsd_cs_status | Chunk service status |
| mdsd_cluster_free_space_bytes | Amount of free physical space in the storage cluster |
| mdsd_cluster_space_bytes | Total amount of physical space in the storage cluster |
| mdsd_is_master | Indicates the node that runs the master metadata service |
| mdsd_master_uptime | Uptime of the master metadata service |
| instance_le:rjournal_commit_duration_seconds_bucket:rate5m | Commit latency of a particular metadata service over the last 5 minutes, for each histogram bucket |
| instance_csid:csd_journal_usage_ratio:rate5m | Percentage of free space in a chunk service journal over the last 5 minutes |
| process_cpu_seconds_total | Total CPU time used by a process, in seconds |
| process_swap_bytes | Amount of swap space used by a process |
| storage_policy_allocatable_space | Amount of allocatable space per storage policy |
The Prometheus recording rules also include metrics that are used for monitoring the following processes:
- Replication is the process of restoring data redundancy.
- Re-encoding is the process of changing the redundancy of files with erasure coding.
- Rebalancing is the process of moving data from one location to another.
 
These metrics are described in the following table:
| Metric | Description |
|---|---|
| mdsd_cluster_to_replicate_chunks | Number of chunks that need to be replicated |
| mdsd_cluster_replicated_chunks | Total number of replicated chunks |
| mdsd_cluster_replication_touts | Total number of timed-out replications |
| mdsd_cluster_replication_stuck_chunks | Number of chunks whose last replication attempt failed |
| mdsd_cluster_rebalance_pending_chunks | Number of chunks that need to be rebalanced |
| mdsd_enc_pending_files | Number of files pending re-encoding |
| mdsd_enc_pending_bytes | Estimated physical size of files to be re-encoded (excluding punch-holed data) |
| mdsd_enc_pending_raw | Estimated physical size of files to be re-encoded, calculated as the sum of the sizes of the involved chunks |
| fused_ls_gc_reencoding_chunks | Number of chunks currently being re-encoded |
| fused_ls_gc_reencoded_bytes | Total amount of data re-encoded |