Core storage metrics
Metrics used for monitoring core storage are defined in Prometheus recording rules, which are stored in the following files on every node in the cluster:
- /var/lib/prometheus/rules/mdsd.rules
- /var/lib/prometheus/rules/csd.rules
- /var/lib/prometheus/rules/fused.rules
- /var/lib/prometheus/rules/rjournal.rules
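To see which metrics a given rule file records, you can scan it for rule names. The following is a minimal sketch, assuming the files use the standard Prometheus YAML rule format, in which each recording rule has a `record:` key. The helper name `recorded_metric_names` and the sample content (including the `expr` lines) are hypothetical; the actual files on your nodes may look different.

```python
import re
from typing import List

# Matches lines such as "- record: fused_writes:rate5m" in a Prometheus YAML rule file.
RECORD_LINE = re.compile(
    r"^[ \t]*(?:-[ \t]*)?record:[ \t]*(\S+)[ \t]*$", re.MULTILINE
)


def recorded_metric_names(rules_text: str) -> List[str]:
    """Return the names of all recording rules found in the rule-file text."""
    return RECORD_LINE.findall(rules_text)


# Hypothetical sample content for illustration only; the expressions are assumptions.
SAMPLE_RULES = """\
groups:
  - name: fused
    rules:
      - record: fused_writes:rate5m
        expr: rate(fused_writes_total[5m])
      - record: fused_reads:rate5m
        expr: rate(fused_reads_total[5m])
"""

if __name__ == "__main__":
    for name in recorded_metric_names(SAMPLE_RULES):
        print(name)
```

To inspect a real file, read its contents (for example, with `open("/var/lib/prometheus/rules/fused.rules").read()`) and pass the text to the same function.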
Metrics that are used to generate core storage alerts are added to the alerting rules in /var/lib/prometheus/alerts/pcs.rules. These metrics are described in the following table:
Metric | Description |
---|---|
fused_stuck_reqs_30s | Number of I/O requests on a node that have been stuck for more than 30 seconds |
fused_stuck_reqs_10s | Number of I/O requests on a node that have been stuck for more than 10 seconds |
fused_maps_failed | Number of failed map requests on a node |
fused_map_failures_total | Total number of failed map requests on a node |
fused_unaligned_writes:rate5m | Number of unaligned write requests per second, averaged over 5 minutes |
fused_writes:rate5m | Number of write requests per second, averaged over 5 minutes |
fused_unaligned_reads:rate5m | Number of unaligned read requests per second, averaged over 5 minutes |
fused_reads:rate5m | Number of read requests per second, averaged over 5 minutes |
mdsd_cluster_replication_stuck_chunks | Number of chunks that block replication |
mdsd_cluster_replication_touts_total | Total number of chunks that slow down replication |
job:mdsd_fs_chunk_maps:sum | Number of chunks in the storage cluster |
job:mdsd_fs_files:sum | Number of files in the storage cluster |
master:mdsd_cs_status | Chunk service status |
mdsd_cluster_free_space_bytes | Amount of free physical space in the storage cluster |
mdsd_cluster_space_bytes | Total amount of physical space in the storage cluster |
mdsd_is_master | Node that runs the master metadata service |
mdsd_master_uptime | Uptime of the master metadata service |
instance_le:rjournal_commit_duration_seconds_bucket:rate5m | Current commit latency of a particular metadata service over 5 minutes, for each bucket |
instance_csid:csd_journal_usage_ratio:rate5m | Percentage of free space for a chunk service journal over 5 minutes |
process_cpu_seconds_total | Total amount of CPU time used by a process |
process_swap_bytes | Amount of swap space used by a process |