Disk-related metrics in Prometheus
The Prometheus service stores the following disk-related metrics:
| Metric | Description |
|---|---|
| **CS-related metrics** | |
| csd_io_op_time_seconds | Mean time per I/O request | 
| master:mdsd_cs_status | CS status on master MDS | 
| **Disk-related metrics in /proc/diskstats** | |
| node_disk_read_time_seconds | Total time, in seconds, spent on read requests | 
| node_disk_reads_completed | Total number of completed read requests | 
| node_disk_write_time_seconds | Total time, in seconds, spent on write requests | 
| node_disk_writes_completed | Total number of completed write requests | 
| **S.M.A.R.T. metrics** (S.M.A.R.T. attribute IDs in parentheses) | |
| smart_device_smart_healthy | S.M.A.R.T. status is healthy | 
| smart_reallocated_sector_ct | Total number of reallocated disk sectors (05) | 
| smart_reported_uncorrect | Total number of errors that could not be recovered using hardware ECC (187) | 
| smart_command_timeout | Total number of operations aborted due to a timeout (188) |
| smart_current_pending_sector | Total number of unstable sectors (197) | 
| smart_offline_uncorrectable | Total number of uncorrectable errors when reading/writing a sector (198) | 
| smart_media_wearout_indicator | Media Wearout Indicator for SSD (233) | 
| smart_nvme_intel_wear_leveling | Media Wearout Indicator for Intel NVMe (233) |
| smart_scsi_read_errors_uncorrected | Total number of uncorrectable errors when reading a sector | 
| smart_scsi_reallocated_sector_ct | Total number of reallocated disk sectors | 
| smart_scsi_verify_errors_uncorrected | Total number of uncorrectable errors when verifying a sector | 
| smart_scsi_write_errors_uncorrected | Total number of uncorrectable errors when writing a sector | 
| **Kernel SCSI errors** | |
| kernel_scsi_failures_total | Total number of SCSI failures reported by the kernel | 
| **Disk health metric from vstorage-disks-monitor** | |
| diskmon_cs_disk_health | Disk health reported by the vstorage-disks-monitor service. Values range from 0.0 to 1.0, where 1.0 means that the disk is 100% healthy. |
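The /proc/diskstats metrics are cumulative counters, so a mean per-request latency comes from dividing the increase in time spent by the increase in completed requests over the same interval. In PromQL this is commonly expressed as `rate(node_disk_read_time_seconds[5m]) / rate(node_disk_reads_completed[5m])`. The Python sketch below does the same arithmetic on two raw samples; the `DiskCounters` class and `mean_latency` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DiskCounters:
    """One sample of the cumulative /proc/diskstats counters for a single disk."""
    read_time_seconds: float   # node_disk_read_time_seconds
    reads_completed: float     # node_disk_reads_completed
    write_time_seconds: float  # node_disk_write_time_seconds
    writes_completed: float    # node_disk_writes_completed


def _per_request(time_delta: float, count_delta: float) -> float:
    # No completed requests, or a counter reset (for example, after a reboot),
    # leaves nothing to average over for this interval.
    return time_delta / count_delta if count_delta > 0 else 0.0


def mean_latency(prev: DiskCounters, curr: DiskCounters) -> tuple[float, float]:
    """Mean (read, write) latency in seconds per request between two samples."""
    read = _per_request(curr.read_time_seconds - prev.read_time_seconds,
                        curr.reads_completed - prev.reads_completed)
    write = _per_request(curr.write_time_seconds - prev.write_time_seconds,
                         curr.writes_completed - prev.writes_completed)
    return read, write
```

For example, if a disk spent 2 more seconds on reads while completing 400 more read requests, the mean read latency over that interval is 2 / 400 = 0.005 s, or 5 ms.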
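The S.M.A.R.T., kernel SCSI, and vstorage-disks-monitor metrics can be combined into a simple per-disk check. The sketch below is illustrative only: the `disk_warnings` function, the rule that any non-zero error counter is worth a look, the assumption that `smart_device_smart_healthy` is 1 for healthy and 0 otherwise, and the 0.5 health threshold are all assumptions, not rules taken from this documentation.

```python
# Hypothetical threshold: an illustrative assumption, not a product recommendation.
HEALTH_THRESHOLD = 0.5  # diskmon_cs_disk_health below this value is flagged

# Error counters where any non-zero value is treated as worth investigating.
ERROR_COUNTERS = (
    "smart_reallocated_sector_ct",
    "smart_reported_uncorrect",
    "smart_command_timeout",
    "smart_current_pending_sector",
    "smart_offline_uncorrectable",
    "smart_scsi_read_errors_uncorrected",
    "smart_scsi_reallocated_sector_ct",
    "smart_scsi_verify_errors_uncorrected",
    "smart_scsi_write_errors_uncorrected",
    "kernel_scsi_failures_total",
)


def disk_warnings(sample: dict[str, float]) -> list[str]:
    """Return human-readable warnings for one disk's latest metric values.

    Metrics missing from `sample` are skipped, because not every disk type
    exports every S.M.A.R.T. attribute.
    """
    warnings = []
    # Assumption: 1 means the S.M.A.R.T. status is healthy, 0 means it is not.
    if sample.get("smart_device_smart_healthy") == 0:
        warnings.append("S.M.A.R.T. status is not healthy")
    for name in ERROR_COUNTERS:
        value = sample.get(name)
        if value is not None and value > 0:
            warnings.append(f"{name} = {value:g}")
    health = sample.get("diskmon_cs_disk_health")
    if health is not None and health < HEALTH_THRESHOLD:
        warnings.append(f"diskmon_cs_disk_health = {health:.2f} (below {HEALTH_THRESHOLD})")
    return warnings
```

For a disk with 8 pending sectors and a health value of 0.3, `disk_warnings` returns `["smart_current_pending_sector = 8", "diskmon_cs_disk_health = 0.30 (below 0.5)"]`.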
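Because these metrics are stored in Prometheus, they can be read through the standard Prometheus HTTP API instant-query endpoint, `/api/v1/query`. The sketch below builds a query URL and parses the JSON response into `(labels, value)` pairs. The `http://localhost:9090` address and the `instant_query_url` and `parse_vector` helpers are assumptions for illustration; adjust the address to match your deployment.

```python
import json
from urllib.parse import urlencode

# Assumption: the Prometheus server is reachable at this address.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(expr: str, base: str = PROMETHEUS_URL) -> str:
    """Build a URL for the Prometheus HTTP API instant-query endpoint."""
    return f"{base}/api/v1/query?{urlencode({'query': expr})}"


def parse_vector(body: str) -> list[tuple[dict[str, str], float]]:
    """Extract (labels, value) pairs from an instant-query JSON response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample's "value" is a [unix_timestamp, "value as a string"] pair.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]
```

For example, `urllib.request.urlopen(instant_query_url("diskmon_cs_disk_health < 0.5")).read().decode()` fetches a response body that `parse_vector` can turn into a list of disks whose reported health is below 0.5.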
See also