Disk-related metrics in Prometheus

The Prometheus service stores the following disk-related metrics:

CS-related metrics
csd_io_op_time_seconds Mean time per I/O request
master:mdsd_cs_status CS status on master MDS
Disk-related metrics from /proc/diskstats
node_disk_read_time_seconds Total time, in seconds, spent on read requests
node_disk_reads_completed Total number of completed read requests
node_disk_write_time_seconds Total time, in seconds, spent on write requests
node_disk_writes_completed Total number of completed write requests
S.M.A.R.T. metrics
smart_device_smart_healthy Indicates whether the S.M.A.R.T. status is healthy
smart_reallocated_sector_ct Total number of reallocated disk sectors (attribute ID 5)
smart_reported_uncorrect Total number of errors that could not be recovered using hardware ECC (attribute ID 187)
smart_command_timeout Total number of operations aborted due to a timeout (attribute ID 188)
smart_current_pending_sector Total number of unstable sectors (attribute ID 197)
smart_offline_uncorrectable Total number of uncorrectable errors when reading or writing a sector (attribute ID 198)
smart_media_wearout_indicator Media Wearout Indicator for SSDs (attribute ID 233)
smart_nvme_intel_wear_leveling Media Wearout Indicator for Intel NVMe devices (attribute ID 233)
smart_scsi_read_errors_uncorrected Total number of uncorrectable errors when reading a sector
smart_scsi_reallocated_sector_ct Total number of reallocated disk sectors
smart_scsi_verify_errors_uncorrected Total number of uncorrectable errors when verifying a sector
smart_scsi_write_errors_uncorrected Total number of uncorrectable errors when writing a sector
Kernel SCSI errors
kernel_scsi_failures_total Total number of SCSI failures reported by the kernel
Disk health metric from vstorage-disks-monitor
diskmon_cs_disk_health Disk health reported by the vstorage-disks-monitor service, as a value from 0.0 to 1.0, where 1.0 means that the disk is 100% healthy.
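
The /proc/diskstats metrics above are cumulative totals, so a per-request latency has to be derived from the difference between two samples. The following Python sketch shows one way to do that, and also converts a diskmon_cs_disk_health value into a percentage. The DiskSample class and both function names are hypothetical helpers introduced here for illustration; they are not part of Prometheus or of any product API, and the sketch assumes that both counters only increase between samples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiskSample:
    """One sample of the cumulative read counters for a single disk (hypothetical helper)."""
    read_time_seconds: float  # value of node_disk_read_time_seconds
    reads_completed: float    # value of node_disk_reads_completed


def mean_read_latency(earlier: DiskSample, later: DiskSample) -> Optional[float]:
    """Return the mean time per read request, in seconds, between two samples.

    Returns None when no reads completed in the interval or when a counter
    went backwards (for example, after a reboot reset it).
    """
    delta_time = later.read_time_seconds - earlier.read_time_seconds
    delta_reads = later.reads_completed - earlier.reads_completed
    if delta_reads <= 0 or delta_time < 0:
        return None
    return delta_time / delta_reads


def health_percent(disk_health: float) -> float:
    """Convert a diskmon_cs_disk_health value (0.0 to 1.0) into a percentage."""
    if not 0.0 <= disk_health <= 1.0:
        raise ValueError("disk health must be between 0.0 and 1.0")
    return disk_health * 100.0


if __name__ == "__main__":
    before = DiskSample(read_time_seconds=10.0, reads_completed=1000.0)
    after = DiskSample(read_time_seconds=12.5, reads_completed=1500.0)
    # 2.5 seconds spent on 500 reads
    print(mean_read_latency(before, after))
    print(health_percent(1.0))
```

The same calculation applies to writes by substituting node_disk_write_time_seconds and node_disk_writes_completed.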