Monitoring node disks

Limitations

  • You cannot monitor performance of shingled magnetic recording (SMR) disks.

To monitor performance of a node disk

  1. Go to the Infrastructure > Nodes screen and click the node name.
  2. On the Disks tab, click a node disk, and then review the charts on the Monitoring tab.

The charts display the disk's current usage, average latency, and read/write activity. For advanced monitoring, click Grafana dashboard.

The default time interval for the charts is twelve hours. To zoom into a particular time interval, select the interval with the mouse; to reset zoom, double-click any chart.

To view the service details

Admin panel

  1. Go to the Infrastructure > Nodes screen and click the node name.
  2. On the Disks tab, click a node disk, and then go to the Service tab.

Service properties differ depending on the disk role (Storage, Metadata, Metadata+Cache, or Cache):

Status

Storage service status:

Active
The service is up and running.
Unresponsive
The service has stopped responding, which degrades cluster performance. The disk is isolated from the cluster I/O.
Inactive

The service has not responded for some time, but data replication has not started yet. A storage service is marked as inactive during its first 5 minutes of inactivity.

Offline

The service has been inactive for more than 5 minutes. After a storage service goes offline, the cluster starts replicating data to restore the chunks that were stored on the affected storage disk.

Out of space
The disk that runs the service is running out of space.
Releasing
The service is being released.
Failed
The service is running but a problem has occurred with the storage disk.
Release failed
The service failed to be released.
Entering maintenance
The node that hosts the service is entering the maintenance mode.
Maintenance
The node that hosts the service is in the maintenance mode. The service is active, but not available for allocating new data chunks.
Unknown
The state of the service is unknown.
Dropped
The service was removed by the administrator.
Unavailable
The service is active, but not available for allocating new data chunks.
Unrecognized
The service cannot be recognized.

Metadata service status:

Available
The service is online.
Syncing
The service is syncing the cluster metadata.
Unavailable
The service is offline.
Systemd

Shows the state of vstorage-csd.<cluster_name>.<CS_ID>.service

Shows the state of vstorage-mdsd.<cluster_name>.<MDS_ID>.service

Tier

Shows the assigned storage tier

Shows tiers that are being cached

Service ID

Storage service ID

Metadata service ID

Usage

Space usage on the disk

Caching

Enabled/Disabled

Cache location

Shows the SSD where this disk's write cache is stored.

Displayed if caching is enabled.

Checksumming

Enabled/Disabled

Encryption

Enabled/Disabled
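
The 5-minute boundary between the Inactive and Offline statuses can be sketched as a small rule. This is only an illustration of the descriptions above, not how the cluster itself decides: the helper name `expected_status` is hypothetical, and the sketch ignores every other status (Unresponsive, Failed, and so on).

```python
from datetime import timedelta

# Threshold taken from the status descriptions: a storage service is
# Inactive during its first 5 minutes of inactivity and Offline afterwards.
OFFLINE_AFTER = timedelta(minutes=5)


def expected_status(unresponsive_for: timedelta) -> str:
    """Hypothetical helper: map time without a response to a status label."""
    if unresponsive_for <= timedelta(0):
        return "Active"
    if unresponsive_for <= OFFLINE_AFTER:
        return "Inactive"  # data replication has not started yet
    return "Offline"       # the cluster starts replicating the chunks


print(expected_status(timedelta(minutes=2)))   # Inactive
print(expected_status(timedelta(minutes=12)))  # Offline
```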

Command-line interface

Use the following command:

vinfra node disk show [--node <node>] <disk>
--node <node>
Node ID or hostname (default: node001.vstoragedomain)
<disk>
Disk ID or device name

For example, to view the details of the disk nvme0n1 attached to the node node003, run:

# vinfra node disk show nvme0n1 --node node003
+--------------------+------------------------------------------------------------------------------------------+
| Field              | Value                                                                                    |
+--------------------+------------------------------------------------------------------------------------------+
| being_assigned     | False                                                                                    |
| being_released     | False                                                                                    |
| device             | nvme0n1                                                                                  |
| disk_status        | ok                                                                                       |
| encryption         |                                                                                          |
| form_factor        |                                                                                          |
| id                 | B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                                     |
| is_blink_available | False                                                                                    |
| is_blinking        | False                                                                                    |
| issues             | []                                                                                       |
| lun_id             |                                                                                          |
| model              | INTEL SSDPE2KX020T8                                                                      |
| node_id            | e40195d1-64b8-4117-85f3-00bb5d7a1db6                                                     |
| nvme               | True                                                                                     |
| physical_size      | 2000398934016                                                                            |
| protocol           | name: NVMe                                                                               |
|                    | speed: null                                                                              |
| role               | cs                                                                                       |
| rpm                |                                                                                          |
| serial_number      | PHLJ950101C02P0BGN                                                                       |
| service_id         | 1091                                                                                     |
| service_params     | fail_messages: null                                                                      |
|                    | journal_data_size: 270532608                                                             |
|                    | journal_disk_id: B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                    |
|                    | journal_path: /vstorage/dc7aea32/journal/journal-cs-6aa56a11-70e6-4fd3-be4c-bf7fcd65e5d6 |
|                    | journal_type: inner_cache                                                                |
|                    | repo_dir: /vstorage/dc7aea32/cs                                                          |
|                    | systemd: active                                                                          |
|                    | tier: 0                                                                                  |
| service_status     | active                                                                                   |
| smart_status       | passed                                                                                   |
| space              | size: 1968848437248                                                                      |
|                    | used: 1540324716544                                                                      |
| tasks              |                                                                                          |
| temperature        | 36.0                                                                                     |
| type               | ssd                                                                                      |
| zoned              |                                                                                          |
+--------------------+------------------------------------------------------------------------------------------+

In the command output, service properties differ depending on the disk role (cs, mds, mds-journal, or journal):

service_id

Storage service ID

Metadata service ID

service_params

Storage service parameters:

journal_data_size
Size of cached data for the storage service
journal_disk_id
Cache disk ID
journal_path
Path to the directory with the write journal
journal_type
Cache type used for the storage service:

  • no_cache
  • inner_cache
  • external_cache
repo_dir
Path to the repository with the storage service
systemd
Shows the state of vstorage-csd.<cluster_name>.<CS_ID>.service
tier
Shows the assigned storage tier

Metadata service parameters:

repo_dir
Path to the repository with the metadata service
systemd
Shows the state of vstorage-mdsd.<cluster_name>.<MDS_ID>.service
service_status

Storage service status:

active
The service is up and running.
ill
The service has stopped responding, which degrades cluster performance. The disk is isolated from the cluster I/O.
inactive
The service has not responded for some time, but data replication has not started yet. A storage service is marked as inactive during its first 5 minutes of inactivity.
offline
The service has been inactive for more than 5 minutes. After a storage service goes offline, the cluster starts replicating data to restore the chunks that were stored on the affected storage disk.
no space
The disk that runs the service is running out of space.
releasing
The service is being released.
failed
The service is running but a problem has occurred with the storage disk.
failed rel
The service failed to be released.
entering_maintenance
The node that hosts the service is entering the maintenance mode.
maintenance
The node that hosts the service is in the maintenance mode. The service is active, but not available for allocating new data chunks.
unknown
The state of the service is unknown.
dropped
The service was removed by the administrator.
unavailable
The service is active, but not available for allocating new data chunks.
unrecognized
The service cannot be recognized.

Metadata service status:

avail
The service is online.
stale
The service is syncing the cluster metadata.
unavail
The service is offline.
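
The command-line values of service_status correspond to the status names shown in the admin panel. The lookup below is a sketch that pairs them by their identical descriptions in this section. The function name `panel_label` is hypothetical, and treating every role that starts with "mds" as a metadata service is an assumption.

```python
# CLI service_status value -> admin panel label, paired by description.
STORAGE_STATUS_LABELS = {
    "active": "Active",
    "ill": "Unresponsive",
    "inactive": "Inactive",
    "offline": "Offline",
    "no space": "Out of space",
    "releasing": "Releasing",
    "failed": "Failed",
    "failed rel": "Release failed",
    "entering_maintenance": "Entering maintenance",
    "maintenance": "Maintenance",
    "unknown": "Unknown",
    "dropped": "Dropped",
    "unavailable": "Unavailable",
    "unrecognized": "Unrecognized",
}

METADATA_STATUS_LABELS = {
    "avail": "Available",
    "stale": "Syncing",
    "unavail": "Unavailable",
}


def panel_label(role: str, service_status: str) -> str:
    """Return the admin panel label for a CLI status, or the raw value."""
    # Assumption: the mds and mds-journal roles report metadata statuses.
    if role.startswith("mds"):
        table = METADATA_STATUS_LABELS
    else:
        table = STORAGE_STATUS_LABELS
    return table.get(service_status, service_status)


print(panel_label("cs", "ill"))     # Unresponsive
print(panel_label("mds", "stale"))  # Syncing
```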

To view the disk details

Admin panel

  1. Go to the Infrastructure > Nodes screen and click the node name.
  2. On the Disks tab, click a node disk, and then go to the Disk tab.

Disk properties include the drive name, state, type, physical capacity, disk protocol, model, serial number, S.M.A.R.T. status, and temperature. A disk can have the following states:

Healthy
The disk is functioning normally.
Unavailable
The disk is powered down or disconnected.
Failed
The disk has failed or S.M.A.R.T. reported an error. You need to replace the disk.

Command-line interface

Use the following command:

vinfra node disk show [--node <node>] <disk>
--node <node>
Node ID or hostname (default: node001.vstoragedomain)
<disk>
Disk ID or device name

For example, to view the details of the disk nvme0n1 attached to the node node003, run:

# vinfra node disk show nvme0n1 --node node003
+--------------------+------------------------------------------------------------------------------------------+
| Field              | Value                                                                                    |
+--------------------+------------------------------------------------------------------------------------------+
| being_assigned     | False                                                                                    |
| being_released     | False                                                                                    |
| device             | nvme0n1                                                                                  |
| disk_status        | ok                                                                                       |
| encryption         |                                                                                          |
| form_factor        |                                                                                          |
| id                 | B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                                     |
| is_blink_available | False                                                                                    |
| is_blinking        | False                                                                                    |
| issues             | []                                                                                       |
| lun_id             |                                                                                          |
| model              | INTEL SSDPE2KX020T8                                                                      |
| node_id            | e40195d1-64b8-4117-85f3-00bb5d7a1db6                                                     |
| nvme               | True                                                                                     |
| physical_size      | 2000398934016                                                                            |
| protocol           | name: NVMe                                                                               |
|                    | speed: null                                                                              |
| role               | cs                                                                                       |
| rpm                |                                                                                          |
| serial_number      | PHLJ950101C02P0BGN                                                                       |
| service_id         | 1091                                                                                     |
| service_params     | fail_messages: null                                                                      |
|                    | journal_data_size: 270532608                                                             |
|                    | journal_disk_id: B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                    |
|                    | journal_path: /vstorage/dc7aea32/journal/journal-cs-6aa56a11-70e6-4fd3-be4c-bf7fcd65e5d6 |
|                    | journal_type: inner_cache                                                                |
|                    | repo_dir: /vstorage/dc7aea32/cs                                                          |
|                    | systemd: active                                                                          |
|                    | tier: 0                                                                                  |
| service_status     | active                                                                                   |
| smart_status       | passed                                                                                   |
| space              | size: 1968848437248                                                                      |
|                    | used: 1540324716544                                                                      |
| tasks              |                                                                                          |
| temperature        | 36.0                                                                                     |
| type               | ssd                                                                                      |
| zoned              |                                                                                          |
+--------------------+------------------------------------------------------------------------------------------+

In the command output, the disk properties include the device name, disk status, type, physical size, protocol, model, serial number, S.M.A.R.T. status, and temperature, among others. iSCSI disks also have a LUN ID.
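
If you need one of these values in a script, the table can be parsed as plain text. The following is a minimal sketch, assuming the output keeps the two-column layout shown above. The helper names (`parse_table`, `sub_fields`, `usage_percent`) are hypothetical, and the sample is an excerpt of the example output.

```python
SAMPLE = """\
+--------------------+----------------------+
| Field              | Value                |
+--------------------+----------------------+
| device             | nvme0n1              |
| encryption         |                      |
| space              | size: 1968848437248  |
|                    | used: 1540324716544  |
| temperature        | 36.0                 |
+--------------------+----------------------+
"""


def parse_table(text: str) -> dict:
    """Collect the value lines of each field; continuation rows have no name."""
    fields = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border rows
        name, _, value = line.strip("|").partition("|")
        name, value = name.strip(), value.strip()
        if name == "Field":
            continue  # skip the header row
        if name:
            current = name
            fields[current] = [value] if value else []
        elif current is not None and value:
            fields[current].append(value)
    return fields


def sub_fields(lines: list) -> dict:
    """Split 'key: value' lines, as used by space and service_params."""
    pairs = (item.partition(": ") for item in lines)
    return {key: val for key, sep, val in pairs if sep}


def usage_percent(text: str) -> float:
    space = sub_fields(parse_table(text)["space"])
    return 100 * int(space["used"]) / int(space["size"])


print(f"{usage_percent(SAMPLE):.1f}%")  # prints 78.2%
```

The usage figure is simply used divided by size from the space field, so the same two helpers can read service_params values such as tier or journal_type.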

To check storage disks with enabled caching

  1. Go to the Infrastructure > Nodes screen and click the node name.
  2. On the Disks tab, click a node disk with the Cache role, and then go to the Cache for disks tab.

The tab lists all of the storage disks that are being cached on the current disk.

To have the disk blink its activity LED

Admin panel

  1. Go to the Infrastructure > Nodes screen and click the node name.
  2. On the Disks tab, click a node disk.
  3. On the disk right pane, click Blink.

To have the disk stop blinking, click Unblink.

Command-line interface

Use the following commands:

  • To start blinking the specified disk bay:

    vinfra node disk blink on [--node <node>] <disk>
    
    --node <node>
    Node ID or hostname (default: node001.vstoragedomain)
    <disk>
    Disk ID or device name

    For example, to start blinking the disk sda on the node node005, run:

    # vinfra node disk blink on sda --node node005
  • To stop blinking the specified disk bay:

    vinfra node disk blink off [--node <node>] <disk>
    
    --node <node>
    Node ID or hostname (default: node001.vstoragedomain)
    <disk>
    Disk ID or device name

    For example, to stop blinking the disk sda on the node node005, run:

    # vinfra node disk blink off sda --node node005
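
When blinking is driven from a script, the command line can be assembled from its parts. This sketch only builds the argument list in the form documented above and does not run anything. The function name `blink_command` is hypothetical.

```python
import shlex


def blink_command(disk: str, state: str, node: str = None) -> list:
    """Build the argument list for 'vinfra node disk blink on|off'."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    argv = ["vinfra", "node", "disk", "blink", state, disk]
    if node is not None:
        # Without --node, the command falls back to its default node.
        argv += ["--node", node]
    return argv


print(shlex.join(blink_command("sda", "on", node="node005")))
# vinfra node disk blink on sda --node node005
```

On a host where vinfra is installed and configured, the returned list could be passed to `subprocess.run`.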