Monitoring node disks

Limitations

You cannot monitor performance of shingled magnetic recording (SMR) disks.

To monitor performance of a node disk

Go to the Infrastructure > Nodes screen and click the node name.
On the Disks tab, click a node disk, and then view the charts on the Monitoring tab.

The charts display the disk's current usage, average latency, and read/write activity. For advanced monitoring, click Grafana dashboard.

The default time interval for the charts is twelve hours. To zoom into a particular time interval, select the interval with the mouse; to reset zoom, double-click any chart.

To view the service details

Admin panel

Go to the Infrastructure > Nodes screen and click the node name.
On the Disks tab, click a node disk, and then go to the Service tab.

Service properties differ depending on the disk role:

Service properties	Storage	Metadata	Metadata+Cache	Cache
Status	Storage service status: Active: The service is up and running. Unresponsive: The service stops responding and degrades the cluster performance. The disk is isolated from the cluster I/O. Inactive: The service has not responded for some time, but data replication has not started yet. A storage service is marked as inactive during its first 5 minutes of inactivity. Offline: The service is inactive for more than 5 minutes. After a storage service goes offline, the cluster starts replicating data to restore the chunks that were stored on the affected storage disk. Out of space: The disk that runs the service is running out of space. Releasing: The service is being released. Failed: The service is running but a problem has occurred with the storage disk. Release failed: The service failed to be released. Entering maintenance: The node that hosts the service is entering the maintenance mode. Maintenance: The node that hosts the service is in the maintenance mode. The service is active, but not available for allocating new data chunks. Unknown: The state of the service is unknown. Dropped: The service was removed by the administrator. Unavailable: The service is active, but not available for allocating new data chunks. Unrecognized: The service cannot be recognized.	Metadata service status: Available: The service is online. Syncing: The service is syncing the cluster metadata. Unavailable: The service is offline.		—
Systemd	Shows the state of `vstorage-csd.<cluster_name>.<CS_ID>.service`	Shows the state of `vstorage-mdsd.<cluster_name>.<MDS_ID>.service`		—
Tier	Shows the assigned storage tier	—	Shows tiers that are being cached
Service ID	Storage service ID	Metadata service ID		—
Usage	Space usage on the disk
Caching	Enabled/Disabled	—	—	—
Cache location	Shows the SSD disk where this disk's write cache is saved to. Displayed if caching is enabled.	—	—	—
Checksumming	Enabled/Disabled	—	—	—
Encryption	Enabled/Disabled	—	—	—

Command-line interface

Use the following command:

vinfra node disk show [--node <node>] <disk>

--node <node>: Node ID or hostname (default: node001.vstoragedomain)
<disk>: Disk ID or device name

For example, to view the details of the disk nvme0n1 attached to the node node003, run:

# vinfra node disk show nvme0n1 --node node003
+--------------------+------------------------------------------------------------------------------------------+
| Field              | Value                                                                                    |
+--------------------+------------------------------------------------------------------------------------------+
| being_assigned     | False                                                                                    |
| being_released     | False                                                                                    |
| device             | nvme0n1                                                                                  |
| disk_status        | ok                                                                                       |
| encryption         |                                                                                          |
| form_factor        |                                                                                          |
| id                 | B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                                     |
| is_blink_available | False                                                                                    |
| is_blinking        | False                                                                                    |
| issues             | []                                                                                       |
| lun_id             |                                                                                          |
| model              | INTEL SSDPE2KX020T8                                                                      |
| node_id            | e40195d1-64b8-4117-85f3-00bb5d7a1db6                                                     |
| nvme               | True                                                                                     |
| physical_size      | 2000398934016                                                                            |
| protocol           | name: NVMe                                                                               |
|                    | speed: null                                                                              |
| role               | cs                                                                                       |
| rpm                |                                                                                          |
| serial_number      | PHLJ950101C02P0BGN                                                                       |
| service_id         | 1091                                                                                     |
| service_params     | fail_messages: null                                                                      |
|                    | journal_data_size: 270532608                                                             |
|                    | journal_disk_id: B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                    |
|                    | journal_path: /vstorage/dc7aea32/journal/journal-cs-6aa56a11-70e6-4fd3-be4c-bf7fcd65e5d6 |
|                    | journal_type: inner_cache                                                                |
|                    | repo_dir: /vstorage/dc7aea32/cs                                                          |
|                    | systemd: active                                                                          |
|                    | tier: 0                                                                                  |
| service_status     | active                                                                                   |
| smart_status       | passed                                                                                   |
| space              | size: 1968848437248                                                                      |
|                    | used: 1540324716544                                                                      |
| tasks              |                                                                                          |
| temperature        | 36.0                                                                                     |
| type               | ssd                                                                                      |
| zoned              |                                                                                          |
+--------------------+------------------------------------------------------------------------------------------+

In the command output, service properties differ depending on the disk role:

Service properties by disk role (cs, mds, mds-journal, journal):

service_id

cs: Storage service ID
mds, mds-journal: Metadata service ID
journal: —

service_params

For cs:

journal_data_size: Size of cached data for the storage service
journal_disk_id: Cache disk ID
journal_path: Path to the directory with the write journal
journal_type: Cache type used for the storage service (no_cache, inner_cache, or external_cache)
repo_dir: Path to the repository with the storage service
systemd: Shows the state of vstorage-csd.<cluster_name>.<CS_ID>.service
tier: Shows the assigned storage tier

For mds and mds-journal:

repo_dir: Path to the repository with the metadata service
systemd: Shows the state of vstorage-mdsd.<cluster_name>.<MDS_ID>.service

For journal: —

service_status

For cs, the storage service status:

active: The service is up and running.
ill: The service stops responding and degrades the cluster performance. The disk is isolated from the cluster I/O.
inactive: The service has not responded for some time, but data replication has not started yet. A storage service is marked as inactive during its first 5 minutes of inactivity.
offline: The service is inactive for more than 5 minutes. After a storage service goes offline, the cluster starts replicating data to restore the chunks that were stored on the affected storage disk.
no space: The disk that runs the service is running out of space.
releasing: The service is being released.
failed: The service is running but a problem has occurred with the storage disk.
failed rel: The service failed to be released.
entering_maintenance: The node that hosts the service is entering the maintenance mode.
maintenance: The node that hosts the service is in the maintenance mode. The service is active, but not available for allocating new data chunks.
unknown: The state of the service is unknown.
dropped: The service was removed by the administrator.
unavailable: The service is active, but not available for allocating new data chunks.
unrecognized: The service cannot be recognized.

For mds and mds-journal, the metadata service status:

avail: The service is online.
stale: The service is syncing the cluster metadata.
unavail: The service is offline.

For journal: —
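
When comparing CLI output with the admin panel, it can help to translate the `service_status` value into the label shown on the Service tab. The following Python sketch does that with two lookup tables. The pairing of CLI values with panel labels is inferred here by matching the descriptions in both tables, and the function name `panel_label` is hypothetical; treat it as an illustration, not as part of the product.

```python
# CLI service_status value -> admin panel label for storage services.
# Pairs are inferred from the matching descriptions, not from an official mapping.
STORAGE_STATUS_LABELS = {
    "active": "Active",
    "ill": "Unresponsive",
    "inactive": "Inactive",
    "offline": "Offline",
    "no space": "Out of space",
    "releasing": "Releasing",
    "failed": "Failed",
    "failed rel": "Release failed",
    "entering_maintenance": "Entering maintenance",
    "maintenance": "Maintenance",
    "unknown": "Unknown",
    "dropped": "Dropped",
    "unavailable": "Unavailable",
    "unrecognized": "Unrecognized",
}

# CLI service_status value -> admin panel label for metadata services.
METADATA_STATUS_LABELS = {
    "avail": "Available",
    "stale": "Syncing",
    "unavail": "Unavailable",
}


def panel_label(role, service_status):
    """Return the admin panel label for a CLI status, or None if there is none.

    Assumes that the mds and mds-journal roles share the metadata statuses
    and that journal-only disks report no service status.
    """
    if role == "cs":
        table = STORAGE_STATUS_LABELS
    elif role in ("mds", "mds-journal"):
        table = METADATA_STATUS_LABELS
    else:
        return None
    return table.get(service_status)


print(panel_label("cs", "ill"))
print(panel_label("mds", "stale"))
```

A status that is missing from the tables returns `None`, so a caller can fall back to showing the raw CLI value.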

To view the disk details

Admin panel

Go to the Infrastructure > Nodes screen and click the node name.
On the Disks tab, click a node disk, and then go to the Disk tab.

Disk properties include the drive name, state, type, physical capacity, disk protocol, model, serial number, S.M.A.R.T. status, and temperature. A disk can have the following states:

Healthy: The disk is functioning normally.
Unavailable: The disk is powered down or disconnected.
Failed: The disk has failed or S.M.A.R.T. reported an error. You need to replace the disk.

Command-line interface

Use the following command:

vinfra node disk show [--node <node>] <disk>

--node <node>: Node ID or hostname (default: node001.vstoragedomain)
<disk>: Disk ID or device name

For example, to view the details of the disk nvme0n1 attached to the node node003, run:

# vinfra node disk show nvme0n1 --node node003
+--------------------+------------------------------------------------------------------------------------------+
| Field              | Value                                                                                    |
+--------------------+------------------------------------------------------------------------------------------+
| being_assigned     | False                                                                                    |
| being_released     | False                                                                                    |
| device             | nvme0n1                                                                                  |
| disk_status        | ok                                                                                       |
| encryption         |                                                                                          |
| form_factor        |                                                                                          |
| id                 | B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                                     |
| is_blink_available | False                                                                                    |
| is_blinking        | False                                                                                    |
| issues             | []                                                                                       |
| lun_id             |                                                                                          |
| model              | INTEL SSDPE2KX020T8                                                                      |
| node_id            | e40195d1-64b8-4117-85f3-00bb5d7a1db6                                                     |
| nvme               | True                                                                                     |
| physical_size      | 2000398934016                                                                            |
| protocol           | name: NVMe                                                                               |
|                    | speed: null                                                                              |
| role               | cs                                                                                       |
| rpm                |                                                                                          |
| serial_number      | PHLJ950101C02P0BGN                                                                       |
| service_id         | 1091                                                                                     |
| service_params     | fail_messages: null                                                                      |
|                    | journal_data_size: 270532608                                                             |
|                    | journal_disk_id: B9F2C34F-19CF-4133-A3AF-A1440BE837AD                                    |
|                    | journal_path: /vstorage/dc7aea32/journal/journal-cs-6aa56a11-70e6-4fd3-be4c-bf7fcd65e5d6 |
|                    | journal_type: inner_cache                                                                |
|                    | repo_dir: /vstorage/dc7aea32/cs                                                          |
|                    | systemd: active                                                                          |
|                    | tier: 0                                                                                  |
| service_status     | active                                                                                   |
| smart_status       | passed                                                                                   |
| space              | size: 1968848437248                                                                      |
|                    | used: 1540324716544                                                                      |
| tasks              |                                                                                          |
| temperature        | 36.0                                                                                     |
| type               | ssd                                                                                      |
| zoned              |                                                                                          |
+--------------------+------------------------------------------------------------------------------------------+

In the command output, the disk properties include the device name, disk status, type, physical size, protocol, model, serial number, S.M.A.R.T. status, and temperature, among others. iSCSI disks also have a LUN ID.
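
The command output is a two-column table in which some fields (`protocol`, `service_params`, `space`) continue over several rows with an empty Field cell. As a rough sketch of how a script could read it, the Python below parses a shortened copy of the example output into a dictionary and derives the used-space percentage from `space`. The helper names `parse_vinfra_table` and `parse_subfields` are hypothetical, and the parser assumes the layout shown in the example and values that contain no `|` character.

```python
SAMPLE = """\
+----------------+---------------------+
| Field          | Value               |
+----------------+---------------------+
| device         | nvme0n1             |
| disk_status    | ok                  |
| role           | cs                  |
| service_params | journal_type: inner_cache |
|                | systemd: active     |
|                | tier: 0             |
| service_status | active              |
| space          | size: 1968848437248 |
|                | used: 1540324716544 |
+----------------+---------------------+
"""


def parse_vinfra_table(text):
    """Parse the two-column ASCII table into a {field: value} dictionary.

    Rows with an empty Field cell are appended to the previous field,
    separated by newlines.
    """
    result = {}
    last_field = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +---+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|", 1)]
        if len(cells) != 2:
            continue
        field, value = cells
        if field == "Field":
            continue  # header row
        if field:
            result[field] = value
            last_field = field
        elif last_field is not None:
            result[last_field] += "\n" + value
    return result


def parse_subfields(value):
    """Split a multi-row value such as 'size: 1\\nused: 2' into a dictionary."""
    subfields = {}
    for row in value.splitlines():
        key, separator, rest = row.partition(":")
        if separator:
            subfields[key.strip()] = rest.strip()
    return subfields


disk = parse_vinfra_table(SAMPLE)
space = parse_subfields(disk["space"])
used_percent = 100 * int(space["used"]) / int(space["size"])
print(disk["device"], disk["service_status"], f"{used_percent:.1f}% used")
```

With the sizes from the example, the disk comes out at roughly 78% used.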

To check storage disks with enabled caching

Go to the Infrastructure > Nodes screen and click the node name.
On the Disks tab, click a node disk with the Cache role, and then go to the Cache for disks tab.

The tab lists all of the storage disks that are being cached on the current disk.

To have the disk blink its activity LED

Admin panel

Go to the Infrastructure > Nodes screen and click the node name.
On the Disks tab, click a node disk.
On the disk right pane, click Blink.

To have the disk stop blinking, click Unblink.

Command-line interface

Use the following commands:

To start blinking the specified disk bay:
```
vinfra node disk blink on [--node <node>] <disk>
```
--node <node>

Node ID or hostname (default: node001.vstoragedomain)

<disk>

Disk ID or device name

For example, to start blinking the disk sda on the node node005, run:
```
# vinfra node disk blink on sda --node node005
```
To stop blinking the specified disk bay:
```
vinfra node disk blink off [--node <node>] <disk>
```
--node <node>

Node ID or hostname (default: node001.vstoragedomain)

<disk>

Disk ID or device name

For example, to stop blinking the disk sda on the node node005, run:
```
# vinfra node disk blink off sda --node node005
```
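
To locate several disks in a rack, the blink commands above could be driven from a script. The following Python sketch only assembles the argument list from the syntax shown above and hands it to `subprocess`. The function names are hypothetical, and the sketch assumes that `vinfra` is on the `PATH`, that the session is already authenticated, and that the command exits with a non-zero code on failure.

```python
import subprocess


def blink_command(disk, on=True, node=None):
    """Build the argument list for 'vinfra node disk blink on|off'."""
    argv = ["vinfra", "node", "disk", "blink", "on" if on else "off"]
    if node is not None:
        argv += ["--node", node]  # omitted: vinfra falls back to its default node
    argv.append(disk)
    return argv


def set_blink(disk, on=True, node=None):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(blink_command(disk, on, node), check=True)


# Show the command that would be run for the disk sda on the node node005.
print(" ".join(blink_command("sda", on=True, node="node005")))
```

Keeping command construction separate from execution makes it possible to review the exact command before anything is run on a node.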