5.1. Monitoring General Storage Cluster Parameters
By monitoring general parameters, you can get detailed information about all components of the storage cluster, as well as its overall status and health. To display this information, use the vstorage -c <cluster_name> top command. For example, running vstorage -c stor1 top shows detailed information about the stor1 cluster. The general parameters (highlighted in red) are as follows.
Overall status of the cluster:
- healthy: All chunk servers in the cluster are active.
- unknown: There is not enough information about the cluster state (e.g., because the master MDS server was elected a while ago).
- degraded: Some of the chunk servers in the cluster are inactive.
- failure: The cluster has too many inactive chunk servers; automatic replication is disabled.
- SMART warning: One or more physical disks attached to cluster nodes are in a pre-failure condition. For details, see Monitoring Physical Disks.
Amount of disk space in the cluster:
- free: Free physical disk space in the cluster.
- allocatable: Amount of logical disk space available to clients. Allocatable disk space is calculated from the current replication parameters and the free disk space on chunk servers. It may also be limited by the license.
For more information on monitoring and understanding disk space usage in clusters, see Understanding Disk Space Usage.
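How allocatable space depends on replication and licensing can be illustrated with a deliberately simplified model. The sketch below assumes that every logical byte consumes `replicas` physical bytes and that the license imposes a single cap on the remaining logical capacity. The function name, its parameters, and the formula itself are illustrative assumptions, not the calculation that vstorage actually performs.

```python
from typing import Optional


def estimate_allocatable(free_physical: int, replicas: int,
                         license_remaining: Optional[int] = None) -> int:
    """Roughly estimate the logical space (in bytes) clients could still allocate.

    Simplifying assumption: each logical byte occupies `replicas` physical bytes.
    `license_remaining` is a hypothetical cap on the logical capacity still
    allowed by the license; None means that no cap is applied.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Free physical space divided among the replicas of each chunk.
    estimate = free_physical // replicas
    # The license may reduce the estimate further.
    if license_remaining is not None:
        estimate = min(estimate, license_remaining)
    return estimate
```

Under these assumptions, 3 TiB of free physical space with three replicas yields an estimate of 1 TiB of allocatable space, and a smaller license cap would lower it further.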
- MDS nodes: Number of active MDS servers as compared to the total number of MDS servers configured for the cluster.
- Epoch time: Time elapsed since the MDS master server election.
- CS nodes: Number of active chunk servers as compared to the total number of chunk servers configured for the cluster.
In parentheses, you can see additional information on these chunk servers:
- Active chunk servers (avail.) that are currently up and running in the cluster.
- Inactive chunk servers (inactive) that are temporarily unavailable. A chunk server is marked as inactive during its first 5 minutes of inactivity.
- Offline chunk servers (offline) that have been inactive for more than 5 minutes. Once a chunk server goes offline, the cluster starts replicating data to restore the chunks that were stored on it.
- License: Key number under which the license is registered on the Key Authentication server, and the license state.
- Replication: Replication settings, that is, the normal number of chunk replicas and the limit after which a chunk gets blocked until recovered.
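The chunk server states described for CS nodes above follow a simple time-based rule, which can be sketched as follows. The 5-minute threshold comes from the description; the function names, the state labels used as dictionary keys, and the treatment of exactly 5 minutes as still inactive are assumptions made for illustration only.

```python
from datetime import timedelta
from typing import Dict, List

# Threshold from the description: a chunk server is "inactive" during its
# first 5 minutes of inactivity and becomes "offline" after that.
OFFLINE_AFTER = timedelta(minutes=5)


def cs_state(inactive_for: timedelta) -> str:
    """Map the time a chunk server has been unresponsive to a state label."""
    if inactive_for <= timedelta(0):
        return "avail"
    # Assumption: exactly 5 minutes still counts as inactive.
    if inactive_for <= OFFLINE_AFTER:
        return "inactive"
    return "offline"


def summarize(inactivity: List[timedelta]) -> Dict[str, int]:
    """Count chunk servers per state, similar to the figures in parentheses."""
    counts = {"avail": 0, "inactive": 0, "offline": 0}
    for duration in inactivity:
        counts[cs_state(duration)] += 1
    return counts
```

For example, `summarize([timedelta(0), timedelta(minutes=2), timedelta(minutes=7)])` counts one chunk server in each state. Only the offline ones would trigger replication of their chunks.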
Disk I/O activity in the cluster:
- Speed of read and write I/O operations, in bytes per second.
- Number of read and write I/O operations per second.
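The two I/O figures above can be combined to estimate the average size of an operation, which helps tell a workload of many small requests from one of few large requests. The helper below is a minimal sketch; its name and parameters are hypothetical, and the values must be read from the command output manually or by your own tooling.

```python
def avg_io_size(bytes_per_sec: float, ops_per_sec: float) -> float:
    """Return the average number of bytes per I/O operation.

    Returns 0.0 when no operations were observed, to avoid division by zero.
    """
    if ops_per_sec <= 0:
        return 0.0
    return bytes_per_sec / ops_per_sec
```

For instance, a write speed of 8 MiB/s (8388608 bytes per second) at 128 operations per second corresponds to an average request size of 65536 bytes, or 64 KiB.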