3.5. Managing Cluster Parameters¶
This section explains what cluster parameters are and how you can configure them.
3.5.1. Cluster Parameters Overview¶
The cluster parameters control creating, locating, and managing replicas for data chunks in a Virtuozzo Storage cluster. All parameters can be divided into three main groups: replication, encoding, and location.
The table below briefly describes some of the cluster parameters. For more information on the parameters and how to configure them, see the following sections.
|Parameter|Description|
|---|---|
|Normal Replicas|The number of replicas to create for a data chunk, from 1 to 15. Recommended: 3.|
|Minimum Replicas|The minimum number of replicas for a data chunk, from 1 to 15. Recommended: 2.|
|Failure Domain|A placement policy for replicas, can be host (default) or disk (CS).|
|Tier|Storage tiers, from 0 to 3 (0 by default).|
3.5.2. Configuring Replication Parameters¶
The cluster replication parameters define the following:
- The normal number of replicas of a data chunk. When a new data chunk is created, Virtuozzo Storage automatically replicates it until the normal number of replicas is reached.
- The minimum number of replicas of a data chunk (optional). During the life cycle of a data chunk, the number of its replicas may vary. If many chunk servers go down, it may fall below the defined minimum. In that case, all write operations to the affected replicas are suspended until their number reaches the minimum value.
To check the current replication parameters applied to a cluster, you can use the
vstorage get-attr command. For example, if your cluster is mounted to the
/vstorage/stor1 directory, you can run the following command:
# vstorage get-attr /vstorage/stor1
connected to MDS#1
File: '/vstorage/stor1'
Attributes:
...
replicas=1:1
...
As you can see, the normal and minimum numbers of chunk replicas are set to 1.
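When checking many directories, it can help to pull the two numbers out of the `replicas=<normal>:<minimum>` attribute automatically. The following is a minimal sketch, assuming the attribute appears in the output exactly as in the example above; the `parse_replicas` helper is hypothetical and not part of Virtuozzo Storage.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the normal and minimum replica counts from
# `vstorage get-attr` output supplied on stdin.
# Assumes the attribute is printed as replicas=<normal>:<minimum>.
parse_replicas() {
    local pair
    # Take the first replicas=N:M token found in the input.
    pair=$(grep -o 'replicas=[0-9]*:[0-9]*' | head -n 1)
    [ -n "$pair" ] || return 1
    pair=${pair#replicas=}
    echo "normal=${pair%%:*} minimum=${pair##*:}"
}

# Sample text copied from the example output above (no cluster required).
printf '%s\n' "File: '/vstorage/stor1'" "Attributes:" "replicas=1:1" | parse_replicas
```

On a node with the cluster mounted, the same helper could be fed directly: `vstorage get-attr /vstorage/stor1 | parse_replicas`.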
Initially, any cluster is configured to have only 1 replica per data chunk, which is sufficient to evaluate the Virtuozzo Storage functionality on a single server. In production, however, to provide high availability for your data, it is recommended to
- configure each data chunk to have at least 3 replicas,
- set the minimum number of replicas to 2.
Such a configuration requires at least 3 chunk servers to be set up in the cluster.
To configure the replication parameters so that they apply to all virtual machines and Containers in your cluster, you can run the
vstorage set-attr command on the directory to which the cluster is mounted. For example, to apply the recommended replication values to the
stor1 cluster mounted to
/vstorage/stor1, set the normal number of replicas for the cluster to 3:
# vstorage set-attr -R /vstorage/stor1 replicas=3
The minimum number of replicas will be automatically set to 2 by default.
For information on how the minimum number of replicas is calculated, see the
vstorage-set-attr man page.
Along with applying replication parameters to the entire contents of your cluster, you can also configure them for specific directories and files. For example:
# vstorage set-attr -R /vstorage/stor1/private/MyCT replicas=3
3.5.3. Configuring Encoding Parameters¶
As a better alternative to replication, Virtuozzo Storage can provide data redundancy by means of erasure coding. With it, Virtuozzo Storage breaks the incoming data stream into fragments of a certain size, then splits each fragment into a certain number (M) of 1-megabyte pieces and creates a certain number (N) of parity pieces for redundancy. All pieces are distributed among M+N storage nodes, that is, one piece per node. On storage nodes, pieces are stored in regular chunks, but such chunks are not replicated because redundancy is already achieved. The cluster can survive the failure of any N storage nodes without data loss.
The values of M and N are indicated in the names of erasure coding redundancy modes. For example, in the 5+2 mode, the incoming data is broken into 5MB fragments, each fragment is split into five 1MB pieces and two more 1MB parity pieces are added for redundancy. In addition, if N is 2, the data is encoded using the RAID6 scheme, and if N is greater than 2, Reed-Solomon erasure codes are used.
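The arithmetic behind an M+N mode can be worked out with a few lines of shell. The sketch below uses a hypothetical `ec_summary` helper (not a Virtuozzo Storage command) and derives its figures only from the description above: M+N nodes are needed, any N of them may fail, and each fragment holds M megabytes of data. The overhead figure is simply parity relative to data (N/M), which is an inference from that description rather than a number quoted from the product documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize an erasure coding mode given as "M+N".
ec_summary() {
    local mode=$1
    local m=${mode%%+*}   # number of 1 MB data pieces per fragment
    local n=${mode##*+}   # number of 1 MB parity pieces per fragment
    awk -v m="$m" -v n="$n" 'BEGIN {
        # Overhead here means parity size relative to data size.
        printf "nodes=%d tolerated_failures=%d fragment=%dMB overhead=%.0f%%\n",
               m + n, n, m, 100 * n / m
    }'
}

ec_summary 5+2
# prints: nodes=7 tolerated_failures=2 fragment=5MB overhead=40%
```

For comparison, keeping 3 replicas of the same data stores two extra copies, that is, 200% overhead by the same measure.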
It is recommended to use the following erasure coding redundancy modes (M+N):
Encoding is configured for directories. For example:
# vstorage set-attr -R /vstorage/stor1 encoding=5+2
After encoding is enabled, the redundancy mode cannot be changed back to replication. However, you can switch between different encoding modes for the same directory.
3.5.4. Configuring Failure Domains¶
A failure domain is a set of services that can fail in a correlated manner. Because of such correlated failures, it is critical to scatter data replicas across different failure domains to keep data available. Failure domain examples include:
- A single disk (the smallest possible failure domain). For this reason, Virtuozzo Storage never places more than 1 data replica per disk or chunk server (CS).
- A single host running multiple CS services. When such a host fails (e.g., due to a power outage or network disconnect), all CS services on it become unavailable at once. For this reason, Virtuozzo Storage is configured by default to make sure that a single host never stores more than 1 chunk replica (see Defining Failure Domains below).
3.5.4.1. Failure Domain Topology¶
Every Virtuozzo Storage service component has topology information assigned to it. Topology paths define a logical tree of components’ physical locations consisting of identifiers
host_ID.cs_ID that are generated automatically:
host_ID is a unique, randomly generated host identifier created during installation and located at
cs_ID is a unique service identifier generated at CS creation.
To view the current services topology and disk space available per location, run the
vstorage top command and press w.
3.5.4.2. Defining Failure Domains¶
Based on the levels of hierarchy described above, you can use the
vstorage set-attr command to define failure domains for proper file replica allocation:
# vstorage -c <cluster_name> set-attr -R -p /failure-domain=<disk|host>
disk means that only 1 replica is allowed per disk or chunk server,
host means that only 1 replica is allowed per host (default).
You should use the same configuration for all cluster files as it simplifies the analysis and is less error-prone.
3.5.4.3. Recommendations on Failure Domains¶
Do not use failure domain
disk simultaneously with journaling SSDs. In this case, multiple replicas may happen to be located on disks serviced by the same journaling SSD. If that SSD fails, all replicas that depend on journals located on it will be lost. As a result, your data may be lost.
- For the flexibility of Virtuozzo Storage allocator and rebalancing mechanisms, it is always recommended to have at least 5 failure domains configured in a production setup. Reserve enough disk space on each failure domain so that if a domain fails, its data can be recovered to the healthy ones.
- At least 3 replicas are recommended.
- If a huge failure domain fails and goes offline, Virtuozzo Storage will not perform data recovery by default, because replicating a huge amount of data may take longer than repairing the domain. This behavior is managed by the global parameter
vstorage-config) which controls the number of failed hosts to be considered as a normal disaster worth recovering in the automatic mode.
- Depending on the global parameter
vstorage-config), the domain policy can be strict (default) or advisory. Changing this parameter is strongly discouraged unless you are absolutely sure of what you are doing.
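The recommendations above can be turned into a quick pre-flight check. The following is a minimal sketch, assuming the strict policy in which each failure domain holds at most 1 replica of a chunk, so the number of domains must be at least the number of replicas. The `check_failure_domains` helper is hypothetical and takes the planned replica count and the number of available failure domains as plain numbers; it does not query the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a planned layout against the recommendations.
# Usage: check_failure_domains <replicas> <failure_domains>
check_failure_domains() {
    local replicas=$1 domains=$2 clean=1
    # With at most 1 replica per failure domain, fewer domains than
    # replicas means the replicas cannot all be placed.
    if [ "$domains" -lt "$replicas" ]; then
        echo "error: $replicas replicas cannot be placed on $domains failure domain(s)"
        return 1
    fi
    if [ "$replicas" -lt 3 ]; then
        echo "warning: at least 3 replicas are recommended"
        clean=0
    fi
    if [ "$domains" -lt 5 ]; then
        echo "warning: at least 5 failure domains are recommended in production"
        clean=0
    fi
    if [ "$clean" -eq 1 ]; then
        echo "ok"
    fi
    return 0
}

check_failure_domains 3 5
# prints: ok
```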
3.5.5. Using Storage Tiers¶
This section describes storage tiers used in Virtuozzo Storage clusters and provides information on how to configure and monitor them.
3.5.5.1. What Are Storage Tiers¶
Storage tiers represent a way to organize storage space. You can use them to keep different categories of data on different chunk servers. For example, you can use high-speed solid-state drives to store performance-critical data instead of caching cluster operations.
3.5.5.2. Configuring Storage Tiers¶
To assign disk space to a storage tier, do this:
Assign all chunk servers with SSD drives to the same tier. You can do this when setting up a chunk server (see Stage 2: Creating a Chunk Server for details).
For information on recommended SSD drives, see Using SSD Drives.
Assign the frequently accessed directories and files to tier 1 with the
vstorage set-attr command. For example:
# vstorage set-attr -R /vstorage/stor1/private/MyCT tier=1
This command recursively assigns the directory
/vstorage/stor1/private/MyCT and its contents to tier 1.
When assigning storage to tiers, keep in mind that faster storage drives should be assigned to higher tiers. For example, you can use tier 0 for backups and other cold data (CS without SSD journals), tier 1 for virtual environments, which hold a lot of cold data but need fast random writes (CS with SSD journals), and tier 2 for hot data (CS on SSD), journals, caches, specific virtual machine disks, and the like.
This recommendation is related to how Virtuozzo Storage works with storage space. If a storage tier runs out of free space, Virtuozzo Storage will attempt to temporarily use a lower tier. If you add more storage to the original tier later, the data, temporarily stored elsewhere, will be moved to the tier where it should have been stored originally.
For example, if you try to write data to tier 2 and it is full, Virtuozzo Storage will attempt to write that data to tier 1, then to tier 0. If you add more storage to tier 2 later, that data, now stored on tier 1 or 0, will be moved back to tier 2, where it was meant to be stored originally.
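The fallback order described above can be written down explicitly. This is only an illustrative sketch of the documented behavior (try the requested tier, then each lower tier in turn); the `tier_fallback_order` helper is hypothetical and does not interact with the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the tiers that would be tried, in order, for
# data assigned to the given tier (valid tiers are 0 to 3).
tier_fallback_order() {
    local tier=$1 order=""
    case $tier in
        0|1|2|3) ;;
        *) echo "tier must be between 0 and 3" >&2; return 1 ;;
    esac
    # Start at the requested tier and walk down to tier 0.
    while [ "$tier" -ge 0 ]; do
        order="$order${order:+ }$tier"
        tier=$((tier - 1))
    done
    echo "$order"
}

tier_fallback_order 2
# prints: 2 1 0
```

Because fallback only goes downward, data assigned to tier 0 has nowhere else to go, which is one reason to put the fastest drives on the highest tiers.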
Automatic Data Balancing
To maximize the I/O performance of chunk servers in a cluster, Virtuozzo Storage automatically balances CS load by moving hot data chunks from hot chunk servers to colder ones.
A chunk server is considered hot if its request queue depth exceeds the cluster-average value by 40% or more (see example below). With data chunks, “hot” means “most requested”.
The hotness (i.e. request queue depth) of chunk servers is indicated by the
QDEPTH parameter shown in the output of
vstorage top and
vstorage stat commands. For example:
...
IO QDEPTH: 0.1 aver, 1.0 max; 1 out of 1 hot CS balanced 46 sec ago
...
CSID STATUS  SPACE   AVAIL   REPLICAS  UNIQUE  IOWAIT  IOLAT(ms)  QDEPTH  HOST           BUILD_VERSION
1025 active  1007.3  156.8G  7142      0       10%     1/117      0.3     10.31.240.167  6.0.11-10
1026 active  1007.3  156.8G  7267      0       11%     0/225      0.1     10.31.240.167  6.0.11-10
1027 active  1007.3  156.8G  7151      0       2%      0/10       0.1     10.31.240.167  6.0.11-10
1028 active  1007.3  156.8G  7285      0       13%     1/141      1.0     10.31.240.167  6.0.11-10
...
In the output, the
IO QDEPTH line shows the average and maximum request queue depth values in the entire cluster for the last 60 seconds. The
QDEPTH column shows average request queue depth values for each CS for the last 5 seconds.
Every 60 seconds, the hottest data chunk is moved from a hot CS to one with a shorter request queue.
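To make the 40% rule concrete, the sketch below applies it to the per-CS QDEPTH values from the example output. It is an approximation under stated assumptions: the average is computed from the four 5-second QDEPTH values shown in the table rather than from the cluster's own 60-second average, and the `find_hot_cs` helper is hypothetical. The real balancer may weigh other factors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "<CSID> <QDEPTH>" pairs on stdin and print the
# IDs of chunk servers whose queue depth is at least 40% above the average.
find_hot_cs() {
    awk '
        { id[NR] = $1; q[NR] = $2 + 0; sum += $2 }
        END {
            if (NR == 0) exit
            avg = sum / NR
            threshold = 1.4 * avg      # average plus 40%
            for (i = 1; i <= NR; i++)
                # Skip idle servers so an all-zero cluster reports nothing.
                if (q[i] > 0 && q[i] >= threshold) print id[i]
        }'
}

# CSID and QDEPTH columns taken from the example output above.
printf '%s\n' '1025 0.3' '1026 0.1' '1027 0.1' '1028 1.0' | find_hot_cs
# prints: 1028
```

With these values the average is 0.375 and the threshold is 0.525, so only CS 1028 qualifies, which agrees with the "1 out of 1 hot CS balanced" note in the sample output.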
3.5.5.3. Monitoring Storage Tiers¶
You can monitor disk space assigned to each storage tier with the
top utility in the verbose mode (enabled by pressing v). Typical output may look like this:
3.5.6. Changing Virtuozzo Storage Cluster Network¶
Before moving your cluster to a new network, consider the following:
- Changing the cluster network results in a brief downtime for the period when more than half of the MDS servers are unavailable.
- It is highly recommended to back up all MDS repositories before changing the cluster network.
To change the Virtuozzo Storage cluster network, do the following on each node in the cluster where an MDS service is running:
Stop the MDS service:
# systemctl stop vstorage-mdsd.target
Specify new IP addresses for all metadata servers in the cluster with the command
vstorage configure-mds -r <MDS_repo> -n <MDS_ID@new_IP_address>[:port] [-n ...], where:
<MDS_repo> is the repository path of the MDS on the current node.
<MDS_ID@new_IP_address> pair is an MDS identifier and a corresponding new IP address. For example, for a cluster with 5 metadata servers:
# vstorage -c stor1 configure-mds -r /vstorage/stor1-cs1/mds/data -n <MDS_ID@new_IP_address> \
-n <MDS_ID@new_IP_address> -n <MDS_ID@new_IP_address> -n <MDS_ID@new_IP_address> -n <MDS_ID@new_IP_address>
- You can obtain the identifier and repository path for the current MDS with the
vstorage list-services -M command.
- If you omit the port, the default port 2510 will be used.
Start the MDS service:
# systemctl start vstorage-mdsd.target
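With several metadata servers, the `configure-mds` command line becomes long and easy to mistype. The sketch below assembles it from a list of `<MDS_ID>@<new_IP_address>` pairs and only prints the result for review; it does not run anything. The `build_configure_mds` helper is hypothetical, and the MDS IDs and 192.0.2.x addresses are placeholder values for illustration, not values from a real cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not execute) the configure-mds command.
# Usage: build_configure_mds <cluster> <MDS_repo> <MDS_ID@new_IP[:port]>...
build_configure_mds() {
    local cluster=$1 repo=$2 pair
    shift 2
    local cmd="vstorage -c $cluster configure-mds -r $repo"
    for pair in "$@"; do
        case $pair in
            # Every pair must contain an "@" between the ID and the address.
            *@*) cmd="$cmd -n $pair" ;;
            *)   echo "invalid pair: $pair" >&2; return 1 ;;
        esac
    done
    echo "$cmd"
}

# Placeholder IDs and addresses; substitute the values for your cluster.
build_configure_mds stor1 /vstorage/stor1-cs1/mds/data \
    1@192.0.2.11 2@192.0.2.12 3@192.0.2.13
```

After reviewing the printed command, it would be run on each MDS node between the stop and start steps shown above.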
3.5.7. Enabling Online Compacting of Virtual Machines¶
Online compacting of virtual machines on Virtuozzo Storage in the replication mode allows reclaiming disk space no longer occupied by data by means of the
FALLOC_FL_PUNCH_HOLE flag. Online compacting is based on triggering the TRIM command from inside a guest. Windows guests have the feature enabled by default, while for Linux guests it is enabled with guest tool installation.
Online compacting works by default, provided that the
discard flag is set to
unmap for the VM’s disk drives.
To enable online compacting for your Virtuozzo Storage cluster, do the following:
Update all cluster nodes to Virtuozzo Hybrid Server 7 Update 5.
Restart updated cluster nodes one by one.
Run the following command on any cluster node:
# vstorage set-config "gen.do_punch_hole=1"
Running the command before updating all the chunk servers will result in data corruption!
To reclaim unused space accumulated before online compacting was enabled (e.g., from VMs created on Virtuozzo Hybrid Server 7 Update 4 and older), create a file inside the VM with size comparable to that of the unused space, then remove it.
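Inside a Linux guest, the create-then-remove step could look like the sketch below. It is a minimal illustration under assumptions: the `fill_and_remove` helper is hypothetical, it writes a zero-filled file with `dd`, and the size you pass should leave some free space so that the guest does not run out of disk while the file exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a temporary file of the given size in a
# directory on the guest file system, flush it to disk, then delete it.
# Usage: fill_and_remove <directory> <size_in_MB>
fill_and_remove() {
    local dir=$1 size_mb=$2
    local file="$dir/.reclaim_fill.$$"
    # Write size_mb blocks of 1 MiB each; clean up if dd fails midway.
    if ! dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null; then
        rm -f "$file"
        return 1
    fi
    sync
    rm -f "$file"
}

# Example (run inside the VM): fill_and_remove / 10240
```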