Understanding storage policies

Virtuozzo Hybrid Infrastructure can be used for the following scenarios: iSCSI block storage, NFS file storage, S3 object storage, and Backup Storage (to store the backups created in Acronis Cyber Protect solutions). You can also use its built-in hypervisor to create compute virtual machines (VMs). In all these scenarios, the common unit of data is a volume. For the compute service, a volume is a virtual drive that can be attached to a VM. For iSCSI, S3, Backup Gateway, and NFS, a volume is the data unit used for exporting space. Whenever you create a volume, you need to define its redundancy mode, tier, and failure domain. Together, these parameters make up a storage policy that defines how redundant a volume must be and where it must be located.

Redundancy means that the data is stored across different storage nodes and stays highly available even if some nodes fail. If a storage node becomes inaccessible, the data copies on it are replaced by new ones that are distributed among the healthy storage nodes. When the node comes back online after the downtime, the out-of-date data on it is updated.
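
As a rough illustration of this recovery behavior (not the product's actual rebalancing algorithm), the following Python sketch re-creates the replicas that lived on a failed node on the remaining healthy nodes; the placement map and helper names are hypothetical.

```python
# Illustration only; not Virtuozzo Hybrid Infrastructure's rebalancing logic.
# The placement map and helper below are hypothetical simplifications.

placement = {                       # chunk ID -> nodes holding its replicas
    0: {"node1", "node2"},
    1: {"node2", "node3"},
}

def handle_node_failure(failed: str, healthy: set[str]) -> None:
    """Replace every replica stored on the failed node with a new one
    on a healthy node that does not yet hold that chunk."""
    for chunk, holders in placement.items():
        if failed in holders:
            holders.discard(failed)
            candidates = healthy - holders
            if candidates:
                holders.add(sorted(candidates)[0])  # any healthy node will do

handle_node_failure("node2", healthy={"node1", "node3"})
print(placement)  # both chunks are back to two replicas, on node1 and node3
```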

With replication, Virtuozzo Hybrid Infrastructure breaks a volume into fixed-size pieces (data chunks). Each chunk is replicated as many times as set in the storage policy. If the failure domain is host, the replicas are stored on different storage nodes, so that no node holds more than one replica of a given chunk.
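
As a minimal sketch of the idea (not the product's actual placement algorithm), the Python below splits a volume into fixed-size chunks and assigns each chunk the configured number of replicas, one per node; the chunk size and function names are assumptions made for illustration.

```python
# Minimal sketch only; chunk size and placement logic are illustrative assumptions.
import math

CHUNK_SIZE = 256 * 1024 * 1024  # assume 256 MiB fixed-size chunks

def split_into_chunks(volume_size: int) -> int:
    """Return how many fixed-size chunks a volume is broken into."""
    return math.ceil(volume_size / CHUNK_SIZE)

def place_replicas(chunk_id: int, nodes: list[str], replicas: int) -> list[str]:
    """Pick 'replicas' distinct nodes for one chunk (failure domain = host),
    so no node stores more than one replica of the same chunk."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes for the requested number of replicas")
    return [nodes[(chunk_id + i) % len(nodes)] for i in range(replicas)]  # round robin

nodes = ["node1", "node2", "node3"]
for chunk in range(split_into_chunks(1024**3)):   # a 1 GiB volume -> 4 chunks
    print(chunk, place_replicas(chunk, nodes, replicas=3))
```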

With erasure coding (or just encoding), the incoming data stream is split into fragments of a certain size. Instead of copying each fragment, a certain number (M) of fragments are grouped together, and a certain number (N) of parity pieces are created for redundancy. All the pieces are distributed among M+N storage nodes (selected from all available nodes), and the data survives the failure of any N of these nodes without loss. The values of M and N are indicated in the names of the erasure coding redundancy modes. For example, in the 5+2 mode, the incoming data is split into 5 fragments, and 2 parity pieces of the same size are added for redundancy. Refer to the Administrator Guide for detailed information on redundancy, data overhead, the number of nodes, and raw space requirements.
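
To make the overhead arithmetic concrete, here is a small back-of-the-envelope Python example comparing the raw space usage of replication and encoding; the exact requirements for the product are given in the Administrator Guide.

```python
# Back-of-the-envelope arithmetic; refer to the Administrator Guide
# for the product's exact raw space requirements.

def encoding_overhead(m: int, n: int) -> float:
    """Raw space used per unit of user data in an M+N scheme:
    M data fragments plus N parity fragments, all of the same size."""
    return (m + n) / m

def replication_overhead(replicas: int) -> float:
    """Raw space used per unit of user data with plain replication."""
    return float(replicas)

print(encoding_overhead(5, 2))    # 5+2: survives any 2 node failures, 1.4x raw space
print(replication_overhead(3))    # 3 replicas: survives any 2 node failures, 3.0x raw space
```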

To better understand a storage policy, let's look at its main components (tier, failure domain, and redundancy) in a sample scenario. Suppose you have three nodes with a mix of disks: fast SSDs and high-capacity HDDs. Node 1 has only SSDs; nodes 2 and 3 have both SSDs and HDDs. You want to export storage space via iSCSI and S3, so you need to define a suitable storage policy for each workload.

  • The first parameter, tier, defines a group of disks united by certain criteria (usually the drive type) and tailored to a specific storage workload. In this sample scenario, you can group your SSDs into tier 2 and your HDDs into tier 3. You can assign a disk to a tier when creating a storage cluster or adding nodes to it (refer to Creating the storage cluster). Note that only nodes 2 and 3 have HDDs and will be used for tier 3; the first node's SSDs cannot be used for tier 3.
  • The second parameter, failure domain, defines a scope within which a set of storage services can fail in a correlated manner. The default failure domain is host: each data chunk is copied to different storage nodes, with only one copy per node. If a node fails, the data is still accessible from the healthy nodes. A disk can also be a failure domain, though this is only relevant for single-node clusters. As you have three nodes in this scenario, we recommend choosing the host failure domain.
  • The third parameter, redundancy, should be configured to fit the available disks and tiers. In this evaluation example, all three nodes have SSDs on tier 2, so if you select tier 2 in your storage policy, you can use the three nodes for 1, 2, or 3 replicas. Only two of the nodes have HDDs on tier 3, so if you select tier 3, you can store only 1 or 2 replicas on those two nodes. In both cases, you could also use encoding, but for this evaluation, let's stick to replication: 3 replicas for SSDs and 2 replicas for HDDs (see the sketch after this list).
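
Purely for illustration, the following Python sketch captures the two policies from this scenario as plain dictionaries and checks that each replica count fits the number of nodes on the selected tier. The data structures and the fits_cluster helper are hypothetical and not part of the product.

```python
# Hypothetical representation of the sample policies; not the product's API.

# Which nodes contribute disks to each tier in the sample cluster.
tier_nodes = {
    2: ["node1", "node2", "node3"],  # SSDs on all three nodes
    3: ["node2", "node3"],           # HDDs only on nodes 2 and 3
}

policies = [
    {"name": "iscsi-ssd", "tier": 2, "failure_domain": "host", "replicas": 3},
    {"name": "s3-hdd",    "tier": 3, "failure_domain": "host", "replicas": 2},
]

def fits_cluster(policy: dict) -> bool:
    """With the host failure domain, each replica needs its own node, so the
    replica count cannot exceed the number of nodes that serve the tier."""
    return policy["replicas"] <= len(tier_nodes[policy["tier"]])

for p in policies:
    print(p["name"], "fits" if fits_cluster(p) else "does not fit")
```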

To sum it up, the resulting storage policies are: