Redundancy modes
Virtuozzo Hybrid Infrastructure supports a number of modes for each redundancy method. Only predefined redundancy modes are available in the admin panel. The table below shows the data overhead of the various redundancy modes. The first three rows are replication modes and the rest are erasure coding modes.
The numbers of failure domains listed in the table indicate only the requirements of each redundancy method but not the number of failure domains needed for the Virtuozzo Hybrid Infrastructure cluster. The general recommendation is to always have at least one more failure domain in a cluster than required by the chosen redundancy scheme. For example, a cluster using replication with 3 replicas and the host failure domain should have four nodes, and a cluster that works in the 7+2 erasure coding mode with the disk failure domain should have ten disks. Such a cluster configuration has the following advantages:
- The cluster will not be exposed to additional failures while in the degraded state. With one failure domain down, the cluster may not survive even one more single-disk failure without data loss.
- You will be able to perform maintenance on cluster nodes that may be needed to recover a failed node (for example, for installing software updates).
- In most cases, the cluster will have enough nodes to rebuild itself. In a cluster with the host failure domain but without a spare node, the replicas of user data are distributed across all cluster nodes for redundancy. If one or two nodes go down, the user data will not be lost, but the cluster will become degraded and will only start self-healing after the failed nodes are back online. While it is rebuilding, the cluster may be exposed to additional failures until all of its nodes are healthy again.
- You can replace and upgrade a cluster node without adding a new node to the cluster. A graceful release of a storage node is only possible if the remaining nodes in the cluster can comply with the configured redundancy scheme. You can still release a node forcibly without data migration, but this will make the cluster degraded and trigger cluster self-healing.
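The "one more failure domain than required" recommendation can be sketched as simple arithmetic. The function names below are hypothetical and are not part of any Virtuozzo Hybrid Infrastructure tool; the sketch only assumes that replication needs one failure domain per replica and that an M+N encoding needs M+N failure domains, as listed in the table.

```python
def required_for_replication(replicas: int) -> int:
    """Failure domains required by replication: one per replica."""
    return replicas


def required_for_encoding(m: int, n: int) -> int:
    """Failure domains required by an M+N erasure coding mode."""
    return m + n


def recommended_failure_domains(required: int, spare: int = 1) -> int:
    """Required failure domains plus spare ones (one spare by default)."""
    if required < 1 or spare < 0:
        raise ValueError("required must be >= 1 and spare must be >= 0")
    return required + spare


# 3 replicas with the host failure domain: four nodes recommended.
print(recommended_failure_domains(required_for_replication(3)))
# Encoding 7+2 with the disk failure domain: ten disks recommended.
print(recommended_failure_domains(required_for_encoding(7, 2)))
```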
The minimum and recommended cluster configurations are described in Quantity of servers.
Redundancy mode | Failure domains required to store data copies | How many failure domains can fail without data loss | Storage overhead, percent | Raw space needed to store 100 GB of data |
---|---|---|---|---|
1 replica (no redundancy) | 1 | 0 | 0 | 100 GB |
2 replicas | 2 | 1 | 100 | 200 GB |
3 replicas | 3 | 2 | 200 | 300 GB |
Encoding 1+0 (no redundancy) | 1 | 0 | 0 | 100 GB |
Encoding 1+1 | 2 | 1 | 100 | 200 GB |
Encoding 1+2 | 3 | 2 | 200 | 300 GB |
Encoding 3+1 | 4 | 1 | 33 | 133 GB |
Encoding 3+2 | 5 | 2 | 67 | 167 GB |
Encoding 5+2 | 7 | 2 | 40 | 140 GB |
Encoding 7+2 | 9 | 2 | 29 | 129 GB |
Encoding 17+3 | 20 | 3 | 18 | 118 GB |
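The overhead and raw space columns follow directly from the mode parameters. The sketch below reproduces them, assuming that replication stores every byte once per replica and that an M+N encoding stores N parity pieces for every M data pieces; the function names are illustrative only.

```python
def replication_overhead(replicas: int) -> float:
    """Storage overhead in percent for a given number of replicas."""
    return (replicas - 1) * 100.0


def encoding_overhead(m: int, n: int) -> float:
    """Storage overhead in percent for an M+N erasure coding mode."""
    return 100.0 * n / m


def raw_space_gb(data_gb: float, overhead_percent: float) -> float:
    """Raw space needed to store data_gb of data at the given overhead."""
    return data_gb * (1 + overhead_percent / 100.0)


# Print overhead and raw space for 100 GB of data, rounded as in the table.
for m, n in [(3, 1), (3, 2), (5, 2), (7, 2), (17, 3)]:
    overhead = encoding_overhead(m, n)
    print(f"Encoding {m}+{n}: {round(overhead)}% overhead, "
          f"{round(raw_space_gb(100, overhead))} GB raw")
```

For example, encoding 7+2 gives 2/7, or about 29 percent overhead, so 100 GB of data needs about 129 GB of raw space, while 3 replicas give 200 percent overhead and need 300 GB.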
The 1+0, 1+1, 1+2, and 3+1 encoding modes are meant for small clusters that do not yet have enough nodes for other erasure coding modes but will grow in the future. Because the redundancy type cannot be changed once chosen (from replication to erasure coding or vice versa), these modes allow you to choose erasure coding even if your cluster is smaller than recommended. Once the cluster has grown, you can choose more beneficial redundancy modes.
You select a data redundancy mode when configuring storage services and creating storage volumes for virtual machines. No matter what redundancy mode you select, it is highly recommended to be protected against a simultaneous failure of two nodes, as that happens often in real-life scenarios.
By default, all encoding modes, except 1+0 and M+1, allow write operations when one failure domain (for example, a storage node or disk) is inaccessible. The cluster starts working in the read-only mode with disabled write operations in the following cases:
- When redundancy is 1, that is, with the M+1 encoding mode, and one failure domain is inaccessible.
- When redundancy is 2, that is, with the M+2 encoding mode, and two failure domains are inaccessible.
If the number of unavailable failure domains is higher than the redundancy factor, then data becomes unavailable even for reading and there is a high risk of data loss. Therefore, for production, it is strongly recommended to use redundancy modes with the redundancy factor 2 or 3, such as encoding M+2, encoding M+3, and 3 replicas.
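The default behavior described above can be summarized as a small decision function. This is only a sketch with hypothetical names, not a product API. The text states the rules explicitly for M+1 and M+2; applying the same pattern to cases it does not cover, such as M+3 with two or three inaccessible failure domains, is an assumption.

```python
def access_mode(m: int, n: int, inaccessible: int) -> str:
    """Expected default access mode of data stored with M+N encoding.

    n is the redundancy factor; inaccessible is the number of failure
    domains that are currently down. m does not affect the result and
    is kept only so that calls read like the mode name.
    """
    if m < 1 or n < 0 or inaccessible < 0:
        raise ValueError("invalid encoding mode or failure count")
    if inaccessible == 0:
        return "read-write"    # healthy, including 1+0
    if inaccessible > n:
        return "unavailable"   # more failures than the redundancy factor
    if inaccessible == n:
        return "read-only"     # stated for M+1 and M+2; assumed for M+3
    return "read-write"        # assumed: fewer failures than redundancy


print(access_mode(7, 2, 1))  # one domain down with 7+2
print(access_mode(7, 2, 2))  # two domains down with 7+2
print(access_mode(3, 1, 1))  # one domain down with 3+1
```

With these rules, a 7+2 volume stays writable with one failure domain down and becomes read-only with two, while a 3+1 volume becomes read-only as soon as one failure domain is inaccessible.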