Redundancy modes
Virtuozzo Hybrid Infrastructure supports a number of modes for each redundancy method. You select a data redundancy mode when configuring storage services and creating storage volumes for virtual machines.
Regardless of the selected mode, it is strongly recommended to configure protection against the simultaneous failure of two nodes, as such failures are common in real-life scenarios.
Failure domain requirements
The tables below list redundancy modes predefined for different numbers of failure domains in a storage tier. As a general rule, the cluster should include at least one more failure domain than required by the chosen redundancy mode. For example, a cluster using replication with 3 replicas and the host failure domain should have four nodes, and a cluster that works in the 7+2 erasure coding mode with the disk failure domain should have ten disks. This additional failure domain provides the following advantages:
- Improved fault tolerance: The cluster remains protected against additional failures while in a degraded state. Without a spare domain, a degraded cluster may not survive even one more single-disk failure without data loss.
- Simplified maintenance: You can safely perform maintenance on cluster nodes, such as installing software updates, without compromising redundancy or risking data unavailability.
- Cluster self-healing: A cluster with a spare node usually has enough resources to rebuild itself after a failure. With the host failure domain but without a spare node, user data remains safe if one or two nodes fail, but the cluster will not start self-healing until the failed nodes are back online. During rebuilding, the cluster may be exposed to additional failures until all of its nodes are healthy again.
- Simplified node replacement and upgrades: Nodes can be replaced or upgraded without adding new nodes to the cluster. A graceful release of a storage node is only possible if the remaining nodes in the cluster still meet the redundancy requirements. Alternatively, you can release a node forcibly without data migration, but this degrades the cluster and triggers self-healing.
The minimum and recommended cluster configurations are described in Quantity of servers.
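The "one spare failure domain" rule can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, and it assumes that a mode requires one failure domain per replica or per fragment, which matches the two examples given above.

```python
def domains_for_replication(replicas: int) -> int:
    """Suggested failure-domain count for a replication mode.

    Assumption: one failure domain per replica, plus one spare domain.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return replicas + 1


def domains_for_erasure_coding(data: int, parity: int) -> int:
    """Suggested failure-domain count for an M+N erasure coding mode.

    Assumption: one failure domain per fragment (M data + N parity),
    plus one spare domain.
    """
    if data < 1 or parity < 0:
        raise ValueError("need data >= 1 and parity >= 0")
    return data + parity + 1


# The two examples from the text: 3 replicas and the 7+2 mode.
print(domains_for_replication(3))
print(domains_for_erasure_coding(7, 2))
```

With the host failure domain the result is a number of nodes, and with the disk failure domain it is a number of disks.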
Replication modes
| Number of failure domains | Redundancy mode | How many failure domains can fail without data loss | Storage overhead | Comment |
|---|---|---|---|---|
| 1 | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 2 | 2 replicas | 1 | 100% | Stored data will not have redundancy in case of single domain failure |
| | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 3 | 3 replicas | 2 | 200% | Not enough failure domains to provide high availability in case of single domain failure |
| | 2 replicas | 1 | 100% | Recommended |
| | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 4 | 3 replicas | 2 | 200% | Recommended for high reliability |
| | 2 replicas | 1 | 100% | Recommended for low overhead |
| | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 4+ | 3 replicas | 2 | 200% | Recommended for high reliability |
| | 2 replicas | 1 | 100% | Recommended for low overhead |
| | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
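The tolerated failures and storage overhead in the replication table follow a simple pattern: every replica beyond the first adds one tolerated failure and 100% overhead. A minimal sketch that reproduces those two columns (the function name is hypothetical):

```python
def replication_profile(replicas: int) -> tuple[int, int]:
    """Return (tolerated failure domains, storage overhead in percent)."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    extra_copies = replicas - 1
    # Each extra copy survives one more domain failure and costs
    # one more full copy of the data.
    return extra_copies, extra_copies * 100


for replicas in (1, 2, 3):
    tolerated, overhead = replication_profile(replicas)
    print(f"{replicas} replica(s): tolerates {tolerated}, overhead {overhead}%")
```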
Erasure coding modes
| Number of failure domains | Redundancy mode | How many failure domains can fail without data loss | Storage overhead | Comment |
|---|---|---|---|---|
| 1 | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 2 | 1+1 | 1 | 100% | Stored data will not have redundancy in case of single domain failure |
| | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 3 | 1+2 | 2 | 200% | Not enough failure domains to provide high availability in case of single domain failure |
| | 2+1 | 1 | 50% | Stored data will not have redundancy in case of single domain failure |
| | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 4 | 1+2 | 2 | 200% | Recommended |
| | 2+2 | 2 | 100% | Not enough failure domains to provide high availability in case of single domain failure |
| | 3+1 | 1 | 33% | Stored data will not have redundancy in case of single domain failure |
| | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 5 | 2+2 | 2 | 100% | Recommended |
| | 3+2 | 2 | 67% | Not enough failure domains to provide high availability in case of single domain failure |
| 6 | 3+2 | 2 | 67% | Recommended |
| | 4+2 | 2 | 50% | Not enough failure domains to provide high availability in case of single domain failure |
| 7 | 4+2 | 2 | 50% | Recommended |
| | 5+2 | 2 | 40% | Not enough failure domains to provide high availability in case of single domain failure |
| 8 | 5+2 | 2 | 40% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| 9 | 6+2 | 2 | 33% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 5+3 | 3 | 60% | Recommended for high reliability |
| 10 | 7+2 | 2 | 29% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 6+3 | 3 | 50% | Recommended for high reliability |
| 11 | 8+2 | 2 | 25% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 7+3 | 3 | 43% | Recommended for high reliability |
| 12 | 9+2 | 2 | 22% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 8+3 | 3 | 38% | Recommended for high reliability |
| 13 | 10+2 | 2 | 20% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 9+3 | 3 | 33% | Recommended for high reliability |
| 14 | 11+2 | 2 | 18% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 10+3 | 3 | 30% | Recommended for high reliability |
| 15 | 12+2 | 2 | 17% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 11+3 | 3 | 27% | Recommended for high reliability |
| 16 | 13+2 | 2 | 15% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 12+3 | 3 | 25% | Recommended for high reliability |
| 17 | 14+2 | 2 | 14% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 13+3 | 3 | 23% | Recommended for high reliability |
| 18 | 15+2 | 2 | 13% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 14+3 | 3 | 21% | Recommended for high reliability |
| 19 | 16+2 | 2 | 13% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 15+3 | 3 | 20% | Recommended for high reliability |
| 20 | 17+2 | 2 | 12% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 16+3 | 3 | 19% | Recommended for high reliability |
| 20+ | 17+2 | 2 | 12% | Recommended for low overhead |
| | 4+2 | 2 | 50% | Recommended for high performance |
| | 16+3 | 3 | 19% | Recommended for high reliability |
You can also create custom erasure coding modes by specifying the number of data fragments and parity fragments used for recovery.
The 1+0, 1+1, 1+2, and 3+1 encoding modes are meant for small clusters that have too few nodes for other erasure coding modes but will grow in the future. Because the redundancy type cannot be changed once chosen (from replication to erasure coding or vice versa), these modes allow you to choose erasure coding even if your cluster is smaller than recommended. Once the cluster has grown, you can choose a more beneficial erasure coding mode.
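For any M+N mode, predefined or custom, the tolerated failures and storage overhead can be estimated the same way the erasure coding table does: N failure domains can fail, and the overhead is N/M. The sketch below uses a hypothetical function name and rounds halves up, which is an assumption chosen because it reproduces the table values (for example, 13% for 16+2).

```python
def erasure_coding_profile(data: int, parity: int) -> tuple[int, int]:
    """Return (tolerated failure domains, storage overhead in percent)
    for an M+N mode with `data` = M and `parity` = N."""
    if data < 1 or parity < 0:
        raise ValueError("need data >= 1 and parity >= 0")
    # Overhead is parity/data as a percentage, rounded half up using
    # integer arithmetic so that 12.5 becomes 13 rather than 12.
    overhead = (200 * parity + data) // (2 * data)
    return parity, overhead


for data, parity in ((3, 2), (7, 2), (16, 2), (5, 3)):
    tolerated, overhead = erasure_coding_profile(data, parity)
    print(f"{data}+{parity}: tolerates {tolerated}, overhead {overhead}%")
```

The comparison also shows why erasure coding is attractive on larger clusters: 7+2 tolerates the same two failures as 3 replicas at 29% overhead instead of 200%.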
Cluster behavior during failures
By default, all encoding modes, except 1+0 and M+1, allow write operations when one failure domain (for example, a storage node or disk) is inaccessible. The cluster switches to read-only mode, with write operations disabled, in the following cases:
- When redundancy is 1 (M+1 encoding mode) and one failure domain is inaccessible.
- When redundancy is 2 (M+2 encoding mode) and two failure domains are inaccessible.
If the number of unavailable failure domains is higher than the redundancy factor, data becomes unavailable even for reading and there is a high risk of data loss. Therefore, for production, it is strongly recommended to use redundancy modes with a redundancy factor of 2 or 3, such as the M+2 and M+3 encoding modes and 3 replicas.
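The rules above can be summarized as a lookup from the redundancy factor and the number of inaccessible failure domains to the expected cluster state. This is a sketch with a hypothetical function name and state labels, not the product's actual logic. It covers only the cases this section states and returns "unspecified" for the rest, such as an M+3 mode with two or three inaccessible domains.

```python
def cluster_state(parity: int, failed: int) -> str:
    """Expected state of an M+N erasure-coded cluster.

    `parity` is N (the redundancy factor), `failed` is the number of
    inaccessible failure domains.
    """
    if parity < 0 or failed < 0:
        raise ValueError("parity and failed must be non-negative")
    if failed == 0:
        return "read-write"   # healthy cluster
    if failed > parity:
        return "unavailable"  # more failures than the redundancy factor
    if failed == parity and parity in (1, 2):
        return "read-only"    # the two cases listed in the text
    if failed == 1:
        return "read-write"   # modes other than 1+0 and M+1 keep writing
    return "unspecified"      # not covered by this section


for parity, failed in ((1, 1), (2, 1), (2, 2), (2, 3), (3, 2)):
    print(f"M+{parity}, {failed} inaccessible: {cluster_state(parity, failed)}")
```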