Redundancy modes

Virtuozzo Hybrid Infrastructure supports a number of modes for each redundancy method. You select a data redundancy mode when configuring storage services and creating storage volumes for virtual machines.

Regardless of the selected mode, it is strongly recommended to configure protection against the simultaneous failure of two nodes, as such failures are common in real-life scenarios.

Failure domain requirements

The tables below list redundancy modes predefined for different numbers of failure domains in a storage tier. As a general rule, the cluster should include at least one more failure domain than the chosen redundancy mode requires. For example, a cluster using replication with 3 replicas and the host failure domain should have four nodes, and a cluster using the 7+2 erasure coding mode with the disk failure domain should have ten disks. This additional failure domain provides the following advantages:

  • Improved fault tolerance: The cluster remains protected against additional failures while in a degraded state. Without a spare domain, the cluster may not survive even a single additional disk failure without data loss.
  • Simplified maintenance: You can safely perform maintenance on cluster nodes, such as installing software updates, without compromising redundancy or risking data unavailability.
  • Cluster self-healing: A cluster with a spare node usually has enough resources to rebuild itself after a failure. With the host failure domain but without a spare node, user data remains safe if one or two nodes fail, but the cluster will not start self-healing until the failed nodes are back online. During rebuilding, the cluster remains exposed to additional failures until all of its nodes are healthy again.
  • Simplified node replacement and upgrades: Nodes can be replaced or upgraded without adding new nodes to the cluster. A graceful release of a storage node is possible only if the remaining nodes in the cluster still meet the redundancy requirements. You can, however, release a node forcibly, without data migration, but this will degrade the cluster and trigger self-healing.

The minimum and recommended cluster configurations are described in Quantity of servers.
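
To make the sizing rule concrete, the following Python sketch computes the minimum and recommended number of failure domains for a given mode and reproduces the examples above. It is illustrative only, not part of any Virtuozzo tooling; the function name min_domains is hypothetical.

```python
def min_domains(mode: str) -> tuple[int, int]:
    """Return (minimum, recommended) failure domains for a mode.

    'mode' is either 'N replicas' or an 'M+K' erasure coding scheme.
    """
    if "replica" in mode:
        required = int(mode.split()[0])            # N replicas need N domains
    else:
        data, parity = map(int, mode.split("+"))   # M+K needs M+K domains
        required = data + parity
    return required, required + 1                  # +1 spare domain

for mode in ("3 replicas", "7+2"):
    minimum, recommended = min_domains(mode)
    print(f"{mode}: minimum {minimum}, recommended {recommended}")
# 3 replicas: minimum 3, recommended 4
# 7+2: minimum 9, recommended 10
```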

Replication modes

| Number of failure domains | Redundancy mode | How many failure domains can fail without data loss | Storage overhead | Comment |
|---|---|---|---|---|
| 1 | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 2 | 2 replicas | 1 | 100% | Stored data will not have redundancy in case of a single domain failure |
| 2 | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 3 | 3 replicas | 2 | 200% | Not enough failure domains to provide high availability in case of a single domain failure |
| 3 | 2 replicas | 1 | 100% | Recommended |
| 3 | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 4+ | 3 replicas | 2 | 200% | Recommended for high reliability |
| 4+ | 2 replicas | 1 | 100% | Recommended for low overhead |
| 4+ | 1 replica (no redundancy) | 0 | 0% | Stored data has no redundancy |
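
The figures in this table follow directly from the replica count: N replicas tolerate N - 1 failed domains and store N full copies of the data, so the overhead relative to a single copy is (N - 1) × 100%. A minimal Python sketch of this arithmetic (illustrative only, not a product API):

```python
def replication_properties(replicas: int) -> tuple[int, int]:
    """Failures tolerated and storage overhead for N replicas."""
    failures_tolerated = replicas - 1
    overhead_percent = (replicas - 1) * 100    # extra copies beyond the first
    return failures_tolerated, overhead_percent

for n in (1, 2, 3):
    tolerated, overhead = replication_properties(n)
    print(f"{n} replica(s): tolerates {tolerated} failed domain(s), {overhead}% overhead")
# 1 replica(s): tolerates 0 failed domain(s), 0% overhead
# 2 replica(s): tolerates 1 failed domain(s), 100% overhead
# 3 replica(s): tolerates 2 failed domain(s), 200% overhead
```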

Erasure coding modes

| Number of failure domains | Redundancy mode | How many failure domains can fail without data loss | Storage overhead | Comment |
|---|---|---|---|---|
| 1 | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 2 | 1+1 | 1 | 100% | Stored data will not have redundancy in case of a single domain failure |
| 2 | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 3 | 1+2 | 2 | 200% | Not enough failure domains to provide high availability in case of a single domain failure |
| 3 | 2+1 | 1 | 50% | Stored data will not have redundancy in case of a single domain failure |
| 3 | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 4 | 1+2 | 2 | 200% | Recommended |
| 4 | 2+2 | 2 | 100% | Not enough failure domains to provide high availability in case of a single domain failure |
| 4 | 3+1 | 1 | 33% | Stored data will not have redundancy in case of a single domain failure |
| 4 | 1+0 (no redundancy) | 0 | 0% | Stored data has no redundancy |
| 5 | 2+2 | 2 | 100% | Recommended |
| 5 | 3+2 | 2 | 67% | Not enough failure domains to provide high availability in case of a single domain failure |
| 6 | 3+2 | 2 | 67% | Recommended |
| 6 | 4+2 | 2 | 50% | Not enough failure domains to provide high availability in case of a single domain failure |
| 7 | 4+2 | 2 | 50% | Recommended |
| 7 | 5+2 | 2 | 40% | Not enough failure domains to provide high availability in case of a single domain failure |
| 8 | 5+2 | 2 | 40% | Recommended for low overhead |
| 8 | 4+2 | 2 | 50% | Recommended for high performance |
| 9 | 6+2 | 2 | 33% | Recommended for low overhead |
| 9 | 4+2 | 2 | 50% | Recommended for high performance |
| 9 | 5+3 | 3 | 60% | Recommended for high reliability |
| 10 | 7+2 | 2 | 29% | Recommended for low overhead |
| 10 | 4+2 | 2 | 50% | Recommended for high performance |
| 10 | 6+3 | 3 | 50% | Recommended for high reliability |
| 11 | 8+2 | 2 | 25% | Recommended for low overhead |
| 11 | 4+2 | 2 | 50% | Recommended for high performance |
| 11 | 7+3 | 3 | 43% | Recommended for high reliability |
| 12 | 9+2 | 2 | 22% | Recommended for low overhead |
| 12 | 4+2 | 2 | 50% | Recommended for high performance |
| 12 | 8+3 | 3 | 38% | Recommended for high reliability |
| 13 | 10+2 | 2 | 20% | Recommended for low overhead |
| 13 | 4+2 | 2 | 50% | Recommended for high performance |
| 13 | 9+3 | 3 | 33% | Recommended for high reliability |
| 14 | 11+2 | 2 | 18% | Recommended for low overhead |
| 14 | 4+2 | 2 | 50% | Recommended for high performance |
| 14 | 10+3 | 3 | 30% | Recommended for high reliability |
| 15 | 12+2 | 2 | 17% | Recommended for low overhead |
| 15 | 4+2 | 2 | 50% | Recommended for high performance |
| 15 | 11+3 | 3 | 27% | Recommended for high reliability |
| 16 | 13+2 | 2 | 15% | Recommended for low overhead |
| 16 | 4+2 | 2 | 50% | Recommended for high performance |
| 16 | 12+3 | 3 | 25% | Recommended for high reliability |
| 17 | 14+2 | 2 | 14% | Recommended for low overhead |
| 17 | 4+2 | 2 | 50% | Recommended for high performance |
| 17 | 13+3 | 3 | 23% | Recommended for high reliability |
| 18 | 15+2 | 2 | 13% | Recommended for low overhead |
| 18 | 4+2 | 2 | 50% | Recommended for high performance |
| 18 | 14+3 | 3 | 21% | Recommended for high reliability |
| 19 | 16+2 | 2 | 13% | Recommended for low overhead |
| 19 | 4+2 | 2 | 50% | Recommended for high performance |
| 19 | 15+3 | 3 | 20% | Recommended for high reliability |
| 20+ | 17+2 | 2 | 12% | Recommended for low overhead |
| 20+ | 4+2 | 2 | 50% | Recommended for high performance |
| 20+ | 16+3 | 3 | 19% | Recommended for high reliability |

You can also create custom erasure coding modes by specifying the number of data fragments and parity fragments used for recovery.
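
A custom mode's key properties follow from the same arithmetic as the predefined modes in the table above: an M+K scheme splits data into M data fragments plus K parity fragments, tolerates K failed domains, and has a storage overhead of K/M. A minimal Python sketch (illustrative only, not a product API):

```python
def ec_properties(data: int, parity: int) -> tuple[int, int]:
    """Failures tolerated and storage overhead for an M+K mode."""
    failures_tolerated = parity                    # K parity fragments
    overhead_percent = round(parity / data * 100)  # K/M extra storage
    return failures_tolerated, overhead_percent

for m, k in ((2, 1), (3, 2), (7, 2), (5, 3), (17, 2)):
    tolerated, overhead = ec_properties(m, k)
    print(f"{m}+{k}: tolerates {tolerated} failed domain(s), {overhead}% overhead")
# 2+1: tolerates 1 failed domain(s), 50% overhead
# 3+2: tolerates 2 failed domain(s), 67% overhead
# 7+2: tolerates 2 failed domain(s), 29% overhead
# 5+3: tolerates 3 failed domain(s), 60% overhead
# 17+2: tolerates 2 failed domain(s), 12% overhead
```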

The 1+0, 1+1, 1+2, and 3+1 encoding modes are meant for small clusters that do not yet have enough nodes for other erasure coding modes but will grow in the future. Because the redundancy type cannot be changed once chosen (from replication to erasure coding or vice versa), these modes allow you to choose erasure coding even if your cluster is smaller than recommended. Once the cluster has grown, you can choose a more beneficial erasure coding mode.

Cluster behavior during failures

By default, all encoding modes except 1+0 and M+1 allow write operations when one failure domain (for example, a storage node or disk) is inaccessible. The cluster switches to read-only mode, with write operations disabled, in the following cases:

  • When redundancy is 1 (M+1 encoding mode) and one failure domain is inaccessible.
  • When redundancy is 2 (M+2 encoding mode) and two failure domains are inaccessible.

If the number of unavailable failure domains exceeds the redundancy factor, data becomes unavailable even for reading, and there is a high risk of data loss. Therefore, for production, it is strongly recommended to use redundancy modes with a redundancy factor of 2 or 3, such as M+2 encoding, M+3 encoding, or 3 replicas.
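
This behavior can be summarized as a simple decision rule for an M+K mode: writes stay enabled while fewer than K failure domains are down, the cluster becomes read-only when exactly K are down, and data becomes unavailable once more than K are down. The Python sketch below is an illustrative model of that rule, not Virtuozzo's actual state machine; it also assumes M+3 behaves analogously at three failures, which the text above states explicitly only for redundancy 1 and 2.

```python
def cluster_state(parity: int, failed_domains: int) -> str:
    """State of an M+K cluster with the given number of failed domains.

    'parity' is K, the redundancy factor of the encoding mode.
    """
    if failed_domains > parity:
        return "data unavailable, high risk of data loss"
    if failed_domains == parity and parity > 0:
        return "read-only"     # just enough fragments left to read
    return "read-write"        # spare redundancy remains

for parity in (0, 1, 2):
    for failed in range(parity + 2):
        print(f"M+{parity}, {failed} failed: {cluster_state(parity, failed)}")
# M+0, 0 failed: read-write
# M+0, 1 failed: data unavailable, high risk of data loss
# M+1, 0 failed: read-write
# M+1, 1 failed: read-only
# M+2, 2 failed: read-only
# M+2, 3 failed: data unavailable, high risk of data loss
```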