Redundancy by replication

With replication, Virtuozzo Hybrid Infrastructure breaks the incoming data stream into 256 MB chunks. Each chunk is replicated and replicas are stored on different failure domains, so that each failure domain has only one replica of a given chunk.
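The chunking and placement rule described above can be pictured with a minimal sketch. It is not the product's placement algorithm: the function names, the host names, and the round-robin choice of failure domains are assumptions made for illustration. Only the 256 MB chunk size and the "one replica per failure domain" rule come from the text.

```python
CHUNK_SIZE = 256 * 1024 * 1024  # 256 MB, as stated in the text


def split_into_chunks(total_bytes, chunk_size=CHUNK_SIZE):
    """Return the sizes of the chunks that a stream of total_bytes is cut into."""
    full, rest = divmod(total_bytes, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


def place_replicas(chunk_index, domains, replicas):
    """Pick `replicas` distinct failure domains for one chunk.

    Round-robin selection is a hypothetical policy used only for illustration.
    """
    if replicas > len(domains):
        raise ValueError("not enough failure domains for the requested replicas")
    start = chunk_index % len(domains)
    return [domains[(start + i) % len(domains)] for i in range(replicas)]


# A 600 MB stream stored with 2 replicas and the host failure domain.
sizes = split_into_chunks(600 * 1024 * 1024)
hosts = ["host1", "host2", "host3"]  # hypothetical host names
placement = [place_replicas(i, hosts, 2) for i in range(len(sizes))]
```

In this sketch, the 600 MB stream yields two full chunks and one 88 MB chunk, and no host ever holds both replicas of the same chunk.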

The following diagram illustrates the 2 replicas redundancy mode with the host failure domain.

Replication in Virtuozzo Hybrid Infrastructure is similar to the RAID rebuild process, but has two key differences:

  • Replication in Virtuozzo Hybrid Infrastructure is much faster than a typical online RAID 1/5/10 rebuild, because Virtuozzo Hybrid Infrastructure replicates chunks in parallel to multiple failure domains.
  • The more storage nodes are in a cluster, the faster the cluster will recover from a disk or node failure.

High replication performance minimizes the periods of reduced redundancy for the cluster. Replication performance is affected by:

  • The number of available storage nodes. As the replication runs in parallel, the more available replication sources and destinations there are, the faster it is.
  • Performance of storage node disks.
  • Network performance. All replicas are transferred between failure domains over the network. For example, 1 Gbps throughput can be a bottleneck (refer to Network requirements and recommendations).
  • Distribution of data in the cluster. Some storage nodes may have much more data to replicate than others and may become overloaded during replication.
  • I/O activity in the cluster during replication.
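To get a feel for how the first three factors interact, consider a rough back-of-envelope estimate. This is a deliberately simplified model and not a formula used by Virtuozzo Hybrid Infrastructure: it assumes that data is evenly distributed, that every surviving node receives replicas at the slower of its disk and network speeds, and that there is no other I/O in the cluster. The function name and the sample figures are hypothetical.

```python
def estimate_recovery_seconds(data_bytes, surviving_nodes, disk_bps, network_bps):
    """Rough lower bound on the time needed to re-create lost replicas.

    Simplifying assumptions (not product behavior): even data distribution,
    an idle cluster, and per-node throughput limited by the slower of the
    disk and the network. Rates are in bytes per second.
    """
    if surviving_nodes < 1:
        raise ValueError("at least one surviving node is required")
    per_node_bps = min(disk_bps, network_bps)
    return data_bytes / (surviving_nodes * per_node_bps)


LOST_DATA = 2 * 10**12      # 2 TB of replicas to re-create (sample figure)
DISK_BPS = 200_000_000      # 200 MB/s per node (sample figure)
NET_1GBPS = 125_000_000     # 1 Gbps expressed in bytes per second
NET_10GBPS = 1_250_000_000  # 10 Gbps expressed in bytes per second

t_small = estimate_recovery_seconds(LOST_DATA, 4, DISK_BPS, NET_1GBPS)
t_large = estimate_recovery_seconds(LOST_DATA, 10, DISK_BPS, NET_1GBPS)
t_fast_net = estimate_recovery_seconds(LOST_DATA, 4, DISK_BPS, NET_10GBPS)
```

Under these assumptions, growing the cluster from 4 to 10 surviving nodes shortens the estimate from 4000 to 1600 seconds, and replacing the 1 Gbps network with 10 Gbps moves the bottleneck from the network to the disks. Real recovery times also depend on data distribution and concurrent I/O, which this model ignores.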

When you configure replication, you set the normal number of replicas per data chunk. Replication has another parameter, the minimum number of replicas, which by default equals the normal number of replicas minus one. The table below briefly describes these and some other cluster parameters.

Parameter Description
Replication parameters
Normal replicas

The number of replicas to create for a data chunk, from 1 to 3.

When a new data chunk is created, Virtuozzo Hybrid Infrastructure automatically replicates it until the normal number of replicas is reached.

The recommended value is 3 replicas.

Minimum replicas (optional)

The minimum number of replicas for a data chunk, from 1 to 3.

During the life cycle of a data chunk, the number of its replicas may vary. If many chunk servers go down, it may fall below the defined minimum. In this case, all write operations to the affected replicas are suspended until their number reaches the minimum value.

The recommended value is 2 replicas.

Location parameters
Failure domain

A placement policy for replicas: disk, host (default), rack, row, or room (refer to Failure domains).

Tier

Storage tiers, from 0 (default) to 3 (refer to Storage tiers).
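The way these parameters relate to each other can be summarized in a short sketch. It is an illustrative model, not a Virtuozzo Hybrid Infrastructure API: the ReplicationPolicy class and its method names are hypothetical. The value ranges and defaults follow the table, while two rules are assumptions: the default minimum is clamped to 1 when only one replica is configured, and the minimum may not exceed the normal number of replicas.

```python
from dataclasses import dataclass
from typing import Optional

FAILURE_DOMAINS = ("disk", "host", "rack", "row", "room")


@dataclass
class ReplicationPolicy:
    """Hypothetical container for the parameters listed in the table."""
    normal_replicas: int = 3            # recommended value
    min_replicas: Optional[int] = None  # optional; derived when omitted
    failure_domain: str = "host"        # default placement policy
    tier: int = 0                       # default storage tier

    def __post_init__(self):
        if not 1 <= self.normal_replicas <= 3:
            raise ValueError("normal replicas must be from 1 to 3")
        if self.min_replicas is None:
            # Default: normal minus one. Clamping to 1 is an assumption.
            self.min_replicas = max(1, self.normal_replicas - 1)
        # Assumption: the minimum cannot exceed the normal number.
        if not 1 <= self.min_replicas <= self.normal_replicas:
            raise ValueError("minimum replicas must be from 1 to the normal number")
        if self.failure_domain not in FAILURE_DOMAINS:
            raise ValueError("unknown failure domain")
        if not 0 <= self.tier <= 3:
            raise ValueError("tier must be from 0 to 3")

    def writes_allowed(self, available_replicas):
        """Writes are suspended while the chunk has fewer than the minimum replicas."""
        return available_replicas >= self.min_replicas

    def needs_replication(self, available_replicas):
        """The chunk is re-replicated until the normal number is reached."""
        return available_replicas < self.normal_replicas


policy = ReplicationPolicy()  # 3 replicas, host failure domain, tier 0
```

With the defaults, a chunk that has lost one of its three replicas remains writable while it is being re-replicated. If it loses two, writes are suspended until a second replica is restored.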

For production, it is recommended to use the replication mode with 3 replicas.