Storage cluster best practices

Using similar hardware

All cluster nodes must have identical or very similar hardware: CPUs, amount of RAM, network cards, storage devices, controllers, and so on. Otherwise, the cluster will be imbalanced in terms of performance, because overall cluster performance is limited by the slowest node in the cluster.

Moreover, it is strongly recommended that:

  • All cluster nodes have the same number of disks in each storage tier.
  • All disks assigned to the same storage tier and role are identical in technology and size.

For example, mixing HDDs of different models or speeds in the same storage tier leads to unpredictable performance, because the tier's speed is constrained by its slowest device. Likewise, with HDDs of different sizes, the physical storage space is used inefficiently, leaving resources unused.

Using different hardware can also be a problem for reasons unrelated to performance. For example, using different CPUs may block migration of virtual machines between nodes.

Using the same software version

We recommend using the same software version on all cluster nodes, to avoid both performance issues and issues that may occur during maintenance operations such as adding new nodes.

Using different storage tiers for different performance goals

Storage tiers allow you to create groups of disks. As a best practice, it is preferable to group storage devices based on their technology, and group data based on its performance goals. For example, separating data with high-priority access from data with low-priority access helps to optimize data access performance for both of these data types. Generally, it is also advised to separate hot data from cold data, and replicated data from encoded data.

Keep in mind that switching from replication to erasure coding may degrade storage performance. We therefore recommend choosing the redundancy method in advance, to avoid having to change it later.

A typical scenario of using storage tiers is to use disks of different technology as capacity devices in the same cluster, for example, HDDs and SSDs. Also note that faster drives should be assigned to higher storage tiers. For details on storage tiers, refer to Storage tiers.

Enabling NVMe performance

Enable NVMe performance to boost the performance of very fast devices such as NVMes. For details on enabling and configuring this feature, refer to Configuring NVMe performance.

External caching

If your cluster has HDD disks, some workloads might benefit from an additional caching layer of fast devices, such as SSDs or NVMes. Keep in mind that such a configuration is optimal only for certain workloads, but when it applies, the performance gain fully justifies the added cost. For details on storage cache configuration, refer to Cache configuration.

Enabling RDMA

Enabling RDMA reduces network latency and improves overall throughput, especially with random workloads.

To enable RDMA, every storage node in your cluster must be equipped with RDMA-capable network cards, and the network switch must support RDMA.

Note that RDMA must be enabled before the storage cluster is created. For details on enabling RDMA, refer to Enabling RDMA.

Using jumbo frames

If your cluster uses 10 Gbit/s or faster network adapters, configure jumbo frames (9000-byte MTU) on both the storage network switch ports and the node network interfaces, to achieve full performance.

To test if jumbo frames are working correctly, ping all other node interfaces in the storage network from each node:

# ping -s 8972 -M do <HOST>
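The 8972-byte payload is not arbitrary: it is the 9000-byte MTU minus the 20-byte IPv4 header and the 8-byte ICMP header, and `-M do` forbids fragmentation, so the ping succeeds only if jumbo frames work end to end. A quick sanity check of that arithmetic:

```shell
# Jumbo frame ping payload: the MTU minus the IPv4 header (20 bytes)
# and the ICMP header (8 bytes).
mtu=9000
payload=$((mtu - 20 - 8))
echo "$payload"   # prints 8972
```

If the ping fails with a "message too long" error, at least one interface or switch port on the path is not configured for jumbo frames.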

Choosing the cluster size, redundancy, and network bandwidth

When choosing between building one large cluster and multiple small clusters with the same total number of nodes, assuming similar network latency between nodes and no other size limits reached (such as the limit on the number of files or chunks), a single large cluster is always preferable. The larger the cluster, the better its performance, efficiency, and redundancy: a large cluster can afford to lose more nodes while still healing itself automatically, avoiding cluster degradation, and using more efficient erasure coding schemes.

When choosing an erasure coding scheme, you need to consider multiple factors. With the same number of parity chunks, a higher number of data chunks increases storage efficiency, but decreases system reliability and performance.
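The efficiency side of this trade-off is simple to quantify: under a k+m scheme, k data chunks out of k+m total chunks hold usable data. The scheme names below are generic examples, not product recommendations:

```shell
# Storage efficiency of a k+m erasure coding scheme, in percent
# (rounded down): k data chunks out of k+m total chunks are usable data.
efficiency() {
  k=$1
  m=$2
  echo "$((100 * k / (k + m)))"
}

efficiency 3 2    # prints 60  (3+2 scheme: 60% of raw space is usable)
efficiency 17 3   # prints 85  (17+3 scheme: 85% usable, but wider stripes)
```

With the same two parity chunks, moving from 3 to 17 data chunks raises efficiency from 60% to 85%, at the cost of wider stripes that involve more nodes per operation.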

The chosen redundancy scheme also affects the number of failure domains in a cluster. The general recommendation is to always plan at least one more failure domain in a cluster than required.

To calculate the optimal number of storage disks per node, treat network bandwidth as the upper limit on the storage bandwidth each node can provide. The table below shows the maximum number of devices each node can host before network bandwidth becomes a bottleneck.

Network bandwidth    Maximum number of disks per node, by device speed
                     100 MB/s    300 MB/s    1000 MB/s
10 GbE                     12           4            1
2x 10 GbE                  21           7            2
25 GbE                     31          10            3
2x 25 GbE                  53          17            5
50 GbE                     62          20            6
2x 50 GbE                 106          35           10
100 GbE                   125          41           12
2x 100 GbE                212          70           21
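The table values can be approximated from first principles: the nominal link speed converted to MB/s (1 Gbit/s is 125 MB/s) divided by the per-device speed, rounded down. The dual-link rows additionally appear to assume roughly 85% of the combined nominal bandwidth, presumably to account for bonding overhead; that factor is an inference from the table above, not a documented constant. A sketch:

```shell
# Rough per-node disk limit derived from network bandwidth.
# Assumption: bonded dual links deliver ~85% of their combined nominal
# bandwidth (inferred from the table above, not an official figure).
max_disks() {
  gbit=$1       # nominal speed of one link, in Gbit/s
  links=$2      # number of links (1 or 2)
  disk_mbps=$3  # sustained speed of one storage device, in MB/s
  bw=$((gbit * links * 125))                  # nominal bandwidth in MB/s
  [ "$links" -gt 1 ] && bw=$((bw * 85 / 100)) # derate bonded links
  echo "$((bw / disk_mbps))"
}

max_disks 25 1 300    # prints 10  (25 GbE, 300 MB/s devices)
max_disks 25 2 300    # prints 17  (2x 25 GbE, 300 MB/s devices)
```

Use such an estimate only for initial sizing; real throughput also depends on the redundancy scheme and protocol overhead.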

Depending on the capabilities of your network switch, its total available bandwidth can become saturated once a certain number of nodes is reached, preventing further scaling. Refer to your network switch specifications to determine whether this can become a bottleneck.

Separating internal and external networks

Network bandwidth is used not only to deliver data services to network clients, but also for internode communication. Moreover, different data redundancy schemes have different network overhead. For example, with 3-replica replication, each write request from an application triggers two additional internal writes, so serving a given amount of write traffic requires at least twice that amount of internal network bandwidth.
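A back-of-the-envelope sketch of that overhead, with an example client write load (the 1000 MB/s figure is illustrative, not a sizing recommendation):

```shell
# Internal network traffic generated by replicated writes:
# each client write must be forwarded to (replicas - 1) other nodes.
replicas=3
client_write_mbps=1000   # example client write load, in MB/s
internal_mbps=$(( (replicas - 1) * client_write_mbps ))
echo "$internal_mbps"    # prints 2000
```

In this example, 1000 MB/s of client writes consume at least 2000 MB/s of internal network bandwidth on top of the external traffic.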

Separating internal and external data networks helps to increase available bandwidth and avoid bottlenecks.

Virtual machines performance

When trying to optimize performance of virtual machines, consider the following:

  • VirtIO disks typically perform better than disks on the default SCSI bus. Note that VirtIO disks must be thick-provisioned.
  • Snapshots have an impact on performance. If you do not need snapshots, use volumes without them, to improve performance.