2.4. Cache Configuration

2.4.1. Supported Device Types

Currently supported drives include HDD, SSD, and NVMe devices. Their characteristics are described in the table below.

Type                               | Cost    | Performance                                      | Interface and form-factor
Hard disk drives (HDD)             | Low     | Up to 200 MB/s, tens to hundreds of IOPS         | SAS or SATA
Solid-state drives (SSD)           | Average | Up to 600 MB/s, tens of thousands of IOPS        | SAS or SATA
Non-volatile memory express (NVMe) | High    | From 1 to 10 GB/s, hundreds of thousands of IOPS | 2.5” U.2, PCIe Add-In-Card (AIC), or M.2

Note

PMem or NVRAM devices are not officially supported.

The number and type of cache devices supported in your cluster should be checked for each cluster node. To be of any use, devices that provide acceleration must be faster than the underlying devices they accelerate.

Note

Cache devices configured in RAID1 mirroring are not officially supported.

It is recommended that all capacity devices in the same storage tier be identical in terms of technology and size. Otherwise, performance and behavior in case of a hardware failure may be unpredictable. Moreover, all cluster nodes should offer the same amount of storage; if this requirement is not met, the usable storage space in the cluster will be limited by the smallest node.

A similar recommendation applies to cache devices. As the write speed is constrained by the slowest device in the cluster, we strongly recommend using cache devices of the same technology and size.
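
To illustrate the point about the smallest node, the following sketch estimates usable raw capacity when data is distributed evenly across nodes. The node sizes are hypothetical, and the calculation deliberately ignores replication and erasure-coding overhead.

```python
# Rough illustration with hypothetical node sizes (in TB). With data placed
# evenly across nodes, the usable raw capacity is bounded by the smallest node.
node_capacities_tb = [16, 16, 16, 8]  # one node is smaller than the rest

total_raw_tb = sum(node_capacities_tb)                           # 56 TB installed
balanced_tb = min(node_capacities_tb) * len(node_capacities_tb)  # 32 TB usable

print(f"Installed raw capacity: {total_raw_tb} TB")
print(f"Usable with even data distribution: {balanced_tb} TB")
```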

2.4.2. Choosing a Cache Device

As all the data ingested in the system goes through cache devices, the choice of a cache device should be based not only on speed, but also on device endurance. Device endurance is measured in two ways:

  • Drive Writes per Day (DWPD) measures how many times the device can be completely overwritten each day of its expected lifetime (usually five years).

  • Terabytes Written (TBW) measures the expected amount of data that can be written before the device fails.

Both parameters are equivalent and should be carefully evaluated. For example, if you have a 1-TB flash drive rated for 1 DWPD, you can write 1 TB to it every day over its lifetime. If its warranty period is five years, that works out to 1 TB per day * 365 days/year * 5 years = 1825 TB of cumulative writes, after which the drive will usually have to be replaced. Thus, the drive’s TBW rating is 1825.
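
The arithmetic in this example can be expressed as a small helper; the capacity, DWPD rating, and warranty period below are just the values from the example, not properties of any particular product.

```python
def tbw_from_dwpd(capacity_tb: float, dwpd: float, warranty_years: float = 5) -> float:
    """Convert a DWPD rating into the equivalent Terabytes Written (TBW)."""
    return capacity_tb * dwpd * 365 * warranty_years

# A 1 TB drive rated for 1 DWPD over a 5-year warranty, as in the example above.
print(tbw_from_dwpd(capacity_tb=1, dwpd=1))  # 1825 (TB)
```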

The DWPD of a typical consumer-grade SSD drive can be as low as 0.1, while a high-end datacenter-grade flash drive can have up to 60 DWPD. For a cache device, the recommended minimum is 10 DWPD.

Another parameter to consider is the device’s power loss protection. Some consumer-grade flash drives are known to silently ignore data flushing requests, which may lead to data loss in case of a power outage. Examples of such drives include OCZ Vertex 3, Intel 520, Intel X25-E, and Intel X25-M G2. We recommend avoiding these drives (or testing them with the vstorage-hwflush-check tool) and using enterprise-grade or datacenter-grade devices instead.

2.4.3. Provisioning Cache Devices

The minimum number of cache devices per node is one. Note, however, that with a single cache device, if caching is used for all capacity devices, that device becomes a single point of failure whose loss may make the entire node unavailable. To avoid this, at least three cache devices per node are recommended.

Using multiple cache devices also provides the following improvements:

  • More capacity. This can be helpful if data is written in long bursts or if the cache cannot offload data to the underlying devices fast enough.

  • Performance boost. If there is enough parallelism on the client side, the workload can be split among several cache devices, thus increasing the overall throughput.

  • High availability. With fewer capacity devices per cache device, or with RAID mirroring, you can lower the probability of downtime and reduce its impact.

We recommend provisioning one cache device for every 4-12 capacity devices. Keep in mind that the speed of a cache device should be at least twice that of the underlying capacity devices combined; otherwise, the cache device may become a performance bottleneck. Even in that case, however, caching can still improve latency, and even performance in systems with lower parallelism.
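
The two rules of thumb above (4 to 12 capacity devices per cache device, and a cache device at least twice as fast as its attached capacity devices combined) can be checked with a sketch like the one below. The device counts and throughput figures are illustrative assumptions, not measurements of any particular hardware.

```python
def check_cache_provisioning(capacity_per_cache: int,
                             capacity_dev_mbps: float,
                             cache_dev_mbps: float) -> list[str]:
    """Sanity-check one cache device and its attached capacity devices
    against the provisioning guidelines above."""
    warnings = []
    if not 4 <= capacity_per_cache <= 12:
        warnings.append(f"{capacity_per_cache} capacity devices per cache device "
                        "(recommended range is 4-12)")
    combined_mbps = capacity_per_cache * capacity_dev_mbps
    if cache_dev_mbps < 2 * combined_mbps:
        warnings.append(f"cache device ({cache_dev_mbps:.0f} MB/s) is not at least twice "
                        f"as fast as its capacity devices combined ({combined_mbps:.0f} MB/s); "
                        "it may become a bottleneck")
    return warnings

# Example: one SATA SSD (~550 MB/s) in front of 14 HDDs (~200 MB/s each)
# violates both guidelines and prints both warnings.
for warning in check_cache_provisioning(capacity_per_cache=14,
                                        capacity_dev_mbps=200,
                                        cache_dev_mbps=550):
    print("warning:", warning)
```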

2.4.4. Journal Sizing

Regardless of the cache device size, its journal size can vary depending on the available space and the number of chunk services that share the cache device. In some scenarios, using a journal smaller than the available capacity improves performance.

On one hand, if the total size of all journals is less than the amount of available RAM, the journals will only be used to store metadata. This allows the system to keep the journals in RAM, avoiding all reads from the journal and resulting in fewer I/O operations. Ultimately, this reduces the load on the cache devices and may improve the overall performance.

On the other hand, if the total size of all journals is more than the amount of available RAM, the journals will also be used to store temporary data and will serve as a read and write cache. This boosts the performance of both read and write requests. However, in this case, the cache device should be at least twice as fast as all of the underlying capacity devices combined to benefit the overall performance; if it is not, it is preferable to have a smaller journal. As performance also depends heavily on the workload, the right choice may not be obvious.
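
The distinction above boils down to comparing the total journal size with the RAM available to the storage services. The sketch below only illustrates that comparison; the sizes are assumptions for the example, and the actual journal behavior is determined by the storage services themselves.

```python
def journal_mode(journal_sizes_gb: list[float], available_ram_gb: float) -> str:
    """Classify expected journal usage using the rule of thumb above."""
    total_gb = sum(journal_sizes_gb)
    if total_gb <= available_ram_gb:
        return ("metadata only: journals fit in RAM, so journal reads are avoided "
                "and the load on cache devices is reduced")
    return ("read/write cache: journals exceed RAM and also hold temporary data; "
            "the cache device should be at least twice as fast as the capacity "
            "devices behind it")

# Example: four 16 GB journals on a node with 128 GB of RAM.
print(journal_mode([16, 16, 16, 16], available_ram_gb=128))
```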

2.4.5. Cache Sizing

To decide on a cache device size, consider the endurance factor of a particular device and its journal size.

If you use cache for user data, then the cache device should be able to withstand sustained high throughput for as long as needed without filling up. The cache must offload its contents to the underlying device periodically, and this process depends on the speed of the underlying device. If the cache device becomes full, the system performance will degrade to the speed of the underlying devices, thus negating the caching benefits. Therefore, if the expected workload comes in bursts of a certain duration (for example, during office hours), the cache should be able to store at least the amount of data written during that period of time.
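
As a worked example of the last point, the sketch below estimates the minimum cache size needed to absorb a write burst while the cache is simultaneously draining to the capacity devices. The ingest rate, drain rate, and burst duration are hypothetical figures chosen only to illustrate the arithmetic.

```python
def min_cache_size_gb(ingest_mbps: float, drain_mbps: float, burst_hours: float) -> float:
    """Data that accumulates in the cache while writes arrive faster than they drain."""
    backlog_mbps = max(ingest_mbps - drain_mbps, 0)
    return backlog_mbps * burst_hours * 3600 / 1024  # MB/s * seconds -> GB

# Example: 600 MB/s of incoming writes for 8 office hours, while the underlying
# capacity devices drain the cache at 450 MB/s.
print(min_cache_size_gb(ingest_mbps=600, drain_mbps=450, burst_hours=8))  # 4218.75 GB
```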

Risks and Possible Failures

Though cache devices may significantly improve cluster performance, you need to take their possible failures into account. Flash devices generally have a shorter lifespan than capacity devices, and their use as a cache exposes them to greater wear.

Also, keep in mind that as one cache device can be used to store multiple journals, all capacity devices associated with a cache device will become unavailable if this cache device fails.

Consider the following possible issues when using cache devices:

  • Data loss. A cache device failure may lead to data loss if the data has no replicas or RAID mirroring is not configured.

  • Performance degradation. If a cache device fails, the system will use other devices for storing data, which may result in a performance bottleneck or trigger the data rebalancing process to restore the data redundancy. This, in turn, will lead to increased disk and network usage and reduce the cluster performance.

  • Low availability. With a failed cache device, data redundancy may be degraded, which may result in a read-only or unreadable cluster in severe cases.

  • Less capacity. If a cache device fails, several capacity devices may become unavailable, leading to a lack of disk space available for writing new data.

To prevent these issues, use optimal redundancy policies and multiple cache devices in your system. Additionally, consider using local replication (for example, RAID1) on top of distributed replication, especially in systems with low replication factors (1 replica or 1+0 encoding).