Storage cache architecture
The terms "cache" and "journal" are sometimes used interchangeably. In the storage cluster, however, "cache" refers to a fast hardware device (for example, SSD- or NVMe-based) that stores the chunk service journal. A journal, in turn, is a buffer that is used by the chunk service and stored on a cache device. Because multiple chunk services can share the same cache device, one cache can contain multiple journals.
As such, caching does not count as an additional storage tier in the cluster. Instead, each cache device can be associated with multiple chunk services that are assigned to different tiers, and it stores their data journals.
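The relationship described above can be sketched as a simple data model. This is an illustrative sketch only, not product code; the class and attribute names (CacheDevice, add_journal) are assumptions chosen for clarity:

```python
# Illustrative model: one fast cache device holding the journals of
# several chunk services, which may belong to different storage tiers.

class CacheDevice:
    """A fast device (e.g., SSD or NVMe) that stores chunk service journals."""

    def __init__(self, name):
        self.name = name
        self.journals = {}  # chunk service ID -> journal metadata

    def add_journal(self, cs_id, tier, size_mb=256):
        # Each chunk service gets its own journal on this shared device.
        self.journals[cs_id] = {"tier": tier, "size_mb": size_mb}


ssd = CacheDevice("nvme0")
ssd.add_journal("cs-1", tier=0)  # chunk services from different tiers...
ssd.add_journal("cs-2", tier=1)  # ...can share the same cache device
print(len(ssd.journals))         # -> 2
```

Note that the device itself is tier-agnostic: tier membership belongs to the chunk services, which is why caching is not an extra tier.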
By default, the chunk service stores its journal on the same device as its data. This configuration is called "inner cache." To use a fast cache device instead, the chunk service must be configured to use an "external cache."
If you use the "inner cache" configuration, it is recommended to keep the default journal size of 256 MB.
Read and write behavior
In the storage cluster, the cache is mainly used for writing data: when new data is ingested into the system, it is temporarily stored in the cache. Because a cache device is faster than a capacity device, writing data to the cache improves performance. For a certain amount of time, data exists only in the cache, along with remote replicas on other cluster nodes if remote replication is configured. During this time, all read operations also hit the cache and benefit from the same performance boost. Once the cache is reclaimed and the data is removed from it, all subsequent read operations are redirected to a capacity device.
The journal is used as a ring buffer: it stores data until space must be reclaimed to make room for new data. When this happens, data is offloaded to a capacity device in first-in, first-out (FIFO) order.
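The write path, read path, and FIFO reclaim described above can be sketched as follows. This is a minimal illustration of the technique, not product code; the Journal class and its methods are hypothetical names, and the dictionary standing in for the capacity device is an assumption:

```python
from collections import deque

class Journal:
    """A journal modeled as a ring buffer with FIFO offload to capacity."""

    def __init__(self, capacity_entries):
        self.buffer = deque()                # FIFO order: oldest entry on the left
        self.capacity_entries = capacity_entries
        self.capacity_device = {}            # stand-in for the slow capacity device

    def write(self, key, data):
        # Reclaim space first if the journal is full: offload the oldest entry.
        if len(self.buffer) == self.capacity_entries:
            old_key, old_data = self.buffer.popleft()
            self.capacity_device[old_key] = old_data
        self.buffer.append((key, data))      # new data lands in the fast cache

    def read(self, key):
        for k, v in self.buffer:             # fast path: data still in the cache
            if k == key:
                return v
        return self.capacity_device.get(key) # cache reclaimed: read from capacity


j = Journal(capacity_entries=2)
j.write("a", "A")
j.write("b", "B")
j.write("c", "C")            # journal full: "a" is offloaded FIFO-first
print(j.read("c"))           # -> "C", served from the cache
print(j.read("a"))           # -> "A", redirected to the capacity device
```

The key point the sketch captures is that reclaiming the journal is transparent to readers: the same read call is served from the cache while the data is hot and from the capacity device afterward.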
Caching benefits
Caching significantly improves write speed and write latency at only a slight increase in system cost. Systems with cache can combine the high capacity of low-cost, low-performance hard disk drives (HDD) with fast write access provided by flash devices, such as solid-state drives (SSD) or NVMe devices. Although cache devices cost more than capacity devices, only a few of them are needed, so the overall increase in system cost is generally low. Moreover, the performance boost often justifies the upgrade.
You can benefit from using cache in the following scenarios:
- "Hot" data storage
- Random writes
- Smaller block sizes or smaller files
- Databases and environments with several clients/threads
On the other hand, scenarios that usually benefit little from caching include:
- "Cold" data storage
- Constant throughput workloads, such as video surveillance recording
- Sequential writes of very large files
- Read-intensive workloads
In these cases, you can use an all-HDD solution, which provides the same performance at a lower cost, or an all-flash solution if your aim is to increase performance.