2. Recommendations for benchmarking

Fio allows you to test storage using random and sequential workloads.

Random workloads indicate how databases, mail or web servers, and other similar software may perform when deployed inside VMs located on your storage. Such workloads typically use small blocks of data, e.g., 4K. HDDs are usually not very efficient at random reads and writes, because their read-and-write heads have to spend a lot of time seeking across the disk. SSDs, in turn, erase data in large blocks (often around 1M), so modifying even 4K of data can force the drive to rewrite an entire erase block. The available random read/write job profiles are set to use 4K blocks of data.
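
For illustration, a minimal fio job profile for the 4K random scenario might look like the sketch below. The directory, file size, and queue depth are assumptions chosen for the example, not values mandated by this guide:

    [global]
    ioengine=libaio
    direct=1
    bs=4k
    iodepth=32
    size=32G
    directory=/mnt/storage
    time_based
    runtime=120
    group_reporting

    [randread-4k]
    rw=randread

    [randwrite-4k]
    ; stonewall delays this job until the random reads above have finished
    stonewall
    rw=randwrite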

Sequential workloads give you an idea of the performance of writing large files to disk: backup creation, migration of VM images, and the like. Such job profiles are set to use 1M blocks of data.
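
A matching sketch for the 1M sequential scenario, under the same illustrative assumptions, might be:

    [global]
    ioengine=libaio
    direct=1
    bs=1M
    iodepth=8
    size=32G
    directory=/mnt/storage
    time_based
    runtime=120
    group_reporting

    [seqread-1m]
    rw=read

    [seqwrite-1m]
    ; run the sequential writes only after the reads have finished
    stonewall
    rw=write

Either file can be run as-is with: fio <file>.fio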

Other considerations to take into account:

  • It is recommended to have a cluster made of identical nodes. In this case, a single fio job profile will be suitable for benchmarking all of the nodes. If your cluster nodes have different hardware, you will need to create and use as many fio job profiles as there are node hardware configurations.

  • The dataset needs to be at least double the size of the node’s RAM. This ensures the dataset cannot fit completely into RAM and be served from the cache, allowing you to measure the disks’ actual read performance (see the sketch after this list).

  • To improve the validity of results, run 3-5 iterations of the same test.

  • Each test should run for at least 60 seconds; 120-300 seconds is recommended. Longer runs outlast the fast cache on some SSD drives, so the results reflect sustained rather than burst performance.
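
As a sketch of how the dataset-size and duration recommendations map onto fio options (the 64G RAM figure, warm-up period, and paths are illustrative assumptions):

    [global]
    ioengine=libaio
    direct=1
    directory=/mnt/storage
    ; Assuming a node with 64G of RAM: 4 jobs x 32G files = 128G dataset,
    ; twice the RAM, so reads cannot be served entirely from the page cache
    numjobs=4
    size=32G
    ; Run for a fixed 180 seconds, within the recommended 120-300 s range
    time_based
    runtime=180
    ; Optional warm-up period excluded from the reported results
    ramp_time=10
    group_reporting

    [randread-4k]
    rw=randread
    bs=4k
    iodepth=32

The recommended 3-5 iterations can then be obtained by simply re-running the same job file and comparing the results.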

Three main scenarios are usually tested:

  • Reads: 1M sequential and 4K random.

  • Writes: 1M sequential and 4K random.

  • Expand: sequential writes to a file that gradually increases in size, as with backups, database files, VM disks, etc. (see the sketch below).
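
For example, the expand scenario might be sketched as follows; the file path and target size are illustrative assumptions:

    [expand]
    ioengine=libaio
    direct=1
    rw=write
    bs=1M
    ; The file starts empty and grows to 64G as the sequential write proceeds
    filename=/mnt/storage/expand-test
    size=64G
    ; fsync at the end so the growing file's data and metadata are flushed
    end_fsync=1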