Performance issues and symptoms

Disk IOPS saturation

A common root cause of performance issues is hitting the disk throughput limit. To understand whether a disk has reached its limit, check the state of its I/O queue. If the queue is constantly full, the disk is operating at its peak performance capacity. To investigate the state of the I/O queue for all disks, use the following command:

# iostat -x 10

The command output will be similar to this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.08    0.00    1.11    0.03    0.03   94.76

Device:  rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s ... %util
sda        0.00    8.30  0.00  3.20   0.00  46.40 ...  0.20
sdb        0.00    0.00  0.00  0.00   0.00   0.00 ...  0.00
sdc        0.00    0.00  0.00  0.00   0.00   0.00 ...  0.00
scd0       0.00    0.00  0.00  0.00   0.00   0.00 ...  0.00

You need to pay attention to the following metrics:

  • %iowait is the percentage of time the CPU was idle while requests were waiting in the I/O queue. If this value is well above zero, this might mean that the node I/O is constrained by the disk speed.
  • %idle is the percentage of time the CPU was idle while there were no requests in the I/O queue. If this value is close to zero, this means that the node I/O is constrained by the CPU speed.
  • %util is the percentage of time the device I/O queue was not empty. If this value is close to 100%, this means that the disk throughput is reaching its limit.
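
For example, to watch a single suspect disk and suppress the CPU report, you can limit iostat to that device (sda, the 10-second interval, and the report count are only example values):

# iostat -dx sda 10 6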

iSCSI LUNs performance

iSCSI LUNs served via Virtuozzo Hybrid Infrastructure may deliver reduced performance, especially when data is accessed by a high number of threads. In this case, it is generally preferable to split the load across multiple smaller iSCSI LUNs or, if possible, to avoid iSCSI by accessing storage devices directly. If the workload requires a single large volume and splitting it at the client side is not possible, you can consider deploying an LVM RAID0 (striped) group over multiple smaller LUNs instead, as sketched below.
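
As a minimal sketch, assuming four smaller LUNs are already attached to the client as /dev/sdb through /dev/sde (hypothetical device names), a striped volume can be assembled with standard LVM tools; adjust the stripe count (-i) and stripe size (-I, in KB) to your workload:

# pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
# vgcreate vg_iscsi /dev/sdb /dev/sdc /dev/sdd /dev/sde
# lvcreate --type striped -i 4 -I 64 -l 100%FREE -n lv_striped vg_iscsi
# mkfs.xfs /dev/vg_iscsi/lv_striped

The resulting logical volume spreads I/O across all four LUNs, which typically scales better under a high number of threads than a single large LUN.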

Journal size and location

If your cluster was originally deployed with an older product version, the journal configuration may not be optimal. Specifically, if a journal is configured as “inner cache,” that is, stored on the same physical device as the data, its recommended size is 256 MB.

Also, consider moving the journals to a faster device such as an SSD or NVMe, if it is applicable to your workload. For more details on storage cache configuration, refer to Cache configuration.

Make sure the journal settings are the same for all journals in the same tier.

To check the current size and location of journals, run the following command and specify the desired disk:

# vinfra node disk show <DISK>

For example:

# vinfra node disk show sdc

The command output will be similar to this:

| service_params | journal_data_size: 270532608
|                | journal_disk_id: 5EE99654-4D5E-4E00-8AF6-7E83244C5E6B
|                | journal_path: /vstorage/94aebe68/journal/journal-cs-b8efe751-96a6ff460b80
|                | journal_type: inner_cache

The journal size, location, and type are reported in the journal_data_size, journal_path, and journal_type fields, respectively.
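
To compare the journal settings of several disks at once, for example, to verify that all journals in a tier are configured identically, you can wrap the same command in a simple shell loop and filter for the journal fields (the disk names below are placeholders):

# for disk in sda sdb sdc; do echo "== $disk =="; vinfra node disk show $disk | grep journal; done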

RAM and swap usage

Though the system may operate at nearly 100 percent RAM consumption, it should not aggressively swap memory pages to and from disk.

You can check the state of the swap space by running:

# free -hm
            total      used      free      shared  buff/cache   available
Mem:         7,7G      2,2G      498M        3,7G        5,0G        1,5G
Swap:        3,9G      352M      3,6G

In the command output, the Swap row shows the total, used, and free swap space. If the total size is zero, swapping is disabled, which may be done intentionally in some configurations.
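
To see which devices or files back the swap space, and with what priorities, you can also inspect /proc/swaps:

# cat /proc/swaps

Each line lists a swap area with its type, size, current usage, and priority.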

You can also check the use of swap space by running:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  1 244208  10312   1552  62636    4   23    98   249   44  304 28  3 68  1  0
 0  2 244920   6852   1844  67284    0  544  5248   544  236 1655  4  6  0 90  0
 1  2 256556   7468   1892  69356    0 3404  6048  3448  290 2604  5 12  0 83  0
 0  2 263832   8416   1952  71028    0 3788  2792  3788  140 2926 12 14  0 74  0
 0  3 274492   7704   1964  73064    0 4444  2812  5840  295 4201  8 22  0 69  0

In the command output, the si and so columns show the amount of memory swapped in from and out to disk, in KB/s. Swap usage may be considered acceptable as long as it is well below the device maximum throughput, that is, as long as it does not interfere with the device performance.
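
If swapping is constant, it may also help to identify which processes hold the most swapped memory. On reasonably recent kernels, a rough way to do this is to read the VmSwap field from /proc (a sketch only; the field is absent on very old kernels):

# grep VmSwap /proc/[0-9]*/status 2>/dev/null | sort -t: -k3 -rn | head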

S3 service performance

S3 load balancing

We recommend using load balancing at all times. The only scenario that does not benefit from load balancing is when there is a single client. For recommendations on setting up load balancing, refer to the Administrator Guide.

S3 gateways

By default, the S3 service runs with four S3 gateways per node. However, you can increase the number of S3 gateways to improve the overall performance if you notice the following signs:

  • The CPU usage of the S3 gateway is near 100 percent.
  • The latency of the S3 service is very high (for example, an average latency of more than two seconds).
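
To get a rough view of per-process CPU usage on a node, you can sort processes by CPU and filter by name; the exact S3 gateway process name depends on the product version, so the "s3" filter below is only an assumption:

# ps -eo pid,pcpu,comm --sort=-pcpu | grep -i s3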

You can change the number of S3 gateways per node by using one of the following commands:

  • To set the number of S3 gateways on all nodes in the S3 cluster:

    vinfra service s3 cluster change --s3gw-count <count>
    --s3gw-count <count>
    Number of S3 gateways per node

    For example, to increase the number of S3 gateways per node to 5, run:

    # vinfra service s3 cluster change --s3gw-count 5
  • To set the number of S3 gateways on a particular S3 node:

    vinfra service s3 node change --nodes <node_id> --s3gw-count <count>
    --nodes <node_id>
    A comma-separated list of node hostnames or IDs
    --s3gw-count <count>
    Number of S3 gateways

    For example, to reduce the number of S3 gateways on the node node003 to 3, run:

    # vinfra service s3 node change --nodes node003 --s3gw-count 3

When removing a gateway, keep in mind that all connections to this gateway will be dropped, and any ongoing operations may result in a connection error on the client side.