7.1. Checking Data Flushing

Before creating the cluster, it is recommended that you check that all storage devices (hard disk drives, solid-state drives, RAIDs, etc.) you plan to include in your cluster can successfully flush data to disk when the server power goes off unexpectedly. Doing so will help you detect devices that may lose data stored in their cache in the event of a power failure.

Virtuozzo Storage ships with a special tool, vstorage-hwflush-check, for checking how a storage device flushes data to disk in an emergency such as a power outage. The tool is implemented as a client/server utility:

  • Client. The client continuously writes blocks of data to the storage device. Each time a data block is written, the client increments a counter and sends it to the server, which stores it.

  • Server. The server keeps track of the incoming counters from the client so that it always knows the counter number the client will send next. If the server receives a counter that is less than the one it has already stored (e.g., because the power was turned off and the storage device did not flush the cached data to disk), the server reports an error.
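The server-side rule described above can be pictured with a small toy model. This is only a sketch of the comparison, not the actual vstorage-hwflush-check implementation; the function name server_check and its arguments are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Toy model of the server-side check (NOT the real tool's code).
# server_check STORED RECEIVED
#   STORED   - highest counter the server has recorded so far
#   RECEIVED - counter just reported by the client
# Prints the counter the server keeps afterwards.
# Returns 1 if the client reported a counter lower than the stored one.
server_check() {
    stored=$1
    received=$2
    if [ "$received" -lt "$stored" ]; then
        # The client went backwards: data the server knew about is gone.
        echo "error: received $received, but server already has $stored" >&2
        echo "$stored"
        return 1
    fi
    echo "$received"
}
```

For example, `server_check 41 42` accepts the new counter, while `server_check 42 17` reports an error because the client has fallen behind what the server already recorded.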

To check that a storage device can successfully flush data to disk when the power fails, follow the procedure below:

On the server side:

  1. On a computer running Virtuozzo Hybrid Server 7, install the vstorage-hwflush-check tool. This tool is part of the vstorage-ctl package and can be installed with this command:

    # yum install vstorage-ctl
    
  2. Run the vstorage-hwflush-check server:

    # vstorage-hwflush-check -l
    

On the client side:

  1. On the computer hosting a storage device you want to check, install the vstorage-hwflush-check tool:

    # yum install vstorage-ctl
    
  2. Run the vstorage-hwflush-check client, for example:

    # vstorage-hwflush-check -s vstorage1.example.com -d /vstorage/stor1-ssd/test -t 50
    

    Where

    • -s vstorage1.example.com is the hostname of the computer where the vstorage-hwflush-check server is running.

    • -d /vstorage/stor1-ssd/test defines the directory to use for testing data flushing. During its execution, the client creates a file in this directory and writes data blocks to it.

    • -t 50 sets the number of threads for the client to write data to disk. Each thread has its own file and counter. You can increase the number of threads (max. 200) to test your system in more stressful conditions.

    You can also specify other options when running the client. For more information on available options, see the vstorage-hwflush-check man page.

  3. Wait at least 10-15 seconds, power off the computer where the client is running, and then turn it on again.

    Note

    The Reset button does not turn off the power, so you need to press the Power button or pull out the power cord to switch off the computer.

  4. Restart the client by executing the same command you used to run it for the first time:

    # vstorage-hwflush-check -s vstorage1.example.com -d /vstorage/stor1-ssd/test -t 50
    

Once launched, the client reads all written data, determines the version of data on the disk, and then restarts the test from the last valid counter. It then sends this valid counter to the server, and the server compares it with the latest counter it has. You may see output like:

id<N>:<counter_on_disk> -> <counter_on_server>

which means one of the following:

  • If the counter on disk is lower than the counter on server, it means that the storage device has failed to flush the data to disk. Avoid using this storage device in production, especially for chunk servers (CS) or journals, as you risk losing data.

  • If the counter on disk is higher than the counter on server, it means that the storage device has flushed the data to disk but the client has failed to report it to the server. The network may be too slow or the storage device may be too fast for the configured number of load threads, so consider increasing the number of threads. This storage device can be used in production.

  • If both counters are equal, it means the storage device has flushed the data to disk and the client has reported it to the server. This storage device can be used in production.
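The three cases above can be applied mechanically to each output line. The following is a minimal sketch, assuming the line has exactly the form id<N>:<counter_on_disk> -> <counter_on_server> with single spaces around the arrow. The function name classify_line and the verdict strings are hypothetical, not part of the tool.

```shell
#!/bin/sh
# classify_line 'id<N>:<counter_on_disk> -> <counter_on_server>'
# Hypothetical helper: prints a verdict for one output line.
# Assumes the exact layout shown above; other layouts are not handled.
classify_line() {
    line=$1
    counters=${line#*:}      # drop "id<N>:", leaving "<disk> -> <server>"
    disk=${counters%% *}     # text before the first space
    server=${counters##* }   # text after the last space
    if [ "$disk" -lt "$server" ]; then
        echo "FAIL: device did not flush data to disk"
    elif [ "$disk" -gt "$server" ]; then
        echo "OK: data flushed, last report did not reach the server"
    else
        echo "OK: data flushed and reported"
    fi
}
```

With made-up counters, `classify_line 'id0:100 -> 120'` gives the FAIL verdict, while `classify_line 'id0:120 -> 100'` and `classify_line 'id0:120 -> 120'` both give an OK verdict.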

To be on the safe side, repeat the procedure several times. Once you have checked your first storage device, continue with the remaining ones. You need to test all devices you plan to use in the cluster: SSDs used for client caching and CS journaling, disks used for MDS journals, and disks used for chunk servers.