8.1. Appendix A - Troubleshooting

This chapter describes common issues you may encounter when working with Storage clusters and ways to resolve them. The main tool you use to solve cluster problems and detect hardware failures is vstorage top.

8.1.1. Submitting Problem Reports to Technical Support

If your cluster is experiencing a problem that you cannot solve on your own, you can use the vstorage make-report command to compile a detailed report about the cluster. You can then send the report to the support team, who will closely examine your problem and do their best to solve it as quickly as possible.

To generate a report:

Configure passwordless SSH access for the root user from the server where you plan to run the vstorage make-report command to all servers that participate in the cluster.

The easiest way to do this is to create an SSH key with ssh-keygen and use ssh-copy-id to configure all servers to trust this key. For details, see the man pages for ssh-keygen and ssh-copy-id.
Run the vstorage make-report command to compile the report:
```
# vstorage -c stor1 make-report
The report is created and saved to vstorage-report-20121023-90630.tgz
```
The command collects cluster-related information on all servers participating in the cluster and saves it to a file in your current working directory. You can find the exact file name in the vstorage output (vstorage-report-20121023-90630.tgz in the example above).

If necessary, you can save the report to a file with a custom name and in a custom location. To do this, pass the -f option to the command and specify the desired file name (with the .tgz extension) and location, for example:
```
# vstorage -c stor1 make-report -f /home/reportSTOR1.tgz
```

Once it is ready, submit the report to the support team.

Note

The report contains only information related to your cluster settings and configuration. It does not contain any private information.
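The passwordless SSH access required before running vstorage make-report can be prepared and checked with a short script like the one below. This is a minimal sketch, not part of the vstorage tooling: the function names, the SSH_CMD override (there only to make the script testable), the key path ~/.ssh/id_rsa, and the report naming scheme are all assumptions. It relies only on the standard OpenSSH tools (ssh-keygen, ssh-copy-id, ssh) and on the -f option of vstorage make-report described above.

```shell
#!/usr/bin/env bash
# Sketch: prepare passwordless root SSH to all cluster nodes, verify it,
# then compile a cluster report under a custom name.
# Function names, key path, and naming scheme are hypothetical.

# Command used to reach the nodes; overridable (e.g. for dry runs).
SSH_CMD=${SSH_CMD:-ssh}

# Create a key pair once (no passphrase) and install it on every node.
setup_keys() {
  local key="$HOME/.ssh/id_rsa" host
  [ -f "$key" ] || ssh-keygen -t rsa -b 4096 -N "" -f "$key"
  for host in "$@"; do
    ssh-copy-id -i "$key.pub" "root@$host"
  done
}

# Confirm that every node accepts a non-interactive root login.
check_access() {
  local host failed=0
  for host in "$@"; do
    # BatchMode makes ssh fail instead of prompting for a password.
    if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host"
      failed=1
    fi
  done
  return "$failed"
}

# Build a timestamped .tgz path for the report.
report_path() {
  local cluster=$1 dir=${2:-.}
  printf '%s/report-%s-%s.tgz\n' "$dir" "$cluster" "$(date +%Y%m%d-%H%M%S)"
}

# Compile the report using the documented -f option.
make_report() {
  local cluster=$1 dir=${2:-.}
  vstorage -c "$cluster" make-report -f "$(report_path "$cluster" "$dir")"
}
```

For example, running setup_keys node1 node2 node3 and then check_access node1 node2 node3 should print ok for every node before you run make_report stor1 /home. The host names are placeholders.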

8.1.2. Out of Disk Space

When very little free disk space remains in a Storage cluster, it is critically important to increase it as soon as possible by adding more chunk servers or removing some data. As soon as 95% of cluster disk space becomes occupied, the allocation of new data chunks is no longer possible and such requests are blocked until the cluster can satisfy the demand. As a result, user I/O becomes blocked as well, effectively freezing containers and virtual machines.

Note

It is highly recommended to keep at least 10% of disk space free for recovery in case of host machine failures. You should also monitor usage history, for example, using the vstorage top or vstorage get-event commands (for more information, see Monitoring Storage Clusters).
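The thresholds above can be turned into a quick check of used versus total capacity. This is a hedged sketch: it assumes that the 95% allocation limit equals 100 minus mds.alloc.fill_margin (default 5), which is how the two figures in this section relate, and the function name space_status is hypothetical. Pass used and total space in any consistent unit.

```shell
#!/usr/bin/env bash
# Sketch: classify cluster disk usage against the thresholds in this section.
# Assumption: allocation stops at (100 - fill_margin)% usage, i.e. 95% by default.
space_status() {
  local used=$1 total=$2 margin=${3:-5}
  if (( total <= 0 )); then
    echo "error: total must be positive" >&2
    return 2
  fi
  # Work in hundredths of a percent to avoid floating-point arithmetic.
  local used_bp=$(( used * 10000 / total ))
  if (( used_bp >= (100 - margin) * 100 )); then
    echo "blocked"   # new chunk allocation is no longer possible
  elif (( used_bp > 9000 )); then
    echo "low"       # less than the recommended 10% free
  else
    echo "ok"
  fi
}
```

For example, space_status 930 1000 prints low: 93% used leaves less than the recommended 10% free, although allocation is not yet blocked. Passing a third argument models a temporarily reduced allocation reserve, so space_status 960 1000 2 also prints low rather than blocked.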

8.1.2.1. Symptoms

Stuck I/O or unresponsive mount point, dmesg messages about stuck I/O, frozen containers and virtual machines.
vstorage top and vstorage get-event show error messages like “Failed to allocate X replicas at tier Y since only Z chunk servers are available for allocation”.

8.1.2.2. Solutions

Remove any unnecessary data to free disk space.

Note

An additional effect that may be surprising at first is that once the kernel I/O queues are filled with blocked I/O, a mount point on the client machine may stop responding altogether and become unable to service even simple requests such as file listing. In this case, you can create an additional mount point to list, access, and remove the unneeded data.
Add more Chunk Servers on unused disks (see Setting Up Chunk Servers).

If the solutions above are not possible, you can use one of the following temporary workarounds:

Lower the replication factor for some of the least critical user data (see Configuring Replication Parameters). Remember to revert the changes afterwards.
Reduce the allocation reserve. For example, for cluster stor1:
```
# vstorage -c stor1 set-config mds.alloc.fill_margin=2
```
Here, mds.alloc.fill_margin is the percentage of disk space reserved for CS operation needs (the default value is 5). Remember to revert the changes afterwards.

8.1.3. Poor Write Performance

Some network adapters, such as RTL8111/8168B, are known to be unable to deliver full-bandwidth, full-duplex network traffic. This can result in poor write performance.

Before deploying a Storage cluster, it is therefore highly recommended that you test your networks for full-duplex support. You can use the netperf utility to generate incoming and outgoing traffic simultaneously. For example, a 1 GbE network should consistently deliver about 2 Gbit/s of total traffic (1 Gbit/s incoming and 1 Gbit/s outgoing).
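A full-duplex test along these lines runs two netperf streams against one peer at the same time and then compares the combined result with twice the link speed. This is a sketch under assumptions: netserver must already be running on the peer, TCP_STREAM measures outgoing and TCP_MAERTS incoming traffic, and the 90% tolerance and function names are illustrative choices rather than documented values.

```shell
#!/usr/bin/env bash
# Sketch: check whether a link sustains traffic in both directions at once.
# Function names and the default 90% tolerance are hypothetical.

# Run outgoing (TCP_STREAM) and incoming (TCP_MAERTS) tests concurrently.
# Requires netserver on the remote host; read throughput from netperf output.
run_duplex_test() {
  local host=$1 secs=${2:-30}
  netperf -H "$host" -t TCP_STREAM -l "$secs" &
  netperf -H "$host" -t TCP_MAERTS -l "$secs" &
  wait
}

# Compare measured incoming/outgoing throughput (Mbit/s) with 2x link speed.
check_duplex() {
  local rx=$1 tx=$2 link=$3 tolerance=${4:-90}
  local total=$(( rx + tx ))
  local expected=$(( 2 * link ))
  if (( total * 100 >= expected * tolerance )); then
    echo "full-duplex ok: ${total} of ${expected} Mbit/s"
  else
    echo "suspect: ${total} of ${expected} Mbit/s"
    return 1
  fi
}
```

For example, if run_duplex_test shows about 940 Mbit/s in each direction on a 1 GbE link, check_duplex 940 940 1000 reports full duplex; a combined total near 1000 Mbit/s instead suggests the adapter is not delivering full-duplex traffic.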

8.1.4. Poor Disk I/O Performance

In most BIOS setups, AHCI mode is configured by default with the Legacy option enabled. With this option, your servers access SATA devices via the legacy IDE mode, which may degrade cluster performance, making it up to 2 times slower than expected. You can check whether the option is currently enabled by running the hdparm command, for example:

```
# hdparm -i /dev/sda
...
 PIO modes:  pio0 pio1 pio2 pio3 *pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 udma6
```

The asterisk before pio4 in the PIO modes field indicates that your hard drive /dev/sda is currently operating in the legacy mode.

To solve this problem and maximize the cluster performance, always enable the AHCI option in your BIOS settings.
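The asterisk check described above can be scripted so that it runs against each disk. The sketch below is a hedged helper, not an official tool: is_legacy_mode simply looks for the active-mode marker on the "PIO modes:" line of hdparm -i output, following the interpretation given in this section, and check_all_disks with its /dev/sd? glob is a hypothetical convenience.

```shell
#!/usr/bin/env bash
# Sketch: flag disks whose active transfer mode (marked with "*") is a PIO mode.
# Function names and the /dev/sd? glob are hypothetical.

# Reads `hdparm -i` output on stdin; succeeds if "*" appears on the PIO modes line.
is_legacy_mode() {
  grep -Eq '^[[:space:]]*PIO modes:.*\*'
}

# Check every /dev/sdX disk (run as root).
check_all_disks() {
  local dev
  for dev in /dev/sd?; do
    [ -b "$dev" ] || continue
    if hdparm -i "$dev" 2>/dev/null | is_legacy_mode; then
      echo "$dev: legacy IDE mode, enable AHCI in BIOS"
    else
      echo "$dev: no PIO mode marked active"
    fi
  done
}
```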

8.1.5. Hardware RAID Controller and Disk Write Caches

It is important that all hard disk drives obey the “flush” command and write out their caches before the command completes. Unfortunately, not all hardware RAID controllers and drives do this, which may lead to data inconsistencies and file system corruption on a power failure.

Some 3ware RAID controllers do not disable disk write caches and do not send “flush” commands to disk drives. As a result, the file system may sometimes become corrupted even though the RAID controller itself has a battery. To solve this problem, disable write caches on all disk drives in the RAID.
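For drives that the operating system sees directly, the write cache state can be queried and switched off with hdparm -W, as sketched below. This is a hedged example: drives behind a hardware RAID controller are often not visible as individual devices, in which case the controller vendor's own management utility is needed instead, and the hdparm setting may not survive a power cycle on every drive. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: query and disable on-disk write caching with hdparm.
# Function names are hypothetical; run as root.

# Reads `hdparm -W <device>` output on stdin; prints "on" or "off".
write_cache_state() {
  awk '/write-caching/ { print (($3 == 1) ? "on" : "off"); found = 1 }
       END { if (!found) exit 1 }'
}

# Turn off write caching on each given device and report the new state.
disable_write_cache() {
  local dev
  for dev in "$@"; do
    hdparm -W 0 "$dev" >/dev/null
    echo "$dev: write cache $(hdparm -W "$dev" | write_cache_state)"
  done
}
```

For example, disable_write_cache /dev/sdb /dev/sdc should report the write cache as off for both drives (the device names are placeholders).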

Also, make sure that your configuration is thoroughly tested for consistency before deploying a Storage cluster. For information on how you can do this, see SSD Drives Ignore Disk Flushes.

8.1.6. SSD Drives Ignore Disk Flushes

Many desktop-grade SSD drives can ignore disk flushes and mislead the operating system by reporting data as written when it actually was not. Examples of such drives are OCZ Vertex 3, Intel X25-E, and Intel X25-M G2, which are known to be unsafe on data commits. Such drives should not be used with databases and may easily corrupt the file system on a power failure.

The 3rd generation Intel SSD drives (S3700 and 710 series) do not have these problems: they feature a capacitor that provides backup power for flushing the drive cache when the power goes out.

Use SSD drives with care and only when you are sure that they are server-grade drives that obey “flush” rules. For more information on this issue, read this article about PostgreSQL.

8.1.7. Cluster Cannot Create Enough Replicas

Sometimes, the cluster might not create the required number of chunk replicas even if enough chunk servers are present in the cluster.

This may be the case when you create new chunk servers by making copies of an existing chunk server (e.g., you set up a chunk server in a virtual machine and then clone this machine). In this case, all copied chunk servers have the same UUID, that is, the UUID of the original server. As a result, the cluster considers all these chunk servers to be located on the original host and cannot allocate new data chunks.
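Cloned chunk servers can be spotted by comparing the identifiers stored on each node. The sketch below assumes you have already collected one "<host> <host_id>" pair per line, for example by reading /etc/vstorage/host_id on every node over SSH; the function names and host names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report host IDs that appear on more than one node.
# Input on stdin: "<host> <host_id>" pairs, one per line (hypothetical format).
find_duplicate_host_ids() {
  awk '{ hosts[$2] = hosts[$2] " " $1; count[$2]++ }
       END { for (id in count) if (count[id] > 1) print id ":" hosts[id] }'
}

# Hypothetical collection step: read the ID file on each node over SSH.
collect_host_ids() {
  local host
  for host in "$@"; do
    echo "$host $(ssh "root@$host" cat /etc/vstorage/host_id)"
  done
}
```

For example, collect_host_ids node1 node2 node3 | find_duplicate_host_ids prints each shared ID followed by the hosts that use it; no output means all IDs are unique.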

To solve the problem, generate a new UUID for a cloned chunk server by running the following command on the destination host:

```
# /usr/bin/uuidgen -r | tr '-' ' ' | awk '{print $1$2$3}' > /etc/vstorage/host_id
```

For more information on the uuidgen utility, see its man page.
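The one-liner above keeps the first three dash-separated groups of a random UUID (8 + 4 + 4 = 16 hexadecimal digits) and writes them to /etc/vstorage/host_id. Wrapping the transformation in a function makes it easy to check before overwriting the file. The function names and the backup copy are additions for illustration, not part of the documented procedure.

```shell
#!/usr/bin/env bash
# Sketch of the host_id regeneration shown above, split into testable parts.
# Function names and the .bak copy are hypothetical additions.

# Reads a UUID on stdin and prints its first three groups joined together.
host_id_from_uuid() {
  tr '-' ' ' | awk '{print $1$2$3}'
}

# Write a fresh random-based ID, keeping a copy of the previous one.
regenerate_host_id() {
  local file=${1:-/etc/vstorage/host_id}
  if [ -f "$file" ]; then
    cp -p "$file" "$file.bak"
  fi
  /usr/bin/uuidgen -r | host_id_from_uuid > "$file"
}
```

Run regenerate_host_id only on the destination host, that is, on the cloned chunk server.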

8.1.8. Failed Chunk Servers

If a chunk server in your Storage cluster fails, you need to identify the cause of the failure and choose the correct way to solve the problem.

Do the following:

Run the vstorage top command. For example:
```
# vstorage -c stor1 top
```
Press i to cycle to the FLAGS column in the chunk server section and find the flags corresponding to the failed CS.
Look up the displayed flags in the table below to identify the cause of the failure and the way to solve the problem.

| Flag | Issue | What to do |
|---|---|---|
| H | An I/O error. The disk on which the chunk server runs is broken. | Check the disk for errors. If the disk is broken, replace it and recreate the CS as described in Replacing Disks Used as Chunk Servers. Otherwise, contact technical support. |
| h | A chunk checksum mismatch. Either the chunk is corrupt or the disk where the chunk is stored is broken. | Check the disk for errors. If the disk is broken, replace it and recreate the CS as described in Replacing Disks Used as Chunk Servers. Otherwise, contact technical support. |
| S | The CS journal stored on a journaling SSD is not accessible. Either the journal is corrupt or the journaling SSD is broken. | Check the journaling SSD for errors. If the disk is broken, replace it as described in Failed Write Journaling SSDs. |
| R | The path to the chunk repository is invalid on CS start. The disk on which the chunk server runs is not attached or mounted. | Make sure the disk is attached and correctly mounted. Make sure the disk’s entry in `/etc/fstab` is correct. |
| T | An I/O request timeout. The disk may simply be inaccessible for some reason and not necessarily broken. | Make sure the disk is attached and check dmesg output for I/O request timeout messages to find out why the disk might be inaccessible. |
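The flag table above, together with the first checks it recommends, can be condensed into a few helper functions. This is a sketch under stated assumptions: explain_cs_flags assumes the FLAGS column shows one letter per flag; smart_health_ok relies on the overall health lines that smartctl -H prints for ATA and SCSI drives; and io_timeouts_for matches common kernel wording ("timeout", "timed out", "timing out"), which varies between drivers and kernel versions. All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: interpret CS flags from vstorage top and run the first disk checks.
# Function names are hypothetical; flag meanings come from the table above.

# Print the issue behind each flag letter (flags are case-sensitive).
explain_cs_flags() {
  local flags=$1 i c
  for (( i = 0; i < ${#flags}; i++ )); do
    c=${flags:i:1}
    case $c in
      H) echo "H: I/O error, the disk may be broken" ;;
      h) echo "h: chunk checksum mismatch, corrupt chunk or broken disk" ;;
      S) echo "S: CS journal on the journaling SSD is not accessible" ;;
      R) echo "R: invalid chunk repository path, disk not attached or mounted" ;;
      T) echo "T: I/O request timeout, disk may be temporarily inaccessible" ;;
      *) echo "$c: not covered by the table" ;;
    esac
  done
}

# Reads `smartctl -H <device>` output on stdin; succeeds if the drive reports healthy.
smart_health_ok() {
  grep -Eq 'self-assessment test result: PASSED|SMART Health Status: OK'
}

# Reads kernel log text (e.g. dmesg output) on stdin and prints timeout
# messages that mention the given device name as a whole word.
io_timeouts_for() {
  local dev=$1
  grep -iE 'timeout|timed out|timing out' | grep -w -- "$dev"
}
```

For example, explain_cs_flags HT lists both an I/O error and a timeout, smartctl -H /dev/sdb | smart_health_ok && echo healthy checks a single drive, and dmesg | io_timeouts_for sdb lists timeout messages for it (the device names are placeholders).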

8.1.8.1. Replacing Disks Used as Chunk Servers

To replace a broken HDD or SSD disk used as a chunk server, do the following:

Remove the failed CS from the cluster as described in Removing Chunk Servers.

Note

Do not detach the broken disk until you remove the CS.
Replace the broken disk with a new one.
Prepare the new disk as described in Preparing Disks for Storage.
Create a new CS as described in Stage 2: Creating a Chunk Server.

8.1.9. Failed Write Journaling SSDs

If the SSD used to store write journals breaks, all chunk servers that had journals on this SSD will fail. The cluster will continue to work and will create new replicas to make up for those that have been lost. If you need to set up write journals on a replacement SSD, do the following:

Remove the failed chunk servers as described in Removing Chunk Servers.
Prepare the SSD as described in Preparing Disks for Storage.
Create new chunk servers which will keep write journals on the new SSD as described in Using SSD Drives.

8.1.10. Failed MDS Servers

If the disk hosting an MDS server fails, replace it as follows:

Delete the failed MDS server as described in Removing MDS Servers.
Create a new MDS server as described in Adding MDS Servers.

Version 7.5 — Jan 22, 2025
