8.2. Appendix B - Frequently Asked Questions
This appendix lists the most frequently asked questions about Virtuozzo Storage clusters.
Can the /pstorage directory still be used on newer installations?
Yes. In newer installations, /pstorage remains as a symlink to the new /vstorage directory for compatibility purposes.
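As a rough illustration, the following Python sketch reports whether a path is a symlink and where it resolves. The function name describe_path is hypothetical, and the sketch assumes nothing about Virtuozzo Storage beyond the /pstorage path mentioned above.

```python
import os


def describe_path(path: str) -> str:
    """Say whether `path` is a symlink, another existing entry, or missing."""
    if os.path.islink(path):
        # realpath follows the link chain to the final target.
        return f"{path} -> {os.path.realpath(path)}"
    if os.path.exists(path):
        return f"{path} exists and is not a symlink"
    return f"{path} does not exist"


if __name__ == "__main__":
    # On a newer installation this is expected to show a link to /vstorage.
    print(describe_path("/pstorage"))
```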
Do I need to buy additional storage hardware for Virtuozzo Storage?
No. Virtuozzo Storage eliminates the need for external storage devices typically used in SANs by converting locally attached storage from multiple nodes into a shared storage.
What are the hardware requirements for Virtuozzo Storage?
Virtuozzo Storage does not require any special hardware and can run on commodity computers with traditional SATA drives and 1 GbE networks. Some hard drives and RAID controllers, however, ignore the FLUSH command to imitate better performance and must not be used in clusters as this may lead to file system or journal corruption. This is especially true for RAID controllers and SSD drives. Consult your hard drive’s manual to make sure you use reliable hardware.
For more information, see the Virtuozzo 7 Installation Guide.
How many servers do I need to run a Virtuozzo Storage cluster?
You need only one physical server to start using Virtuozzo Storage. However, to provide high availability for your data, it is recommended to configure your cluster to keep at least 3 replicas of each data chunk. This requires at least 3 online servers (and at least 5 servers in total) to be set up in the cluster. For details, see the Virtuozzo 7 Installation Guide and Configuring Replication Parameters.
Can I join Hardware Nodes running different supported operating systems into a single Virtuozzo Storage cluster?
Yes. You can create Virtuozzo Storage clusters of Hardware Nodes running any combination of supported operating systems. For example, you can have metadata servers on Hardware Nodes with Ubuntu 14.04, chunk servers on Hardware Nodes with Red Hat Enterprise Linux 7, and clients on computers with CentOS 7.
The current standalone version of Virtuozzo Storage does not support Virtuozzo.
8.2.2. Scalability and Performance
How many servers can I join to a Virtuozzo Storage cluster?
There is no strict limit on the number of servers you can include in a Virtuozzo Storage cluster. However, it is recommended to limit the servers in the cluster to a single rack to avoid possible performance degradation due to inter-rack communications.
How much disk space can a Virtuozzo Storage cluster have?
A Virtuozzo Storage cluster can support up to 8 PB of effective available disk space, which totals to 24 PB of physical disk space when 3 replicas are kept for each data chunk.
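The relationship between effective and physical space can be sketched in a few lines of Python. This is a simplified model that assumes every chunk is stored in exactly the configured number of replicas and ignores metadata, journal, and file system overhead. The function names are hypothetical.

```python
def raw_capacity_pb(effective_pb: float, replicas: int) -> float:
    """Physical disk space needed to store `effective_pb` of user data."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return effective_pb * replicas


def effective_capacity_pb(raw_pb: float, replicas: int) -> float:
    """User-visible space available from `raw_pb` of physical disk space."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return raw_pb / replicas


if __name__ == "__main__":
    print(raw_capacity_pb(8, 3))         # 8 PB of data kept in 3 replicas
    print(effective_capacity_pb(24, 3))  # the reverse calculation
```

Under this model, 8 PB of user data with 3 replicas needs 24 PB of physical space, matching the figures above.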
Can I add nodes to an existing Virtuozzo Storage cluster?
Yes, you can dynamically add and remove nodes from a Virtuozzo Storage cluster to increase its capacity or to take nodes offline for maintenance. For more information, see Configuring Chunk Servers.
What is the expected performance of a Virtuozzo Storage cluster?
The performance depends on the network speed and the hard disks used in the cluster. In general, the performance should be similar to locally attached storage or better. You can also use SSD caching to increase the performance of commodity hardware by adding SSD drives to the cluster for caching purposes. For more information, see Using SSD Drives.
What performance should I expect on 1-gigabit Ethernet?
The maximum speed of a 1 GbE network is close to that of a single rotational drive. In most workloads, however, random I/O access is prevalent and the network is usually not a bottleneck. Research with large service providers has shown that average I/O performance rarely exceeds 20 MB/sec due to randomization. Virtualization itself introduces additional randomization as multiple independent environments perform I/O access simultaneously. Nevertheless, 10-gigabit Ethernet will often result in better performance and is recommended for use in production.
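For a sense of scale, the sketch below converts link speed to a theoretical line rate and compares it with the 20 MB/sec average quoted above. It uses decimal units and ignores Ethernet, TCP/IP, and storage protocol overhead, so real throughput will be lower. The function name is hypothetical.

```python
AVERAGE_IO_MB_PER_S = 20  # average I/O figure quoted in the answer above


def line_rate_mb_per_s(gigabits_per_s: float) -> float:
    """Theoretical line rate in MB/s: 1 Gbit/s = 1000 Mbit/s = 125 MB/s."""
    return gigabits_per_s * 1000 / 8


if __name__ == "__main__":
    for speed in (1, 10):
        rate = line_rate_mb_per_s(speed)
        headroom = rate / AVERAGE_IO_MB_PER_S
        print(f"{speed} GbE: {rate:.0f} MB/s, {headroom:.2f}x the average I/O rate")
```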
Will the overall cluster performance improve if I add new chunk servers to the cluster?
Yes. Since data is distributed among all hard drives in the cluster, applications performing random I/O experience an increase in IOPS when more drives are added to the cluster. Even a single client machine may get noticeable benefits by increasing the number of chunk servers and achieve performance far beyond traditional, locally attached storage.
Does performance depend on the number of chunk replicas?
Each additional replica degrades write performance by about 10%, but at the same time it may also improve read performance because the Virtuozzo Storage cluster has more options to select a faster server.
How does Virtuozzo Storage protect my data?
Virtuozzo Storage protects against data loss and temporary unavailability by creating data copies (replicas) and storing them on different servers. To provide additional reliability, you can configure Virtuozzo Storage to maintain user data checksums and verify them when necessary.
What happens when a disk is lost or a server becomes unavailable?
Virtuozzo Storage automatically recovers from a degraded state to the specified redundancy level by replicating data on live servers. Users can still access their data during the recovery process.
How fast does Virtuozzo Storage recover from a degraded state?
Since Virtuozzo Storage recovers from a degraded state using all the available hard disks in the cluster, the recovery process is much faster than for traditional, locally attached RAIDs. This makes the reliability of the storage system significantly better as the probability of losing the only remaining copy of data during the recovery period is very small.
Can I change redundancy settings on the fly?
Yes, at any point you can change the number of data copies, and Virtuozzo Storage will apply the new settings by creating new copies or removing unneeded ones. For more details on configuring replication parameters, see Configuring Replication Parameters.
Do I still need to use local RAIDs?
No, Virtuozzo Storage provides the same built-in data redundancy as a mirrored RAID1 array with multiple copies. However, for better sequential performance, you can use a local striping RAID0 exported to your Virtuozzo Storage cluster. For more information on using RAIDs, see Exploring Possible Disk Drive Configurations.
Does Virtuozzo Storage have redundancy levels similar to RAID5?
No. To build a reliable software-based RAID5 system, you also need to use special hardware capabilities like backup power batteries. In the future, Virtuozzo Storage may be enhanced to provide RAID5-level redundancy for read-only data such as backups.
What is the recommended number of data copies?
It is recommended to configure Virtuozzo Storage to maintain 2 or 3 copies, which allows your cluster to survive the simultaneous loss of 1 or 2 hard drives.
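The figures above follow a simple rule of thumb, sketched below: with N copies of each chunk, data survives as long as one copy remains, so up to N - 1 drives can be lost at once. This assumes every copy of a chunk sits on a different drive. The function name is hypothetical.

```python
def tolerated_drive_losses(copies: int) -> int:
    """Simultaneous drive losses that still leave one copy of every chunk."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return copies - 1


if __name__ == "__main__":
    for copies in (2, 3):
        losses = tolerated_drive_losses(copies)
        print(f"{copies} copies: survives the loss of {losses} drive(s)")
```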
8.2.4. Cluster Operation
How do I know that the new replication parameters have been successfully applied to the cluster?
To check whether the replication process is complete, run the vstorage top command, press the V key on your keyboard, and check information in the Chunks field:
- When decreasing the replication parameters, no chunks in the overcommitted or deleting state should be present in the output.
- When increasing the replication parameters, no chunks in the blocked or urgent state should be present in the output. In addition, the overall cluster health should equal 100%.
For details, see Monitoring the Status of Replication Parameters.
How do I shut down a cluster?
To shut down a Virtuozzo Storage cluster:
- Stop all clients.
- Stop all MDS servers.
- Stop all chunk servers.
For details, see Shutting Down Virtuozzo Storage Clusters.
What tool do I use to monitor the status and health of a cluster?
You can monitor the status and health of your cluster using the vstorage top command. For details, see Monitoring Virtuozzo Storage Clusters.
To view the total amount of disk space occupied by all user data in the cluster, run the vstorage top command, press the V key on your keyboard, and look for the FS field in the output. The FS field shows how much disk space is used by all user data in the cluster and in how many files these data are stored. For details, see Understanding Disk Space Usage.
How do I configure a Virtuozzo server for a cluster?
To prepare a server with Virtuozzo for work in a cluster, you simply tell the server to store its Containers and virtual machines in the cluster rather than on its local disk. For details, see Stage 3: Configuring Virtual Machines and Containers.
Why do vmstat/top and vstorage stat show different IO times?
The vstorage and vmstat/top utilities use different methods to compute the percentage of CPU time spent waiting for disk IO (wa% in top, wa in vmstat, and IOWAIT in vstorage). The vmstat and top utilities mark an idle CPU as waiting only if an outstanding IO request was started on that CPU, while the vstorage utility marks all idle CPUs as waiting, regardless of the number of IO requests waiting for IO. As a result, vstorage can report much higher IO values. For example, on a system with 4 CPUs and one thread doing IO, vstorage will report over 90% IOWAIT time, while vmstat and top will show no more than 25% IO time.
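The difference between the two accounting methods can be illustrated with a toy model of the 4-CPU example above. This is only a sketch of the rules as described in this answer, not the actual code of either utility. The Cpu class and both function names are hypothetical, and the model looks at a single instant in which every CPU is idle, so it yields the upper bounds of 25% and 100% rather than measured values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cpu:
    idle: bool
    outstanding_io: int  # IO requests started on this CPU and not yet completed


def per_cpu_iowait(cpus: List[Cpu]) -> float:
    """vmstat/top style: an idle CPU waits only if it started pending IO."""
    waiting = sum(1 for c in cpus if c.idle and c.outstanding_io > 0)
    return 100.0 * waiting / len(cpus)


def all_idle_iowait(cpus: List[Cpu]) -> float:
    """vstorage style: while any IO is pending, every idle CPU counts as waiting."""
    if not any(c.outstanding_io > 0 for c in cpus):
        return 0.0
    waiting = sum(1 for c in cpus if c.idle)
    return 100.0 * waiting / len(cpus)


if __name__ == "__main__":
    # Four idle CPUs; a single thread on the first one has one IO request pending.
    cpus = [Cpu(True, 1), Cpu(True, 0), Cpu(True, 0), Cpu(True, 0)]
    print(f"per-CPU accounting:  {per_cpu_iowait(cpus):.0f}%")
    print(f"all-idle accounting: {all_idle_iowait(cpus):.0f}%")
```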
What effect does tier numbering have on Virtuozzo Storage operation?
When assigning storage to tiers, keep in mind that faster storage drives should be assigned to higher tiers. For example, you can use tier 0 for backups and other cold data (CS without SSD journals), tier 1 for virtual environments, which hold a lot of cold data but need fast random writes (CS with SSD journals), and tier 2 for hot data (CS on SSD), journals, caches, specific virtual machine disks, and such.
This recommendation is related to how Virtuozzo Storage works with storage space. If a storage tier runs out of free space, Virtuozzo Storage will attempt to temporarily use a lower tier. If you add more storage to the original tier later, the data, temporarily stored elsewhere, will be moved to the tier where it should have been stored originally.
For example, if you try to write data to tier 2 and it is full, Virtuozzo Storage will attempt to write that data to tier 1, then to tier 0. If you add more storage to tier 2 later, that data, now stored on tier 1 or 0, will be moved back to tier 2, where it was meant to be stored originally.
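The fallback order described above can be sketched as a small Python function. This is an illustrative model only: the function name pick_tier, the free-space dictionary, and the byte-based check are assumptions, and the actual allocation logic in Virtuozzo Storage is not documented here.

```python
from typing import Dict, Optional


def pick_tier(requested: int, free_bytes: Dict[int, int], size: int) -> Optional[int]:
    """Return the tier that would take a write of `size` bytes.

    Tries the requested tier first, then each lower tier down to tier 0.
    Returns None if no tier at or below the requested one has enough space.
    """
    for tier in range(requested, -1, -1):
        if free_bytes.get(tier, 0) >= size:
            return tier
    return None


if __name__ == "__main__":
    free = {0: 500, 1: 0, 2: 0}        # tiers 2 and 1 are full
    print(pick_tier(2, free, 100))     # the write falls back to tier 0
    free[2] = 1000                     # more storage is added to tier 2
    print(pick_tier(2, free, 100))     # new writes land on tier 2 again
```

Higher tiers are never tried as a fallback in this model, which mirrors the statement that Virtuozzo Storage attempts to use a lower tier when the requested one is full.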