About object storage

Virtuozzo Hybrid Infrastructure allows you to export cluster disk space to customers in the form of S3-compatible object storage.

Virtuozzo Hybrid Infrastructure object storage implements an Amazon S3-compatible API, which is one of the most common object storage APIs. End users can work with Virtuozzo Hybrid Infrastructure the same way they work with Amazon S3: they can keep using their usual S3 applications after migrating data from Amazon S3 to Virtuozzo Hybrid Infrastructure.
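In practice, S3 compatibility means an existing S3 client only needs to be pointed at a different endpoint. A minimal configuration sketch using boto3, the common AWS SDK for Python, is shown below; the endpoint URL and credentials are placeholders, not values from this product:

```python
# Sketch: reusing a standard S3 client with an S3-compatible endpoint.
# "https://s3.example.com" and the credentials are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",  # the cluster's S3 endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# The rest of the application code stays unchanged, for example:
# s3.create_bucket(Bucket="backups")
# s3.put_object(Bucket="backups", Key="db.dump", Body=b"...")
```

The only change compared to working against Amazon S3 is the endpoint URL and the access keys issued by the storage provider.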

Object storage is a storage architecture that manages data as objects (as in a key-value store), as opposed to files in file systems or blocks in block storage. In addition to the data itself, each object carries metadata that describes it, as well as a unique identifier that makes it possible to locate the object in the storage. Object storage is optimized for storing billions of objects, in particular for application storage, static web content hosting, online storage services, big data, and backups. All of these uses are enabled by a combination of very high scalability, data availability, and consistency.
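The key-value model described above can be illustrated with a minimal in-memory sketch (the class and method names here are purely illustrative, not part of any product API):

```python
import uuid


class ObjectStore:
    """Toy object store: each object is data plus descriptive metadata,
    addressed by a unique identifier rather than a file path."""

    def __init__(self):
        self._objects = {}  # identifier -> (data, metadata)

    def put(self, data: bytes, metadata: dict) -> str:
        key = str(uuid.uuid4())           # unique identifier of the object
        self._objects[key] = (data, metadata)
        return key

    def get(self, key: str):
        return self._objects[key]         # looked up by identifier, not path


store = ObjectStore()
key = store.put(b"<html>...</html>", {"content-type": "text/html"})
data, meta = store.get(key)
```

Unlike a file system, there is no directory hierarchy: the identifier returned by `put` is the only handle to the object, and the metadata travels with the data.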

Compared to other types of storage, the key difference of object storage is that parts of an object cannot be modified: if the object changes, a new version of it is created instead. This approach is extremely important for maintaining data availability and consistency. First of all, replacing an object as a whole eliminates update conflicts: the version with the latest timestamp is considered the current one, and that is all. As a result, objects are always consistent, that is, their state is relevant and appropriate.
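The "latest timestamp wins" rule can be sketched as follows (an illustrative in-memory model, with a logical counter standing in for real timestamps):

```python
import itertools


class VersionedStore:
    """Toy versioned store: objects are immutable, so every write appends
    a new version, and the version with the latest timestamp is current."""

    def __init__(self):
        self._versions = {}               # key -> list of (timestamp, data)
        self._clock = itertools.count(1)  # monotonic logical timestamp

    def put(self, key: str, data: bytes) -> int:
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, data))
        return ts

    def get(self, key: str) -> bytes:
        # No version is ever modified in place; readers simply pick the
        # version with the highest timestamp as the current state.
        return max(self._versions[key])[1]


store = VersionedStore()
store.put("report.txt", b"draft")
store.put("report.txt", b"final")
current = store.get("report.txt")   # latest write wins
```

Because nothing is edited in place, two concurrent writers can never leave an object half-updated: each produces a complete version, and the timestamp order alone decides which one readers see.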

Another feature of object storage is eventual consistency. Eventual consistency does not guarantee that a read returns the new state immediately after a write completes. Readers can observe the old state for an undefined period of time, until the write is propagated to all the replicas (copies). This is very important for storage availability, as geographically distant datacenters may not be able to update data synchronously (for example, due to network issues), and a synchronous update itself may be slow, because waiting for acknowledgments from all the data replicas over long distances can take hundreds of milliseconds. Eventual consistency thus hides communication latencies on writes, at the cost of readers possibly observing stale state for a while. Many use cases, however, can easily tolerate this.
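The trade-off can be made concrete with a toy simulation (all names are illustrative): a write is acknowledged as soon as one replica has it, and a background step later propagates it to the others. Until propagation runs, a reader of a lagging replica sees the old state.

```python
class Replica:
    def __init__(self):
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: writes are acknowledged after reaching one replica;
    replication to the remaining replicas happens asynchronously."""

    def __init__(self, n_replicas: int = 3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self._pending = []   # updates not yet applied to all replicas

    def write(self, key: str, value: bytes):
        self.replicas[0].data[key] = value   # acknowledged immediately
        self._pending.append((key, value))   # replication deferred

    def read(self, key: str, replica: int):
        return self.replicas[replica].data.get(key)

    def propagate(self):
        """Background replication catching the other replicas up."""
        for key, value in self._pending:
            for r in self.replicas[1:]:
                r.data[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("obj", b"v1")
stale = store.read("obj", replica=2)   # old state: update not propagated yet
store.propagate()
fresh = store.read("obj", replica=2)   # replicas have converged
```

The write returns without waiting for distant replicas, which is exactly what keeps write latency low; the price is the window in which `read` on replica 2 still returns the old state.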