6.2. Exporting Data via S3

Virtuozzo Storage allows you to export cluster disk space to customers in the form of an S3-like object-based storage.

Object storage in Virtuozzo Storage is exposed through an Amazon S3-like API, which is one of the most common object storage APIs. End users can work with Virtuozzo Storage the same way they work with Amazon S3. You can keep using your usual S3 applications after migrating data from Amazon S3 to Virtuozzo Storage.

Object storage is a storage architecture that manages data as objects (as in a key-value storage) as opposed to files in file systems or blocks in block storage. In addition to the data, each object has metadata that describes it and a unique identifier that allows finding the object in the storage. Object storage is optimized for storing billions of objects, in particular for application storage, static web content hosting, online storage services, big data, and backups. Object storage suits all of these uses thanks to a combination of very high scalability, data availability, and consistency.

Compared to other types of storage, the key difference of object storage is that parts of an object cannot be modified, so if the object changes, a new version of it is created instead. This approach is extremely important for maintaining data availability and consistency. First of all, changing an object as a whole eliminates the issue of conflicts: the object with the latest timestamp is considered to be the current version. As a result, objects are always consistent, that is, a reader never observes a partially updated object.
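
The whole-object replacement model described above can be illustrated with a small in-memory sketch. It does not show how Virtuozzo Storage is implemented; the names VersionedStore, put, get, and history are hypothetical, and caller-supplied timestamps stand in for whatever clock a real system would use.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectVersion:
    """One immutable version of an object: data plus descriptive metadata."""
    data: bytes
    metadata: dict
    timestamp: float


@dataclass
class VersionedStore:
    """Toy key-value store in which a write never modifies an existing version."""
    _versions: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes, timestamp: float, **metadata) -> None:
        # A change to the object creates a new version instead of patching the old one.
        self._versions.setdefault(key, []).append(
            ObjectVersion(data, dict(metadata), timestamp)
        )

    def get(self, key: str) -> ObjectVersion:
        # The version with the latest timestamp is the current one.
        return max(self._versions[key], key=lambda v: v.timestamp)

    def history(self, key: str) -> list:
        # All versions of the object, oldest first.
        return sorted(self._versions[key], key=lambda v: v.timestamp)
```

Because every write is a complete new version, two writers never have to merge partial changes; the later timestamp simply wins.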

Another feature of object storage is eventual consistency. Eventual consistency does not guarantee that reads return the new state as soon as a write has completed. Readers can observe the old state for an undefined period of time until the write is propagated to all the replicas (copies). This is very important for storage availability, as geographically distant datacenters may not be able to update data synchronously (e.g., due to network issues), and the update itself may be slow because waiting for acknowledgements from all the data replicas over long distances can take hundreds of milliseconds. Eventual consistency thus hides communication latencies on writes at the cost of readers possibly observing the old state. However, many use cases can easily tolerate this.
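
The following toy simulation may help to picture eventual consistency. It is a sketch under simplifying assumptions, not a description of the Virtuozzo Storage replication protocol: the classes Replica and EventuallyConsistentObject are hypothetical, a write is acknowledged after updating a single replica, and propagation happens only when propagate() is called.

```python
class Replica:
    """One copy of the data, for example in one datacenter."""

    def __init__(self):
        self.value = None
        self.timestamp = -1.0

    def apply(self, value, timestamp):
        # Keep only the newest write (the latest timestamp wins).
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp


class EventuallyConsistentObject:
    def __init__(self, replica_count: int):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica_index, value, timestamp) not yet delivered

    def write(self, value, timestamp):
        # Acknowledge after updating one replica; queue the update for the rest.
        self.replicas[0].apply(value, timestamp)
        for index in range(1, len(self.replicas)):
            self.pending.append((index, value, timestamp))

    def read(self, replica_index: int):
        return self.replicas[replica_index].value

    def propagate(self):
        # Deliver all queued updates to the remaining replicas.
        for index, value, timestamp in self.pending:
            self.replicas[index].apply(value, timestamp)
        self.pending.clear()
```

Between write() and propagate(), a reader of a remote replica still sees the old value, which is exactly the window that eventual consistency allows.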

6.2.1. Object Storage Infrastructure Overview

The infrastructure of the object storage consists of the following entities: object servers (OS), name servers (NS), and the S3 gateways (GW).

These entities run as services on the Virtuozzo Storage nodes. Each service should be deployed on multiple Virtuozzo Storage nodes for high availability.

../_images/stor_image42.png
  • An object server stores actual object data received from the S3 gateway, packed into special containers to achieve high performance. The containers are redundant; you can specify the redundancy mode while configuring object storage.
  • A name server stores information about objects (metadata) received from the S3 gateway. Metadata includes the object name, size, ACL, location, owner, and so on.
  • An S3 gateway is a data proxy between object servers and users. It receives and handles Amazon S3 protocol requests and uses the nginx web server for external connections. The S3 gateway handles S3 user authentication and ACL checks. It is stateless, i.e. it has no data of its own.

6.2.2. Planning the S3 Cluster

Before creating an S3 cluster, do the following:

  1. Define which nodes of the Virtuozzo Storage cluster will run the S3 storage access point services. It is recommended to have all nodes available in Virtuozzo Storage run these services.

  2. Configure the network so that the following is achieved:

    • All components of the S3 cluster communicate with each other via the S3 private network. All nodes of an S3 cluster must be connected to the S3 private network. Virtuozzo Storage internal network can be used for this purpose.
    • The nodes running S3 gateways must have access to the public network.
    • The public network for the S3 gateways must be balanced by an external DNS load balancer.

    For more details on network configuration, refer to Planning Network in the Installation Guide.

  3. All components of the S3 cluster should run on multiple nodes for high availability. Name server and object server components in the S3 cluster are automatically balanced and migrated between S3 nodes. S3 gateways are not automatically migrated; their high availability is based on DNS records. You should maintain the DNS records manually when adding or removing S3 gateways.
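
Because the DNS records for S3 gateways are maintained manually, it can be useful to compare what the endpoint name resolves to with the gateway addresses you expect. The sketch below uses only the Python standard library; the function names and the IP addresses in the usage note are hypothetical, and the resolver parameter exists only so that the lookup can be replaced in tests.

```python
import socket


def gateway_addresses(dns_name, port=443, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated IP addresses that dns_name resolves to."""
    results = resolver(dns_name, port, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({sockaddr[0] for *_, sockaddr in results})


def missing_gateways(dns_name, expected_ips, **kwargs):
    """Return gateway IPs that are expected but absent from the DNS answer."""
    return sorted(set(expected_ips) - set(gateway_addresses(dns_name, **kwargs)))
```

For example, missing_gateways("mys3storage.example.com", ["203.0.113.10", "203.0.113.11"]) returns the expected addresses that the DNS name does not currently resolve to, which may indicate a record that was not added for a new gateway.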

6.2.3. Sample Object Storage

This section shows a sample object storage deployed on top of a Virtuozzo Storage cluster of five nodes that run various services. The final setup is shown in the figure below.

../_images/stor_image43.png

6.2.4. Creating the S3 Cluster

Note

Joining a node to an S3 cluster automatically enables high availability for virtual machines, containers, and S3 resources on this node. For detailed information on high availability, refer to the Virtuozzo 7 User’s Guide.

  1. Make sure that S3 private network is configured on each node that will run object storage services.

  2. On the SERVICES > Nodes screen, check the box of each cluster node where object storage services will run.

    ../_images/stor_image44.png
  3. Click Create S3 cluster.

  4. Make sure a network interface with an S3 (private) role is selected in the drop-down list. The corresponding interfaces with S3 public roles will be selected automatically.

    Note

    If necessary, click the cogwheel icon and, on the Network Configuration screen, configure S3 roles.

    ../_images/stor_image45.png
  5. Click Proceed.

  6. In Tier, select the storage tier that will be used for the object storage. For information about storage tiers, consult Understanding Storage Tiers in the Installation Guide.

  7. In Failure domain, choose a placement policy for replicas. For more details, see Understanding Failure Domains in the Installation Guide.

  8. In Data redundancy, select the redundancy mode that the object storage will use. For more details, see Understanding Data Redundancy in the Installation Guide.

    ../_images/stor_image46.png

    Note

    You can later change the redundancy mode on the S3 > Settings panel.

  9. Click Proceed.

  10. Specify the external (publicly resolvable) DNS name for the S3 endpoint that will be used by the end users to access the object storage. For example, mys3storage.example.com. Click Proceed.

    Important

    Configure your DNS server according to the example suggested in the management panel.

  11. From the drop-down list, select an S3 endpoint protocol: HTTP, HTTPS or both.

    ../_images/stor_image46_1.png

    Note

    It is recommended to use only HTTPS for production deployments.

    If you have selected HTTPS, do one of the following:

    • Check Generate self-signed certificate to get a self-signed certificate for HTTPS evaluation purposes.

      Note

      1. S3 geo-replication requires a certificate from a trusted authority. It does not work with self-signed certificates.
      2. To access the data in the S3 cluster via a browser, add the self-signed certificate to the browser’s exceptions.
    • Acquire a key and a trusted wildcard SSL certificate for the endpoint’s bottom-level domain. For example, the endpoint s3.storage.example.com would need a wildcard certificate for *.s3.storage.example.com with the subject alternative name s3.storage.example.com.

      Upload the certificate, and, depending on the certificate type, do one of the following:

      • in case the certificate is contained in a PKCS#12 file, specify the passphrase;
      • upload the SSL key.
  12. If required, click Configure Acronis Notary and specify Notary DNS name and Notary user key. For more information on Acronis Notary, see Managing Acronis Notary in Buckets.

  13. Click Done to create an S3 cluster.
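
The certificate naming rule from step 11 can be expressed as a short sketch. The helper names are hypothetical, and the matching logic is a simplified single-label wildcard check rather than a full certificate validator. The comment about bucket-style hostnames is an assumption about why the wildcard is needed; the guide itself only states which names are required.

```python
def required_certificate_names(endpoint: str) -> dict:
    """Names that a certificate for the given S3 endpoint should cover."""
    endpoint = endpoint.strip().rstrip(".").lower()
    return {
        # Assumption: the wildcard covers hostnames of the form <bucket>.<endpoint>.
        "wildcard": f"*.{endpoint}",
        # The subject alternative name covers the endpoint itself.
        "subject_alt_name": endpoint,
    }


def name_is_covered(hostname: str, certificate_names) -> bool:
    """Check a hostname against exact names and single-label wildcards."""
    hostname = hostname.lower()
    for name in certificate_names:
        if name == hostname:
            return True
        if name.startswith("*."):
            # A wildcard matches exactly one extra leftmost label.
            head, _, tail = hostname.partition(".")
            if head and tail == name[2:]:
                return True
    return False
```

For the endpoint s3.storage.example.com, the first helper yields *.s3.storage.example.com and s3.storage.example.com, matching the example in step 11.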

After the cluster is created, on the S3 Overview screen, you can view cluster status, hostname, used disk capacity, the number of users, I/O activity, and the state of S3 services.

../_images/stor_image67.png

To check if the S3 cluster is successfully deployed and can be accessed by users, visit https://<S3_DNS_name> or http://<S3_DNS_name> in your browser. You should receive the following XML response:

<Error>
<Code>AccessDenied</Code>
<Message/>
</Error>
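
The same check can be scripted. The following is a minimal sketch that uses only the Python standard library; the function names are hypothetical. It assumes that the error document is returned either with a success status or with an HTTP error status, and it will raise an exception for a self-signed certificate that the system does not trust.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def parse_s3_error(body: bytes) -> dict:
    """Extract Code and Message from an S3-style <Error> document."""
    root = ET.fromstring(body)
    if root.tag != "Error":
        raise ValueError(f"unexpected root element: {root.tag}")
    return {
        "Code": root.findtext("Code"),
        "Message": root.findtext("Message") or "",
    }


def endpoint_is_deployed(url: str, timeout: float = 10.0) -> bool:
    """Return True if an anonymous GET is answered with AccessDenied."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read()
    except urllib.error.HTTPError as error:
        # The XML error document may arrive with an HTTP error status.
        body = error.read()
    return parse_s3_error(body)["Code"] == "AccessDenied"
```

For example, endpoint_is_deployed("https://mys3storage.example.com") is expected to return True for a cluster that answers as shown above.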

To start using the S3 storage, you will also need to create at least one S3 user.

6.2.5. Managing Object Storage Users

The concept of S3 user is one of the base concepts of object storage along with those of object and bucket (container for storing objects). The Amazon S3 protocol uses a permission model based on access control lists (ACLs) where each bucket and each object is assigned an ACL that lists all users with access to the given resource and the type of this access (read, write, read ACL, write ACL). The list of users includes the entity owner assigned to every object and bucket at creation. The entity owner has extra rights compared to other users. For example, the bucket owner is the only one who can delete that bucket.
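
The ACL-based permission model can be pictured with the following toy sketch. It is only an illustration: the class Acl, the permission identifiers, and the rule that the owner implicitly holds every permission are assumptions of this example, not the Virtuozzo Storage implementation.

```python
from dataclasses import dataclass, field

# The four access types named above, written as identifiers.
PERMISSIONS = {"read", "write", "read_acl", "write_acl"}


@dataclass
class Acl:
    owner: str
    grants: dict = field(default_factory=dict)  # user -> set of permissions

    def grant(self, user: str, permission: str) -> None:
        if permission not in PERMISSIONS:
            raise ValueError(f"unknown permission: {permission}")
        self.grants.setdefault(user, set()).add(permission)

    def allows(self, user: str, permission: str) -> bool:
        # Assumption of this sketch: the owner implicitly holds every permission.
        return user == self.owner or permission in self.grants.get(user, set())


def can_delete_bucket(bucket_acl: Acl, user: str) -> bool:
    # Only the bucket owner may delete the bucket, regardless of grants.
    return user == bucket_acl.owner
```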

The user model and access policies implemented in Virtuozzo Storage comply with the Amazon S3 user model and access policies.

User management scenarios in Virtuozzo Storage are largely based on the Amazon Web Services user management and include the following operations: create, query, and delete users as well as generate and revoke user access key pairs.

6.2.5.1. Adding S3 users

To add an S3 user, do the following:

  1. On the SERVICES > S3 Users screen, click Add user.

    ../_images/stor_image47.png
  2. Specify a valid email address as login for the user and click Done.

    ../_images/stor_image48.png

6.2.5.2. Managing S3 Access Key Pairs

Each S3 user has one or two key pairs (access key and secret key) for accessing the S3 cloud. You can think of the access key as login and the secret key as password. (For more information about S3 key pairs, refer to http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSGettingStartedGuide/AWSCredentials.html.) The access keys are generated and stored locally in the Virtuozzo Storage cluster on S3 name servers. Each user can have up to two key pairs. It is recommended to periodically revoke old and generate new access key pairs.

To view, add, or revoke the S3 access key pairs for an S3 user, do the following:

  1. Select a user in the list and click Keys.

    ../_images/stor_image49.png
  2. The existing keys will be shown on the Keys panel.

    • To revoke a key, click Revoke.
    • To add a new key, click Generate access key.

To access a bucket, a user will need the following information:

  • management panel IP address,

  • DNS name of the S3 cluster specified during configuration,

  • S3 access key ID,

  • S3 secret access key,

  • SSL certificate if the HTTPS protocol was chosen during configuration.

    Note

    The certificate file can be found in the /etc/nginx/ssl/ directory on any node hosting the S3 gateway service.
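
As an illustration of how this information is typically used, the sketch below builds the settings for an S3 client. It assumes the third-party boto3 library as the client, which this guide does not prescribe; the helper names, the placeholder keys, and the certificate path in the usage note are hypothetical.

```python
def s3_client_kwargs(dns_name, access_key_id, secret_access_key,
                     use_https=True, ca_bundle=None):
    """Build keyword arguments for boto3.client("s3", ...)."""
    scheme = "https" if use_https else "http"
    kwargs = {
        "endpoint_url": f"{scheme}://{dns_name}",
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    if use_https and ca_bundle:
        # Path to the certificate (e.g. a self-signed one) to verify against.
        kwargs["verify"] = ca_bundle
    return kwargs


def make_client(**settings):
    # boto3 is imported lazily so that the helper above has no dependencies.
    import boto3
    return boto3.client("s3", **s3_client_kwargs(**settings))
```

For example, make_client(dns_name="mys3storage.example.com", access_key_id="<access key ID>", secret_access_key="<secret key>", ca_bundle="/path/to/certificate.pem") would return a client bound to the S3 endpoint instead of Amazon S3.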

To automatically log in to S3 with user credentials using the generated keys, select a user and click Browse.

Note

To browse using an SSL certificate, make sure the certificate is valid or, in case of a self-signed one, add it to the browser’s exceptions.

6.2.6. Managing Object Storage Buckets

All objects in Amazon S3-like storage are stored in containers called buckets. Buckets are addressed by names that are unique in the given object storage, so an S3 user of that object storage cannot create a bucket that has the same name as a different bucket in the same object storage. Buckets are used to:

  • group and isolate objects from those in other buckets,
  • provide ACL management mechanisms for objects in them,
  • set per-bucket access policies, for example, versioning in the bucket.

In the current version of Virtuozzo Storage, you can enable and disable Acronis Notary for object storage buckets and monitor the space used by them on the SERVICES > S3 > Buckets screen. You cannot create and manage object storage buckets from the Virtuozzo Storage management panel. However, you can do it via the Virtuozzo Storage user panel or by using a third-party application. For example, the applications listed below allow you to perform the following actions:

  • CyberDuck: create and manage buckets and their contents.
  • MountainDuck: mount object storage as a disk drive and manage buckets and their contents.
  • Backup Exec: store backups in the object storage.

6.2.6.1. Listing Bucket Contents

You can list bucket contents with a web browser. To do this, visit the URL that consists of the external DNS name for the S3 endpoint that you specified when creating the S3 cluster and the bucket name. For example, mys3storage.example.com/mybucket.

Note

You can also copy the link to bucket contents by right-clicking it in CyberDuck, and then selecting Copy URL.
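
If you prefer to process the listing in a script rather than in a browser, the returned XML can be parsed as sketched below. This example assumes that the listing follows the Amazon S3 ListBucketResult format (version 1) with the Amazon S3 XML namespace; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the listing uses the Amazon S3 XML namespace.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def bucket_url(endpoint: str, bucket: str, scheme: str = "https") -> str:
    """Combine the S3 endpoint DNS name and the bucket name into a URL."""
    return f"{scheme}://{endpoint}/{bucket}"


def list_keys(listing_xml: bytes) -> list:
    """Return (key, size) pairs from a ListBucketResult document."""
    root = ET.fromstring(listing_xml)
    return [
        (item.findtext("s3:Key", namespaces=S3_NS),
         int(item.findtext("s3:Size", namespaces=S3_NS)))
        for item in root.findall("s3:Contents", S3_NS)
    ]
```

Here, bucket_url("mys3storage.example.com", "mybucket") produces the address from the example above with an explicit scheme.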

6.2.6.2. Managing Acronis Notary in Buckets

Virtuozzo Storage offers integration with the Acronis Notary service to leverage blockchain notarization and ensure the immutability of data saved in object storage clusters. To use Acronis Notary in user buckets, you need to set it up in the S3 cluster and enable it for said buckets.

6.2.6.2.1. Setting Up Acronis Notary

To set up Acronis Notary, do the following:

  1. Get the DNS name and the user key for the notary service from your Acronis sales contact.

  2. On the SERVICES > S3 screen, click Notary settings.

    ../_images/stor_image67.png
  3. On the Notary Settings screen, specify the DNS name and user key in the respective fields and click Done.

    ../_images/stor_image66.png

6.2.6.2.2. Enabling and Disabling Acronis Notary

To enable or disable blockchain notarization for a bucket, select a bucket on the SERVICES > S3 > Buckets screen and click Enable Notary or Disable Notary, respectively.

Notarization is disabled for new buckets by default.

Note

Once you enable notarization for a bucket, certificates are created automatically only for newly uploaded files. Previously uploaded files are left unnotarized. Once a file has been notarized, it remains notarized even if you disable notarization later.

6.2.7. Best Practices for Using S3 in Virtuozzo Storage

This section offers recommendations on how to best use the S3 feature of Virtuozzo Storage.

6.2.7.1. Bucket and Key Naming Policies

It is recommended to use bucket names that comply with DNS naming conventions:

  • can be from 3 to 63 characters long,
  • must start and end with a lowercase letter or number,
  • can contain lowercase letters, numbers, periods (.), hyphens (-), and underscores (_),
  • can be a series of valid name parts (described previously) separated by periods.

An object key can be a string of any UTF-8 encoded characters up to 1024 bytes long.
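
The naming recommendations above can be checked before a bucket or object is created. The following sketch encodes them literally; the function names are hypothetical, and the check reflects only the rules listed in this section.

```python
import re

# One name part: lowercase letters, digits, hyphens, and underscores;
# it starts and ends with a lowercase letter or digit.
_PART = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_recommended_bucket_name(name: str) -> bool:
    if not 3 <= len(name) <= 63:
        return False
    # Periods separate parts, so leading, trailing, or doubled periods fail.
    return all(_PART.fullmatch(part) for part in name.split("."))


def is_valid_object_key(key: str) -> bool:
    # The limit is counted in bytes of the UTF-8 encoding, not in characters.
    return len(key.encode("utf-8")) <= 1024
```

Note that the key length is measured in bytes, so a key made of non-ASCII characters reaches the limit with fewer than 1024 characters.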

6.2.7.2. Improving Performance of PUT Operations

Object storage supports uploading objects as large as 5 GB with a single PUT request, or 5 TB with multipart upload. Upload performance can be improved, however, by splitting large objects into pieces and uploading them concurrently with the multipart upload API. This approach divides the load between multiple OS services.

It is recommended to use multipart uploads for objects larger than 5 MB.
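
The splitting step can be sketched as follows. The 5 MB threshold comes from the recommendation above; the default part size of 16 MB, the worker count, and the upload_part callable are assumptions of this example. In a real client, upload_part would send one part through the multipart upload API.

```python
from concurrent.futures import ThreadPoolExecutor

MB = 1024 * 1024
MULTIPART_THRESHOLD = 5 * MB  # objects larger than this use multipart upload


def plan_parts(object_size: int, part_size: int = 16 * MB) -> list:
    """Split [0, object_size) into (part_number, offset, length) tuples."""
    if object_size <= MULTIPART_THRESHOLD:
        return [(1, 0, object_size)]  # small object: one plain PUT is enough
    return [
        (number, offset, min(part_size, object_size - offset))
        for number, offset in enumerate(range(0, object_size, part_size), start=1)
    ]


def upload_concurrently(parts, upload_part, max_workers: int = 4) -> list:
    """Call upload_part(part_number, offset, length) for each part in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns the results in the order of the input parts.
        return list(pool.map(lambda part: upload_part(*part), parts))
```

For a 20 MB object and 8 MB parts, plan_parts produces three parts, the last of which is 4 MB long.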

6.2.8. Replicating Data Between Geographically Distributed Datacenters with S3 Clusters

Virtuozzo Storage can store replicas of S3 cluster data and keep them up-to-date in multiple geographically distributed datacenters with S3 clusters based on Virtuozzo Storage. Geo-replication reduces the response time for local S3 users accessing the data of a remote S3 cluster, and for remote S3 users accessing the data of a local S3 cluster, because they can read the data from a nearby replica and do not need an Internet connection to the other datacenter.

Geo-replication schedules the update of the replicas as soon as any data is modified. Geo-replication performance depends on the speed of Internet connection, the redundancy mode, and cluster performance.

If you have multiple datacenters with enough free space, it is recommended to set up geo-replication between S3 clusters residing in these datacenters.

To set up geo-replication between S3 clusters, exchange tokens between datacenters as follows:

  1. In the management panel of a remote datacenter, open the SERVICES > S3 > GEO-REPLICATION screen.

    ../_images/stor_image66_1.png
  2. In the section of the home S3 cluster, click TOKEN and, on the Get token panel, copy the token.

  3. In the management panel of the local datacenter, open the SERVICES > S3 > GEO-REPLICATION screen and click ADD DATACENTER.

    ../_images/stor_image66_2.png
  4. Enter the copied token and click Done.

  5. Configure the remote Virtuozzo Storage S3 cluster the same way.

6.2.9. Monitoring S3 Access Points

The S3 monitoring screen enables you to inspect the availability of each S3 component as well as the performance of NS and OS services (which are highly available).

If you see that some of the NS or OS services are offline, it means that the S3 access point does not function properly, and you should contact support or consult the CLI guide for low-level troubleshooting. S3 gateways are not highly available, but DNS load balancing should be enough to avoid downtime if the gateway fails.

The performance charts represent the number of operations that the OS/NS services are performing.

6.2.10. Releasing Nodes from S3 Clusters

Before releasing a node, make sure that enough nodes running the Name Server, Object Server, and S3 Gateway services will remain in the cluster.

Warning

When the last node in the S3 cluster is removed, the cluster is destroyed, and all the data is deleted.

To release a node from an S3 cluster, do the following:

  1. On the SERVICES > S3 Nodes screen, check the box of the node to release.
  2. Click Release.

6.2.11. Supported Amazon S3 Features

This section lists Amazon S3 operations, headers, and authentication schemes supported by the Virtuozzo Storage implementation of the Amazon S3 protocol.

6.2.11.1. Supported Amazon S3 REST Operations

The following Amazon S3 REST operations are currently supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

Supported service operations:

  • GET Service

Bucket operations:

Operation                               Supported
DELETE/HEAD/PUT Bucket                  Yes
GET Bucket (List Objects)               Yes (only version 1)
GET/PUT Bucket acl                      Yes
GET Bucket location                     Yes (returns US East)
GET Bucket Object versions              Yes
GET/PUT Bucket versioning               Yes
List Multipart Uploads                  Yes
DELETE/GET/PUT Bucket analytics         No
DELETE/GET/PUT Bucket cors              No
DELETE/GET/PUT Bucket inventory         No
DELETE/GET/PUT Bucket lifecycle         No
DELETE/GET/PUT Bucket metrics           No
DELETE/GET/PUT Bucket policy            No
DELETE/GET/PUT Bucket replication       No
DELETE/GET/PUT Bucket tagging           No
DELETE/GET/PUT Bucket website           No
GET/PUT Bucket accelerate               No
GET/PUT Bucket logging                  No
GET/PUT Bucket notification             No
GET/PUT Bucket requestPayment           No
List Bucket Analytics Configurations    No
List Bucket Inventory Configurations    No
List Bucket Metrics Configurations      No

Object operations:

Operation                               Supported
DELETE/GET/HEAD/POST/PUT Object         Yes
Delete Multiple Objects                 Yes
PUT Object - Copy                       Yes
GET/PUT Object acl                      Yes
Abort Multipart Upload                  Yes
Complete Multipart Upload               Yes
Initiate Multipart Upload               Yes
List Parts                              Yes
Upload Part                             Yes
Upload Part - Copy                      No
DELETE/GET/PUT Object tagging           No
GET Object torrent                      No
OPTIONS Object                          No
POST Object restore                     No

Note

For more information on Amazon S3 REST operations, see Amazon S3 REST API documentation.

6.2.11.2. Supported Amazon Request Headers

The following Amazon S3 REST request headers are currently supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

  • Authorization
  • Content-Length
  • Content-Type
  • Content-MD5
  • Date
  • Host
  • x-amz-content-sha256
  • x-amz-date
  • x-amz-security-token

The following Amazon S3 REST request headers are ignored:

  • Expect
  • x-amz-security-token

Note

For more information on Amazon S3 REST request headers, see Amazon S3 REST API Common Request Headers.
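
As an illustration of how a client fills in the payload-related headers, consider the sketch below. The function name is hypothetical, and the header formats follow the Amazon S3 conventions (base64-encoded MD5 digest, hexadecimal SHA-256 digest, ISO 8601 basic timestamp in UTC). The Authorization header is intentionally omitted because it depends on the request signature.

```python
import base64
import hashlib
from datetime import datetime, timezone


def payload_headers(body: bytes, content_type: str, host: str, now=None) -> dict:
    """Build the payload-related request headers for a PUT request."""
    now = now or datetime.now(timezone.utc)
    return {
        "Host": host,
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
        # Content-MD5 is the base64-encoded 128-bit MD5 digest of the body.
        "Content-MD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        # x-amz-content-sha256 is the lowercase hex SHA-256 digest of the body.
        "x-amz-content-sha256": hashlib.sha256(body).hexdigest(),
        # x-amz-date uses the ISO 8601 basic format in UTC.
        "x-amz-date": now.strftime("%Y%m%dT%H%M%SZ"),
    }
```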

6.2.11.3. Supported Amazon Response Headers

The following Amazon S3 REST response headers are currently supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

  • Content-Length
  • Content-Type
  • Connection
  • Date
  • ETag
  • x-amz-delete-marker
  • x-amz-request-id
  • x-amz-version-id

The following Amazon S3 REST response headers are not used:

  • Server
  • x-amz-id-2

Note

For more information on Amazon S3 REST response headers, see Amazon S3 REST API Common Response Headers.

6.2.11.4. Supported Amazon Error Response Headers

The following Amazon S3 REST error response headers are currently supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

  • Code
  • Error
  • Message
  • RequestId
  • Resource

The following Amazon S3 REST error response headers are not supported:

  • RequestId (not used)
  • Resource

Note

For more information on Amazon S3 REST response headers, see Amazon S3 REST API Error Response Headers.

6.2.11.5. Supported Authentication Scheme and Methods

The following authentication scheme is supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

The following authentication methods are supported by the Virtuozzo Storage implementation of the Amazon S3 protocol: