4.3. Accessing Virtuozzo Storage Clusters via S3-like Object Storage

Virtuozzo Storage can export data via an Amazon S3-compatible API, enabling service providers to:

  • Run S3-based services in their own Virtuozzo Storage infrastructures.

  • Sell S3-based storage-as-a-service to customers along with Virtuozzo Storage.

S3 support expands the functionality of Virtuozzo Storage and requires a working Virtuozzo Storage cluster.

4.3.1. About Object Storage

Object storage is a storage architecture that enables managing data as objects (as in key-value storage), as opposed to files in file systems or blocks in block storage. In addition to its data, each object has a name (i.e. the full path to the object) that describes it, as well as a unique identifier that allows finding the object in the storage. Object storage is optimized for storing billions of objects, in particular for application back-end storage, static web content hosting, online storage services, big data, and backups. All of these uses are enabled by object storage thanks to its combination of very high scalability with data availability and consistency.

Compared to other types of storage, the key difference of S3 object storage is that parts of an object cannot be modified in place: when an object changes, a new version of it is created instead. This approach is extremely important for maintaining data availability and consistency. Because an object is always replaced as a whole, there are no update conflicts: the object version with the latest timestamp is simply considered the current one. As a result, objects are always consistent, i.e. their state is always valid.

Another feature of object storage is eventual consistency. Eventual consistency does not guarantee that a read returns the new state immediately after a write has completed. Readers can observe the old state for an undefined period of time until the write is propagated to all the replicas (copies). This is very important for storage availability, as geographically distant data centers may not be able to perform data updates synchronously (e.g., due to network issues), and the update itself may also be slow, because awaiting acknowledgements from all the data replicas over long distances can take hundreds of milliseconds. So eventual consistency helps hide communication latencies on writes at the cost of readers possibly observing a stale state for a while. However, many use cases can easily tolerate this.

4.3.1.1. Object Storage Infrastructure

The infrastructure of Virtuozzo Object Storage consists of the following entities: object servers, name servers, S3 gateways, and the block level backend.

[Figure: Virtuozzo Object Storage infrastructure]
  • Object server (OS) stores actual object data (contents) received from S3 gateway. It stores its own data in block storage with built-in high availability.

  • Name server stores object metadata received from S3 gateway. Metadata includes object name, size, ACL (access control list), location, owner, and such. Name server (NS) also stores its own data in block storage with built-in high availability.

  • S3 gateway (GW) is a data proxy between object storage services and end users. It receives and handles Amazon S3 protocol requests and uses nginx Web server for external connections. S3 gateway handles S3 user authentication and ACL checks. It has no data of its own (i.e. is stateless).

  • Block level backend is block storage with high availability of services and data. Since all object storage services run on hosts, no virtual environments (or respective licenses) are required for object storage.

4.3.1.2. Object Storage Overview

In terms of S3 object storage, a file is an object. Each object uploaded via the S3 API is stored as a pair of entities:

  • Object names and associated object metadata stored on an NS. An object name in the storage is determined based on request parameters and bucket properties in the following way:

    • If bucket versioning is disabled, an object name in the storage contains bucket name and object name taken from an S3 request.

    • If bucket versioning is enabled, an object name also contains a list of object versions.

  • Object data stored on an OS. The directory part of an object name determines an NS to store it while the full object name determines an OS to store the object data.

4.3.1.2.1. Interaction between S3 Storage and the Storage Cluster

An S3 storage cluster requires a working Virtuozzo Storage cluster on each of the S3 cluster nodes. Virtuozzo Storage provides content sharing, strong consistency, data availability, reasonable performance for random I/O operations, and high availability for storage services. In storage terms, S3 data is a set of files (see Object Server) that the Virtuozzo Storage file system layer (vstorage-mount) does not interpret in any way.

4.3.1.2.2. Multipart Uploads

The name of a multipart upload is defined by a pattern similar to that of an object name, but the object that corresponds to it contains a table instead of file contents. The table contains the index numbers of parts and their offsets within the file. This allows uploading the parts of a multipart upload in parallel, which is recommended for large files. The maximum number of parts is 10,000.
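
For illustration, below is a minimal sketch of this flow using the AWS CLI against the storage's S3 endpoint (the endpoint s3.example.com, the bucket, and the file names are placeholders):

# aws --endpoint-url http://s3.example.com s3api create-multipart-upload \
    --bucket mybucket --key backup.tar
# aws --endpoint-url http://s3.example.com s3api upload-part --bucket mybucket \
    --key backup.tar --part-number 1 --upload-id <UploadId> --body backup.tar.part1
# aws --endpoint-url http://s3.example.com s3api complete-multipart-upload \
    --bucket mybucket --key backup.tar --upload-id <UploadId> \
    --multipart-upload file://parts.json

The first command returns an UploadId. Parts can then be uploaded in parallel by repeating the second command with different part numbers, and parts.json lists the uploaded part numbers with their ETags for the final completion step.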

4.3.1.3. Object Storage Components

This section familiarizes you with S3 storage components—gateways, object servers, and name servers—and describes S3 management tools and service buckets.

4.3.1.3.1. Gateway

Gateway performs the following functions:

  • Receives S3 requests from the web server (via nginx and FastCGI).

  • Parses S3 packets and validates S3 requests (checks fields of a request and XML documents in its body).

  • Authenticates S3 users.

  • Validates access permissions to buckets and objects using ACL.

  • Collects statistics on the number of various requests as well as the amount of the data received and transmitted.

  • Determines paths to NS and OS storing the object’s data.

  • Requests object names and associated metadata from NSes.

  • Receives links to objects stored on OSes by requesting the name from NSes.

  • Caches metadata and ACL of S3 objects received from NSes as well as the data necessary for user authentication also stored on the NSes.

  • Acts as a proxy server when clients write and read object data to and from the OSes. Only the requested data is transferred during read and write operations. For example, if a user requests to read 10MB from a 1TB object, only said 10MB will be read from the OS.

S3 gateway consists of an incoming request parser, type-dependent asynchronous handlers of these requests, and an asynchronous handler for interrupted requests that require completion (complex operations such as bucket creation or removal). Gateway does not persist any state data of its own. Instead, it stores all the data needed for S3 storage in the object storage itself (on NS and OS).

4.3.1.3.2. Name Server

Name server performs the following functions:

  • Stores object names and metadata.

  • Provides an API for inserting, deleting, and listing object names and for changing object metadata.

Name server consists of data (i.e. object metadata), object change log, an asynchronous garbage collector, and asynchronous handlers of incoming requests from different system components.

The data is stored in a B-tree that maps each object name to that object's metadata structure. S3 object metadata consists of three parts: information about the object, user-defined headers (optional), and the object's ACL. Files are stored in the corresponding directory on the base shared storage (i.e. Virtuozzo Storage).

Each name server is responsible for a subset of the S3 cluster object namespace. Each NS instance is a userspace process that works in parallel with other processes and can utilize up to one CPU core. The optimal number of name servers is 4 to 10 per node. We recommend starting with 10 instances per node during cluster creation to simplify scaling later. If your node has CPU cores that are not utilized by other storage services, you can create more NSes to utilize these cores.

4.3.1.3.3. Object Server

Object server performs the following functions:

  • Stores object data in pools (data containers).

  • Provides an API for creating, reading (including partial reads), writing to, and deleting objects.

Object server consists of the following:

  • Information on the object blocks stored on this OS.

  • Containers that store object data.

  • An asynchronous garbage collector that frees container sections after object delete operations.

Object data blocks are stored in pools. The storage uses 12 pools with block sizes that are powers of 2, ranging from 4 kilobytes to 8 megabytes. A pool is a regular file on block storage made up of fixed-size blocks (regions). In other words, each pool is an extremely large file designed to hold objects of a specific size: the first pool is for 4KB objects, the second pool is for 8KB objects, and so on.

Each pool consists of a block with system information and fixed-size data regions. Each region has a free/dirty bit mask. The region's data is stored in the same file as the object's B-tree, which provides atomicity during block allocation and deallocation. Every block in a region contains a header and object data. The header stores the ID of the object to which the data belongs. The ID is required by the pool-level defragmentation algorithm, which does not have access to the object's B-tree. The pool to store an object in is chosen depending on the object size.

For example, a 30KB object will be placed into the pool for 32KB objects and will occupy a single 32KB block. A 129KB object will be split into one 128KB part and one 1KB part. The former will be placed in the pool for 128KB objects while the latter will go to the pool for 4KB objects. The overhead may seem significant for small objects, as even a 1-byte object will occupy a 4KB block. In addition, about 4KB of metadata per object will be stored on the NS. However, this approach allows achieving maximum performance, eliminates free space fragmentation, and offers guaranteed object insert performance. Moreover, the larger the object, the less noticeable the overhead. Finally, when an object is deleted, its pool block is marked free and can be used to store new objects.

Multi-part objects are stored as parts (each part being itself an object) that may be stored on different object servers.

4.3.1.3.4. S3 Management Tools

Object storage has two tools:

  • ostor-ctl for configuring storage components

  • ostor-s3-admin for user management, an application that allows you to create, edit, and delete S3 user accounts as well as manage account access keys (create and delete paired S3 access key IDs and S3 secret access keys)

4.3.1.3.5. Service Bucket

The service bucket stores service and temporary information necessary for the S3 storage. This bucket is accessible only to the S3 admin; even the system administrator needs access keys created with the ostor-s3-admin tool to access it.

4.3.1.4. Data Interchange

In Virtuozzo object storage, every service has a 64-bit unique identifier. At the same time, every object has a unique name. The directory part of an object's name determines the name server to store it, and the full object name determines the object server to store the object's data. Name and object server lists are stored in a vstorage cluster directory intended for object storage data and available to anyone with cluster access. This directory includes subdirectories that correspond to services hosted on name and object servers. The names of the subdirectories match the hexadecimal representations of the service IDs. In each service's subdirectory, there is a file containing the ID of the host that runs the service. Thus, any system component with cluster access, such as a gateway, can discover a service's ID, determine its host, and send a request to it.
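
As an illustration, this layout can be inspected directly under the cluster mount point (the path below uses the same placeholders as in Deploying Object Storage; actual names depend on your cluster):

# ls /<storage_mount>/<ostor_dir>/services/

Each entry is a subdirectory named after the hexadecimal service ID and contains a file with the ID of the host currently running that service.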

S3 gateway handles data interchange with the following components:

  • Clients via a web server. Gateway receives S3 requests from users and responds to them.

  • Name servers. Gateway creates, deletes, and changes the names that correspond to S3 buckets or objects, checks their existence, and requests sets of names for bucket listings.

  • Object servers in the storage. Gateway sends data altering requests to object and name servers.

4.3.1.4.1. Data Caching

To enable efficient data use in object storage, all gateways, name servers, and object servers cache the data they work with. Name and object servers also cache their B-trees.

Gateways store and cache the following data received from name servers:

  • Lists of paired user IDs and e-mails.

  • Data necessary for user authentication: access key IDs and secret access keys. For more information on their semantics, consult the Amazon S3 documentation.

  • Bucket metadata and ACLs. The metadata includes an epoch (its current version identifier), which the gateway transmits to the NS to check whether it has the latest version of the metadata.

4.3.1.5. Operations on Objects

This section describes the operations that S3 storage processes: operation requests, and the create, read, and delete operations.

4.3.1.5.1. Operation Requests

To create, delete, or read an object, or to alter its data, S3 object storage must first receive a request for the corresponding operation and then perform it. The overall process of requesting and performing an operation consists of the following steps:

  1. Requesting user authentication data. This data is stored on a name server in a specific format (see Service Bucket). To receive the data (identifier, e-mail, access keys), a request with a lookup operation code is sent to the appropriate name server.

  2. Authenticating the user.

  3. Requesting bucket’s and object’s metadata. To receive it, another request with a lookup operation code is sent to the name server that stores names of objects and buckets.

  4. Checking user’s access permissions to buckets and objects.

  5. Performing the requested object operation: creating, editing or reading data or deleting the object.

4.3.1.5.2. Create Operation

To create an object, gateway sends the following requests:

  1. Request with a guard operation code to a name server. It creates a guard with a timer, which, after a fixed time period, checks whether an object with the data was indeed created. If it was not, the create operation fails and the guard requests the object server to delete the object's data, if any was written. After that, the guard is deleted.

  2. Request with a create operation code to an object server followed by fixed-size messages containing the object’s data. The last message includes an end-of-data flag.

  3. Another request with a create operation code to the name server. The server checks if the corresponding guard exists and, if it does not, the operation fails. Otherwise, the server creates a name and sends a confirmation of successful creation to the gateway.

4.3.1.5.3. Read Operation

To fulfill an S3 read request, gateway determines the appropriate name server's identifier based on the directory part of the object name, and the corresponding object server's identifier based on the object's full name. To perform a read operation, gateway sends the following requests:

  1. Request with a read operation code to an appropriate name server. A response to it contains a link to an object.

  2. Request to an appropriate object server with a read operation code and a link to an object received from the name server.

To fulfill the request, object server transmits fixed-size messages with the object’s data to the gateway. The last message contains an end-of-data flag.

4.3.1.5.4. Delete Operation

To delete an object (and its name) from the storage, gateway determines a name server’s identifier based on the directory’s part of a name and sends a request with a delete operation code to the server. In turn, the name server removes the name from its structures and sends the response. After some time, the garbage collector removes the corresponding object from the storage.

4.3.2. Deploying Object Storage

This chapter describes deploying object storage on top of a working Virtuozzo Storage cluster. As a result, you will create a setup like the one shown in the figure below. Note that not all cluster nodes have to run object storage services. The choice should be based on workload and hardware configuration.

[Figure: Example object storage setup]

To set up object storage services, do the following:

  1. Plan the S3 network. Like a Virtuozzo Storage cluster, an object storage cluster needs two networks:

    • An internal network in which NS, OS, and GW will interact. These services will generate traffic similar in volume to the total (incoming and outgoing) S3 user traffic. If this traffic is not expected to be significant, it is reasonable to use the same internal network for both object storage and Virtuozzo Storage. If, however, you expect object storage traffic to compete with Virtuozzo Storage traffic, it is reasonable to have S3 traffic go through the user data network (i.e. the datacenter network). Once you choose a network for S3 traffic, you determine which IP addresses can be used while adding cluster nodes.

    • An external (public) network through which end users will access the S3 storage. Standard HTTP and HTTPS ports must be open in this network.

    An object storage cluster is almost completely independent of the base block storage (like all access points, including virtual environments and iSCSI). Object and name servers keep their data in the Virtuozzo Storage cluster in the same way as virtual environments, iSCSI, and other services do. So the OS and NS services depend on vstorage-mount (client) and can only work when the cluster is mounted. Unlike them, gateway is a stateless service that has no data. It is thus independent of vstorage-mount and can theoretically be run even on nodes where the Virtuozzo Storage cluster is not mounted. However, for simplicity, we recommend creating gateways on nodes with object and name servers.

    Object and name servers also utilize the standard high availability means of Virtuozzo Storage (i.e. the shaman service). Like virtual environments and iSCSI, OS and NS are subscribed to HA cluster events. However, unlike other services, S3 cluster components cannot be managed (tracked and relocated between nodes) by shaman. Instead, this is done by the S3 configuration service that is subscribed to HA cluster events and notified by shaman whether nodes are healthy and can run services. For this reason, S3 cluster components are not shown in shaman top output.

    Gateway services, which are stateless, are never relocated, and their high availability is not managed by the Virtuozzo Storage cluster. Instead, a new gateway service is created when necessary.

  2. Make sure that each node that will run OS and NS services is in the high availability cluster. You can add nodes to HA cluster with the shaman join command.

  3. Install the vstorage-ostor package on each cluster node.

    # yum install vstorage-ostor
    
  4. Create a cluster configuration on one of the cluster nodes where object storage services will run. It is recommended to create 10 NS and 10 OS services per node. For example, if you are going to use five nodes, you will need 50 NS and 50 OS services. Run this command on the first cluster node.

    # ostor-ctl create -r /var/lib/ostor/configuration -n <IP_addr>
    

    Where <IP_addr> is the node’s IP address (that belongs to the internal S3 network) that the configuration service will listen on.

    You will be asked to enter and confirm a password for the new object storage (it can be the same as your Virtuozzo Storage cluster password). You will need this password to add new nodes.

    The configuration service will store the cluster configuration locally in /var/lib/ostor/configuration. In addition, <IP_addr> will be stored in /<storage_mount>/<ostor_dir>/control/name (<ostor_dir> is the directory in the cluster with object storage service files). If the first configuration service fails (and the ostor-ctl get-config command stops working), replace the IP address in /<storage_mount>/<ostor_dir>/control/name with that of a node running a healthy configuration service (created on step 6).

  5. Launch the configuration service.

    # systemctl start ostor-cfgd.service
    # systemctl enable ostor-cfgd.service
    
  6. Add at least two more configuration services for redundancy (to have at least three in total). A configuration service is only required for adding and removing nodes to and from the S3 cluster and does not affect operation of S3 services and their high availability. So a failure of a configuration service is not critical for the S3 cluster. However, it is still undesirable and we recommend creating several configuration services so at least one is always up.

    To add one more configuration service, run the following commands on a node where object storage services will run. Repeat to create the required number of configuration services.

    # ostor-ctl join -n <remote_IP_addr> -a <local_IP_addr>
    # systemctl start ostor-cfgd.service
    # systemctl enable ostor-cfgd.service
    

    Where <remote_IP_addr> is <IP_addr> from step 4.

    Each added configuration service will store the cluster configuration locally in /var/lib/ostor/configuration.

  7. Initialize the new object storage on the first node. The <ostor_dir> directory will be created in the root of your cluster.

    # ostor-ctl init-storage -n <IP_addr> -s <cluster_mount_point>
    

    You will need to provide the IP address and the object storage password specified on step 4.

  8. Add the public IP addresses of nodes that will run GW services to the DNS. You can configure the DNS to enable access to your object storage via a hostname, and to have the S3 endpoint receive virtual hosted-style REST API requests with URIs like http://bucketname.s3.example.com/objectname.

    After configuring DNS, make sure that the DNS names of your S3 access point resolve correctly from client machines (see the example check after the zone file below).

    Note

    Only buckets with DNS-compatible names can be accessed with virtual hosted-style requests. For more details, see Bucket and Key Naming Policies.

    Below is an example of a DNS zones configuration file for the BIND DNS server:

    ;$Id$
    $TTL 1h  @  IN   SOA    ns.example.com. s3.example.com. (
                            2013052112      ; serial
                            1h      ; refresh
                            30m     ; retry
                            7d      ; expiration
                            1h )    ; minimum
                    NS      ns.example.com.
    $ORIGIN s3.example.com
    h1 IN   A 10.29.1.95
            A 10.29.0.142
            A 10.29.0.137
    * IN    CNAME    @
    

    This configuration instructs the DNS to redirect all requests with URI http://s3.example.com and its subdomains (http://*.s3.example.com/*) to one of the endpoints listed in resource record h1 (10.29.1.95, 10.29.0.142 or 10.29.0.137) in a cyclic (round-robin) manner.
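
    To verify the configuration from a client machine, you can, for example, query the endpoint record and a bucket-style subdomain:

    # dig +short h1.s3.example.com
    # dig +short mybucket.s3.example.com

    With the example zone above, both queries should return the gateway addresses listed in the h1 record (10.29.1.95, 10.29.0.142, 10.29.0.137).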

  9. Add nodes where object storage services will run to the configuration.

    Note

    Adding nodes to existing clusters is done in a similar way, by performing steps 8-12.

    To do this, run the ostor-ctl add-host command on every such node:

    # ostor-ctl add-host -r /var/lib/ostor/configuration --hostname <name> --roles OBJ
    

    You will need to provide the object storage password set on step 4.

    Note

    If you want the object storage agent service to listen on an internal IP address, add the option -H <internal_IP_address> to the command above.

  10. Create a new S3 volume with the desired number of NS and OS:

    # ostor-ctl add-vol --type OBJ -s <cluster_mount_point> --os-count <OS_num> \
    --ns-count <NS_num> --vstorage-attr "failure-domain=host,tier=0,replicas=3"
    

    Where:

    • <NS_num> and <OS_num> are the numbers of NS and OS

    • The failure-domain=host, tier=0, and replicas=3 parameters set the volume's failure domain, tier, and redundancy mode (for more details, see Cluster Parameters Overview).

    The command will return the ID for the created volume. You will need it on the next step.

  11. Create S3 gateway instances on chosen nodes with Internet access and external IP addresses. It is recommended to create 4 GWs per node.

    Note

    For security reasons, make sure that only nginx can access the external network and that S3 gateways only listen on internal IP addresses.

    # ostor-ctl add-s3gw -a <internal_IP_address>:<port> -V <volume_ID>
    

    Where:

    • <internal_IP_address> is the internal IP address of the node with the gateway

    • <port> (mandatory) is an unused port unique for each GW instance on the node

    • <volume_ID> is the ID of the volume you created on the previous step (it can also be obtained from ostor-ctl get-config)

    For example:

    # ostor-ctl add-s3gw -a 127.0.0.1:9001 -V 0100000000000001
    # ostor-ctl add-s3gw -a 127.0.0.1:9002 -V 0100000000000001
    # ostor-ctl add-s3gw -a 127.0.0.1:9003 -V 0100000000000001
    # ostor-ctl add-s3gw -a 127.0.0.1:9004 -V 0100000000000001
    
  12. Launch object storage agent on each cluster node added to the object storage configuration.

    # systemctl start ostor-agentd.service
    # systemctl enable ostor-agentd.service
    
  13. Make sure NS and OS services are bound to the nodes.

    By default, agents will try to assign NS and OS services to the nodes automatically in a round-robin manner. However, manual assignment is required if a new host has been added to the configuration, or if the current configuration is not optimal (for details, see Manually Binding Services to Nodes).

    You can check the current binding configuration with the ostor-ctl agent-status command. For example:

    # ostor-ctl agent-status
    TYPE     SVC_ID               STATUS          UPTIME  HOST_ID            ADDRS
    S3GW     8000000000000009     ACTIVE             527  fcbf5602197245da   127.0.0.1:9090
    S3GW     8000000000000008     ACTIVE             536  4f0038db65274507   127.0.0.1:9090
    S3GW     8000000000000007     ACTIVE             572  958e982fcc794e58   127.0.0.1:9090
    OS       1000000000000005     ACTIVE             452  4f0038db65274507   10.30.29.124:39746
    OS       1000000000000004     ACTIVE             647  fcbf5602197245da   10.30.27.69:56363
    OS       1000000000000003     ACTIVE             452  4f0038db65274507   10.30.29.124:52831
    NS       0800000000000002     ACTIVE             647  fcbf5602197245da   10.30.27.69:56463
    NS       0800000000000001     ACTIVE             452  4f0038db65274507   10.30.29.124:53044
    NS       0800000000000000     ACTIVE             647  fcbf5602197245da   10.30.27.69:37876
    
  14. On each node with GW instances, install nginx that will serve S3 requests from end users:

    # yum install nginx
    

    Use the ostor-configure-nginx tool to configure nginx for S3. There are two configuration options:

    • create that creates a new configuration

    • update that updates the existing upstream configuration; use it after changing GW configuration.

    For the initial nginx configuration, use create. For example, for HTTP:

    # ostor-configure-nginx create -D s3.mydomain.com -p 80
    

    Where s3.mydomain.com is the S3 endpoint domain and 80 is the port for nginx to listen on.

    To configure HTTPS with an SSL certificate for the S3 endpoint domain and its subdomains, specify the certificate and key. For example:

    # ostor-configure-nginx create -D s3.mydomain.com -p 443 --ssl --ssl-cert file.cert --ssl-key file.key
    

    The configuration file will be created at /etc/nginx/conf.d/ostor-s3.conf. It will handle FastCGI redirection to local GW instances.
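
    Optionally, before launching nginx, you can verify that the resulting configuration is syntactically valid:

    # nginx -t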

  15. Launch nginx:

    # systemctl start nginx.service
    # systemctl enable nginx.service
    
  16. Add nodes that are in the HA cluster but run no S3 services to the S3 cluster. That is, make sure that all nodes in the HA cluster are also in the S3 cluster. This is required for high availability to work correctly.

    # ostor-ctl add-host -n <IP_addr>
    # systemctl start ostor-agentd.service
    # systemctl enable ostor-agentd.service
    

The object storage is deployed. Now you can add S3 users with the ostor-s3-admin tool as described in Creating S3 Users.

To check that installation has been successful or just monitor object storage status, use the ostor-ctl get-config command. For example:

# ostor-ctl get-config
07-08-15 11:58:45.470 Use configuration service 'ostor'
SVC_ID             TYPE  URI
8000000000000006   S3GW  svc://1039c0dc90d64607/?address=127.0.0.1:9000
0800000000000000     NS  vstorage://cluster1/ostor/services/0800000000000000
1000000000000001     OS  vstorage://cluster1/ostor/services/1000000000000001
1000000000000002     OS  vstorage://cluster1/ostor/services/1000000000000002
1000000000000003     OS  vstorage://cluster1/ostor/services/1000000000000003
1000000000000004     OS  vstorage://cluster1/ostor/services/1000000000000004
8000000000000009   S3GW  svc://7a1789d20d9f4490/?address=127.0.0.1:9000
800000000000000c   S3GW  svc://7a1789d20d9f4490/?address=127.0.0.1:9090
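
In addition, you can check that the S3 endpoint responds to requests, for example with curl (s3.mydomain.com stands for the endpoint domain configured in the nginx step):

# curl -i http://s3.mydomain.com/

An XML-formatted S3 error response (typically AccessDenied for an unauthenticated request) indicates that nginx is forwarding requests to the S3 gateways; a connection error or the default nginx page points to a configuration problem.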

4.3.2.1. Manually Binding Services to Nodes

When deploying object storage, you can manually bind services to nodes with the ostor-ctl bind command. You will need to specify the target node ID and one or more service IDs to bind to it. For example, the command:

# ostor-ctl bind -H 4f0038db65274507 -S 0800000000000001 -S 1000000000000003 -S 1000000000000005

binds services with IDs 0800000000000001, 1000000000000003, and 1000000000000005 to a host with ID 4f0038db65274507.

A service can only be bound to a host that is connected to the shared storage which stores that service’s data. That is, the cluster name in service URI must match the cluster name in host URI.

For example, in a configuration with two shared storages stor1 and stor2 (see below) services with URIs starting with vstorage://stor1 can only be bound to hosts host510 and host511 while services with URIs starting with vstorage://stor2 can only be bound to hosts host512 and host513.

# ostor-ctl get-config
SVC_ID             TYPE  URI
0800000000000000     NS  vstorage://stor1/s3-data/services/0800000000000000
0800000000000001     NS  vstorage://stor1/s3-data/services/0800000000000001
0800000000000002     NS  vstorage://stor2/s3-data/services/0800000000000002
1000000000000003     OS  vstorage://stor1/s3-data/services/1000000000000003
1000000000000004     OS  vstorage://stor2/s3-data/services/1000000000000004
1000000000000005     OS  vstorage://stor1/s3-data/services/1000000000000005
HOST_ID            HOSTNAME      URI
0fcbf5602197245da  host510:2530  vstorage://stor1/s3-data
4f0038db65274507   host511:2530  vstorage://stor1/s3-data
958e982fcc794e58   host512:2530  vstorage://stor2/s3-data
953e976abc773451   host513:2530  vstorage://stor2/s3-data

4.3.3. Managing S3 Users

The concept of S3 user is one of the base concepts of object storage, along with those of object and bucket (a container for storing objects). The Amazon S3 protocol uses a permissions model based on access control lists (ACLs), where each bucket and each object is assigned an ACL that lists all users with access to the given resource and the type of this access (read, write, read ACL, write ACL). The list of users includes the entity owner, assigned to every object and bucket at creation. The entity owner has extra rights compared to other users; for example, the bucket owner is the only one who can delete that bucket.

User model and access policies implemented in Virtuozzo Object Storage comply with the Amazon S3 user model and access policies.

User management scenarios in Virtuozzo Object Storage are largely based on the Amazon Web Services user management and include the following operations: creating, querying, and deleting users, as well as generating and revoking user access key pairs.

You can manage users with the ostor-s3-admin tool. To do this, you will need to know the ID of the volume that the users are in. You can obtain it with the ostor-ctl get-config command. For example:

# ostor-ctl get-config -n 10.94.97.195
VOL_ID             TYPE     STATE
0100000000000002   OBJ     READY
...

Note

As ostor-s3-admin commands are assumed to be issued by object storage administrators, they do not include any authentication or authorization checks.

4.3.3.1. Creating S3 Users

You can generate a unique random S3 user ID and an access key pair (S3 Access Key ID, S3 Secret Access Key) using the ostor-s3-admin create-user command. You need to specify a user email. For example:

# ostor-s3-admin create-user -e user@email.com -V 0100000000000002
UserEmail:user@email.com
UserId:a49e12a226bd760f
KeyPair[0]:S3AccessKeyId:a49e12a226bd760fGHQ7
KeyPair[0]:S3SecretAccessKey:HSDu2DA00JNGjnRcAhLKfhrvlymzOVdLPsCK2dcq
Flags:none

S3 user ID is a 16-digit hexadecimal string. The generated access key pair is used to sign requests to the S3 object storage according to the Amazon S3 Signature Version 2 authentication scheme.
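
For example, a third-party S3 client such as s3cmd can be configured with this key pair (s3.example.com is a placeholder endpoint; the keys are those from the sample output above):

# s3cmd --access_key=a49e12a226bd760fGHQ7 \
    --secret_key=HSDu2DA00JNGjnRcAhLKfhrvlymzOVdLPsCK2dcq \
    --host=s3.example.com --host-bucket='%(bucket)s.s3.example.com' \
    --signature-v2 ls

The --signature-v2 option makes the client sign requests with Signature Version 2, matching the authentication scheme described above.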

4.3.3.2. Listing S3 Users

You can list all object storage users with the ostor-s3-admin query-users command. Information for each user can take one or more sequential rows in the table. Additional rows are used to list the S3 access key pairs associated with the user. If the user does not have any active key pairs, minus signs are shown in the corresponding table cells. For example:

# ostor-s3-admin query-users -V 0100000000000002
      S3 USER ID      S3 ACCESS KEY ID              S3 SECRET ACCESS KEY  S3 USER EMAIL
bf0b3b15eb7c9019  bf0b3b15eb7c9019I36Y                               ***  user2@abc.com
d866d9d114cc3d20  d866d9d114cc3d20G456                               ***  user1@abc.com
                  d866d9d114cc3d20D8EW                               ***
e86d1c19e616455                      -                                 -  user3@abc.com

To output the list in XML, use the -X option; to output secret keys, use the -a option. For example:

# ostor-s3-admin query-users -V 0100000000000002 -a -X
<?xml version="1.0" encoding="UTF-8"?><QueryUsersResult><Users><User><Id>a49e12a226bd760f</Id><Email>user@email.com</Email><Keys><OwnerId>0000000000000000</OwnerId><KeyPair><S3AccessKeyId>a49e12a226bd760fGHQ7</S3AccessKeyId><S3SecretAccessKey>HSDu2DA00JNGjnRcAhLKfhrvlymzOVdLPsCK2dcq</S3SecretAccessKey></KeyPair></Keys></User><User><Id>d7c53fc1f931661f</Id><Email>user@email.com</Email><Keys><OwnerId>0000000000000000</OwnerId><KeyPair><S3AccessKeyId>d7c53fc1f931661fZLIV</S3AccessKeyId><S3SecretAccessKey>JL7gt1OH873zR0Fzv8Oh9ZuA6JtCVnkgV7lET6ET</S3SecretAccessKey></KeyPair></Keys></User></Users></QueryUsersResult>

4.3.3.3. Querying S3 User Information

To display information about the specified user, use the ostor-s3-admin query-user-info command. You need to specify either the user email (-e) or S3 ID (-i). For example:

# ostor-s3-admin query-user-info -e user@email.com -V 0100000000000002
Query user: user id=d866d9d114cc3d20, user email=user@email.com
Key pair[0]: access key id=d866d9d114cc3d20G456,
secret access key=5EAne6PLL1jxprouRqq8hmfONMfgrJcOwbowCoTt
Key pair[1]: access key id=d866d9d114cc3d20D8EW,
secret access key=83tTsNAuuRyoBBqhxMFqHAC60dhKHtTCCkQe54zu

4.3.3.4. Disabling S3 Users

You can disable a user with the ostor-s3-admin disable-user command. You need to specify either the user email (-e) or S3 ID (-i). For example:

# ostor-s3-admin disable-user -e user@email.com -V 0100000000000002

4.3.3.5. Deleting S3 Users

You can delete existing object storage users with the ostor-s3-admin delete-user command. Users who own any buckets cannot be deleted, so delete the user's buckets first. You need to specify either the user email (-e) or S3 ID (-i). For example:

# ostor-s3-admin delete-user -i bf0b3b15eb7c9019 -V 0100000000000002
Deleted user: user id=bf0b3b15eb7c9019

4.3.3.6. Generating S3 User Access Key Pairs

You can generate a new access key pair for the specified user with the ostor-s3-admin gen-access-key command. A maximum of two active access key pairs is allowed per user (the same as in Amazon Web Services). You need to specify either the user email (-e) or S3 ID (-i). For example:

# ostor-s3-admin gen-access-key -e user@email.com -V 0100000000000002
Generate access key: user id=d866d9d114cc3d20, access key id=d866d9d114cc3d20D8EW,
secret access key=83tTsNAuuRyoBBqhxMFqHAC60dhKHtTCCkQe54zu

Note

It is recommended to periodically revoke old and generate new access key pairs.

4.3.3.7. Revoking S3 User Access Key Pairs

You can revoke the specified access key pair of the specified user with the ostor-s3-admin revoke-access-key command. You need to specify the access key in the key pair you want to delete as well as the user email or S3 ID. For example:

# ostor-s3-admin revoke-access-key -e user@email.com -k de86d1c19e616455YIPU -V 0100000000000002
Revoke access key: user id=de86d1c19e616455, access key id=de86d1c19e616455YIPU

Note

It is recommended to periodically revoke old and generate new access key pairs.

4.3.4. Managing Object Storage Buckets

All objects in Amazon S3-like storage are stored in containers named buckets. Buckets are addressed by names that are unique in the given object storage, so an S3 user cannot create a bucket with a name that is already used by another bucket in the same object storage. Buckets are used to:

  • Group and isolate objects from those in other buckets.

  • Provide ACL management mechanisms for objects in them.

  • Set per-bucket access policies, for example, versioning in the bucket.

You can manage buckets with the ostor-s3-admin tool as well as with third-party S3 browsers that use the S3 API, such as CyberDuck or DragonDisk. The ostor-s3-admin tool is to be used by object storage administrators, so its commands do not include any authentication or authorization checks. It is recommended to use the standard Amazon S3 API commands first.

To manage buckets via CLI, you will need to know the ID of the volume that the buckets are in. You can obtain it with the ostor-ctl get-config command. For example:

# ostor-ctl get-config -n 10.94.97.195
VOL_ID             TYPE     STATE
0100000000000002   OBJ     READY
<...>

Important

The change and delete bucket operations are forced on the object storage. These commands are not part of the standard S3 API and may break integration with external billing and accounting systems. Use them only for a good reason and when you know what you are doing.

4.3.4.1. Managing Buckets with CyberDuck

4.3.4.1.1. Creating Buckets

To create a new S3 bucket with CyberDuck, do the following:

  1. Click Open Connection.

  2. Specify the following parameters:

    • The external DNS name for the S3 endpoint that you specified when creating the S3 cluster.

    • The Access Key ID and the Secret Access Key of an object storage user (see Creating S3 Users).

    By default, the connection is established over HTTPS. To use CyberDuck over HTTP, you must install a special S3 profile.

  3. Once the connection is established, click File > New Folder.

  4. Specify a name for the new bucket, and then click Create.

    Note

    It is recommended to use bucket names that comply with DNS naming conventions. For more information on bucket naming, see Bucket and Key Naming Policies.

The new bucket will appear in CyberDuck and you can manage it and upload files into it.

4.3.4.1.2. Managing Bucket Versions

Versioning is a way of keeping multiple variants of an object in the same bucket. You can use versioning to preserve, retrieve, and restore every version of every object stored in your S3 bucket. With versioning, you can easily recover from both unintended user actions and application failures. For more information about bucket versioning, refer to the Amazon documentation.

Bucket versioning is turned off by default. You can turn it on from a third-party S3 browser by selecting a checkbox in the bucket properties. For example:

[Screenshot: enabling bucket versioning in a third-party S3 browser]
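
Versioning can also be managed from the command line with any S3 client. For example, a sketch with the AWS CLI (the endpoint and bucket names are placeholders):

# aws --endpoint-url http://s3.example.com s3api put-bucket-versioning \
    --bucket mybucket --versioning-configuration Status=Enabled
# aws --endpoint-url http://s3.example.com s3api get-bucket-versioning --bucket mybucket

These calls map to the PUT Bucket versioning and GET Bucket versioning operations listed in Appendix A: Supported Amazon S3 REST Operations.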

4.3.4.1.3. Listing Bucket Contents

You can list bucket contents with a web browser. To do this, visit the URL that consists of the external DNS name for the S3 endpoint that you specified when creating the S3 cluster and the bucket name. For example, mys3storage.example.com/mybucket.
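
The same listing can be fetched from the command line, for example with curl, provided the bucket's ACL permits the request:

# curl http://mys3storage.example.com/mybucket

The response is an XML document listing the objects in the bucket (the GET Bucket operation, see Appendix A: Supported Amazon S3 REST Operations).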

Note

You can also copy the link to bucket contents by right-clicking it in CyberDuck, and then selecting Copy URL.

4.3.4.2. Managing Buckets from Command Line

4.3.4.2.1. Listing Object Storage Buckets

You can list all buckets in the S3 object storage with the ostor-s3-admin list-all-buckets command. For each bucket, the command shows the owner, creation date, versioning status, and total size (the size of all objects stored in the bucket plus the size of all unfinished multipart uploads for this bucket). For example:

# ostor-s3-admin list-all-buckets -V 0100000000000002
Total 3 buckets
BUCKET                OWNER             CREATION_DATE  VERSIONING     TOTAL SIZE, BYTES
bucket1    968d1a79968d1a79  2015-08-18T09:32:35.000Z        none                  1024
bucket2    968d1a79968d1a79  2015-08-18T09:18:20.000Z     enabled                     0
bucket3    968d1a79968d1a79  2015-08-18T09:22:15.000Z   suspended               1024000

To output the list in XML, use the -X option. For example:

# ostor-s3-admin list-all-buckets -X
<?xml version="1.0" encoding="UTF-8"?><ListBucketsResult><Buckets><Bucket><Name>bucker2</Name><Owner>d7c53fc1f931661f</Owner><CreationDate>2017-04-03T17:11:44.000Z</CreationDate><Versioning>none</Versioning><Notary>off</Notary><TotalSize>0</TotalSize></Bucket><Bucket><Name>bucket1</Name><Owner>d7c53fc1f931661f</Owner><CreationDate>2017-04-03T17:11:33.000Z</CreationDate><Versioning>none</Versioning><Notary>off</Notary><TotalSize>0</TotalSize></Bucket></Buckets></ListBucketsResult>

To filter buckets by user who owns them, use the -i option. For example:

# ostor-s3-admin list-all-buckets -i d7c53fc1f931661f
BUCKET   OWNER             CREATION_DATE             VERSIONING  TOTAL_SIZE NOTARY NOTARY_PROVIDER
bucker2  d7c53fc1f931661f  2017-04-03T17:11:44.000Z  none        0          off    0

4.3.4.2.2. Querying Object Storage Bucket Information

You can query bucket metadata information and ACL with the ostor-s3-admin query-bucket-info command. For example, for bucket1:

# ostor-s3-admin query-bucket-info -b bucket1 -V 0100000000000002
BUCKET   OWNER             CREATION_DATE             VERSIONING  TOTAL_SIZE
bucket1  d339edcf885eeafc  2017-12-21T12:42:46.000Z  none        0

ACL: d339edcf885eeafc: FULL_CONTROL

4.3.4.2.3. Changing Object Storage Bucket Owners

You can pass ownership of a bucket to the specified user with the ostor-s3-admin change-bucket-owner command. For example, to make user with ID bf0b3b15eb7c9019 the owner of bucket1:

# ostor-s3-admin change-bucket-owner -b bucket1 -i bf0b3b15eb7c9019 -V 0100000000000002
Changed owner of the bucket bucket1. New owner bf0b3b15eb7c9019

4.3.4.2.4. Deleting Object Storage Buckets

You can delete the specified bucket with the ostor-s3-admin delete-bucket command. Deleting a bucket will delete all objects in it (including their old versions) as well as all unfinished multipart uploads for this bucket. For example:

# ostor-s3-admin delete-bucket -b bucket1 -V 0100000000000002

4.3.5. Best Practices for Using Object Storage

This chapter describes recommendations on using various features of Virtuozzo Object Storage. These recommendations are intended to help you enable additional functionality or improve the convenience and performance of Virtuozzo Object Storage.

4.3.5.1. Bucket and Key Naming Policies

It is recommended to use bucket names that comply with DNS naming conventions:

  • 3 to 63 characters long.

  • Start and end with a lowercase letter or number.

  • Contain only lowercase letters, numbers, periods (.), hyphens (-), and underscores (_).

  • Can be a series of valid name parts (described previously) separated by periods.

An object key can be a string of any UTF-8 encoded characters up to 1024 bytes long.

4.3.5.2. Improving Performance of PUT Operations

Object storage supports uploading of objects as large as 5 GB in size with a single PUT request. Upload performance can be improved, however, by splitting large objects into pieces and uploading them concurrently with multipart upload API. This approach will divide the load between multiple OS services.

It is recommended to use multipart uploads for objects larger than 5 MB.
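
For example, the AWS CLI can be configured to switch to concurrent multipart uploads automatically for larger objects (a sketch; the endpoint and bucket names are placeholders):

# aws configure set default.s3.multipart_threshold 8MB
# aws configure set default.s3.multipart_chunksize 8MB
# aws configure set default.s3.max_concurrent_requests 10
# aws --endpoint-url http://s3.example.com s3 cp backup.tar s3://mybucket/backup.tar

With these settings, objects larger than 8 MB are split into 8 MB parts that are uploaded concurrently, spreading the load across multiple OS services as described above.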

4.3.6. Appendices

This section provides reference information related to Virtuozzo Object Storage.

4.3.6.1. Appendix A: Supported Amazon S3 REST Operations

The following Amazon S3 REST operations are currently supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

Service operations:

  • GET Service

Bucket operations:

  • DELETE Bucket

  • GET Bucket (List Objects)

  • GET Bucket acl

  • GET Bucket location

  • GET Bucket Object versions

  • GET Bucket versioning

  • HEAD Bucket

  • List Multipart Uploads

  • PUT Bucket

  • PUT Bucket acl

  • PUT Bucket versioning

Object operations:

  • DELETE Object

  • DELETE Multiple Objects

  • GET Object

  • GET Object ACL

  • HEAD Object

  • POST Object

  • PUT Object

  • PUT Object - Copy

  • PUT Object acl

  • Initiate Multipart Upload

  • Upload Part

  • Complete Multipart Upload

  • Abort Multipart Upload

  • List Parts

  • Upload Part Copy

Note

For a complete list of Amazon S3 REST operations, see Amazon S3 REST API documentation.

4.3.6.2. Appendix B: Supported Amazon Request Headers

The following Amazon S3 REST request headers are currently supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

  • x-amz-acl

  • x-amz-delete-marker

  • x-amz-grant-full-control

  • x-amz-grant-read-acp

  • x-amz-grant-read

  • x-amz-grant-write

  • x-amz-grant-write-acp

  • x-amz-meta-*

  • x-amz-version-id

  • x-amz-copy-source

  • x-amz-metadata-directive

  • x-amz-copy-source-version-id

4.3.6.3. Appendix C: Supported Authentication Schemes

The following authentication scheme is supported by the Virtuozzo Storage implementation of the Amazon S3 protocol:

  • Signature Version 2

4.3.6.4. Appendix D: Amazon S3 Features Supported by Bucket Policies

The Virtuozzo Storage implementation of the Amazon S3 bucket policies supports the following S3 actions, condition keys, and condition comparators (an example policy follows the lists below):

Object actions:

  • s3:AbortMultipartUpload

  • s3:DeleteObject

  • s3:DeleteObjectTagging

  • s3:DeleteObjectVersion

  • s3:DeleteObjectVersionTagging

  • s3:GetObject

  • s3:GetObjectAcl

  • s3:GetObjectTagging

  • s3:GetObjectTorrent

  • s3:GetObjectVersion

  • s3:GetObjectVersionAcl

  • s3:GetObjectVersionTagging

  • s3:ListMultipartUploadParts

  • s3:PutObject

  • s3:PutObjectAcl

  • s3:PutObjectTagging

  • s3:PutObjectVersionAcl

  • s3:PutObjectVersionTagging

  • s3:RestoreObject

Bucket actions:

  • s3:CreateBucket

  • s3:DeleteBucket

  • s3:ListBucket

  • s3:ListBucketMultipartUploads

  • s3:ListBucketVersions

Bucket subresource actions:

  • s3:DeleteBucketPolicy

  • s3:DeleteBucketWebsite

  • s3:GetBucketAcl

  • s3:GetBucketCORS

  • s3:GetBucketLocation

  • s3:GetBucketLogging

  • s3:GetBucketNotification

  • s3:GetBucketPolicy

  • s3:GetBucketRequestPayment

  • s3:GetBucketTagging

  • s3:GetBucketVersioning

  • s3:GetBucketWebsite

  • s3:GetLifecycleConfiguration

  • s3:GetReplicationConfiguration

  • s3:PutBucketAcl

  • s3:PutBucketCORS

  • s3:PutBucketLogging

  • s3:PutBucketNotification

  • s3:PutBucketPolicy

  • s3:PutBucketRequestPayment

  • s3:PutBucketTagging

  • s3:PutBucketVersioning

  • s3:PutBucketWebsite

  • s3:PutLifecycleConfiguration

  • s3:PutReplicationConfiguration

Condition keys:

  • s3:x-amz-storage-class

  • s3:x-amz-acl

  • s3:x-amz-grant-full-control

  • s3:x-amz-grant-read

  • s3:x-amz-grant-read-acp

  • s3:x-amz-grant-write

  • s3:x-amz-grant-write-acp

  • s3:x-amz-copy-source

  • s3:TlsVersion

  • s3:x-amz-content-sha256

  • s3:signatureversion

  • s3:signatureAge

  • s3:authType

  • s3:x-amz-website-redirect-location

  • s3:object-lock-mode

  • s3:object-lock-retain-until-date

  • s3:object-lock-legal-hold

  • s3:object-lock-remaining-retention-days

  • s3:prefix

  • s3:versionid

  • s3:max-keys

  • s3:locationconstraint

  • aws:SourceIp

Condition comparators:

  • StringNotEquals

  • StringEqualsIgnoreCase

  • StringNotEqualsIgnoreCase

  • StringLike

  • StringNotLike

  • NumericEquals

  • NumericNotEquals

  • NumericLessThan

  • NumericLessThanEquals

  • NumericGreaterThan

  • NumericGreaterThanEquals

  • DateEquals

  • DateNotEquals

  • DateLessThan

  • DateLessThanEquals

  • DateGreaterThan

  • DateGreaterThanEquals

  • BinaryEquals

  • IpAddress

  • NotIpAddress
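
Below is a sketch of how these elements combine in a bucket policy applied with the AWS CLI (the endpoint, bucket name, and resource ARN format are placeholders; only actions, condition keys, and comparators from the lists above are used):

# cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::mybucket/*",
      "Condition": { "IpAddress": { "aws:SourceIp": "10.29.0.0/16" } }
    }
  ]
}
EOF
# aws --endpoint-url http://s3.example.com s3api put-bucket-policy \
    --bucket mybucket --policy file://policy.json

This policy allows anonymous GET Object requests on mybucket only from the 10.29.0.0/16 network, combining the s3:GetObject action, the aws:SourceIp condition key, and the IpAddress comparator.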