8.3. Configuring Resource Relocation Modes

You can configure how the cluster handles a node failure. Three modes are available:

  • DRS (default; only for virtual environments). In this mode, virtual machines and containers that were running on a failed node are relocated to healthy nodes based on available RAM and license capacity. This mode can be used on nodes that run the pdrs daemon. To learn how to configure this mode, refer to pdrs Configuration File.

    Note

    If CPU pools are used, virtual machines and containers can only be relocated to other nodes in the same CPU pool. For details, see Managing CPU Pools.

    The DRS mode works as follows. The master DRS continuously collects the following data from each healthy node in the cluster via SNMP:

    • Total node RAM

    • Total RAM used by virtual machines

    • Total RAM used by containers

    • Maximum running virtual machines allowed

    • Maximum running containers allowed

    • Maximum running virtual machines and containers allowed

    If a node fails, the shaman service sends a list of the virtual machines and containers that were running on that node to the master DRS, which sorts the list by required RAM in descending order. Using the collected data on node RAM and licenses, the master DRS then tries to find the node with the most available RAM and a suitable license for the virtual environment at the top of the list (the one requiring the most RAM). If such a node exists, the master DRS marks the virtual environment for relocation to that node. Otherwise, it marks the virtual environment as broken. The master DRS then processes the next virtual environment on the list, first adjusting the collected node data to account for the requirements of the previous virtual environment. Having processed all virtual environments on the list, the master DRS sends the list to the shaman service for the actual relocation.

  • Round-robin (default fallback). In this mode, virtual machines, containers, and other resources from a failed node are relocated to healthy nodes in a round-robin manner. This mode has no configurable parameters. To switch to this mode, run:

    # shaman set-config RESOURCE_RELOCATION_MODE=round-robin
    
  • Spare. In this mode, virtual machines, containers, and other resources from a failed node are relocated to a spare node—an empty node with enough resources and a license to host all virtual environments from any given node in the cluster. Such a node is required for high availability to work in this mode.

    Before switching to this mode, make sure the spare node is added to the HA configuration and has no resources (virtual machines, containers, iSCSI targets, and S3 clusters) stored on it. To check this, run the shaman stat command on any node in the cluster and check that the RESOURCES column shows zeroes for the node:

    # shaman stat
    Cluster 'stor1'
    Nodes: 3
    Resources: 12
    
       NODE_IP       STATUS     ROLES                    RESOURCES
    *  10.10.20.1    Active     VM:QEMU,CT:VZ7,ISCSI,S3  0 CT, 0 S3, 0 VM, 0 ISCSI
     M 10.10.20.2    Active     VM:QEMU,CT:VZ7,ISCSI,S3  4 CT, 1 S3, 3 VM, 1 ISCSI
       10.10.20.3    Active     VM:QEMU,CT:VZ7,S3        1 CT, 1 S3, 1 VM, 0 ISCSI
    

    In the example above, the current node (marked by the asterisk) is empty and can be used as a spare.

    If the node is not empty, you can free it as follows:

    • VMs and containers: migrate them to the other cluster nodes.

    • iSCSI targets: unregister them on the node and register them on the other cluster nodes.

    • S3 resources: release the node from the S3 cluster.

    Once you have a spare node in your cluster, you can switch to the spare mode by running:

    # shaman set-config RESOURCE_RELOCATION_MODE=spare
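
The emptiness check described above can also be scripted. The following is a minimal Python sketch, assuming the shaman stat output has the same layout as in the example listing. The helper names (resource_counts, empty_nodes) and the SAMPLE constant are hypothetical and not part of shaman. The sketch parses text only; in practice the text would come from running shaman stat on a cluster node.

```python
import re

# Sample text copied from the `shaman stat` listing in this section.
SAMPLE = """\
Cluster 'stor1'
Nodes: 3
Resources: 12

   NODE_IP       STATUS     ROLES                    RESOURCES
*  10.10.20.1    Active     VM:QEMU,CT:VZ7,ISCSI,S3  0 CT, 0 S3, 0 VM, 0 ISCSI
 M 10.10.20.2    Active     VM:QEMU,CT:VZ7,ISCSI,S3  4 CT, 1 S3, 3 VM, 1 ISCSI
   10.10.20.3    Active     VM:QEMU,CT:VZ7,S3        1 CT, 1 S3, 1 VM, 0 ISCSI
"""

# One node row: IPv4 address, status, roles, then the resource counters.
# Any markers before the address are ignored.
ROW = re.compile(r"(\d+\.\d+\.\d+\.\d+)\s+(\S+)\s+(\S+)\s+(.+)$")


def resource_counts(stat_output):
    """Map each node IP to its per-type resource counters."""
    nodes = {}
    for line in stat_output.splitlines():
        match = ROW.search(line)
        if not match:
            continue  # cluster summary, column header, or blank line
        ip, _status, _roles, resources = match.groups()
        nodes[ip] = {
            kind: int(count)
            for count, kind in re.findall(r"(\d+)\s+(\w+)", resources)
        }
    return nodes


def empty_nodes(stat_output):
    """Return the IPs of nodes whose RESOURCES counters are all zero."""
    return [
        ip
        for ip, counts in resource_counts(stat_output).items()
        if counts and all(n == 0 for n in counts.values())
    ]


print(empty_nodes(SAMPLE))  # ['10.10.20.1']
```

A node that appears in the returned list hosts no resources. Whether it also has enough RAM and a suitable license to act as a spare still has to be verified separately.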
    

Additionally, you can set a fallback relocation mode to be used if the chosen relocation mode fails. List the modes in order of preference, separated by a comma. For example:

# shaman set-config RESOURCE_RELOCATION_MODE=drs,spare

If resources of several types need to be relocated from a failed node, the default sequence drs,round-robin works as follows:

  • Virtual machines and containers are relocated using the DRS mode, that is, according to the available RAM and license capacity on cluster nodes. If no suitable node is found for a virtual environment, it is marked as broken and relocated to the master node in the stopped state.

  • All other types of resources (iSCSI, NFS, S3) are relocated using the round-robin mode.
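
To make the sequence concrete, the following Python sketch models the planning steps described in this section: the DRS pass for virtual environments (largest RAM requirement first, node with the most available RAM first) and the round-robin pass for everything else. It is an illustration only, not the pdrs or shaman implementation. All names (Node, plan_drs, plan_round_robin, plan_default) are hypothetical, the license check is simplified to a single free_slots counter, and the rule that a node must have at least as much free RAM as the environment requires is an assumption.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass
class Node:
    name: str
    free_ram: int    # MB of RAM not used by VMs and containers
    free_slots: int  # remaining license capacity (simplified to one counter)


def plan_drs(envs, nodes):
    """Greedy pass: largest RAM requirement first, emptiest node first.

    envs maps a virtual environment name to its required RAM in MB.
    Returns name -> target node name, or None if marked as broken.
    """
    pool = [replace(n) for n in nodes]  # work on copies of the collected data
    plan = {}
    for env, ram in sorted(envs.items(), key=lambda kv: kv[1], reverse=True):
        # Assumption: a node is suitable if it has enough RAM and a free slot.
        fits = [n for n in pool if n.free_ram >= ram and n.free_slots > 0]
        if not fits:
            plan[env] = None  # no suitable node: mark as broken
            continue
        target = max(fits, key=lambda n: n.free_ram)
        target.free_ram -= ram  # adjust the data for the next environment
        target.free_slots -= 1
        plan[env] = target.name
    return plan


def plan_round_robin(resources, node_names):
    """Hand out resources to the healthy nodes in turn."""
    ring = cycle(node_names)
    return {res: next(ring) for res in resources}


def plan_default(envs, other, nodes, master):
    """Sketch of the drs,round-robin sequence; values are (node, broken)."""
    plan = {}
    for env, node in plan_drs(envs, nodes).items():
        # Broken environments go to the master node in the stopped state.
        plan[env] = (node, False) if node else (master, True)
    names = [n.name for n in nodes]
    for res, node in plan_round_robin(other, names).items():
        plan[res] = (node, False)
    return plan


nodes = [Node("node2", 8192, 2), Node("node3", 4096, 5)]
envs = {"vm1": 6144, "vm2": 4096, "vm3": 3000, "ct1": 1024}
other = ["iscsi1", "s3-1", "nfs1"]
for name, (node, broken) in plan_default(envs, other, nodes, "node2").items():
    print(name, "->", node, "(broken)" if broken else "")
```

In this made-up example, vm3 finds no node with 3000 MB free after vm1 and vm2 are placed, so it is marked as broken and assigned to the master node, while the smaller ct1 still fits on node2.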