8.3. Configuring Resource Relocation Modes

You can configure how the cluster relocates resources when a node fails. Three modes are available:

  • DRS (default). In this mode, virtual machines and containers that were running on a failed node are relocated to healthy nodes based on available RAM and license capacity. This mode can be used for nodes on which the pdrs service is running.

    Note

    If CPU pools are used, virtual machines and containers can only be relocated to other nodes in the same CPU pool. For details, see Managing CPU Pools.

    The DRS mode works as follows. The master DRS continuously collects the following data from each healthy node in the cluster via SNMP:

    • total node RAM,
    • total RAM used by virtual machines,
    • total RAM used by containers,
    • maximum running virtual machines allowed,
    • maximum running containers allowed,
    • maximum running virtual machines and containers allowed.

    If a node fails, the shaman service sends the master DRS a list of the virtual machines and containers that were running on that node. The master DRS sorts the list by required RAM, largest first.

    Using the collected data on node RAM and licenses, the master DRS then attempts to find the node with the most available RAM and a suitable license for the virtual environment at the top of the list (the one requiring the most RAM). If such a node exists, the master DRS marks the virtual environment for relocation to that node. Otherwise, it marks the virtual environment as broken.

    The master DRS then processes the next virtual environment on the list, adjusting the collected node data by the requirements of the previous virtual environment. Having processed all virtual environments on the list, the master DRS sends the list to the shaman service for actual relocation.
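
The placement steps described above can be illustrated with a short Python sketch. This is not the actual DRS implementation. The class names, field names, and sample figures are hypothetical, and license capacity is modeled simply as the remaining number of running virtual machines, containers, and both combined that a node allows.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical model of the data the master DRS collects per node.
    name: str
    free_ram_mb: int   # total node RAM minus RAM used by VMs and containers
    vm_slots: int      # remaining running VMs allowed
    ct_slots: int      # remaining running containers allowed
    total_slots: int   # remaining running VMs and containers combined


@dataclass
class VirtualEnvironment:
    name: str
    kind: str          # "vm" or "ct"
    ram_mb: int


def fits(node, ve):
    """True if the node has enough free RAM and license capacity for ve."""
    slots = node.vm_slots if ve.kind == "vm" else node.ct_slots
    return node.free_ram_mb >= ve.ram_mb and slots > 0 and node.total_slots > 0


def plan_relocation(nodes, failed_node_ves):
    """Map each virtual environment to a target node name, or None if broken.

    Note: the Node objects are updated in place as capacity is consumed.
    """
    plan = {}
    # Process the environments that need the most RAM first.
    for ve in sorted(failed_node_ves, key=lambda v: v.ram_mb, reverse=True):
        candidates = [n for n in nodes if fits(n, ve)]
        if not candidates:
            plan[ve.name] = None  # no suitable node: marked as broken
            continue
        # Pick the suitable node with the most available RAM.
        target = max(candidates, key=lambda n: n.free_ram_mb)
        plan[ve.name] = target.name
        # Adjust the collected node data before handling the next environment.
        target.free_ram_mb -= ve.ram_mb
        target.total_slots -= 1
        if ve.kind == "vm":
            target.vm_slots -= 1
        else:
            target.ct_slots -= 1
    return plan


nodes = [Node("node2", 8192, 2, 2, 3), Node("node3", 4096, 1, 4, 4)]
ves = [
    VirtualEnvironment("vm-a", "vm", 6144),
    VirtualEnvironment("ct-b", "ct", 3072),
    VirtualEnvironment("vm-c", "vm", 4096),
]
plan = plan_relocation(nodes, ves)
print(plan)  # {'vm-a': 'node2', 'vm-c': 'node3', 'ct-b': None}
```

In this made-up example, the two virtual machines consume most of the free RAM on both healthy nodes, so the container that comes last in the sorted list cannot be placed and is marked as broken.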

  • Spare. In this mode, virtual machines and containers from a failed node are relocated to a spare node—an empty node with enough resources and a license to host all virtual environments from any given node in the cluster. Such a node is required for high availability to work in this mode.

    Before switching to this mode, make sure the spare node is added to the HA configuration and hosts no resources (virtual machines, containers, iSCSI targets, or S3 clusters). To check this, run the shaman stat command on any node in the cluster and verify that the RESOURCES column shows zeroes for the node:

    # shaman stat
    Cluster 'stor1'
    Nodes: 3
    Resources: 12
    
       NODE_IP       STATUS     ROLES                    RESOURCES
    *  10.10.20.1    Active     VM:QEMU,CT:VZ7,ISCSI,S3  0 CT, 0 S3, 0 VM, 0 ISCSI
     M 10.10.20.2    Active     VM:QEMU,CT:VZ7,ISCSI,S3  4 CT, 1 S3, 3 VM, 0 ISCSI
       10.10.20.3    Active     VM:QEMU,CT:VZ7,S3        1 CT, 1 S3, 1 VM, 1 ISCSI
    

    In the example above, the current node (marked by the asterisk) is empty and can be used as a spare.

    If the node is not empty, you can free it:

    • from virtual machines and containers by migrating them to other cluster nodes,
    • from iSCSI targets by unregistering them on that node and registering them on other cluster nodes,
    • from S3 resources by releasing that node from the S3 cluster.

    Once you have a spare node in your cluster, you can switch to the spare mode by running:

    # shaman set-config RESOURCE_RELOCATION_MODE=spare
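
The emptiness check on the RESOURCES column can also be scripted. The Python sketch below parses text in the format of the shaman stat example shown earlier. The function names are hypothetical, and the column layout is assumed to match that example exactly, which may not hold for every product version.

```python
import re

# Sample text copied from the shaman stat example in this section.
SAMPLE = """\
Cluster 'stor1'
Nodes: 3
Resources: 12

   NODE_IP       STATUS     ROLES                    RESOURCES
*  10.10.20.1    Active     VM:QEMU,CT:VZ7,ISCSI,S3  0 CT, 0 S3, 0 VM, 0 ISCSI
 M 10.10.20.2    Active     VM:QEMU,CT:VZ7,ISCSI,S3  4 CT, 1 S3, 3 VM, 0 ISCSI
   10.10.20.3    Active     VM:QEMU,CT:VZ7,S3        1 CT, 1 S3, 1 VM, 1 ISCSI
"""

# Assumed row layout: optional markers, node IP, status, roles, resources.
ROW = re.compile(
    r"^(?P<flags>[ *M]*)(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"(?P<status>\S+)\s+(?P<roles>\S+)\s+(?P<resources>.+)$"
)


def resource_counts(stat_output):
    """Return {node_ip: {resource_kind: count}} for every node row."""
    counts = {}
    for line in stat_output.splitlines():
        match = ROW.match(line)
        if match:
            pairs = re.findall(r"(\d+)\s+(\w+)", match["resources"])
            counts[match["ip"]] = {kind: int(n) for n, kind in pairs}
    return counts


def is_empty(counts, node_ip):
    """True if every resource counter for the node is zero."""
    return all(n == 0 for n in counts[node_ip].values())


counts = resource_counts(SAMPLE)
print(is_empty(counts, "10.10.20.1"))  # True
print(is_empty(counts, "10.10.20.2"))  # False
```

In practice, the text would come from running shaman stat on a cluster node rather than from an embedded sample.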
    
  • Round-robin (default fallback). In this mode, virtual machines, containers, and iSCSI targets from a failed node are relocated to healthy nodes in a round-robin manner. To switch to this mode, run:

    # shaman set-config RESOURCE_RELOCATION_MODE=round-robin
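
As a rough illustration of round-robin relocation, the Python sketch below hands each resource from a failed node to the next healthy node in turn. The function name and resource names are hypothetical, and the order in which the real service visits nodes and resources is an assumption.

```python
from itertools import cycle


def round_robin_plan(healthy_nodes, resources):
    """Assign each resource to the next healthy node in turn."""
    if not healthy_nodes:
        # Nowhere to relocate: leave every resource unassigned.
        return {name: None for name in resources}
    turn = cycle(healthy_nodes)
    return {name: next(turn) for name in resources}


rr_plan = round_robin_plan(
    ["10.10.20.1", "10.10.20.3"],
    ["ct-101", "ct-102", "vm-a", "iscsi-target-1", "vm-b"],
)
for resource, node in rr_plan.items():
    print(resource, "->", node)
```

Unlike the DRS mode, this scheme does not take available RAM or license capacity into account. It only spreads the resources evenly by count.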
    

Additionally, you can set a fallback relocation mode to be used if the chosen relocation mode fails. For example, the following sets DRS as the primary mode with spare as the fallback:

# shaman set-config RESOURCE_RELOCATION_MODE=drs,spare