8.3. Configuring Resource Relocation Modes
You can configure how the cluster relocates resources when a node fails. Three modes are available:
DRS (default; only for virtual environments). In this mode, virtual machines and containers that were running on a failed node are relocated to healthy nodes based on available RAM and license capacity. This mode can be used for nodes on which the pdrs daemon is running. To learn how to configure this mode, refer to pdrs Configuration File.
Note: If CPU pools are used, virtual machines and containers can only be relocated to other nodes in the same CPU pool. For details, see Managing CPU Pools.
The DRS mode works as follows. The master DRS continuously collects the following data from each healthy node in the cluster via SNMP:
Total node RAM
Total RAM used by virtual machines
Total RAM used by containers
Maximum running virtual machines allowed
Maximum running containers allowed
Maximum running virtual machines and containers allowed
If a node fails, the shaman service sends a list of the virtual machines and containers that were running on that node to the master DRS, which sorts the list by required RAM, largest first. Using the collected data on node RAM and licenses, the master DRS then attempts to find a node with the most available RAM and a suitable license for the virtual environment at the top of the list (the one requiring the most RAM). If such a node exists, the master DRS marks the virtual environment for relocation to that node. Otherwise, it marks the virtual environment as broken. The master DRS then processes the next virtual environment on the list, adjusting the collected node data by the requirements of the previous virtual environment. Having processed all virtual environments on the list, the master DRS sends the list to the shaman service for actual relocation.
Round-robin (default fallback). In this mode, virtual machines, containers, and other resources from a failed node are relocated to healthy nodes in the round-robin manner. This mode cannot be configured. To switch to this mode, run:
# shaman set-config RESOURCE_RELOCATION_MODE=round-robin
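As an illustration of what "round-robin manner" means here, the following minimal Python sketch hands out resources to healthy nodes in turn. It is not the shaman implementation; the function name `round_robin_relocate` and the resource names are hypothetical and used only for this example.

```python
from itertools import cycle


def round_robin_relocate(resources, healthy_nodes):
    """Assign each resource to the next healthy node in turn (illustrative only)."""
    if not healthy_nodes:
        raise ValueError("no healthy nodes to relocate to")
    targets = cycle(healthy_nodes)  # endlessly repeats the node list in order
    return {resource: next(targets) for resource in resources}


# Hypothetical resources from a failed node and the two remaining healthy nodes.
plan = round_robin_relocate(
    ["vm1", "ct1", "iscsi1", "s3-1", "vm2"],
    ["10.10.20.2", "10.10.20.3"],
)
for resource, node in plan.items():
    print(f"{resource} -> {node}")
```

Note that a plain round-robin assignment like this ignores RAM and license capacity, which is the main difference from the DRS mode.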
Spare. In this mode, virtual machines, containers, and other resources from a failed node are relocated to a spare node—an empty node with enough resources and a license to host all virtual environments from any given node in the cluster. Such a node is required for high availability to work in this mode.
Before switching to this mode, make sure the spare node is added to the HA configuration and has no resources (virtual machines, containers, iSCSI targets, and S3 clusters) stored on it. To check this, run the shaman stat command on any node in the cluster and check that the RESOURCES column shows zeroes for the node:
# shaman stat
Cluster 'stor1'
Nodes: 3
Resources: 12
    NODE_IP         STATUS     ROLES                    RESOURCES
*   10.10.20.1      Active     VM:QEMU,CT:VZ7,ISCSI,S3  0 CT, 0 S3, 0 VM, 0 ISCSI
 M  10.10.20.2      Active     VM:QEMU,CT:VZ7,ISCSI,S3  4 CT, 1 S3, 3 VM, 1 ISCSI
    10.10.20.3      Active     VM:QEMU,CT:VZ7,S3        1 CT, 1 S3, 1 VM, 0 ISCSI
In the example above, the current node (marked by the asterisk) is empty and can be used as spare.
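If you want to script this check, the sketch below parses text laid out like the sample shaman stat output and lists nodes whose resource counts are all zero. It assumes the column layout shown in the example (node IP, status, roles, then comma-separated counts); the helper names `parse_node_resources` and `empty_nodes` are hypothetical, and the real output may differ between versions.

```python
import re

# Sample text copied from the example; in practice you would capture the
# command output yourself (how you do that is outside this sketch).
SAMPLE = """\
Cluster 'stor1'
Nodes: 3
Resources: 12
    NODE_IP         STATUS     ROLES                    RESOURCES
*   10.10.20.1      Active     VM:QEMU,CT:VZ7,ISCSI,S3  0 CT, 0 S3, 0 VM, 0 ISCSI
 M  10.10.20.2      Active     VM:QEMU,CT:VZ7,ISCSI,S3  4 CT, 1 S3, 3 VM, 1 ISCSI
    10.10.20.3      Active     VM:QEMU,CT:VZ7,S3        1 CT, 1 S3, 1 VM, 0 ISCSI
"""

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
COUNT_RE = re.compile(r"(\d+)\s+([A-Z][A-Z0-9]*)")


def parse_node_resources(stat_output):
    """Map node IP -> {resource type: count}, assuming the layout above."""
    nodes = {}
    for line in stat_output.splitlines():
        match = IP_RE.search(line)
        if match is None:
            continue  # header or summary line
        # After the IP: status, roles, then the free-form resources column.
        fields = line[match.end():].split(None, 2)
        if len(fields) < 3:
            continue
        resources = fields[2]
        nodes[match.group()] = {
            kind: int(count) for count, kind in COUNT_RE.findall(resources)
        }
    return nodes


def empty_nodes(stat_output):
    """Return IPs of nodes whose resource counts are all zero."""
    return [
        ip
        for ip, counts in parse_node_resources(stat_output).items()
        if counts and all(n == 0 for n in counts.values())
    ]


print(empty_nodes(SAMPLE))
```

For the sample above, only 10.10.20.1 qualifies as a spare candidate.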
If the node is not empty, you can free it:
From VMs and containers by migrating them to the other cluster nodes
From iSCSI targets by unregistering them on said node and registering them on the other cluster nodes
From S3 resources by releasing said node from the S3 cluster
Once you have a spare node in your cluster, you can switch to the spare mode by running:
# shaman set-config RESOURCE_RELOCATION_MODE=spare
Additionally, you can set a fallback relocation mode in case the chosen relocation mode fails. For example:
# shaman set-config RESOURCE_RELOCATION_MODE=drs,spare
If several types of resources are to be relocated from a failed node, the default sequence drs, round-robin works as follows:
Virtual machines and containers are relocated using the DRS mode, that is, according to the available RAM and license capacity on cluster nodes. If no suitable nodes are found for some virtual environments, these resources will be marked as broken and relocated to the master node in the stopped state.
All other types of resources (ISCSI, NFS, S3) are relocated using the round-robin mode.
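The DRS placement procedure described earlier in this section can be sketched in Python as follows. This is an illustration of the described logic, not the pdrs code: the `Node` and `VE` classes, their field names, the use of current running counts to model license capacity, and the "RAM must fit" check are all assumptions made for the example. A value of `None` in the plan stands for a virtual environment marked as broken.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    total_ram: int    # total node RAM (units are arbitrary in this sketch)
    used_ram: int     # RAM used by virtual machines plus containers
    running_vms: int  # assumed counters; license limits follow below
    running_cts: int
    max_vms: int      # maximum running virtual machines allowed
    max_cts: int      # maximum running containers allowed
    max_total: int    # maximum running VMs and containers allowed

    def free_ram(self):
        return self.total_ram - self.used_ram

    def can_host(self, ve):
        """Assumed suitability check: enough free RAM and license headroom."""
        if ve.ram > self.free_ram():
            return False
        if self.running_vms + self.running_cts >= self.max_total:
            return False
        if ve.kind == "vm":
            return self.running_vms < self.max_vms
        return self.running_cts < self.max_cts


@dataclass
class VE:
    name: str
    kind: str  # "vm" or "ct"
    ram: int


def plan_relocation(ves, nodes):
    """Return {VE name: target node name, or None if marked as broken}."""
    plan = {}
    # Process the most RAM-hungry virtual environments first.
    for ve in sorted(ves, key=lambda v: v.ram, reverse=True):
        candidates = [n for n in nodes if n.can_host(ve)]
        if not candidates:
            plan[ve.name] = None  # no suitable node: marked as broken
            continue
        target = max(candidates, key=Node.free_ram)
        plan[ve.name] = target.name
        # Adjust the collected node data before handling the next VE.
        target.used_ram += ve.ram
        if ve.kind == "vm":
            target.running_vms += 1
        else:
            target.running_cts += 1
    return plan


nodes = [
    Node("node2", total_ram=64, used_ram=40, running_vms=3, running_cts=4,
         max_vms=10, max_cts=10, max_total=15),
    Node("node3", total_ram=36, used_ram=4, running_vms=1, running_cts=1,
         max_vms=2, max_cts=10, max_total=15),
]
ves = [VE("vm-a", "vm", 16), VE("ct-b", "ct", 8),
       VE("vm-c", "vm", 20), VE("vm-d", "vm", 40)]
plan = plan_relocation(ves, nodes)
print(plan)
```

In this example, vm-d needs more RAM than any node has free, so it is marked as broken, while the remaining virtual environments are spread across node2 and node3 as their free RAM shrinks.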