Recovering nodes

Information about storage services is stored on the system disk and may be lost if the system disk fails. If this happens, you can recover the system disk and the node configuration by reinstalling the product from an ISO image in recovery mode. Recovery mode provides basic troubleshooting steps when a node fails to boot.

During the recovery process, the configuration of deployed services and infrastructure is automatically detected and recovered from storage disks.

Limitations

  • Single-node clusters cannot be recovered.
  • Management nodes with disabled high availability cannot be recovered.
  • The configuration of the compute, iSCSI, S3, and NFS services cannot be automatically recovered. For manual recovery, contact technical support.
  • Recovery is only possible if the node hardware configuration has not been changed.
  • The primary management node can only be recovered during an upgrade.
  • Remote iSCSI devices that are attached to cluster nodes as storage disks cannot be recovered. Data stored on such devices will be lost.
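
The limitations above can be read as a pre-recovery checklist. The following is a minimal Python sketch of such a check, not part of the product: the NodeState class, its fields, and both functions are hypothetical names introduced for illustration, and the values must be filled in by hand from what you know about the node.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical, hand-filled summary of a failed node (not a product API)."""
    cluster_size: int               # number of nodes in the cluster
    is_management: bool             # node is a management node
    ha_enabled: bool                # management node high availability is enabled
    is_primary_management: bool     # node is the primary management node
    upgrade_in_progress: bool       # recovery is attempted during an upgrade
    hardware_changed: bool          # hardware configuration differs from before
    services: set[str] = field(default_factory=set)  # e.g. {"compute", "s3"}
    remote_iscsi_disks: int = 0     # remote iSCSI devices used as storage disks


def recovery_blockers(node: NodeState) -> list[str]:
    """Return the limitations that prevent recovery of this node."""
    blockers = []
    if node.cluster_size == 1:
        blockers.append("single-node clusters cannot be recovered")
    if node.is_management and not node.ha_enabled:
        blockers.append("management node has high availability disabled")
    if node.hardware_changed:
        blockers.append("node hardware configuration has changed")
    if node.is_primary_management and not node.upgrade_in_progress:
        blockers.append("primary management node is recoverable only during an upgrade")
    return blockers


def recovery_warnings(node: NodeState) -> list[str]:
    """Return limitations that do not block recovery but need attention."""
    warnings = []
    manual = sorted(node.services & {"compute", "iscsi", "s3", "nfs"})
    if manual:
        warnings.append("not recovered automatically: " + ", ".join(manual))
    if node.remote_iscsi_disks > 0:
        warnings.append("data on remote iSCSI storage disks will be lost")
    return warnings
```

An empty list from recovery_blockers means that none of the listed limitations rules out recovery; any warnings still apply after the node is recovered.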

To recover a node

  1. Place the node into maintenance mode to evacuate services and virtual machines from it. To do this, click Enter maintenance on the node's right pane.
  2. Prepare the bootable media by using the distribution ISO image, as described in Preparing the bootable media.
  3. Attach the bootable media to the node, and then reboot the node.
  4. Configure the node to boot from the chosen media.
  5. Reinstall Virtuozzo Hybrid Infrastructure on the node:

    1. On the welcome screen, click Troubleshooting–>, and then Recover - Node recovery.
    2. On step 1, accept the End-User License Agreement by selecting I accept the End-User License Agreement, and then click Next.
    3. On step 2, specify the current network configuration for this node, and then click Next.
    4. On step 3, choose the correct time zone, and then click Next.
    5. On step 4, select the option No, add it to an existing cluster, and then specify the private IP address of the management node and the token obtained on the Infrastructure > Nodes > Connect node screen. Then, click Next.
    6. On step 5, choose the system disk for reinstalling the operating system, and then click Next.
    7. On step 6, enter and confirm the password for the root account, and then click Start installation.

    After the installation completes and the node reboots, the recovery script runs automatically. Wait until the node recovery finishes.

  6. If the node has the iSCSI, S3, or NFS services deployed:

    1. Release the node from these services.
    2. Return the node to operation by exiting the maintenance mode.
    3. Rejoin the node to the relevant services.
  7. If the node has the compute services deployed:

    1. Check whether the recovery has failed. The recovery process fails when a network with the VM public traffic type does not include the Internal management traffic type, because such a network is not reassigned to a public interface after the product is reinstalled. In this case, do the following:

      1. Go to the node's Network interfaces tab in the admin panel.
      2. Assign the network with the VM public traffic type to your public interface.
      3. On the node pane, click Retry recovery.
    2. Return the node to operation by exiting the maintenance mode.
    3. Go to Settings > System settings > Management node high availability and destroy the HA configuration.
    4. On the Compute > Nodes screen, release the node from the compute cluster, and then add it again.
    5. Re-create the HA configuration.
  8. If the node is not included in the compute cluster but is part of the HA configuration, go to Settings > System settings > Management node high availability and do one of the following:

    • If the cluster has spare nodes for replacement, replace the node in the HA configuration.
    • If the cluster has no spare nodes for replacement, destroy the HA configuration, and then re-create it.
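
Which of steps 6 to 8 apply depends on the services deployed on the node and on its role in the HA configuration. The sketch below restates that branching in Python under stated assumptions: post_recovery_steps and its parameters are hypothetical names, the function only returns a list of reminders and does not call the product, and merging the two "exit maintenance mode" actions when a node runs both storage access and compute services is an assumption of this sketch.

```python
from __future__ import annotations


def post_recovery_steps(services: set[str], in_ha_config: bool,
                        has_spare_nodes: bool) -> list[str]:
    """Return the manual follow-up actions for a recovered node, in order."""
    steps: list[str] = []
    exit_maintenance = "exit maintenance mode"

    # Step 6: iSCSI, S3, or NFS services are deployed on the node.
    access = sorted(services & {"iscsi", "s3", "nfs"})
    if access:
        names = ", ".join(access)
        steps += [f"release the node from: {names}",
                  exit_maintenance,
                  f"rejoin the node to: {names}"]

    if "compute" in services:
        # Step 7: compute services are deployed on the node.
        if exit_maintenance not in steps:  # assumption: exit only once
            steps.append(exit_maintenance)
        steps += ["destroy the HA configuration",
                  "release the node from the compute cluster, then add it again",
                  "re-create the HA configuration"]
    elif in_ha_config:
        # Step 8: node is outside the compute cluster but in the HA configuration.
        if has_spare_nodes:
            steps.append("replace the node in the HA configuration")
        else:
            steps += ["destroy the HA configuration",
                      "re-create the HA configuration"]
    return steps
```

For example, post_recovery_steps({"s3"}, in_ha_config=True, has_spare_nodes=True) lists the three S3 actions followed by the node replacement in the HA configuration.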

If the node recovery fails

  • To repeat the recovery attempt, click Retry recovery on the node's right pane.
  • To wipe the disks on the node and re-import them, click Cancel recovery on the node's right pane.