Installing updates

Virtuozzo Hybrid Infrastructure supports non-disruptive rolling updates. Nodes are updated one by one, without affecting data availability.

An update may have the following impact on a node:

  • Reboot required. During such an update, a node needs to be rebooted to apply a new kernel. In this case, you can place the node into the maintenance mode and evacuate its services and virtual machines to other nodes, to avoid their downtime. During the node maintenance, the compute API services are fenced from processing compute requests. Once the node is updated, it is automatically rebooted. If the node entered maintenance, it returns to operation after the reboot, and the migrated workloads, except for VMs, are moved back to the node. The evacuated VMs remain on other nodes.
  • Maintenance required. During such an update, a node needs to enter the maintenance mode to install new packages for major services. In this case, you can place the node into the maintenance mode and evacuate its services and virtual machines to other nodes, to avoid their downtime. During the node maintenance, the compute API services are fenced from processing compute requests. Once updated, the node exits maintenance without a reboot, and the migrated workloads, except for VMs, are moved back to the node. The evacuated VMs remain on other nodes.
  • No impact. In this case, an update is performed without a node reboot or maintenance.
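
To check which of these cases applies to a node, you can look at the reboot_required and maintenance_required fields in the output of vinfra software-updates status, shown in full in the command-line procedure below. As a rough sketch, you could filter that output like this:

# vinfra software-updates status | grep -E 'host:|reboot_required|maintenance_required'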

You can update different cluster components all together or separately. In either case, the components are updated in the following order:

  1. Cluster nodes are updated first. If any of these nodes are included in the compute cluster, the compute services on them are updated at this step.
  2. Management nodes are updated only when all of the cluster nodes are up to date. The primary management node is the last one to be updated. If management nodes have the compute services deployed, these services are updated at this step.
  3. The management panel (admin and self-service) is updated on management nodes and only when all of the nodes, both cluster and management, are up to date. While updating this component, management nodes do not require a reboot.
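
The primary management node can be identified by the is_primary field in the vinfra software-updates status output shown later in this section. For example, an informal filter such as the following pairs each host name with this field:

# vinfra software-updates status | grep -E 'host:|is_primary'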

Limitations

  • Nodes must be updated only in the admin panel or via the vinfra tool. Do not use yum update.
  • Unassigned nodes cannot be updated.
  • Updates are applied to one node at a time.
  • Management nodes can only be updated all together, one node at a time, and only after all of the cluster nodes have been updated.
  • You can only update the management panel after updating all of the management and cluster nodes.
  • In a single-node deployment, the node does not enter maintenance during an update. If an update requires a node reboot or maintenance, cluster downtime is expected.
  • Live migration is not supported for suspended virtual machines, as well as for virtual machines with attached vGPU or PCI devices.

Prerequisites

  • The storage cluster is created by following the instructions in Deploying the storage cluster.
  • Any third-party repositories are disabled.
  • The cluster is healthy and each node in the infrastructure is online.
  • The cluster DNS is configured, as described in Adding external DNS servers, and points to DNS servers that can resolve external host names.
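
You can roughly verify the last two prerequisites from the shell of the management node before starting. The commands below are only a sketch: vinfra node list is assumed to be available in your vinfra version, and mirror.example.com is a placeholder for any external host name:

# vinfra node list                  # all nodes should appear and be online
# getent hosts mirror.example.com   # the placeholder external name should resolve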

To update cluster components

Admin panel

  1. Open the Settings > Updates screen. The date of the last check is displayed in the upper right corner. Click the round arrow to check for new updates. If updates are available for a cluster component, its update status changes to Available. Check if a node needs a reboot or maintenance in the Update impact column.

  2. Click Download in the upper right corner to get the updates. Wait until the updates are downloaded and the update status changes to Ready to install.

    Once the updates are downloaded, the automatic check for updates is disabled until the updates are installed or the operation is canceled. To reset the software updates state and make it possible to check for a newer version at this step, use the vinfra software-updates reset command (see the example after this procedure).

  3. Click Release notes to read the release notes.

  4. Select components that you want to be updated:

    • To update cluster nodes, select the desired cluster nodes.
    • To update management nodes, select all of the management nodes and those cluster nodes that require an update.
    • To update the management panel, select this component and all of the management nodes if they require an update.
  5. Click Update to continue.
  6. When upgrading to a new major version, review the upgrade notes, and then click Next.
  7. If you have selected nodes that require a reboot or maintenance, do the following:

    1. Decide whether these nodes will enter the maintenance mode. To place the nodes into maintenance, select Maintenance mode.
    2. If you have selected nodes with the compute service, choose how to migrate virtual machines running on these nodes:

      • With the option Ignore VMs that cannot be live migrated, VMs from a node that enters the maintenance mode will be live migrated to other compute nodes. VMs that cannot be live migrated will be ignored: suspended VMs, VMs with attached vGPU or PCI devices, and VMs for which other compute nodes have insufficient vCPU or RAM resources. During the node update, ignored VMs will be stopped, resulting in downtime. They will be started automatically once the update is complete. After the node exits maintenance, the migrated VMs will not be moved back to the node.
      • With the option Ignore VMs that cannot be or failed to be live migrated, VMs from a node that enters the maintenance mode will be live migrated to other compute nodes. VMs that cannot be live migrated will be ignored: suspended VMs, VMs with attached vGPU or PCI devices, and VMs for which other compute nodes have insufficient vCPU or RAM resources. During the node update, ignored VMs and VMs that failed to be live migrated will be stopped, resulting in downtime. They will be started automatically once the update is complete. After the node exits maintenance, the migrated VMs will not be moved back to the node.
      • With the option Live migrate all VMs, all of the VMs from a node that enters the maintenance mode will be live migrated to other compute nodes. After the node exits maintenance, the migrated VMs will not be moved back to the node.
  8. Review the selected components, and then click Install. The system will start update eligibility checks. In case of a major release, the system will also check for removed and unmaintained hardware:

    • Removed hardware means that its support has been discontinued. If removed hardware is detected, review its details. If your hardware is detected correctly, the upgrade will fail because such devices are no longer supported in the new version. Otherwise, you can force the upgrade by clicking Force update.
    • Unmaintained hardware includes devices (drivers and adapters) that are no longer being tested and updated in the new version. If unmaintained hardware is detected, you cannot continue the upgrade because such devices should no longer be used in production.

    You can export the list of detected hardware to a JSON file by clicking Export to file.
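
As mentioned in step 2, you can reset the software updates state from the command line; a minimal example:

# vinfra software-updates reset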

Do not perform any cluster configuration tasks in the admin panel or command-line interface during the update, as this will lead to an update failure and cluster downtime.

While the updates are being installed, you can pause or cancel the process. After the update is complete, the component statuses will change to Up to date.

If the update fails, click Details to view the issue details and decide how to proceed. You can cancel the update, solve the issues, and retry updating without downtime. Alternatively, you can force the update without putting the nodes into maintenance. However, this will cause a downtime of workloads running on them.

Command-line interface

Use the following commands:

  1. Check if there are updates for the storage cluster:

    # vinfra software-updates check-for-updates
  2. View the results of the check:

    # vinfra software-updates status
    +---------------------------+--------------------------------------------+
    | Field                     | Value                                      |
    +---------------------------+--------------------------------------------+
    | available_storage_release | release: '127'                             |
    |                           | version: 5.2.0                             |
    | control_plane             | available_storage_release:                 |
    |                           |   release: '127'                           |
    |                           |   version: 5.2.0                           |
    |                           | installed_storage_release:                 |
    |                           |   release: '206'                           |
    |                           |   version: 5.1.0                           |
    |                           | status: available                          |
    | last_check_datetime       | 2021-11-01T12:22:10.630818                 |
    | nodes                     | - available_storage_release:               |
    |                           |     release: '127'                         |
    |                           |     version: 5.2.0                         |
    |                           |   current_storage_release:                 |
    |                           |     release: '206'                         |
    |                           |     version: 5.1.0                         |
    |                           |   downloaded_storage_release: null         |
    |                           |   host: node001.vstoragedomain             |
    |                           |   id: 0175ce44-c86d-7818-3259-3182f5fd83f6 |
    |                           |   is_in_ha: false                          |
    |                           |   is_primary: true                         |
    |                           |   maintenance_required: true               |
    |                           |   orig_hostname: node001                   |
    |                           |   reboot_required: true                    |
    |                           |   status: available                        |
    |                           | - available_storage_release:               |
    |                           |     release: '127'                         |
    |                           |     version: 5.2.0                         |
    |                           |   current_storage_release:                 |
    |                           |     release: '206'                         |
    |                           |     version: 5.1.0                         |
    |                           |   downloaded_storage_release: null         |
    |                           |   host: node002.vstoragedomain             |
    |                           |   id: 923926da-a879-5f56-1b24-1462917ed335 |
    |                           |   is_in_ha: false                          |
    |                           |   is_primary: false                        |
    |                           |   maintenance_required: true               |
    |                           |   orig_hostname: node002                   |
    |                           |   reboot_required: true                    |
    |                           |   status: available                        |
    |                           | - available_storage_release:               |
    |                           |     release: '127'                         |
    |                           |     version: 5.2.0                         |
    |                           |   current_storage_release:                 |
    |                           |     release: '206'                         |
    |                           |     version: 5.1.0                         |
    |                           |   downloaded_storage_release: null         |
    |                           |   host: node003.vstoragedomain             |
    |                           |   id: ef24c47c-620d-8726-2677-ed94d853de2e |
    |                           |   is_in_ha: false                          |
    |                           |   is_primary: false                        |
    |                           |   maintenance_required: true               |
    |                           |   orig_hostname: node003                   |
    |                           |   reboot_required: true                    |
    |                           |   status: available                        |
    | status                    | available                                  |
    +---------------------------+--------------------------------------------+
    

    The output above shows that an update to version 5.2.0 (release 127) is available and that installing it requires a reboot and maintenance on each node.

  3. Download the software update:

    # vinfra software-updates download
  4. Check whether the nodes in the storage cluster are eligible for the update:

    # vinfra software-updates eligibility-check
    +---------+--------------------------------------+
    | Field   | Value                                |
    +---------+--------------------------------------+
    | task_id | 88e51115-8f0e-4c6f-b33b-949728d1fb99 |
    +---------+--------------------------------------+
    # vinfra task show 88e51115-8f0e-4c6f-b33b-949728d1fb99
    +---------+------------------------------------------------------------------+
    | Field   | Value                                                            |
    +---------+------------------------------------------------------------------+
    | details |                                                                  |
    | name    | backend.presentation.software_updates.tasks.EligibilityCheckTask |
    | result  | chunks_rebalancing_rate:                                         |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: info                                                 |
    |         | cluster_has_releasing_nodes:                                     |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | cluster_unhealthy:                                               |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | not_enough_space_on_agents:                                      |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | not_enough_space_on_mn:                                          |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | postgres_not_running:                                            |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | request_accept_eula:                                             |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | server_with_pci_devices:                                         |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: info                                                 |
    |         | shaman:                                                          |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | tgtd:                                                            |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | too_many_pending_chunks:                                         |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: info                                                 |
    | state   | success                                                          |
    | task_id | 88e51115-8f0e-4c6f-b33b-949728d1fb99                             |
    +---------+------------------------------------------------------------------+
    
  5. Start the software update procedure by running the command:

    vinfra software-updates start [--maintenance enabled={yes,no}[,key=value,…]]
                                  [--nodes <nodes>] [--skip-control-plane]
                                  [--accept-eula]
                                  
    --maintenance enabled={yes,no}[,key=value,…]

    Specify maintenance parameters:

    • enabled: enter maintenance during the upgrade (yes or no)
    • comma-separated key=value pairs with keys (optional):

      • on-fail: choose how to proceed with the update if maintenance fails:

        • stop (default): stop the update if a node cannot enter maintenance mode. Nodes that have already been updated will remain so.
        • skip: skip and do not update nodes that cannot enter maintenance mode
        • force: forcibly update and reboot (if needed) all nodes even if they cannot enter maintenance mode. Using this option may result in downtime.
      • compute-mode: choose how to proceed with the update if a VM cannot be live migrated:

        • strict: stop the upgrade if a VM cannot be live migrated
        • ignore: ignore a VM that cannot be live migrated
        • ignore_ext: ignore a VM that cannot be or failed to be live migrated
    --nodes <nodes>
    A comma-separated list of node IDs or hostnames
    --skip-control-plane
    Update the cluster without updating the management panel.
    --accept-eula
    Accept EULA

    For example, to start updating the nodes node001, node002, and node003 and put them into maintenance, run:

    # vinfra software-updates start --nodes node001,node002,node003 \
    --maintenance enabled=yes,on-fail=skip,compute-mode=ignore
    

    Those nodes that cannot enter maintenance will be skipped. Virtual machines that cannot be live migrated during maintenance will be ignored.
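
    Similarly, assuming the same node names, a sketch of updating only these cluster nodes without updating the management panel could be:

    # vinfra software-updates start --nodes node001,node002,node003 \
    --maintenance enabled=yes --skip-control-plane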

To pause software updates, use the command vinfra software-updates pause. To resume the update procedure, run vinfra software-updates resume.
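
For example, to pause a running update and resume it later:

# vinfra software-updates pause
# vinfra software-updates resume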

You can cancel the update and exit the maintenance mode for a node by using the command:

vinfra software-updates cancel [--maintenance-mode {exit,exit-keep-resources,hold}]
--maintenance-mode {exit,exit-keep-resources,hold}

Maintenance mode:

  • exit: exit maintenance for the node and return evacuated resources back to the node
  • exit-keep-resources (default): exit maintenance for the node but keep evacuated resources on another node
  • hold: do not exit maintenance

For example, to cancel the update and return the node to operation, run:

# vinfra software-updates cancel --maintenance-mode exit

The evacuated resources from this node will be moved back to it.