Installing updates

Virtuozzo Hybrid Infrastructure supports non-disruptive rolling updates. Nodes are updated one by one, without affecting data availability. During an update, a node that needs to be rebooted can be placed in maintenance mode. In this case, the workloads and virtual machines hosted on this node are migrated to other nodes. Once the node is updated, it is automatically rebooted. After the reboot, the node returns to operation, and the migrated workloads and VMs are moved back to it.

You can update different cluster components all together or separately. In either case, the components are updated in the following order:

  1. Cluster nodes are updated first.
  2. Management nodes are updated only when all of the cluster nodes are up to date. The primary management node is the last to be updated.
  3. The management panel (admin and self-service) and compute API are updated on management nodes, and only when all of the nodes, both cluster and management, are up to date. Updating this component does not require rebooting the management nodes.

Limitations

  • Update nodes only in the admin panel or with the vinfra tool. Do not use yum update.
  • Unassigned nodes can be updated.
  • Updates are applied to one node at a time.
  • You can only update management nodes all together, and only after all of the cluster nodes are updated.
  • You can only update the management panel and compute API after updating all of the management and cluster nodes.
  • In a single-node deployment, the node does not enter maintenance during an update.
  • Live migration is not supported for virtual machines with attached vGPU or PCI devices.

Prerequisites

  • The storage cluster has been created as described in Deploying the storage cluster.
  • Any third-party repositories are disabled.
  • The cluster is healthy and each node in the infrastructure is online.
  • The cluster DNS is configured, as described in Adding external DNS servers, and points to DNS servers that can resolve external host names.
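Two of these prerequisites can be spot-checked from a node's shell before you start. The following is a minimal sketch, not a product tool: the function names are hypothetical, it assumes a yum-based node (the Limitations above already refer to yum), and it uses the standard getent hosts lookup to test name resolution. Deciding which of the listed repositories count as third-party is still a manual review.

```shell
#!/bin/sh
# Hypothetical pre-update spot checks; these helpers are not part of vinfra.

# Succeeds if NAME resolves on this node. getent follows /etc/nsswitch.conf,
# so entries in /etc/hosts also count: test an external name, because a
# name such as localhost proves nothing about the cluster DNS.
resolves() {
    getent hosts "$1" > /dev/null
}

# Prints the repositories yum currently has enabled, so that you can
# confirm that no third-party repository is left enabled.
list_enabled_repos() {
    yum -q repolist enabled
}

# Example usage on a node (example.com stands in for any external name):
#   resolves example.com || echo "external host names do not resolve" >&2
#   list_enabled_repos
```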

To update cluster components

Admin panel

  1. Open the Settings > Updates screen. The date of the last check is displayed in the upper right corner. Click the round arrow to check for new updates. If updates are available for a cluster component, its update status changes to Available. If a node needs to be rebooted, Reboot is required is displayed next to its available version.

  2. Click Download in the upper right corner to get the updates. Wait until the updates are downloaded and the update status changes to Ready to install.
  3. Click Release notes to read the release notes.

  4. Select the components that you want to update:

    • To update cluster nodes, select the desired cluster nodes.
    • To update management nodes, select all of the management nodes and those cluster nodes that require an update.
    • To update the management panel and compute API, select this component and all of the management nodes if they require an update.
  5. Click Update to continue.
  6. If you have selected nodes that require a reboot, do the following:

    1. Decide whether these nodes will enter maintenance mode. To place the nodes in maintenance mode, select Maintenance mode.
    2. If you have selected nodes with the compute service, choose how to migrate virtual machines running on these nodes:

      • With the option Ignore VMs that cannot be live migrated, VMs from a node that enters maintenance mode are live migrated to other compute nodes. VMs that cannot be live migrated are ignored. This applies to VMs with vGPU or PCI devices attached, and to VMs that cannot be placed because other compute nodes have insufficient vCPU or RAM resources. Ignored VMs continue running until the node is rebooted or shut down. At that point, they are stopped, resulting in downtime. They are started automatically once the node is up again.
      • With the option Ignore VMs that cannot be or failed to be live migrated, VMs from a node that enters maintenance mode are live migrated to other compute nodes. VMs that cannot be live migrated are ignored, as with the previous option, and so are VMs whose live migration fails. Both kinds of VMs continue running until the node is rebooted or shut down. At that point, they are stopped, resulting in downtime. They are started automatically once the node is up again.
      • With the option Live migrate all VMs, all of the VMs from a node that enters the maintenance mode will be live migrated to other compute nodes.
    3. To stop the update if a node cannot enter maintenance, select Abort the update if the node cannot enter maintenance.

  7. Review the selected components, and then click Install.

While the updates are being installed, you can pause or cancel the process. After the update is complete, the component statuses will change to Up to date.

If the update fails, click Details to view the issue details and decide how to proceed. You can cancel the update, solve the issues, and retry updating without downtime. Alternatively, you can force the update without putting the nodes into maintenance. The nodes will be rebooted, potentially causing downtime for the workloads running on them.

Command-line interface

Use the following commands:

  1. Check if there are updates for the storage cluster:

    # vinfra software-updates check-for-updates
  2. View the results of the check:

    # vinfra software-updates status
    +---------------------------+--------------------------------------------+
    | Field                     | Value                                      |
    +---------------------------+--------------------------------------------+
    | available_storage_release | release: '234'                             |
    |                           | version: 4.7.0                             |
    | control_plane             | available_storage_release:                 |
    |                           |   release: '234'                           |
    |                           |   version: 4.7.0                           |
    |                           | installed_storage_release:                 |
    |                           |   release: '217'                           |
    |                           |   version: 4.7.0                           |
    |                           | status: available                          |
    | last_check_datetime       | 2021-11-01T12:22:10.630818                 |
    | nodes                     | - available_storage_release:               |
    |                           |     release: '234'                         |
    |                           |     version: 4.7.0                         |
    |                           |   current_storage_release:                 |
    |                           |     release: '217'                         |
    |                           |     version: 4.7.0                         |
    |                           |   downloaded_storage_release: null         |
    |                           |   host: node001.vstoragedomain             |
    |                           |   id: 0175ce44-c86d-7818-3259-3182f5fd83f6 |
    |                           |   is_in_ha: false                          |
    |                           |   is_primary: true                         |
    |                           |   orig_hostname: node001                   |
    |                           |   reboot_required: false                   |
    |                           |   status: available                        |
    |                           | - available_storage_release:               |
    |                           |     release: '234'                         |
    |                           |     version: 4.7.0                         |
    |                           |   current_storage_release:                 |
    |                           |     release: '217'                         |
    |                           |     version: 4.7.0                         |
    |                           |   downloaded_storage_release: null         |
    |                           |   host: node002.vstoragedomain             |
    |                           |   id: 923926da-a879-5f56-1b24-1462917ed335 |
    |                           |   is_in_ha: false                          |
    |                           |   is_primary: false                        |
    |                           |   orig_hostname: node002                   |
    |                           |   reboot_required: false                   |
    |                           |   status: available                        |
    |                           | - available_storage_release:               |
    |                           |     release: '234'                         |
    |                           |     version: 4.7.0                         |
    |                           |   current_storage_release:                 |
    |                           |     release: '217'                         |
    |                           |     version: 4.7.0                         |
    |                           |   downloaded_storage_release: null         |
    |                           |   host: node003.vstoragedomain             |
    |                           |   id: ef24c47c-620d-8726-2677-ed94d853de2e |
    |                           |   is_in_ha: false                          |
    |                           |   is_primary: false                        |
    |                           |   orig_hostname: node003                   |
    |                           |   reboot_required: false                   |
    |                           |   status: available                        |
    | status                    | available                                  |
    +---------------------------+--------------------------------------------+
    

    The output above shows that an update to build 234 is available.

  3. Download the software update:

    # vinfra software-updates download
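  4. Check whether the nodes in the storage cluster are eligible for the update. The command runs the check as a task and prints its ID; pass this ID to vinfra task show to view the results. Each check reports whether it passed and its severity: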

    # vinfra software-updates eligibility-check
    +---------+--------------------------------------+
    | Field   | Value                                |
    +---------+--------------------------------------+
    | task_id | 88e51115-8f0e-4c6f-b33b-949728d1fb99 |
    +---------+--------------------------------------+
    # vinfra task show 88e51115-8f0e-4c6f-b33b-949728d1fb99
    +---------+------------------------------------------------------------------+
    | Field   | Value                                                            |
    +---------+------------------------------------------------------------------+
    | details |                                                                  |
    | name    | backend.presentation.software_updates.tasks.EligibilityCheckTask |
    | result  | chunks_rebalancing_rate:                                         |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: info                                                 |
    |         | cluster_has_releasing_nodes:                                     |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | cluster_unhealthy:                                               |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | not_enough_space_on_agents:                                      |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | not_enough_space_on_mn:                                          |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | postgres_not_running:                                            |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | request_accept_eula:                                             |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | server_with_pci_devices:                                         |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: info                                                 |
    |         | shaman:                                                          |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | tgtd:                                                            |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: critical                                             |
    |         | too_many_pending_chunks:                                         |
    |         |   details: null                                                  |
    |         |   exception: null                                                |
    |         |   message: null                                                  |
    |         |   passed: true                                                   |
    |         |   severity: info                                                 |
    | state   | success                                                          |
    | task_id | 88e51115-8f0e-4c6f-b33b-949728d1fb99                             |
    +---------+------------------------------------------------------------------+
    
  5. Start the software update procedure by running the command:

    vinfra software-updates start [--maintenance enabled={yes,no}[,key=value,…]]
                                  [--nodes <nodes>] [--skip-control-plane]
                                  [--accept-eula]

    --maintenance enabled={yes,no}[,key=value,…]

    Specify maintenance parameters:

    • enabled: enter maintenance during the update (yes or no)
    • optional comma-separated key=value pairs with the following keys:

      • on-fail: choose how to proceed with the update if a node cannot enter maintenance:

        • stop (default): stop the update if a node cannot enter maintenance mode. Nodes that have already been updated remain updated.
        • skip: skip nodes that cannot enter maintenance mode and do not update them
        • force: forcibly update and, if needed, reboot all nodes, even if they cannot enter maintenance mode. Using this option may result in downtime.
      • compute-mode: choose how to proceed with the update if a VM cannot be live migrated:

        • strict: stop the update if a VM cannot be live migrated
        • ignore: ignore a VM that cannot be live migrated
        • ignore_ext: ignore a VM that cannot be or failed to be live migrated

    --nodes <nodes>

    A comma-separated list of node IDs or hostnames

    --skip-control-plane

    Update the cluster without updating the management panel.

    --accept-eula

    Accept the EULA.

    For example, to start updating the nodes node001, node002, and node003 and put them into maintenance, run:

    # vinfra software-updates start --nodes node001,node002,node003 \
    --maintenance enabled=yes,on-fail=skip,compute-mode=ignore
    

    Those nodes that cannot enter maintenance will be skipped. Virtual machines that cannot be live migrated during maintenance will be ignored.
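If you script the procedure above, the table output of the status and eligibility-check steps can be read with standard text tools, and the --maintenance value can be assembled from the documented keys. The following is a minimal sketch in POSIX shell, assuming the table layout shown in the examples above stays the same in your release; the function names are hypothetical, and none of them calls vinfra. maintenance_opt accepts only the values listed for --maintenance and leaves out the optional keys you do not pass.

```shell
#!/bin/sh
# Hypothetical helpers for scripting the CLI update procedure. They parse
# saved vinfra table output and compose option values; none calls vinfra.

# Reads "vinfra software-updates status" output on stdin and prints the
# host of every node entry that contains "reboot_required: true".
nodes_needing_reboot() {
    awk '
        /host: / { for (i = 1; i <= NF; i++) if ($i == "host:") host = $(i + 1) }
        /reboot_required: true/ { print host }
    '
}

# Reads "vinfra task show" output for an eligibility check on stdin and
# prints "<check> <severity>" for every check that reports passed: false.
failed_checks() {
    awk -F'|' '
        # A check name is a top-level key such as " cluster_unhealthy:".
        $3 ~ /^ [a-z0-9_]+:[ ]*$/ { name = $3; gsub(/[ :]/, "", name); failed = 0 }
        $3 ~ /^ +passed: false/ { failed = 1 }
        $3 ~ /^ +severity: / && failed {
            sev = $3
            sub(/^ +severity: */, "", sev)
            sub(/ +$/, "", sev)
            print name, sev
        }
    '
}

# Builds the value of --maintenance. The first argument (yes or no) is
# required; on-fail and compute-mode are optional and are checked against
# the documented values.
maintenance_opt() {
    case $1 in
        yes|no) ;;
        *) echo "enabled must be yes or no" >&2; return 1 ;;
    esac
    opt="enabled=$1"
    if [ -n "$2" ]; then
        case $2 in
            stop|skip|force) ;;
            *) echo "unknown on-fail value: $2" >&2; return 1 ;;
        esac
        opt="$opt,on-fail=$2"
    fi
    if [ -n "$3" ]; then
        case $3 in
            strict|ignore|ignore_ext) ;;
            *) echo "unknown compute-mode value: $3" >&2; return 1 ;;
        esac
        opt="$opt,compute-mode=$3"
    fi
    printf '%s\n' "$opt"
}

# Example (file names are placeholders):
#   vinfra software-updates status > status.txt
#   nodes_needing_reboot < status.txt
#   vinfra task show "$TASK_ID" > eligibility.txt
#   failed_checks < eligibility.txt
#   vinfra software-updates start --nodes node001,node002,node003 \
#       --maintenance "$(maintenance_opt yes skip ignore)"
```

Parsing table output is fragile: long values may wrap across rows, so treat these helpers as a convenience check rather than a replacement for reading the output.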

To pause software updates, use the command vinfra software-updates pause. To resume the update procedure, run vinfra software-updates resume.

You can cancel the update and, optionally, take the node out of maintenance mode by using the command:

vinfra software-updates cancel [--maintenance-mode {exit,exit-keep-resources,hold}]
--maintenance-mode {exit,exit-keep-resources,hold}

Maintenance mode:

  • exit: exit maintenance for the node and return the evacuated resources to the node
  • exit-keep-resources (default): exit maintenance for the node but keep the evacuated resources on other nodes
  • hold: do not exit maintenance

For example, to cancel the update and return the node to operation, run:

# vinfra software-updates cancel --maintenance-mode exit

The evacuated resources from this node will be moved back to it.
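To avoid typos in the mode name, the cancel command can be wrapped in a small function that accepts only the three documented modes. This is a hypothetical helper, not part of vinfra; the DRY_RUN variable is an assumption of this sketch that prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around "vinfra software-updates cancel" that accepts
# only the documented maintenance modes. DRY_RUN=1 (an assumption of this
# sketch, not a vinfra setting) prints the command instead of running it.
cancel_update() {
    mode=${1:-exit-keep-resources}   # the documented default mode
    case $mode in
        exit|exit-keep-resources|hold) ;;
        *) echo "unknown maintenance mode: $mode" >&2; return 1 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo vinfra software-updates cancel --maintenance-mode "$mode"
    else
        vinfra software-updates cancel --maintenance-mode "$mode"
    fi
}

# Example: cancel and return the evacuated resources to the node.
#   cancel_update exit
```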