7    Rolling Upgrade

A rolling upgrade is a software upgrade of a cluster that is performed while the cluster is in operation. One member at a time is upgraded and returned to operation while the cluster transparently maintains a mixed-version environment for the base operating system, cluster, and Worldwide Language Support (WLS) software. Clients accessing services are not aware that a rolling upgrade is in progress.

A rolling upgrade consists of an ordered series of steps, called stages. The commands that control a rolling upgrade enforce this order.

This first part of the chapter contains instructions for performing a rolling upgrade, for displaying the status of a rolling upgrade, and for undoing one or more stages of a rolling upgrade. Those interested in how a rolling upgrade works can find the details in Section 7.6 and the sections that follow it.

This chapter discusses the following topics:

Figure 7-1 provides a simplified flow chart of the tasks and stages that are part of a rolling upgrade initiated on a Version 5.1B cluster:

Figure 7-1:  Rolling Upgrade Flow Chart

7.1    Rolling Upgrade Supported Tasks

The tasks that you can perform during a rolling upgrade depend on which versions of the base operating system and cluster software are currently running on the cluster. The main focus of this chapter is to describe the behavior of a rolling upgrade that starts on a TruCluster Server Version 5.1B cluster. However, because you may read this chapter in preparation for a rolling upgrade from TruCluster Server Version 5.1A to Version 5.1B, we point out rolling upgrade differences between the two versions.

The following list describes the basic tasks you can perform within a rolling upgrade:

Rolling in a patch kit or an NHD kit uses the same procedure as rolling in a new release of the base operating system and cluster software. The difference is which commands you run during the install stage:

Throughout this chapter, the term rolling upgrade refers to the overall procedure used to roll one or more software kits into a cluster.

As shown in Figure 7-1, you can perform more than one task during a rolling upgrade.

If the cluster is running Version 5.1A or Version 5.1B, a rolling upgrade can include the task combinations listed in Table 7-1:

Table 7-1:  Rolling Upgrade Tasks Supported by Version 5.1A and Version 5.1B

An update installation from Version 5.1A to Version 5.1B

An update installation from Version 5.1B to the next release

A patch of Version 5.1A

A patch of Version 5.1B

The installation of a New Hardware Delivery (NHD) kit onto a Version 5.1A cluster

The installation of an NHD kit onto a Version 5.1B cluster

An update installation from Version 5.1A to Version 5.1B of the base operating system and cluster software, followed by a patch of Version 5.1B

An update installation from Version 5.1B to the next release of the base operating system and cluster software followed by a patch of the next release [Footnote 5]

An NHD installation onto a Version 5.1A cluster followed by a patch of Version 5.1A

An NHD installation onto a Version 5.1B cluster followed by a patch of Version 5.1B

An update installation from Version 5.1A to Version 5.1B followed by the installation of an NHD kit for Version 5.1B

An update installation from Version 5.1B to the next release of the base operating system and cluster software followed by the installation of an NHD kit for that next release [Footnote 6]

An update installation from Version 5.1A to Version 5.1B, followed by the installation of an NHD kit for Version 5.1B, followed by a patch of Version 5.1B

An update installation from Version 5.1B to the next release, followed by the installation of an NHD kit for the next release, followed by a patch of the next release [Footnote 6]

7.2    Unsupported Tasks

The following list describes tasks that you cannot perform or that we recommend you do not attempt during a rolling upgrade:

7.3    Rolling Upgrade Procedure

In the procedure in this section, unless otherwise stated, run commands in multiuser mode. Each step that corresponds to a stage refers to the section that describes that stage in detail. We recommend that you read the detailed description of stages in Section 7.7 before performing the rolling upgrade procedure.

Some stages of a rolling upgrade take longer to complete than others. Table 7-2 lists the approximate time it takes to complete each stage.

Table 7-2:  Time Estimates for Rolling Upgrade Stages

Stage Duration
Preparation Not under program control.
Setup 45 - 120 minutes. [Footnote 7]
Preinstall 15 - 30 minutes. [Footnote 7]
Install The same amount of time it takes to run installupdate, dupatch, nhd_install, or a supported combination of these commands on a single system.
Postinstall Less than 1 minute.
Roll (per member) Patch: less than 5 minutes. Update installation: about the same amount of time it takes to add a member. [Footnote 8]
Switch Less than 1 minute.
Clean 30 - 90 minutes. [Footnote 7]

You can use the following procedure to upgrade a TruCluster Server Version 5.1A cluster to Version 5.1B, and to upgrade a cluster that is already at Version 5.1B.

  1. Prepare the cluster for the rolling upgrade (Section 7.7.1):

    1. Choose one cluster member to be the lead member (the first member to roll). (The examples in this procedure use a member whose memberid is 2 as the lead member. The example member's hostname is provolone.)

    2. Back up the cluster.

    3. If you will perform an update installation during the install stage, remove any blocking layered products, listed in Table 7-6, that are installed on the cluster.

    4. To determine whether the cluster is ready for an upgrade, run the clu_upgrade -v check setup lead_memberid command on any cluster member. For example:

      clu_upgrade -v check setup 2
       
      

      If a file system needs more free space, use AdvFS utilities such as addvol to add volumes to domains as needed. For disk space requirements, see Section 7.7.1. For information on managing AdvFS domains, see the Tru64 UNIX AdvFS Administration manual.

    5. Verify that each system's firmware will support the new software. Update firmware as needed before starting the rolling upgrade.

  2. Perform the setup stage (Section 7.7.2).

    Notes

    If your current cluster is at Version 5.1A or later and if you plan to upgrade the base operating system and cluster software during the install stage, mount the device or directory that contains the new TruCluster Server kit before running clu_upgrade setup. The setup command will copy the kit to the /var/adm/update/TruClusterKit directory.

    If your current cluster is at Version 5.1A or later and if you plan to install an NHD kit during the install stage, mount the device or directory that contains the new NHD kit before running clu_upgrade setup. The setup command will copy the kit to the /var/adm/update/NHDKit directory.

    On any member, run the clu_upgrade setup lead_memberid command. For example:

    clu_upgrade setup 2
     
    

    Section 7.7.2 shows the menu displayed by the clu_upgrade command.

    When the setup stage is completed, clu_upgrade prompts you to reboot all cluster members except the lead member.

  3. One at a time, reboot all cluster members except the lead member. Do not start the preinstall stage until these members are either rebooted or halted.

  4. Perform the preinstall stage (Section 7.7.3).

    On the lead member, run the following command:

    clu_upgrade preinstall
     
    

    If your current cluster is at Version 5.1A or later, the preinstall command lets you choose whether to verify the existence of the tagged files created during the setup stage.

  5. Perform the install stage (Section 7.7.4).

    Note

    During the install stage you load the new software on the lead member, in effect rolling that member. When you perform the roll stage, this new software is propagated to the remaining members of the cluster.

    The clu_upgrade command does not load software during the install stage. The loading of software is controlled by the commands you run: installupdate, dupatch, or nhd_install.

    See Table 7-1 for the list of rolling upgrade tasks and combination of tasks supported for Version 5.1A and Version 5.1B.

    1. See the Tru64 UNIX Installation Guide for detailed information on using the installupdate command.

      See the Tru64 UNIX and TruCluster Server Patch Kit Installation Instructions that came with your patch kit for detailed information on using the dupatch command.

      See the Tru64 UNIX New Hardware Delivery Release Notes and Installation Instructions that came with your NHD kit for detailed information on using the nhd_install command.

    2. If the software you are installing requires that its installation command be run from single-user mode, halt the system and boot the system to single-user mode:

      shutdown -h now
      >>> boot -fl s
       
      

      Note

      Halting and booting the system ensures that it provides a minimal set of services to the cluster and that the running cluster has minimal reliance on the member running in single-user mode. In particular, halting the member satisfies the requirement of services that must see the member with a status of DOWN before they complete a service failover. If you do not first halt the cluster member, services will probably not fail over as expected.

      When the system reaches single-user mode, run the following commands:

      init s
      bcheckrc
      lmf reset
       
      

    3. Run the installupdate, dupatch, or nhd_install command.

      To roll in multiple patch kits, you can invoke dupatch multiple times in a single install stage. Be aware that doing so may make it difficult to isolate problems should any arise after the patch process is completed and the cluster is in use.

      You cannot run a dupatch command followed by an installupdate command. To patch the current software before you perform a rolling upgrade, you must perform two complete rolling upgrade operations: one to patch the current software, and one to perform the update installation.

  6. (Optional) After the lead member performs its final reboot with its new custom kernel, you can perform the following manual tests before you roll any additional members:

    1. Verify that the newly rolled lead member can serve the shared root (/) file system.

      1. Use the cfsmgr command to determine which cluster member is currently serving the root file system. For example:

        cfsmgr -v -a server /
         
         Domain or filesystem name = /
         Server Name = polishham
         Server Status : OK
         
        

      2. Relocate the root (/) file system to the lead member. For example:

        cfsmgr -h polishham -r -a SERVER=provolone /
         
        

    2. Verify that the lead member can serve applications to clients. Make sure that the lead member can serve all important applications that the cluster makes available to its clients.

      You decide how and what to test. We suggest that you thoroughly exercise critical applications and satisfy yourself that the lead member can serve these applications to clients before continuing the roll. For example:

      • Manually relocate CAA services to the lead member. For example, to relocate the application resource named cluster_lockd to lead member provolone:

        caa_relocate cluster_lockd -c provolone
         
        

      • Temporarily modify the default cluster alias selection priority attribute, selp, to force the lead member to serve all client requests directed to that alias. For example:

        cluamgr -a alias=DEFAULTALIAS,selp=100
         
        

        The lead member is now the end recipient for all connection requests and packets addressed to the default cluster alias.

        From another member or from an outside client, use services such as telnet and ftp to verify that the lead member can handle alias traffic. Test client access to all important services that the cluster provides.

        When you are satisfied, reset the alias attributes on the lead member to their original values.

  7. Perform the postinstall stage (Section 7.7.5).

    On the lead member, run:

    clu_upgrade postinstall
     
    

  8. Perform the roll stage (Section 7.7.6).

    Roll the members of the cluster that have not already rolled. [Footnote 9]

    You can roll multiple members simultaneously (parallel roll), subject to the restriction that the members not being rolled (plus the quorum disk, if one is configured) must contribute enough votes to maintain cluster quorum.

    To roll a member, do the following:

    1. Halt the member system and boot it to single-user mode. For example:

      shutdown -h now
      >>> boot -fl s
       
      

    2. When the system reaches single-user mode, run the following commands:

      init s
      bcheckrc
      lmf reset
       
      

    3. Roll the member:

      clu_upgrade roll
       
      

      If you are performing parallel rolls, use the -f option with the clu_upgrade roll command. This option causes the member to automatically reboot without first prompting for permission:

      clu_upgrade -f roll
       
      

      The roll command verifies that rolling the member will not result in a loss of quorum. If a loss of quorum will result, then the roll of the member does not occur and an error message is displayed. You can roll the member later, after one of the currently rolling members has rejoined the cluster and its quorum vote is available.

      If the roll proceeds, the member is prepared for a reboot. If you used the -f option, no prompt is displayed; the reboot occurs automatically. If you did not use the -f option, clu_upgrade displays a prompt that asks whether you want to reboot at this time. Unless you want to examine something specific before you reboot, enter yes. (If you enter yes, it may take approximately half a minute before the actual reboot occurs.)

      Perform parallel rolls to minimize the time needed to complete the roll stage. For example, on an eight-member cluster with a quorum disk, after rolling the lead member, you can roll four members in parallel.

      1. Begin the roll stage on a member. (The lead member was rolled during the install stage. You do not perform the roll stage on the lead member.)

      2. When you see a message similar to the following, begin the roll stage on the next member:

           *** Info ***
        You may now begin the roll of another cluster member.
        

        If you see a message that begins like the following, it is probably because too many of the members currently being rolled contribute member votes.

          *** Info ***
        The current quorum conditions indicate that beginning
        a roll of another member at this time may result in
        the loss of quorum.
        
        

        In this case, you have the following options:

        • You can wait until a member completes the roll stage before you begin to roll the next member.

        • If there is an unrolled member that does not contribute member votes, you can begin the roll stage on it.

    4. Continue to roll members until all members of the cluster have rolled. Before starting each roll stage, wait until you see the message that it is all right to do so.

      When you roll the last member, you will see a message similar to the following:

        *** Info ***
      This is the last member requiring a roll.
      

    Note

    The roll actually takes place during the reboot. The clu_upgrade roll command sets up the it(8) scripts that will be run during the reboot. When you reboot, the it scripts roll the member, build a customized kernel, and then reboot again so the member will be running on its new customized kernel. When the member boots its new customized kernel, it has completed its roll and is no longer running on tagged files.

  9. Perform the switch stage (Section 7.7.7).

    After all members have rolled, run the switch command on any member.

    clu_upgrade switch
     
    

  10. One at a time, reboot each member of the cluster.

  11. Perform the clean stage (Section 7.7.8).

    Run the following command on any member to remove the tagged (.Old..) files from the cluster and complete the upgrade.

    clu_upgrade clean
     
    

7.4    Displaying the Status of a Rolling Upgrade

The clu_upgrade command provides the following options for displaying the status of a rolling upgrade. You can run status commands at any time.
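For example, either of the following commands, run on any cluster member, displays rolling upgrade status; the -v option produces more detailed output, including which members are running on tagged files:

clu_upgrade status
clu_upgrade -v status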

Notes

During a roll, there might be two versions of the clu_upgrade command in the cluster — an older version used by members that have not yet rolled, and a newer version (if included in the update distribution or patch kit). The information that is displayed by the status command might differ depending on whether the command is run on a member that has rolled. Therefore, if you run the status command on two members, do not be surprised if the format of the displayed output is not the same.

If you run clu_upgrade status after running installupdate, clu_upgrade will display a message indicating that the install stage is complete. However, the install stage is not really complete until you run the clu_upgrade postinstall command.

7.5    Undoing a Stage

The clu_upgrade undo command provides the ability to undo a rolling upgrade that has not completed the switch stage. You can undo any stage except the switch stage and the clean stage. You must undo stages in order; for example, if you decide to undo a rolling upgrade after completing the preinstall stage, you undo the preinstall stage and then undo the setup stage.

Note

Before undoing any stage, we recommend that you read the relevant version of the Cluster Release Notes to determine whether there are restrictions related to the undoing of any stage.

To undo a stage, use the undo command with the stage that you want to undo. The clu_upgrade command determines whether the specified stage is a valid stage to undo. Table 7-3 outlines the requirements for undoing a stage:

Table 7-3:  Undoing a Stage

Stage to Undo Command Comments
Setup clu_upgrade undo setup

You must run this command on the lead member. In addition, no members can be running on tagged files when you undo the setup stage.

Before you undo the setup stage, use the clu_upgrade -v status command to determine which members are running on tagged files. Then use the clu_upgrade tagged disable memberid command to disable tagged files on those members. (See Section 7.8 for information about tagged files and the commands used to manipulate them.)

When no members are running on tagged files, run the clu_upgrade undo setup command on the lead member.

Preinstall clu_upgrade undo preinstall You must run this command on the lead member.
Install clu_upgrade undo install

You can run this command on any member except the lead member.

Halt the lead member. Then run the clu_upgrade undo install command on any member that has access to the halted lead member's boot disk. When the command completes, boot the lead member.

Postinstall clu_upgrade undo postinstall You must run this command on the lead member.
Roll clu_upgrade undo roll memberid

You can run this command on any member except the member whose roll stage will be undone.

Halt the member whose roll stage is being undone. Then run the clu_upgrade undo roll memberid command on any other member that has access to the halted member's boot disk. When the command completes, boot the halted member. The member will now be using tagged files.
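For example, to abandon a rolling upgrade after completing the preinstall stage, undo the stages in reverse order. The following sketch assumes that member 3 is still running on tagged files (the member ID is an example):

clu_upgrade undo preinstall      (on the lead member)
clu_upgrade -v status            (on any member, to find members running on tagged files)
clu_upgrade tagged disable 3     (repeat for each member running on tagged files)
clu_upgrade undo setup           (on the lead member, after no members are running on tagged files)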

7.6    Rolling Upgrade Commands

The clu_upgrade command, described in clu_upgrade(8), controls the overall flow of a rolling upgrade and ensures that the stages are run in order. During the install stage, you run one or more of installupdate, dupatch, or nhd_install to load and install software. These commands are rolling upgrade aware; they are modified to understand which actions they are allowed to take during the install and roll stages of a rolling upgrade.

When you start a rolling upgrade, the cluster is running the software from the previous release. For the first part of any rolling upgrade, you are running the clu_upgrade command that is already installed on the cluster. If a new version is installed during the rolling upgrade, there may be minor differences in the on-screen display and behavior between the two versions of the command.

The following two tables show the stages at which new versions of the upgrade commands, if shipped with the kits being installed, become available during a rolling upgrade: [Footnote 10]

Table 7-4:  Stages and clu_upgrade Versions When Performing a Rolling Upgrade from Version 5.1A

Stage Version 5.1A Next Release [Footnote 11] Comments
Preparation X   The currently installed (old) version of clu_upgrade is always run in this stage.
Setup X  

The currently installed (old) version of clu_upgrade is always run in this stage.

If performing an update installation, the new version of clu_upgrade is extracted from the TruCluster Server kit and installed at /usr/sbin/clu_upgrade, replacing the old version. Because this replacement is done before tagged files are created, all members will use the new clu_upgrade throughout the remainder of the rolling upgrade.

Preinstall   X If the rolling upgrade includes an update installation, all members use the new version of clu_upgrade installed during the setup stage. (Otherwise, members continue to run the current version of clu_upgrade.)
Install   X

If the rolling upgrade includes an update installation, all members use the version of clu_upgrade installed during the setup stage.

During the update installation, a new version of installupdate replaces the old one.

A patch kit always installs the latest version of dupatch.

If performing a patch, and if the patch kit includes a new version of clu_upgrade, the new version is installed and will be used by all cluster members starting with the postinstall stage.

Postinstall   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.
Roll   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.
Switch   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.
Clean   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.

Table 7-5:  Stages and clu_upgrade Versions When Performing a Rolling Upgrade from Version 5.1B

Stage Version 5.1B Next Release [Footnote 12] Comments
Preparation X   The currently installed (old) version of clu_upgrade is always run in this stage.
Setup X  

The currently installed (old) version of clu_upgrade is always run in this stage.

If performing an update installation, the new version of clu_upgrade is extracted from the TruCluster Server kit and installed at /usr/sbin/clu_upgrade, replacing the old version. Because this replacement is done before tagged files are created, all members will use the new clu_upgrade throughout the remainder of the rolling upgrade.

Preinstall   X If the rolling upgrade includes an update installation, all members use the new version of clu_upgrade installed during the setup stage. (Otherwise, members continue to run the current version of clu_upgrade.)
Install   X

If the rolling upgrade includes an update installation, all members use the version of clu_upgrade installed during the setup stage.

During the update installation, a new version of installupdate replaces the old one.

A patch kit always installs the latest version of dupatch.

If performing a patch, and if the patch kit includes a new version of clu_upgrade, the new version is installed and will be used by all cluster members starting with the postinstall stage.

Postinstall   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.
Roll   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.
Switch   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.
Clean   X If a new version of clu_upgrade was installed in either the setup stage or the install stage, all members use the new version.

7.7    Rolling Upgrade Stages

The following sections describe each of the rolling upgrade stages.

Note

These sections only describe the stages. Use the procedure in Section 7.3 to perform a rolling upgrade.

7.7.1    Preparation Stage

Command Where Run Run Level
clu_upgrade -v check setup lead_memberid any member multiuser mode

During the preparation stage, you back up all important cluster data and verify that the cluster is ready for a roll. Before beginning a rolling upgrade, do the following:

  1. Choose one member of the cluster as the first member to roll. This member, known as the lead member, must have direct access to the root (/), /usr, /var, and, if used, i18n file systems.

    Make sure that the lead member can run any critical applications. You can test these applications after you update this member during the install stage, but before you roll any other members. If a problem occurs, you can try to resolve it on this member before you continue. If you cannot resolve a problem, you can undo the rolling upgrade and return the cluster to its pre-roll state. (Section 7.5 describes how to undo rolling upgrade stages.)

  2. Back up the clusterwide root (/), /usr, and /var file systems, including all member-specific files in these file systems. If the cluster has a separate i18n file system, back up that file system. In addition, back up any other file systems that contain critical user or application data. (A sample backup command appears after this list.)

    Note

    If you perform an incremental or full backup of the cluster during a rolling upgrade, make sure to perform the backup on a member that is not running on tagged files. If you back up from a member that is using tagged files, you will only back up the contents of the .Old.. files. Because the lead member never uses tagged files, you can back up the cluster from the lead member (or any other member that has rolled) during a rolling upgrade.

    Most sites have automated backup procedures. If you know that an automatic backup will take place while the cluster is in the middle of a rolling upgrade, make sure that backups are done on the lead member or on a member that has rolled.

  3. If you plan to run the installupdate command in the install stage, remove any blocking layered products listed in Table 7-6 that are installed on the cluster.

  4. Run the clu_upgrade -v check setup lead_memberid command, which verifies the following information:

  5. Verify that each system's firmware will support the new software. Update firmware as needed before starting the rolling upgrade.
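As noted in step 2, back up each clusterwide file system before starting the roll. A minimal example using vdump (the tape device name is illustrative; repeat the command for /usr, /var, a separate i18n file system if one exists, and any other critical file systems):

vdump -0 -f /dev/tape/tape0_d0 /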

A cluster can continue to operate during a rolling upgrade because two copies exist of the operating system and cluster software files. (Only one copy exists of shared configuration files so that changes made by any member are visible to all members.) This approach makes it possible to run two different versions of the base operating system and the cluster software at the same time in the same cluster. The trade-off is that, before you start an upgrade, you must make sure that there is adequate free space in each of the clusterwide root (/), /usr, and /var file systems, and, if a separate domain exists for the Worldwide Language Support (WLS) subsets, in the i18n file system.

A rolling upgrade has the following disk space requirements:

If a file system needs more free space, use AdvFS utilities such as addvol to add volumes to domains as needed. For information on managing AdvFS domains, see the Tru64 UNIX AdvFS Administration manual. (The AdvFS Utilities require a separate license.) You can also expand the clusterwide root (/) domain.
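For example, to check the free space in the clusterwide /usr domain and then add a volume to it (the disk name is an example, and the domain name cluster_usr is typical but may differ on your cluster):

showfdmn cluster_usr
addvol /dev/disk/dsk5c cluster_usr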

Note

The clu_upgrade command verifies whether sufficient space exists at the start of a rolling upgrade. However, nothing prevents a cluster member from consuming disk space during a rolling upgrade, thus creating a situation where a later stage might not have enough disk space.

Disk space is dynamic. If you know that a member will be consuming disk space during a rolling upgrade, add additional space before you start the upgrade.

7.7.2    Setup Stage

Command Where Run Run Level
clu_upgrade setup lead_memberid any member multiuser mode

The setup stage runs the clu_upgrade check setup command, creates tagged files, and prepares the cluster for the roll.

The clu_upgrade setup lead_memberid command performs the following tasks:

7.7.3    Preinstall Stage

Command Where Run Run Level
clu_upgrade preinstall lead member multiuser mode

The purpose of the preinstall stage is to verify that the cluster is ready for the lead member to run one or more of the installupdate, dupatch, or nhd_install commands.

The clu_upgrade preinstall command performs the following tasks:

7.7.4    Install Stage

Command Where Run Run Level
installupdate lead member single-user mode
dupatch lead member single-user or multiuser mode
nhd_install lead member single-user mode

If your current cluster is running TruCluster Server Version 5.1B or Version 5.1A, you can perform one of the tasks or combinations of tasks listed in Table 7-1.

The install stage starts when the clu_upgrade preinstall command completes, and continues until you run the clu_upgrade postinstall command.

Note

If you run clu_upgrade status after running installupdate, clu_upgrade displays a message indicating that the install stage is complete. However, the install stage is not really complete until you run the clu_upgrade postinstall command.

The lead member must be in single-user mode to run the installupdate command or the nhd_install command; single-user mode is recommended for the dupatch command. When taking the system to single-user mode, you must first halt the system and then boot it to single-user mode.

When the system is in single-user mode, run the init s, bcheckrc, and lmf reset commands before you run the installupdate, dupatch, or nhd_install commands. See the Tru64 UNIX Installation Guide, the Tru64 UNIX and TruCluster Server Patch Kit Installation Instructions, and the Tru64 UNIX New Hardware Delivery Release Notes and Installation Instructions for information on how to use these commands.
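For example, on the lead member the complete sequence looks like the following (>>> denotes the console prompt; the last three commands are entered after the member reaches single-user mode):

shutdown -h now
>>> boot -fl s
init s
bcheckrc
lmf reset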

Notes

You can run the dupatch command multiple times in order to install multiple patches. Doing so may make isolating problems difficult if any arise after the patch process is completed and the cluster is in use.

During the install stage, you cannot run a dupatch command followed by an installupdate command. To patch the current software before you perform a rolling upgrade, you must perform two complete rolling upgrade operations: one to patch the current software, and one to perform the update installation.

If an NHD installation is part of a rolling upgrade that includes an update installation, you do not have to manually run nhd_install; the installupdate command will install the NHD kit. Otherwise, use the nhd_install command copied by clu_upgrade during the setup stage: /var/adm/update/NHDKit/nhd_install.

7.7.5    Postinstall Stage

Command Where Run Run Level
clu_upgrade postinstall lead member multiuser mode

The postinstall stage verifies that the lead member has completed an update installation, a patch, or an NHD installation. If an update installation was performed, clu_upgrade postinstall verifies that the lead member has rolled to the new version of the base operating system.

7.7.6    Roll Stage

Command Where Run Run Level
clu_upgrade roll member being rolled single-user mode

The lead member was upgraded in the install stage. The remaining members are upgraded in the roll stage.

In many cluster configurations, you can roll multiple members in parallel and shorten the time required to upgrade the cluster. The number of members rolled in parallel is limited only by the requirement that the members not being rolled (plus the quorum disk, if one is configured) have sufficient votes to maintain quorum. Parallel rolls can be performed only after the lead member is rolled.
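For example, on an eight-member cluster with a quorum disk, assuming that each member and the quorum disk contribute one vote each, the arithmetic works out as follows:

Expected votes:  8 member votes + 1 quorum disk vote = 9
Quorum votes:    trunc(9/2) + 1 = 5
Rolling 4 members leaves 4 member votes + 1 quorum disk vote = 5,
so quorum is maintained and 4 members can be rolled in parallel.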

The clu_upgrade roll command performs the following tasks:

Note

If you need to add a member to the cluster during a rolling upgrade, you must add the member from a member that has completed its roll.

If a member goes down (and cannot be repaired and rebooted) before all members have rolled, you must delete the member to complete the roll of the cluster. However, if you have rolled all members but one, and this member goes down before it has rebooted in the roll stage, you must delete this member and then reboot any other member of the cluster. (The clu_upgrade command runs during reboot and tracks the number of members rolled versus the number of members currently in the cluster; clu_upgrade marks the roll stage as completed when the two values are equal. That is why, in the case where you have rolled all members except one, deleting the unrolled member and rebooting another member completes the roll stage and lets you continue the rolling upgrade.)

7.7.7    Switch Stage

Command Where Run Run Level
clu_upgrade switch any member multiuser mode; all members must be up and running [Footnote 13]

The switch stage sets the active version of the software to the new version, which results in turning on any new features that had been deliberately disabled during the rolling upgrade. (See Section 7.9 for a description of active version and new version.)

The clu_upgrade switch command performs the following tasks:

Note

After the switch stage completes, you must reboot each member of the cluster, one at a time.

7.7.8    Clean Stage

Command Where Run Run Level
clu_upgrade clean any member multiuser mode

The clean stage removes the tagged (.Old..) files from the cluster and completes the upgrade.

The clu_upgrade clean command performs the following tasks:

7.8    Tagged Files

A rolling upgrade updates the software on one cluster member at a time. To support two versions of software within the cluster during a roll, clu_upgrade creates a set of tagged files in the setup stage.

A tagged file is a copy of a current file with .Old.. prepended to the copy filename, and an AdvFS property (DEC_VERSION_TAG) set on the copy. For example, the tagged file for the vdump command is named /sbin/.Old..vdump. Because tagged files are created in the same file system as the original files, you must have adequate free disk space before beginning a rolling upgrade.

Whether a member is running on tagged files is controlled by that member's sysconfigtab rolls_ver_lookup variable. The upgrade commands set the value to 1 when a member must run on tagged files, and to 0 when a member must not run on tagged files.
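To check the current setting on a member, you can inspect the attribute directly; a minimal check, assuming the value is recorded in the member's /etc/sysconfigtab file:

grep rolls_ver_lookup /etc/sysconfigtab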

If a member's sysconfigtab rolls_ver_lookup attribute is set to 1, pathname resolution includes determining whether a specified filename has a .Old..filename copy and whether the copy has the DEC_VERSION_TAG property set on it. If both conditions are met, the requested file operation is transparently diverted to use the .Old..filename version of the file. Therefore, if the vdump command is issued on a member that has not rolled, the /sbin/.Old..vdump file is executed; if the command is issued on a member that has rolled, the /sbin/vdump file is executed. The only member that never runs on tagged files is the lead member (the first member to roll).

Note

File system operations on directories are not subject to this tagged-file redirection. For example, an ls of a directory on any cluster member during a rolling upgrade lists both versions of a file. However, the output of an ls -ail command on a member that has not rolled is different from the output on a member that has rolled. In the following examples, the ls -ail command is run first on a member that has not rolled and then on a member that has rolled. (The awk utility is used to print only the inode, size, month and day timestamp, and name of each file.)

The following output from the ls command is taken from a cluster member running with tags before it has rolled. The tagged files are the same as their untagged counterparts (same inode, size, and timestamp). When this member runs the hostname command, it runs the tagged version (inode 3643).

# cd /sbin
# ls -ail hostname .Old..hostname ls .Old..ls init .Old..init |\
awk '{printf("%d\t%d\t%s %s\t%s\n",$1,$6,$7,$8,$10)}'
 
 3643   16416    Aug 24   .Old..hostname
 3648   395600   Aug 24   .Old..init
 3756   624320   Aug 24   .Old..ls
 3643   16416    Aug 24   hostname
 3648   395600   Aug 24   init
 3756   624320   Aug 24   ls
 

The following output from the ls command is taken from a cluster member running without tags after it has rolled. The tagged files now differ from their untagged counterparts (different inode, size, and timestamp). When this member runs the hostname command, it runs the non-tagged version (inode 1370).

# cd /sbin
# ls -ail hostname .Old..hostname ls .Old..ls init .Old..init |\
awk '{printf("%d\t%d\t%s %s\t%s\n",$1,$6,$7,$8,$10)}'
 
 3643   16416    Aug 24   .Old..hostname
 3648   395600   Aug 24   .Old..init
 3756   624320   Aug 24   .Old..ls
 1187   16528    Mar 12   hostname
 1370   429280   Mar 12   init
 1273   792640   Mar 12   ls
 

After you create tagged files in the setup stage, we recommend that you run any administrative command, such as tar, from a member that has rolled. You can always run commands on the lead member because it never runs on tagged files.

The following rules determine which files have tagged files automatically created for them in the setup stage:

The clu_upgrade command provides several tagged command options to manipulate tagged files: check, add, remove, enable, and disable. When dealing with tagged files, take the following into consideration:

7.9    Version Switch

A version switch manages the transition of the active version to the new version of an operating system. The active version is the one that is currently in use. The purpose of a version switch in a cluster is to prevent the introduction of potentially incompatible new features until all members have been updated. For example, if a new version introduces a change to a kernel structure that is incompatible with the current structure, you do not want cluster members to use the new structure until all members have updated to the version that supports it.

At the start of a rolling upgrade, each member's active version is the same as its new version. When a member rolls, its new version is updated. After all members have rolled, the switch stage sets the active version to the new version on all members. At the completion of the upgrade, all members' active versions are again the same as their new versions. The following simple example uses an active version of 1 and a new version of 2 to illustrate the version transitions during a rolling upgrade:

All members at start of roll:   active (1)  = new (1)
Each member after its roll:     active (1) != new (2)
All members after switch stage: active (2)  = new (2)
 

The clu_upgrade command uses the versw command, which is described in versw(8), to manage version transitions. The clu_upgrade command manages all the version switch activity when rolling individual members. In the switch stage, after all members have rolled, the following command completes the transition to the new software:

clu_upgrade switch
 

7.10    Rolling Upgrade and Layered Products

This section discusses the interaction of layered products and rolling upgrades:

7.10.1    General Guidelines

The clu_upgrade setup command prepares a cluster for a rolling upgrade of the operating system. Do not use the setld command to load software onto the cluster between performing the clu_upgrade setup command and rolling the first cluster member to the new version. If you install software between performing the clu_upgrade setup command and rolling a cluster member to the new version, the new files will not have been processed by clu_upgrade setup. As a result, when you roll the first cluster member, these new files will be overwritten.

If you must load software:

7.10.2    Blocking Layered Products

A blocking layered product is a product that prevents the installupdate command from completing. Blocking layered products must be removed from the cluster before starting a rolling upgrade that will include running the installupdate command. You do not have to remove blocking layered products when performing a rolling upgrade solely to patch the cluster or install an NHD kit.

Table 7-6 lists blocking layered products for this release.

Table 7-6:  Blocking Layered Products

Product Code Description
3X0 Open3D
4DT Open3D
ATM Atom Advanced Developers Kit
DCE Distributed Computing Environment
DNA DECnet
DTA Developer's Toolkit (Program Analysis Tools)
DTC Developer's Toolkit (C compiler)
MME Multimedia Services
O3D Open 3D
PRX PanoramiX Advanced Developers Kit

Notes

The three-letter product codes are the first three letters of subset names. For example, a subset named ATMBASExxx is part of the ATM product (Atom Advanced Developers Kit), which is a blocking layered product. However, a subset named OSFATMBINxxx contains the letters ATM, but the subset is not part of a blocking layered product; it is a subset in the OSF product (the base operating system).

When a blocking layered product is removed as part of the rolling upgrade, it is removed for all members. Any services that rely on the blocking product will not be available until the roll completes and the blocking layered product is reinstalled.
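One way to check for a blocking layered product is to list subsets with setld -i (installed subsets are marked in the status column) and match the names against the product codes in Table 7-6. The following sketch uses the DCE product code as an example; the subset name passed to setld -d is a placeholder:

setld -i | grep '^DCE'
setld -d subset_name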

7.11    Rolling Upgrade and RIS

When performing the install stage of a rolling upgrade, you can load the base operating system subsets from a CD-ROM or from a Remote Installation Services (RIS) server.

Note

You can use RIS only to load the base operating system subsets.

To use RIS, you must register both the lead member and the default cluster alias with the RIS server. When registering for operating system software, you must provide a hardware address for each host name. Therefore, you must create a hardware address for the default cluster alias in order to register the alias with the RIS server. (RIS will reject an address that is already in either of the RIS server's /etc/bootptab or /var/adm/ris/clients/risdb files.)

If your cluster uses the cluster alias virtual MAC (vMAC) feature, register that virtual hardware address with the RIS server as the default cluster alias's hardware address. If your cluster does not use the vMAC feature, you can still use the algorithm that is described in the vMAC section of the Cluster Administration manual to manually create a hardware address for the default cluster alias.

A vMAC address consists of a prefix (the default is AA:01) followed by the IP address of the alias in hexadecimal format. For example, the default vMAC address for the default cluster alias deli, whose IP address is 16.140.112.209, is AA:01:10:8C:70:D1. The address is derived in the following manner:

        Default vMAC prefix:       AA:01
        Cluster Alias IP Address:  16.140.112.209
        IP address in hex. format: 10.8C.70.D1
        vMAC for this alias:       AA:01:10:8C:70:D1
 

Another method for creating a hardware address is to append an arbitrary string of eight hexadecimal digits to the default vMAC prefix, AA:01. For example, AA:01:00:00:00:00. Make sure that the address is unique within the area served by the RIS server. If you have more than one cluster, remember to increment the arbitrary hexadecimal string when adding the next alias. (The vMAC algorithm is useful because it creates an address that has a high probability of being unique within your network.)
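The hexadecimal conversion can also be scripted; a minimal sketch for the deli example, assuming a POSIX shell with the printf utility:

printf "AA:01:%02X:%02X:%02X:%02X\n" 16 140 112 209

This prints AA:01:10:8C:70:D1, matching the worked example above.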