5    Managing Cluster Members

This chapter discusses the following topics:

For information on the following topics that are related to managing cluster members, see the TruCluster Server Cluster Installation manual:

For information about configuring and managing your Tru64 UNIX and TruCluster Server systems for availability and serviceability, see the Tru64 UNIX Managing Online Addition and Removal manual. This manual provides users with guidelines for configuring and managing any system for higher availability, with an emphasis on those capable of Online Addition and Replacement (OLAR) management of system components.

Note

As described in the Managing Online Addition and Removal manual, the /etc/olar.config file is used to define system-specific policies and the /etc/olar.config.common file is used to define cluster-wide policies. Any settings in a system's /etc/olar.config file override clusterwide policies in the /etc/olar.config.common file for that system only.

5.1    Managing Configuration Variables

The hierarchy of the /etc/rc.config* files lets you define configuration variables consistently over all systems within a local area network (LAN) and within a cluster. Table 5-1 presents the uses of the configuration files.

Table 5-1:  /etc/rc.config* Files

File Scope
/etc/rc.config

Member-specific variables.

/etc/rc.config is a context-dependent symbolic link (CDSL). Each cluster member has a unique version of the file.

Configuration variables in /etc/rc.config override those in /etc/rc.config.common and /etc/rc.config.site.

/etc/rc.config.common

Clusterwide variables. These configuration variables apply to all members.

Configuration variables in /etc/rc.config.common override those in /etc/rc.config.site, but are overridden by those in /etc/rc.config.

/etc/rc.config.site

Sitewide variables, which are the same for all machines on the LAN.

Values in this file are overridden by any corresponding values in /etc/rc.config.common or /etc/rc.config.

By default, there is no /etc/rc.config.site. If you want to set sitewide variables, you have to create the file and copy it to /etc/rc.config.site on every participating system.
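
For example, a minimal /etc/rc.config.site might look like the following; SITE_NTP_SERVER and its value are hypothetical, and the sketch assumes the same shell variable-assignment format used by the other rc.config* files:

SITE_NTP_SERVER="ntp1.example.com"
export SITE_NTP_SERVER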

You must then edit /etc/rc.config on each participating system and add the following code just before the line that executes /etc/rc.config.common:

# Read in the cluster sitewide attributes
# before overriding them with the
# clusterwide and member-specific values.
#
. /etc/rc.config.site

For more information, see rcmgr(8).

The rcmgr command accesses these variables in a standard search order (first /etc/rc.config, then /etc/rc.config.common, and finally /etc/rc.config.site) until it finds or sets the specified configuration variable.

Use the -h option to get or set the run-time configuration variables for a specific member. The command then acts on /etc/rc.config, the member-specific CDSL configuration file.

To make the command act clusterwide, use the -c option. The command then acts on /etc/rc.config.common, which is the clusterwide configuration file.

If you specify neither -h nor -c, then the member-specific values in /etc/rc.config are used.
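
For example (CLUSTER_VAR and MEMBER_VAR are hypothetical variable names), the first of the following commands sets a clusterwide variable in /etc/rc.config.common, the second sets a member-specific variable in member 2's copy of /etc/rc.config, and the third retrieves a variable by using the standard search order:

# rcmgr -c set CLUSTER_VAR value
# rcmgr -h 2 set MEMBER_VAR value
# rcmgr get MEMBER_VAR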

For information about member-specific configuration variables, see Appendix B.

5.2    Managing Kernel Attributes

Each member of a cluster runs its own kernel and therefore has its own /etc/sysconfigtab file. This file contains static member-specific attribute settings. Although a clusterwide /etc/sysconfigtab.cluster file exists, its purpose is different from that of /etc/rc.config.common, and it is reserved to utilities that are shipped in the TruCluster Server product.

This section presents a partial list of those kernel attributes that are provided by each TruCluster Server subsystem.

Use the following command to display the current settings of these attributes for a given subsystem:

#  sysconfig -q subsystem-name attribute-list

To get a list and the status of all the subsystems, use the following command:

# sysconfig -s

In addition to the cluster-related kernel attributes presented here, two kernel attributes are set during cluster installation. Table 5-2 lists these kernel attributes. You can increase the values for these attributes, but do not decrease them.

Table 5-2:  Kernel Attributes Not to Decrease

Attribute Value (Do Not Decrease)
vm_page_free_min 30
vm_page_free_reserved 20
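
For example, to confirm the current values of these attributes, which belong to the vm subsystem, query them with sysconfig:

# sysconfig -q vm vm_page_free_min vm_page_free_reserved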

Table 5-3 lists the subsystem names that are associated with each TruCluster Server component.

Table 5-3:  Configurable TruCluster Server Subsystems

Subsystem Name Component For More Information
cfs Cluster file system (CFS) sys_attrs_cfs(5)
clua Cluster alias sys_attrs_clua(5)
clubase Cluster base sys_attrs_clubase(5)
cms Cluster mount service sys_attrs_cms(5)
cnx Connection manager sys_attrs_cnx(5)
dlm Distributed lock manager sys_attrs_dlm(5)
drd Device request dispatcher sys_attrs_drd(5)
hwcc Hardware components cluster sys_attrs_hwcc(5)
icsnet Internode communications service's network service sys_attrs_icsnet(5)
ics_hl Internode communications service (ICS) high level sys_attrs_ics_hl(5)
mcs Memory Channel application programming interface (API) sys_attrs_mcs(5)
rm Memory Channel sys_attrs_rm(5)
token CFS token subsystem sys_attrs_token(5)

To tune the performance of a kernel subsystem, use one of the following methods to set one or more attributes in the /etc/sysconfigtab file:

You can also use the configuration manager framework, as described in the Tru64 UNIX System Administration manual, to change attributes and otherwise administer a cluster kernel subsystem on another host. To do this, set up the host names in the /etc/cfgmgr.auth file on the remote client system and then specify the -h option to the /sbin/sysconfig command, as in the following example:

# sysconfig -h fcbra13 -r drd drd-do-local-io=0
drd-do-local-io: reconfigured
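
To make a kernel attribute change persist across reboots, you can instead merge a stanza into the member's /etc/sysconfigtab file with the sysconfigdb command, as illustrated in Section 5.11.1. The following is a minimal sketch in which the file name /tmp/subsystem_tune, subsystem-name, attribute-name, and value are all placeholders:

# cat /tmp/subsystem_tune
subsystem-name:
        attribute-name=value
# sysconfigdb -m -f /tmp/subsystem_tune subsystem-name

A static change made this way takes effect the next time the member boots.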

5.3    Managing Remote Access Within and From the Cluster

An rlogin, rsh, or rcp command from the cluster uses the default cluster alias as the source address. Therefore, if a noncluster host must allow remote access from any account in the cluster, the .rhosts file on that noncluster host must include the cluster alias name, either in one of the forms by which it is listed in the /etc/hosts file or in a form resolvable through Network Information Service (NIS) or the Domain Name System (DNS).
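
For example, if the default cluster alias is deli (the hypothetical cluster name used in Section 5.11.1), a user's .rhosts file on the noncluster host might contain a line such as the following:

deli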

The same requirement holds for rlogin, rsh, or rcp to work between cluster members. At cluster creation, the clu_create utility prompts for all required host names and puts them in the correct locations in the proper format. The clu_add_member command does the same when a new member is added to the cluster. You do not need to edit /.rhosts to enable /bin/rsh commands from a cluster member to the cluster alias or between individual members. Do not change the generated name entries in /etc/hosts and /.rhosts.

If the /etc/hosts and /.rhosts files are configured incorrectly, many applications will not function properly. For example, the Advanced File System (AdvFS) rmvol and addvol commands use rsh when the member where the commands are executed is not the server of the domain. These commands fail if /etc/hosts or /.rhosts is configured incorrectly.

The following error indicates that the /etc/hosts or /.rhosts file has been configured incorrectly:

rsh cluster-alias date
Permission denied.
 

5.4    Shutting Down the Cluster

To halt all members of a cluster, use the -c option to the shutdown command. For example, to shut down the cluster in 5 minutes, enter the following command:

# shutdown -c +5 Cluster going down in 5 minutes
 

For information on shutting down a single cluster member, see Section 5.5.

During the shutdown grace period, which is the time between when the cluster shutdown command is entered and when actual shutdown occurs, the clu_add_member command is disabled and new members cannot be added to the cluster.

To cancel a cluster shutdown during the grace period, kill the processes that are associated with the shutdown command as follows:

  1. Get the process identifiers (PIDs) that are associated with the shutdown command. For example:

    # ps ax | grep -v grep | grep shutdown
     14680 ttyp5    I <    0:00.01 /usr/sbin/shutdown +20 going down
    

    Depending on how far along shutdown is in the grace period, ps might show either /usr/sbin/shutdown or /usr/sbin/clu_shutdown.

  2. Terminate all shutdown processes by specifying their PIDs in a kill command from any member. For example:

    # kill 14680
     
    

If you kill the shutdown processes during the grace period, the shutdown is canceled.

The shutdown -c command fails if a clu_quorum, clu_add_member, clu_delete_member, or clu_upgrade is in progress.

There is no clusterwide reboot. The shutdown -r command, the reboot command, and the halt command act only on the member on which they are executed. The halt, reboot, and init commands have been modified to leave file systems in a cluster mounted, so the cluster continues functioning when one of its members is halted or rebooted, as long as it retains quorum.

For more information, see shutdown(8).

5.5    Shutting Down and Starting One Cluster Member

When booting a member, you must boot from the boot disk that was created by the clu_add_member command. You cannot boot from a copy of the boot disk.

Shutting down a single cluster member is more complex than shutting down a standalone server. If you halt a cluster member whose vote is required for quorum (referred to as a critical voting member), the cluster will lose quorum and hang. As a result, you will be unable to enter commands from any cluster member until you reboot the halted member. Therefore, before you shut down a cluster member, you must first determine whether that member's vote is required for quorum. You must also determine whether the cluster member that you are shutting down is the only hosting member for one or more applications with a restricted placement policy.

5.5.1    Identifying a Critical Voting Member

A cluster that contains a critical voting member is either operating in a degraded mode (for example, one or more voting members or a quorum disk is down) or was not configured for availability to begin with (for example, it is a two-member configuration with each member assigned a vote). Removing a critical voting member from a cluster causes the cluster to hang and compromise availability. Before halting or deleting a cluster member, ensure that it is not supplying a critical vote.

To determine whether a member is a critical voting member, follow these steps:

  1. If possible, make sure that all voting cluster members are up.

  2. Enter the clu_quorum command and note the running values of current votes, quorum votes, and the node votes of the member in question.

  3. Subtract the member's node votes from the current votes. If the result is less than the quorum votes, the member is a critical voting member and you cannot shut it down without causing the cluster to lose quorum and hang.
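
For example (the values are hypothetical): in a three-member cluster where clu_quorum reports 3 current votes and 2 quorum votes, and the member in question contributes 1 node vote, 3 - 1 = 2, which is not less than 2, so the member is not a critical voting member. In a two-member cluster where each member contributes 1 vote (2 current votes and 2 quorum votes), 2 - 1 = 1, which is less than 2, so each member is critical.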

5.5.2    Preparing to Halt or Delete a Critical Voting Member

Before halting or deleting a critical voting member, ensure that its votes are no longer critical to the cluster retaining quorum. The best way to do this involves restoring node votes or a quorum disk vote to the cluster without increasing expected votes. Some ways to accomplish this are:

If the cluster has an even number of votes, adding a new voting member or configuring a quorum disk can also make a critical voting member noncritical. In these cases, expected votes is incremented, but quorum votes remains the same.

5.5.3    Halting a Noncritical Member

A noncritical member, one with no vote or whose vote is not required to maintain quorum, can be shut down, halted, or rebooted like a standalone system.

Execute the shutdown command on the member to be shut down. To halt a member, enter the following command:

# shutdown -h time
 

To reboot a member, enter the following command:

# shutdown -r time
 

For information on identifying critical voting members, see Section 5.5.1.

5.5.4    Shutting Down a Hosting Member

The cluster application availability (CAA) profile for an application allows you to specify an ordered list of members, separated by white space, that can host the application resource. The hosting members list is used in conjunction with the application resource's placement policy (favored or restricted), as discussed in caa(4).

If the cluster member that you are shutting down is the only hosting member for one or more applications with a restricted placement policy, you need to specify another hosting member or the application cannot run while the member is down. You can add an additional hosting member, or replace the existing hosting member with another.

To do this, perform these steps:

  1. Verify the current hosting members and placement policy.

    # caa_profile -print resource-name
    

  2. If the cluster member that you are shutting down is the only hosting member, you can add an additional hosting member to the hosting members list, or replace the existing member.

    # caa_profile -update resource-name -h hosting-member another-hosting-member
    # caa_profile -update resource-name -h hosting-member
     
    

  3. Update the CAA registry entry with the latest resource profile.

    # caa_register -u resource-name
    

  4. Relocate the application to the other member.

    # caa_relocate resource-name -c member-name
    

5.6    Shutting Down a Cluster Member to Single-User Mode

If you need to shut down a cluster member to single-user mode, you must first halt the member and then boot it to single-user mode. Shutting down the member in this manner ensures that the member provides the minimal set of services to the cluster and that the running cluster has minimal reliance on the member running in single-user mode. In particular, halting the member satisfies services that require the cluster member to have a status of DOWN before completing a service failover. If you do not first halt the cluster member, these services do not fail over as expected.

To take a cluster member to single-user mode, use the shutdown -h command to halt the member, and then boot the member to single-user mode. When the system reaches single-user mode, run the init s, bcheckrc, and lmf reset commands. For example:

Note

Before halting a cluster member, make sure that the cluster can maintain quorum without the member's vote. Also make sure that the cluster member is not the only hosting member for one or more applications with a restricted placement policy.

# /sbin/shutdown -h now

>>> boot -fl s

# /sbin/init s
# /sbin/bcheckrc
# /usr/sbin/lmf reset
 

A cluster member that is shut down to single-user mode (that is, not shut down to a halt and then booted to single-user mode as recommended) continues to have a status of UP. Shutting down a cluster member to single-user mode in this manner does not affect the voting status of the member: a member contributing a vote before being shut down to single-user mode continues contributing the vote in single-user mode.

5.7    Rebooting Cluster Members

Do not reboot all members simultaneously. If you attempt to reboot all cluster members at the same time, one or more members will hang during shutdown because the cluster loses quorum, and the remaining members may then fail to boot because they cannot rejoin the hung members to re-form the cluster.

The method you use to reboot the entire cluster depends on your intent:
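
For example, if your intent is simply to reboot every member while keeping the cluster up, one approach is to reboot the members one at a time, first confirming that the member to be rebooted is not a critical voting member (see Section 5.5.1). On the member being rebooted, enter:

# shutdown -r now

Then, from another member, confirm that the rebooted member has rejoined the cluster and shows a status of UP before you reboot the next member:

# clu_get_info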

5.8    Deleting a Cluster Member

The clu_delete_member command permanently removes a member from the cluster.

Caution

If you are reinstalling TruCluster Server, see the TruCluster Server Cluster Installation manual. Do not delete a member from an existing cluster and then create a new single-member cluster from the member that you just deleted. If the new cluster has the same name as the old cluster, the newly installed system might join the old cluster. This can cause data corruption.

The clu_delete_member command has the following syntax:

/usr/sbin/clu_delete_member [-f] [-m memberid]

If you do not supply a member ID, the command prompts you for the member ID of the member to delete.

The clu_delete_member command does the following:

To delete a member from the cluster, follow these steps:

  1. Determine whether or not the member is a critical voting member of the cluster. If the member supplies a critical vote to the cluster, halting it will cause the cluster to lose quorum and suspend operations. Before halting the member, use the procedure in Section 5.5 to determine whether it is safe to do so.

    You must also determine whether the cluster member is the only hosting member for one or more applications with a restricted placement policy, as described in Section 5.5.4.

  2. Halt the member to be deleted.

  3. If possible, make sure that all voting cluster members are up.

  4. Use the clu_delete_member command from another member to remove the member from the cluster. For example, to delete a halted member whose member ID is 3, enter the following command:

    # clu_delete_member -m 3
    

  5. When you run clu_delete_member and the boot disk for the member is inaccessible, the command displays a message to that effect and exits.

    The -f option forces the deletion of the member even if the member's boot disk is inaccessible. In this case, if the deleted member was a voting member, you must manually lower the cluster's expected votes by one after the deletion. Do this with the following command:

    # clu_quorum -e expected-votes
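
    For example, if the cluster's expected votes was 3 before you force-deleted a one-vote member, lower it to 2:

    # clu_quorum -e 2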
    

For an example of the /cluster/admin/clu_delete_member.log that results when a member is deleted, see Appendix C.

5.9    Removing a Cluster Member and Restoring It as a Standalone System

To restore a cluster member as a standalone system, follow these steps:

  1. Halt and delete the member by following the procedures in Section 5.5 and Section 5.8.

  2. Physically disconnect the halted member from the cluster, disconnecting the cluster interconnect and storage.

  3. On the halted member, select a disk that is local to the member and install Tru64 UNIX. See the Tru64 UNIX Installation Guide for information on installing system software.

For information about moving clusterized Logical Storage Manager (LSM) volumes to a noncluster system, see the Tru64 UNIX Logical Storage Manager manual.

5.10    Moving a Cluster to Another IP Subnet

This section describes how to move a cluster from one IP subnet to another IP subnet. This is usually required only when a site reconfigures its network topology such that the cluster's external IP addresses will be on a different IP subnet.

In order of increasing complexity, moving a cluster to another IP subnet might involve changing the following items:

This section provides tables for you to use when gathering information and performing the move. The tables describe the file edits required for each of the three move scenarios. Use the table that corresponds to the type of move you plan to make. Section 5.10.2 can help you gather information before starting the move, and provides a checklist you can use to keep track of your edits.

Before you apply this procedure, take note of the following requirements:

To move a cluster to another IP subnet, perform the following steps:

  1. Obtain the IP names and addresses you will need for the move. Use the tables in Section 5.10.2 to record this information. Note any changes in subnet masks required for the move. If the move will result in the cluster using different name servers, note the changes you will need to make to the /etc/resolv.conf file.

  2. If this move requires changing any physical network connections, make sure the new ones are in place and ready for use.

  3. Tell users and other cluster and network administrators when the move will occur. If other systems or clusters depend on any of the IP names and addresses you plan to change, their administrators will have to coordinate with you; for example, NIS, DNS, or mail servers might be affected. If the cluster provides any services that must not be interrupted, make preparations for another system or cluster to provide these services while the cluster is shut down.

  4. Determine where the current cluster's IP names and addresses appear in common system configuration files. One way to do this is to use clu_get_info to get information about the current cluster and then use grep to search for that information in common system configuration files.
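
    For example, the following is a minimal sketch; current-cluster-name is a placeholder for your cluster's current name, and you can search for the current IP addresses and member host names in the same way:

    # clu_get_info -full | more
    # grep -l current-cluster-name /etc/hosts /etc/hosts.equiv /.rhosts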

  5. Look for any other files, such as CAA scripts, that might contain host names, IP addresses, cluster aliases, interface IP aliases, name aliases, or subnet masks.

  6. Make both save and work copies of the configuration files you plan to modify.
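
    For example, for a shared file such as /etc/hosts, you might keep a save copy and edit a separate work copy (the .save and .work file names are arbitrary):

    # cp -p /etc/hosts /etc/hosts.save
    # cp -p /etc/hosts /etc/hosts.work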

    Note

    If, for example, any CAA scripts or site-specific scripts reference IP addresses, host names, aliases, or subnet masks that will change, make copies of those files and keep track of the changes you make to them.

  7. Use the table that corresponds to the changes you plan to make:

    1. Using the information in the table, edit the work copies of the files (not the originals). For CDSLs, remember to edit the work files in the member-specific directories.

      Notes

      Whether you use sysman or an editor to make the edits is up to you. Making the correct edits is important. You must be 100 percent certain that the information is correct before you copy the work files to the original file locations and halt all cluster members.

      When editing the work files, if the subnet mask on the new subnet is not the same as the one for the current subnet, remember to make those edits.

      If you edit the /.rhosts or /etc/hosts.equiv work files, add new host names but do not remove those currently listed there. You will remove the old host names after you have successfully booted the cluster on the other subnet.

    2. After you have made all edits, compare the contents of the work files to the original files. Verify your edits.

  8. After all edits are made to your satisfaction:

    1. Send messages to everyone who will be affected; tell them when you plan to disable access to the cluster and start the actual move.

    2. Disable logins and all external network interfaces. For example, you can use the wall -c command to tell users how soon you will disable logins, then use the touch command to create the /etc/nologin file, and then use the rcinet stop command on each member to stop the network on that member.

    3. Copy the work files to the original file locations. Keep track of each file you replace with an edited version. (Remember to keep track of any other files you modify.)

    4. After all the edited configuration files are in place, make sure that all the information is correct. When you shut down the cluster, the success of the reboot depends on your edits. Depending on the extent of the changes required for the move, verify the correct modifications of some or all of the following items:

      • IP addresses, interface IP aliases, and subnet masks

      • Host names and name aliases

      • Default cluster alias and other cluster aliases

  9. Halt each member of the cluster. For example, on each member, run the following command:

    # shutdown -h now
     
    

    Note

    Because the shutdown -c command gets some of its information from the files you just edited, you must halt each member. (If you run clu_get_info -full after the edited files are in place, some of the displayed information will reflect the edits.)

  10. When the cluster has halted, make any required changes to the network connections.

  11. Boot the cluster.

  12. Use the information in Section 5.10.3 to determine whether everything is working as expected.

  13. Remove outdated entries from /.rhosts and /etc/hosts.equiv.

5.10.1    File Edit Tables

This section provides tables that list the required edits to common system configuration files for the different types of moves:

5.10.1.1    Changing External IP Addresses Only

For common configuration files, the following table lists items that you might have to edit when changing a cluster's external IP addresses:

File Items to Edit
Shared Files
/etc/hosts IP addresses associated with each of the cluster's external interfaces and cluster aliases.
/etc/networks If it is defined, the address of the network.
/etc/resolv.conf If the cluster will use different name servers on the new subnet, their addresses.
CDSLs
/etc/clu_alias.config For each member, any alias that is defined by its IP address (and possibly a subnet mask). (You probably do not have to modify the DEFAULTALIAS entry.)
/etc/inet.local For each member, if this file is used to configure interface IP aliases.
/etc/ntp.conf For each member, if the IP address changes for a server or peer system entry.
/etc/rc.config For each member, the IFCONFIG entries. If necessary, modify subnet masks.
/etc/routes For each member, if routes are defined. If necessary, modify subnet masks.
Member-Specific But Not CDSLs
/etc/gated.conf.membern For each member, any entry whose IP address must change. If CLUAMGR_ROUTE_ARGS is set to nogated in a member's rc.config file, modify that member's /etc/gated.conf file (CDSL).
/cluster/admin/.membern.cfg If you use these files, update any changes to IP addresses. (Otherwise you will reset the cluster to the old values if you use the files with clu_create or clu_add_member.)

5.10.1.2    Changing External IP Addresses and Host Names

For common configuration files, the following table lists items that you might have to edit when changing a cluster's external IP addresses and host names:

File Items to Edit
Shared Files
/.rhosts Add the new cluster name. Do not remove any entries.
/etc/cfgmgr.auth Member host names.
/etc/hosts IP addresses, host names, and aliases associated with the cluster's external interfaces and cluster aliases.
/etc/hosts.equiv Add the new cluster name. Do not remove any entries.
/etc/networks If it is defined, the address of the network.
/etc/resolv.conf If you are changing the domain name or name servers on the new subnet.
CDSLs
/etc/clu_alias.config For each member, any alias that is defined by its host name or IP address (and possibly a subnet mask). (You probably do not have to modify the DEFAULTALIAS entry.)
/etc/inet.local For each member, if this file is used to configure interface IP aliases.
/etc/ntp.conf For each member, if the IP address or host name changes for a server or peer system entry.
/etc/rc.config For each member, the HOSTNAME, IFCONFIG, and CLUSTER_NET entries. If necessary, modify subnet masks.
/etc/routes For each member, if routes are defined. If necessary, modify subnet masks.
/etc/sysconfigtab For each member, the cluster_name and the cluster_node_name.
Member-Specific But Not CDSLs
/etc/gated.conf.membern For each member, any entry whose IP address must change. If CLUAMGR_ROUTE_ARGS is set to nogated in a member's rc.config file, modify that member's /etc/gated.conf file (CDSL).
/cluster/admin/.membern.cfg If you use these files, update any changes to cluster names, host names, and IP addresses. (Otherwise you will reset the cluster to the old values if you use the files with clu_create or clu_add_member.)

5.10.1.3    Changing External and Internal IP Addresses and Host Names

For common configuration files, the following table lists items that you might have to edit when changing a cluster's external and internal IP addresses and host names:

File Items to Edit
Shared Files
/.rhosts Add the new cluster name and the new host names associated with the cluster interconnect: *-mc0 for Version 5.0A through 5.1; *-ics0 for Version 5.1A or higher. Do not remove any entries.
/etc/cfgmgr.auth Member host names.
/etc/hosts IP addresses, host names, and aliases associated with cluster's external interfaces, cluster aliases, and cluster interconnect interfaces.
/etc/hosts.equiv Add the new cluster name and the new host names associated with the cluster interconnect: *-mc0 for Version 5.0A through 5.1; *-ics0 for Version 5.1A or higher. Do not remove any entries.
/etc/networks If it is defined, the address of the network.
/etc/resolv.conf If you are changing the domain name or name servers on the new subnet.
CDSLs
/etc/clu_alias.config For each member, any alias that is defined by its host name or IP address (and possibly a subnet mask). (You probably do not have to modify the DEFAULTALIAS entry.)
/etc/ifaccess.conf IP addresses associated with the cluster interconnect. If necessary, modify subnet masks.
/etc/inet.local For each member, if this file is used to configure interface IP aliases.
/etc/ntp.conf For each member, if the IP address or host name changes for a server or peer system entry. (If you change the names associated with the cluster interconnect, make sure to change those peer names.)
/etc/rc.config For each member, the HOSTNAME, IFCONFIG, and CLUSTER_NET entries. If necessary, modify subnet masks.
/etc/routes For each member, if routes are defined. If necessary, modify subnet masks.
/etc/sysconfigtab For each member, the value of cluster_name, cluster_node_name, and cluster_node_inter_name.
Member-Specific But Not CDSLs
/etc/gated.conf.membern For each member, any entry whose IP address must change. If CLUAMGR_ROUTE_ARGS is set to nogated in a member's rc.config file, modify that member's /etc/gated.conf file (CDSL).
/cluster/admin/.membern.cfg If you use these files, update any changes to cluster names, host names, cluster interconnect names, and IP addresses. (Otherwise you will reset the cluster to the old values if you use the files with clu_create or clu_add_member.)

5.10.2    Attribute and Checklist Tables

Use the tables in this section to record the IP name and address information you will need to move the cluster to its new subnet. If you have more than four cluster members, or more than three cluster aliases, make copies of the pertinent tables and relabel rows as needed.

5.10.2.1    External Host Names and IP Addresses

Member Attribute Value
Member 1 Host Name Old  
New  
IP Address (and subnet mask) Old  
New  
Member 2 Host Name Old  
New  
IP Address (and subnet mask) Old  
New  
Member 3 Host Name Old  
New  
IP Address (and subnet mask) Old  
New  
Member 4 Host Name Old  
New  
IP Address (and subnet mask) Old  
New  

5.10.2.2    Cluster Name and Cluster Aliases

Cluster Alias Value
Fully qualified cluster name (the cluster name is the default cluster alias) Old  
New  
Default cluster alias IP address (and subnet mask) Old  
New  
Name of additional cluster alias #1 Old  
New  
IP Address of additional cluster alias #1 (and subnet mask) Old  
New  
Name of additional cluster alias #2 Old  
New  
IP Address of additional cluster alias #2 (and subnet mask) Old  
New  

5.10.2.3    Interface IP Aliases

Member Attribute Value
Member 1 IP Alias #1 (and subnet mask) Old  
New  
IP Alias #2 (and subnet mask) Old  
New  
Member 2 IP Alias #1 (and subnet mask) Old  
New  
IP Alias #2 (and subnet mask) Old  
New  
Member 3 IP Alias #1 (and subnet mask) Old  
New  
IP Alias #2 (and subnet mask) Old  
New  
Member 4 IP Alias #1 (and subnet mask) Old  
New  
IP Alias #2 (and subnet mask) Old  
New  

5.10.2.4    External Servers

If the cluster will use different servers for network services such as BIND, NIS, or NTP on the new subnet, record the old and new IP addresses used by these services in the following table:

Server IP Address
  Old  
New  
  Old  
New  
  Old  
New  
  Old  
New  

5.10.2.5    Checklist

Use the checklist's Status column to keep track of edits to the work copies of configuration files.

File Status
Shared Files
/.rhosts  
/etc/cfgmgr.auth  
/etc/hosts  
/etc/hosts.equiv  
/etc/networks  
/etc/resolv.conf  
CDSLs
/etc/clu_alias.config Member 1  
Member 2  
Member 3  
Member 4  
/etc/ifaccess.conf Member 1  
Member 2  
Member 3  
Member 4  
/etc/inet.local Member 1  
Member 2  
Member 3  
Member 4  
/etc/ntp.conf Member 1  
Member 2  
Member 3  
Member 4  
/etc/rc.config Member 1  
Member 2  
Member 3  
Member 4  
/etc/routes Member 1  
Member 2  
Member 3  
Member 4  
/etc/sysconfigtab Member 1  
Member 2  
Member 3  
Member 4  
Member-Specific But Not CDSLs
/etc/gated.conf.membern Member 1  
Member 2  
Member 3  
Member 4  
/cluster/admin/.membern.cfg Member 1  
Member 2  
Member 3  
Member 4  

5.10.3    Verifying Success

After you apply the procedure for moving a cluster to another IP subnet, you can verify whether it was successful.

If all edits are made correctly, and the edited files are put in their proper places, the systems will boot, form a cluster, and assume their new identities. Use the following commands to verify that the cluster and its subsystems are operating correctly:

# hostname
# ifconfig -a
# netstat -i
# clu_get_info -full | more
# cluamgr -s all
# caa_stat
 

You can also use standard networking commands like ping, rlogin, and rpcinfo to verify that the cluster members are available for use, will accept logins, and can communicate with other systems.
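
For example, from a system that is not a cluster member, verify that the default cluster alias answers at its new address (new-cluster-alias is a placeholder for the alias name or IP address):

# /usr/sbin/ping -c 3 new-cluster-alias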

If the procedure was not successful, see Section 5.10.4 for information about identifying and solving problems.

5.10.4    Troubleshooting

If you determine that the procedure was not successful, as described in Section 5.10.3, use the following table to identify and solve problems:

Problem Possible Solutions
Member cannot boot.

If the failure occurs early in the boot path, you probably made a mistake editing the sysconfigtab file.

If you can get one member to boot, mount the boot_partition of the member that cannot boot, and fix the edits.

If no members can boot, first determine whether any members fail because they cannot gain quorum. If so, perform an interactive boot for one member and set the value of expected votes to zero. Boot that member, fix the other members' files, boot those members, and readjust the expected votes for quorum.

If you cannot boot any members interactively, boot the Tru64 UNIX operating system, mount the boot_partition for the first member of the cluster, fix the edits for that member, halt the Tru64 UNIX system, boot the cluster member (interactively, if necessary), and fix the remaining members.

Cluster boots but some network problems were encountered in multiuser mode. Define the problem and decide which files on which members are most likely to have bad edits. Fix the edits, then stop and restart network services on those systems.
Even after applying the preceding solutions, you are unsuccessful. Restore the saved copies of the original files. Restore the original network connections. Boot the cluster on its old subnet so the cluster can continue to serve clients while you figure out what went wrong.

5.11    Changing the Cluster Name or IP Address

This section describes how to change the cluster name or IP address. Because the name of the cluster is also the default cluster alias, changing the cluster name also changes the default cluster alias.

Changing the name of a cluster requires a shutdown and reboot of the entire cluster. Changing the IP address of a cluster requires that you shut down and reboot each member individually.

5.11.1    Changing the Cluster Name

To change the cluster name, follow these steps carefully. Any mistake can prevent the cluster from booting.

  1. Create a file with the new cluster_name attribute for the clubase subsystem stanza entry. For example, to change the cluster name to deli, add the following clubase subsystem stanza entry:

    clubase:
     cluster_name=deli
     
    

    Notes

    Ensure that you include a line-feed at the end of each line in the file that you create. If you do not, when the sysconfigtab file is modified, you will have two attributes on the same line. This may prevent your system from booting.

    If you create the file in the cluster root directory, you can use it on every system in the cluster without a need to copy the file.

  2. On each cluster member, use the sysconfigdb -m -f file clubase command to merge the new clubase subsystem attributes from the file that you created with the clubase subsystem attributes in the /etc/sysconfigtab file.

    For example, assume that the file cluster-name-change contains the information shown in the example in step 1. To use the file cluster-name-change to change the cluster name from poach to deli, use the following command:

    # sysconfigdb -m -f cluster-name-change clubase
    Warning: duplicate attribute in clubase: 
    was cluster_name = poach, now cluster_name = deli
     
    

    Caution

    Do not use the sysconfigdb -u command with a file that contains only one or two of the attributes to be changed. The -u flag causes the subsystem entry in the input file to replace the existing subsystem entry (for instance, clubase) in /etc/sysconfigtab. If you specify only the cluster_name attribute for the clubase subsystem, the new clubase entry will contain only the cluster_name attribute and none of the other required attributes.

  3. Change the cluster name in each of the following files:

    There is only one copy of these files in a cluster.

  4. Add the new cluster name to the /.rhosts file (which is common to all cluster members).

    Leave the current cluster name in the file. The current name is needed for the shutdown -c command in the next step to function.

    Change any client .rhosts file as appropriate.

  5. Shut down the entire cluster with the shutdown -c command and reboot each system in the cluster.

  6. Remove the previous cluster name from the /.rhosts file.

  7. To verify that the cluster name has changed, run the /usr/sbin/clu_get_info command:

    # /usr/sbin/clu_get_info
    Cluster information for cluster deli    
      
    .
    .
    .

5.11.2    Changing the Cluster IP Address

To change the cluster IP address, follow these steps:

  1. Edit the /etc/hosts file, and change the IP address for the cluster.

  2. One at a time (to keep quorum), shut down and reboot each cluster member system.

To verify that the cluster IP address has changed, run the /usr/sbin/ping command from a system that is not in the cluster to ensure that the cluster provides the echo response when you use the cluster address:

# /usr/sbin/ping -c 3 16.160.160.160
PING 16.160.160.160 (16.160.160.160): 56 data bytes
64 bytes from 16.160.160.160: icmp_seq=0 ttl=64 time=26 ms
64 bytes from 16.160.160.160: icmp_seq=1 ttl=64 time=0 ms
64 bytes from 16.160.160.160: icmp_seq=2 ttl=64 time=0 ms
 
----16.160.160.160 PING Statistics----
3 packets transmitted, 3 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 0/9/26 ms
 

5.12    Changing the Member Name, IP Address, or Cluster Interconnect Address

To change the member name, member IP address, or cluster interconnect address, remove the member from the cluster and then add it back in with the desired member name or address. Before you do this, make sure that the cluster will not lose quorum and that CAA restricted placement policies are not affected.

5.12.1    Check Quorum Before Removing Member

Before you remove a member, make sure that enough voting members are operating in the cluster so that, in concert with any configured quorum disk vote, the cluster has sufficient votes to survive the deletion of the member. See Section 5.5 for information on shutting down a single cluster member.

5.12.2    Check CAA Restricted Placement Policies Before Removing Member

You must determine whether any CAA profiles use a restricted placement policy, and if so, whether the HOSTING_MEMBERS resource contains only the name of the member system whose name you want to change.

Use the /usr/sbin/caa_profile -print command to display the CAA profiles. If the application PLACEMENT resource is restricted (PLACEMENT=restricted) for an application, and the HOSTING_MEMBERS resource contains only the name of the member whose name or address is going to change, do the following:

  1. Add another member system to the list of members that can run this application by updating the application resource profile. For example, if the HOSTING_MEMBERS resource presently indicates that member provolone is restricted to run an application, add pepicelli to the HOSTING_MEMBERS resource as follows:

    # /usr/sbin/caa_profile -update resource_name -h provolone pepicelli
    

    Note

    Do not remove the name of the system whose name or address is changing.

  2. To prevent inconsistencies across cluster members, update the existing CAA registry entries with the latest resource profile, as follows:

    # /usr/sbin/caa_register resource_name -u
    

  3. Relocate the application to the system that was added to the HOSTING_MEMBERS resource:

    # /usr/sbin/caa_relocate resource_name -c pepicelli
    

5.12.3    Remove and Add the Member

Follow these steps to save the existing license data, remove and add the member, and restore the license data:

  1. Log in to the member system whose member name, member IP address, or cluster interconnect address you want to change.

  2. Use the lmf utility to reconstruct product authorization keys (PAKs) for the products that have been licensed to run on the system. The KornShell script in the following example places all of the reconstructed PAKs in the /licenses directory:

    # mkdir /licenses
    # for i in `lmf list | grep -v Product | awk '{print $1}'`
    do
    lmf issue /licenses/$i.license $i
    done
     
    

  3. Halt the member. If you have not already done so, see Section 5.5 for information on shutting down a single cluster member.

  4. On an active member of the cluster, delete the member that you just shut down. Do this by running the clu_delete_member command:

    # clu_delete_member -m memberid
     
    

    To learn the member ID of the member to be deleted, use the clu_get_info command.

    See Section 5.8 for details on using clu_delete_member.

  5. Use the clu_add_member command to add the system back into the cluster, specifying the desired member name, member IP address, and cluster interconnect address.

    For details on adding a member to the cluster, see the TruCluster Server Cluster Installation manual.

  6. When you boot genvmunix from the member's newly created boot disk, the new member automatically configures subsets and builds a customized kernel, and then continues to boot to multiuser mode. Log in and register the saved licenses, as follows:

    # for i in /licenses/*.license
    do
    lmf register - < $i
    done
    # lmf reset
     
    

  7. Reboot the system so that it is using its customized cluster kernel:

    # shutdown -r now
    

  8. If the placement policy for any application is favored or restricted, and the cluster member system had its name changed and is listed in the HOSTING_MEMBERS resource for that application, remove the old name and add the new name to the resource, as follows:

    1. Modify the HOSTING_MEMBERS resource to remove the old name and add the new name.

    2. Update the existing CAA registry entries with the latest resource profile, as follows:

      # /usr/sbin/caa_register resource_name -u
      

5.13    Managing Software Licenses

When you add a new member to a cluster, you must register application licenses on that member for those applications that may run on that member.
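
For example, if you saved PAK files in a /licenses directory as described in Section 5.12.3, you can register them on the new member and then reset lmf (product-name is a placeholder):

# lmf register - < /licenses/product-name.license
# lmf reset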

For information about adding new cluster members and Tru64 UNIX licenses, see the chapter on adding members in the TruCluster Server Cluster Installation manual.

5.14    Installing and Deleting Layered Applications

The procedure for installing or deleting an application is usually the same on a cluster as on a standalone system. In most cases, you install an application only once for the entire cluster rather than once on each member. However, some applications require additional steps.

5.15    Managing Accounting Services

The system accounting services are not cluster-aware. The services rely on files and databases that are member-specific. Because of this, to use accounting services in a cluster, you must set up and administer the services on a member-by-member basis.

The /usr/sbin/acct directory is a CDSL. The accounting services files in /usr/sbin/acct are specific to each cluster member.

To set up accounting services on a cluster, use the following modifications to the directions in the chapter on administering system accounting services in the Tru64 UNIX System Administration manual:

  1. To enable accounting on all cluster members, enter the following command on any member:

    # rcmgr -c set ACCOUNTING YES
    

    If you want to enable accounting on only certain members, use the -h option to the rcmgr command. For example, to enable accounting on members 2, 3, and 6, enter the following commands:

    # rcmgr -h 2 set ACCOUNTING YES
    # rcmgr -h 3 set ACCOUNTING YES
    # rcmgr -h 6 set ACCOUNTING YES
     
    

  2. You must start accounting on each member. Log in to each member where you want to start accounting, and enter the following command:

    # /usr/sbin/acct/startup
    

    To stop accounting on a member, you must log in to that member and run the command /usr/sbin/acct/shutacct.

The directory /usr/spool/cron is a CDSL; the files in this directory are member-specific, and you can use them to tailor accounting on a per-member basis. To do so, log in to each member where accounting is to run. Use the crontab command to modify the crontab files as desired. For more information, see the chapter on administering the system accounting services in the Tru64 UNIX System Administration manual.

The file /usr/sbin/acct/holidays is a CDSL. Because of this, you set accounting service holidays on a per-member basis.

For more information on accounting services, see acct(8).