|Document revision date: 15 July 2002|
Once your cluster is up and running, you can implement routine, site-specific maintenance operations: for example, backing up disks, adding user accounts, performing software upgrades and installations, running AUTOGEN with the feedback option on a regular basis, and monitoring system performance.
You should also maintain records of current configuration data, especially any changes to hardware or software components. If you are managing a cluster that includes satellite nodes, it is important to monitor LAN activity.
From time to time, conditions may occur that require the following special maintenance operations:
As a part of the regular system management procedure, you should copy operating system files, application software files, and associated files to an alternate device using the OpenVMS Backup utility.
Some backup operations are the same in an OpenVMS Cluster as they are on a single OpenVMS system, for example, an incremental backup of a disk while it is in use, or the backup of a nonshared disk.
Backup tools for use in a cluster include those listed in Table 10-1.
Use from a running system to back up:
Caution: Files open for writing at the time of the backup procedure may not be backed up correctly.
|Menu-driven or +standalone BACKUP||
Use one of the following methods:
Plan to perform the backup process regularly, according to a schedule that is consistent with application and user needs. This may require creative scheduling so that you can coordinate backups with times when user and application system requirements are low.
Reference: See the OpenVMS System Management Utilities Reference Manual: A-L for complete information about the OpenVMS Backup utility.
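As a rough illustration of the incremental backup mentioned above, the following sketch saves only the files changed since the last recorded backup. The device names DUA1: and MUA0:, the save-set name, and the use of /IGNORE=INTERLOCK are assumptions made for illustration; check each qualifier against the Backup utility reference before adapting the command.

```dcl
$! Sketch only: DUA1: (source disk), MUA0: (tape drive), and the
$! save-set name INCR.BCK are hypothetical.
$!
$! Select files created or modified since the last BACKUP/RECORD pass
$! and record the new backup date on each file that is saved.
$! /IGNORE=INTERLOCK lets BACKUP attempt files that are open for
$! writing; such files may still be saved in an inconsistent state.
$ BACKUP/RECORD/IGNORE=INTERLOCK -
        DUA1:[000000...]/SINCE=BACKUP -
        MUA0:INCR.BCK/SAVE_SET
```

Because of the caution about open files, a command like this is best run during the low-activity periods that your backup schedule identifies.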
10.2 Updating the OpenVMS Operating System
When updating the OpenVMS operating system, follow the steps in Table 10-2.
|1||Back up the system disk.|
|2||Perform the update procedure once for each system disk.|
|3||Install any mandatory updates.|
|4||Run AUTOGEN on each node that boots from that system disk.|
|5||Run the user environment test package (UETP) to test the installation.|
|6||Use the OpenVMS Backup utility to make a copy of the new system volume.|
Reference: See the appropriate OpenVMS upgrade and
installation manual for complete instructions.
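Step 4 might look like the following sketch. The start phase, end phase, and execution mode shown are one possible choice, not a requirement of the procedure; feedback data is generally meaningful only after the node has run under a typical load for some time, so a different mode may suit a freshly upgraded system better.

```dcl
$! Sketch only: run on each node that boots from the updated system disk.
$! SAVPARAMS is the start phase, SETPARAMS is the end phase, and
$! FEEDBACK is the execution mode.
$ @SYS$UPDATE:AUTOGEN SAVPARAMS SETPARAMS FEEDBACK
$!
$! Review the report that AUTOGEN produced before rebooting the node.
$ TYPE SYS$SYSTEM:AGEN$PARAMS.REPORT
```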
10.2.1 Rolling Upgrades
The OpenVMS operating system allows an OpenVMS Cluster system running on multiple system disks to continue to provide service while the system software is being upgraded. This process is called a rolling upgrade because each node is upgraded and rebooted in turn, until all the nodes have been upgraded.
If you must first migrate your system from running on one system disk to running on two or more system disks, follow these steps:
|1||Follow the procedures in Section 8.5 to create a duplicate disk.|
|2||Follow the instructions in Section 5.10 to coordinate system files.|
These sections help you add a system disk and prepare a common user
environment on multiple system disks to make the shared system files
such as the queue database, rightslists, proxies, mail, and other files
available across the OpenVMS Cluster system.
10.3 LAN Network Failure Analysis
The OpenVMS operating system provides a sample program to help you analyze OpenVMS Cluster network failures on the LAN. You can edit and use the SYS$EXAMPLES:LAVC$FAILURE_ANALYSIS.MAR program to detect and isolate failed network components. Using the network failure analysis program can help reduce the time required to detect and isolate a failed network component, thereby providing a significant increase in cluster availability.
Reference: For a description of the network failure
analysis program, refer to Appendix D.
10.4 Recording Configuration Data
To maintain an OpenVMS Cluster system effectively, you must keep accurate records about the current status of all hardware and software components and about any changes made to those components. Changes to cluster components can have a significant effect on the operation of the entire cluster. If a failure occurs, you may need to consult your records to aid problem diagnosis.
Maintaining current records for your configuration is necessary both
for routine operations and for eventual troubleshooting activities.
10.4.1 Record Information
At a minimum, your configuration records should include the following information:
The first time you execute CLUSTER_CONFIG.COM to add a satellite, the procedure creates the file NETNODE_UPDATE.COM in the boot server's SYS$SPECIFIC:[SYSMGR] directory. (For a common-environment cluster, you must rename this file to the SYS$COMMON:[SYSMGR] directory, as described in Section 5.10.2.) This file, which is updated each time you add or remove a satellite or change its Ethernet or FDDI hardware address, contains all essential network configuration data for the satellite.
If an unexpected condition at your site causes configuration data to be lost, you can use NETNODE_UPDATE.COM to restore it. You can also read the file when you need to obtain data about individual satellites. Note that you may want to edit the file occasionally to remove obsolete entries.
Example 10-1 shows the contents of the file after satellites EUROPA and GANYMD have been added to the cluster.
|Example 10-1 Sample NETNODE_UPDATE.COM File|
$ RUN SYS$SYSTEM:NCP
    define node EUROPA address 2.21
    define node EUROPA hardware address 08-00-2B-03-51-75
    define node EUROPA load assist agent sys$share:niscs_laa.exe
    define node EUROPA load assist parameter $1$DJA11:<SYS10.>
    define node EUROPA tertiary loader sys$system:tertiary_vmb.exe
    define node GANYMD address 2.22
    define node GANYMD hardware address 08-00-2B-03-58-14
    define node GANYMD load assist agent sys$share:niscs_laa.exe
    define node GANYMD load assist parameter $1$DJA11:<SYS11.>
    define node GANYMD tertiary loader sys$system:tertiary_vmb.exe
Reference: See the DECnet-Plus documentation for equivalent NCL command information.
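Restoring lost satellite definitions from the file could be done along the following lines. This is a sketch for a DECnet Phase IV (NCP) configuration; it assumes the file is reachable through SYS$MANAGER:, which searches both SYS$SPECIFIC:[SYSMGR] and SYS$COMMON:[SYSMGR], and it uses the satellite EUROPA from Example 10-1 for the check.

```dcl
$! Sketch only: assumes NETNODE_UPDATE.COM is intact and reachable
$! through SYS$MANAGER:.
$!
$! Re-enter the saved DEFINE NODE commands into the permanent database.
$ @SYS$MANAGER:NETNODE_UPDATE.COM
$!
$! Confirm that the entry for one satellite has been restored.
$ MCR NCP LIST NODE EUROPA CHARACTERISTICS
```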
10.5 Cross-Architecture Satellite Booting
Cross-architecture satellite booting permits VAX boot nodes to provide boot service to Alpha satellites and Alpha boot nodes to provide boot service to VAX satellites. For some OpenVMS Cluster configurations, cross-architecture boot support can simplify day-to-day system operation and reduce the complexity of managing OpenVMS Cluster systems that include both VAX and Alpha systems.
Note: Compaq will continue to provide cross-architecture boot support while it is technically feasible. This support may be removed in future releases of the OpenVMS operating system.
10.5.1 Sample Configurations
The sample configurations that follow show how you might configure an OpenVMS Cluster to include both Alpha and VAX boot nodes and satellite nodes. Note that each architecture must include a system disk that is used for installations and upgrades.
Caution: The OpenVMS operating system and layered product installations and upgrades cannot be performed across architectures. For example, OpenVMS Alpha software installations and upgrades must be performed using an Alpha system. When configuring OpenVMS Cluster systems that use the cross-architecture booting feature, configure at least one system of each architecture with a disk that can be used for installations and upgrades. In the configurations shown in Figure 10-1 and Figure 10-2, one of the workstations has been configured with a local disk for this purpose.
In Figure 10-1, several Alpha workstations have been added to an existing VAXcluster configuration that contains two VAX boot nodes based on the DSSI interconnect and several VAX workstations. For high availability, the Alpha system disk is located on the DSSI for access by multiple boot servers.
Figure 10-1 VAX Nodes Boot Alpha Satellites
In Figure 10-2, the configuration originally consisted of a VAX boot node and several VAX workstations. The VAX boot node has been replaced with a new, high-performance Alpha boot node. Some Alpha workstations have also been added. The original VAX workstations remain in the configuration and still require boot service. The new Alpha boot node can perform this service.
Figure 10-2 Alpha and VAX Nodes Boot Alpha and VAX Satellites
Consider the following guidelines when using the cross-architecture booting feature:
|alpha_system_disk or vax_system_disk||The appropriate disk name on the server|
|label||The appropriate label name for the disk on the server|
|ccc-n||The server circuit name|
|alpha or vax||The DECnet node name of the satellite|
|xx.yyyy||The DECnet area.address of the satellite|
|aa-bb-cc-dd-ee-ff||The hardware address of the LAN adapter on the satellite over which the satellite is to be loaded|
|satellite_root||The root on the system disk (for example, SYS10) of the satellite|
Example 10-2 shows how to set up a VAX system to serve a locally mounted Alpha system disk.
|Example 10-2 Defining an Alpha Satellite in a VAX Boot Node|
$! VAX system to load Alpha satellite
$!
$! On the VAX system:
$! -----------------
$!
$! Mount the system disk for MOP server access.
$!
$ MOUNT /SYSTEM alpha_system_disk: label ALPHA$SYSD
$!
$! Enable MOP service for this server.
$!
$ MCR NCP
NCP> DEFINE CIRCUIT ccc-n SERVICE ENABLED STATE ON
NCP> SET CIRCUIT ccc-n STATE OFF
NCP> SET CIRCUIT ccc-n ALL
NCP> EXIT
$!
$! Configure MOP service for the ALPHA satellite.
$!
$ MCR NCP
NCP> DEFINE NODE alpha ADDRESS xx.yyyy
NCP> DEFINE NODE alpha HARDWARE ADDRESS aa-bb-cc-dd-ee-ff
NCP> DEFINE NODE alpha LOAD ASSIST AGENT SYS$SHARE:NISCS_LAA.EXE
NCP> DEFINE NODE alpha LOAD ASSIST PARAMETER ALPHA$SYSD:[satellite_root.]
NCP> DEFINE NODE alpha LOAD FILE APB.EXE
NCP> SET NODE alpha ALL
NCP> EXIT
Example 10-3 shows how to set up an Alpha system to serve a locally mounted VAX system disk.
|Example 10-3 Defining a VAX Satellite in an Alpha Boot Node|
$! Alpha system to load VAX satellite
$!
$! On the Alpha system:
$! --------------------
$!
$! Mount the system disk for MOP server access.
$!
$ MOUNT /SYSTEM vax_system_disk: label VAX$SYSD
$!
$! Enable MOP service for this server.
$!
$ MCR NCP
NCP> DEFINE CIRCUIT ccc-n SERVICE ENABLED STATE ON
NCP> SET CIRCUIT ccc-n STATE OFF
NCP> SET CIRCUIT ccc-n ALL
NCP> EXIT
$!
$! Configure MOP service for the VAX satellite.
$!
$ MCR NCP
NCP> DEFINE NODE vax ADDRESS xx.yyyy
NCP> DEFINE NODE vax HARDWARE ADDRESS aa-bb-cc-dd-ee-ff
NCP> DEFINE NODE vax TERTIARY LOADER SYS$SYSTEM:TERTIARY_VMB.EXE
NCP> DEFINE NODE vax LOAD ASSIST AGENT SYS$SHARE:NISCS_LAA.EXE
NCP> DEFINE NODE vax LOAD ASSIST PARAMETER VAX$SYSD:[satellite_root.]
NCP> SET NODE vax ALL
NCP> EXIT
Then, to boot the satellite, perform these steps:
Table 10-3 shows how to define the following system logical names in the command procedure SYS$MANAGER:SYLOGICALS.COM to override the OPCOM default states.
|System Logical Name||Function|
|OPC$OPA0_ENABLE||If defined to be true, OPA0: is enabled as an operator console. If defined to be false, OPA0: is not enabled as an operator console. DCL considers any string beginning with T or Y, or any odd integer, to be true; all other values are false.|
|OPC$OPA0_CLASSES||Defines the operator classes to be enabled on OPA0:. The logical name can be a search list of the allowed classes, a list of classes, or a combination of the two. For example:
$ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,DISKS,TAPES
You can define OPC$OPA0_CLASSES even if OPC$OPA0_ENABLE is not defined. In this case, the classes are used for any operator consoles that are enabled, but the default is used to determine whether to enable the operator console.
|OPC$LOGFILE_ENABLE||If defined to be true, an operator log file is opened. If defined to be false, no log file is opened.|
|OPC$LOGFILE_CLASSES||Defines the operator classes to be enabled for the log file. The logical name can be a search list of the allowed classes, a comma-separated list, or a combination of the two. You can define this system logical even when the OPC$LOGFILE_ENABLE system logical is not defined. In this case, the classes are used for any log files that are open, but the default is used to determine whether to open the log file.|
|OPC$LOGFILE_NAME||Supplies information that is used in conjunction with the default name SYS$MANAGER:OPERATOR.LOG to define the name of the log file. If the log file is directed to a disk other than the system disk, you should include commands to mount that disk in the SYLOGICALS.COM command procedure.|
The following example shows how to use the OPC$OPA0_CLASSES system logical to define the operator classes to be enabled. Because SECURITY is omitted from the list, the command prevents SECURITY class messages from being displayed on OPA0:.
$ DEFINE/SYSTEM OPC$OPA0_CLASSES CENTRAL,PRINTER,TAPES,DISKS,DEVICES, -
_$ CARDS,NETWORK,CLUSTER,LICENSE,OPER1,OPER2,OPER3,OPER4,OPER5, -
_$ OPER6,OPER7,OPER8,OPER9,OPER10,OPER11,OPER12
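The other logical names in Table 10-3 can be defined in the same way. The fragment below is a minimal sketch of lines that might be added to SYS$MANAGER:SYLOGICALS.COM; the particular values and the class list are illustrative assumptions, not recommended settings.

```dcl
$! Sketch for SYS$MANAGER:SYLOGICALS.COM. The values and the class
$! list are hypothetical; choose the ones your site needs.
$!
$! Keep OPA0: enabled as an operator console.
$ DEFINE/SYSTEM OPC$OPA0_ENABLE TRUE
$!
$! Open an operator log file and record only selected classes in it.
$ DEFINE/SYSTEM OPC$LOGFILE_ENABLE TRUE
$ DEFINE/SYSTEM OPC$LOGFILE_CLASSES CENTRAL,CLUSTER,DISKS,SECURITY
```

Defining the class list with commas creates a search list, which is one of the forms the table allows for the class logical names.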
In large clusters, state transitions (computers joining or leaving the
cluster) generate many multiline OPCOM messages on a boot server's
console device. You can avoid such messages by including the DCL
command REPLY/DISABLE=CLUSTER in the appropriate site-specific startup
command file or by entering the command interactively from the system
10.7 Shutting Down a Cluster
The SHUTDOWN command of the SYSMAN utility provides five options for shutting down OpenVMS Cluster computers: