OpenVMS Cluster Systems

Document revision date: 15 July 2002

8.6.8 Rebooting Satellites Configured with OpenVMS on a Local Disk

Satellite nodes can be set up to reboot automatically when recovering from system failures or power failures.

Reboot behavior varies from system to system. Many systems provide a console variable that allows you to specify which device to boot from by default. However, some systems have predefined boot "sniffers" that automatically detect a bootable device. The following table describes the rebooting conditions.

IF your system does not allow you to specify the boot device for automatic reboot (that is, it has a boot sniffer)

AND an operating system is installed on the system's local disk

THEN that disk will be booted in preference to requesting a satellite MOP load. To avoid this, you should take one of the measures in the following list before allowing any operation that causes an automatic reboot (for example, executing SYS$SYSTEM:SHUTDOWN.COM with the REBOOT option or using CLUSTER_CONFIG.COM to add that satellite to the cluster):

Rename the directory file ddcu:[000000]SYS0.DIR on the local disk to ddcu:[000000]SYSx.DIR (where SYSx is a root other than SYS0, SYSE, or SYSF). Then enter the DCL command SET FILE/REMOVE as follows to remove the old directory entry for the boot image SYSBOOT.EXE:
    $ RENAME DUA0:[000000]SYS0.DIR DUA0:[000000]SYS1.DIR
    $ SET FILE/REMOVE DUA0:[SYSEXE]SYSBOOT.EXE

    +On VAX systems, for subsequent reboots of VAX computers from the local disk, enter a command in the format B/x0000000 at the console-mode prompt (>>>). For example:
    >>> B/10000000

Disable the local disk. For instructions, refer to your computer-specific installation and operations guide. Note that this option is not available if the satellite's local disk is being used for paging and swapping.

+VAX specific

8.7 Running AUTOGEN with Feedback

AUTOGEN includes a mechanism called feedback. This mechanism examines data collected during normal system operations, and it adjusts system parameters on the basis of the collected data whenever you run AUTOGEN with the feedback option. For example, the system records each instance of a disk server waiting for buffer space to process a disk request. Based on this information, AUTOGEN can size the disk server's buffer pool automatically to ensure that sufficient space is allocated.

Execute SYS$UPDATE:AUTOGEN.COM manually as described in the OpenVMS System Manager's Manual.

8.7.1 Advantages

To ensure that computers are configured adequately when they first join the cluster, you can run AUTOGEN with feedback automatically as part of the initial boot sequence. Although this step requires an additional reboot before the computer can be used, it can substantially improve the computer's performance.

Compaq strongly recommends that you use the feedback option. Without feedback, it is difficult for AUTOGEN to anticipate patterns of resource usage, particularly in complex configurations. Factors such as the number of computers and disks in the cluster and the types of applications being run require adjustment of system parameters for optimal performance.

Compaq also recommends using AUTOGEN with feedback rather than the SYSGEN utility to modify system parameters, because AUTOGEN:

Uses parameter changes in MODPARAMS.DAT and AGEN$ files. (Changes recorded in MODPARAMS.DAT are not lost during updates to the OpenVMS operating system.)
Reconfigures other system parameters to reflect changes.

8.7.2 Initial Values

When a computer is first added to an OpenVMS Cluster, system parameters that control the computer's system resources are normally adjusted in several steps, as follows:

The cluster configuration command procedure (CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM) sets initial parameters that are adequate to boot the computer in a minimum environment.
When the computer boots, AUTOGEN runs automatically to size the static operating system (without using any dynamic feedback data), and the computer reboots into the OpenVMS Cluster environment.
After the newly added computer has been subjected to typical use for a day or more, you should run AUTOGEN with feedback manually to adjust parameters for the OpenVMS Cluster environment.
At regular intervals, and whenever a major change occurs in the cluster configuration or production environment, you should run AUTOGEN with feedback manually to readjust parameters for the changes.

Because the first AUTOGEN operation (initiated by either CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM) is performed both in the minimum environment and without feedback, a newly added computer may be inadequately configured to run in the OpenVMS Cluster environment. For this reason, you might want to implement additional configuration measures like those described in Section 8.7.3 and Section 8.7.4.

8.7.3 Obtaining Reasonable Feedback

When a computer first boots into an OpenVMS Cluster, much of the computer's resource utilization is determined by the current OpenVMS Cluster configuration. Factors such as the number of computers, the number of disk servers, and the number of disks available or mounted contribute to a fixed minimum resource requirement. Because this minimum does not change with continued use of the computer, feedback information about the required resources is immediately valid.

Other feedback information, however, such as that influenced by normal user activity, is not immediately available, because the only "user" has been the system startup process. If AUTOGEN were run with feedback at this point, some system values might be set too low.

By running a simulated user load at the end of the first production boot, you can ensure that AUTOGEN has reasonable feedback information. The User Environment Test Package (UETP) supplied with your operating system contains a test that simulates such a load. You can run this test (the UETP LOAD phase) as part of the initial production boot, and then run AUTOGEN with feedback before a user is allowed to log in.

To implement this technique, you can create a command file like that in step 1 of the procedure in Section 8.7.4, and submit the file to the computer's local batch queue from the cluster common SYSTARTUP procedure. Your command file conditionally runs the UETP LOAD phase and then reboots the computer with AUTOGEN feedback.

8.7.4 Creating a Command File to Run AUTOGEN

As shown in the following sample file, UETP lets you specify a typical user load to be run on the computer when it first joins the cluster. The UETP run generates data that AUTOGEN uses to set appropriate system parameter values for the computer when rebooting it with feedback. Note, however, that the default setting for the UETP user load assumes that the computer is used as a timesharing system. This calculation can produce system parameter values that might be excessive for a single-user workstation, especially if the workstation has large memory resources. Therefore, you might want to modify the default user load setting, as shown in the sample file.

Follow these steps:

1. Create a command file like the following:

$!
$!    ***** SYS$COMMON:[SYSMGR]UETP_AUTOGEN.COM *****
$!
$! For initial boot only, run UETP LOAD phase and
$! reboot with AUTOGEN feedback.
$!
$ SET NOON
$ SET PROCESS/PRIVILEGES=ALL
$!
$! Run UETP to simulate a user load for a satellite
$! with 8 simultaneously active user processes. For a
$! CI connected computer, allow UETP to calculate the load.
$!
$ LOADS = "8"
$ IF F$GETDVI("PAA0:","EXISTS") THEN LOADS = ""
$ @UETP LOAD 1 'loads'
$!
$! Create a marker file to prevent resubmission of
$! UETP_AUTOGEN.COM at subsequent reboots.
$!
$ CREATE SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE
$!
$! Reboot with AUTOGEN to set SYSGEN values.
$!
$ @SYS$UPDATE:AUTOGEN SAVPARAMS REBOOT FEEDBACK
$!
$ EXIT

2. Edit the cluster common SYSTARTUP file and add the following commands at the end of the file. Assume that queues have been started and that a batch queue is running on the newly added computer. Submit UETP_AUTOGEN.COM to the computer's local batch queue.

$!
$ NODE = F$GETSYI("NODE")
$ IF F$SEARCH ("SYS$SPECIFIC:[SYSMGR]UETP_AUTOGEN.DONE") .EQS. ""
$ THEN
$ SUBMIT /NOPRINT /NOTIFY /USERNAME=SYSTEST -
_$ /QUEUE='NODE'_BATCH SYS$MANAGER:UETP_AUTOGEN
$ WAIT_FOR_UETP:
$ WRITE SYS$OUTPUT "Waiting for UETP and AUTOGEN... ''F$TIME()'"
$ WAIT 00:05:00.00 ! Wait 5 minutes
$ GOTO WAIT_FOR_UETP
$ ENDIF
$!

Note: UETP must be run under the user name SYSTEST.

3. Execute CLUSTER_CONFIG_LAN.COM or CLUSTER_CONFIG.COM to add the computer.

When you boot the computer, it runs UETP_AUTOGEN.COM to simulate the user load you have specified, and it then reboots with AUTOGEN feedback to set appropriate system parameter values.
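The control flow shared by the two command files above (check for a marker file, run the load, create the marker, then reboot with feedback) can be sketched outside DCL. The following Python fragment is only an illustration of that run-once pattern; the function name first_boot_tuning and its callback parameters are hypothetical and are not part of OpenVMS, UETP, or AUTOGEN.

```python
from pathlib import Path


def first_boot_tuning(marker, run_load, reboot_with_feedback):
    """Run the simulated load and the feedback reboot only on the first boot.

    marker               -- path of the marker file (the .DONE file in the text)
    run_load             -- hypothetical callback standing in for the UETP LOAD phase
    reboot_with_feedback -- hypothetical callback standing in for the AUTOGEN reboot

    Returns True if the tuning pass ran, False if the marker already existed.
    """
    marker = Path(marker)
    if marker.exists():
        # A previous boot already completed the load phase; do nothing.
        return False
    run_load()
    # Create the marker before rebooting so the next startup skips this step.
    marker.touch()
    reboot_with_feedback()
    return True
```

As in the command file, the marker is created after the load phase but before the reboot, so a reboot triggered by the final step cannot cause the procedure to be submitted again.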

Chapter 9
Building Large OpenVMS Cluster Systems

This chapter provides guidelines for building OpenVMS Cluster systems that include many computers---approximately 20 or more---and describes procedures that you might find helpful. (Refer to the OpenVMS Cluster Software Software Product Description (SPD) for configuration limitations.) Typically, such OpenVMS Cluster systems include a large number of satellites.

Note that the recommendations in this chapter also can prove beneficial in some clusters with fewer than 20 computers. Areas of discussion include:

Booting
Availability of MOP and disk servers
Multiple system disks
Shared resource availability
Hot system files
System disk space
System parameters
Network problems
Cluster alias

9.1 Setting Up the Cluster

When building a new large cluster, you must be prepared to run AUTOGEN and reboot the cluster several times during the installation. The parameters that AUTOGEN sets for the first computers added to the cluster will probably be inadequate when additional computers are added. Readjustment of parameters is critical for boot and disk servers.

One solution to this problem is to run the UETP_AUTOGEN.COM command procedure (described in Section 8.7.4) to reboot computers at regular intervals as new computers or storage interconnects are added. For example, each time there is a 10% increase in the number of computers, storage, or interconnects, you should run UETP_AUTOGEN.COM. For best results, the last time you run the procedure should be as close as possible to the final OpenVMS Cluster environment.
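The 10% guideline above amounts to a simple threshold check. The following Python sketch shows one way to express it; the function name needs_retuning and its parameters are hypothetical, and applying the rule to a single count (computers, storage devices, or interconnects) at a time is an assumption.

```python
def needs_retuning(count_at_last_run, current_count, percent=10):
    """Return True when a count has grown by at least `percent` since the last run.

    count_at_last_run -- number of computers (or disks, or interconnects)
                         when UETP_AUTOGEN.COM was last run
    current_count     -- the same count now
    """
    if count_at_last_run <= 0:
        raise ValueError("count_at_last_run must be positive")
    # Integer arithmetic avoids floating-point rounding at the exact threshold.
    return current_count * 100 >= count_at_last_run * (100 + percent)
```

For example, a cluster that was tuned at 20 computers would qualify for another run once it reaches 22.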

To set up a new, large OpenVMS Cluster, follow these steps:

Step	Task
1	Configure boot and disk servers using the CLUSTER_CONFIG_LAN.COM or the CLUSTER_CONFIG.COM command procedure (described in Chapter 8).
2	Install all layered products and site-specific applications required for the OpenVMS Cluster environment, or as many as possible.
3	Prepare the cluster startup procedures so that they are as close as possible to those that will be used in the final OpenVMS Cluster environment.
4	Add a small number of satellites (perhaps two or three) using the cluster configuration command procedure.
5	Reboot the cluster to verify that the startup procedures work as expected.
6	After you have verified that startup procedures work, run UETP_AUTOGEN.COM on every computer's local batch queue to reboot the cluster again and to set initial production environment values. When the cluster has rebooted, all computers should have reasonable parameter settings. However, check the settings to be sure.
7	Add additional satellites to double their number. Then rerun UETP_AUTOGEN on each computer's local batch queue to reboot the cluster, and set values appropriately to accommodate the newly added satellites.
8	Repeat the previous step until all satellites have been added.
9	When all satellites have been added, run UETP_AUTOGEN a final time on each computer's local batch queue to reboot the cluster and to set new values for the production environment.

For best performance, do not run UETP_AUTOGEN on every computer simultaneously, because the procedure simulates a user load that is probably more demanding than that for the final production environment. A better method is to run UETP_AUTOGEN on several satellites (those with the least recently adjusted parameters) while adding new computers. This technique increases efficiency because little is gained when a satellite reruns AUTOGEN shortly after joining the cluster.

For example, if the entire cluster is rebooted after 30 satellites have been added, few adjustments are made to system parameter values for the 28th satellite added, because only two satellites have joined the cluster since that satellite ran UETP_AUTOGEN as part of its initial configuration.
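The doubling pattern in steps 4 through 8 determines how many tuning rounds a cluster of a given size needs. The following Python sketch lists the cumulative satellite count after each round; the function name doubling_schedule is hypothetical, and the default starting count of 3 is just one reading of "perhaps two or three" in step 4.

```python
def doubling_schedule(total_satellites, initial=3):
    """Return the cumulative satellite count after each round of additions.

    Each entry corresponds to one point at which UETP_AUTOGEN would be run.
    """
    if total_satellites < 1 or initial < 1:
        raise ValueError("counts must be positive")
    current = min(initial, total_satellites)
    counts = [current]
    while current < total_satellites:
        # Double the number of satellites, but never exceed the final total.
        current = min(current * 2, total_satellites)
        counts.append(current)
    return counts


if __name__ == "__main__":
    print(doubling_schedule(96))
```

Under these assumptions, a cluster of 96 satellites is reached in six rounds (3, 6, 12, 24, 48, 96), which is why the procedure calls for several reboots rather than one per satellite.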

9.2 General Booting Considerations

Two general booting considerations, concurrent booting and minimizing boot time, are described in this section.

9.2.1 Concurrent Booting

One of the rare times when all OpenVMS Cluster computers are simultaneously active is during a cluster reboot---for example, after a power failure. All satellites are waiting to reload the operating system, and as soon as a boot server is available, they begin to boot in parallel. This booting activity places a significant I/O load on the system disk or disks, interconnects, and boot servers.

For example, Table 9-1 shows a VAX system disk's I/O activity and elapsed time until login for a single satellite with minimal startup procedures when the satellite is the only one booting. Table 9-2 shows system disk I/O activity and time elapsed between boot server response and login for various numbers of satellites booting from a single system disk. The disk in these examples has a capacity of 40 I/O operations per second.

Note that the numbers in the tables are fabricated and are meant to provide only a generalized picture of booting activity. Elapsed time until login on satellites in any particular cluster depends on the complexity of the site-specific system startup procedures. Computers in clusters with many layered products or site-specific applications require more system disk I/O operations to complete booting operations.

**Table 9-1 Sample System Disk I/O Activity and Boot Time for a Single VAX Satellite**
Total I/O Requests to System Disk	Average System Disk I/O Operations per Second	Elapsed Time Until Login (minutes)
4200	6	12

**Table 9-2 Sample System Disk I/O Activity and Boot Times for Multiple VAX Satellites**
Number of Satellites	I/Os Requested per Second	I/Os Serviced per Second	Elapsed Time Until Login (minutes)
1	6	6	12
2	12	12	12
4	24	24	12
6	36	36	12
8	48	40	14
12	72	40	21
16	96	40	28
24	144	40	42
32	192	40	56
48	288	40	84
64	384	40	112
96	576	40	168

While the elapsed times shown in Table 9-2 do not include the time required for the boot server itself to reload, they illustrate that the I/O capacity of a single system disk can be the limiting factor for cluster reboot time.
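The figures in Table 9-1 and Table 9-2 are consistent with a simple saturation model: each satellite issues about 4200 I/O requests at 6 per second, and once the combined request rate exceeds the disk's 40 I/Os per second, elapsed time is the total work divided by that capacity. The Python sketch below reproduces the table rows under that assumption. The function name boot_estimate and the assumption that every satellite generates an identical load are illustrative only.

```python
# Constants taken from Table 9-1 and the surrounding text.
IOS_PER_BOOT = 4200          # total system disk I/O requests per satellite boot
IO_RATE_PER_SATELLITE = 6    # I/Os per second requested by one booting satellite
DISK_CAPACITY = 40           # I/Os per second the system disk can service
UNLOADED_BOOT_MINUTES = 12   # elapsed time until login with no contention


def boot_estimate(satellites):
    """Return (I/Os requested per second, I/Os serviced per second, minutes until login)."""
    requested = satellites * IO_RATE_PER_SATELLITE
    serviced = min(requested, DISK_CAPACITY)
    # Once the disk saturates, elapsed time is total work divided by capacity.
    saturated_minutes = satellites * IOS_PER_BOOT / DISK_CAPACITY / 60
    minutes = max(UNLOADED_BOOT_MINUTES, saturated_minutes)
    return requested, serviced, round(minutes)


if __name__ == "__main__":
    for n in (1, 6, 8, 32, 96):
        print(n, boot_estimate(n))
```

In this model the disk saturates between six and eight satellites, and each additional satellite beyond that point adds 1.75 minutes (4200 requests at 40 per second) to the time until login.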

9.2.2 Minimizing Boot Time

A large cluster needs to be carefully configured so that there is sufficient capacity to boot the desired number of nodes in the desired amount of time. As shown in Table 9-2, the effect of 96 satellites rebooting could induce an I/O bottleneck that can stretch the OpenVMS Cluster reboot times into hours. The following list provides a few methods to minimize boot times.

Careful configuration techniques: Guidelines for OpenVMS Cluster Configurations contains data on configurations and the capacity of the computers, system disks, and interconnects involved.

Adequate system disk throughput: Achieving enough system disk throughput typically requires a combination of techniques. Refer to Section 9.5 for complete information.

Sufficient network bandwidth: A single Ethernet is unlikely to have sufficient bandwidth to meet the needs of a large OpenVMS Cluster. Likewise, a single Ethernet adapter may become a bottleneck, especially for a disk server. Sufficient network bandwidth can be provided using some of the techniques listed in step 1 of Table 9-3.

Installation of only the required layered products and devices.
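To relate system disk throughput to a target reboot time, the sample figures from Section 9.2.1 can be turned into a rough sizing estimate. The Python sketch below is a back-of-the-envelope calculation, not a configuration rule: the function name system_disks_needed is hypothetical, and it assumes that satellites are spread evenly across identical system disks and that the disks, rather than the LAN or the servers, are the only bottleneck.

```python
import math

# Sample figures from Table 9-1 and Section 9.2.1.
IOS_PER_BOOT = 4200          # I/O requests per satellite boot
DISK_CAPACITY = 40           # I/Os per second per system disk
UNLOADED_BOOT_MINUTES = 12   # a single satellite cannot boot faster than this


def system_disks_needed(satellites, target_minutes):
    """Estimate how many system disks keep a full reboot within target_minutes."""
    if target_minutes < UNLOADED_BOOT_MINUTES:
        raise ValueError("target is shorter than an uncontended boot")
    total_ios = satellites * IOS_PER_BOOT
    ios_per_disk_in_window = DISK_CAPACITY * target_minutes * 60
    return max(1, math.ceil(total_ios / ios_per_disk_in_window))


if __name__ == "__main__":
    print(system_disks_needed(96, 30))
```

With the sample figures, booting 96 satellites within 30 minutes would call for about six such disks, compared with 168 minutes on a single disk.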

9.3 Booting Satellites

OpenVMS Cluster satellite nodes use a single LAN adapter for the initial stages of booting. If a satellite is configured with multiple LAN adapters, the system manager can specify with the console BOOT command which adapter to use for the initial stages of booting. Once the system is running, the OpenVMS Cluster uses all available LAN adapters. This flexibility allows you to work around broken adapters or network problems.

The procedures and utilities for configuring and booting satellite nodes are the same or vary only slightly between Alpha and VAX systems. These are described in Section 9.4.

In addition, VAX nodes can MOP load Alpha satellites, and Alpha nodes can MOP load VAX satellites. Cross-architecture booting is described in Section 10.5.

9.4 Configuring and Booting Satellite Nodes

Complete the items in Table 9-3 before proceeding with satellite booting.

Table 9-3 Checklist for Satellite Booting
Step Action

1 Configure disk server LAN adapters.
Because disk-serving activity in an OpenVMS Cluster system can generate a substantial amount of I/O traffic on the LAN, boot and disk servers should use the highest-bandwidth LAN adapters in the cluster. The servers can also use multiple LAN adapters in a single system to distribute the load across the LAN adapters.
The following list suggests ways to provide sufficient network bandwidth:

Select network adapters with sufficient bandwidth.
Use switches to segregate traffic and to provide increased total bandwidth.
Use multiple LAN adapters on MOP and disk servers.
Use switches or higher-speed LANs, fanning out to slower LAN segments.
Use multiple independent networks.
Provide sufficient MOP and disk server CPU capacity by selecting a computer with sufficient power and by configuring multiple server nodes to share the load.

2 If the MOP server node and system-disk server node (Alpha or VAX) are not already configured as cluster members, follow the directions in Section 8.4 for using the cluster configuration command procedure to configure each of the VAX or Alpha nodes. Include multiple boot and disk servers to enhance availability and distribute I/O traffic over several cluster nodes.

3 Configure additional memory for disk serving.

4 Run the cluster configuration procedure on the Alpha or VAX node for each satellite you want to boot into the OpenVMS Cluster.
