OpenVMS Cluster Systems

Document revision date: 15 July 2002

10.7.1 The NONE Option

If you select the default SHUTDOWN option NONE, the shutdown procedure performs the normal operations for shutting down a standalone computer. If you want to shut down a computer that you expect will rejoin the cluster shortly, you can specify the default option NONE. In that case, cluster quorum is not adjusted because the operating system assumes that the computer will soon rejoin the cluster.

In response to the "Shutdown options [NONE]:" prompt, you can specify the DISABLE_AUTOSTART=n option, where n is the number of minutes before autostart queues are disabled in the shutdown sequence. For more information about this option, see Section 7.13.

10.7.2 The REMOVE_NODE Option

If you want to shut down a computer that you expect will not rejoin the cluster for an extended period, use the REMOVE_NODE option. For example, a computer may be waiting for new hardware, or you may decide that you want to use a computer for standalone operation indefinitely.

When you use the REMOVE_NODE option, the active quorum in the remainder of the cluster is adjusted downward to reflect the fact that the removed computer's votes no longer contribute to the quorum value. The shutdown procedure readjusts the quorum by issuing the SET CLUSTER/EXPECTED_VOTES command, which is subject to the usual constraints described in Section 10.12.

Note: The system manager is still responsible for changing the EXPECTED_VOTES system parameter on the remaining OpenVMS Cluster computers to reflect the new configuration.
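The effect of removing a computer's votes can be sketched with a little arithmetic. The sketch below assumes the standard OpenVMS Cluster quorum calculation, (EXPECTED_VOTES + 2) / 2 with the result truncated, which is not restated in this section. The four-computer cluster with one vote each is a hypothetical example, and the sketch ignores the constraints that Section 10.12 describes.

```python
def quorum(expected_votes: int) -> int:
    """Quorum derived from EXPECTED_VOTES: (EXPECTED_VOTES + 2) / 2, truncated."""
    return (expected_votes + 2) // 2

# Hypothetical cluster: four computers with one vote each.
before = quorum(4)

# One computer is shut down with REMOVE_NODE, so its single vote
# no longer contributes to the quorum value.
after = quorum(4 - 1)

print(f"quorum before removal: {before}")
print(f"quorum after removal:  {after}")
```

In this example the quorum drops from 3 to 2, so the three remaining computers can tolerate the loss of one more vote.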

10.7.3 The CLUSTER_SHUTDOWN Option

When you choose the CLUSTER_SHUTDOWN option, the computer completes all shutdown activities up to the point where the computer would leave the cluster in a normal shutdown situation. At this point the computer waits until all other nodes in the cluster have reached the same point. When all nodes have completed their shutdown activities, the entire cluster dissolves in one synchronized operation. The advantage of this is that individual nodes do not complete shutdown independently, and thus do not trigger state transitions or potentially leave the cluster without quorum.

When performing a CLUSTER_SHUTDOWN, you must specify this option on every OpenVMS Cluster computer. If any computer is not included, clusterwide shutdown cannot occur.

10.7.4 The REBOOT_CHECK Option

When you choose the REBOOT_CHECK option, the shutdown procedure checks for the existence of basic system files that are needed to reboot the computer successfully and notifies you if any files are missing. You should replace such files before proceeding. If all files are present, the following informational message appears:

%SHUTDOWN-I-CHECKOK, Basic reboot consistency check completed.

Note: You can use the REBOOT_CHECK option separately or in conjunction with either the REMOVE_NODE or the CLUSTER_SHUTDOWN option. If you choose REBOOT_CHECK with one of the other options, you must specify the options in the form of a comma-separated list.

10.7.5 The SAVE_FEEDBACK Option

Use the SAVE_FEEDBACK option to enable the AUTOGEN feedback operation.

Note: Select this option only when a computer has been running long enough to reflect your typical work load.

Reference: For detailed information about AUTOGEN feedback, see the OpenVMS System Manager's Manual.

10.8 Dump Files

Whether your OpenVMS Cluster system uses a single common system disk or multiple system disks, you should plan a strategy to manage dump files.

10.8.1 Controlling Size and Creation

Dump-file management is especially important for large clusters with a single system disk. For example, on a 256 MB OpenVMS Alpha computer, AUTOGEN creates a dump file in excess of 500,000 blocks.

In the event of a software-detected system failure, each computer normally writes the contents of memory to a full dump file on its system disk for analysis. By default, this full dump file is the size of physical memory plus a small number of pages. If system disk space is limited (as is probably the case if a single system disk is used for a large cluster), you may want to specify that no dump file be created for satellites or that AUTOGEN create a selective dump file. The selective dump file is typically 30% to 60% of the size of a full dump file.
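As a rough check on these sizes, the following sketch estimates dump-file sizes in disk blocks. It assumes 512-byte disk blocks and ignores the small number of extra pages mentioned above. The function names are illustrative only and are not part of AUTOGEN.

```python
BLOCK_BYTES = 512  # assumed size of one disk block

def full_dump_blocks(memory_mb: int) -> int:
    """Blocks needed to hold all of physical memory (extra pages ignored)."""
    return memory_mb * 1024 * 1024 // BLOCK_BYTES

def selective_dump_range(memory_mb: int) -> tuple:
    """Typical selective dump size: 30% to 60% of the full dump file."""
    full = full_dump_blocks(memory_mb)
    return (full * 30 // 100, full * 60 // 100)

print(full_dump_blocks(256))      # 524288
print(selective_dump_range(256))  # (157286, 314572)
```

For the 256 MB Alpha computer in the earlier example, the estimate of 524,288 blocks agrees with the figure of more than 500,000 blocks.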

You can control dump-file size and creation for each computer by specifying appropriate values for the AUTOGEN symbols DUMPSTYLE and DUMPFILE in the computer's MODPARAMS.DAT file. Specify dump files as shown in Table 10-4.

Table 10-4 AUTOGEN Dump-File Symbols
Value Specified	Result
DUMPSTYLE = 0	Full dump file created (default)
DUMPSTYLE = 1	Selective dump file created
DUMPFILE = 0	No dump file created

Caution: Although you can configure computers without dump files, the lack of a dump file can make it difficult or impossible to determine the cause of a system failure.

For example, use the following commands to modify the system dump-file size on large-memory systems:

$ MCR SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SET DUMPSTYLE 1
SYSGEN> CREATE SYS$SYSTEM:SYSDUMP.DMP/SIZE=70000
SYSGEN> WRITE CURRENT
SYSGEN> EXIT
$ @SHUTDOWN

The dump-file size of 70,000 blocks is sufficient to cover about 32 MB of memory. This size is usually large enough to encompass the information needed to analyze a system failure.

After the system reboots, you can purge SYSDUMP.DMP.

10.8.2 Sharing Dump Files

Another option for saving dump-file space is to share a single dump file among multiple computers. This technique makes it possible to analyze isolated computer failures. However, dumps are lost if multiple computers fail at the same time or if a second computer fails before you can analyze the first failure. Because boot server failures have a greater impact on cluster operation than do failures of other computers, you should configure full dump files on boot servers to help ensure speedy analysis of problems.

VAX systems cannot share dump files with Alpha computers and vice versa. However, you can share a single dump file among multiple Alpha computers and another single dump file among VAX computers. Follow these steps for each operating system:

Step Action

1 Decide whether to use full or selective dump files.

2 Determine the size of the largest dump file needed by any satellite.

3 Select a satellite whose memory configuration is the largest of any in the cluster and do the following:

Specify DUMPSTYLE = 0 (or DUMPSTYLE = 1) in that satellite's MODPARAMS.DAT file.
Remove any DUMPFILE symbol from the satellite's MODPARAMS.DAT file.
Run AUTOGEN on that satellite to create a dump file.

4 Rename the dump file to SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP or create a new dump file named SYSDUMP-COMMON.DMP in SYS$COMMON:[SYSEXE].

5 For each satellite that is to share the dump file, do the following:

Create a file synonym entry for the dump file in the system-specific root. For example, to create a synonym for the satellite using root SYS1E, enter a command like the following:
$ SET FILE SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP -
_$ /ENTER=SYS$SYSDEVICE:[SYS1E.SYSEXE]SYSDUMP.DMP

Add the following lines to the satellite's MODPARAMS.DAT file:
DUMPFILE = 0
DUMPSTYLE = 0 (or DUMPSTYLE = 1)

6 Rename the old system-specific dump file on each system that has its own dump file:
$ RENAME SYS$SYSDEVICE:[SYSn.SYSEXE]SYSDUMP.DMP .OLD

The value of n in the command line is the root for each system (for example, SYS0 or SYS1). Rename the file so that the operating system software does not use it as the dump file when the system is rebooted.

7 Reboot each node so it can map to the new common dump file. The operating system software cannot use the new file for a crash dump until you reboot the system.

8 After you reboot, delete the SYSDUMP.OLD file in each system-specific root. Do not delete any file called SYSDUMP.DMP; instead, rename it, reboot, and then delete it as described in steps 6 and 7.
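When many satellites share the dump file, the synonym command in step 5 must be repeated once per system root. The following sketch generates that command text for a list of roots. The command form is taken from step 5, written on one line instead of using a continuation line. The root names other than SYS1E and the helper name synonym_command are hypothetical.

```python
COMMON_DUMP = "SYS$COMMON:[SYSEXE]SYSDUMP-COMMON.DMP"

def synonym_command(root: str) -> str:
    """Build the SET FILE/ENTER command for one satellite root, such as SYS1E."""
    return (
        f"$ SET FILE {COMMON_DUMP} "
        f"/ENTER=SYS$SYSDEVICE:[{root}.SYSEXE]SYSDUMP.DMP"
    )

# Hypothetical list of satellite roots that are to share the dump file.
roots = ["SYS1E", "SYS1F", "SYS20"]

for root in roots:
    print(synonym_command(root))
```

The output is only text to review and paste into a DCL session. The sketch does not touch any files itself.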

10.9 Maintaining the Integrity of OpenVMS Cluster Membership

Because multiple LAN and mixed-interconnect clusters can coexist on a single extended LAN, the operating system provides mechanisms to ensure the integrity of individual clusters and to prevent access to a cluster by an unauthorized computer.

The following mechanisms are designed to ensure the integrity of the cluster:

A cluster authorization file (SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT), which is initialized during installation of the operating system or during execution of the CLUSTER_CONFIG.COM CHANGE function. The file is maintained with the SYSMAN utility.
Control of conversational bootstrap operations on satellites.

The purpose of the cluster group number and password is to prevent accidental access to the cluster by an unauthorized computer. Under normal conditions, you specify the cluster group number and password either during installation or when you run CLUSTER_CONFIG.COM (see Example 8-11) to convert a standalone computer to run in an OpenVMS Cluster system.

OpenVMS Cluster systems use these mechanisms to protect the integrity of the cluster in order to prevent problems that could otherwise occur under circumstances like the following:

When setting up a new cluster, the system manager specifies a group number identical to that of an existing cluster on the same Ethernet.
A satellite user with access to a local system disk tries to join a cluster by executing a conversational SYSBOOT operation at the satellite's console.

Reference: These mechanisms are discussed in Section 10.9.1 and Section 8.2.1, respectively.

10.9.1 Cluster Group Data

The cluster authorization file, SYS$COMMON:[SYSEXE]CLUSTER_AUTHORIZE.DAT, contains the cluster group number and (in scrambled form) the cluster password. The CLUSTER_AUTHORIZE.DAT file is accessible only to users with the SYSPRV privilege.

Under normal conditions, you need not alter records in the CLUSTER_AUTHORIZE.DAT file interactively. However, if you suspect a security breach, you may want to change the cluster password. In that case, you use the SYSMAN utility to make the change.

To change the cluster password, follow these instructions:

Step Action

1 Log in as system manager on a boot server.

2 Invoke the SYSMAN utility by entering the following command:
$ RUN SYS$SYSTEM:SYSMAN
SYSMAN>

3 At the SYSMAN> prompt, enter any of the CONFIGURATION commands in the following list.

CONFIGURATION SET CLUSTER_AUTHORIZATION
Updates the cluster authorization file, CLUSTER_AUTHORIZE.DAT, in the directory SYS$COMMON:[SYSEXE]. (The SET command creates this file if it does not already exist.) You can include the following qualifiers on this command:

/GROUP_NUMBER: Specifies a cluster group number. The group number must be in the range from 1 to 4095 or from 61440 to 65535.
/PASSWORD: Specifies a cluster password. The password may be from 1 to 31 characters in length and may include alphanumeric characters, dollar signs ($), and underscores (_).

CONFIGURATION SHOW CLUSTER_AUTHORIZATION
Displays the cluster group number.

HELP CONFIGURATION SET CLUSTER_AUTHORIZATION
Explains the command's functions.

4 If your configuration has multiple system disks, each disk must have a copy of CLUSTER_AUTHORIZE.DAT. You must run the SYSMAN utility to update all copies.

Caution: If you change either the group number or the password, you must reboot the entire cluster. For instructions, see Section 8.6.
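Before entering a new group number or password, it can help to check the values against the limits given for the /GROUP_NUMBER and /PASSWORD qualifiers. The following sketch encodes those limits. It assumes that "alphanumeric" means the ASCII letters A to Z (either case) and the digits 0 to 9. The function names are illustrative and are not part of SYSMAN.

```python
import re

def valid_group_number(number: int) -> bool:
    """True if the number is in the range 1 to 4095 or 61440 to 65535."""
    return 1 <= number <= 4095 or 61440 <= number <= 65535

# 1 to 31 characters: letters, digits, dollar signs, and underscores.
# ASCII-only letters and digits are an assumption of this sketch.
_PASSWORD_PATTERN = re.compile(r"[A-Za-z0-9$_]{1,31}")

def valid_password(password: str) -> bool:
    """True if the password satisfies the documented length and character rules."""
    return _PASSWORD_PATTERN.fullmatch(password) is not None

print(valid_group_number(4095))       # True
print(valid_group_number(5000))       # False
print(valid_password("NEWPASSWORD"))  # True
print(valid_password("NEW-PASSWORD")) # False
```

A check like this only catches values outside the documented limits. SYSMAN remains the authority on what it accepts.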

10.9.2 Example

Example 10-4 illustrates the use of the SYSMAN utility to change the cluster password.

Example 10-4 Sample SYSMAN Session to Change the Cluster Password

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> SET ENVIRONMENT/CLUSTER
%SYSMAN-I-ENV, current command environment:
        Clusterwide on local cluster
        Username SYSTEM will be used on nonlocal nodes
SYSMAN> SET PROFILE/PRIVILEGES=SYSPRV
SYSMAN> CONFIGURATION SET CLUSTER_AUTHORIZATION/PASSWORD=NEWPASSWORD
%SYSMAN-I-CAFOLDGROUP, existing group will not be changed
%SYSMAN-I-CAFREBOOT, cluster authorization file updated
 The entire cluster should be rebooted.
SYSMAN> EXIT
$

10.10 Adjusting Maximum Packet Size for LAN Configurations

You can adjust the maximum packet size for LAN configurations with the NISCS_MAX_PKTSZ system parameter.

10.10.1 System Parameter Settings for LANs

Starting with OpenVMS Version 7.3, the operating system (PEdriver) automatically detects the maximum packet size of all the virtual circuits to which the system is connected. If the maximum packet size of the system's interconnects is smaller than the default packet-size setting, PEdriver automatically reduces the default packet size.

For earlier versions of OpenVMS (VAX Version 6.0 to Version 7.2; Alpha Version 1.5 to Version 7.2-1), the NISCS_MAX_PKTSZ parameter should be set to 1498 for Ethernet clusters and to 4468 for FDDI clusters.

10.10.2 How to Use NISCS_MAX_PKTSZ

To obtain this parameter's current, default, minimum, and maximum values, issue the following command:

$ MC SYSGEN SHOW NISCS_MAX_PKTSZ

You can use the NISCS_MAX_PKTSZ parameter to reduce packet size, which in turn can reduce memory consumption. However, reducing packet size can also increase CPU utilization for block data transfers, because more packets will be required to transfer a given amount of data. Lock message packets are smaller than the minimum value, so the NISCS_MAX_PKTSZ setting will not affect locking performance.
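The trade-off between packet size and packet count can be illustrated with simple arithmetic. The sketch below makes a simplifying assumption: each packet carries a full NISCS_MAX_PKTSZ worth of data, and per-packet header overhead is ignored. The 64 KB transfer size is a hypothetical example. The packet sizes 1498 and 4468 are the values cited for Ethernet and FDDI in Section 10.10.1.

```python
def packets_needed(transfer_bytes: int, max_packet_size: int) -> int:
    """Packets required to move transfer_bytes, rounding up to a whole packet."""
    return -(-transfer_bytes // max_packet_size)  # ceiling division

transfer = 64 * 1024  # hypothetical 64 KB block data transfer

for size in (4468, 1498):
    print(f"packet size {size}: {packets_needed(transfer, size)} packets")
```

Under these assumptions the same transfer needs 15 packets at the larger size and 44 at the smaller one, which is why a smaller setting can raise CPU utilization for block data transfers.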

You can also use NISCS_MAX_PKTSZ to force use of a common packet size on all LAN paths by bounding the packet size to that of the LAN path with the smallest packet size. Using a common packet size can avoid VC closure due to packet size reduction when failing down to a slower, smaller packet size network.
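Choosing a common packet size amounts to taking the smallest maximum packet size among the LAN paths in use. A minimal sketch, assuming a hypothetical system with one FDDI path and one Ethernet path and using the packet sizes cited in Section 10.10.1:

```python
def common_packet_size(path_sizes: dict) -> int:
    """Largest packet size that every LAN path can carry."""
    return min(path_sizes.values())

# Hypothetical LAN paths and their maximum packet sizes.
paths = {"FDDI": 4468, "Ethernet": 1498}

print(common_packet_size(paths))  # 1498
```

Setting NISCS_MAX_PKTSZ to this value means that a failover from the FDDI path to the Ethernet path does not require a packet size reduction.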

If a memory-constrained system, such as a workstation, has adapters to a network path with large-size packets, such as FDDI or Gigabit Ethernet with jumbo packets, then you may want to conserve memory by reducing the value of the NISCS_MAX_PKTSZ parameter.

10.10.3 Editing Parameter Files

If you decide to change the value of the NISCS_MAX_PKTSZ parameter, edit the SYS$SPECIFIC:[SYSEXE]MODPARAMS.DAT file to permit AUTOGEN to factor the changed packet size into its calculations.

10.11 Determining Process Quotas

On Alpha systems, process quota default values in SYSUAF.DAT are often higher than the SYSUAF.DAT defaults on VAX systems. How, then, do you choose values for processes that could run on Alpha systems or on VAX systems in an OpenVMS Cluster? Understanding how a process is assigned quotas when the process is created in a dual-architecture OpenVMS Cluster configuration will help you manage this task.

10.11.1 Quota Values

The quotas to be used by a new process are determined by the OpenVMS LOGINOUT software. LOGINOUT works the same on OpenVMS Alpha and OpenVMS VAX systems. When a user logs in and a process is started, LOGINOUT uses the larger of:

The value of the quota defined in the process's SYSUAF.DAT record
The current value of the corresponding PQL_Mquota system parameter on the host node in the OpenVMS Cluster

Example: LOGINOUT compares the value of the account's ASTLM process limit (as defined in the common SYSUAF.DAT) with the value of the PQL_MASTLM system parameter on the host Alpha system or on the host VAX system in the OpenVMS Cluster.
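The comparison that LOGINOUT performs can be sketched as taking the larger of two numbers. In the sketch below, the node names and all values are hypothetical and chosen only to show how the same SYSUAF.DAT record can yield different quotas on different hosts.

```python
def effective_quota(uaf_value: int, pql_minimum: int) -> int:
    """Quota assigned at login: the larger of the SYSUAF.DAT value and PQL_Mquota."""
    return max(uaf_value, pql_minimum)

# Hypothetical values for illustration only.
uaf_astlm = 100  # ASTLM in the account's common SYSUAF.DAT record
pql_mastlm = {
    "ALPHA1": 120,  # PQL_MASTLM on a hypothetical Alpha host
    "VAX1": 4,      # PQL_MASTLM on a hypothetical VAX host
}

for node, minimum in pql_mastlm.items():
    print(node, effective_quota(uaf_astlm, minimum))
```

With these values, a process created on ALPHA1 gets an ASTLM of 120 because the system parameter is higher, while a process created on VAX1 gets the SYSUAF.DAT value of 100.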

10.11.2 PQL Parameters

The letter M in PQL_M means minimum. The PQL_Mquota system parameters set a minimum value for the quotas. In the Current and Default columns of the following edited SYSMAN display, note how the current value of each PQL_Mquota parameter exceeds its system-defined default value in most cases. Note that the following display is Alpha specific. A similar SYSMAN display on a VAX system would show "Pages" in the Unit column instead of "Pagelets".

SYSMAN> PARAMETER SHOW/PQL

%SYSMAN-I-USEACTNOD, a USE ACTIVE has been defaulted on node DASHER
Node DASHER:   Parameters in use: ACTIVE
Parameter Name      Current   Default   Minimum   Maximum  Unit       Dynamic
--------------      -------   -------   -------   -------  ----       -------
PQL_MASTLM              120         4        -1        -1  Ast        D
PQL_MBIOLM              100         4        -1        -1  I/O        D
PQL_MBYTLM           100000      1024        -1        -1  Bytes      D
PQL_MCPULM                0         0        -1        -1  10Ms       D
PQL_MDIOLM              100         4        -1        -1  I/O        D
PQL_MFILLM              100         2        -1        -1  Files      D
PQL_MPGFLQUOTA        65536      2048        -1        -1  Pagelets   D
PQL_MPRCLM               10         0        -1        -1  Processes  D
PQL_MTQELM                0         0        -1        -1  Timers     D
PQL_MWSDEFAULT         2000      2000        -1        -1  Pagelets
PQL_MWSQUOTA           4000      4000        -1        -1  Pagelets   D
PQL_MWSEXTENT          8192      4000        -1        -1  Pagelets   D
PQL_MENQLM              300         4        -1        -1  Locks      D
PQL_MJTQUOTA              0         0        -1        -1  Bytes      D

In this display, the values for many PQL_Mquota parameters increased from the defaults to their current values. Typically, this happens over time when AUTOGEN feedback is run periodically on your system. The PQL_Mquota values also can change, of course, when you modify the values in MODPARAMS.DAT or in SYSMAN. As you consider the use of a common SYSUAF.DAT in an OpenVMS Cluster with both VAX and Alpha computers, keep the dynamic nature of the PQL_Mquota parameters in mind.
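To see how many parameters have moved away from their defaults, the Current and Default columns can be compared directly. The following sketch copies those two columns from the display for node DASHER and lists the parameters whose current value exceeds the default. The function name raised_parameters is illustrative only.

```python
# (parameter, current, default) copied from the SYSMAN display for node DASHER.
PQL_DISPLAY = [
    ("PQL_MASTLM", 120, 4),
    ("PQL_MBIOLM", 100, 4),
    ("PQL_MBYTLM", 100000, 1024),
    ("PQL_MCPULM", 0, 0),
    ("PQL_MDIOLM", 100, 4),
    ("PQL_MFILLM", 100, 2),
    ("PQL_MPGFLQUOTA", 65536, 2048),
    ("PQL_MPRCLM", 10, 0),
    ("PQL_MTQELM", 0, 0),
    ("PQL_MWSDEFAULT", 2000, 2000),
    ("PQL_MWSQUOTA", 4000, 4000),
    ("PQL_MWSEXTENT", 8192, 4000),
    ("PQL_MENQLM", 300, 4),
    ("PQL_MJTQUOTA", 0, 0),
]

def raised_parameters(rows):
    """Return the names whose current value exceeds the system-defined default."""
    return [name for name, current, default in rows if current > default]

raised = raised_parameters(PQL_DISPLAY)
print(f"{len(raised)} of {len(PQL_DISPLAY)} parameters exceed their defaults")
for name in raised:
    print(name)
```

For this display, 9 of the 14 parameters are above their defaults. The other five (PQL_MCPULM, PQL_MTQELM, PQL_MWSDEFAULT, PQL_MWSQUOTA, and PQL_MJTQUOTA) are unchanged.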

10.11.3 Examples

The following table summarizes common SYSUAF.DAT scenarios and probable results on VAX and Alpha computers in an OpenVMS Cluster system.

Table 10-5 Common SYSUAF.DAT Scenarios and Probable Results
WHEN you set values at...	THEN a process that starts on...	Will result in...
Alpha level	An Alpha node	Execution with the values you deemed appropriate.
Alpha level	A VAX node	LOGINOUT not using the system-specific PQL_Mquota values defined on the VAX system, because LOGINOUT finds higher values for each quota in the Alpha-style SYSUAF.DAT. This could cause VAX processes in the OpenVMS Cluster to use inappropriately high resources.
VAX level	A VAX node	Execution with the values you deemed appropriate.
VAX level	An Alpha node	LOGINOUT ignoring the typically lower VAX-level values in SYSUAF.DAT and instead using the current value of each quota's PQL_Mquota parameter on the Alpha system. Monitor the current values of the PQL_Mquota system parameters if you choose to try this approach. Increase the appropriate PQL_Mquota values on the Alpha system in MODPARAMS.DAT as necessary.

You might decide to experiment with the higher process-quota values that usually are associated with an OpenVMS Alpha system's SYSUAF.DAT as you determine values for a common SYSUAF.DAT in an OpenVMS Cluster environment. The higher Alpha-level process quotas might be appropriate for processes created on host VAX nodes in the OpenVMS Cluster if the VAX systems have large available memory resources.

You can determine the values that are appropriate for processes on your VAX and Alpha systems by experimentation and modification over time. Factors in your decisions about appropriate limit and quota values for each process will include the following:

Amount of available memory
CPU processing power
Average work load of the applications
Peak work loads of the applications
