Troubleshooting: Possible UETP Errors


HP OpenVMS System Manager's Manual, Volume 2:...

This section helps you identify and solve problems you might encounter while running UETP. Refer to it if you need help understanding a system failure and isolating its cause. This section is not a repair manual and does not diagnose flaws in your system. It should, however, help you interpret and act upon the information in the error messages.

If you are unable to correct an error after following the steps in this section, you should contact an HP support representative. Any information you can supply about the measures you have taken to isolate the problem will help your HP support representative diagnose the problem.

Summary of Common Failures

The following problems are the most common failures encountered while running UETP:

Wrong quotas, privileges, or account

UETINIT01 failure

UETVECTOR failure (VAX computers only)

Insufficient disk space

Incorrect cluster setup

Problems during the load test

DECnet for OpenVMS error

Errors logged but not displayed

No process control block (PCB) or swap slots

System hangups

Lack of default access for the file access listener (FAL) object

Bugchecks and machine checks

The sections that follow describe these errors and offer the best course of action for dealing with each one.

Wrong Quotas, Privileges, or Account

If your assigned quotas or privileges do not match standard quotas and privileges for the SYSTEST account, UETP displays the following error message:

**********************
*  UETINIT00         *
*  Error count =  1  *
**********************
-UETP-W-TEXT,   The following:
 
        OPER privilege,
        BIOLM  quota,
        ENQLM  quota,
        FILLM  quota,
 
are nonstandard for the SYSTEST account and may result in UETP errors.

This message informs you that the OPER privilege and the BIOLM, ENQLM, and FILLM quotas either are not assigned correctly or are not assigned at all.

UETP displays a similar message if you run the cluster integration test phase and the privileges and quotas for the SYSTEST_CLIG account are incorrect. The SYSTEST and SYSTEST_CLIG accounts require the same privileges and quotas. Take the action described in this section for both accounts.

Solution

To correct the problem, use the following procedure:

Display all privileges and quotas in effect for the SYSTEST account using the Authorize utility (AUTHORIZE) as follows:

$ SET DEFAULT SYS$SYSTEM
$ RUN SYS$SYSTEM:AUTHORIZE
UAF> SHOW SYSTEST
 
Username: SYSTEST                          Owner:  SYSTEST-UETP
Account:  SYSTEST                          UIC:    [1,7] ([SYSTEST])
CLI:      DCL                              Tables: DCLTABLES
Default:  SYS$SYSROOT:[SYSTEST]
LGICMD:   LOGIN
Login Flags:  
Primary days:   Mon Tue Wed Thu Fri Sat Sun
Secondary days:                            
No access restrictions
Expiration:            (none)    Pwdminimum:  8   Login Fails:     0
Pwdlifetime:         14 00:00    Pwdchange:   22-JUN-2000 10:12 
Last Login:            (none) (interactive),          (none) (non-interactive)
Maxjobs:         0  Fillm:       100  Bytlm:        65536
Maxacctjobs:     0  Shrfillm:      0  Pbytlm:           0
Maxdetach:       0  BIOlm:        12  JTquota:       1024
Prclm:          12  DIOlm:        55  WSdef:          256
Prio:            4  ASTlm:       100  WSquo:          512
Queprio:         0  TQElm:        20  WSextent:      2048
CPU:        (none)  Enqlm:       300  Pgflquo:      20480
Authorized Privileges: 
  CMKRNL CMEXEC SYSNAM GRPNAM DETACH DIAGNOSE LOG_IO GROUP
  PRMCEB PRMMBX SETPRV TMPMBX NETMBX VOLPRO PHY_IO SYSPRV
Default Privileges: 
  CMKRNL CMEXEC SYSNAM GRPNAM DETACH DIAGNOSE LOG_IO GROUP
  PRMCEB PRMMBX SETPRV TMPMBX NETMBX VOLPRO PHY_IO SYSPRV
UAF> SHOW SYSTEST_CLIG
.
.
.
UAF> EXIT

Make sure the default privileges and quotas assigned to the account match the following list:

Privileges

CMKRNL      CMEXEC      NETMBX      DIAGNOSE    IMPERSONATE
DETACH      PRMCEB      PRMMBX      PHY_IO
GRPNAM      TMPMBX      VOLPRO      LOG_IO
SYSNAM      SYSPRV      SETPRV      GROUP

Quotas

BIOLM: 150          PRCLM: 8
DIOLM: 150          ASTLM: 250
FILLM: 100          BYTLM: 64000
TQELM: 20           CPU: no limit
ENQLM: 2000         PGFLQUOTA: 50000 (Alpha: 800,000)
WSDEFAULT: 2000     WSQUOTA: 4000
WSEXTENT: 16384 (16)

If any privileges or quotas are incorrect, run AUTHORIZE to correct them.
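
As a minimal sketch of that correction, the following AUTHORIZE session resets the three quotas named in the sample message to the values listed above for both test accounts. Which qualifiers you need depends on what UETP reported on your system; the choice of BIOLM, ENQLM, and FILLM here is only an assumption taken from the sample. Privileges can be adjusted the same way with the /PRIVILEGES and /DEFPRIVILEGES qualifiers.

```
$ SET DEFAULT SYS$SYSTEM
$ RUN SYS$SYSTEM:AUTHORIZE
UAF> MODIFY SYSTEST /BIOLM=150 /ENQLM=2000 /FILLM=100
UAF> MODIFY SYSTEST_CLIG /BIOLM=150 /ENQLM=2000 /FILLM=100
UAF> SHOW SYSTEST
UAF> EXIT
```

Changes to an account record take effect the next time the account logs in, so log out of SYSTEST and log in again before rerunning UETP.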

If you are logged in to the wrong account, the following error message asks you to log in to the SYSTEST account:

$ @UETP

**********************
*  UETINIT00         *
*  Error count =  1  *
**********************
-UETP-E-ABORT, UETINIT00 aborted at  22-JUN-2000 14:24:10.13
-UETP-E-TEXT, You are logged in to the wrong account.
              Please log in to the SYSTEST account.
$

You must run UETP from the SYSTEST account.

UETINIT01 Failure

UETINIT01 failures are related to peripheral devices; this type of error message can indicate any of the following problems:

Device failure

Device not supported or not mounted

Device allocated to another user

Device write locked

Lost vacuum on a magnetic tape drive

Drive off line

In some cases, the corrective action is specified explicitly in the error message. For example, you can receive a message from the operator communication manager (OPCOM) informing you of a problem and recommending a corrective measure:

%OPCOM,  22-JUN-2004 14:10:52.96, request 1, from user SYSTEST
Please mount volume UETP in device _MTA0:
%MOUNT-I-OPRQST, Please mount volume UETP in device _MTA0:

Other error messages imply the solution without stating it explicitly:

%UETP-S-BEGIN, UETDISK00 beginning at 22-JUN-2004 13:34:46.03
 
**********************
*  DISK_DRA          *
*  Error count =  1  *
**********************
-UETP-E-TEXT, RMS file error in file DRA0:DRA00.TST
-RMS-E-DNR, device not ready or not mounted
%UETP-S-ENDED, UETDISK00 ended at  22-JUN-2004 13:34:46.80

This message tells you that a disk drive is either not ready or not mounted. From this information, you know where to look for the cause of the failure (at the disk drive). If you cannot see the cause of the problem immediately, check the setup instructions in Setting Up the Devices to Be Tested.

In other cases, the cause of a failure might not be obvious from the information in the message. The problem can be related to hardware rather than software.

Solution

To determine where or when the failure occurs in the execution of UETP, use the following procedure:

Run the device test individually. (See Running a Subset of Phases.) By doing this, you can determine if the failure can be re-created, and you can isolate the cause of the problem by reproducing it using the least amount of software possible.

For example, if the failure occurs only when you run the entire device phase, and not when you run the affected device test individually, you can conclude the problem is related to device interaction. Conversely, if you can re-create the error by running the single device test, then you have proved that the error is not related to device interaction.

Run the device test with different media. If your run of the single device test succeeded in reproducing the error, the magnetic tape or disk media could be defective. Running the same test with different media determines whether the original media caused the problem.

Call an HP support representative. If you have tried all the previous steps without solving the problem, you should contact an HP support representative.

UETVECTOR Failure (VAX Only)

UETP displays a message similar to the following one to signal a vector processor failure:

     **********************
     *  UETVECTOR         *
     *  Error count = 1   *
     **********************
     %PPL-S-CREATED_SOME, created some of those requested - partial success
     -UETP-E-SUBSPNERR, Error spawning subordinate process.
     -UETP-E-SCHCTXERR, Error scheduling vector context test subprocess.
     -UETP-E-VECCTXERR, Error encountered during vector context testing.
      %UETP-I-ENDED, UETVECTOR_0000 ended at 22-JUN-2004 07:37:00.59

Solution

See Vector Processors and the VVIEF (VAX Only) for the correct setup for vector processor testing.

Insufficient Disk Space

When you run continuous passes of UETP, log files accumulate on the disk from which UETP was run. These files reduce the amount of free disk space available for each successive pass. If the amount of disk space available becomes too small for the current load, the following error message appears:

%UETP-S-BEGIN, UETDISK00 beginning at  22-JUN-2004 08:12:24.34
%UETP-I-ABORTC, DISK_DJA to abort this test, type ^C
 
**********************
*  DISK_DJA          *
*  Error count = 1   *
**********************
-UETP-F-TEXT, RMS file error in file DJA0:DJA00.TST
-RMS-F-FUL, device full (insufficient space for allocation)
 
**********************
*  DISK_DJA          *
*  Error count = 2   *
**********************
-UETP-F-TEXT, RMS file error in file DJA0:DJA01.TST
-RMS-F-FUL, device full (insufficient space for allocation)
%UETP-E-DESTP, DISK_DJA stopped testing DJA unit 0 at 08:12:36.91
%UETP-S-ENDED, UETDISK00 ended at  22-JUN-2004 08:12:37.98

Solution

Make more space available on the disk. You can do this by using one or more of the following techniques:

Delete unnecessary files to create more space.

Purge files, if multiple versions exist.

Mount a volume with sufficient space.

Check for disk quotas that might be enabled on the disk. If disk quotas are enabled, either disable or increase them. (Refer to the HP OpenVMS System Management Utilities Reference Manual for a description of the Disk Quota utility.)

Run VMSTAILOR if you have a small-disk system. Refer to the upgrade and installation manual for your operating system for more information.

See Using the SYSTEST Directories and How UETP Works on Disks for a further discussion of disk space.
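
The following commands are one possible way to check and recover space before the next pass. The device name DJA0: is taken from the sample message above, and the assumption that the accumulated log files are in SYS$TEST should be confirmed on your system before you purge anything.

```
$ SHOW DEVICE/FULL DJA0:        ! report total and free blocks on the disk
$ PURGE/LOG SYS$TEST:*.LOG      ! keep only the highest version of each log file
$ SHOW QUOTA/DISK=DJA0:         ! show the disk quota for your UIC, if quotas are enabled
```

PURGE removes all but the latest version of each file, so copy any older log files you still need before running it.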

Incorrect Setup of an OpenVMS Cluster System

Most problems that can occur during the cluster-integration test are related to improper setup of the OpenVMS Cluster system or of UETP on the cluster. These problems are most likely to occur at the following stages of the cluster test:

Near the beginning, when processes on OpenVMS nodes are started

Toward the end, when cluster file access is checked

The cluster test phase shows that various OpenVMS nodes in your cluster can simultaneously access files on selected nodes in the cluster. First, UETP tries to create a file on a disk drive that is accessible to the other selected nodes in the cluster. Creating a file in the cluster test phase has the following requirements:

A [SYSTEST] directory must exist on the disk in either the master file directory (MFD) or in the root directory [SYS0.].

The protection for the [SYSTEST] directory must be set to allow the SYSTEST account to create a file in it.
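
To check both requirements on a given disk, you can display the directory file and its protection. This is only a sketch: DUA1: is a hypothetical device name, and the two paths correspond to the MFD and [SYS0.] locations described above.

```
$ DIRECTORY/SECURITY DUA1:[000000]SYSTEST.DIR   ! [SYSTEST] in the master file directory
$ DIRECTORY/SECURITY DUA1:[SYS0]SYSTEST.DIR     ! [SYSTEST] under the root directory
```

If neither command finds the file, the directory does not exist on that disk. If the displayed owner and protection do not give the SYSTEST account write access, adjust them before rerunning the cluster test.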

If UETP is unable to find a suitable device on a certain node, the test displays a warning message and proceeds to the next cluster node.

Nodes on which the operator's terminal (OPA0) is set to the NO BROADCAST terminal characteristic will generate the following error message during the cluster test:

**********************
*  UETCLIG00master   *
*  Error count =  1  *
**********************
-UETP-E-TEXT, 0 operator consoles timed out on the cluster test warning
       and 1 operator console rejected it.
-UETP-E-TEXT, Status returned was,
      "%SYSTEM-F-DEVOFFLINE, device is not in configuration or not
      available"

Disregard this message if OPA0 is set to NO BROADCAST.

Solution

Whenever you suspect a problem, examine the SYS$TEST:NETSERVER.LOG file that was created when the SYSTEST_CLIG process was created. This file can contain additional error information that could not be transmitted to the node running the test. If it was not possible to create the SYSTEST_CLIG process on some node, the system accounting file for that node might contain a final process status in a process termination record.

The following problems can occur during a cluster test:

Logging in at other nodes--This problem is due to incorrect setup for the cluster test at the remote OpenVMS node. For example, if you specified a password for the SYSTEST_CLIG account or if you disabled the SYSTEST_CLIG account, the test displays the following message:
```
%SYSTEM-F-INVLOGIN, login information invalid at remote node
```
Refer to OpenVMS Cluster Testing and Defining a Remote Node for UETP Ethernet Testing for information about preparing for cluster testing.

Communicating with other nodes--A message indicates a DECnet problem. Check the NETSERVER.LOG file on the affected node to determine the cause.

Taking out locks or detecting deadlocks--The most likely cause of this problem is that you are not logged in to the SYSTEST account. Another possibility is that your cluster is not configured properly.

Creating files on cluster nodes--This problem is due to incorrect setup for the cluster test; refer to OpenVMS Cluster Testing for information about preparing for cluster testing.

Problems During the Load Test

A variety of errors can occur during the load test because the command procedures that are started during the tests run several utilities and perform many functions. Tracking a problem can be difficult because UETP deletes the log files that are generated during the load test. (See System Load Test Phase.)

Solution

If a problem occurs during the load test and the cause is not obvious, you can modify UETP.COM to preserve the log files as follows:

Add the /NODELETE qualifier to the following line:

$ TCNTRL UETLOAD00.DAT/PARALLEL_COUNT='LOADS/REPORT_TYPE='REPORT

Delete or comment out the following line:
```
$ DELETE UETLO*.LOG;*
```

Rerun the load test with these changes to try to re-create the problem.

If you re-create the problem, look at the contents of the appropriate log file. You can determine which log file to read by understanding the scheme by which the load test names its processes and log files. (The log file names are derived from the process names.)

The load test creates processes that are named in the following format:

UETLOADnn_nnnn

For example:

%UETP-I-BEGIN, UETLOAD00 beginning at 22-JUN-2004 15:45:08.97
%UETP-I-BEGIN, UETLOAD02_0000 beginning at 22-JUN-2004 15:45:09.42
%UETP-I-BEGIN, UETLOAD03_0001 beginning at 22-JUN-2004 15:45:09.63
%UETP-I-BEGIN, UETLOAD04_0002 beginning at 22-JUN-2004 15:45:10.76
%UETP-I-BEGIN, UETLOAD05_0003 beginning at 22-JUN-2004 15:45:11.28
%UETP-I-BEGIN, UETLOAD06_0004 beginning at 22-JUN-2004 15:45:12.56
%UETP-I-BEGIN, UETLOAD07_0005 beginning at 22-JUN-2004 15:45:13.81
%UETP-I-BEGIN, UETLOAD08_0006 beginning at 22-JUN-2004 15:45:14.95
%UETP-I-BEGIN, UETLOAD09_0007 beginning at 22-JUN-2004 15:45:16.99
%UETP-I-BEGIN, UETLOAD10_0008 beginning at 22-JUN-2004 15:45:19.32
%UETP-I-BEGIN, UETLOAD11_0009 beginning at 22-JUN-2004 15:45:19.95
%UETP-I-BEGIN, UETLOAD02_0010 beginning at 22-JUN-2004 15:45:20.20
%UETP-I-BEGIN, UETLOAD03_0011 beginning at 22-JUN-2004 15:45:21.95
%UETP-I-BEGIN, UETLOAD04_0012 beginning at 22-JUN-2004 15:45:22.99

Note that if more than 10 processes are created, the numbering sequence for the UETLOADnn portion of the process name starts over at UETLOAD02; however, the 4 digits of the _nnnn portion continue to increase.

Each load test process creates two log files. The first log file is created by the test controller; the second log file is created by the process itself. The log file to look at for error information about any given load test process is the one that was created by the test controller (the first log file).

The load test log file derives its file name from the process name, appending the last four digits of the process name (from the _nnnn portion) to UETLO. The test-controller log file and the process log file for each process use the same file name; however, the process log file has the higher version number of the two. For example, the log files created by the process UETLOAD05_0003 would be named as follows:

UETLO0003.LOG;1 (test-controller log file)

UETLO0003.LOG;2 (process log file)

Make sure that you look at the log file with the lower version number; that file contains the load test commands and error information.

After you have isolated the problem, restore UETP.COM to its original state and delete the log files from the load test (UETLO*.LOG;*); failure to delete these files can result in disk space problems.
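
The naming scheme described above can be sketched in DCL as follows. The symbol names are hypothetical, the process name is the one used in the earlier example, and the fixed version number ;1 assumes that no older load test log files remain in the directory.

```
$ ! Derive the test-controller log file name from a load test process name.
$ process = "UETLOAD05_0003"
$ digits = F$ELEMENT(1, "_", process)     ! the _nnnn portion of the name
$ logfile = "UETLO" + digits + ".LOG;1"   ! lower version = test-controller log
$ WRITE SYS$OUTPUT logfile
$ TYPE 'logfile'
```

If several runs have left multiple versions behind, use DIRECTORY UETLO*.LOG;* to find the correct pair and read the lower version of that pair.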

DECnet for OpenVMS Error

A DECnet error message can indicate that the network is unavailable.

Solution

If DECnet for OpenVMS software is included in your system, determine whether the product authorization key (PAK) is registered by entering the following command:
```
$ SHOW LICENSE
```
If the PAK is not registered, invoke the License utility to register it by entering the following command:
```
$ @SYS$UPDATE:VMSLICENSE
```
For information about registering licenses, refer to the following documents:

If DECnet for OpenVMS software is not included in your system, ignore the message; it is normal and does not affect the UETP run.

If you encounter other DECnet-related errors, you should perform the following actions:

Run DECnet for OpenVMS software as a single phase (see Running a Subset of Phases) to determine whether the error can be re-created.

Use the Help Message utility or refer to the OpenVMS System Messages: Companion Guide for Help Message Users.

Errors Logged but Not Displayed

If no errors are displayed at the console terminal or reported in the UETP.LOG file, you should run the Error Log Viewer (ELV) to see if any errors were logged in the ERRLOG.SYS file. Refer to the HP OpenVMS System Management Utilities Reference Manual A-L for information about running ELV.
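
As a starting point, the following command asks ELV to translate the error log into readable form. It assumes the log is in its usual location, SYS$ERRORLOG:ERRLOG.SYS; check the reference manual for the qualifiers that limit the report by time or detail.

```
$ ANALYZE/ERROR_LOG/ELV TRANSLATE SYS$ERRORLOG:ERRLOG.SYS
```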

No PCB or Swap Slots

The following error message indicates that no PCB or swap slots are available:

%UETP-I-BEGIN, UETLOAD00 beginning at  22-JUN-2004 07:47:16.50
%UETP-I-BEGIN, UETLOAD02_0000 beginning at  22-JUN-2004 07:47:16.76
%UETP-I-BEGIN, UETLOAD03_0001 beginning at  22-JUN-2004 07:47:16.92
%UETP-I-BEGIN, UETLOAD04_0002 beginning at  22-JUN-2004 07:47:17.13
%UETP-I-BEGIN, UETLOAD05_0003 beginning at  22-JUN-2004 07:47:17.35
%UETP-I-BEGIN, UETLOAD06_0004 beginning at  22-JUN-2004 07:47:17.61
%UETP-W-TEXT, The process -UETLOAD07_0005- was unable to be created,
  the error message is
-SYSTEM-F-NOSLOT, no pcb or swap slot available
%UETP-W-TEXT, The process -UETLOAD08_0006- was unable to be created,
  the error message is
-SYSTEM-F-NOSLOT, no pcb or swap slot available
%UETP-W-TEXT, The process -UETLOAD09_0007- was unable to be created,
  the error message is
-SYSTEM-F-NOSLOT, no pcb or swap slot available
%UETP-W-TEXT, The process -UETLOAD10_0008- was unable to be created,
  the error message is
-SYSTEM-F-NOSLOT, no pcb or swap slot available
%UETP-W-TEXT, The process -UETLOAD11_0009- was unable to be created,
  the error message is
-SYSTEM-F-NOSLOT, no pcb or swap slot available
%UETP-W-ABORT, UETLOAD00 aborted at  22-JUN-2004 07:47:54.10
-UETP-W-TEXT, Aborted via a user Ctrl/C.
 ***************************************************
 *                                                 *
    END OF UETP PASS 1 AT  22-JUN-2004 07:48:03.17  
 *                                                 *
 ***************************************************

Solution

To solve this problem, use the following procedure:

Individually rerun the phase that caused the error message (the LOAD phase in the previous example) to see if the error can be reproduced.

Increase the size of the page file, using either the command procedure SYS$UPDATE:SWAPFILES.COM (see Managing Page, Swap, and Dump Files) or SYSGEN (refer to the HP OpenVMS System Management Utilities Reference Manual).

Increase the system parameter MAXPROCESSCNT, if necessary.

Reboot the system.
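
A minimal sketch of the MAXPROCESSCNT change using SYSGEN follows. The value 100 is purely illustrative; choose a value appropriate to your system's memory and workload.

```
$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE CURRENT
SYSGEN> SHOW MAXPROCESSCNT
SYSGEN> SET MAXPROCESSCNT 100
SYSGEN> WRITE CURRENT
SYSGEN> EXIT
```

MAXPROCESSCNT is not a dynamic parameter, which is why the reboot is required. If you use AUTOGEN to manage system parameters, also record the new value in MODPARAMS.DAT so that a later AUTOGEN run does not reset it.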

No Keyboard Response or System Disk Activity

If the keyboard does not respond or the system disk is inactive, the system might be hung.

Solution

A system hangup can be difficult to trace; you should save the dump file for reference. To learn why the system hung, run the System Dump Analyzer as described in the OpenVMS VAX System Dump Analyzer Utility Manual or the OpenVMS System Analysis Tools Manual.

Reasons for a system hangup include the following:

Insufficient pool space--Increase the value of the system parameter NPAGEVIR and reboot the system.

Insufficient page file space--Increase the page file space using SYSGEN as described in the HP OpenVMS System Management Utilities Reference Manual M-Z.

I/O device failure causing driver-permanent loop--Call an HP support representative.

Lack of Default Access for the FAL Object

If default FAL access is disabled at the remote node selected by UETP for DECnet testing (the adjacent node on each active circuit, or a node defined by the group logical name UETP$NODE_ADDRESS), messages similar to the following ones appear:

%UETP-W-TEXT, The process -SVA019841_0001- returned a final status of:
%COPY-E-OPENOUT, error opening !AS as output

These messages are followed by:

%COPY-E-OPENOUT, error opening 9999""::SVA019841.D1; as output
-RMS-E-CRE, ACP file create failed
-SYSTEM-F-INVLOGIN, login information invalid at remote node
%COPY-W-NOTCOPIED, SYS$COMMON:[SYSTEST]UETP.COM;2 not copied
%UETP-E-TEXT, Remote file test data error

You can ignore these messages.

Bugchecks and Machine Checks

When the system aborts its run, a bugcheck message appears at the console.

Solution

Call your HP support representative. Hardware problems often cause bugchecks and machine checks, and these failures are difficult to resolve on your own. Save the SYS$SYSTEM:SYSDUMP.DMP and ERRLOG.SYS files so that they are available for examination. Knowing whether the failure can be re-created is also important; you can run UETP again to verify the failure.
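
One way to preserve both files after the system reboots is sketched below. The output file names are hypothetical, and the error log is assumed to be in SYS$ERRORLOG. Renaming the error log, rather than copying it, sets the current file aside and a new ERRLOG.SYS is created for subsequent entries.

```
$ ANALYZE/CRASH_DUMP SYS$SYSTEM:SYSDUMP.DMP
SDA> COPY SYS$LOGIN:UETP_CRASH.DMP
SDA> EXIT
$ RENAME SYS$ERRORLOG:ERRLOG.SYS SYS$ERRORLOG:UETP_ERRLOG.OLD
```

Save the dump before the next failure occurs, because a later crash overwrites SYSDUMP.DMP.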
