Solving Queue Manager Problems

Use the following sections to help solve queue manager problems:

Topic	For More Information
Avoiding common problems: a troubleshooting checklist	Avoiding Common Problems: A Troubleshooting Checklist
If the queue manager does not start	If the Queue Manager Does Not Start
If the queuing system stops or the queue manager does not run on specific nodes	If the Queuing System Stops or the Queue Manager Does Not Run on Specific Nodes
If the queue manager becomes unavailable	If the Queue Manager Becomes Unavailable
If the queuing system does not work on a specific OpenVMS Cluster node	If the Queuing System Does Not Work on a Specific OpenVMS Cluster Node
If you see inconsistent queuing behavior on different OpenVMS Cluster nodes	If You See Inconsistent Queuing Behavior on Different OpenVMS Cluster Nodes
Reporting a queuing system problem to HP support representatives	Reporting a Queuing System Problem to HP

Avoiding Common Problems: A Troubleshooting Checklist

To avoid the most common queuing system problems, make sure you have met the following requirements:

Requirement	For More Information
QMAN$MASTER is identically defined on all nodes in the cluster.	Specifying the Location of the Queue Database
The queue database is in the specified location.	Specifying the Location of the Queue Database
The queue database disk is mounted and available.	Specifying the Location of the Queue Database
The node list specified with the /ON qualifier contains a sufficient number of nodes. If you specify a node list, HP recommends that you include an asterisk (*) at the end of the node list.	If the Queue Manager Becomes Unavailable
The system address parameters SCSNODE and SCSSYSTEMID match the DECnet for OpenVMS node name and node ID.	If the Queuing System Does Not Work on a Specific OpenVMS Cluster Node

If the Queue Manager Does Not Start

If the queue manager does not start when you enter the START/QUEUE/MANAGER command, the system displays the following message:

%JBC-E-QMANNOTSTARTED, queue manager could not be started

Investigating the Problem

To find information about the problem, search the operator log file SYS$MANAGER:OPERATOR.LOG (or look on the operator console) for messages from the queue manager and the job controller, as follows:

$ SEARCH SYS$MANAGER:OPERATOR.LOG/WINDOW=5 QUEUE_MANAGE,-
_$ JOB_CONTROL,BATCH_MANAGE

Use the information provided with these messages to further investigate the problem, making sure you have met the requirements listed in Avoiding Common Problems: A Troubleshooting Checklist.

Cause

The usual cause of the problem is the system's inability to find the queue master file. Often the logical name QMAN$MASTER is not defined correctly, or the disk is not available. For example, the following messages indicate that the master queue file does not exist in the expected location:

%%%%%%%%%%%  OPCOM  13-MAR-2000 15:53:52.84  %%%%%%%%%%%
Message from user SYSTEM on ABDCEF
%JBC-E-OPENERR, error opening SYS$COMMON:[SYSEXE]QMAN$MASTER.DAT
 
%%%%%%%%%%%  OPCOM  13-MAR-2000 15:53:53.04  %%%%%%%%%%%
Message from user SYSTEM on ABDCEF
-SYSTEM-W-NOSUCHFILE, no such file

Correcting the Problem

On systems with multiple queue managers, search for messages displayed by additional queue managers by including their process names in the search string. To display information about queue managers running on your system, use the SHOW QUEUE/MANAGERS command as explained in Displaying Information About Queue Managers. Correct any problem indicated in the displayed information.

Example

$ START/QUEUE/MANAGER DUA55:[SYSQUE] [1] 
%JBC-E-QMANNOTSTARTED, queue manager could not be started [2] 
$ SEARCH SYS$MANAGER:OPERATOR.LOG /WINDOW=5 QUEUE_MANAGE,JOB_CONTROL [3] 
%%%%%%%%%%%  OPCOM  14-APR-2000 18:55:18.23  %%%%%%%%%%%
Message from user SYSTEM on CATNIP
%QMAN-E-OPENERR, error opening DUA55:[SYSQUE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
 
%%%%%%%%%%%  OPCOM  14-APR-2000 18:55:18.29  %%%%%%%%%%%
Message from user SYSTEM on CATNIP
-RMS-F-DEV, error in device name or inappropriate device type for operation
 
%%%%%%%%%%%  OPCOM  14-APR-2000 18:55:18.31  %%%%%%%%%%%
Message from user SYSTEM on CATNIP
-SYSTEM-W-NOSUCHDEV, no such device available [4] 
$ START/QUEUE/MANAGER DUA5:[SYSQUE] [5]

[1] This command attempts to start the queue manager, specifying DUA55:[SYSQUE] as the location of the queue and journal files.

[2] The error message indicates that the queue manager did not start.

[3] This command searches the operator log file for relevant messages. The SEARCH command does not include a second queue manager name, such as BATCH_MANAGE.

[4] This message indicates that the queue file could not be opened because device DUA55: does not exist.

[5] This command, which correctly specifies DUA5:[SYSQUE] as the location for the queue and journal files, successfully starts the queue manager.

For more information about multiple queue managers and their process names, see Understanding Multiple Queue Managers.

If the Queuing System Stops or the Queue Manager Does Not Run on Specific Nodes

Use this section if the queue manager does not run on a specific node in the cluster, or if the queuing system stops, especially after one of the following actions:

- The node on which the queue manager was running leaves the cluster.
- A new node boots into the cluster.
- You change the node list specified with the /ON qualifier of the START/QUEUE/MANAGER command.
- You start the queue manager after moving the queue database.

Investigating the Problem

Check the operator log that was current at the time the queue manager started up or failed over. Search the log for operator messages from the queue manager.

On systems with multiple queue managers, also search for messages displayed by additional queue managers by including their process names in the search string. To display information about queue managers running on your system, use the SHOW QUEUE/MANAGERS command, as explained in Displaying Information About Queue Managers.

For more information about multiple queue managers and their process names, see Understanding Multiple Queue Managers.

The following messages indicate that the queue database is not in the specified location:

%%%%%%%%%%%  OPCOM   4-FEB-2000 15:06:25.21  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
%QMAN-E-OPENERR, error opening CLU$COMMON:[SYSEXE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
 
%%%%%%%%%%%  OPCOM   4-FEB-2000 15:06:27.29  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
-RMS-E-FNF, file not found
 
%%%%%%%%%%%  OPCOM   4-FEB-2000 15:06:27.45  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
-SYSTEM-W-NOSUCHFILE, no such file

The following messages indicate that the queue database disk is not mounted:

%%%%%%%%%%%  OPCOM   4-FEB-2000 15:36:49.15  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
%QMAN-E-OPENERR, error opening DISK888:[QUEUE_DATABASE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
 
%%%%%%%%%%%  OPCOM   4-FEB-2000 15:36:51.69  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
-RMS-F-DEV, error in device name or inappropriate device type for operation
 
%%%%%%%%%%%  OPCOM   4-FEB-2000 15:36:52.20  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
-SYSTEM-W-NOSUCHDEV, no such device available
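
The two sets of messages above differ mainly in their secondary condition codes. As a rough sketch, assuming the operator log text has been copied to a system where Python is available, the following maps those codes to the two causes this section describes. The names `likely_causes` and `LIKELY_CAUSE` are hypothetical, and the table covers only the four codes shown in this section.

```python
import re

NOT_FOUND = "queue database is not in the specified location"
NOT_MOUNTED = "queue database disk is not mounted or the device does not exist"

# Hypothetical lookup table built only from the messages quoted in this section.
LIKELY_CAUSE = {
    "FNF": NOT_FOUND,
    "NOSUCHFILE": NOT_FOUND,
    "DEV": NOT_MOUNTED,
    "NOSUCHDEV": NOT_MOUNTED,
}

# Matches lines such as "%QMAN-E-OPENERR, ..." or "-RMS-F-DEV, ...".
MESSAGE = re.compile(
    r"^[%-](?P<facility>[A-Z$_]+)-(?P<severity>[A-Z])-(?P<ident>[A-Z0-9_]+),"
)


def likely_causes(log_text):
    """Return the distinct likely causes found in log_text, in order of appearance."""
    causes = []
    for line in log_text.splitlines():
        match = MESSAGE.match(line.strip())
        if match is None:
            continue  # OPCOM banner, "Message from user" line, or blank line
        cause = LIKELY_CAUSE.get(match["ident"])
        if cause is not None and cause not in causes:
            causes.append(cause)
    return causes


if __name__ == "__main__":
    sample = """%QMAN-E-OPENERR, error opening DISK888:[QUEUE_DATABASE]SYS$QUEUE_MANAGER.QMAN$QUEUES;
-RMS-F-DEV, error in device name or inappropriate device type for operation
-SYSTEM-W-NOSUCHDEV, no such device available
"""
    for cause in likely_causes(sample):
        print(cause)
```

A result of this kind is only a hint about where to look first; confirm it against the requirements in the troubleshooting checklist.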

Cause

The queuing system does not work correctly under the following circumstances:

- If the dirspec parameter specified with the START/QUEUE/MANAGER command (specifying the location of the queue and journal files) is not translated exactly the same on all nodes, and the queue manager starts on one of the affected nodes. You typically find this problem in an OpenVMS Cluster environment when you add a system disk or move the queue database.
- If the queue database disk is not mounted for the node on which the queue manager attempts to run.

In general, the queuing system shuts down completely if the queue manager encounters a serious error and forces a crash or failover twice in a row within two minutes on the same node. As a result, the queuing system might have stopped, or it might still be running if, after the original failed startup, the queue manager moved to another node from which it can access the database.

Correcting the Problem

Perform the following steps:

1. If the queue manager is stopped, enter START/QUEUE/MANAGER and include the following information:
   - An appropriate list of nodes with the /ON qualifier.
   - The appropriate dirspec parameter (to specify the location of the queue and journal files). All the nodes included in the node list with the /ON qualifier must be able to access this directory.

2. On all nodes specified in the node list (except on any nodes that boot from the disk where the queue database files are stored), add a MOUNT command to the SYLOGICALS.COM procedure to mount the disk that holds the master file. You do not need to explicitly mount the disk on a node where it is the system disk.

If the Queue Manager Becomes Unavailable

The queue manager becomes unavailable if it does not start or has stopped running.

Investigating the Problem

To investigate the problem, enter SHOW CLUSTER to see whether the nodes in the queue manager's node list are available.

Cause

An insufficient failover node list might have been specified for the queue manager, so that none of the nodes in the failover list is available to run the queue manager.

Correcting the Problem

Make sure the queue manager's node list contains a sufficient number of nodes by entering START/QUEUE/MANAGER with the /ON qualifier to specify a node list appropriate for your configuration.

If you are in doubt about what nodes to specify, HP recommends that you specify an asterisk (*) wildcard character as the last node in the list; the asterisk indicates that any remaining node in the cluster can run the queue manager. Specifying the asterisk prevents your queue manager from becoming unavailable because of an insufficient node list.
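
The effect of the trailing asterisk can be illustrated with a small model. This is a sketch only, not the queue manager's actual logic: it assumes the first listed node that is available is chosen, and because this section does not say how a node is picked among those matched by the asterisk, the sketch simply takes the first one. The function name and the node names NODEA, NODEB, and NODEC are hypothetical.

```python
def pick_queue_manager_node(node_list, available_nodes):
    """Return the node that would run the queue manager, or None if none qualifies.

    node_list       -- names given with /ON, optionally ending in "*"
    available_nodes -- names of nodes currently in the cluster
    """
    named = [entry for entry in node_list if entry != "*"]
    for entry in node_list:
        if entry == "*":
            # The asterisk stands for any remaining node in the cluster.
            for node in available_nodes:
                if node not in named:
                    return node
        elif entry in available_nodes:
            return entry
    return None


if __name__ == "__main__":
    up = ["NODEC"]  # NODEA and NODEB have left the cluster
    print(pick_queue_manager_node(["NODEA", "NODEB"], up))       # None
    print(pick_queue_manager_node(["NODEA", "NODEB", "*"], up))  # NODEC
```

Without the asterisk, the queue manager has nowhere to run once both listed nodes are gone; with it, any remaining cluster member qualifies.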

If the Queuing System Does Not Work on a Specific OpenVMS Cluster Node

Use this section if the queuing system does not work on a specific node when it starts up.

Investigating the Problem

Perform the following steps:

1. Search the operator log that was current when the problem existed for the following messages. These messages are broadcast every 30 seconds after the affected node boots.

%%%%%%%%%%%  OPCOM   4-FEB-2000 15:36:49.15  %%%%%%%%%%%
Message from user SYSTEM on ZNFNDL
%QMAN-E-COMMERROR, unexpected error #5 in communicating with node CSID 000000
 
%%%%%%%%%%%  OPCOM   4-FEB-2000 15:36:49.15  %%%%%%%%%%%
Message from user SYSTEM on ZNFNDL
-SYSTEM-F-WRONGACP, wrong ACP for device

2. Compare the node's values for the system address parameters SCSNODE and SCSSYSTEMID with the values for the DECnet node name and node ID, as follows. SCSSYSTEMID should equal the DECnet area number multiplied by 1024, plus the node number; the final command in the example computes this value for the address 19.45.

$ RUN SYS$SYSTEM:SYSMAN
SYSMAN> PARAMETERS SHOW SCSSYSTEMID

Parameter Name            Current    Default     Min.     Max.     Unit  Dynamic
--------------            -------    -------    -------  -------   ----  -------
SCSSYSTEMID                 19941          0        -1        -1 Pure-numbe
SYSMAN> PARAMETERS SHOW SCSNODE

Parameter Name            Current    Default     Min.     Max.     Unit  Dynamic
--------------            -------    -------    -------  -------   ----  -------
SCSNODE                 "RANDY  "    "    "    "    "    "ZZZZ" Ascii

SYSMAN> EXIT
$ RUN SYS$SYSTEM:NCP
NCP> SHOW EXECUTOR SUMMARY

 
Node Volatile Summary as of  5-FEB-2000 15:50:36
 
Executor node = 19.45 (DREAMR)
 
State                    = on
Identification           = DECnet for OpenVMS V7.2 
 

NCP> EXIT
$ WRITE SYS$OUTPUT 19*1024+45
19501
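
In the sample output, neither value matches: SCSNODE is RANDY but the DECnet node name is DREAMR, and SCSSYSTEMID is 19941 but the address 19.45 corresponds to 19501. The same comparison can be written as a short sketch in Python, shown here only to make the arithmetic explicit. The function names are hypothetical, and the values are copied by hand from the displays above.

```python
def decnet_address_to_system_id(area, node):
    """Return area * 1024 + node, the value SCSSYSTEMID is expected to hold."""
    return area * 1024 + node


def parameters_match(scsnode, scssystemid, decnet_name, area, node):
    """Return True if SCSNODE and SCSSYSTEMID agree with the DECnet name and address."""
    names_agree = scsnode.strip().upper() == decnet_name.strip().upper()
    ids_agree = scssystemid == decnet_address_to_system_id(area, node)
    return names_agree and ids_agree


if __name__ == "__main__":
    print(decnet_address_to_system_id(19, 45))                   # 19501
    print(parameters_match("RANDY  ", 19941, "DREAMR", 19, 45))  # False
```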

Cause

If the DECnet node name and node ID do not match the SCSNODE and SCSSYSTEMID system address parameters, IPC (interprocess communication, an operating system internal mechanism) cannot work properly and the affected node will not be able to participate in the queuing system.

Correcting the Problem

Perform the following steps:

1. Modify the system address parameters SCSNODE and SCSSYSTEMID, or modify the DECnet node name and node ID, so that the values match.

   For more information about these system parameters, refer to the HP OpenVMS System Management Utilities Reference Manual. For more information about the DECnet node name and node ID, refer to the DECnet for OpenVMS Guide to Networking.¹

2. Reboot the system.

If You See Inconsistent Queuing Behavior on Different OpenVMS Cluster Nodes

Use this section if you see the following symptoms:

- After submitting a print job, you can display the job with a SHOW ENTRY command on the same node, but not on other nodes in the OpenVMS Cluster environment.
- After defining or modifying a queue, the changes appear in a SHOW QUEUE display on some nodes, but not on others.
- You can successfully submit or print a job on some nodes, but on other nodes, you receive a JOBQUEDIS error.

Investigating the Problem

Perform the following steps:

Enter SHOW LOGICAL to translate the QMAN$MASTER logical name within the environment of each node in the cluster. If there is no translation on any given node, then translate the default value of SYS$COMMON:[SYSEXE].

If the SHOW LOGICAL translations show a different physical disk name on one or more nodes, you have identified the problem.

Check the operator log files that were current at the time that one of the affected nodes booted. Search for an OPCOM message similar to the following one from the process JOB_CONTROL:

%%%%%%%%%%%  OPCOM   4-FEB-2000 14:41:20.88  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
%JBC-E-OPENERR, error opening BOGUS:[QUEUE_DIR]QMAN$MASTER.DAT;
 
%%%%%%%%%%%  OPCOM   4-FEB-2000 14:41:21.12  %%%%%%%%%%%
Message from user SYSTEM on MANGLR
-RMS-E-FNF, file not found
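
Comparing the SHOW LOGICAL translations by eye becomes error-prone on larger clusters. As a minimal sketch, assuming you have recorded the fully translated location for each node by hand, the following groups nodes by the device part of that location; more than one group indicates the problem described above. The function names, node names, and device names are all hypothetical.

```python
from collections import defaultdict


def group_nodes_by_device(locations):
    """Group node names by the device portion of their translated location.

    locations -- dict mapping node name to a location such as "DISK1:[QUEUE_DIR]"
    """
    groups = defaultdict(list)
    for node, location in sorted(locations.items()):
        device = location.split(":", 1)[0].upper()  # text before the first colon
        groups[device].append(node)
    return dict(groups)


def has_inconsistent_definitions(locations):
    """Return True if the nodes do not all point at the same device."""
    return len(group_nodes_by_device(locations)) > 1


if __name__ == "__main__":
    recorded = {
        "NODEA": "DISK1:[QUEUE_DIR]",
        "NODEB": "DISK1:[QUEUE_DIR]",
        "NODEC": "DISK2:[SYSEXE]",  # differs from the other two nodes
    }
    for device, nodes in group_nodes_by_device(recorded).items():
        print(device, nodes)
    print(has_inconsistent_definitions(recorded))
```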

Cause

This problem can occur when the logical name QMAN$MASTER is defined differently on different nodes in the cluster, which creates multiple queuing environments. You typically find this problem in OpenVMS Cluster environments when you have just added a system disk or moved the queuing database.

Correcting the Problem

Perform the following steps:

1. If only one queue manager and queue database exist, skip to step 2.

   If more than one queue manager and queue database exist, perform the following steps:
   a. Enter a command in the following format on one of the nodes where the QMAN$MASTER logical name is incorrectly defined:

      STOP/QUEUE/MANAGER/CLUSTER/NAME_OF_MANAGER=name

      where /NAME_OF_MANAGER specifies the name of the queue manager to be stopped.
   b. Delete all three files for the invalid queue database. (On systems with multiple queue managers, you might have more than three invalid files.)

2. Reassign the logical name QMAN$MASTER on the affected systems and correct the definition in the startup procedure where the logical name is defined (usually SYLOGICALS.COM).

3. Enter STOP/QUEUE/MANAGER/CLUSTER on an unaffected node to stop the valid queue manager.

4. Enter START/QUEUE/MANAGER on any node and verify that the queuing system is working properly.

Footnotes

¹ This manual has been archived.
