Document revision date: 30 March 2001
[Compaq] [Go to the documentation home page] [How to order documentation] [Help on this site] [How to contact us]
[OpenVMS documentation]

OpenVMS Performance Management


Previous Contents Index

4.4 Creating, Maintaining, and Interpreting MONITOR Summaries

Consider the following guidelines when using MONITOR:

See the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems and the OpenVMS System Management Utilities Reference Manual: M--Z for information about using MONITOR.

4.4.1 Types of Output

MONITOR generates the following types of output:

4.4.2 MONITOR Modes of Operation

MONITOR provides two input modes of operation for collecting data---live and playback.

Live Mode

Use live mode to collect data on a running system and to generate one or more of the following types of MONITOR output---ASCII screen images, binary recording files, or formatted ASCII summary files.

Use live mode to display data about a remote system connected to your system with DECnet for OpenVMS.

Playback Mode

Use playback mode to read a binary recording file and produce one or more of the following types of MONITOR output---ASCII screen images, binary recording files, or formatted ASCII summary files.

4.4.3 Creating a Performance Information Database

As a foundation for the strategy discussed in this chapter, you must develop a database of performance information for your system by running MONITOR continuously as a background process.

The SYS$EXAMPLES directory provides three command procedures you can use to establish the database. The following table describes the procedures:
Procedure Description
SUBMON.COM Starts MONITOR.COM as a detached process.
MONITOR.COM Creates a summary file from the binary recording file of the previous boot, then begins recording for this boot. The recording interval is 10 minutes.
MONSUM.COM (VAX only) Generates two OpenVMS Cluster multifile summary reports: one for the previous 24 hours and one for the previous day's prime-time period (9 a.m. to 6 p.m.). These reports are mailed to the system manager, and then the procedure resubmits itself to run each day at midnight.

When MONITOR data is recorded continuously, a summary report can cover any contiguous time segment.

4.4.4 Saving Your Summary Reports

The two multifile summary reports reports are not saved as files. To keep them, you must do either of the following:

4.4.5 Customizing Your Reports

The report you require for the evaluation procedure is one that covers a period that best represents the typical operation of your system. You might want, for example, to evaluate your system only during hours of peak acitvity.

To generate a summary of the appropriate time segment, edit the MONSUM.COM command procedure and change the beginning and ending times on one of the two MONITOR commands that produce the summary reports.

4.4.6 Report Formats

The summary reports produced by MONSUM.COM are in the multifile summary format---there is one column of averages for each node in a VMScluster, as well as some overall row statistics. For noncluster systems, the row statistics can be ignored.

If you prefer to use a report in the standard summary format (which includes current, minimum, and maximum statistics), execute a MONITOR playback summary command referencing the input data file of interest as the only file in the /INPUT list. Note that a new data file is created for each system whenever it reboots. Remember to use the /BEGINNING and /ENDING qualifiers to select the desired time period.

4.4.7 Using MONITOR in Live Mode

You are encouraged to observe current system activity regularly by running MONITOR in live mode. In live mode, always begin an analysis with the MONITOR CLUSTER and MONITOR SYSTEM classes to obtain an overview of system performance.

Then, monitor other classes to examine components of particular interest.

Note

All references to MONITOR items in this chapter are assumed to be for the average statistic, unless otherwise noted.

4.4.8 More About Multifile Reports

In multifile reports, a page or more is devoted to each MONITOR class. Each column represents one node, and is headed by the node name and beginning and ending times of the segment requested. In most cases, time segments for all nodes will be roughly the same. Differences of a few minutes are typical, because data collection on the various nodes is not synchronized.

In some cases, one or more time segments will be shorter than others; in these cases, some of the requested data was not recorded (probably because the nodes were unavailable). Note that if data is unavailable for some period within the bounds of a request, that fact is not explicitly specified.

However, such a gap can occur only when the column of data uses more than one input file; and if multiple files contributed to the column, the number is shown in parentheses to the right of the node name. In cases where a time segment is missing, this number must be greater than 1. If no number appears, there is only one input data file for that column, and the column includes no missing time segments.

To summarize, if all beginning and ending times are not roughly the same or if a parenthesized number appears, some data may be unavailable, and you may want to base your evaluation on a different time segment that includes more complete data. Whenever the multifile report is based on incomplete data, the Row Average statistic can be weighted unfairly in favor of one or more nodes.

4.4.9 Interpreting MONITOR Statistics

While interpreting MONITOR statistics, keep in mind that the collection interval has no effect on the accuracy of MONITOR rates. It does, however, affect levels, because they represent sampled data. In other words, the smaller the collection interval, the more accurate MONITOR level statistics will be. (For more information on MONITOR rates and levels, refer to the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.)

Although the interval value supplied with MONITOR.COM is adequate for most purposes, it does represent a trade-off between statistical accuracy and the consumption of disk space. Thus, before you base major decisions on MONITOR level statistics, be sure to verify them by running MONITOR for a time with a much smaller collection interval while carefully observing disk space usage.


Chapter 5
Diagnosing Resource Limitations

This chapter describes how to track down system resources that can limit performance. When you suspect that your system performance is suffering from a limited resource, you can begin to investigate which resource is most likely responsible. In a correctly behaving system that becomes fully loaded, one of the three resources---memory, I/O, or CPU---becomes the limiting resource. Which resource assumes that role depends on the kind of load your system is supporting.

5.1 Diagnostic Strategy

Appendix A contains a number of decision trees for diagnosing limiting resources. Note that the diagrams include command recommendations to help you obtain required information. The recommended commands appear in parentheses below the description of the information required.

The procedures use the process of elimination to determine the source of performance problems. There are fairly simple tests you can use to rule out certain classes of problems.

Use the following guidelines when conducting your preliminary investigation:

5.2 Investigating Resource Limitations

Your preliminary investigation can proceed by checking for the possibility of memory limitations, then I/O limitations, and finally a CPU limitation.

5.2.1 Memory Limitations

Memory limitations are manifestations of such diverse problems as too little physical memory for the work attempted, inappropriate use of the memory management features, improper assignments of memory resources to users, and so forth.

To determine if you may have memory limitations, use the DCL commands MONITOR IO or MONITOR PAGE as shown in the following table:
If you observe... Then you...
  • A substantial amount of free memory 1
  • Little or no paging 2
  • Little or no swapping 3
Can rule out memory limitations.
  • Significant inswapping
  • Little free memory
  • Significant paging
Should investigate memory limitations further. (See Chapter 7.)


1See the entries for Free List Size and Modified List Size.
2See the Page Fault Rate.
3See the Inswap Rate.

You can also determine memory limitations by using SHOW SYSTEM to review the RW_FPG and RW_MPG parameters. If either parameter is displayed consistently, there is a serious shortage of memory. Very little improvement can be made by tuning the system. Compaq recommends buying more memory.

5.2.2 I/O Limitations

I/O limitations occur when the number or speed of devices is insufficient. You will also find an I/O limitation when application design errors either place inappropriate demand on particular devices or do not employ sufficiently large blocking factors or numbers of buffers.

To determine if you may have an I/O limitation, enter the DCL command MONITOR IO or MONITOR SYSTEM and observe the rates for direct I/O and buffered I/O.
If... Then you...
Your system is not performing any direct I/O Do not have a disk I/O limitation.
You observe that there is no buffered I/O Do not have a terminal I/O limitation.
Either or both operations are occurring Cannot rule out the possibility of an I/O limitation. (See Chapter 8.)

5.2.3 CPU Limitations

The CPU can become the binding resource when the work load places extensive demand on it. Perhaps all the work becomes heavily computational, or there is some condition that gives unfair advantages to certain users.

To determine if there is a CPU limitation, use the DCL command MONITOR STATES.

You might also use the DCL command MONITOR MODES to observe the amount of user mode time. The MONITOR MODES display also reveals the amount of idle time, which is sometimes called the null time.
If... Then...
Many of your processes are in the computable state There is a CPU limitation.
Many of your processes are in the computable outswapped state Be sure to address the issue of a memory limitation first. (See Section 9.2.4.)
The user mode time is high It is likely there is a limitation occurring around the CPU utilization.
There is almost no idle time The CPU is being heavily used.

A final indicator of a CPU limitation that the MONITOR MODES display provides is the amount of kernel mode time. A high percentage of time in kernel mode can indicate excessive consumption of the CPU resource by the operating system. This problem is more likely the result of a memory limitation but could indicate a CPU limitation as well. If you decide to investigate the CPU limitation further, proceed through the steps in Chapter 9.

5.3 After the Preliminary Investigation

When you have completed your preliminary investigation, you are ready to:

5.3.1 Observing the Tuned System

Once you take the appropriate remedial action, monitor the effectiveness of the changes and, if you do not obtain sufficient improvement, try again. In some cases, you will need to repeat the same steps, but either increase or decrease the magnitude of the changes you made. In other cases, you will proceed further in the investigation and uncover some other underlying cause of the problem and take corrective steps.

The diagrams and text do not attempt to depict this looping. Rather, repetition is always implied, pending the outcome of the changes. Therefore, tuning is frequently an iterative process. The approach to tuning presented by this chapter and Chapter 10 assumes that you can uncover multiple causes of performance problems by repeating the steps shown until you achieve satisfactory performance.

Note

Effective tuning requires that you can observe the undesirable performance behavior while you test.

5.3.2 Obtaining a Listing of System Current Values

You will find it especially helpful to keep a listing of the current values of all your system parameters nearby as you conduct the following investigations. Running SYSGEN and specifying a file name is one method for obtaining this listing. (See the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.)


$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> SET/OUTPUT=filename
SYSGEN> SHOW/ALL
SYSGEN> SHOW/SPECIAL
SYSGEN> EXIT
$ PRINT/DELETE filename


Chapter 6
Managing System Resources

Overall responsiveness of a system depends largely on the responsiveness of its CPU, memory, and disk I/O resources. If each resource responds satisfactorily, then so will the entire system.

6.1 Understanding System Responsiveness

Each resource must operate efficiently by itself and it must also interact with other resources.

An important aspect of your evaluation is to distinguish between resources that might be performing poorly because they are overcommitted and those that might be doing so because one or both of the following conditions has occurred:

6.1.1 Detecting Bottlenecks

A binding resource or bottleneck is an overcommitted resource that causes the others to be blocked or burdened with overhead operations. Proper identification of such a resource is critical to correction of a performance problem. Upgrading a nonbinding resource will do nothing to improve a bottlenecked system.

Detecting bottlenecks is particularly important for analyzing interactions of the CPU with each of the other resources.

Example

For example, CPU blockage occurs when CPU capacity, though it appears sufficient to meet demand, cannot be used because the CPU must wait for disk I/O to complete or memory to be allocated.

6.1.2 Balancing Resource Capacities

Because of the potential for bottlenecks, it is especially important to maintain balance among the capacities of your system's resources.

Example

For example, when upgrading to a faster CPU, consider the effect the additional CPU power will have on the other primary resources. Because the faster CPU can initiate more I/O requests per unit of time, you must ensure that the disk I/O subsystem has sufficient capacity to handle the increased traffic.

6.2 Evaluating Responsiveness of System Resources

For each resource, key MONITOR statistics help you answer such questions as:

Two prime measures of resource responsiveness include:

For each resource, you can use MONITOR summaries to examine or estimate one or both of these quantities.

6.3 Improving Responsiveness of System Resources

You can investigate four main ways to improve responsiveness:

If the responsiveness of a poorly performing resource cannot be improved by these methods, you should consider augmenting its capacity with additional or upgraded hardware.


Previous Next Contents Index

  [Go to the documentation home page] [How to order documentation] [Help on this site] [How to contact us]  
  privacy and legal statement  
6491PRO_005.HTML