OpenVMS Performance Management

Document revision date: 30 March 2001

7.3.5 System Page Faulting

All the system tuning solutions for excessive paging involve a reallocation of the memory resource, and nothing more. Consider the following suggestions:

Do not reduce the size of the operating system's working set in order to offer that memory to the process working sets or the page cache. A page fault incurred by the system costs far more in performance than a hard or soft page fault incurred by another process.
You should always strive to keep the system page fault rate below 2 faults per second. (You can observe the system fault rate with the MONITOR PAGE command.)
Rather than reducing the system's working set and risking the possibility of introducing system page faulting, you should consider purchasing more memory first.
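
The 2 faults-per-second guideline can be checked against a series of sampled rates. The following is a minimal sketch, assuming the samples were copied by hand from the system fault rate shown by MONITOR PAGE; the function name and the averaging approach are illustrative assumptions, not part of any OpenVMS utility.

```python
from statistics import mean

# Guideline from the text: keep the system fault rate below 2 faults per second.
SYSTEM_FAULT_LIMIT = 2.0


def system_faulting_excessive(samples, limit=SYSTEM_FAULT_LIMIT):
    """Return True when the average sampled system fault rate reaches the limit.

    samples: system fault rates (faults per second) noted from MONITOR PAGE.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return mean(samples) >= limit
```

Averaging several samples, rather than reacting to a single reading, is a design choice made here to avoid flagging a momentary spike.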

7.3.6 Page Cache Is Too Small

In situations of excessive paging not due to image activations, you should determine what kinds of faults and faulting rates exist. Use the MONITOR PAGE command and your knowledge of your work load. If you are experiencing a high hard fault rate (represented by Page Read I/O Rate), evaluate the overall faulting rate (represented by Page Fault Rate). If the overall faulting rate is low while the hard fault rate is high, the page cache is ineffective; that is, the size of the free-page list, the modified-page list, or both, is too small. You need to increase the size of the cache. This relatively rare problem occurs when a system has been mistuned; for example, perhaps AUTOGEN was bypassed.

Before deciding to acquire more memory, try increasing the values of MPW_LOLIMIT, MPW_THRESH, FREEGOAL, and FREELIM. (See Section 11.3.) You might also try reducing the system parameter BALSETCNT or reducing the working set characteristics. However, if these changes immediately produce the problems that arise when the cache is too large and the working sets are too small (and lowering the cache parameter values slightly does not bring them into balance), you have no other tuning options. You must reduce demand or acquire more memory. (See Section 11.26.)

7.3.7 Saturated System Disk

If you have the combination of a high hard fault rate with high faulting overall, it is quite possible the load is too high on your system, which means that the system disk is saturated and you must reduce the page faulting to disk.

However, first perform the checks described in Chapter 11 for small working set sizes. This action will rule out or correct the possibility that the combination of heavy overall faulting with heavy hard faulting is due to too large a page cache while too many processes attempt to work with small working sets. The solution will require you to reduce the cache size and increase the WSQUOTA values.

If this investigation fails to produce results, you can conclude that the system disk is saturated. Therefore, you should consider:

Adding another paging file on another disk
Reducing demand
Adding more memory
Adding a faster, larger, or smarter disk
Configuring XFC

Because of the commoditization of components, prices have fallen significantly over the years and more than one option may be affordable. When evaluating the costs of different components, consider the cost of detailed analysis and the cost of the associated delay. Adding the more expensive component tomorrow may cost less than adding a cheaper component a week from today. Also note that the more expensive component may deliver other benefits for the rest of the system as a whole.

7.3.8 Page Cache Is Too Large

If you find that your faults are mostly of the soft variety, check to see if the overall faulting rate is high. If so, you might have the relatively rare problem of an unnecessarily large page cache. As a guideline, you should expect the size of your page cache to be one order of magnitude less than the total memory consumed by the balance set under load conditions.

The only way to create a page cache that is too large is by seriously mistuning a system. (Perhaps AUTOGEN was bypassed.) Section 11.4 describes how to reduce the size of the page cache through the MPW_LOLIMIT, MPW_THRESH, FREEGOAL, and FREELIM system parameters.
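
Sections 7.3.6 through 7.3.8 distinguish three cases by comparing the hard fault rate (Page Read I/O Rate) with the overall rate (Page Fault Rate). The sketch below restates that decision in code. The threshold values that define "high" are placeholders assumed for illustration only; the text gives no numeric thresholds, so choose values that suit your own work load.

```python
def classify_paging(page_fault_rate, page_read_io_rate,
                    high_overall=100.0, high_hard=10.0):
    """Map two MONITOR PAGE rates onto the cases in Sections 7.3.6 to 7.3.8.

    high_overall and high_hard are hypothetical, site-specific thresholds.
    """
    overall_high = page_fault_rate >= high_overall
    hard_high = page_read_io_rate >= high_hard
    if hard_high and not overall_high:
        # Section 7.3.6: hard faults dominate a low overall rate.
        return "page cache too small"
    if hard_high and overall_high:
        # Section 7.3.7: rule out small working sets before blaming the disk.
        return "check working set sizes, then suspect a saturated system disk"
    if overall_high:
        # Section 7.3.8: heavy faulting that is mostly soft.
        return "page cache may be too large"
    return "no excessive paging indicated"
```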

7.3.9 Small Total Working Set Size

If your page cache size is appropriate, you need to investigate the likelihood that excessive paging is induced when a number of processes attempt to run with working set sizes that are too small for them. If the total memory for the balance set is too small, one of the following three possibilities (or a combination thereof) is at work:

The working set size may be inappropriate because:
- The working sets have been set too small with the WSDEFAULT and WSQUOTA characteristics in the UAF.
- The effective working set quota has been lowered by DCL commands or system services that were invoked as the process ran.
- The processes are not succeeding in borrowing working set space (in the loan region).
Perhaps the automatic working set adjustment feature (AWSA) has been turned off or is for some reason not as effective as it could be.
Swapper trimming may be reducing the working set sizes too vigorously.

Figures A-4, A-5, and A-6 summarize the procedures for isolating the cause of working set sizes that are too small.

7.3.10 Inappropriate WSDEFAULT, WSQUOTA, and WSEXTENT Values

Begin to narrow down the possible causes of unusually small total working set sizes by looking first at your system's allocation of working set sizes. To gain some insight into the work load and which processes have too little memory, do the following:

Enter the MONITOR PROCESSES/TOPFAULT command to learn which processes are faulting because their working set sizes are too small.
Use the SHOW PROCESS/CONTINUOUS command to learn what the top faulting processes are doing and how much memory they are using.
Look at the memory consumed by the other larger processes by entering the SHOW SYSTEM and MONITOR PROCESSES commands.

Perhaps you can conclude that one large process (or several) does not need as much memory as it is using. If you reduced its WSQUOTA or WSEXTENT values, or both, the other processes could use the memory the large process currently takes. (For more information, see Section 11.5.)

7.3.10.1 Learning About the Process

To form any firm conclusions at this point, you need to learn more about the process's behavior as its working set size grows and shrinks. Use the MONITOR PROCESSES command and the lexical function F$GETJPI for this purpose.

To look at the current values as the process executes, follow these steps:

Note the process identification number (PID) on the MONITOR PROCESSES display.
Ensure that you have the WORLD privilege.
For each heavily faulting process you want to investigate, request these items:
Working set quota size
Process page count
Global page count
Working set extent

7.3.10.2 Obtaining Process Information

To request the items, use the system service SYS$GETJPI or the lexical function F$GETJPI. When using F$GETJPI, specify the process ID (PID) in quotation marks and a keyword (GPGCNT, PPGCNT, WSEXTENT, WSQUOTA, or WSSIZE) denoting the type of process information to be returned as shown in the following example:

$ WSQUOTA = F$GETJPI("pid","WSQUOTA")
$ SHOW SYMBOL WSQUOTA
$ WSSIZE = F$GETJPI("pid","WSSIZE")
$ SHOW SYMBOL WSSIZE
$ PPGCNT = F$GETJPI("pid","PPGCNT")
$ SHOW SYMBOL PPGCNT
$ GPGCNT = F$GETJPI("pid","GPGCNT")
$ SHOW SYMBOL GPGCNT
$ WSEXTENT = F$GETJPI("pid","WSEXTENT")
$ SHOW SYMBOL WSEXTENT

Suggestion: Write a program or command procedure that requests the PID and then formats and displays the resulting data.

The lexical function item PPGCNT represents the process page count, while GPGCNT represents the global page count. You need these values to determine how full the working set list is. The sum of PPGCNT plus GPGCNT is the actual amount of memory in use and should always be less than or equal to the value of WSSIZE. By sampling the actual amount of memory in use while processes execute, you can begin to evaluate just how appropriate the values of WSQUOTA and WSEXTENT are.
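
The arithmetic described above is simple enough to sketch. The following assumes the three values were obtained with F$GETJPI and entered by hand; the function name is hypothetical.

```python
def working_set_usage(ppgcnt, gpgcnt, wssize):
    """Return (pages in use, fraction of the working set list that is full).

    ppgcnt: process page count (PPGCNT)
    gpgcnt: global page count (GPGCNT)
    wssize: current working set size (WSSIZE)
    """
    if wssize <= 0:
        raise ValueError("WSSIZE must be positive")
    in_use = ppgcnt + gpgcnt  # actual amount of memory in use
    return in_use, in_use / wssize
```

For example, a process with PPGCNT 300 and GPGCNT 100 in a working set of 500 is using 400 pages, or 80 percent of its working set list. Repeating the calculation at intervals shows how close the process runs to its limit over time.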

If the values of WSQUOTA and WSEXTENT are either unnecessarily restricted or too large in a few obvious cases, they need to be adjusted; proceed next to the discussion of adjusting working sets in Section 11.5.

7.3.11 Ineffective Borrowing

If you observe that few of the processes are able to take advantage of loans, then borrowing is ineffective. Section 11.6 discusses how to make the necessary adjustments so that borrowing is more effective.

7.3.12 AWSA Might Be Disabled

You need to investigate the status of automatic working set adjustment (AWSA) by checking the value of the system parameter WSINC. If you find WSINC is greater than zero, you know that automatic working set adjustment is turned on. (More precisely, the part of automatic working set adjustment that permits working set sizes to grow is turned on.) However, at the same time, you should also check whether WSDEC, PFRATL, or both, are zero. While setting WSINC=0 turns the full automatic working set adjustment mechanism off, setting PFRATL=0 when WSINC is greater than zero disables only the part of automatic working set adjustment that provides the voluntary decrements in the working set sizes. (For example, in Figure 3-5, if PFRATL and WSDEC equaled zero, the actual working set limit line would have leveled off at Q4 and would not have changed until Q18.)
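
The parameter checks in this section reduce to a small decision, sketched below. The function name and the returned labels are hypothetical, and treating a zero value in either WSDEC or PFRATL as disabling voluntary decrementing is an interpretation of the text above rather than a statement of how the executive tests these parameters.

```python
def awsa_status(wsinc, wsdec, pfratl):
    """Summarize the AWSA state implied by three system parameter values."""
    if wsinc == 0:
        # WSINC=0 turns the whole mechanism off.
        return "off"
    if pfratl == 0 or wsdec == 0:
        # Working sets can grow, but are not voluntarily decremented.
        return "growth only"
    return "growth and voluntary decrementing"
```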

If automatic working set adjustment is disabled, processes are unable to increase their working set sizes. You will observe that although processes have WSQUOTA values greater than their WSDEFAULT values, those processes that are currently active (doing some computing) do not show a working set size count above their WSDEFAULT values. At the same time, your system is experiencing heavy page faulting. You should enable automatic working set adjustment, by setting WSINC greater than zero, so that working set growth is possible.

7.3.13 AWSA Is Ineffective

If AWSA is turned on, there are four ways that it could be performing less than optimally, and you must evaluate them:

AWSA may not be responding quickly enough to increased demand. That is, when page faulting increases significantly, working set sizes are not increased quickly enough to sufficiently large values.
AWSA with voluntary decrementing enabled may be causing the working set sizes to oscillate.
AWSA with voluntary decrementing enabled may be shrinking the working sets too quickly, thereby inducing unnecessary paging.
AWSA may not be decrementing the working set sizes where possible, because voluntary decrementing is disabled.

7.3.13.1 AWSA Is Not Responsive to Increased Demand

If you use the SHOW PROCESS/CONTINUOUS command for those processes that MONITOR PROCESSES/TOPFAULT shows are the heaviest page faulters, you might find that the automatic working set adjustment is not increasing their working set sizes quickly enough in response to their faulting. If the default values of WSINC, PFRATH, or AWSTIME have been changed, you should restore them to their original values and consider adjusting the WSDEFAULT and WSQUOTA values of the offending process.

7.3.13.2 AWSA with Voluntary Decrementing Enabled Causes Oscillations

It is possible for the voluntary decrementing feature of the automatic working set adjustment to cause processes to go into a form of oscillation where the working set sizes never stabilize, but keep growing and shrinking while accompanied by page faulting. When you observe this situation, through the SHOW PROCESS/CONTINUOUS display, you should disable voluntary decrementing by setting PFRATL=0. See Section 11.8.
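
One way to make the oscillation described above less subjective is to note the working set size at regular intervals and count how often it changes direction. This is a hypothetical heuristic, not a procedure from this manual; the sample values would be read by hand from the SHOW PROCESS/CONTINUOUS display.

```python
def count_reversals(sizes):
    """Count changes of direction (growth to shrinkage or back) in a series
    of working set size samples. Repeated equal samples are ignored."""
    reversals = 0
    previous_direction = 0
    for earlier, later in zip(sizes, sizes[1:]):
        direction = (later > earlier) - (later < earlier)  # +1, 0, or -1
        if direction == 0:
            continue
        if previous_direction and direction != previous_direction:
            reversals += 1
        previous_direction = direction
    return reversals
```

A working set that grows and then levels off produces no reversals, while one that keeps growing and shrinking produces a reversal on almost every sample. How many reversals count as oscillation is a judgment left to the observer.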

7.3.13.3 AWSA Shrinks Working Sets Too Quickly

From the SHOW PROCESS/CONTINUOUS display, you can also determine if the voluntary decrementing feature of automatic working set adjustment is shrinking the working sets too quickly. In that event, you should consider decreasing WSDEC and decreasing PFRATL. See Section 11.9.

7.3.13.4 AWSA Needs Voluntary Decrementing Enabled

You might observe the case of one or more processes that rapidly achieve a very large working set count and then maintain that size over some period of time. However, you know or suspect that those processes should not require that much memory continuously. Although those processes are not page faulting, other processes are. You should check whether voluntary decrementing is turned off (PFRATL=0 and optionally WSDEC=0). See Figure A-6. It may be that, for your work load, voluntary decrementing would bring about improvement since it is time based, not load based. You could enable voluntary decrementing according to the suggestions in Section 11.10 to see if any improvement is forthcoming.

If you decide to take this step, keep in mind that it is the exception rather than the rule. You could make conditions worse rather than better. Be certain to monitor your system very carefully to ensure that you do not induce working set size oscillations in your overall work load, as described previously. If no improvement is obtained, you should turn off voluntary decrementing. Probably your premise that the working set size could be reduced was incorrect. Also, if oscillations do result that do not seem to stabilize with a little time, you should turn voluntary decrementing off again. You must explore, instead, ways to schedule those processes so that they are least disruptive to the work load.

7.3.13.5 Swapper Trimming Is Too Vigorous

Perhaps there are valid reasons why at your site WSINC has been set to zero to turn off automatic working set adjustment. For example, the applications might be well understood, and the memory requirements for each image might be so predictable that the value for WSDEFAULT can be accurately set. Furthermore, it is possible that if automatic working set adjustment is enabled at your site, you are satisfied that your system is using appropriate values for WSQUOTA, WSEXTENT, PFRATH, BORROWLIM, and GROWLIM. In these situations, perhaps swapper trimming is to blame for the excessive paging. In particular, perhaps trimming on the second level is too severe.

Figure A-7 illustrates the investigation for paging problems induced by swapper trimming. Again, you must determine the top faulting processes and evaluate what is happening and how much memory is consumed by these processes. Use the MONITOR PROCESSES/TOPFAULT and MONITOR PROCESSES commands. By selecting the top faulting processes and scrutinizing their behavior with the SHOW PROCESS/CONTINUOUS command, you can determine if there are many active processes that seem to display working set sizes with the following values:

Their WSQUOTA values
The systemwide value set by the system parameter SWPOUTPGCNT

Either finding indicates that swapper trimming is too severe.
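
The comparison just described can be sketched as follows, assuming the working set size and WSQUOTA of each active process have been collected by hand. The function name and the dictionary layout are hypothetical.

```python
def trimmed_processes(working_sets, swpoutpgcnt):
    """Return the processes whose working set size sits at a trimming level.

    working_sets: maps a process name to a (wssize, wsquota) pair.
    swpoutpgcnt:  value of the system parameter SWPOUTPGCNT.
    """
    flagged = {}
    for name, (wssize, wsquota) in working_sets.items():
        if wssize == swpoutpgcnt:
            flagged[name] = "SWPOUTPGCNT"
        elif wssize == wsquota:
            flagged[name] = "WSQUOTA"
    return flagged
```

Many flagged entries among the top faulting processes correspond to the finding that swapper trimming is too severe.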

If such is the case, consider increasing the system parameter SWPOUTPGCNT while evaluating the need to increase the system parameter LONGWAIT. The swapper uses LONGWAIT to detect those processes that are truly idle. If LONGWAIT specifies too brief a time, the swapper can swap temporarily idle processes that would otherwise have become computable again soon (see Section 11.12). For computable processes, the same condition can occur if DORMANTWAIT is set too low.

7.4 Analyzing the Swapping Symptom

Experience with systems has shown that swapping of active processes is less desirable than modest paging, because swapping always involves disk accesses, whereas among page faults only hard faults do. Swapping requires each process and its context to be written out to disk, an event that is normally slower than the average paging operation, since it involves more blocks. There is additional system overhead for swapping caused by stopping and starting processes. In using the disk resource heavily, the swapper might cause additional entries in the queue on its disk, thus delaying other processes that need access to that disk.

Not only is swapping costly in terms of performance, but its relative cost is higher for slower processors. In fact, the single-disk, slower-speed system pays the highest price of all for swapping, since all other access to the disk is delayed while the disk is used for swapping. If your processor speed is an issue, you could decide to reduce swapping and make yours a system that primarily pages.

7.4.1 Detecting Harmful Swapping

Harmful swapping manifests itself in heavy consumption of the CPU resource and the disk, to the detriment of other processes. Use the following tests to check for any symptoms that indicate swapping is harmful:

Enter the DCL command MONITOR IO and examine the inswap rate. If the rate is zero, you have no swapping, and you need not pursue this series of tests any further.
Check the MONITOR MODES/CPU display to see if the inner modes (Executive, Kernel) receive a significant amount of service from the CPU. If you find this condition with swapping, the swapping is definitely harmful and needs to be remedied.
Enter the DCL command MONITOR STATES. If you observe few processes in the COMO (computable, outswapped) state, swapping is not affecting CPU operations. COMO is the state in the MONITOR STATES display that indicates that outswapped processes are ready to use the processor.

If your swapping passes these three tests, you can conclude that swapping is not so harmful on your system that you should eliminate it.
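
The three tests can be combined into a single check, sketched below. The limits for "significant" inner-mode time and "few" COMO processes are placeholders assumed for illustration; the text does not give numeric values, and the function name is hypothetical.

```python
def swapping_is_harmful(inswap_rate, inner_mode_percent, como_count,
                        inner_mode_limit=30.0, como_limit=2):
    """Apply the three tests of Section 7.4.1 to hand-collected readings.

    inswap_rate:        inswap rate from MONITOR IO
    inner_mode_percent: combined Executive and Kernel time from MONITOR MODES
    como_count:         number of processes in the COMO state (MONITOR STATES)
    """
    if inswap_rate == 0:
        # No swapping at all, so the remaining tests do not apply.
        return False
    return inner_mode_percent >= inner_mode_limit or como_count > como_limit
```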

7.4.2 Investigating Harmful Swapping

Indications of harmful swapper activity, such as heavy disk or CPU consumption, warrant attention. (Figures A-8, A-9, and A-10 summarize the investigation for swapping.)

Limiting Swapping

Consider converting your system to one that only pages and rarely if ever swaps, particularly if your system is a small configuration. You accomplish this by performing the following tasks:

Lowering the system parameter SWPOUTPGCNT
Setting the system parameter BALSETCNT equal to a value that is two less than the value of the system parameter MAXPROCESSCNT
Adding more memory

Reducing Process Working Set

Optionally, you could decide to reduce the process working set quotas (in the UAF). See Section 11.5.

Even if you tune your system so that it rarely swaps, you still need a swapping file on your system. However, the space requirement for the swapping file is reduced. If disk space is at a premium, you can adjust your swapping file space requirement to 75 percent of its previous value with the AUTOGEN command procedure. (See the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.)

7.4.3 Causes of Harmful Swapping

If you find that your system is showing symptoms of harmful swapping and that performance has degraded, there are two possible causes: a lack of free balance slots, and insufficient free memory for all the working sets.

No Free Balance Slots

To find out whether a shortage of balance slots is the cause, use the DCL command SHOW MEMORY to check the number of free balance slots. If the number available is small and you know there is still adequate free memory (which you can also check with SHOW MEMORY), then you should be able to alleviate the swapping by increasing the system parameter BALSETCNT (see Section 11.14).

On VAX, if you have no free balance slots, check the system parameter VBSS_ENABLE to determine whether virtual balance slots are enabled. See Section 3.6.6 for more information about virtual balance slots.

Insufficient Free Memory for All the Working Sets

If there are free balance slots but the total of the working set sizes exceeds available memory, you can safely conclude that there is not enough free memory to support all the working sets at once. This condition can result from one or more of the following factors:

Improper partitioning of memory due to a page cache that is too large
Situations where some users use unreasonably large amounts of memory
Demand that is simply too high for capacity
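
The distinction this section draws between the two causes of harmful swapping can be sketched as a simple decision. The sketch assumes the figures were read from SHOW MEMORY and summed by hand; the function name, the definition of "few" free slots, and the use of "total working set sizes fit in available memory" as the test for adequate free memory are all assumptions made for illustration.

```python
def swapping_cause(free_balance_slots, total_working_set_pages,
                   available_memory_pages, few_slots=2):
    """Suggest which cause of harmful swapping to investigate first."""
    if total_working_set_pages > available_memory_pages:
        # Memory, not balance slots, is the limiting resource.
        return "insufficient free memory for all working sets"
    if free_balance_slots <= few_slots:
        # Memory is adequate, so more balance slots should help.
        return "no free balance slots: consider increasing BALSETCNT"
    return "neither cause indicated"
```

Memory is tested first because increasing BALSETCNT helps only when adequate free memory remains.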

Large Page Cache

To determine if the page cache is too large, do the following:

Use the SHOW MEMORY display to determine the total usable memory (the total physical memory less the memory used by the operating system).
Add the values for the two system parameters FREEGOAL and MPW_THRESH to determine how much memory is allocated to the page cache. If the page cache size is more than 15 percent of the total usable memory, the page cache may be too large.
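
The two steps above amount to a ratio test. A minimal sketch follows, assuming FREEGOAL, MPW_THRESH, and the usable memory figure are all expressed in the same units; the function name is hypothetical.

```python
def page_cache_may_be_too_large(freegoal, mpw_thresh, usable_memory,
                                limit=0.15):
    """Return True when the page cache exceeds 15 percent of usable memory.

    freegoal, mpw_thresh: values of the two system parameters
    usable_memory:        total physical memory less operating system memory
    """
    if usable_memory <= 0:
        raise ValueError("usable memory must be positive")
    page_cache = freegoal + mpw_thresh
    return page_cache / usable_memory > limit
```

For example, with FREEGOAL at 2000, MPW_THRESH at 4000, and 30000 usable pages, the cache takes 20 percent of usable memory and the check returns True.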

Only when a system has been seriously mistuned should you find that the page cache is too large. (Perhaps AUTOGEN was bypassed.) Section 11.4 describes how to reduce the size of the page cache through the MPW_LOLIMIT, MPW_THRESH, FREEGOAL, and FREELIM system parameters.

If you determine that the page cache is not too large, or if, having reduced its size, you find that there is still insufficient free memory for all the working sets, you need to investigate other potential causes for the problem. These causes are described in the next sections.

7.4.4 Why Processes Consume Unreasonable Amounts of Memory

Swapping can be induced whenever one or a small number of processes devour memory at the expense of other processes. You can find out if a few users are using large amounts of memory by examining the display produced by the MONITOR PROCESSES command.

7.4.5 Large, Compute-Bound Processes

At this point, you should be particularly alert for the situation where one or more very large, compute-bound processes at low priority consume memory at the expense of a number of smaller processes. Typically, the smaller processes might be trying to perform some terminal I/O, such as editing. When memory becomes tight, the large process that is compute bound is less likely to be selected for outswapping than any process that is in the local event flag wait state. Consequently, in this situation, the operating system will select processes running the editor for outswapping as soon as they start to wait for I/O. As a result, the editing processes will experience poor response times due to frequent outswapping. The SHOW SYSTEM command provides a valuable tool for checking the priority and state of the large process.

Note the process identification number from the MONITOR PROCESSES display and ensure that you have the WORLD privilege. Then, for each large process you want to investigate, use the lexical function F$GETJPI as described in Section 7.3.10, to request the working set quota, size, process page count, global page count, and working set extent.

If you find that any of the processes are above their working set quotas, decrease DORMANTWAIT and monitor performance for a time. If decreasing DORMANTWAIT proves ineffective, enter the DCL command SET PROCESS/SUSPEND to suspend the large, compute-bound process that is over WSQUOTA. This action offers a rapid means of restoring other process activities. (Once the process is suspended, the swapper can trim the process to its SWPOUTPGCNT value.) As soon as SHOW PROCESS/CONTINUOUS reveals that the process has been trimmed, you can safely resume it. If AWSA is set correctly, the problem should not recur, since the process will be unable to grow beyond its quota while memory is scarce.

However, you must determine the underlying cause of the problem (for example, the working set quota might be too large for the process) and take corrective action. For example, you could lower WSQUOTA and increase WSEXTENT. Borrowing will then be reclaimed by the swapper. If the large, compute-bound process is not above its working set quota, suspending the process may provide temporary relief, but as soon as you allow the process to resume, it can start to devour memory again. Thus, the most satisfactory corrective action is the permanent solution discussed in Section 11.5.
