Document revision date: 30 March 2001

OpenVMS Performance Management

Chapter 7
Evaluating the Memory Resource

The key to successful performance management of an OpenVMS system is to keep the memory management activity to a minimum. You will find that memory limitations cause paging, swapping, or both, precisely the activities you want to minimize. It requires skillful balancing of the memory management mechanism to reduce one without incurring too much of the other.

7.1 Understanding the Memory Resource

The memory resource shares some similarities with the other resources, but it exhibits some notable differences. It is similar to the CPU and disk in that it is a single resource pool that must be shared, but different in the sense that it can be separated into pieces of varying size, all of which can be allocated to processes simultaneously. A process can retain its allocation of memory until memory is demanded by other processes (page faulting), at which time the sizes of the pieces are reconfigured. In some cases, certain processes must wait longer for their allocations (swapping).

7.1.1 Working Set Size

The key to good performance of the memory subsystem is to maintain working sets of appropriate size for resident processes. As a rule, the total of all resident process working set quotas should be within the amount of free memory available on the system. When there is abundant free memory available, the borrowing mechanism of the memory management subsystem allows working sets to grow to the value specified in the user authorization file by WSEXTENT. However, you should set the WSQUOTA value so that user programs can have reasonable faulting behavior even if they can grow only to WSQUOTA.
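
The rule of thumb above comes down to simple arithmetic. The following Python sketch is illustrative only: the quota figures are WS Quota values for a few processes in Example 7-2, and the free-memory figure is a hypothetical number, not a measurement.

```python
# Sketch: compare the sum of resident working set quotas with free memory.
# The free-memory value is a hypothetical figure chosen for illustration.

def quota_headroom(wsquotas, free_pages):
    """Return free pages left over after every process grows to WSQUOTA.

    A negative result means the combined quotas exceed free memory.
    """
    return free_pages - sum(wsquotas)

# WS Quota values (in pages) for five processes in Example 7-2.
wsquotas = [512, 512, 750, 1024, 1024]
free_pages = 5000  # hypothetical free memory, in pages

headroom = quota_headroom(wsquotas, free_pages)
print("Total quota:", sum(wsquotas), "pages; headroom:", headroom, "pages")
```

A negative headroom suggests that the processes could not all grow to WSQUOTA at the same time without paging or swapping.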

7.1.2 Locality of Reference

Erratic code and data reference patterns by user programs can cause memory to be used inefficiently. Locality of reference is a characteristic of a program that indicates how close or far apart the references to locations in virtual memory are over time. A program with a high degree of locality does not refer to many widely scattered virtual addresses in a short period of time. If an application has been designed with poor virtual address reference patterns, it can require an extremely large WSQUOTA value to perform satisfactorily.

In addition, applications such as AI and CAD/CAM, which perform an inordinately large amount of dynamic memory allocation, often require very large WSQUOTA values. Database programs may also benefit from larger working sets if they cache significant amounts of data or indexes in memory.
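
To make the idea of locality concrete, the following Python sketch counts how many distinct pages two reference patterns touch over the same number of references. The page size, element size, and access patterns are assumptions chosen for illustration; they do not describe any particular application.

```python
# Sketch: count the distinct pages touched by two reference patterns.
# PAGE_SIZE and ELEMENT_SIZE are assumed values; the patterns are hypothetical.

PAGE_SIZE = 512      # bytes per page (assumed)
ELEMENT_SIZE = 8     # bytes per array element (assumed)

def pages_touched(indexes):
    """Return the number of distinct pages referenced by the element indexes."""
    return len({(i * ELEMENT_SIZE) // PAGE_SIZE for i in indexes})

WINDOW = 1000  # number of references examined

# High locality: 1000 consecutive elements.
sequential = range(WINDOW)
# Poor locality: 1000 elements, each 64 elements (one page) apart.
scattered = range(0, WINDOW * 64, 64)

print("sequential:", pages_touched(sequential), "pages")
print("scattered: ", pages_touched(scattered), "pages")
```

With these assumed sizes, the scattered pattern touches a different page on every reference, which is why a program with poor locality needs a much larger working set to avoid faulting.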

7.1.3 Obtaining Working Set Values

One way to obtain information about working set values on the running system (Example 7-2) is to use the procedure shown in Example 7-1. You may want to execute it several times during some representative period of loading to gain an idea of the steady-state working set requirements for your system.

Example 7-1 Procedure to Obtain Working Set Information

$! WORKING_SET.COM - Command file to display working set information. 
$!                   Requires 'WORLD' privilege to display information 
$!                   on processes other than your own. 
$! the next symbol is used to insert quotes into command strings 
$! because of the way DCL processes quotes, you can't have a 
$! trailing comment after the quotes on the next line. 
$ quote = """ 
$ pid = ""  ! initialize to blank 
$ context = ""  ! initialize to blank 
$! Define a format control string which will be used with 
$! F$FAO to output the information.  The width of the 
$! string will be set according to the width of the 
$! display terminal (the image name is truncated, if needed). 
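$ IF F$GETDVI ("SYS$OUTPUT", "DEVBUFSIZ") .LE. 80 THEN - 
    ctrlstring = "!AS!15AS!5AS!5(6SL)!7SL !10AS" 
$ IF F$GETDVI ("SYS$OUTPUT", "DEVBUFSIZ") .GT. 80 THEN - 
    ctrlstring = "!AS!15AS!5AS!5(6SL)!7SL !AS" 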
$! Check to see if this procedure was invoked with the PID of 
$! one specific process to check.  If it was, use that PID.  If 
$! not, the procedure will scan for all PIDs where there is 
$! sufficient privilege to fetch the information. 
$ IF p1 .NES. "" THEN pid = p1 
$! write out a header. 
$ WRITE sys$output - 
"                          Working Set Information" 
$ WRITE sys$output "" 
$ WRITE sys$output - 
"                                 WS    WS    WS     WS   Pages  Page" 
$ WRITE sys$output - 
"Username    Processname   State  Extnt Quota Deflt  Size in WS  Faults Image" 
$ WRITE sys$output "" 
$! Begin collecting information. 
$ collect_loop: 
$ IF P1  .EQS. "" THEN pid = F$PID (context) ! get the next process's PID 
$ IF pid .EQS. "" THEN EXIT   ! if blank, no more to 
$!      ! check, or no privilege 
$ pid = quote + pid + quote   ! enclose in quotes 
$ username     = F$GETJPI ('pid, "USERNAME") ! retrieve proc. info. 
$ IF username .EQS. "" THEN GOTO collect_loop ! if blank, no priv.; try 
$!      ! next PID 
$ processname  = F$GETJPI ('pid, "PRCNAM") 
$ imagename    = F$GETJPI ('pid, "IMAGNAME") 
$ imagename    = F$PARSE  (imagename,,,"NAME") ! separate name from filespec 
$ state        = F$GETJPI ('pid, "STATE") 
$ wsdefault    = F$GETJPI ('pid, "DFWSCNT") 
$ wsquota      = F$GETJPI ('pid, "WSQUOTA") 
$ wsextent     = F$GETJPI ('pid, "WSEXTENT") 
$ wssize       = F$GETJPI ('pid, "WSSIZE") 
$ globalpages  = F$GETJPI ('pid, "GPGCNT") 
$ processpages = F$GETJPI ('pid, "PPGCNT") 
$ pagefaults   = F$GETJPI ('pid, "PAGEFLTS") 
$ pages        = globalpages + processpages ! add pages together 
$! format the information into a text string 
$ text = F$FAO (ctrlstring, - 
  username, processname, state, wsextent, wsquota, wsdefault, wssize, - 
  pages, pagefaults, imagename) 
$ WRITE sys$output text    ! display information 
$ IF p1 .NES. "" THEN EXIT   ! if invoked for a 
$!      ! specific PID, we're done. 
$ GOTO collect_loop    ! repeat for next PID 

7.1.4 Displaying Working Set Values

The WORKING_SET.COM procedure produces the following display:

Example 7-2 Displaying Working Set Values

                          Working Set Information 
                                   WS    WS    WS    WS  Pages  Page 
Username    Processname   State  Extnt Quota Deflt  Size in WS faults  Image 
SYSTEM      ERRFMT         HIB    1024   512   100    60    60    165 ERRFMT 
SYSTEM      CACHE_SERVER   HIB    1024   512   100   512    75     55 FILESERV 
SYSTEM      CLUSTER_SERVER HIB    1024   512   100    60    60    218 CSP 
SYSTEM      OPCOM          LEF    2048   512   100   210    59   5764 OPCOM 
SYSTEM      JOB_CONTROL    HIB    1024   512   100   360   238   1459 JOBCTL 
SYSTEM      CONFIGURE      HIB    1024   512   100   125   121    101 CONFIGURE 
SYSTEM      SYMBIONT_0001  HIB    1024   512   100   668    57  67853 PRTSMB 
DECNET      NETACP         HIB    1500   750   175  1200   812  10305 NETACP 
DECNET      EVL            HIB    1024   350   175   210    33  84080 EVL 
SYSTEM      REMACP         HIB    1024   350   175    60    47     74 REMACP 
SYSTEM      VAXsim_Monitor HIB    1024   200   100   350   210   1583 VAXSIM 
SYSTEM      DBMS_MONITOR   LEF    1000   512   150    62    62    488 DBMMON 
SYSTEM      TINKERBELLE    LEF    1024   350   175   325   177   1627 
SYSTEM      NULF           COM    1024   350   250   350   246   1007 FAC 
HALL        CFAI           COM    2400  1024   512   662   358    567 CFAI 
VTXUP       VTX_SERVER     LEF    2400  1024   512   962   696    624 VTXSRV 
WEINSTEIN   Jane           LEF    2400  1024   512   662   432  13132 EDT 
HURWITZ     HURWITZ        LEF    2400  1024   512   512   350   4605 
CARMODY     CARMODY        LEF    2400  1024   512   812   546  16822 MAIL 
CAPARILLIO  CAPARILLIO     CUR    2400  1024   512   512   282  10839 
STRATFORD   Kathy          LEF    2400  1024   512   512   210   9852 
FREY        _VTA270:       LEF    2400  1024   512   512   163   1021 
CHRISTOPHER _VTA271:       LEF    2400  1024   512   512   252    379 
STANLEY     STANLEY        LEF    2048  1024   512   512   295  10369 
MINSKY      MINSKY         LEF    2400  1024   512   512   143  60316 
TESTGEN     TESTGEN        LEF    4100  1024   512   234    84  75753 
CLAYMORE    Cluster Buster LEF    2400  1024   512  1262   932   1919 CREATOR 
DINEAUX     Sally          LEF    2400  1024   512   512   330  31803 
DECNET      SERVER_0848    LEF    1024   350   175   325   183    647 NETSERVER 
LUZ         Lars           LEF    2400  1024   512  1024   980  95420 TEX 
DECNET      MAIL_222       LEF    1024   350   175   325   234    526 MAIL 
STEVENS     STEVENS        LEF    2400  1024   512   512   221   7851 
ZEN         _VTA259:       LEF    2400  1024   512  1024   319   4267 SHOW 
ZEN         ZEN_2          LEF    2400  1024   512   512   171   3026 

Field               Description
WS Deflt            Default working set size, which is reestablished at each image activation.
WS Size             Current size of the working set. When the number of pages actually allocated (Pages in WS) reaches this threshold, subsequent page faults will cause page replacement.
Pages in WS         Both private and global pages.
WS Extnt, WS Quota  Threshold values to which WS Size can be adjusted.
Page faults         Total number of faults that have occurred since process creation.
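
If you capture the display in a file, the columns can be examined programmatically. The following Python sketch parses a few rows copied from Example 7-2 and lists the processes whose current working set size exceeds WS Quota, that is, processes that have borrowed memory. The regular expression is an assumption about the column layout shown above, not part of WORKING_SET.COM.

```python
import re

# Sketch: parse rows of the WORKING_SET.COM display and list processes whose
# current working set size exceeds WS Quota. The pattern below is an assumed
# description of the column layout in Example 7-2.
ROW = re.compile(
    r"(?P<user>\S+)\s+(?P<name>.+?)\s+(?P<state>[A-Z]+)"
    r"\s+(?P<extent>\d+)\s+(?P<quota>\d+)\s+(?P<deflt>\d+)"
    r"\s+(?P<size>\d+)\s+(?P<pages>\d+)\s+(?P<faults>\d+)"
    r"(?:\s+(?P<image>\S+))?$"
)

def parse_row(line):
    """Return a dict for one display row, or None if the line is not a row."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    for key in ("extent", "quota", "deflt", "size", "pages", "faults"):
        row[key] = int(row[key])
    return row

sample = """\
SYSTEM      ERRFMT         HIB    1024   512   100    60    60    165 ERRFMT
DECNET      NETACP         HIB    1500   750   175  1200   812  10305 NETACP
HURWITZ     HURWITZ        LEF    2400  1024   512   512   350   4605
CLAYMORE    Cluster Buster LEF    2400  1024   512  1262   932   1919 CREATOR
"""

rows = [r for r in map(parse_row, sample.splitlines()) if r]
borrowing = [r["name"] for r in rows if r["size"] > r["quota"]]
print(borrowing)
```

Process names can contain spaces and the image name can be absent, so the pattern anchors on the six numeric columns rather than splitting on white space.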

7.2 Evaluating Memory Responsiveness

The key measure of responsiveness for the memory management subsystem is the amount of time required for a process to be allocated its share of memory.

Because allocation time is not measured directly, you should be concerned with the rates of the two memory management activities that extend the processing time experienced by processes in a virtual memory system---namely, page faulting and swapping. These activities not only incur overhead on the CPU and disk resources, but they also block the execution of processes during the time the system needs to allocate memory and the time the processes spend waiting for memory allocation.

Thus, your goal in evaluating the memory resource is to ensure that faulting and swapping rates are kept within reasonable bounds.

7.2.1 Page Faulting

Whenever a process references a virtual page that is not in its working set, a page fault occurs. For process execution to continue, memory management software is called to acquire and map a physical page into the working set.

Hard and Soft Page Faults

The fault can be hard or soft. A hard fault (measured by the Page Read I/O Rate item in the MONITOR PAGE class) is one that requires a read operation from a page or image file on disk. A soft fault is one that is satisfied by mapping to a page already in memory; this can be a global page or a page in the secondary page cache. (The secondary page cache consists of the free-page list and the modified-page list; the primary page cache is each process's working set.) The following categories of soft faults are measured and reported in the MONITOR PAGE class: Free List Fault Rate, Modified List Fault Rate, Demand Zero Fault Rate, Global Valid Fault Rate, and Wrt In Progress Fault Rate.

The total Page Fault Rate is equal to the sum of the hard fault rate (Page Read I/O Rate) plus the soft fault rate, which is the sum of the five categories listed above.
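
The relationship among these rates can be sketched in a few lines of Python. All of the rates below are hypothetical values rather than measurements, and the dictionary keys are shorthand labels for the soft-fault items in the MONITOR PAGE display.

```python
# Sketch: combine MONITOR PAGE rates into total, soft, and hard-fault share.
# All numeric rates below are hypothetical values, not measurements.

def fault_summary(hard_rate, soft_rates):
    """Return (total rate, soft rate, hard-fault fraction of the total)."""
    soft_rate = sum(soft_rates.values())
    total = hard_rate + soft_rate
    hard_fraction = hard_rate / total if total else 0.0
    return total, soft_rate, hard_fraction

soft_rates = {  # faults per second (hypothetical)
    "free list": 12.0,
    "modified list": 6.0,
    "demand zero": 9.0,
    "global valid": 8.0,
    "write in progress": 1.0,
}
total, soft, hard_fraction = fault_summary(4.0, soft_rates)
print(f"total={total:.1f}/s soft={soft:.1f}/s hard share={hard_fraction:.0%}")
```

Expressing hard faults as a fraction of the total makes it easier to compare samples taken under different loads.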

System Fault Rate is the rate of faults for which the referenced virtual address is in system space (hex address 80000000 and above). It is not included in the overall Page Fault Rate, and is discussed separately in Section 11.1.2.

Your own judgment, based on familiarity with the data in your MONITOR summaries, is the best determinant of an acceptable Page Fault Rate for your system.

When either of the following thresholds is exceeded, you may want to consider improving memory responsiveness. (See Section 11.1.)

Secondary Page Cache

Paging problems typically occur when the secondary page cache (free-page list and modified-page list) is too small. This systemwide cache, which is sized by AUTOGEN, should be large enough to ensure that the overall fault rate is not excessive and that most faults are soft faults.

When evaluating paging activity on your system, you should check for processes in the free page wait (FPG), collided page wait (COLPG), and page fault wait (PFW) states and note departures from normal figures. The presence of processes in the FPG state almost always indicates serious memory management problems, because it implies that the free-page list has been depleted.

Processes in the PFW and COLPG states are waiting for hard faults (from disk) to be satisfied. Note, however, that while hard fault waiting is undesirable, it is not as serious as swapping.

An average free-page list size that is between the values of the FREELIM and FREEGOAL system parameters usually indicates deficient memory and is often accompanied by a high page fault rate. If either condition exists, or if the hard fault rate exceeds the recommended percentage, you must consider enlarging the free- and modified-page lists, if possible. Enlarging the secondary page cache could reduce hard faulting, provided such faulting is not the result of image activation.
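
The free-page list check described above can be sketched as follows. This Python example is an illustration only: the sample sizes and parameter values are hypothetical, and treating the boundary values as part of the "deficient" range is an assumption.

```python
# Sketch: classify the average free-page list size against FREELIM and FREEGOAL.
# The parameter values and samples are hypothetical.

def free_list_status(avg_free, freelim, freegoal):
    """Return 'exhausted', 'deficient', or 'adequate' for the free-page list."""
    if avg_free < freelim:
        return "exhausted"      # smaller than FREELIM
    if avg_free <= freegoal:
        return "deficient"      # between FREELIM and FREEGOAL
    return "adequate"

samples = [410, 380, 450, 400]  # free-page list sizes, in pages (hypothetical)
avg_free = sum(samples) / len(samples)
print(free_list_status(avg_free, freelim=150, freegoal=600))
```

Averaging several samples taken during a representative period gives a steadier picture than a single reading.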

The easiest way to increase the free page cache is to increase the value of FREEGOAL. Active reclamation will then attempt to recover more memory from idle processes. Typically, overall fault rates decrease when active reclamation is enabled because memory is more readily available to active processes.

A high rate of modified-page writing, as shown, for example, in the Page Write I/O Rate field of the MONITOR PAGE display, is an indication that the modified-page list might be too small. A rate of one write every 2 seconds is fairly high. The modified-page list should be large enough to provide an equilibrium between the rate at which pages are added to the list and the modified-page list fault rate, without causing excessive list writing by reaching MPW_HILIMIT. If you do adjust the size of the modified-page list using MPW_HILIMIT, make sure you retain the relationship among MPW_HILIMIT, MPW_WAITLIMIT, and MPW_LOWAITLIMIT by using AUTOGEN.

If you are able to increase the size of the free-page list, you can then allocate more memory to the modified-page list. Using AUTOGEN, you can increase the modified-page list by adjusting the appropriate MPW system parameters. (See the OpenVMS System Management Utilities Reference Manual for a description of MPW parameters.)

7.2.2 Swapping and Swapper Trimming

Swapping, when considered in isolation, is an expensive operation. It can place a huge transfer load on the I/O subsystem instantaneously. Swapping also can place heavy demand on CPU resources. However, when used as part of the active memory reclamation policy, swapping results in improved---that is, reduced---memory consumption and a lower page fault rate.

Good and Bad Swapping

There is good swapping and bad swapping. The latter occurs as the last step of reactive memory reclamation when the free-page list is exhausted---that is, when it is smaller than FREELIM. However, having a significant number of outswapped processes on your system when active memory reclamation is enabled is not a cause for alarm. A much more reliable indicator that harmful swapping is occurring is a high inswap rate---for example, greater than one process per second.
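
An inswap rate can be derived from two readings of a cumulative inswap count taken a known interval apart. The following Python sketch shows the arithmetic; the readings and the interval are hypothetical, and the threshold of one process per second is taken from the example in the preceding paragraph.

```python
# Sketch: derive an inswap rate from two readings of a cumulative inswap count.
# The readings and the interval are hypothetical values.

def inswap_rate(count_start, count_end, seconds):
    """Return inswaps per second over the sampling interval."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return (count_end - count_start) / seconds

rate = inswap_rate(count_start=1200, count_end=1290, seconds=60)
print(f"{rate:.2f} inswaps/s, above one per second: {rate > 1.0}")
```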

Artificially Induced Swapping

Before attempting to improve a system with a high inswap rate, do the following:

You can obtain information on balance slots with the DCL command SHOW MEMORY.

A possible, although unlikely, reason for a high inswap rate might be an overly large value for FREEGOAL when active memory reclamation is enabled. Although this policy outswaps only long-waiting processes, a very large value for FREEGOAL will cause the outswapping of many long-waiting processes over time, thus increasing the inswap rate as these processes become computable.

7.3 Analyzing the Excessive Paging Symptom

Whenever you detect paging or swapping on a system with degraded performance, you should investigate a memory limitation. If you observe a lack of free memory but no serious paging or swapping, the system may be just at the point where it will begin to experience excessive paging or swapping if demand grows any more.

In this case, you have a bit of advance warning, and you may want to examine some preventive measures.

7.3.1 What Is Excessive Paging?

There are no universally applicable scales that rank page faulting rates from moderate to excessive.

Although the only good page faulting rate is zero page faults per second, you need to think in terms of the maximum tolerable rate of page faulting for your system.

7.3.2 Guidelines

Observe the following guidelines:

Once you have determined that the rate of paging is excessive, you need to determine the cause. As Figure A-3 shows, you can begin by looking at the number of image activations that have been occurring.

7.3.3 Excessive Image Activations

Use ACCOUNTING to examine the total number of images started.

If image-level accounting is enabled and the value is in the low-to-normal range for typical operations at your site, then the problem lies elsewhere.
If image-level accounting is NOT enabled, then check the display produced by the MONITOR PAGE command for demand zero faults.
If 50% of all page faults are demand zero faults, then image activations are too frequent.
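
The demand zero check can be expressed as a simple ratio of two MONITOR PAGE rates. In the following Python sketch the rates are hypothetical, and reading "50%" as "50% or more" is an assumption.

```python
# Sketch: estimate how much of the page fault rate comes from demand zero faults.
# The rates are hypothetical; the 0.5 cutoff is read as "50% or more".

def demand_zero_share(demand_zero_rate, page_fault_rate):
    """Return demand zero faults as a fraction of all page faults."""
    if page_fault_rate <= 0:
        return 0.0
    return demand_zero_rate / page_fault_rate

share = demand_zero_share(demand_zero_rate=33.0, page_fault_rate=60.0)
print(f"demand zero share: {share:.0%}, activations suspect: {share >= 0.5}")
```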

Additional Considerations

If image activations seem to be excessive, do the following:

7.3.4 Characterizing Hard Versus Soft Faults

You should characterize your page faulting. Paging from disk is hard paging, and it is the less desirable of the two.

Soft paging refers to paging from the page cache in main memory. Although soft paging is undesirable when it is excessive, it is normally much less costly to overall system performance than disk paging, simply because it is faster.
