OpenVMS Performance Management

Document revision date: 30 March 2001

Chapter 7
Evaluating the Memory Resource

The key to successful performance management of an OpenVMS system is to keep memory management activity to a minimum. Memory limitations cause paging, swapping, or both, which are precisely the activities you want to minimize. Reducing one without incurring too much of the other requires skillful balancing of the memory management mechanism.

7.1 Understanding the Memory Resource

The memory resource shares some similarities with the other resources, but it exhibits some notable differences. It is similar to the CPU and disk in that it is a single resource pool that must be shared, but different in the sense that it can be separated into pieces of varying size, all of which can be allocated to processes simultaneously. A process can retain its allocation of memory until memory is demanded by other processes (page faulting), at which time the sizes of the pieces are reconfigured. In some cases, certain processes must wait longer for their allocations (swapping).

7.1.1 Working Set Size

The key to good performance of the memory subsystem is to maintain working sets of appropriate size for resident processes. As a rule, the total of all resident process working set quotas should be within the amount of free memory available on the system. When there is abundant free memory available, the borrowing mechanism of the memory management subsystem allows working sets to grow to the value specified in the user authorization file by WSEXTENT. However, you should set the WSQUOTA value so that user programs can have reasonable faulting behavior even if they can grow only to WSQUOTA.
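The rule of thumb above can be checked with simple arithmetic. The following Python sketch is illustrative only: the function name `quotas_fit_in_free_memory` and the sample figures are assumptions introduced here, not values taken from a real system.

```python
def quotas_fit_in_free_memory(wsquotas, free_pages):
    """Return True when the summed WSQUOTA values of the resident
    processes do not exceed the free memory (both in pages)."""
    return sum(wsquotas) <= free_pages


# Hypothetical WSQUOTA values for three resident processes.
resident_wsquotas = [512, 512, 1024]

# Hypothetical free-memory figures, in pages.
print(quotas_fit_in_free_memory(resident_wsquotas, free_pages=4096))
print(quotas_fit_in_free_memory(resident_wsquotas, free_pages=2000))
```

A result of False suggests that processes may be unable to reach WSQUOTA without taking memory from one another.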

7.1.2 Locality of Reference

Erratic code and data reference patterns by user programs can cause memory to be used inefficiently. Locality of reference is a characteristic of a program that indicates how close or far apart the references to locations in virtual memory are over time. A program with a high degree of locality does not refer to many widely scattered virtual addresses in a short period of time. If an application has been designed with poor virtual address reference patterns, it can require an extremely large WSQUOTA value to perform satisfactorily.

In addition, applications such as AI and CAD/CAM, which perform an inordinately large amount of dynamic memory allocation, often require very large WSQUOTA values. Database programs may also benefit from larger working sets if they cache significant amounts of data or indexes in memory.

7.1.3 Obtaining Working Set Values

One way to obtain information about working set values on the running system (Example 7-2) is to use the procedure shown in Example 7-1. You may want to execute it several times during some representative period of loading to gain an idea of the steady-state working set requirements for your system.

Example 7-1 Procedure to Obtain Working Set Information

$!
$! WORKING_SET.COM - Command file to display working set information.
$!                   Requires 'WORLD' privilege to display information
$!                   on processes other than your own.
$!
$! the next symbol is used to insert quotes into command strings
$! because of the way DCL processes quotes, you can't have a
$! trailing comment after the quotes on the next line.
$!
$ quote = """"
$!
$ pid = ""                      ! initialize to blank
$ context = ""                  ! initialize to blank
$!
$! Define a format control string which will be used with
$! F$FAO to output the information. The width of the
$! string will be set according to the width of the
$! display terminal (the image name is truncated, if needed).
$!
$ IF F$GETDVI ("SYS$OUTPUT", "DEVBUFSIZ") .LE. 80
$ THEN
$   ctrlstring = "!AS!15AS!5AS!5(6SL)!7SL !10AS"
$ ELSE
$   ctrlstring = "!AS!15AS!5AS!5(6SL)!7SL !AS"
$ ENDIF
$!
$! Check to see if this procedure was invoked with the PID of
$! one specific process to check. If it was, use that PID. If
$! not, the procedure will scan for all PIDs where there is
$! sufficient privilege to fetch the information.
$!
$ IF p1 .NES. "" THEN pid = p1
$!
$! write out a header.
$!
$ WRITE sys$output -
" Working Set Information"
$ WRITE sys$output ""
$ WRITE sys$output -
" WS WS WS WS Pages Page"
$ WRITE sys$output -
"Username Processname State Extnt Quota Deflt Size in WS Faults Image"
$ WRITE sys$output ""
$!
$! Begin collecting information.
$!
$ collect_loop:
$!
$ IF P1 .EQS. "" THEN pid = F$PID (context)     ! get this process' PID
$ IF pid .EQS. "" THEN EXIT                     ! if blank, no more to
$!                                              ! check, or no privilege
$ pid = quote + pid + quote                     ! enclose in quotes
$!
$ username = F$GETJPI ('pid, "USERNAME")        ! retrieve proc. info.
$!
$ IF username .EQS. "" THEN GOTO collect_loop   ! if blank, no priv.; try
$!                                              ! next PID
$ processname = F$GETJPI ('pid, "PRCNAM")
$ imagename = F$GETJPI ('pid, "IMAGNAME")
$ imagename = F$PARSE (imagename,,,"NAME")      ! separate name from filespec
$ state = F$GETJPI ('pid, "STATE")
$ wsdefault = F$GETJPI ('pid, "DFWSCNT")
$ wsquota = F$GETJPI ('pid, "WSQUOTA")
$ wsextent = F$GETJPI ('pid, "WSEXTENT")
$ wssize = F$GETJPI ('pid, "WSSIZE")
$ globalpages = F$GETJPI ('pid, "GPGCNT")
$ processpages = F$GETJPI ('pid, "PPGCNT")
$ pagefaults = F$GETJPI ('pid, "PAGEFLTS")
$!
$ pages = globalpages + processpages            ! add pages together
$!
$! format the information into a text string
$!
$ text = F$FAO (ctrlstring, -
    username, processname, state, wsextent, wsquota, wsdefault, wssize, -
    pages, pagefaults, imagename)
$!
$ WRITE sys$output text                         ! display information
$!
$ IF p1 .NES. "" THEN EXIT                      ! if invoked for a
$!                                              ! specific PID, we're done.
$ GOTO collect_loop                             ! repeat for next PID

7.1.4 Displaying Working Set Values

The WORKING_SET.COM procedure produces the following display:

Example 7-2 Displaying Working Set Values

                         Working Set Information

                                     WS    WS    WS    WS  Pages   Page
Username    Processname     State Extnt Quota Deflt  Size  in WS Faults Image

SYSTEM      ERRFMT          HIB    1024   512   100    60     60    165 ERRFMT
SYSTEM      CACHE_SERVER    HIB    1024   512   100   512     75     55 FILESERV
SYSTEM      CLUSTER_SERVER  HIB    1024   512   100    60     60    218 CSP
SYSTEM      OPCOM           LEF    2048   512   100   210     59   5764 OPCOM
SYSTEM      JOB_CONTROL     HIB    1024   512   100   360    238   1459 JOBCTL
SYSTEM      CONFIGURE       HIB    1024   512   100   125    121    101 CONFIGURE
SYSTEM      SYMBIONT_0001   HIB    1024   512   100   668     57  67853 PRTSMB
DECNET      NETACP          HIB    1500   750   175  1200    812  10305 NETACP
DECNET      EVL             HIB    1024   350   175   210     33  84080 EVL
SYSTEM      REMACP          HIB    1024   350   175    60     47     74 REMACP
SYSTEM      VAXsim_Monitor  HIB    1024   200   100   350    210   1583 VAXSIM
SYSTEM      DBMS_MONITOR    LEF    1000   512   150    62     62    488 DBMMON
SYSTEM      TINKERBELLE     LEF    1024   350   175   325    177   1627
SYSTEM      NULF            COM    1024   350   250   350    246   1007 FAC
HALL        CFAI            COM    2400  1024   512   662    358    567 CFAI
VTXUP       VTX_SERVER      LEF    2400  1024   512   962    696    624 VTXSRV
WEINSTEIN   Jane            LEF    2400  1024   512   662    432  13132 EDT
HURWITZ     HURWITZ         LEF    2400  1024   512   512    350   4605
CARMODY     CARMODY         LEF    2400  1024   512   812    546  16822 MAIL
CAPARILLIO  CAPARILLIO      CUR    2400  1024   512   512    282  10839
STRATFORD   Kathy           LEF    2400  1024   512   512    210   9852
FREY        _VTA270:        LEF    2400  1024   512   512    163   1021
CHRISTOPHER _VTA271:        LEF    2400  1024   512   512    252    379
STANLEY     STANLEY         LEF    2048  1024   512   512    295  10369
MINSKY      MINSKY          LEF    2400  1024   512   512    143  60316
TESTGEN     TESTGEN         LEF    4100  1024   512   234     84  75753
CLAYMORE    Cluster Buster  LEF    2400  1024   512  1262    932   1919 CREATOR
DINEAUX     Sally           LEF    2400  1024   512   512    330  31803
DECNET      SERVER_0848     LEF    1024   350   175   325    183    647 NETSERVER
LUZ         Lars            LEF    2400  1024   512  1024    980  95420 TEX
DECNET      MAIL_222        LEF    1024   350   175   325    234    526 MAIL
STEVENS     STEVENS         LEF    2400  1024   512   512    221   7851
ZEN         _VTA259:        LEF    2400  1024   512  1024    319   4267 SHOW
ZEN         ZEN_2           LEF    2400  1024   512   512    171   3026

Field	Description
WS Deflt	Default working set size, which is reestablished at each image activation.
WS Size	Current size of the working set. When the number of pages actually allocated (Pages in WS) reaches this threshold, subsequent page faults will cause page replacement.
Pages in WS	Both private and global pages.
WS Extnt WS Quota	Threshold values to which WS Size can be adjusted.
Page faults	Total number of faults that have occurred since process creation.
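If you capture the display to a file, a short script can pick out the processes whose Pages in WS value has reached WS Size, the point at which further page faults cause page replacement. The Python sketch below is one possible approach, not part of the procedure itself. The regular expression, the function names `parse_row` and `at_replacement_threshold`, and the assumption that the state field is three to five uppercase letters are all introduced here for illustration.

```python
import re

# One display row: username, process name (may contain spaces), state,
# six numeric columns, and an optional image name.
ROW = re.compile(
    r"^(?P<username>\S+)\s+(?P<process>.+?)\s+(?P<state>[A-Z]{3,5})"
    r"\s+(?P<extent>\d+)\s+(?P<quota>\d+)\s+(?P<default>\d+)"
    r"\s+(?P<size>\d+)\s+(?P<pages>\d+)\s+(?P<faults>\d+)"
    r"(?:\s+(?P<image>\S+))?\s*$"
)

NUMERIC_FIELDS = ("extent", "quota", "default", "size", "pages", "faults")


def parse_row(line):
    """Parse one row of the display; return None for headers or blank lines."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    for key in NUMERIC_FIELDS:
        row[key] = int(row[key])
    return row


def at_replacement_threshold(row):
    """True when Pages in WS has reached WS Size."""
    return row["pages"] >= row["size"]


# Two rows copied from the sample display.
for line in (
    "SYSTEM ERRFMT HIB 1024 512 100 60 60 165 ERRFMT",
    "CLAYMORE Cluster Buster LEF 2400 1024 512 1262 932 1919 CREATOR",
):
    row = parse_row(line)
    print(row["process"], at_replacement_threshold(row))
```

Rows with no image name (a process at DCL level, for example) parse with `image` set to `None`.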

7.2 Evaluating Memory Responsiveness

The key measure of responsiveness for the memory management subsystem is the amount of time required for a process to be allocated its share of memory.

Because allocation time is not measured directly, you should be concerned with the rates of the two memory management activities that extend the processing time experienced by processes in a virtual memory system, namely, page faulting and swapping. These activities not only incur overhead on the CPU and disk resources, but they also block the execution of processes during the time the system needs to allocate memory and the time the processes spend waiting for memory allocation.

Thus, your goal in evaluating the memory resource is to ensure that faulting and swapping rates are kept within reasonable bounds.

7.2.1 Page Faulting

Whenever a process references a virtual page that is not in its working set, a page fault occurs. For process execution to continue, memory management software is called to acquire and map a physical page into the working set.

7.2.1.1 Hard and Soft Page Faults

The fault can be hard or soft. A hard fault (measured by the Page Read I/O Rate item in the MONITOR PAGE class) is one that requires a read operation from a page or image file on disk. A soft fault is one that is satisfied by mapping to a page already in memory; this can be a global page or a page in the secondary page cache. (The secondary page cache consists of the free-page list and the modified-page list; the primary page cache is each process's working set.) The following categories of soft faults are measured and reported in the MONITOR PAGE class:

Free List Fault Rate: The rate of page faults satisfied by reclaiming from the free-page list a page that was previously allocated to a process. An excessive rate of free-page list faults can occur when working set quotas are too small, causing excessive page replacement.
Modified List Fault Rate: The rate of page faults satisfied by reclaiming a page from the modified-page list. An excessive rate of modified-page list faults can occur when working set quotas are too small.
Demand Zero Fault Rate: The rate of page faults satisfied by allocating a free page and initializing its contents to zero. This type of fault is typically seen during image activation and whenever the virtual address space is expanded.
Global Valid Fault Rate: The rate of page faults satisfied by mapping a shared page that is already valid (one already in another process's working set). Swapping or image activation can cause an elevated global valid fault rate.
Write in Progress Fault Rate: The rate of page faults satisfied by mapping to a page that is in the process of being written back to disk. The rate for this type of fault is typically very low.

The total Page Fault Rate equals the hard fault rate (Page Read I/O Rate) plus the soft fault rate, which is the sum of the five categories listed above.
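The relationship among these MONITOR PAGE items can be written out as a short calculation. The Python sketch below is illustrative: the function names are assumptions introduced here, and the sample rates are hypothetical figures, not MONITOR output.

```python
# The five soft fault categories reported in the MONITOR PAGE class.
SOFT_FAULT_ITEMS = (
    "Free List Fault Rate",
    "Modified List Fault Rate",
    "Demand Zero Fault Rate",
    "Global Valid Fault Rate",
    "Write in Progress Fault Rate",
)


def soft_fault_rate(rates):
    """Sum the five soft fault categories (faults per second)."""
    return sum(rates[item] for item in SOFT_FAULT_ITEMS)


def total_page_fault_rate(rates):
    """Hard fault rate (Page Read I/O Rate) plus the soft fault rate."""
    return rates["Page Read I/O Rate"] + soft_fault_rate(rates)


# Hypothetical per-second rates, for illustration only.
sample = {
    "Page Read I/O Rate": 2.0,
    "Free List Fault Rate": 10.0,
    "Modified List Fault Rate": 5.0,
    "Demand Zero Fault Rate": 20.0,
    "Global Valid Fault Rate": 12.0,
    "Write in Progress Fault Rate": 0.5,
}

print(soft_fault_rate(sample), total_page_fault_rate(sample))
```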

System Fault Rate is the rate of faults for which the referenced virtual address is in system space (hex address 80000000 and above). It is not included in the overall Page Fault Rate, and is discussed separately in Section 11.1.2.

Your own judgment, based on familiarity with the data in your MONITOR summaries, is the best determinant of an acceptable Page Fault Rate for your system.

When either of the following thresholds is exceeded, you may want to consider improving memory responsiveness. (See Section 11.1.)

Hard faults (Page Read I/O Rate) should be kept as low as possible, and at no more than 10% of the overall Page Fault Rate. When the hard fault rate exceeds this threshold, you can assume that the secondary page cache is not being used efficiently.
Overall Page Fault Rate begins to become excessive when more than 1 to 2% of the CPU is devoted to soft faulting (faulting that involves no disk I/O).
While these rules do not represent absolute upper limits, rates that exceed the suggested limits are warning signs that either the memory resource should be improved by one of the four means listed in Section 11.1, or a memory upgrade should be considered. Note, however, that more memory will not reduce the number of page faults caused by image activation.
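The 10% guideline for hard faults is a simple ratio test. The following Python sketch shows one way to apply it to figures taken from a MONITOR summary. The function names and the sample rates are assumptions introduced for illustration, and the limit is a guideline rather than an absolute bound.

```python
def hard_fault_share(page_read_io_rate, page_fault_rate):
    """Fraction of the overall Page Fault Rate made up of hard faults."""
    if page_fault_rate <= 0:
        return 0.0  # no faulting at all, so no hard fault share
    return page_read_io_rate / page_fault_rate


def exceeds_hard_fault_guideline(page_read_io_rate, page_fault_rate, limit=0.10):
    """True when hard faults exceed the suggested share (default 10%)."""
    return hard_fault_share(page_read_io_rate, page_fault_rate) > limit


# Hypothetical rates in faults per second.
print(exceeds_hard_fault_guideline(8.0, 50.0))  # 16% of faults are hard
print(exceeds_hard_fault_guideline(2.0, 50.0))  # 4% of faults are hard
```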

7.2.1.2 Secondary Page Cache

Paging problems typically occur when the secondary page cache (free-page list and modified-page list) is too small. This systemwide cache, which is sized by AUTOGEN, should be large enough to ensure that the overall fault rate is not excessive and that most faults are soft faults.

When evaluating paging activity on your system, you should check for processes in the free page wait (FPG), collided page wait (COLPG), and page fault wait (PFW) states and note departures from normal figures. The presence of processes in the FPG state almost always indicates serious memory management problems, because it implies that the free-page list has been depleted.

Processes in the PFW and COLPG states are waiting for hard faults (from disk) to be satisfied. Note, however, that while hard fault waiting is undesirable, it is not as serious as swapping.

An average free-page list size that is between the values of the FREELIM and FREEGOAL system parameters usually indicates deficient memory and is often accompanied by a high page fault rate. If either condition exists, or if the hard fault rate exceeds the recommended percentage, you must consider enlarging the free- and modified-page lists, if possible. Enlarging the secondary page cache could reduce hard faulting, provided such faulting is not the result of image activation.

The easiest way to increase the free page cache is to increase the value of FREEGOAL. Active reclamation will then attempt to recover more memory from idle processes. Typically, overall fault rates decrease when active reclamation is enabled because memory is more readily available to active processes.

A high rate of modified-page writing, for example, as shown in the Page Write I/O Rate field of the MONITOR PAGE display, is an indication that the modified-page list might be too small. A write rate of one every 2 seconds is fairly high. The modified-page list should be large enough to provide an equilibrium between the rate at which pages are added to the list and the modified-page list fault rate, without causing excessive list writing by reaching MPW_HILIMIT. If you do adjust the size of the modified-page list using MPW_HILIMIT, make sure you retain the relationship among MPW_HILIMIT, MPW_WAITLIMIT, and MPW_LOWAITLIMIT by using AUTOGEN.

If you are able to increase the size of the free-page list, you can then allocate more memory to the modified-page list. Using AUTOGEN, you can increase the modified-page list by adjusting the appropriate MPW system parameters. (See the OpenVMS System Management Utilities Reference Manual for a description of MPW parameters.)

7.2.2 Swapping and Swapper Trimming

Swapping, when considered in isolation, is an expensive operation. It can place a huge transfer load on the I/O subsystem instantaneously. Swapping also can place heavy demand on CPU resources. However, when used as part of the active memory reclamation policy, swapping results in improved (that is, reduced) memory consumption and a lower page fault rate.

Good and Bad Swapping

There is good swapping and bad swapping. The latter occurs as the last step of reactive memory reclamation when the free-page list is exhausted (that is, when it is smaller than FREELIM). However, having a significant number of outswapped processes on your system when active memory reclamation is enabled is not a cause for alarm. A much more reliable indicator that harmful swapping is occurring is a high inswap rate, for example, greater than one process per second.

Artificially Induced Swapping

Before attempting to improve a system with a high inswap rate, do the following:

Check for a condition known as artificially induced swapping. This condition occurs when there are no available balance set slots.
Check the BALSETCNT system parameter. Swapping may have been artificially induced because BALSETCNT is set too low (see Section 11.14).

You can obtain information on balance slots with the DCL command SHOW MEMORY.

A possible, although unlikely, reason for a high inswap rate might be an overly large value for FREEGOAL when active memory reclamation is enabled. Although this policy outswaps only long-waiting processes, a very large value for FREEGOAL will cause the outswapping of many long-waiting processes over time, thus increasing the inswap rate as these processes become computable.

7.3 Analyzing the Excessive Paging Symptom

Whenever you detect paging or swapping on a system with degraded performance, you should investigate a memory limitation. If you observe a lack of free memory but no serious paging or swapping, the system may be just at the point where it will begin to experience excessive paging or swapping if demand grows any more.

In this case, you have a bit of advance warning, and you may want to examine some preventive measures.

7.3.1 What Is Excessive Paging?

There are no universally applicable scales that rank page faulting rates from moderate to excessive.

Although the only good page faulting rate is zero page faults per second, you need to think in terms of the maximum tolerable rate of page faulting for your system.

7.3.2 Guidelines

Observe the following guidelines:

You should define the maximum tolerable page fault rate. You should view any higher page fault rate as excessive.
Paging always consumes system resources (CPU and I/O); therefore, its harmfulness depends entirely on the availability of the resources consumed.
In judging what page faulting rate is the maximum tolerable rate for your system, you must consider your configuration and the type of paging that is occurring.
For example, on a system with slow disks, what might otherwise seem to be a low rate of paging to the disk could actually represent intolerable paging because of the response time through the slow disk. This is especially true if the percentage of page faults from the disk is high relative to the total number of faults.
You can judge page fault rates only in the context of your own configuration.
The statistics must be examined in the context of both the overall faulting and the apparent system performance. The system manager who knows the configuration can best evaluate the impact of page faulting.

Once you have determined that the rate of paging is excessive, you need to determine the cause. As Figure A-3 shows, you can begin by looking at the number of image activations that have been occurring.

7.3.3 Excessive Image Activations

Use ACCOUNTING to examine the total number of images started.

If...	Then...
Image-level accounting is enabled and the value is in the low-to-normal range for typical operations at your site	The problem lies elsewhere.
Image-level accounting is NOT enabled	Check the display produced by the MONITOR PAGE command for demand zero faults.
50% of all page faults are demand zero faults	Image activations are too frequent.
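The last row of the table can be expressed as a ratio test on MONITOR PAGE figures. In the Python sketch below, the function names and sample rates are assumptions introduced for illustration, and the test treats "50%" as "at least half", which is an interpretation rather than something the table states.

```python
def demand_zero_share(demand_zero_fault_rate, page_fault_rate):
    """Fraction of all page faults that are demand zero faults."""
    if page_fault_rate <= 0:
        return 0.0  # no faulting, so nothing to attribute
    return demand_zero_fault_rate / page_fault_rate


def image_activations_suspect(demand_zero_fault_rate, page_fault_rate,
                              threshold=0.50):
    """True when demand zero faults make up at least the threshold share,
    which suggests that image activations are too frequent."""
    share = demand_zero_share(demand_zero_fault_rate, page_fault_rate)
    return share >= threshold


# Hypothetical rates in faults per second.
print(image_activations_suspect(30.0, 50.0))  # 60% demand zero
print(image_activations_suspect(10.0, 50.0))  # 20% demand zero
```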

Additional Considerations

If image activations seem to be excessive, do the following:

Enable image-level accounting (if it is not enabled) at this time and collect enough data to confirm the conclusion about the high percentage of demand zero faults.
Determine how to reduce the number of image activations by reviewing the guidelines for application design in Section 11.2.
The problem of paging induced by image activations is unlikely to respond to any attempt at system tuning. The appropriate action involves application design changes.

7.3.4 Characterizing Hard Versus Soft Faults

You should characterize your page faulting. Paging from disk is hard paging, and it is the less desirable of the two.

Soft paging refers to paging from the page cache in main memory. Although soft paging is undesirable when it is excessive, it is normally much less costly to overall system performance than disk paging, simply because it is faster.
