OpenVMS Performance Management

Document revision date: 30 March 2001

OpenVMS Performance Management

Contents

Index

11.25 Use RMS Global Buffering

Using RMS Global Buffering reduces the amount of memory required by allowing processes to share caches. It can also reduce I/O if multiple processes access data in similar areas.

11.26 Reduce Demand or Add Memory

At this point, when all the tuning options have been exhausted, there are only two options: reduce the demand for memory by modifying the work load or add memory to the system.

The cost to add memory to a system has decreased significantly over time. This trend will likely continue.

For many modern systems, adding memory is the most cost effective way to address performance problems. For older systems, the cost of adding memory may be significantly higher than for newer systems, but it may still be less than the cost of a system manager performing many hours of system analysis and tuning, and the additional time it may take to achieve better performance. All relevant costs need to be taken into account when deciding if working with the existing hardware will be less expensive than adding hardware.

11.26.1 Reduce Demand

Section 1.4 describes a number of options (including workload management) that you can explore to shift the demand on your system so that it is reduced at peak times.

11.26.2 Add Memory

Adding memory is often the best solution to performance problems.

If you conclude you need to add memory, you must then determine how much to add. Add as much memory as you can afford. If you need to establish the amount more scientifically, try the following empirical technique:

Determine or estimate a paging rate you believe would represent a tolerable level of paging on the system. (If many applications share memory, make allowances for global valid faults by deducting the global valid fault rate from the total page fault rate.)
Turn off swapper trimming (set SWPOUTPGCNT to the maximum value found for WSQUOTA).
Give the processes large enough working set quotas so that you achieve the tolerable level of paging on the system while it is under load.

The amount of memory required by the processes that are outswapped represents an approximation of the amount of memory your system would need to obtain the desired performance under load conditions.

Once you add memory to your system, be sure to invoke AUTOGEN so that new parameter values can be assigned on the basis of the increased physical memory size.

Chapter 12
Compensating for I/O-Limited Behavior

This chapter describes corrective procedures for I/O resource limitations described in Chapters 5 and 8.

12.1 Improving Disk I/O Responsiveness

It is always a good practice to check methods for improving disk I/O responsiveness to see if there are ways to use the available capacity more efficiently, even if no problem exists.

12.1.1 Equitable Disk I/O Sharing

If you identify certain disks as good candidates for improvement, check for excessive use of the disk resource by one or more processes. The best way to do this is to use the MONITOR playback feature to obtain a display of the top direct I/O users during each collection interval. The direct I/O operations reported by MONITOR include all user disk I/O and any other direct I/O for other device types. In many cases, disk I/O represents the vast majority of direct I/O activity on OpenVMS systems, so you can use this technique to obtain information on processes that might be supporting excessive disk I/O activity.

Enter a MONITOR command similar to the following:

$ MONITOR /INPUT=SYS$MONITOR:file-spec /VIEWING_TIME=1 PROCESSES /TOPDIO

You may want to specify the /BEGINNING and /ENDING qualifiers to select a time interval that covers the problem period.

12.1.1.1 Examining Top Direct I/O Processes

If it appears that one or two processes are consistently the top direct I/O users, you may want to obtain more information about which images they are running and which files they are using. Because this information is not recorded by MONITOR, it can be obtained in any of the following ways:

Run MONITOR in live mode. Enter DCL SHOW commands when the situation reoccurs.
Use the ACCOUNTING image report described in Section 4.3.
Survey heavy users of system resources.

12.1.1.2 Using MONITOR Live Mode

To run MONITOR in live mode, do the following:

Choose a representative period.
Use the default 3-second collection interval.
When you have identified a process that consistently issues a significant number of direct I/O requests, use the SHOW PROCESS/CONTINUOUS DCL command to look for more information about the process and the image being run.
In addition, you can use the SHOW DEVICE /FILES command to show all open files on particular disk volumes. It is important to know the file names of heavily used files to perform the offloading and load-balancing operations. (For more information, see Sections 12.1.3 and 12.1.4.)

12.1.2 Reduction of Disk I/O Consumption by the System

The system uses the disk I/O subsystem for three activities: paging, swapping, and XQP operations. This kind of disk I/O is a good place to start when setting out to trim disk I/O load. All three types of system I/O can be reduced readily by offloading to memory. Swapping I/O is a particularly data-transfer-intensive operation, while the other types tend to be more seek-intensive.

12.1.2.1 Paging I/O Activity

Page Read I/O Rate, also known as the hard fault rate, is the rate of read I/O operations necessary to satisfy page faults. Since the system attempts to cluster several pages together whenever it performs a read, the number of pages actually read will be greater than the hard fault rate. The rate of pages read is given by the Page Read Rate.

Use the following equation to compute the average transfer size (in bytes) of a page read I/O operation:

average transfer size = page read rate/page read IO rate * page size in bytes

The page size is 512 bytes on a VAX; it is currently 8192 bytes on all Alphas, but this value is subject to change in future implementations of the Alpha architecture.

Effects on the Secondary Page Cache

Most page faults are soft faults. Such faults require no disk I/O operation, because they are satisfied by mapping to a global page or to a page in the secondary page cache (free-page list and modified-page list). An effectively functioning cache is important to overall system performance. A guideline that may be applied is that the rate of hard faults---those requiring a disk I/O operation---should be less than 10% of the overall page fault rate, with the remaining 90% being soft faults. Even if the hard fault rate is less than 10%, you should try to reduce it further if it represents a significant fraction of the disk I/O load on any particular node or individual disk (see Section 7.2.1.2).

Note that the number of hard faults resulting from image activation can be reduced only by curtailing the number of image activations or by exercising LINKER options such as /NOSYSSHR (to reduce image activations) and reassignment of PSECT attributes (to increase the effectiveness of page fault clustering).

This guideline is provided to direct your attention to a potentially suboptimal configuration parameter that may affect the overall performance of your system. The nature of your system may make this objective unachievable or render change of the parameter ineffective. Upon investigating the secondary page cache fault rate, you may determine that the secondary page cache size is not the only limiting factor. Manipulating the size of the cache may not affect system performance in any measurable way. This may be due to the nature of the workload, or bottlenecks that exist elsewhere in the system. You may need to upgrade memory, the paging disk, or other hardware.

Paging Write I/O Operations

The Page Write I/O Rate represents the rate of disk I/O operations to write pages from the modified-page list to backing store (paging and section files). As with page read operations, page write operations are clustered. The rate of pages written is given by the Page Write Rate.

Use the following equation to compute the average transfer size (in bytes) of a page write I/O operation:

average transfer size = page write rate/page write IO rate * page size in bytes

The frequency with which pages are written depends on the page modification behavior of the work load and on the size of the modified-page list. In general, a larger modified-page list must be written less often than a smaller one.

Obtaining Information About Paging Files

You can obtain information on each paging file, including the disks on which they are located, with the SHOW MEMORY/FILES/FULL DCL command.

12.1.2.2 Swapping I/O Activity

Swapping I/O should be kept as low as possible. The Inswap Rate item of the I/O class lists the rate of inswap I/O operations. In typical cases, for each inswap, there can also be just as many outswap operations. Try to keep the inswap rate as low as possible---no greater than 1. This is not to say that swapping should always be eliminated. Swapping, as implemented by the active memory reclamation policy, is desirable to force inactive processes out of memory.

Swap I/O operations are very large data transfers; they can cause device and channel contention problems if they occur too frequently. Enter the DCL command SHOW MEMORY/FILES/FULL to list the swapping files in use. If you have disk I/O problems on the channels servicing the swapping files, attempt to reduce the swap rate. (Refer to Section 11.13 for information about converting to a system that rarely swaps.)

12.1.2.3 File System (XQP) I/O Activity

To determine the rate of I/O operations issued by the XQP on a nodewide basis, do the following:

Add the Disk Read Rate and Disk Write Rate items of the FCP class for each node.
Compare this number to the sum of the I/O Operation Rate figures for all disks on that same node.
If this number represents a significant fraction of the disk I/O on that node, attempt to make improvements by addressing one or more of the following three sources of XQP disk I/O operations: cache misses, erase operations, and fragmentation.

Examining Cache Hit and Miss Rates

Check the FILE_SYSTEM_CACHE class for the level of activity (Attempt Rate) and Hit Percentage for each of the seven caches maintained by the XQP. The categories represent types of data maintained by the XQP on all mounted disk volumes. When an attempt to retrieve an item from a cache misses, the item must be retrieved by issuing one or more disk I/O requests. It is therefore important to supply memory caches large enough to keep the hit percentages high and disk I/O operations low.

XQP Cache Sizes

Cache sizes are controlled by the ACP/XQP system parameters. Data items in the FILE_SYSTEM_CACHE display correspond to ACP/XQP parameters as follows:

FILE_SYSTEM_CACHE Item ACP/XQP Parameters

Dir FCB ACP_SYSACC
ACP_DINDXCACHE

Dir Data ACP_DIRCACHE

File Hdr ACP_HDRCACHE

File ID ACP_FIDCACHE

Extent ACP_EXTCACHE
ACP_EXTLIMIT

Quota ACP_QUOCACHE

Bitmap ACP_MAPCACHE

FILE_SYSTEM_CACHE Item	ACP/XQP Parameters
Dir FCB	ACP_SYSACC ACP_DINDXCACHE
Dir Data	ACP_DIRCACHE
File Hdr	ACP_HDRCACHE
File ID	ACP_FIDCACHE
Extent	ACP_EXTCACHE ACP_EXTLIMIT
Quota	ACP_QUOCACHE
Bitmap	ACP_MAPCACHE

The values determined by AUTOGEN should be adequate. However, if hit percentages are low (less than 75%), you should increase the appropriate cache sizes (using AUTOGEN), particularly when the attempt rates are high.

If you decide to change the ACP/XQP cache parameters, remember to reboot the system to make the changes effective. For more information on these parameters, refer to the OpenVMS System Management Utilities Reference Manual.

High-Water Marking

If your system is running with the default HIGHWATER_MARKING attribute enabled on one or more disk volumes, check the Erase Rate item of the FCP class. This item represents the rate of erase I/O requests issued by the XQP to support the high-water marking feature. If you did not intend to enable this security feature, see Section 2.2 for instructions on how to disable it on a per-volume basis.

Disk Fragmentation

When a disk becomes seriously fragmented, it can cause additional XQP disk I/O operations and consequent elevation of the disk read and disk write rates. You can restore contiguity for badly fragmented files by using the Backup (BACKUP) and Convert (CONVERT) utilities, the COPY/CONTIGUOUS DCL command, or the Compaq File Optimizer for OpenVMS, an optional software product. It is a good performance management practice to do the following:

Perform image backups of all disks periodically, using the output disk as the new copy. BACKUP consolidates allocated space on the new copy, eliminating fragmentation.
Test individual files for fragmentation by entering the DCL command DUMP/HEADER to obtain the number of file extents. The fewer the extents, the lower the level of fragmentation.
Pay particular attention to heavily used indexed files, especially those from which records are frequently deleted.
Use the Convert utility (CONVERT) to reorganize the index file structure.

RMS Local and Global Buffers

To avoid excessive disk I/O, enable RMS local and global buffers on the file level. This allows processes to share data in file caches, which reduces the total memory requirement and reduces the I/O load for information already in memory.

Global buffering is enabled on a per file basis via the SET FILES/GLOBAL_BUFFER=n DCL command. You can also set default values for RMS for the entire system through the SET RMS_DEFAULT command and check values with the SHOW RMS_DEFAULT command. For more information on these commands, refer to the OpenVMS DCL Dictionary. Background material on this topic is available in the Guide to OpenVMS File Applications.

Note that file buffering can also be controlled programmatically by applications (see the description of XAB$_MULTIPLEBUFFER_COUNT in the OpenVMS Record Management Services Reference Manual). Therefore, your DCL command settings may be overridden.

12.1.3 Disk I/O Offloading

This section describes techniques for offloading disk I/O onto other resources, most notably memory.

Increase the size of the secondary page cache and XQP caches.
Install frequently used images to save memory and decrease the number of I/O operations required during image activation. (See Section 2.4.)
Decompress library files (especially HELP files) to decrease the number of I/O operations and reduce the CPU time required for library operations. Users will experience faster response to DCL HELP commands. (See Section 2.1.)
Use global data buffers (if your system has sufficient memory) for the following system files: VMSMAIL_PROFILE.DATA, SYSUAF.DAT, and RIGHTSLIST.DAT.
Tune applications to reduce the number of I/O requests by improving their buffering strategies. However, you should make sure that you have adequate working sets and memory to support the increased buffering. This approach will decrease the number of accesses to the volume at the expense of additional memory requirements to run the application.
The following are suggestions of particular interest to application programmers:
- Read or write more data per I/O operation.
  - For sequential files, increase the multiblock count to move more data per I/O operation while maintaining proper process working set sizes.
  - Turn on deferred write for sequential access to indexed and relative files; an I/O operation then occurs only when a bucket is full, not on each $PUT. For example, without deferred write enabled, 10 $PUTs to a bucket that holds 10 records require 10 I/O operations. With deferred write enabled, the 10 $PUTs require only a single I/O operation.
- Enable read ahead/write behind for sequential files. This provides for the effective use of the buffers by allowing overlap of I/O and buffer processing.
- Given ample memory on your system, consider having a deeper index tree structure with smaller buckets, particularly with shared files. This approach sometimes reduces the amount of search time required for buckets and can also reduce contention for buckets in high-contention index file applications.
- For indexed files, try to cache the entire index structure in memory by manipulating the number and size of buckets.
- If it is not possible to cache the entire index structure, you may be able to reduce the index depth by increasing the bucket size. This will reduce the number of I/O operations required for index information at the expense of increased CPU time required to scan the larger buckets.

12.1.4 Disk I/O Load Balancing

The objective of disk I/O load balancing is to minimize the amount of contention for use by the following:

Disk heads available to perform seek operations
Channels available to perform data transfer operations

You can accomplish that objective by moving files from one disk to another or by reconfiguring the assignment of disks to specific channels.

Contention causes increased response time and, ultimately, increased blocking of the CPU. In many systems, contention (and therefore response time) for some disks is relatively high, while for others, response time is near the achievable values for disks with no contention. By moving some of the activity on disks with high response times to those with low response times, you will probably achieve better overall response.

12.1.4.1 Moving Disks to Different Channels

Use the guidelines in Section 8.2 to identify disks with excessively high response times that are at least moderately busy and attempt to characterize them as mainly seek intensive or data-transfer intensive. Then use the following techniques to attempt to balance the load by moving files from one disk to another or by moving an entire disk to a different physical channel:

Separate data-transfer-intensive activity and seek-intensive activity onto separate disks or separate buses.
Reconfigure the assignment of disks to separate channels.
Distribute seek-intensive activity evenly among the disks available for that purpose.
Distribute data-transfer-intensive activity evenly among the disks available for that purpose (on separate channels where possible).

Note

When using Array Controllers (HSC, HSJ, HSZ, or other network or RAID controllers), the channels on the controller should also be balanced. You can use the controller console to obtain information on the location of the disks.

12.1.4.2 Moving Files to Other Disks

To move files from one disk to another, you must know, in general, what each disk is used for and, in particular, which files are ones for which large transfers are issued. You can obtain a list of open files on a disk volume by entering the SHOW DEVICE/FILES DCL command. However, because the system does not maintain transfer-size information, your knowledge of the applications running on your system must be your guide.

12.1.4.3 Load Balancing System Files

The following are suggestions for load balancing system files:

Use search lists to move read-only files, such as images, to different disks. This technique is not well suited for write operations to the target device, because the write will take place to the first volume/directory for which you have write access.
Define volume sets to distribute access to files requiring read and write access. This technique is particularly helpful for applications that perform many file create and delete operations, because the file system will allocate a new file on the volume with the greatest amount of free space.
Move paging and swapping activity off the system disk by creating, on a less heavily utilized disk, secondary page and swapping files that are significantly larger than the primary ones on the system disk. This technique is particularly important for a shared system disk in an OpenVMS Cluster, which tends to be very busy.
Move frequently accessed files off the system disk. Use logical names or, where necessary, other pointers to access them. (See Section 2.6 for a list of frequently accessed system files.) This technique is particularly effective for a shared system disk in an OpenVMS Cluster.

All the tuning solutions for performance problems based on I/O limitations involve using memory to relieve the I/O subsystem. The five most accessible mechanisms are the Virtual I/O or extended file cache, the ACP caches, RMS buffering, file system caches, and RAM disks.

12.2 Use Virtual I/O or Extended File Caching

Virtual I/O cache (VIOC) is a clusterwide, write-through, file-oriented, disk cache that can reduce the number of disk I/O operations and increase performance. The virtual I/O cache increases system throughput by reducing file I/O response times with minimum overhead. The virtual I/O cache operates transparently to system management and application software, and maintains system reliability while it significantly improves virtual disk I/O read performance.

The Extended File Cache (XFC) is a virtual block data cache provided with OpenVMS Alpha Version 7.3 as a replacement for the Virtual I/O Cache. Similar to the Virtual I/O Cache, the XFC is a clusterwide, file system data cache.

Both file system data caches are compatible and coexist in an OpenVMS Cluster. You can use only one cache (XFC or VIOC) on each node. XFC is available only on Alpha systems.

For more information, see the OpenVMS System Manager's Manual.

Contents

Index

privacy and legal statement

6491PRO_013.HTML

OpenVMS Performance Management

11.25 Use RMS Global Buffering

Chapter 12Compensating for I/O-Limited Behavior

12.1.1.2 Using MONITOR Live Mode

12.1.2 Reduction of Disk I/O Consumption by the System

12.1.4 Disk I/O Load Balancing

12.1.4.2 Moving Files to Other Disks

Chapter 12
Compensating for I/O-Limited Behavior