To tune for better file-system performance, you must understand how your applications and users perform disk I/O, as described in Section 1.8, and how the file system you are using shares memory with processes, as described in Chapter 12. Using this information, you might improve file-system performance by changing the value of the kernel subsystem attributes described in this chapter.
This chapter describes how to tune:
Caches used by file systems (Section 11.1)
The Advanced File System (AdvFS) (Section 11.2)
The UNIX file system (UFS) (Section 11.3)
Network file system (NFS) (Section 11.4 and Chapter 5)
11.1 Tuning File System Caches
The kernel caches (temporarily stores) recently accessed data in memory. Caching data is effective because data is frequently reused and it is much faster to retrieve data from memory than from disk. When the kernel requires data, it first checks the cache. If the data is cached, it is returned immediately; if it is not cached, it is retrieved from disk and then cached. File-system performance is improved if data is cached and later reused.
Data found in a cache is called a cache hit, and the effectiveness of cached data is measured by a cache hit rate. Data that was not found in a cache is called a cache miss.
Cached data can be information about a file, user or application data, or metadata, which is data that describes an object (for example, a file). The following list identifies the types of data that are cached:
A file name and its corresponding vnode are cached in the namei cache (Section 11.1.2).
UFS user and application data and AdvFS user and application data and metadata are cached in the Unified Buffer Cache (UBC) (Section 11.1.3).
UFS file metadata is cached in the metadata buffer cache (Section 11.1.4).
AdvFS open file information is cached in access structures (Section 11.1.5).
11.1.1 Monitoring Cache Statistics
Table 11-1
describes the commands you can
use to display and monitor cache information.
Table 11-1: Tools to Display Cache Information
| Tools | Description | Reference |
| dbx print (nchstats data structure) | Displays namei cache statistics. | Section 11.1.2 |
| vmstat | Displays virtual memory statistics. | Section 12.3.1 |
| dbx print (bio_stats data structure) | Displays metadata buffer cache statistics. | Section 11.3.2.3 |
11.1.2 Tuning the namei Cache
The virtual file system (VFS) presents to applications a uniform kernel interface that is abstracted from the subordinate file system layer. As a result, file access across different types of file systems is transparent to the user.
The VFS uses a structure called a vnode to store information about each open file in a mounted file system. If an application makes a read or write request on a file, VFS uses the vnode information to convert the request and direct it to the appropriate file system. For example, if an application makes a read() system call request on a file, VFS converts the call to the appropriate type for the file system containing the file (ufs_read() for UFS, advfs_read() for AdvFS, or nfs_read() if the file is in a file system mounted through NFS) and directs the request to that file system.
The VFS caches a recently accessed file name and its corresponding vnode in the namei cache. File-system performance is improved if a file is reused and its name and corresponding vnode are in the namei cache.
Related Attributes
The following list describes the vfs subsystem attributes that relate to the namei cache:
vnode_deallocation_enable
Specifies whether or not to dynamically allocate vnodes according to system demands. Disabling this attribute causes the operating system to use a static vnode pool. For the best performance, do not disable dynamic vnode allocation.
name_cache_hash_size
Specifies
the size, in slots, of the hash chain table for the namei cache.
vnode_age
Specifies the amount
of time, in seconds, before a free vnode can be recycled.
namei_cache_valid_time
Specifies
the amount of time, in seconds, that a namei cache entry can remain in the
cache before it is discarded.
Note
If you increase the values of namei cache-related attributes, consider also increasing the file system attributes that cache file and directory information. If you use AdvFS, see Section 11.1.5 for more information. If you use UFS, see Section 11.1.4 for more information.
When to Tune
You can check namei cache statistics to see if you should change the
values of namei cache related attributes.
To check namei cache statistics,
enter the
dbx print
command and specify a processor number
to examine the
nchstats
data structure.
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print processor_ptr[0].nchstats
Information similar to the following is displayed:
struct {
ncs_goodhits = 18984
ncs_neghits = 358
ncs_badhits = 113
ncs_falsehits = 23
ncs_miss = 699
ncs_long = 21
ncs_badtimehits = 33
ncs_collisions = 2
ncs_unequaldups = 0
ncs_newentry = 697
ncs_newnegentry = 419
ncs_gnn_hit = 1653
ncs_gnn_miss = 12
ncs_gnn_badhits = 12
ncs_gnn_collision = 4
ncs_pad = {
[0] 0
}
}
Table 11-2
describes when you might change the values
of namei cache related attributes based on the
dbx print
output:
Table 11-2: When to Change the Values of the Namei Cache Related Attributes
| If | Increase |
| The value of ncs_goodhits is low (a low namei cache hit rate) | The value of either the maxusers attribute or the name_cache_hash_size attribute |
| The value of ncs_badtimehits is more than 0.1 percent of the value of ncs_goodhits | The value of the namei_cache_valid_time attribute and the vnode_age attribute |
You cannot modify the values of the
name_cache_hash_size
attribute, the
namei_cache_valid_time
attribute, or the
vnode_deallocation_enable
attribute without rebooting the system.
You can modify the value of the
vnode_age
attribute without
rebooting the system.
See
Chapter 3
for information
about modifying subsystem attributes.
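For example, to raise the vnode recycling age at run time (the value shown is only illustrative; choose one appropriate for your workload), you might enter a command similar to the following:
# sysconfig -r vfs vnode_age=240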
11.1.3 Tuning the UBC
The Unified Buffer Cache (UBC) and processes share the memory that is not wired by the kernel. The UBC uses this memory to cache UFS user and application data, as well as AdvFS user and application data and metadata. File-system performance is improved if the data and metadata are reused while they are still in the UBC.
Related Attributes
The following list describes the
vm
subsystem attributes
that relate to the UBC:
vm_ubcdirtypercent
Specifies the
percentage of pages that must be dirty (modified) before the UBC starts writing
them to disk.
ubc_maxdirtywrites
Specifies the
number of I/O operations (per second) that the
vm
subsystem
performs when the number of dirty (modified) pages in the UBC exceeds the
value of the
vm_ubcdirtypercent
attribute.
ubc_maxpercent
Specifies the maximum
percentage of physical memory that the UBC can use at one time.
ubc_borrowpercent
Specifies the
percentage of memory above which the UBC is only borrowing memory from the
vm
subsystem.
Paging does not occur until the UBC has returned all
its borrowed pages.
ubc_minpercent
Specifies the minimum
percentage of memory that the UBC can use.
The remaining memory is shared
with processes.
vm_ubcpagesteal
Specifies the minimum
number of pages to be available for file expansion.
When the number of available
pages falls below this number, the UBC steals additional pages to anticipate
the file's expansion demands.
vm_ubcseqpercent
Specifies the maximum amount of memory allocated to the UBC that can be used to cache a single file.
vm_ubcseqstartpercent
Specifies
a threshold value that determines when the UBC starts to recognize sequential
file access and steal the UBC LRU pages for a file to satisfy its demand for
pages.
This value is the size of the UBC in terms of its percentage of physical
memory.
Note
If the values of the ubc_maxpercent and ubc_minpercent attributes are close, you may degrade file system performance.
When to Tune
An insufficient amount of memory allocated to the UBC can impair file
system performance.
Because the UBC and processes share memory, changing the
values of UBC-related attributes might cause the system to page.
You can use
the
vmstat
command to display virtual memory statistics
that will help you to determine if you need to change values of UBC-related
attributes.
Table 11-3 describes when you might change the values of UBC-related attributes based on the vmstat output:
Table 11-3: When to Change the Values of the UBC-Related Attributes
| If vmstat Output Displays Excessive: | Action: |
| Paging but few or no page outs | Increase the value of the ubc_borrowpercent attribute. |
| Paging and swapping | Decrease the value of the ubc_maxpercent attribute. |
| Paging | Force the system to reuse pages in the UBC instead of taking pages from the free list by ensuring that the value of the ubc_maxpercent attribute is greater than the value of the vm_ubcseqstartpercent attribute (which it is by default), and that the value of the vm_ubcseqpercent attribute is greater than the size of the referenced file. |
| Page outs | Increase the value of the ubc_minpercent attribute. |
See
Section 12.3.1
for information on the
vmstat
command.
See
Section 12.1.2.2
for information about
UBC memory allocation.
You can modify the value of any of the UBC parameters described in this section without rebooting the system. See Chapter 3 for information about modifying subsystem attributes.
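For example, you might first query the current UBC settings and then adjust one of them at run time (the values shown are illustrative, not recommendations):
# sysconfig -q vm ubc_maxpercent ubc_minpercent ubc_borrowpercent
# sysconfig -r vm ubc_maxpercent=70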
Note
The performance of an application that generates a lot of random I/O is not improved by a large UBC, because the next access location for random I/O cannot be predetermined.
11.1.4 Tuning the Metadata Buffer Cache
At boot time, the kernel wires a percentage of memory for the metadata buffer cache. UFS file metadata, such as superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries, is cached in the metadata buffer cache. File-system performance is improved if the metadata is reused while it is in the metadata buffer cache.
Related Attributes
The following list describes the
vfs
subsystem attributes
that relate to the metadata buffer cache:
bufcache
Specifies the size, as
a percentage of memory, that the kernel wires for the metadata buffer cache.
buffer_hash_size
Specifies the
size, in slots, of the hash chain table for the metadata buffer cache.
You cannot modify the values of the
buffer_hash_size
attribute or the
bufcache
attribute without rebooting the
system.
See
Chapter 3
for information about modifying
kernel subsystem attributes.
When to Tune
Consider increasing the value of the bufcache attribute if you have a high cache miss rate (low hit rate).
To determine if you have a high cache miss rate, use the
dbx
print
command to display the
bio_stats
data structure.
If the miss rate (block misses divided by the sum of the block misses and
block hits) is more than 3 percent, consider increasing the value of the
bufcache
attribute.
See
Section 11.3.2.3
for more
information on displaying the
bio_stats
data structure.
Note that increasing the value of the
bufcache
attribute
will reduce the amount of memory available to processes and the UBC.
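For example, to check the current value, enter:
# sysconfig -q vfs bufcache
Because changing the bufcache attribute requires a reboot, a permanent change is normally recorded in /etc/sysconfigtab; a sketch of such a stanza (the value shown is illustrative) is:
vfs:
        bufcache = 5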
11.1.5 Tuning AdvFS Access Structures
At boot time, the system reserves a portion of the physical memory that is not wired by the kernel for AdvFS access structures. AdvFS caches information about open files and information about files that were opened but are now closed in AdvFS access structures. File-system performance is improved if the file information is reused and in an access structure.
AdvFS access structures are dynamically allocated and deallocated according to the kernel configuration and system demands.
Related Attribute
AdvfsAccessMaxPercent
specifies,
as a percentage, the maximum amount of pageable memory that can be allocated
for AdvFS access structures.
You can modify the value of the
AdvfsAccessMaxPercent
attribute without rebooting the system.
See
Chapter 3
for information about modifying kernel subsystem attributes.
When to Tune
If users or applications reuse AdvFS files (for example, a proxy server),
consider increasing the value of the
AdvfsAccessMaxPercent
attribute to allocate more memory for AdvFS access structures.
Note that increasing
the value of the
AdvfsAccessMaxPercent
attribute reduces
the amount of memory available to processes and might cause excessive paging
and swapping.
You can use the
vmstat
command to display
virtual memory statistics that will help you to determine excessive paging
and swapping.
See
Section 12.3.1
for information on the
vmstat
command.
Consider decreasing the amount of memory reserved for AdvFS access structures if:
You do not use AdvFS.
Your workload does not frequently open, close, and reopen the same files.
You have a large-memory system (because the number of open files does not scale with the size of system memory as efficiently as UBC memory usage and process memory usage).
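Whether you raise or lower the limit, and assuming the attribute is managed through the advfs kernel subsystem on your system (verify the subsystem name with sysconfig -q), a run-time change might look similar to the following (the value is illustrative):
# sysconfig -r advfs AdvfsAccessMaxPercent=35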
11.2 Tuning AdvFS
This section describes AdvFS configuration guidelines, how to tune AdvFS queues, and the commands that you can use to display AdvFS information.
See the
AdvFS Administration
manual for information about AdvFS features
and setting up and managing AdvFS.
11.2.1 AdvFS Configuration Guidelines
The amount of I/O contention on the volumes in a file domain is the most critical factor for fileset performance. Contention is most likely on large, very busy file domains. To help you determine how to set up filesets, first identify:
Frequently accessed data
Infrequently accessed data
Specific types of data (for example, temporary data or database data)
Data with specific access patterns (for example, create, remove, read, or write)
Then, use the previous information and the following guidelines to configure filesets and file domains:
Configure filesets that contain similar types of files in
the same file domain to reduce disk fragmentation and improve performance.
For example, do not place small temporary files, such as the output from
cron
and from news, mail, and Web cache servers, in the same file
domain as a large database file.
For applications that perform many file create or remove operations, configure multiple filesets and distribute files across the filesets. This reduces contention on individual directories, the root tag directory, quota files, and the frag file.
Configure filesets used by applications with different I/O access patterns (for example, create, remove, read, or write patterns) in the same file domain. This might help to balance the I/O load.
To reduce I/O contention in a multivolume file domain with more than one fileset, configure multiple domains and distribute the filesets across the domains. This enables each volume and domain transaction log to be used by fewer filesets.
Filesets with a very large number of small files can slow the vdump and vrestore commands.
Using multiple
filesets enables the
vdump
command to be run simultaneously
on each fileset, and decreases the amount of time needed to recover filesets
with the
vrestore
command.
Table 11-4
lists additional AdvFS configuration
guidelines and performance benefits and tradeoffs.
See the
AdvFS Administration
manual for more information about AdvFS.
Table 11-4: AdvFS Configuration Guidelines
| Benefit | Guideline | Tradeoff |
| Data loss protection | Use LSM or RAID to store data using RAID1 (mirror data) or RAID5 (Section 11.2.1.1) | Requires LSM or RAID |
| Data loss protection | Force synchronous writes or enable atomic write data logging on a file (Section 11.2.1.2) | Might degrade file system performance |
| Improve performance for applications that read or write data only once | Enable direct I/O (Section 11.2.1.3) | Degrades performance of applications that repeatedly access the same data |
| Improve performance | Use AdvFS to distribute files in a file domain (Section 11.2.1.4) | None |
| Improve performance | Stripe data (Section 11.2.1.5) | None if using AdvFS or requires LSM or RAID |
| Improve performance | Defragment file domains (Section 11.2.1.6) | None |
| Improve performance | Decrease the I/O transfer size (Section 11.2.1.7) | None |
| Improve performance | Move the transaction log to a fast or uncongested disk (Section 11.2.1.8) | Might require an additional disk |
The following sections describe these guidelines in more detail.
11.2.1.1 Storing Data Using RAID1 or RAID5
You can use LSM or hardware RAID to implement a RAID1 or RAID5 data storage configuration.
In a RAID1 configuration, LSM or hardware RAID stores and maintains mirrors (copies) of file domain or transaction log data on different disks. If a disk fails, LSM or hardware RAID uses a mirror to make the data available.
In a RAID5 configuration, LSM or hardware RAID stores parity information and data. If a disk fails, LSM or hardware RAID uses the parity information and the data on the remaining disks to reconstruct the missing data.
See the
Logical Storage Manager
manual for more information about LSM.
See
your storage hardware documentation for more information about hardware RAID.
11.2.1.2 Forcing a Synchronous Write Request or Enabling Persistent Atomic Write Data Logging
AdvFS
writes data to disk in 8-KB units.
By default, AdvFS asynchronous write requests
are cached in the UBC, and the
write
system call returns
a success value.
The data is written to disk at a later time (asynchronously).
AdvFS does not guarantee that all or part of the data will actually be written
to disk if a crash occurs during or immediately after the write.
For example,
if the system crashes during a write that consists of two 8-KB units of data,
only a portion (less than 16 KB) of the total write might have succeeded.
This can result in partial data writes and inconsistent data.
You can configure AdvFS to force the write request for a specified file
to be synchronous to ensure that data is successfully written to disk before
the
write
system call returns a success value.
Enabling persistent atomic write data logging for a specified file writes
the data to the transaction log file before it is written to disk.
If a system
crash occurs during or immediately after the
write
system
call, the data in the log file is used to reconstruct the
write
system call upon recovery.
You cannot enable both forced synchronous writes and persistent atomic
write data logging on a file.
However, you can enable atomic write data logging
on a file and also open the file with an
O_SYNC
option.
This ensures that the write is synchronous, but also prevents partial writes
if a crash occurs before the
write
system call returns.
To force synchronous write requests, enter:
# chfile -l on filename
A file that has persistent atomic write data logging enabled cannot
be memory mapped by using the
mmap
system call, and it
cannot have direct I/O enabled (see
Section 11.2.1.3).
To enable persistent atomic write data logging, enter:
# chfile -L on filename
A file that has persistent atomic write data logging enabled is guaranteed atomicity only for writes of 8192 bytes or less. Writes larger than 8192 bytes are written in segments of at most 8192 bytes, and each segment is written atomically.
To enable atomic-write data logging on AdvFS files that are NFS mounted, ensure that:
The NFS property list daemon, proplistd, is running on the NFS server, and the fileset is mounted on the client by using the mount command with the proplist option (a sketch of such a mount command follows this list).
The offset into the file is on an 8-KB page boundary, because NFS performs I/O on 8-KB page boundaries. In this case, only 8192-byte segments that start on 8-KB page boundaries can be written atomically.
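For example, a client mount command might look similar to the following (the server name and paths are placeholders):
# mount -t nfs -o proplist server:/usr_fileset /mnt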
See chfile(8) for more information.
11.2.1.3 Enabling Direct I/O
You can enable direct I/O to significantly improve disk I/O throughput for applications that do not frequently reuse previously accessed data. The following list describes considerations for enabling direct I/O:
Data is not cached in the UBC and reads and writes are synchronous.
You can use the asynchronous I/O (AIO) functions (aio_read
and
aio_write) to enable an application to achieve an asynchronous-like
behavior by issuing one or more synchronous direct I/O requests without waiting
for their completion.
Although direct I/O supports I/O requests of any byte size, the best performance occurs when the requested byte transfer is aligned on a disk sector boundary and is an even multiple of the underlying disk sector size.
You cannot enable direct I/O for a file if it is already opened for
data logging or if it is memory mapped.
Use the
fcntl
system
call with the
F_GETCACHEPOLICY
argument to determine if
an open file has direct I/O enabled.
To enable direct I/O for a specific file, use the
open
system call and set the
O_DIRECTIO
file access flag.
A
file remains opened for direct I/O until all users close the file.
See fcntl(2) and open(2) for more information.
11.2.1.4 Using AdvFS to Distribute Files
If the files in a multivolume domain are not evenly distributed, performance might be degraded. You can distribute space evenly across volumes in a multivolume file domain to balance the percentage of used space among volumes in a domain. Files are moved from one volume to another until the percentage of used space on each volume in the domain is as equal as possible.
To determine if you need to balance files, enter:
# showfdmn file_domain_name
Information similar to the following is displayed:
Id Date Created LogPgs Version Domain Name
3437d34d.000ca710 Sun Oct 5 10:50:05 2001 512 3 usr_domain
Vol 512-Blks Free % Used Cmode Rblks Wblks Vol Name
1L 1488716 549232 63% on 128 128 /dev/disk/dsk0g
2 262144 262000 0% on 128 128 /dev/disk/dsk4a
--------- ------- ------
1750860 811232 54%
The
% Used
field shows the percentage of volume space
that is currently allocated to files or metadata (the fileset data structure).
In the previous example, the
usr_domain
file domain is
not balanced.
Volume 1 has 63 percent used space while volume 2 has 0 percent
used space (it was just added).
To distribute the percentage of used space evenly across volumes in a multivolume file domain, enter:
# balance file_domain_name
The
balance
command is transparent to users and applications,
and does not affect data availability or split files.
Therefore, file domains
with very large files may not balance as evenly as file domains with smaller
files and you might need to manually move large files into the same volume
in a multivolume file domain.
To determine if you should move a file, enter:
# showfile -x file_name
Information similar to the following is displayed:
Id Vol PgSz Pages XtntType Segs SegSz I/O Perf File
8.8002 1 16 11 simple ** ** async 18% src
extentMap: 1
pageOff pageCnt vol volBlock blockCnt
0 1 1 187296 16
1 1 1 187328 16
2 1 1 187264 16
3 1 1 187184 16
4 1 1 187216 16
5 1 1 187312 16
6 1 1 187280 16
7 1 1 187248 16
8 1 1 187344 16
9 1 1 187200 16
10 1 1 187232 16
extentCnt: 11
The file in the previous example is a good candidate to move to another
volume because it has 11 extents and an 18 percent performance efficiency
as shown in the
Perf
field.
A high percentage indicates
optimal efficiency.
To move a file to a different volume in the file domain, enter:
# migrate [-p pageoffset] [-n pagecount] [-s volumeindex_from] \
[-d volumeindex_to] file_name
You can specify the volume from which a file is to be moved, or allow the system to pick the best space in the file domain. You can move either an entire file or specific pages to a different volume.
Note that using the
balance
utility after moving
files might move files to a different volume.
See showfdmn(8), migrate(8), and balance(8) for more information.
11.2.1.5 Striping Data
You can use AdvFS, LSM, or hardware RAID to stripe (distribute) data. Striped data is data that is separated into units of equal size, then written to two or more disks, creating a stripe of data. The data can be simultaneously written if there are two or more units and the disks are on different SCSI buses.
Figure 11-1
shows how a write request of 384
KB of data is separated into six 64-KB data units and written to three disks
as two complete stripes.
Figure 11-1: Striping Data
Use only one method to stripe data. In some specific cases, using multiple striping methods can improve performance, but only if:
Most of the I/O requests are large (greater than or equal to 1 MB)
The data is striped over multiple RAID sets on different controllers
The LSM or AdvFS stripe size is a multiple of the full hardware RAID stripe size
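If you choose AdvFS striping, a file can be striped across several volumes in its file domain. As a sketch, assuming the stripe utility's -n option specifies the number of volumes (see stripe(8) for the exact syntax on your system), you might stripe a newly created file across three volumes as follows:
# stripe -n 3 file_name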
See stripe(8) for more information.
11.2.1.6 Defragmenting a File Domain
An extent is a contiguous area of disk space that AdvFS allocates to a file. Extents consist of one or more 8-KB pages. When storage is added to a file, it is grouped in extents. If all data in a file is stored in contiguous blocks, the file has one file extent. However, as files grow, contiguous blocks on the disk may not be available to accommodate the new data, so the file must be spread over discontiguous blocks and multiple file extents.
File I/O is most efficient when there are few extents. If a file consists of many small extents, AdvFS requires more I/O processing to read or write the file. Disk fragmentation can result in many extents and may degrade read and write performance because many disk addresses must be examined to access a file.
To display fragmentation information for a file domain, enter:
# defragment -vn file_domain_name
Information similar to the following is displayed:
defragment: Gathering data for 'staff_dmn'
Current domain data:
Extents: 263675
Files w/ extents: 152693
Avg exts per file w/exts: 1.73
Aggregate I/O perf: 70%
Free space fragments: 85574
<100K <1M <10M >10M
Free space: 34% 45% 19% 2%
Fragments: 76197 8930 440 7
Ideally, you want few extents for each file.
Although the
defragment
command does not affect data
availability and is transparent to users and applications, it can be a time-consuming
process and requires disk space.
Run the
defragment
command
during low file system activity as part of regular file system maintenance,
or if you experience problems because of excessive fragmentation.
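For example, to perform the defragmentation (rather than only reporting it with the -vn options shown earlier), you would typically enter:
# defragment file_domain_name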
There is little performance benefit from defragmenting a file domain that contains files less than 8 KB, is used in a mail server, or is read-only.
You can also use the
showfile
command to check a
file's fragmentation.
See Section 11.2.2.4 and defragment(8) for more information.
11.2.1.7 Decreasing the I/O Transfer Size
AdvFS attempts to transfer data to and from the disk in sizes that are the most efficient for the device driver. This value is provided by the device driver and is called the preferred transfer size. AdvFS uses the preferred transfer size to:
Consolidate contiguous, small I/O transfers into a single, larger I/O of the preferred transfer size. This results in fewer I/O requests, which increases throughput.
Prefetch (read ahead) subsequent pages of files being read sequentially, up to the preferred transfer size, in anticipation that those pages will eventually be read by the application.
Generally, the I/O transfer size provided by the device driver is the most efficient. However, in some cases you may want to reduce the AdvFS I/O transfer size. For example, if your AdvFS fileset is using LSM volumes, the preferred transfer size might be very high. This could cause the cache to be unduly diluted by the buffers for the files being read. If this is suspected, reducing the read transfer size may alleviate the problem.
For systems with impaired
mmap
page faulting or with
limited memory, limit the read transfer size to limit the amount of data that
is prefetched; however, this will limit I/O consolidation for all reads from
this disk.
To display the I/O transfer sizes for a disk, enter:
# chvol -l block_special_device_name domain
To modify the read I/O transfer size, enter:
# chvol -r blocks block_special_device_name domain
To modify the write I/O transfer size, enter:
# chvol -w blocks block_special_device_name domain
See
chvol(8)
Each device driver has a minimum and maximum value for the I/O transfer
size.
If you use an unsupported value, the device driver automatically limits
the value to either the largest or smallest I/O transfer size it supports.
See your device driver documentation for more information on supported I/O
transfer sizes.
11.2.1.8 Moving the Transaction Log
Place the AdvFS transaction log on a fast or uncongested disk and bus; otherwise, performance might be degraded.
To display volume information, enter:
# showfdmn file_domain_name
Information similar to the following is displayed:
Id Date Created LogPgs Domain Name
35ab99b6.000e65d2 Tue Jul 14 13:47:34 2002 512 staff_dmn
Vol 512-Blks Free % Used Cmode Rblks Wblks Vol Name
3L 262144 154512 41% on 256 256 /dev/rz13a
4 786432 452656 42% on 256 256 /dev/rz13b
---------- ---------- ------
1048576 607168 42%
In the
showfdmn
command display, the letter
L
displays next to the volume that contains the transaction log.
If the transaction log is located on a slow or busy disk, you can:
Move the transaction log to a different disk.
Use the
switchlog
command to move the transaction
log.
Divide a large multivolume file domain into several smaller file domains. This will distribute the transaction log I/O across multiple logs.
To divide a multivolume domain into several smaller domains, create
the smaller domains and then copy portions of the large domain into the smaller
domains.
You can use the AdvFS
vdump
and
vrestore
commands to allow the disks being used in the large domain to be
used in the construction of the several smaller domains.
See showfdmn(8), switchlog(8), vdump(8), and vrestore(8) for more information.
11.2.2 Monitoring AdvFS Statistics
Table 11-5
describes the commands you can use to display AdvFS information.
Table 11-5: Tools to Display AdvFS Information
| Tool | Description | Reference |
| advfsstat | Displays AdvFS performance statistics. | Section 11.2.2.1 |
| advscan | Displays disks in a file domain. | Section 11.2.2.2 |
| showfdmn | Displays information about AdvFS file domains and volumes. | Section 11.2.2.3 |
| showfsets | Displays AdvFS fileset information for a file domain. | Section 11.2.2.5 |
| showfile | Displays information about files in an AdvFS fileset. | Section 11.2.2.4 |
The following sections describe these commands in more detail.
11.2.2.1 Displaying AdvFS Performance Statistics
To display detailed information
about a file domain, including use of the UBC and namei cache, fileset vnode
operations, locks, bitfile metadata table (BMT) statistics, and volume I/O
performance, use the
advfsstat
command.
The following example displays volume I/O queue statistics:
# advfsstat -v 3 [-i number_of_seconds] file_domain
Information, in units of one disk block (512 bytes), similar to the following is displayed:
rd   wr   rg  arg   wg  awg   blk  ubcr  flsh   wlz   sms   rlz   con   dev
 0    0    0    0    0    0    1M    0    10K  303K   51K   33K   33K   44K
You can use the
-i
option to display information
at specific time intervals, in seconds.
The previous example displays:
rd
(read) and
wr
(write)
requests
Compare the number of read requests to the number of write requests. Read requests are blocked until the read completes, but asynchronous write requests will not block the calling thread, which increases the throughput of multiple threads.
rg
and
arg
(consolidated
reads) and
wg
and
awg
(consolidated
writes)
The consolidated read and write values indicate the number of disparate reads and writes that were consolidated into a single I/O to the device driver. If the number of consolidated reads and writes decreases compared to the number of reads and writes, AdvFS may not be consolidating I/O.
blk
(blocking queue),
ubcr
(ubc request queue),
flsh
(flush queue),
wlz
(wait queue),
sms
(smooth sync queue),
rlz
(ready queue),
con
(consol queue), and
dev
(device queue).
See
Section 11.2.3
for information on AdvFS
I/O queues.
If you are experiencing poor performance, and the number of I/O requests
on the
flsh,
blk, or
ubcr
queues increases continually while the number on the
dev
queue remains fairly constant, the application may be I/O bound to this device.
You might eliminate the problem by adding more disks to the domain or by striping
with LSM or hardware RAID.
To display the number of file creates, reads, and writes and other operations for a specified domain or fileset, enter:
# advfsstat [-i number_of_seconds] -f 2 file_domain file_set
Information similar to the following is displayed:
lkup crt geta read writ fsnc dsnc rm mv rdir mkd rmd link
0 0 0 0 0 0 0 0 0 0 0 0 0
4 0 10 0 0 0 0 2 0 2 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
24 8 51 0 9 0 0 3 0 0 4 0 0
1201 324 2985 0 601 0 0 300 0 0 0 0 0
1275 296 3225 0 655 0 0 281 0 0 0 0 0
1217 305 3014 0 596 0 0 317 0 0 0 0 0
1249 304 3166 0 643 0 0 292 0 0 0 0 0
1175 289 2985 0 601 0 0 299 0 0 0 0 0
779 148 1743 0 260 0 0 182 0 47 0 4 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
See advfsstat(8) for more information.
11.2.2.2 Displaying Disks in an AdvFS File Domain
Use the advscan command for the following tasks:
To search all devices and LSM disk groups for AdvFS domains.
To rebuild all or part of your /etc/fdmns directory if you deleted the /etc/fdmns directory, a domain directory under /etc/fdmns, or links from a domain directory under /etc/fdmns.
To correct the /etc/fdmns directory if you moved devices in a way that has changed device numbers.
To display AdvFS volumes on devices or in an LSM disk group, enter:
# advscan device | LSM_disk_group
Information similar to the following is displayed:
Scanning disks dsk0 dsk5
Found domains:
usr_domain
Domain Id 2e09be37.0002eb40
Created Thu Jun 26 09:54:15 2002
Domain volumes 2
/etc/fdmns links 2
Actual partitions found:
dsk0c
dsk5c
To re-create missing domains on a device, enter:
# advscan -r device
Information similar to the following is displayed:
Scanning disks dsk6
Found domains: *unknown*
Domain Id 2f2421ba.0008c1c0
Created Mon Jan 20 13:38:02 2002
Domain volumes 1
/etc/fdmns links 0
Actual partitions found:
dsk6a*
*unknown*
Domain Id 2f535f8c.000b6860
Created Tue Feb 25 09:38:20 2002
Domain volumes 1
/etc/fdmns links 0
Actual partitions found:
dsk6b*
Creating /etc/fdmns/domain_dsk6a/
linking dsk6a
Creating /etc/fdmns/domain_dsk6b/
linking dsk6b
See advscan(8) for more information.
11.2.2.3 Displaying AdvFS File Domains
To display information about a file domain, including the date created and the size and location of the transaction log, and information about each volume in the domain, including the size, the number of free blocks, the maximum number of blocks read and written at one time, and the device special file, enter:
# showfdmn file_domain
Information similar to the following is displayed:
Id Date Created LogPgs Version Domain Name
34f0ce64.0004f2e0 Wed Mar 17 15:19:48 2002 512 4 root_domain
Vol 512-Blks Free % Used Cmode Rblks Wblks Vol Name
1L 262144 94896 64% on 256 256 /dev/disk/dsk0a
For multivolume domains, the
showfdmn
command also
displays the total volume size, the total number of free blocks, and the total
percentage of volume space currently allocated.
See showfdmn(8) for more information.
11.2.2.4 Displaying AdvFS File Information
To display detailed information about files (and directories) in an AdvFS fileset, enter:
# showfile filename...
or
# showfile *
The
*
displays the AdvFS characteristics for all
of the files in the current working directory.
Information similar to the following is displayed:
Id Vol PgSz Pages XtntType Segs SegSz I/O Perf File
23c1.8001 1 16 1 simple ** ** ftx 100% OV
58ba.8004 1 16 1 simple ** ** ftx 100% TT_DB
** ** ** ** symlink ** ** ** ** adm
239f.8001 1 16 1 simple ** ** ftx 100% advfs
** ** ** ** symlink ** ** ** ** archive
9.8001 1 16 2 simple ** ** ftx 100% bin (index)
** ** ** ** symlink ** ** ** ** bsd
** ** ** ** symlink ** ** ** ** dict
288.8001 1 16 1 simple ** ** ftx 100% doc
28a.8001 1 16 1 simple ** ** ftx 100% dt
** ** ** ** symlink ** ** ** ** man
5ad4.8001 1 16 1 simple ** ** ftx 100% net
** ** ** ** symlink ** ** ** ** news
3e1.8001 1 16 1 simple ** ** ftx 100% opt
** ** ** ** symlink ** ** ** ** preserve
** ** ** ** advfs ** ** ** ** quota.group
** ** ** ** advfs ** ** ** ** quota.user
b.8001 1 16 2 simple ** ** ftx 100% sbin (index)
** ** ** ** symlink ** ** ** ** sde
61d.8001 1 16 1 simple ** ** ftx 100% tcb
** ** ** ** symlink ** ** ** ** tmp
** ** ** ** symlink ** ** ** ** ucb
6df8.8001 1 16 1 simple ** ** ftx 100% users
See showfile(8) for more information.
11.2.2.5 Displaying the AdvFS Filesets in a File Domain
To display information about the filesets in a file domain, including the fileset names, the total number of files, the number of used blocks, the quota status, and the clone status, enter:
# showfsets file_domain
Information similar to the following is displayed:
usr
Id : 3d0f7cf8.000daec4.1.8001
Files : 30469, SLim= 0, HLim= 0
Blocks (512) : 1586588, SLim= 0, HLim= 0
Quota Status : user=off group=off
Object Safety: off
Fragging : on
DMAPI : off
The previous example shows a file domain that contains one fileset, usr.
See showfsets(8) for more information.
11.2.3 Tuning AdvFS Queues
For each AdvFS volume, I/O requests are sent to one of the following queues:
Blocking, UBC request, and flush queue
The blocking, UBC request, and flush queues are queues in which reads and synchronous write requests are cached. A synchronous write request must be written to disk before it is considered complete and the application can continue.
The blocking queue is used primarily for reads and for kernel synchronous
write requests.
The UBC request queue is used for handling UBC requests to
flush pages to disk.
The flush queue is used primarily for buffer write requests,
either through
fsync(),
sync(), or synchronous
writes.
Because the buffers on the blocking and UBC request queues are given
slightly higher priority than those on the flush queue, kernel requests are
handled more expeditiously and are not blocked if many buffers are waiting
to be written to disk.
Processes that need to read or modify data in a buffer in the blocking, UBC request, or flush queue must wait for the data to be written to disk. This is in direct contrast with buffers on the lazy queues that can be modified at any time until they are finally moved down to the device queue.
Lazy queue
The lazy queue is a logical series of queues in which asynchronous write requests are cached. When an asynchronous I/O request enters the lazy queue, it is assigned a timestamp. This timestamp is used to periodically flush the buffers down toward the disk in numbers large enough to allow them to be consolidated into larger I/Os. Processes can modify data in buffers at any time while they are on the lazy queue, potentially avoiding additional I/Os. Descriptions of the queues in the lazy queue are provided after Figure 11-2.
All four queues (blocking, UBC request, flush, and lazy) move buffers to the device queue. As buffers are moved onto the device queue, logically contiguous I/Os are consolidated into larger I/O requests. This reduces the actual number of I/Os that must be completed. Buffers on the device queue cannot be modified until their I/O has completed.
The algorithms that move the buffers onto the device queue favor taking buffers from the queues in the following order: blocking queue, UBC request queue, and then flush queue. All three are favored over the lazy queue. The size of the device queue is limited by device and driver resources. The algorithms that load the device queue use feedback from the drivers to determine when the device queue is full. At that point the device is saturated, and continued movement of buffers to the device queue would only degrade throughput to the device. The potential size of the device queue, and how full it is, ultimately determine how long it may take to complete a synchronous I/O operation.
Figure 11-2
shows the movement of synchronous
and asynchronous I/O requests through the AdvFS I/O queues.
Figure 11-2: AdvFS I/O Queues
Detailed descriptions of the AdvFS lazy queues are as follows:
Wait queue
Asynchronous I/O requests that are waiting for an AdvFS transaction log write to complete first enter the wait queue. Each file domain has a transaction log that tracks fileset activity for all filesets in the file domain and ensures AdvFS metadata consistency if a crash occurs.
AdvFS uses write-ahead logging, which requires that when metadata is modified, the transaction log write must complete before the actual metadata is written. This ensures that AdvFS can always use the transaction log to create a consistent view of the file-system metadata. After the transaction log is written, I/O requests can move from the wait queue to the smooth sync queue.
Smooth sync queue
Asynchronous I/O requests remain in the smooth sync queue for at least 30 seconds, by default. Allowing requests to remain in the smooth sync queue for a specified amount of time prevents I/O spikes, increases cache hit rates, and improves the consolidation of requests. After requests have aged in the smooth sync queue, they move to the ready queue.
Ready queue
Asynchronous I/O requests are sorted in the ready queue. After the queue reaches a specified size, the requests are moved to the consol queue.
Consol queue
Asynchronous I/O requests are interleaved in the consol queue and moved to the device queue.
Related Attributes
The following list describes the
vfs
subsystem attributes
that relate to AdvFS queues:
smoothsync_age
Specifies the amount
of time, in seconds, that a modified page ages before becoming eligible for
the smoothsync mechanism to flush it to disk.
The
smoothsync_age
attribute is enabled when the
system boots to multiuser mode and disabled when the system changes from multiuser
mode to single-user mode.
To permanently change the value of the
smoothsync_age
attribute, edit the following lines in the
/etc/inittab
file:
smsync:23:wait:/sbin/sysconfig -r vfs smoothsync_age=30 > /dev/null 2>&1
smsyncS:Ss:wait:/sbin/sysconfig -r vfs smoothsync_age=0 > /dev/null 2>&1
You can use the
smsync2
mount option to specify an
alternate smoothsync policy that can further decrease the net I/O load.
The
default policy is to flush modified pages after they have been dirty for the
smoothsync_age
time period, regardless of continued modifications
to the page.
When you mount a filesystem using the
smsync2
mount option, modified pages in nonmemory-mapped mode are not written to disk
until they have been dirty and idle for the
smoothsync_age
time period.
Note that AdvFS files in memory-mapped mode may not be flushed according
to
smoothsync_age.
AdvfsSyncMmapPages
Specifies whether
or not to disable smoothsync for applications that manage their own
mmap
page flushing.
AdvfsReadyQLim
Specifies the size
of the ready queue.
You can modify the value of the
AdvfsSyncMmapPages,
smoothsync_age, and the
AdvfsReadyQLim
attributes
without rebooting the system.
See
Chapter 3
for information
about modifying kernel subsystem attributes.
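For example, to change the smoothsync age at run time (the value shown is illustrative; the change lasts only until the next boot unless you also edit /etc/inittab as shown earlier), enter:
# sysconfig -r vfs smoothsync_age=60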
When to Tune
If you reuse data, consider increasing:
The amount of time I/O requests remain in the smoothsync queue to increase the possibility of a cache hit. However, doing so increases the chance that data might be lost if the system crashes.
Use the
advfsstat -S
command to show cache statistics
in the AdvFS smoothsync queue.
The size of the ready queue to increase the possibility that I/O requests will be consolidated into a single, larger I/O and improve the possibility of a cache hit. However, doing so is not likely to have much influence if smoothsync is enabled and can increase the overhead in sorting the incoming requests onto the ready queue.
11.3 Tuning UFS
This section describes UFS configuration and tuning guidelines and commands that you can use to display UFS information.
11.3.1 UFS Configuration Guidelines
Table 11-6
lists UFS configuration guidelines and
performance benefits and tradeoffs.
Table 11-6: UFS Configuration Guidelines
| Benefit | Guideline | Tradeoff |
| Improve performance for small files | Make the file system fragment size equal to the block size (Section 11.3.1.1) |
Wastes disk space for small files |
| Improve performance for large files | Use the default file system fragment size of 1 KB (Section 11.3.1.1) |
Increases the overhead for large files |
| Free disk space and improve performance for large files | Reduce the density of inodes on a file system (Section 11.3.1.2) |
Reduces the number of files that can be created |
| Improve performance for disks that do not have a read-ahead cache | Set rotational delay (Section 11.3.1.3) |
None |
| Decrease the number of disk I/O operations | Increase the number of blocks combined for a cluster (Section 11.3.1.4) |
None |
| Improve performance | Use a memory file system (MFS) (Section 11.3.1.5) |
Does not ensure data integrity because of cache volatility |
| Control disk space usage | Use disk quotas (Section 11.3.1.6) |
Might result in a slight increase in reboot time |
| Allow more mounted file systems | Increase the maximum number of UFS and MFS mounts (Section 11.3.1.7) |
Requires additional memory resources |
The following sections describe these guidelines in more detail.
11.3.1.1 Modifying the File System Fragment and Block Sizes
The UFS file system block size is 8 KB. The default fragment size is 1 KB. You can use the newfs command to set the fragment size to 1024, 2048, 4096, or 8192 bytes when you create the file system.
Although the default fragment size uses disk space efficiently, it increases the overhead for files less than 96 KB. If the average file in a file system is less than 96 KB, you might improve disk access time and decrease system overhead by making the file-system fragment size equal to the default block size (8 KB).
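For example, to create a file system with an 8-KB fragment size equal to the 8-KB block size (the disk device name is a placeholder), you might enter:
# newfs -b 8192 -f 8192 /dev/disk/dsk2c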
See newfs(8) for more information.
11.3.1.2 Reducing the Density of inodes
An inode describes an individual file in the file system. The maximum number of files in a file system depends on the number of inodes and the size of the file system. The system creates an inode for each 4 KB (4096 bytes) of data space in a file system.
If a file system will contain many large files and you are sure that you will not create a file for each 4 KB of space, you can reduce the density of inodes on the file system. This will free disk space for file data, but also reduces the number of files that can be created.
To do this, use the
newfs -i
command to specify the
amount of data space allocated for each inode when you create the file system.
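For example, to allocate one inode for each 8 KB of data space instead of the default 4 KB (the device name is a placeholder), you might enter:
# newfs -i 8192 /dev/disk/dsk2c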
See newfs(8) for more information.
11.3.1.3 Setting the Rotational Delay
The UFS
rotdelay
parameter specifies
the time, in milliseconds, to service a transfer completion interrupt and
initiate a new transfer on the same disk.
It is used to decide how much rotational
spacing to place between successive blocks in a file.
By default, the
rotdelay
parameter is set to 0 to allocate blocks continuously.
It is useful to set
rotdelay
on disks that do not have
a read-ahead cache.
For disks with cache, set the
rotdelay
to 0.
Use either the
tunefs
command or the
newfs
command to modify the
rotdelay
value.
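For example, to set a 4-millisecond rotational delay on an existing file system (the value and device name are illustrative), you might enter:
# tunefs -d 4 /dev/disk/dsk2c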
See newfs(8) and tunefs(8) for more information.
11.3.1.4 Increasing the Number of Blocks Combined for a Cluster
The value of the UFS
maxcontig
parameter specifies the number of blocks that can be combined into a single
cluster (or file-block group).
The default value of
maxcontig
is 8.
The file system attempts I/O operations in a size that is determined
by the value of
maxcontig
multiplied by the block size
(8 KB).
Device drivers that can chain several buffers together in a single transfer
should use a
maxcontig
value that is equal to the maximum
chain length.
This may reduce the number of disk I/O operations.
Use the
tunefs
command or the
newfs
command to change the value of
maxcontig.
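For example, to raise maxcontig to 16 blocks on an existing file system (the value and device name are illustrative), you might enter:
# tunefs -a 16 /dev/disk/dsk2c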
See newfs(8) and tunefs(8) for more information.
11.3.1.5 Using MFS
The memory file system (MFS) is a UFS file system that resides only in memory. No permanent data or file structures are written to disk. An MFS can improve read/write performance, but it is a volatile cache. The contents of an MFS are lost after a reboot, unmount operation, or power failure.
Because no data is written to disk, an MFS is a very fast file system and can be used to store temporary files or read-only files that are loaded into the file system after it is created. For example, if you are performing a software build that would have to be restarted if it failed, use an MFS to cache the temporary files that are created during the build and reduce the build time.
See mfs(8) for more information.
11.3.1.6 Using UFS Disk Quotas
You can specify UFS file-system limits for user accounts and for groups by setting up UFS disk quotas, also known as UFS file system quotas. You can apply quotas to file-systems to establish a limit on the number of blocks and inodes (or files) that a user account or a group of users can allocate. You can set a separate quota for each user or group of users on each file system.
You may want to set quotas on file systems that contain home directories,
because the sizes of these file systems can increase more significantly than
other file systems.
Do not set quotas on the
/tmp
file
system.
Note that, unlike AdvFS quotas, UFS quotas may cause a slight increase
in reboot time.
See the
AdvFS Administration
manual for information about AdvFS
quotas.
See the
System Administration
manual for information about UFS quotas.
11.3.1.7 Increasing the Number of UFS and MFS Mounts
Mount structures are dynamically allocated when a mount request is made and subsequently deallocated when an unmount request is made.
Related Attribute
The
max_ufs_mounts
attribute specifies the maximum
number of UFS and MFS mounts on the system.
Value: 0 to 2,147,483,647
Default value: 1000 (file system mounts)
You can modify the
max_ufs_mounts
attribute without
rebooting the system.
See
Chapter 3
for information
about modifying kernel subsystem attributes.
When to Tune
Increase the maximum number of UFS and MFS mounts if your system will have more than the default limit of 1000 mounts.
Increasing the maximum number of UFS and MFS mounts enables you to mount more file systems. However, increasing the maximum number of mounts requires additional memory resources.
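For example, assuming the max_ufs_mounts attribute belongs to the vfs subsystem on your system, you might raise the limit at run time as follows (the value is illustrative):
# sysconfig -r vfs max_ufs_mounts=2000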
11.3.2 Monitoring UFS Statistics
Table 11-7
describes the commands you can use to display UFS information.
Table 11-7: Tools to Display UFS Information
| Tool | Description | Reference |
| dumpfs | Displays UFS information. | Section 11.3.2.1 |
| dbx print (ufs_clusterstats data structure) | Displays UFS clustering statistics. | Section 11.3.2.2 |
| dbx print (bio_stats data structure) | Displays metadata buffer cache statistics. | Section 11.3.2.3 |
11.3.2.1 Displaying UFS Information
To display UFS information for a specified file system, including super block and cylinder group information, enter:
# dumpfs filesystem | /devices/disk/device_name
Information similar to the following is displayed:
magic   11954   format  dynamic time    Tue Sep 14 15:46:52 2002
nbfree  21490   ndir    9       nifree  99541   nffree  60
ncg     65      ncyl    1027    size    409600  blocks  396062
bsize   8192    shift   13      mask    0xffffe000
fsize   1024    shift   10      mask    0xfffffc00
frag    8       shift   3       fsbtodb 1
cpg     16      bpg     798     fpg     6384    ipg     1536
minfree 10%     optim   time    maxcontig 8     maxbpg  2048
rotdelay 0ms    headswitch 0us  trackseek 0us   rps     60
The information contained in the first few lines is relevant for tuning. Of specific interest are the following fields:
bsize
The block size of the file
system, in bytes (8 KB).
fsize
The fragment size of the
file system, in bytes.
For the optimum I/O performance, you can modify the
fragment size.
minfree
The percentage of space
that cannot be used by normal users (the minimum free space threshold).
maxcontig
The maximum number of
contiguous blocks that will be laid out before forcing a rotational delay;
that is, the number of blocks that are combined into a single read request.
maxbpg
The maximum number of blocks
any single file can allocate out of a cylinder group before it is forced to
begin allocating blocks from another cylinder group.
A large value for
maxbpg
can improve performance for large files.
rotdelay
The expected time, in
milliseconds, to service a transfer completion interrupt and initiate a new
transfer on the same disk.
It is used to decide how much rotational spacing
to place between successive blocks in a file.
If
rotdelay
is 0, then blocks are allocated contiguously.
11.3.2.2 Monitoring UFS Clustering
To display how the system is performing cluster read and write
transfers, use the
dbx print
command to examine the
ufs_clusterstats
data structure.
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print ufs_clusterstats
Information similar to the following is displayed:
struct {
full_cluster_transfers = 3130
part_cluster_transfers = 9786
non_cluster_transfers = 16833
sum_cluster_transfers = {
[0] 0
[1] 24644
[2] 1128
[3] 463
[4] 202
[5] 55
[6] 117
[7] 36
[8] 123
[9] 0
.
.
.
[33]
}
}
(dbx)
The previous example shows 24644 single-block transfers, 1128 double-block transfers, 463 triple-block transfers, and so on.
You can use the
dbx print
command to examine cluster
reads and writes by specifying the
ufs_clusterstats_read
and
ufs_clusterstats_write
data structures respectively.
11.3.2.3 Displaying the Metadata Buffer Cache
To display statistics on the
metadata buffer cache, including superblocks, inodes, indirect blocks, directory
blocks, and cylinder group summaries, use the
dbx print
command to examine the
bio_stats
data structure.
For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print bio_stats
Information similar to the following is displayed:
struct {
getblk_hits = 4590388
getblk_misses = 17569
getblk_research = 0
getblk_dupbuf = 0
getnewbuf_calls = 17590
getnewbuf_buflocked = 0
vflushbuf_lockskips = 0
mntflushbuf_misses = 0
mntinvalbuf_misses = 0
vinvalbuf_misses = 0
allocbuf_buflocked = 0
ufssync_misses = 0
}
The number of block misses (getblk_misses) divided
by the sum of block misses and block hits (getblk_hits)
should not be more than 3 percent.
If the number of block misses is high,
you might want to increase the value of the
bufcache
attribute.
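In the previous example, the miss rate is 17569 divided by (17569 + 4590388), or approximately 0.4 percent, which is well below the 3 percent threshold.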
See
Section 11.1.4
for information on increasing the value
of the
bufcache
attribute.
11.3.3 Tuning UFS for Performance
Table 11-8
lists UFS tuning guidelines and performance
benefits and tradeoffs.
Table 11-8: UFS Tuning Guidelines
| Benefit | Guideline | Tradeoff |
| Improve performance | Adjust UFS smoothsync and I/O throttling for asynchronous UFS I/O requests (Section 11.3.3.1) |
None |
| Free CPU cycles and reduce the number of I/O operations | Delay UFS cluster writing (Section 11.3.3.2) |
If I/O throttling is not used, might degrade real-time workload performance when buffers are flushed |
| Reduce the number of disk I/O operations | Increase the number of combined blocks for a cluster (Section 11.3.3.3) |
Might require more memory to buffer data |
| Improve read and write performance | Defragment the file system (Section 11.3.3.4) |
Requires down time |
The following sections describe these guidelines in more detail.
11.3.3.1 Adjusting UFS Smooth Sync and I/O Throttling
UFS uses smoothsync and I/O throttling to improve UFS performance and to minimize system stalls resulting from a heavy system I/O load.
Smoothsync allows each dirty page to age for a specified time period
before going to disk.
This allows more opportunity for frequently modified
pages to be found in the cache, which decreases the I/O load.
Also, spikes
in which large numbers of dirty pages are locked on the device queue are minimized
because pages are enqueued to a device after having aged sufficiently, as
opposed to getting flushed by the
update
daemon.
I/O throttling further addresses the concern of locking dirty pages on the device queue. It enforces a limit on the number of delayed I/O requests allowed to be on the device queue at any point in time. This allows the system to be more responsive to any synchronous requests added to the device queue, such as a read or the loading of a new program into memory. This can also decrease the amount and duration of process stalls for specific dirty buffers, as pages remain available until placed on the device queue.
Related Attributes
The
vfs
subsystem attributes that affect smoothsync
and throttling are:
smoothsync_age
Specifies the amount
of time, in seconds, that a modified page ages before becoming eligible for
the smoothsync mechanism to flush it to disk.
A value of 0 disables smoothsync; modified pages are then flushed by the update daemon at 30-second intervals.
When to Tune
Increasing the value increases the chance of lost data if the system crashes, but can decrease net I/O load (improve performance) by allowing the dirty pages to remain cached longer.
The
smoothsync_age
attribute is enabled when the
system boots to multiuser mode and disabled when the system changes from multiuser
mode to single-user mode.
To change the value of the
smoothsync_age
attribute, edit the following lines in the
/etc/inittab
file:
smsync:23:wait:/sbin/sysconfig -r vfs smoothsync_age=30 > /dev/null 2>&1
smsyncS:Ss:wait:/sbin/sysconfig -r vfs smoothsync_age=0 > /dev/null 2>&1
You can use the
smsync2
mount option to specify an
alternate smoothsync policy that can further decrease the net I/O load.
The
default policy is to flush modified pages after they have been dirty for the
smoothsync_age
time period, regardless of continued modifications
to the page.
When you mount a UFS using the
smsync2
mount
option, modified pages are not written to disk until they have been dirty
and idle for the
smoothsync_age
time period.
Note that
memory-mapped pages always use this default policy, regardless of the
smsync2
setting.
io_throttle_shift
Specifies a
value that limits the maximum number of concurrent delayed UFS I/O requests
on an I/O device queue.
The io_throttle_shift attribute applies only to file systems that you mount using the throttle mount option.
The greater the number of requests on an I/O device queue, the longer
it takes to process those requests and to make those pages and device available.
The number of concurrent delayed I/O requests on an I/O device queue can be
throttled (controlled) by setting the
io_throttle_shift
attribute.
The calculated throttle value is based on the value of the
io_throttle_shift
attribute and the device's calculated I/O completion
rate.
The time required to process the I/O device queue is proportional to
the throttle value.
The correspondences between the value of the
io_throttle_shift
attribute and the time to process the device queue
are:
| Value of the io_throttle_shift Attribute | Time (in seconds) to Process Device Queue |
| -4 | 0.0625 |
| -3 | 0.125 |
| -2 | 0.25 |
| -1 | 0.5 |
| 0 | 1 |
| 1 | 2 |
| 2 | 4 |
| 3 | 8 |
| 4 | 16 |
Consider reducing the value of the
io_throttle_shift
attribute if your environment is particularly sensitive to delays in accessing
the I/O device.
io_maxmzthruput
Specifies whether
or not to maximize I/O throughput or to maximize the availability of dirty
pages.
Maximizing I/O throughput works more aggressively to keep the device
busy, but within the constraints of the
io_throttle_shift
attribute.
Maximizing the availability of dirty pages favors decreasing the
stall time experienced when waiting for dirty pages.
Value: 0 (disabled) or 1 (enabled)
Default value: 1 (enabled).
However, the io_maxmzthruput attribute applies only to file systems that you mount using the throttle mount option.
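Both throttling attributes take effect only on file systems mounted with the throttle option. For example, the following commands mount a file system with throttling enabled and then tighten the throttle at run time (the device, mount point, and value are illustrative):
# mount -o throttle /dev/disk/dsk3g /data
# sysconfig -r vfs io_throttle_shift=-1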
When to Tune
Consider disabling the io_maxmzthruput attribute if your environment is particularly sensitive to delays in accessing sets of frequently used dirty pages, or if I/O is confined to a small number of I/O-intensive applications, such that access to a specific set of pages becomes more important for overall performance than keeping the I/O device busy.
You can modify the smoothsync_age, io_throttle_shift, and io_maxmzthruput attributes without rebooting the system.
11.3.3.2 Delaying UFS Cluster Writing
By default, clusters of UFS pages are written asynchronously. You can configure clusters of UFS pages to be written delayed as other modified data and metadata pages are written.
Related Attribute
delay_wbuffers
Specifies whether clusters of UFS pages are written asynchronously or delayed. If the percentage of dirty buffers exceeds the value of the delay_wbuffers_percent attribute, the clusters are written asynchronously, regardless of the value of the delay_wbuffers attribute.
Delay writing clusters of UFS pages if your applications frequently write to previously written pages. This can result in a decrease in the total number of I/O requests. However, if you are not using I/O throttling, it might adversely affect real-time workload performance because the system will experience a heavy I/O load at sync time.
To delay writing clusters of UFS pages, use the
dbx patch
command to set the value of the
delay_wbuffers
kernel variable
to 1 (enabled).
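For example, a dbx session that enables delayed cluster writing might look similar to the following (the patch command changes the running kernel; see Section 3.2 before using it):
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) patch delay_wbuffers = 1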
See
Section 3.2
for information about using
dbx.
11.3.3.3 Increasing the Number of Blocks in a Cluster
UFS combines contiguous blocks into clusters to decrease I/O operations. You can specify the number of blocks in a cluster.
Related Attribute
cluster_maxcontig
Specifies the number of
blocks that are combined into a single I/O operation.
If the specific file-system's rotational delay value is 0 (default),
then UFS attempts to create clusters with up to
n
blocks, where
n
is either the value of the
cluster_maxcontig
attribute or the value from device geometry, whichever
is smaller.
If the specific file-system's rotational delay value is nonzero, then
n
is the value of the
cluster_maxcontig
attribute,
the value from device geometry, or the value of the
maxcontig
file-system attribute, whichever is smaller.
When to Tune
Increase the number of blocks combined for a cluster if your applications can use a large cluster size.
Use the
newfs
command to set the file-system rotational
delay value and the value of the
maxcontig
attribute.
Use
the
dbx
command to set the value of the
cluster_maxcontig
attribute.
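For example, a dbx session that raises the cluster size might look similar to the following (the value 16 is illustrative):
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) patch cluster_maxcontig = 16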
11.3.3.4 Defragmenting a File System
When a file consists of noncontiguous file extents, the file is considered fragmented. A very fragmented file decreases UFS read and write performance, because it requires more I/O operations to access the file.
When to Perform
Defragmenting a UFS file system improves file-system performance. However, it is a time-consuming process.
You can determine whether the files in a file system are fragmented
by determining how effectively the system is clustering.
You can do this by
using the
dbx print
command to examine the
ufs_clusterstats
data structure.
See
Section 11.3.2.2
for information.
UFS block clustering is usually efficient. If the numbers from the UFS clustering kernel structures show that clustering is not effective, the files in the file system may be very fragmented.
Recommended Procedure
To defragment a UFS file system, follow these steps:
Back up the file system onto tape or another partition.
Create a new file system either on the same partition or a different partition.
Restore the file system.
See the
System Administration
manual for information about backing up and
restoring data and creating UFS file systems.
11.4 Tuning NFS
The network file system (NFS) shares the Unified Buffer Cache (UBC) with the virtual memory subsystem and local file systems. NFS can put an extreme load on the network. Poor NFS performance is almost always a problem with the network infrastructure. Look for high counts of retransmitted messages on the NFS clients, network I/O errors, and routers that cannot maintain the load.
Lost packets on the network can severely degrade NFS performance. Lost packets can be caused by a congested server, the corruption of packets during transmission (which can be caused by bad electrical connections, noisy environments, or noisy Ethernet interfaces), and routers that abandon forwarding attempts too quickly.
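For example, you might check client-side RPC retransmission counts and network interface errors with the following commands:
# nfsstat -c
# netstat -i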
For information about how to tune network file systems (NFS), see Chapter 5.