11    Managing File System Performance

To tune for better file-system performance, you must understand how your applications and users perform disk I/O, as described in Section 1.8, and how the file system you are using shares memory with processes, as described in Chapter 12. Using this information, you might improve file-system performance by changing the value of the kernel subsystem attributes described in this chapter.

This chapter describes how to tune the file system caches (Section 11.1), the Advanced File System (Section 11.2), the UNIX File System (Section 11.3), and the Network File System (Section 11.4).

11.1    Tuning Caches

The kernel caches (temporarily stores) recently accessed data in memory. Caching data is effective because data is frequently reused and it is much faster to retrieve data from memory than from disk. When the kernel requires data, it checks whether the data is cached. If it is, the data is returned immediately; if it is not, the data is retrieved from disk and then cached. File-system performance improves when data is cached and later reused.

Data found in a cache is called a cache hit, and the effectiveness of cached data is measured by a cache hit rate. Data that was not found in a cache is called a cache miss.

Cached data can be information about a file, user or application data, or metadata, which is data that describes an object (for example, a file). The types of data that are cached include recently used file names and their corresponding vnodes (the namei cache, Section 11.1.2), UFS and AdvFS user and application data (the Unified Buffer Cache, Section 11.1.3), UFS metadata (the metadata buffer cache, Section 11.1.4), and information about open AdvFS files (AdvFS access structures, Section 11.1.5).

11.1.1    Monitoring Cache Statistics

Table 11-1 describes the commands you can use to display and monitor cache information.

Table 11-1:  Tools to Display Cache Information

Tool                                    Description                                  Reference
(dbx) print processor_ptr[n].nchstats   Displays namei cache statistics.             Section 11.1.2
vmstat                                  Displays virtual memory statistics.          Section 11.1.3 and Section 12.3.1
(dbx) print bio_stats                   Displays metadata buffer cache statistics.   Section 11.3.2.3

11.1.2    Tuning the namei Cache

The virtual file system (VFS) presents to applications a uniform kernel interface that is abstracted from the subordinate file system layer. As a result, file access across different types of file systems is transparent to the user.

The VFS uses a structure called a vnode to store information about each open file in a mounted file system. When an application makes a read or write request on a file, VFS uses the vnode information to direct the request to the file system that contains the file. For example, if an application issues a read() system call, VFS converts the call into the routine appropriate to that file system: ufs_read() for UFS, advfs_read() for AdvFS, or nfs_read() if the file is in a file system mounted through NFS.
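The following fragment sketches the idea behind this dispatch; the structure and function names are illustrative only and are not the actual kernel definitions.

#include <sys/types.h>

/* Illustrative sketch only -- not the actual kernel definitions. */
struct vnode;

struct vnodeops {
    int (*vop_read)(struct vnode *vp, void *buf, size_t len, off_t off);
    int (*vop_write)(struct vnode *vp, const void *buf, size_t len, off_t off);
};

struct vnode {
    const struct vnodeops *v_op;   /* set to the UFS, AdvFS, or NFS table at open time */
    void                  *v_data; /* file-system-private state */
};

/* A generic VFS-level read: the caller never names the file system.
 * The vnode carries the correct operations table, so this call ends up
 * in ufs_read(), advfs_read(), or nfs_read() as appropriate. */
int vfs_read(struct vnode *vp, void *buf, size_t len, off_t off)
{
    return vp->v_op->vop_read(vp, buf, len, off);
}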

The VFS caches a recently accessed file name and its corresponding vnode in the namei cache. File-system performance is improved if a file is reused and its name and corresponding vnode are in the namei cache.

The vfs subsystem attributes that relate to the namei cache include name_cache_hash_size, namei_cache_valid_time, vnode_age, and vnode_deallocation_enable.

Related Attributes

Note

If you increase the values of namei cache-related attributes, consider also increasing the values of the file system attributes that cache file and directory information. If you use AdvFS, see Section 11.1.5 for more information. If you use UFS, see Section 11.1.4 for more information.

When to Tune

You can check namei cache statistics to see if you should change the values of namei cache related attributes. To check namei cache statistics, enter the dbx print command and specify a processor number to examine the nchstats data structure. For example:

# /usr/ucb/dbx -k /vmunix /dev/mem 
(dbx) print processor_ptr[0].nchstats
 
 

Information similar to the following is displayed:

struct {
        ncs_goodhits = 18984
        ncs_neghits = 358
        ncs_badhits = 113
        ncs_falsehits = 23
        ncs_miss = 699
        ncs_long = 21
        ncs_badtimehits = 33
        ncs_collisions = 2
        ncs_unequaldups = 0
        ncs_newentry = 697
        ncs_newnegentry =  419
        ncs_gnn_hit = 1653
        ncs_gnn_miss = 12
        ncs_gnn_badhits = 12
        ncs_gnn_collision = 4
        ncs_pad = {
            [0] 0
        }
} 
 
 

Table 11-2 describes when you might change the values of namei cache related attributes based on the dbx print output:

Table 11-2:  When to Change the Values of the Namei Cache Related Attributes

If                                                                      Increase
(ncs_goodhits + ncs_neghits) / (ncs_goodhits + ncs_neghits + ncs_miss + ncs_falsehits) is less than 80 percent    The value of either the maxusers attribute or the name_cache_hash_size attribute
ncs_badtimehits is more than 0.1 percent of ncs_goodhits                The values of the namei_cache_valid_time and vnode_age attributes

You cannot modify the values of the name_cache_hash_size attribute, the namei_cache_valid_time attribute, or the vnode_deallocation_enable attribute without rebooting the system. You can modify the value of the vnode_age attribute without rebooting the system. See Chapter 3 for information about modifying subsystem attributes.
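As a worked example, applying the first test in Table 11-2 to the sample dbx output shown above gives (18984 + 358) / (18984 + 358 + 699 + 23), a hit rate of approximately 96 percent, so the first test indicates no change. However, ncs_badtimehits (33) is approximately 0.17 percent of ncs_goodhits (18984), which exceeds the 0.1 percent threshold, so the second test suggests increasing the values of the namei_cache_valid_time and vnode_age attributes.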

11.1.3    Tuning the UBC

The Unified Buffer Cache (UBC) and processes share the memory that is not wired by the kernel. The UBC uses its share of that memory to cache UFS user and application data and AdvFS user and application data and metadata. File-system performance improves if the data and metadata are reused while they are still in the UBC.

Related Attributes

The following list describes the vm subsystem attributes that relate to the UBC:

Note

If the values of the ubc_maxpercent and ubc_minpercent attributes are close, you may degrade file system performance.

When to Tune

An insufficient amount of memory allocated to the UBC can impair file system performance. Because the UBC and processes share memory, changing the values of UBC-related attributes might cause the system to page. You can use the vmstat command to display virtual memory statistics that will help you determine whether you need to change the values of UBC-related attributes. Table 11-3 describes when you might change the values of UBC-related attributes based on the vmstat output:

Table 11-3:  When to Change the Values of the UBC-Related Attributes

If vmstat Output Displays Excessive:   Action:
Paging but few or no page outs         Increase the value of the ubc_borrowpercent attribute.
Paging and swapping                    Decrease the value of the ubc_maxpercent attribute.
Paging                                 Force the system to reuse pages that are already in the UBC instead of taking pages from the free list. To do this, ensure that the value of the ubc_maxpercent attribute is greater than the value of the vm_ubcseqstartpercent attribute (the default configuration) and that a referenced file is large enough to exceed the per-file limit set by the vm_ubcseqpercent attribute.
Page outs                              Increase the value of the ubc_minpercent attribute.

See Section 12.3.1 for information on the vmstat command. See Section 12.1.2.2 for information about UBC memory allocation.

You can modify the value of any of the UBC parameters described in this section without rebooting the system. See Chapter 3 for information about modifying subsystem attributes.
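For example, assuming the sysconfig interface described in Chapter 3, you could display the current UBC attribute values and then raise the value of the ubc_borrowpercent attribute at run time (the value shown is illustrative, not a recommendation):

# sysconfig -q vm ubc_minpercent ubc_maxpercent ubc_borrowpercent
# sysconfig -r vm ubc_borrowpercent=30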

Note

The performance of an application that generates a lot of random I/O is not improved by a large UBC, because the next access location for random I/O cannot be predetermined.

11.1.4    Tuning the Metadata Buffer Cache

At boot time, the kernel wires a percentage of memory for the metadata buffer cache. UFS file metadata, such as superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries, is cached in the metadata buffer cache. File-system performance improves if the metadata is reused while it is in the metadata buffer cache.

Related Attributes

The vfs subsystem attribute that relates to the metadata buffer cache is bufcache, which specifies the percentage of memory that is wired at boot time for the metadata buffer cache.

When to Tune

Consider increasing the value of the bufcache attribute if you have a high cache miss rate (low hit rate).

To determine if you have a high cache miss rate, use the dbx print command to display the bio_stats data structure. If the miss rate (block misses divided by the sum of the block misses and block hits) is more than 3 percent, consider increasing the value of the bufcache attribute. See Section 11.3.2.3 for more information on displaying the bio_stats data structure.

Note that increasing the value of the bufcache attribute will reduce the amount of memory available to processes and the UBC.

11.1.5    Tuning AdvFS Access Structures

At boot time, the system reserves a portion of the physical memory that is not wired by the kernel for AdvFS access structures. AdvFS caches information about open files and information about files that were opened but are now closed in AdvFS access structures. File-system performance is improved if the file information is reused and in an access structure.

AdvFS access structures are dynamically allocated and deallocated according to the kernel configuration and system demands.

Related Attribute

The AdvfsAccessMaxPercent attribute specifies the maximum percentage of memory that can be allocated to AdvFS access structures.

When to Tune

If users or applications reuse AdvFS files (for example, on a proxy server), consider increasing the value of the AdvfsAccessMaxPercent attribute to allocate more memory for AdvFS access structures. Note that increasing the value of the AdvfsAccessMaxPercent attribute reduces the amount of memory available to processes and might cause excessive paging and swapping. You can use the vmstat command to display virtual memory statistics that will help you determine whether paging and swapping are excessive. See Section 12.3.1 for information on the vmstat command.

Consider decreasing the amount of memory reserved for AdvFS access structures if:

11.2    Tuning AdvFS

This section describes Advanced File System (AdvFS) configuration guidelines, how to tune the AdvFS I/O queues, and the commands that you can use to display AdvFS information.

See the AdvFS Administration manual for information about AdvFS features and setting up and managing AdvFS.

11.2.1    AdvFS Configuration Guidelines

The amount of I/O contention on the volumes in a file domain is the most critical factor for fileset performance; contention is most likely on large, very busy file domains. To help you determine how to set up filesets, first identify:

Then, use the previous information and the following guidelines to configure filesets and file domains:

Table 11-4 lists additional AdvFS configuration guidelines and performance benefits and tradeoffs. See the AdvFS Administration manual for more information about AdvFS.

Table 11-4:  AdvFS Configuration Guidelines

Benefit Guideline Tradeoff
Data loss protection Use LSM or RAID to store data using RAID1 (mirrored data) or RAID5 (Section 11.2.1.1) Requires LSM or hardware RAID
Data loss protection Force synchronous writes or enable atomic write data logging on a file (Section 11.2.1.2) Might degrade file system performance
Improve performance for applications that read or write data only once Enable direct I/O (Section 11.2.1.3) Degrades performance of applications that repeatedly access the same data
Improve performance Use AdvFS to distribute files in a file domain (Section 11.2.1.4) None
Improve performance Stripe data (Section 11.2.1.5) None if using AdvFS; otherwise requires LSM or hardware RAID
Improve performance Defragment file domains (Section 11.2.1.6) None
Improve performance Decrease the I/O transfer size (Section 11.2.1.7) None
Improve performance Move the transaction log to a fast or uncongested disk (Section 11.2.1.8) Might require an additional disk

The following sections describe these guidelines in more detail.

11.2.1.1    Storing Data Using RAID1 or RAID5

You can use LSM or hardware RAID to implement a RAID1 or RAID5 data storage configuration.

In a RAID1 configuration, LSM or hardware RAID stores and maintains mirrors (copies) of file domain or transaction log data on different disks. If a disk fails, LSM or hardware RAID uses a mirror to make the data available.

In a RAID5 configuration, LSM or hardware RAID stores parity information and data. If a disk fails, LSM or hardware RAID uses the parity information and the data on the remaining disks to reconstruct the missing data.

See the Logical Storage Manager manual for more information about LSM. See your storage hardware documentation for more information about hardware RAID.

11.2.1.2    Forcing a Synchronous Write Request or Enabling Persistent Atomic Write Data Logging

AdvFS writes data to disk in 8-KB units. By default, AdvFS asynchronous write requests are cached in the UBC, and the write system call returns a success value. The data is written to disk at a later time (asynchronously). AdvFS does not guarantee that all or part of the data will actually be written to disk if a crash occurs during or immediately after the write. For example, if the system crashes during a write that consists of two 8-KB units of data, only a portion (less than 16 KB) of the total write might have succeeded. This can result in partial data writes and inconsistent data.

You can configure AdvFS to force the write request for a specified file to be synchronous to ensure that data is successfully written to disk before the write system call returns a success value.

Enabling persistent atomic write data logging for a specified file writes the data to the transaction log file before it is written to disk. If a system crash occurs during or immediately after the write system call, the data in the log file is used to reconstruct the write system call upon recovery.

You cannot enable both forced synchronous writes and persistent atomic write data logging on a file. However, you can enable atomic write data logging on a file and also open the file with an O_SYNC option. This ensures that the write is synchronous, but also prevents partial writes if a crash occurs before the write system call returns.
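For example, after enabling persistent atomic write data logging with chfile -L on, an application can open the file with O_SYNC so that each write call also waits for the data to reach disk. The following is a minimal sketch using standard system calls; the file name is hypothetical and error handling is abbreviated:

#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char rec[] = "record\n";
    int fd;

    /* The path is hypothetical; it is assumed that
     * "chfile -L on /data/journal" was run beforehand to enable
     * persistent atomic write data logging on the file. */
    fd = open("/data/journal", O_WRONLY | O_APPEND | O_SYNC);
    if (fd < 0)
        return 1;

    /* With O_SYNC, write() does not return until the data has reached
     * disk; with data logging also enabled, a crash cannot leave a
     * partially written record. */
    if (write(fd, rec, sizeof(rec) - 1) != (ssize_t)(sizeof(rec) - 1)) {
        close(fd);
        return 1;
    }

    return close(fd) == 0 ? 0 : 1;
}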

To force synchronous write requests, enter:

# chfile -l on filename

A file that has persistent atomic write data logging enabled cannot be memory mapped by using the mmap system call, and it cannot have direct I/O enabled (see Section 11.2.1.3). To enable persistent atomic write data logging, enter:

# chfile -L on filename

A file that has persistent atomic write data logging enabled is written atomically only if the write is 8192 bytes or less. Writes greater than 8192 bytes are broken into segments that are at most 8192 bytes in length, and each segment is written atomically.
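For example, a single 20,000-byte write request is carried out as segments of 8192, 8192, and 3616 bytes; each segment is written atomically, but the 20,000-byte request as a whole is not guaranteed to be atomic.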

To enable atomic-write data logging on AdvFS files that are NFS mounted, ensure that:

See chfile(8) and the AdvFS Administration manual for more information.

11.2.1.3    Enabling Direct I/O

You can enable direct I/O to significantly improve disk I/O throughput for applications that do not frequently reuse previously accessed data. The following list describes considerations to keep in mind if you enable direct I/O:

You cannot enable direct I/O for a file if it is already opened for data logging or if it is memory mapped. Use the fcntl system call with the F_GETCACHEPOLICY argument to determine if an open file has direct I/O enabled.

To enable direct I/O for a specific file, use the open system call and set the O_DIRECTIO file access flag. A file remains opened for direct I/O until all users close the file.
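The following minimal sketch shows both calls named in this section. The file name is hypothetical, and the FDIRECTIO constant used to interpret the fcntl return value is an assumption; see fcntl(2) for the exact values on your system:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd, policy;

    /* The path is hypothetical. O_DIRECTIO is the AdvFS direct I/O
     * flag described in this section; see open(2). */
    fd = open("/data/bigfile", O_RDWR | O_DIRECTIO);
    if (fd < 0)
        return 1;

    /* F_GETCACHEPOLICY reports the caching policy for the open file.
     * Comparing against FDIRECTIO is an assumption -- check fcntl(2)
     * for the exact constant on your system. */
    policy = fcntl(fd, F_GETCACHEPOLICY, 0);
    if (policy == FDIRECTIO)
        printf("direct I/O is enabled\n");

    return close(fd);
}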

See fcntl(2), open(2), the AdvFS Administration manual, and the Programmer's Guide for more information.

11.2.1.4    Using AdvFS to Distribute Files

If the files in a multivolume domain are not evenly distributed, performance might be degraded. You can distribute space evenly across volumes in a multivolume file domain to balance the percentage of used space among volumes in a domain. Files are moved from one volume to another until the percentage of used space on each volume in the domain is as equal as possible.

To determine if you need to balance files, enter:

# showfdmn file_domain_name

Information similar to the following is displayed:

               Id     Date Created       LogPgs Version   Domain Name
3437d34d.000ca710  Sun Oct 5 10:50:05 2001  512       3   usr_domain
 Vol  512-Blks   Free % Used  Cmode Rblks  Wblks  Vol Name 
  1L   1488716 549232    63%     on   128    128  /dev/disk/dsk0g
  2     262144 262000     0%     on   128    128  /dev/disk/dsk4a
     --------- -------  ------
       1750860 811232    54%

The % Used field shows the percentage of volume space that is currently allocated to files or metadata (the fileset data structure). In the previous example, the usr_domain file domain is not balanced. Volume 1 has 63 percent used space while volume 2 has 0 percent used space (it was just added).

To distribute the percentage of used space evenly across volumes in a multivolume file domain, enter:

# balance file_domain_name

The balance command is transparent to users and applications, does not affect data availability, and does not split files. Because files are not split, file domains that contain very large files may not balance as evenly as file domains with smaller files, and you might need to use the migrate command to move large files manually, as described below.

To determine if you should move a file, enter:

# showfile -x file_name

Information similar to the following is displayed:

    Id Vol PgSz Pages XtntType  Segs  SegSz  I/O  Perf  File
8.8002   1   16    11   simple    **     ** async  18%  src
 
             extentMap: 1
        pageOff    pageCnt     vol    volBlock    blockCnt
              0          1       1      187296          16
              1          1       1      187328          16
              2          1       1      187264          16
              3          1       1      187184          16
              4          1       1      187216          16
              5          1       1      187312          16
              6          1       1      187280          16
              7          1       1      187248          16
              8          1       1      187344          16
              9          1       1      187200          16
             10          1       1      187232          16
        extentCnt: 11

The file in the previous example is a good candidate to move to another volume because it has 11 extents and an 18 percent performance efficiency as shown in the Perf field. A high percentage indicates optimal efficiency.

To move a file to a different volume in the file domain, enter:

# migrate [-p pageoffset] [-n pagecount] [-s volumeindex_from] \
[-d volumeindex_to] file_name

You can specify the volume from which a file is to be moved, or allow the system to pick the best space in the file domain. You can move either an entire file or specific pages to a different volume.
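For example, to move the src file shown in the previous showfile output to the second volume of its file domain (assuming volume 2 exists and has sufficient free space), you might enter:

# migrate -d 2 src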

Note that using the balance utility after moving files might move files to a different volume.

See showfdmn(8), migrate(8), and balance(8) for more information.

11.2.1.5    Striping Data

You can use AdvFS, LSM, or hardware RAID to stripe (distribute) data. Striped data is data that is separated into units of equal size, then written to two or more disks, creating a stripe of data. The data can be simultaneously written if there are two or more units and the disks are on different SCSI buses.

Figure 11-1 shows how a write request of 384 KB of data is separated into six 64-KB data units and written to three disks as two complete stripes.

Figure 11-1:  Striping Data
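For example, with a 64-KB stripe unit across three disks, the six units of the 384-KB write are laid out as two stripes, roughly as follows (an illustrative sketch of the figure):

            Disk 1    Disk 2    Disk 3
Stripe 1    unit 1    unit 2    unit 3
Stripe 2    unit 4    unit 5    unit 6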

Use only one method to stripe data. In some specific cases, using multiple striping methods can improve performance, but only if:

See stripe(8) for more information about using AdvFS to stripe data. See the Logical Storage Manager manual for more information about using LSM to stripe data. See your storage hardware documentation for more information about using hardware RAID to stripe data.

11.2.1.6    Defragmenting a File Domain

An extent is a contiguous area of disk space that AdvFS allocates to a file. Extents consist of one or more 8-KB pages. When storage is added to a file, it is grouped in extents. If all data in a file is stored in contiguous blocks, the file has one file extent. However, as files grow, contiguous blocks on the disk may not be available to accommodate the new data, so the file must be spread over discontiguous blocks and multiple file extents.

File I/O is most efficient when there are few extents. If a file consists of many small extents, AdvFS requires more I/O processing to read or write the file. Disk fragmentation can result in many extents and may degrade read and write performance because many disk addresses must be examined to access a file.

To display fragmentation information for a file domain, enter:

# defragment -vn file_domain_name

Information similar to the following is displayed:

 defragment: Gathering data for 'staff_dmn'
 Current domain data:
   Extents:                 263675
   Files w/ extents:        152693
   Avg exts per file w/exts:  1.73
   Aggregate I/O perf:         70%
   Free space fragments:     85574
                <100K   <1M   <10M   >10M
   Free space:   34%   45%    19%     2%
   Fragments:  76197  8930    440      7
 
 

Ideally, you want few extents for each file.

Although the defragment command does not affect data availability and is transparent to users and applications, it can be a time-consuming process and requires disk space. Run the defragment command during low file system activity as part of regular file system maintenance, or if you experience problems because of excessive fragmentation.

There is little performance benefit from defragmenting a file domain that contains files smaller than 8 KB, that is used by a mail server, or that is read-only.

You can also use the showfile command to check a file's fragmentation. See Section 11.2.2.4 and defragment(8) for more information.

11.2.1.7    Decreasing the I/O Transfer Size

AdvFS attempts to transfer data to and from the disk in sizes that are the most efficient for the device driver. This value is provided by the device driver and is called the preferred transfer size. AdvFS uses the preferred transfer size to:

Generally, the I/O transfer size provided by the device driver is the most efficient. However, in some cases you may want to reduce the AdvFS I/O transfer size. For example, if your AdvFS fileset is using LSM volumes, the preferred transfer size might be very high. This could cause the cache to be unduly diluted by the buffers for the files being read. If this is suspected, reducing the read transfer size may alleviate the problem.

For systems with impaired mmap page faulting or with limited memory, limit the read transfer size to limit the amount of data that is prefetched; however, this will limit I/O consolidation for all reads from this disk.

To display the I/O transfer sizes for a disk, enter:

# chvol -l block_special_device_name domain

To modify the read I/O transfer size, enter:

# chvol -r blocks block_special_device_name domain

To modify the write I/O transfer size, enter:

# chvol -w blocks block_special_device_name domain
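For example, to reduce the read transfer size to 128 blocks (64 KB) on a volume of the staff_dmn domain (the device name is hypothetical), you might enter:

# chvol -r 128 /dev/disk/dsk3c staff_dmn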

See chvol(8) for more information.

Each device driver has a minimum and maximum value for the I/O transfer size. If you use an unsupported value, the device driver automatically limits the value to either the largest or smallest I/O transfer size it supports. See your device driver documentation for more information on supported I/O transfer sizes.

11.2.1.8    Moving the Transaction Log

Place the AdvFS transaction log on a fast or uncongested disk and bus; otherwise, performance might be degraded.

To display volume information, enter:

# showfdmn file_domain_name

Information similar to the following is displayed:

               Id              Date Created  LogPgs  Domain Name
35ab99b6.000e65d2  Tue Jul 14 13:47:34 2002     512  staff_dmn
 
  Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name
   3L     262144      154512     41%     on    256    256  /dev/rz13a
   4      786432      452656     42%     on    256    256  /dev/rz13b
      ----------  ----------  ------
         1048576      607168     42%

In the showfdmn command display, the letter L is shown next to the volume that contains the transaction log.

If the transaction log is located on a slow or busy disk, you can use the switchlog command to move the log to a faster or less congested volume in the same domain, or you can back up the domain with vdump, re-create it with the log placed on a better disk, and restore the data with vrestore.
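For example, assuming that volume 4 of the staff_dmn domain shown above is on a faster disk, the log could be moved to it with a command similar to the following; the exact syntax is described in switchlog(8), and the argument order shown here is an assumption:

# switchlog staff_dmn 4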

See showfdmn(8), switchlog(8), vdump(8), and vrestore(8) for more information.

11.2.2    Monitoring AdvFS Statistics

Table 11-5 describes the commands you can use to display AdvFS information.

Table 11-5:  Tools to Display AdvFS Information

Tool        Description                                                   Reference
advfsstat   Displays AdvFS performance statistics.                        Section 11.2.2.1
advscan     Displays the disks in a file domain.                          Section 11.2.2.2
showfdmn    Displays information about AdvFS file domains and volumes.    Section 11.2.2.3
showfile    Displays information about files in an AdvFS fileset.         Section 11.2.2.4
showfsets   Displays AdvFS fileset information for a file domain.         Section 11.2.2.5

The following sections describe these commands in more detail.

11.2.2.1    Displaying AdvFS Performance Statistics

To display detailed information about a file domain, including use of the UBC and namei cache, fileset vnode operations, locks, bitfile metadata table (BMT) statistics, and volume I/O performance, use the advfsstat command.

The following example displays volume I/O queue statistics:

# advfsstat -v 3 [-i number_of_seconds] file_domain

Information similar to the following is displayed, in units of one disk block (512 bytes):

  rd   wr   rg  arg   wg  awg  blk ubcr flsh  wlz  sms  rlz  con  dev
   0    0    0    0    0    0   1M    0  10K 303K  51K  33K  33K  44K
 
 

You can use the -i option to display information at specific time intervals, in seconds.

The previous example displays:

To display the number of file creates, reads, and writes and other operations for a specified domain or fileset, enter:

# advfsstat [-i number_of_seconds] -f 2 file_domain file_set

Information similar to the following is displayed:

  lkup  crt geta read writ fsnc dsnc   rm   mv rdir  mkd  rmd link
     0    0    0    0    0    0    0    0    0    0    0    0    0
     4    0   10    0    0    0    0    2    0    2    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0
    24    8   51    0    9    0    0    3    0    0    4    0    0
  1201  324 2985    0  601    0    0  300    0    0    0    0    0
  1275  296 3225    0  655    0    0  281    0    0    0    0    0
  1217  305 3014    0  596    0    0  317    0    0    0    0    0
  1249  304 3166    0  643    0    0  292    0    0    0    0    0
  1175  289 2985    0  601    0    0  299    0    0    0    0    0
   779  148 1743    0  260    0    0  182    0   47    0    4    0
     0    0    0    0    0    0    0    0    0    0    0    0    0
     0    0    0    0    0    0    0    0    0    0    0    0    0

See advfsstat(8) for more information.

11.2.2.2    Displaying Disks in an AdvFS File Domain

Use the advscan command to locate AdvFS volumes on disks or in LSM disk groups and to re-create missing /etc/fdmns domain directories.

To display AdvFS volumes on devices or in an LSM disk group, enter:

# advscan device | LSM_disk_group

Information similar to the following is displayed:

Scanning disks  dsk0 dsk5 
Found domains: 
usr_domain
          Domain Id       2e09be37.0002eb40
          Created         Thu Jun 26 09:54:15 2002
          Domain volumes          2
          /etc/fdmns links        2
          Actual partitions found:
                                  dsk0c                     
                                  dsk5c

To re-create missing domains on a device, enter:

# advscan -r device

Information similar to the following is displayed:

Scanning disks  dsk6 
Found domains: *unknown*      
          Domain Id       2f2421ba.0008c1c0                 
          Created         Mon Jan 20 13:38:02 2002                   
          Domain volumes          1   
          /etc/fdmns links        0                   
          Actual partitions found:                                         
                                  dsk6a*    
*unknown*       
         Domain Id       2f535f8c.000b6860                 
         Created         Tue Feb 25 09:38:20 2002                   
         Domain volumes          1    
         /etc/fdmns links        0                   
         Actual partitions found:
                                 dsk6b*    
 
Creating /etc/fdmns/domain_dsk6a/
        linking dsk6a   
Creating /etc/fdmns/domain_dsk6b/         
        linking dsk6b

See advscan(8) for more information.

11.2.2.3    Displaying AdvFS File Domains

To display information about a file domain, including the date created and the size and location of the transaction log, and information about each volume in the domain, including the size, the number of free blocks, the maximum number of blocks read and written at one time, and the device special file, enter:

# showfdmn file_domain

Information similar to the following is displayed:

               Id              Date Created  LogPgs  Version  Domain Name
34f0ce64.0004f2e0  Wed Mar 17 15:19:48 2002     512        4  root_domain
 
  Vol   512-Blks        Free  % Used  Cmode  Rblks  Wblks  Vol Name 
   1L     262144       94896     64%     on    256    256  /dev/disk/dsk0a
 
 

For multivolume domains, the showfdmn command also displays the total volume size, the total number of free blocks, and the total percentage of volume space currently allocated.

See showfdmn(8) for more information about the output of the command.

11.2.2.4    Displaying AdvFS File Information

To display detailed information about files (and directories) in an AdvFS fileset, enter:

# showfile filename...

or

# showfile * 

The * displays the AdvFS characteristics for all of the files in the current working directory.

Information similar to the following is displayed:

         Id  Vol  PgSz  Pages  XtntType  Segs  SegSz  I/O   Perf  File
  23c1.8001    1    16      1    simple    **     **  ftx   100%  OV
  58ba.8004    1    16      1    simple    **     **  ftx   100%  TT_DB
         **   **    **     **   symlink    **     **   **     **  adm
  239f.8001    1    16      1    simple    **     **  ftx   100%  advfs
         **   **    **     **   symlink    **     **   **     **  archive
     9.8001    1    16      2    simple    **     **  ftx   100%  bin (index)
         **   **    **     **   symlink    **     **   **     **  bsd
         **   **    **     **   symlink    **     **   **     **  dict
   288.8001    1    16      1    simple    **     **  ftx   100%  doc
   28a.8001    1    16      1    simple    **     **  ftx   100%  dt
         **   **    **     **   symlink    **     **   **     **  man
  5ad4.8001    1    16      1    simple    **     **  ftx   100%  net
         **   **    **     **   symlink    **     **   **     **  news
   3e1.8001    1    16      1    simple    **     **  ftx   100%  opt
         **   **    **     **   symlink    **     **   **     **  preserve
         **   **    **     **     advfs    **     **   **     **  quota.group
         **   **    **     **     advfs    **     **   **     **  quota.user
     b.8001    1    16      2    simple    **     **  ftx   100%  sbin (index)
         **   **    **     **   symlink    **     **   **     **  sde
   61d.8001    1    16      1    simple    **     **  ftx   100%  tcb
         **   **    **     **   symlink    **     **   **     **  tmp
         **   **    **     **   symlink    **     **   **     **  ucb
  6df8.8001    1    16      1    simple    **     **  ftx   100%  users

See showfile(8) for more information about the command output.

11.2.2.5    Displaying the AdvFS Filesets in a File Domain

To display information about the filesets in a file domain, including the fileset names, the total number of files, the number of used blocks, the quota status, and the clone status, enter:

# showfsets file_domain

Information similar to the following is displayed:

usr
        Id           : 3d0f7cf8.000daec4.1.8001
        Files        :    30469,  SLim=        0,  HLim=        0
        Blocks (512) :  1586588,  SLim=        0,  HLim=        0
        Quota Status : user=off group=off
        Object Safety: off
        Fragging     : on
        DMAPI        : off
 
 

The previous example shows a file domain that contains one fileset, usr, which has user and group quotas turned off.

See showfsets(8) for more information.

11.2.3    Tuning AdvFS Queues

For each AdvFS volume, I/O requests are sent to one of the following queues:

All four queues (blocking, UBC request, flush, and lazy) move buffers to the device queue. As buffers are moved onto the device queue, logically contiguous I/Os are consolidated into larger I/O requests. This reduces the actual number of I/Os that must be completed. Buffers on the device queue cannot be modified until their I/O has completed.

The algorithms that move buffers onto the device queue favor taking buffers from the queues in the following order: the blocking queue, the UBC request queue, and then the flush queue. All three are favored over the lazy queue. The size of the device queue is limited by device and driver resources. The algorithms that load the device queue use feedback from the drivers to determine when the device queue is full. At that point the device is saturated, and moving more buffers to the device queue would only degrade throughput to the device. The potential size of the device queue, and how full it is, ultimately determine how long it may take to complete a synchronous I/O operation.

Figure 11-2 shows the movement of synchronous and asynchronous I/O requests through the AdvFS I/O queues.

Figure 11-2:  AdvFS I/O Queues

Detailed descriptions of the AdvFS lazy queues are as follows:

Related Attributes

The following list describes the vfs subsystem attributes that relate to AdvFS queues:

You can modify the values of the AdvfsSyncMmapPages, smoothsync_age, and AdvfsReadyQLim attributes without rebooting the system. See Chapter 3 for information about modifying kernel subsystem attributes.

When to Tune

If you reuse data, consider increasing:

11.3    Tuning UFS

This section describes UFS configuration and tuning guidelines and commands that you can use to display UFS information.

11.3.1    UFS Configuration Guidelines

Table 11-6 lists UFS configuration guidelines and performance benefits and tradeoffs.

Table 11-6:  UFS Configuration Guidelines

Benefit Guideline Tradeoff
Improve performance for small files

Make the file system fragment size equal to the block size (Section 11.3.1.1)

Wastes disk space for small files
Use disk space efficiently

Use the default file system fragment size of 1 KB (Section 11.3.1.1)

Increases the overhead for files smaller than 96 KB
Free disk space and improve performance for large files

Reduce the density of inodes on a file system (Section 11.3.1.2)

Reduces the number of files that can be created
Improve performance for disks that do not have a read-ahead cache

Set rotational delay (Section 11.3.1.3)

None
Decrease the number of disk I/O operations

Increase the number of blocks combined for a cluster (Section 11.3.1.4)

None
Improve performance

Use a memory file system (MFS) (Section 11.3.1.5)

Does not ensure data integrity because of cache volatility
Control disk space usage

Use disk quotas (Section 11.3.1.6)

Might result in a slight increase in reboot time
Allow more mounted file systems

Increase the maximum number of UFS and MFS mounts (Section 11.3.1.7)

Requires additional memory resources

The following sections describe these guidelines in more detail.

11.3.1.1    Modifying the File System Fragment and Block Sizes

The UFS file system block size is 8 KB. The default fragment size is 1 KB. You can use the newfs command to set the fragment size to 1024, 2048, 4096, or 8192 bytes (1 KB, 2 KB, 4 KB, or 8 KB) when you create the file system.

Although the default fragment size uses disk space efficiently, it increases the overhead for files less than 96 KB. If the average file in a file system is less than 96 KB, you might improve disk access time and decrease system overhead by making the file-system fragment size equal to the default block size (8 KB).
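For example, to create a UFS file system whose fragment size equals the 8-KB block size, you might enter a command similar to the following (the device name is hypothetical):

# newfs -b 8192 -f 8192 /dev/rdisk/dsk2c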

See newfs(8) for more information.

11.3.1.2    Reducing the Density of inodes

An inode describes an individual file in the file system. The maximum number of files in a file system depends on the number of inodes and the size of the file system. The system creates an inode for each 4 KB (4096 bytes) of data space in a file system.

If a file system will contain many large files and you are sure that you will not create a file for each 4 KB of space, you can reduce the density of inodes on the file system. This will free disk space for file data, but also reduces the number of files that can be created.

To do this, use the newfs -i command to specify the amount of data space allocated for each inode when you create the file system. See newfs(8) for more information.
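For example, to allocate one inode for every 16 KB of data space instead of the default 4 KB, you might enter a command similar to the following (the device name is hypothetical):

# newfs -i 16384 /dev/rdisk/dsk2c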

11.3.1.3    Setting the Rotational Delay

The UFS rotdelay parameter specifies the time, in milliseconds, to service a transfer completion interrupt and initiate a new transfer on the same disk. It is used to decide how much rotational spacing to place between successive blocks in a file. By default, the rotdelay parameter is set to 0 to allocate blocks continuously. It is useful to set rotdelay on disks that do not have a read-ahead cache. For disks with cache, set the rotdelay to 0.

Use either the tunefs command or the newfs command to modify the rotdelay value.

See newfs(8) and tunefs(8) for more information.

11.3.1.4    Increasing the Number of Blocks Combined for a Cluster

The value of the UFS maxcontig parameter specifies the number of blocks that can be combined into a single cluster (or file-block group). The default value of maxcontig is 8. The file system attempts I/O operations in a size that is determined by the value of maxcontig multiplied by the block size (8 KB).

Device drivers that can chain several buffers together in a single transfer should use a maxcontig value that is equal to the maximum chain length. This may reduce the number of disk I/O operations.

Use the tunefs command or the newfs command to change the value of maxcontig.
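For example, to set maxcontig to 32 blocks on an existing file system, assuming the device driver can chain transfers of that length, you might enter a command similar to the following (the device name is hypothetical; see tunefs(8) for details):

# tunefs -a 32 /dev/rdisk/dsk2c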

See newfs(8) and tunefs(8) for more information.

11.3.1.5    Using MFS

The memory file system (MFS) is a UFS file system that resides only in memory. No permanent data or file structures are written to disk. An MFS can improve read/write performance, but it is a volatile cache. The contents of an MFS are lost after a reboot, unmount operation, or power failure.

Because no data is written to disk, an MFS is a very fast file system and can be used to store temporary files or read-only files that are loaded into the file system after it is created. For example, if you are performing a software build that would have to be restarted if it failed, use an MFS to cache the temporary files that are created during the build and reduce the build time.

See mfs(8) for more information.

11.3.1.6    Using UFS Disk Quotas

You can specify UFS file-system limits for user accounts and for groups by setting up UFS disk quotas, also known as UFS file system quotas. You can apply quotas to file-systems to establish a limit on the number of blocks and inodes (or files) that a user account or a group of users can allocate. You can set a separate quota for each user or group of users on each file system.

You may want to set quotas on file systems that contain home directories, because the sizes of these file systems can increase more significantly than other file systems. Do not set quotas on the /tmp file system.

Note that, unlike AdvFS quotas, UFS quotas may cause a slight increase in reboot time. See the AdvFS Administration manual for information about AdvFS quotas. See the System Administration manual for information about UFS quotas.

11.3.1.7    Increasing the Number of UFS and MFS Mounts

Mount structures are dynamically allocated when a mount request is made and subsequently deallocated when an unmount request is made.

Related Attributes

The max_ufs_mounts attribute specifies the maximum number of UFS and MFS mounts on the system.

Value: 0 to 2,147,483,647

Default value: 1000 (file system mounts)

You can modify the max_ufs_mounts attribute without rebooting the system. See Chapter 3 for information about modifying kernel subsystem attributes.

When to Tune

Increase the maximum number of UFS and MFS mounts if your system will have more than the default limit of 1000 mounts.

Increasing the maximum number of UFS and MFS mounts enables you to mount more file systems. However, increasing the maximum number of mounts requires additional memory resources for the extra mount structures.

11.3.2    Monitoring UFS Statistics

Table 11-7 describes the commands you can use to display UFS information.

Table 11-7:  Tools to Display UFS Information

Tool                           Description                                  Reference
dumpfs                         Displays UFS information.                    Section 11.3.2.1
(dbx) print ufs_clusterstats   Displays UFS clustering statistics.          Section 11.3.2.2
(dbx) print bio_stats          Displays metadata buffer cache statistics.   Section 11.3.2.3

11.3.2.1    Displaying UFS Information

To display UFS information for a specified file system, including super block and cylinder group information, enter:

# dumpfs filesystem | /devices/disk/device_name

Information similar to the following is displayed:

 magic   11954   format  dynamic time   Tue Sep 14 15:46:52 2002 
nbfree  21490   ndir    9       nifree  99541  nffree  60 
ncg     65      ncyl    1027    size    409600  blocks  396062
bsize   8192    shift   13      mask    0xffffe000 
fsize   1024    shift   10      mask    0xfffffc00 
frag    8       shift   3       fsbtodb 1 
cpg     16      bpg     798     fpg     6384    ipg     1536 
minfree 10%     optim   time    maxcontig 8     maxbpg  2048 
rotdelay 0ms    headswitch 0us  trackseek 0us   rps     60

The information contained in the first lines of the output is relevant for tuning. Of specific interest are the bsize and fsize fields (the block and fragment sizes described in Section 11.3.1.1) and the maxcontig and rotdelay fields (described in Section 11.3.1.4 and Section 11.3.1.3).

11.3.2.2    Monitoring UFS Clustering

To display how the system is performing cluster read and write transfers, use the dbx print command to examine the ufs_clusterstats data structure. For example:

# /usr/ucb/dbx -k /vmunix /dev/mem  
(dbx) print ufs_clusterstats

Information similar to the following is displayed:

struct {
    full_cluster_transfers = 3130
    part_cluster_transfers = 9786
    non_cluster_transfers = 16833
    sum_cluster_transfers = {
        [0] 0
        [1] 24644
        [2] 1128
        [3] 463
        [4] 202
        [5] 55
        [6] 117
        [7] 36
        [8] 123
        [9] 0
         .
         .
         .
       [33]
 
    }
}
(dbx)

The previous example shows 24644 single-block transfers, 1128 double-block transfers, 463 triple-block transfers, and so on.

You can use the dbx print command to examine cluster reads and writes by specifying the ufs_clusterstats_read and ufs_clusterstats_write data structures respectively.

11.3.2.3    Displaying the Metadata Buffer Cache

To display statistics on the metadata buffer cache, including superblocks, inodes, indirect blocks, directory blocks, and cylinder group summaries, use the dbx print command to examine the bio_stats data structure. For example:

# /usr/ucb/dbx -k /vmunix /dev/mem  
(dbx) print bio_stats

Information similar to the following is displayed:

struct {
    getblk_hits = 4590388
    getblk_misses = 17569
    getblk_research = 0
    getblk_dupbuf = 0
    getnewbuf_calls = 17590
    getnewbuf_buflocked = 0
    vflushbuf_lockskips = 0
    mntflushbuf_misses = 0
    mntinvalbuf_misses = 0
    vinvalbuf_misses = 0
    allocbuf_buflocked = 0
    ufssync_misses = 0
}

The number of block misses (getblk_misses) divided by the sum of block misses and block hits (getblk_hits) should not be more than 3 percent. If the number of block misses is high, you might want to increase the value of the bufcache attribute. See Section 11.1.4 for information on increasing the value of the bufcache attribute.
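For example, in the sample output above the miss rate is 17569 / (17569 + 4590388), or approximately 0.4 percent, which is well under the 3 percent threshold, so the default bufcache value is adequate for that workload.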

11.3.3    Tuning UFS for Performance

Table 11-8 lists UFS tuning guidelines and performance benefits and tradeoffs.

Table 11-8:  UFS Tuning Guidelines

Benefit Guideline Tradeoff
Improve performance

Adjust UFS smoothsync and I/O throttling for asynchronous UFS I/O requests (Section 11.3.3.1)

None
Free CPU cycles and reduce the number of I/O operations

Delay UFS cluster writing (Section 11.3.3.2)

If I/O throttling is not used, might degrade real-time workload performance when buffers are flushed
Reduce the number of disk I/O operations

Increase the number of combined blocks for a cluster (Section 11.3.3.3)

Might require more memory to buffer data
Improve read and write performance

Defragment the file system (Section 11.3.3.4)

Requires down time

The following sections describe these guidelines in more detail.

11.3.3.1    Adjusting UFS Smooth Sync and I/O Throttling

UFS uses smoothsync and I/O throttling to improve UFS performance and to minimize system stalls resulting from a heavy system I/O load.

Smoothsync allows each dirty page to age for a specified time period before going to disk. This allows more opportunity for frequently modified pages to be found in the cache, which decreases the I/O load. Also, spikes in which large numbers of dirty pages are locked on the device queue are minimized because pages are enqueued to a device after having aged sufficiently, as opposed to getting flushed by the update daemon.

I/O throttling further addresses the concern of locking dirty pages on the device queue. It enforces a limit on the number of delayed I/O requests allowed to be on the device queue at any point in time. This allows the system to be more responsive to any synchronous requests added to the device queue, such as a read or the loading of a new program into memory. This can also decrease the amount and duration of process stalls for specific dirty buffers, as pages remain available until placed on the device queue.

Related Attributes

The vfs subsystem attributes that affect smoothsync and I/O throttling are smoothsync_age, io_throttle_static, and io_throttle_maxmzthruput.

You can modify the smoothsync_age, io_throttle_static, and io_throttle_maxmzthruput attributes without rebooting the system.

11.3.3.2    Delaying UFS Cluster Writing

By default, clusters of UFS pages are written asynchronously. You can configure clusters of UFS pages to be written delayed as other modified data and metadata pages are written.

Related Attribute

delay_wbuffers — Specifies whether clusters of UFS pages are written asynchronously or delayed.

Value: 0 or 1
Default value: 0 (asynchronously)
If the percentage of UBC dirty pages reaches the value of the delay_wbuffers_percent attribute, the clusters will be written asynchronously, regardless of the value of the delay_wbuffers attribute.

Delay writing clusters of UFS pages if your applications frequently write to previously written pages. This can result in a decrease in the total number of I/O requests. However, if you are not using I/O throttling, it might adversely affect real-time workload performance because the system will experience a heavy I/O load at sync time.

To delay writing clusters of UFS pages, use the dbx patch command to set the value of the delay_wbuffers kernel variable to 1 (enabled).
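For example, a sketch of the dbx session (the patch command applies the new value; see Section 3.2 and dbx(1) for details):

# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) patch delay_wbuffers = 1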

See Section 3.2 for information about using dbx.

11.3.3.3    Increasing the Number of Blocks in a Cluster

UFS combines contiguous blocks into clusters to decrease I/O operations. You can specify the number of blocks in a cluster.

Related Attribute

cluster_maxcontig — Specifies the number of blocks that are combined into a single I/O operation.

Default value: 32 blocks

If the specific file-system's rotational delay value is 0 (default), then UFS attempts to create clusters with up to n blocks, where n is either the value of the cluster_maxcontig attribute or the value from device geometry, whichever is smaller.

If the specific file-system's rotational delay value is nonzero, then n is the value of the cluster_maxcontig attribute, the value from device geometry, or the value of the maxcontig file-system attribute, whichever is smaller.
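For example, with the default cluster_maxcontig value of 32 and a hypothetical device-geometry value of 16, a file system with rotdelay set to 0 uses clusters of up to 16 blocks; if rotdelay is nonzero and the file system's maxcontig is 8, clusters are limited to 8 blocks (64 KB).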

When to Tune

Increase the number of blocks combined for a cluster if your applications can use a large cluster size.

Use the newfs command to set the file-system rotational delay value and the value of the maxcontig attribute. Use the dbx command to set the value of the cluster_maxcontig attribute.

11.3.3.4    Defragmenting a File System

When a file consists of noncontiguous file extents, the file is considered fragmented. A very fragmented file decreases UFS read and write performance, because it requires more I/O operations to access the file.

When to Perform

Defragmenting a UFS file system improves file-system performance. However, it is a time-consuming process.

You can determine whether the files in a file system are fragmented by determining how effectively the system is clustering. You can do this by using the dbx print command to examine the ufs_clusterstats data structure. See Section 11.3.2.2 for information.

UFS block clustering is usually efficient. If the numbers from the UFS clustering kernel structures show that clustering is not effective, the files in the file system may be very fragmented.

Recommended Procedure

To defragment a UFS file system, follow these steps:

  1. Back up the file system onto tape or another partition.

  2. Create a new file system either on the same partition or a different partition.

  3. Restore the file system.

See the System Administration manual for information about backing up and restoring data and creating UFS file systems.

11.4    Tuning NFS

The network file system (NFS) shares the Unified Buffer Cache (UBC) with the virtual memory subsystem and local file systems. NFS can put an extreme load on the network. Poor NFS performance is almost always a problem with the network infrastructure. Look for high counts of retransmitted messages on the NFS clients, network I/O errors, and routers that cannot maintain the load.

Lost packets on the network can severely degrade NFS performance. Lost packets can be caused by a congested server, the corruption of packets during transmission (which can be caused by bad electrical connections, noisy environments, or noisy Ethernet interfaces), and routers that abandon forwarding attempts too quickly.
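One way to check for retransmissions on a client is the nfsstat command; for example (see nfsstat(8) for the options available on your system):

# nfsstat -c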

For information about how to tune network file systems (NFS), see Chapter 5.