5    Tuning Network File Systems

The network file system (NFS) allows users to access files transparently across networks. NFS supports a spectrum of network topologies, from small and simple networks to large and complex ones. NFS shares the Unified Buffer Cache (UBC) with the virtual memory subsystem and local file systems.

NFS can put an extreme load on the network. Poor NFS performance is almost always a problem with the network infrastructure. Look for high counts of retransmitted messages on the NFS clients, network I/O errors, and routers that cannot maintain the load. Lost packets on the network can severely degrade NFS performance. Lost packets can be caused by a congested server, the corruption of packets during transmission (which can be caused by bad electrical connections, noisy environments, or noisy Ethernet interfaces), and routers that abandon forwarding attempts too quickly.

When evaluating NFS performance, remember that NFS does not perform well if any file-locking mechanisms are in use on an NFS file. The locks prevent the file from being cached on the client.

Improving performance on a system that is used only for serving NFS differs from tuning a system that is used for general timesharing, because an NFS server runs only a few small user-level programs, which consume few system resources. There is minimal paging and swapping activity, so memory resources should be focused on caching file system data.

File-system tuning is important for NFS because processing NFS requests consumes the majority of CPU and wall-clock time. Ideally, the UBC hit rate should be high. Increasing the UBC hit rate can require additional memory or a reduction in the size of other file-system caches (see Section 11.1.3). In general, file-system tuning will improve the performance of I/O-intensive user applications. In addition, a vnode must exist in order to cache file data; if you are using AdvFS, an access structure is also required. See Chapter 11 for more information about file systems.

This chapter describes how to improve NFS performance. It covers monitoring tools (Section 5.1), guidelines for detecting and diagnosing poor performance (Section 5.2 and Section 5.3), NFS configuration (Section 5.4), retransmission problems (Section 5.5), and tuning recommendations for NFS servers and clients (Section 5.6 and Section 5.7).

5.1    Monitoring NFS Statistics

This section provides references to utilities that you can use to gather NFS performance information. It is important that you gather statistics under a variety of conditions. Comparing sets of data will help you diagnose performance problems.

Table 5-1 describes the tools that you can use to detect poor NFS performance.

Table 5-1:  Tools to Detect Poor NFS Performance

Tools Description Reference
nfsstat Displays network file system statistics. Section 2.4.3
tcpdump Monitors and displays packet headers on a network interface. Section 2.4.4
netstat Displays network statistics. Section 2.4.5
ps axlm Displays idle I/O threads on a system. Server side: Section 2.4.6 Client side: Section 2.4.7
nfswatch Monitors all incoming network traffic to an NFS file server and divides it into several categories. Section 2.4.8
dbx print nfs_sv_active_hist Displays a histogram of the active NFS server threads. Section 3.1
dbx print nchstats Determines the namei cache hit rate. Section 2.6
dbx print bio_stats Determines the metadata buffer cache hit rate. Section 3.1

5.2    Detecting Poor NFS Performance

Table 5-2 describes some poor NFS performance problems and possible solutions.

Table 5-2:  Potential NFS Problems and Solutions

Problem: NFS server threads busy.
  Solution: Reconfigure the server to run more threads. See Section 2.4.6.

Problem: Memory resources are not focused on file-system caching.
  Solutions: Increase the amount of memory allocated to the UBC (see Section 11.1.3). If you are using AdvFS, increase the memory reserved for AdvFS access structures (see Section 11.1.5).

Problem: System resource allocation is not adequate.
  Solution: Set the value of the maxusers attribute to the number of server NFS operations that are expected to occur each second (see Section 8.1).

Problem: UFS metadata buffer cache hit rate is low.
  Solutions: Increase the size of the metadata buffer cache (see Section 11.1.4). Increase the size of the namei cache (see Section 11.1.2).

Problem: CPU idle time is low.
  Solution: Use UFS instead of AdvFS (see Section 11.3).

5.3    Performance Benefits and Tradeoffs

Table 5-3 lists NFS configuration guidelines and performance benefits and trade-offs.

Table 5-3:  NFS Tuning Guidelines

Benefit: Enable efficient I/O blocking operations.
  Guideline: Configure the appropriate number of threads on the NFS server (Section 5.4.1).
  Tradeoff: None.

Benefit: Enable efficient I/O blocking operations.
  Guideline: Configure the appropriate number of threads on the client system (Section 5.4.2).
  Tradeoff: None.

Benefit: Improve performance on slow or congested networks.
  Guideline: Decrease network timeouts on the client system (Section 5.5.1).
  Tradeoff: Reduces the theoretical performance.

Benefit: Improve network performance for read-only file systems and enable clients to quickly detect changes.
  Guideline: Modify cache timeout limits on the client system (Section 5.4.3).
  Tradeoff: Increases network traffic to the server.

The following sections describe these guidelines in more detail.

5.4    NFS Configuration

This section describes specific areas of the network file system (NFS) configuration. For more information about network configuration, see the Network Administration: Connections guide.

5.4.1    Configuring Server Threads

The nfsd daemon runs on NFS servers to service NFS requests from client systems. The daemon spawns a number of server threads that process NFS requests from client systems. At least one server thread must be running for a machine to operate as a server. The number of threads determines the number of parallel operations and must be a multiple of 8.

To improve performance on frequently used NFS servers, configure at least 16 threads; 16 or 32 threads provide the most efficient blocking for I/O operations. You can configure up to a total of 128 combined UDP and TCP threads.

To monitor the number of UDP and TCP threads, use the following commands:

# ps axlm | grep -v grep | grep -c nfs_udp

# ps axlm | grep -v grep | grep -c nfs_tcp

The previous commands display the number of sleeping (idle) threads. If this number is repeatedly 0, configure additional nfsd threads. See Section 2.4.6 or nfsd(8) for more information.
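For example (the thread counts shown are illustrative), output such as the following indicates that 16 UDP server threads are idle but no TCP server threads are idle, so the TCP side of the server may need more threads:

# ps axlm | grep -v grep | grep -c nfs_udp
16
# ps axlm | grep -v grep | grep -c nfs_tcp
0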

5.4.2    Configuring Client Threads

Client systems use the nfsiod daemon to service asynchronous I/O operations, such as buffer cache read-ahead and delayed write operations. The nfsiod daemon spawns several I/O threads to service asynchronous I/O requests to its server. The I/O threads improve performance of both NFS reads and writes.

The optimal number of I/O threads to run depends on many variables, such as how quickly the client writes data, how many files are accessed simultaneously, and the behavior of the NFS server. The number of threads must be a multiple of 8 minus 1 (for example, 7, 15, or 31).

NFS servers attempt to gather writes into complete UFS clusters before initiating I/O, and the number of threads plus 1 is the number of writes that a client can have outstanding at one time. Having exactly 7 or 15 threads produces the most efficient blocking for I/O operations. If write gathering is enabled and the client does not have any I/O threads, you may experience performance degradation. To disable write gathering, use the dbx patch command to set the nfs_write_gather kernel variable to 0 (see Section 5.6.1.1). See Section 3.2 for information about the dbx command.

To display idle I/O threads on the client, use the following command:

# ps axlm | grep -v grep | grep -c nfsiod

If few threads are sleeping, you might improve NFS performance by increasing the number of threads. See Section 2.4.7 or nfsiod(8) for more information.

5.4.3    Modifying Cache Timeout Limits

For read-only file systems and slow network links, performance might improve by changing the cache timeout limits on NFS client systems. These timeouts affect how quickly you see updates to a file or directory that was modified by another host. If you are not sharing files with users on other hosts, including the server system, increasing these values will slightly improve performance and will reduce the amount of network traffic that you generate.

See mount(8) and the descriptions of the acregmin, acregmax, acdirmin, acdirmax, and actimeo options for more information.
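For example, the following sketch mounts a read-only file system and raises all four cache timeouts to 120 seconds by using the actimeo option. The host name, paths, and timeout value are illustrative, and mount(8) describes the exact option syntax:

# mount -r -actimeo=120 server1:/usr/share/man /nfs_man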

5.5    NFS Retransmissions

Excessive retransmissions can cause poor performance because the client must wait for the server to respond before it retransmits a request. Excessive retransmissions can be caused by problems such as a congested or overloaded server that drops requests, packets lost or corrupted on the network, and dropped packet fragments (see Section 5.5.1).

Use the nfsstat -c command on client machines to measure the NFS retransmission rate, which is the ratio of retransmissions to calls. See nfsstat(8) for more information.

The average NFS response time to a client request under a low to medium load is approximately 15 milliseconds. Most clients retransmit a request after approximately 1 second. If a 10 percent reduction in performance is acceptable, then a 1.5-millisecond increase in response time is an acceptable limit. This reduction gives an acceptable NFS retransmission rate of 0.15 percent. The calculation is as follows:

  .0015 sec/request
-----------------------  =  0.0015 retransmission/request
1.0 sec/retransmission

Because the worst-case NFS request (read or write 8 KB over the Ethernet) requires seven packets (one request and six fragmented replies), the error rate of the network must be less than 0.02 percent. The calculation is as follows:

  0.15 percent
---------  = 0.02 percent
    7

Use the netstat -i command to measure the network error rate. If this rate is unacceptably high, determine if an individual machine is generating an excessive number of errors. If the problem appears to be pervasive, analyze the cabling technology that is being used. For example, if you have difficulties with noisy nonstandard coaxial cable, you could switch to a twisted-pair Ethernet. See netstat(1) for more information.
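For example (the interface name and counts are illustrative), the Ierrs and Oerrs columns of the netstat -i output give the error counts; dividing the total errors by the total packets gives the error rate:

# netstat -i
Name  Mtu   Network   Address      Ipkts    Ierrs  Opkts   Oerrs  Coll
tu0   1500  10.1.2    nfsserver1   1000000  120    800000  30     0

(120 + 30) / (1000000 + 800000) = 0.00008, or about 0.008 percent, which is below the 0.02 percent limit.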

5.5.1    Decreasing Network Timeouts

NFS does not perform well if it is used over slow network links, congested networks, or wide area networks (WANs). In particular, network timeouts on client systems can severely degrade NFS performance. This condition can be identified by using the nfsstat command and determining the ratio of timeouts to calls. If timeouts are more than 1 percent of the total calls, NFS performance may be severely degraded. See Section 2.4.3 for sample nfsstat output of timeout and call statistics.
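For example (the counts are illustrative), if the nfsstat -c output reports 120000 calls and 1800 timeouts, the ratio is 1.5 percent, which exceeds the 1 percent threshold:

  1800 timeouts
---------------  =  0.015, or 1.5 percent
 120000 calls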

You can also use the netstat -s command to verify the existence of a timeout problem. A nonzero value in the fragments dropped after timeout field in the ip section of the netstat output may indicate that the problem exists. See Section 2.4.5 for sample netstat command output.

If fragment drops are a problem on a client system, use the mount command with the -rsize=1024 and -wsize=1024 options to set the size of the NFS read and write buffers to 1 KB.
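For example, the following sketch mounts a file system with 1 KB read and write buffers; the host name and paths are illustrative, and mount(8) describes the exact option syntax:

# mount -rsize=1024 -wsize=1024 server1:/projects /projects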

5.6    Tuning NFS Servers

Tru64 UNIX uses a buffer cache in memory to avoid disk operations whenever possible. This memory is effective in reducing the client waiting time for relatively slow disk I/O. It also makes disk I/O more efficient by allowing the staging and scheduling of disk operations.

You can improve performance by allowing the disk device driver to schedule several requests at a time to take advantage of the position of the disk arm. The total amount of disk I/O is reduced, because repeat requests may be found in the cache. If NFS read activity is high, then adding more memory to your server can improve server performance because the size of the buffer cache is a percentage of the size of memory.

However, the system buffer cache cannot be used as effectively when the server handles remote write requests. NFS uses a simple stateless protocol, which requires that each client request be complete and self-contained and that the server completely process each request before sending an acknowledgment back to the client. If the server crashes or if an acknowledgment is lost, the client retransmits its request to the server. This stateless design has the following consequences for write operations:

In NFS Version 2, write operations are synchronous. When the server receives a write request, it must write the data and information needed to find it later before replying. Tru64 UNIX uses a technique called write gathering to reduce this I/O load, but the performance impact is still very high.

In NFS Version 3, write requests are usually asynchronous, which minimizes the performance impact of write operations. When the server first receives a write request, it merely acknowledges receipt of the data. Later, the client sends a commit request, asking the server to write any data that is still in the cache and to reply when all data is on stable storage. The protocol includes a write verifier that allows the client to detect whether the server crashed and rebooted between the write and commit operations. If so, the client retransmits the uncommitted write requests to ensure that the server has the proper data.

You cannot use the system buffer cache to improve performance with NFS requests that modify data. If a server writes modified data only to volatile memory, a server crash would jeopardize the data integrity. The client may assume that its data is safely stored, but if a crash occurs and the data was stored only in volatile memory, the data may be lost. Because a single server stores data for many clients, many clients can be affected. However, if modifications are always synchronously written to disk, data will not be lost and you can recover from server crashes.

Because NFS operations are synchronously committed to disk, a server can survive system failures since data integrity is ensured. However, performance is degraded because these operations take place at disk speeds and not at the memory speeds available to cachable operations. In addition, because these operations are processed serially, there is no opportunity to optimize the scheduling of the disk arm. Modifications to the cache are written synchronously to disk, so there is no opportunity to decrease write-disk traffic.

NFS servers run only a few small user-level programs, which consume few system resources. File-system tuning is important because processing NFS requests consumes the majority of CPU and wall-clock time. See Chapter 11 for information on file-system tuning.

In addition, if you are running NFS over Transmission Control Protocol (TCP), tuning TCP may improve performance if there are many active clients. See Section 10.2 for information on network subsystem tuning. If you are running NFS over User Datagram Protocol (UDP), network subsystem tuning is not normally needed.

Follow the guidelines in Table 5-4 to help you tune a system that is only serving NFS.

Table 5-4:  NFS Server Tuning Guidelines

Guideline Reference
Set the value of the maxusers attribute to the number of server NFS operations that are expected to occur each second. Section 8.1
Increase the size of the namei cache. Section 11.1.2
Increase the memory reserved for AdvFS access structures, if you are using AdvFS. Section 11.1.5
Increase the size of the metadata buffer cache, if you are using UFS. Section 11.1.4

5.6.1    Modifying NFS Server Side Attributes

You may be able to improve NFS server performance by tuning the nfs subsystem attributes described in the following sections: write gathering (nfs_write_gather, nfs_ufs_lbolt, and nfs3_ufs_lbolt), the write-delay interval (nfs_slow_ticks, nfs_fast_ticks, and nfs_unkn_ticks), and the TCP send and receive buffer sizes (nfs_tcpsendspace and nfs_tcprecvspace).

Note

Parameters for the nfs kernel subsystem are accessible only by using dbx; there are no comparable system attributes accessible through the /sbin/sysconfig command or the dxkerneltuner GUI. See Section 3.2 for more information about dbx.

See sys_attrs_inet(5) for more information and see Chapter 3 for information about modifying kernel subsystem attributes.

5.6.1.1    Write Gathering

Write gathering can improve server capacity because it postpones disk writes for client write requests, and metadata is updated less frequently than it would be if it were written for every write request. Write gathering introduces a small amount of latency into the request-processing cycle while the server waits for more write requests to the same disk blocks to arrive. However, in most situations the overall benefit of freeing up CPU cycles on the server outweighs this overhead.

Write gathering also improves bandwidth because fewer, larger disk writes are performed; for example, there are fewer seeks and missed rotations. For NFS V3 clients that support asynchronous writes, the benefit of server write gathering is less apparent. However, NFS V2 clients, and NFS V3 clients that do not support asynchronous writes or that must perform synchronous writes during recovery, still benefit, so leaving write gathering turned on can improve system performance.

By default, write gathering is turned on. To get the best results from this feature, tune the number of nfsiod threads on the clients; this helps large servers scale.

Some nfs variables are not applicable if nfs_write_gather is off. You can turn nfs_ufs_lbolt on or off only if nfs_write_gather is turned on. The following conditions apply to both NFS V2 and NFS V3.

You can modify the variables under the following conditions:

  1. If nfs_write_gather is on (the default) and nfs_ufs_lbolt is on, you can specify the time that the server delays the write (see Section 5.6.1.2).

  2. If nfs_write_gather is on (the default) and nfs_ufs_lbolt is off, modifying the nfs_*_ticks variables has no effect.

  3. If nfs_write_gather is off and nfs_ufs_lbolt is off, modifying the nfs_*_ticks variables has no effect.

When serving single-threaded clients such as PCs, clients that do not support biods, or clients that issue writes only infrequently, write gathering can slow down the clients as they wait for delayed replies from the server. This occurs because of the latency that write gathering adds on the server side, which delays writing. To improve such a client's performance, disable write gathering by setting nfs_write_gather to 0 with dbx -k /vmunix, as shown in the following example. See Section 3.2 for more information about dbx.
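The following is a minimal sketch of disabling write gathering, using the same dbx pattern shown in Section 5.6.1.1.1; the assign command changes the running kernel and the patch command makes the change persistent in the /vmunix file on disk:

# dbx -k /vmunix
(dbx) assign nfs_write_gather = 0
(dbx) patch nfs_write_gather = 0
(dbx) quit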

5.6.1.1.1    Improving NFS Server Response Time to Client Write Requests

Changing the setting of the nfs_ufs_lbolt parameter to 0 might significantly improve NFS server response time to client write requests under either of the following conditions: the server's storage devices have nonvolatile write caches, so data is safely in storage before the transfer to media is complete, or the client systems issue only one write request at a time and then wait for a reply.

Setting nfs_ufs_lbolt has an effect only when the NFS V2 protocol is being used. For NFS V2, the NFS server relies on a technique called write gathering to improve the data throughput of synchronous write requests. One aspect of write gathering is to delay the return of a write reply to the client in order to include replies for any subsequent write requests that might be received for the same file during a set interval. When the data for all requests processed during the delay interval is safely in storage, the server issues all the associated replies at the same time.

The period of time in which the server waits for more client requests is shorter than the time it takes to do a seek operation to disk but longer than the time it takes to flush data to the device's cache. Therefore, if the device cache is nonvolatile (the data is safely in storage before the transfer to media is complete), the time the server spends waiting for more requests is no longer an efficient use of time. Furthermore, the delay period degrades the performance of client systems that issue only one request at a time and then wait for a reply.

The following example shows how to use the dbx assign command to change the nfs_ufs_lbolt parameter in the running kernel, and the dbx patch command to ensure that the new setting is also made to the /vmunix file on disk:

# dbx -k /vmunix
dbx version 5.1
Type 'help' for help.
 
(dbx) print nfs_ufs_lbolt
1
(dbx) assign nfs_ufs_lbolt = 0
(dbx) patch nfs_ufs_lbolt = 0

The nfs_ufs_lbolt parameter is not specific to using NFS V2 with UFS. Setting this parameter to 0 might also improve NFS V2 performance with AdvFS or the Cluster File System (CFS). However, in a cluster environment, there can be a trade-off. The NFS server and the CFS server for NFS are not necessarily the same member system. If they are not and nfs_ufs_lbolt is set to 0, multiple replies to NFS write requests over TCP mounts are no longer batched in one RPC between the two servers in the cluster. In this case, the increase in the number of RPCs might degrade cluster performance.

Setting nfs3_ufs_lbolt to 0 will eliminate the same time interval as nfs_ufs_lbolt does but for requests using NFS V3 rather than NFS V2. NFS V3 relies far less on write gathering to handle client requests, and setting nfs3_ufs_lbolt to 0 is not likely to improve NFS V3 performance to any significant degree.

See Section 3.2 for more information about dbx.

5.6.1.2    Specifying the Amount of Time the Server Delays the Write

The nfs subsystem variables nfs_slow_ticks, nfs_fast_ticks, and nfs_unkn_ticks are specific to write gathering and specify the amount of time, in milliseconds, that the server delays the write (see Table 5-5 for the default values). Write gathering uses these variables to delay the write, providing a larger window for additional write requests for the same file to arrive.

These nfs variables have no effect if write gathering is off or if nfs_ufs_lbolt is turned off. See Section 5.6.1.1 for more information.

To determine which of the three nfs variables to use, identify the type of network card that your system uses as the media for NFS client/server communication (for example, by examining the interface names in netstat -i output), and then match the card type with the corresponding variable in Table 5-5.

Table 5-5:  Identifying Your Network Card Type

Network Card Type Variable Default Value (msec)
FDDI nfs_fast_ticks 8
Ethernet nfs_slow_ticks 5
Other nfs_unkn_ticks 8

For newer and faster network cards, such as Gigabit Ethernet, decreasing the value of nfs_slow_ticks (IFT_ETHER) may result in increased performance.
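For example, a minimal sketch that reduces the Ethernet write-gathering delay from its default of 5; the value 2 is illustrative, not a recommendation, and the change has an effect only while nfs_write_gather and nfs_ufs_lbolt are turned on:

# dbx -k /vmunix
(dbx) assign nfs_slow_ticks = 2
(dbx) patch nfs_slow_ticks = 2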

5.6.1.3    Increasing the NFS Send and Receive Buffer Size

The nfs_tcpsendspace and nfs_tcprecvspace variables specify the NFS default send and receive buffer sizes for TCP sockets. If you are using a high-speed network adapter, such as a Gigabit Ethernet adapter, increasing these variables can improve system performance.

Use the following command to modify these variables:


# dbx -k /vmunix

The default values for nfs_tcpsendspace and nfs_tcprecvspace are 98304 bytes. For NFS V3, the default values are recommended for most network adapters and an I/O transfer size of 64 K. For NFS Version 2.0, the default values are recommended for an I/O transfer size of 8K.
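For example, the following sketch doubles both buffer sizes in the running kernel and in the on-disk /vmunix file; the value 196608 is illustrative, not a recommended setting:

# dbx -k /vmunix
(dbx) assign nfs_tcpsendspace = 196608
(dbx) assign nfs_tcprecvspace = 196608
(dbx) patch nfs_tcpsendspace = 196608
(dbx) patch nfs_tcprecvspace = 196608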

Use tcpdump to determine whether NFS on the remote system supports a TCP window size larger than 65536 bytes. The default size of the TCP window is 65536 bytes. By default, your system supports RFC 1323, which allows you to set up larger window sizes through window scaling. However, if NFS on the remote system does not support RFC 1323, it will refuse the SYN packet sent at connection time.

Setting up a larger nfs_tcpsendspace window size on the server speeds up the sending of packets and increases performance. If the client system also has Gigabit Ethernet, the benefit on the client is the same.

See the Network Programmer's Guide for more information about window scaling.

5.7    Tuning NFS Clients

Adding disks or memory to a client can improve performance in two ways: by improving access time and by reducing the overall load on the server and network. A client can avoid network file system (NFS) performance problems for files that are not shared (such as root, swap, and temporary files) by using local disks for these files. For diskless clients, increased memory can make a big improvement in performance by allowing the client to swap and page less often. By adding local resources, the demands on the server and the network can be reduced.

While it is easy to improve client performance by adding memory or disks, these improvements may not be cost-effective because of the additional administrative tasks that are needed to maintain the operating system. For example, if you store valuable data on local disks, you must ensure that the disks are backed up. If the data is shared, you may also have to ensure that other systems have access. If you add resources to the server, the additional administrative costs are less than if you add the resources to the client.

The following sections describe how to improve NFS performance by modifying nfs subsystem attributes.

5.7.1    Modifying NFS Client Side Attributes

You may be able to improve NFS client performance by tuning the nfs subsystem attributes described in the following sections, which control read ahead (nfs3_readahead and nfs3_maxreadahead), the retransmission delay (nfs3_jukebox_delay), directory name and negative name caching (nfs_dnlc and nfs_nnc), close-to-open consistency (nfs_cto), and attribute fetching (nfs_quicker_attr).

Note

Parameters for the nfs kernel subsystem are accessible only by using dbx; there are no comparable system attributes accessible through the /sbin/sysconfig command or the dxkerneltuner GUI.

See sys_attrs_inet(5) for more information and see Chapter 3 for information about modifying kernel subsystem attributes.

5.7.1.1    Improving Read Performance

When the NFS Version 3 client completes a long sequential read or a partial block write and idle I/O (nfsiod) threads are available, the client attempts to read ahead. The nfs3_readahead variable specifies how many pages the client can read ahead, up to the maximum. The nfs3_maxreadahead variable specifies the maximum number of pages that the client can read ahead. The nfs3_readahead default value is 2 and the nfs3_maxreadahead default value is 8.

The read-ahead feature helps improve read performance. On systems with newer and faster network interfaces, tuning both variables, as well as the number of running nfsiod threads, helps saturate the network interface and makes the most of the system hardware resources. Tuning these variables can double the read performance on newer gigabit network cards.
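For example, a minimal sketch that doubles the read-ahead limits in the running kernel and in /vmunix; the values shown are illustrative, not recommendations:

# dbx -k /vmunix
(dbx) assign nfs3_readahead = 4
(dbx) assign nfs3_maxreadahead = 16
(dbx) patch nfs3_readahead = 4
(dbx) patch nfs3_maxreadahead = 16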

5.7.1.2    Controlling How Long Before the Client Starts Retransmitting

The nfs3_jukebox_delay variable, on the client, controls how long, in seconds, the client waits before it starts retransmitting again. For transactions on a busy server, nfs3_jukebox_delay can be increased to avoid unnecessary retransmission of client requests. The default value for nfs3_jukebox_delay is 10 seconds.

The nfs3_jukebox_delay variable is not related to any of the storage HSM mechanisms. The name reflects the error message sent from the server, NFS3ERR_JUKEBOX. The term JUKEBOX reflects NFS history and implies that the file is temporarily inaccessible. The error allows the client to be aware of the server's status and to deliberately delay accessing the file rather than repeatedly retransmitting the request.

5.7.1.3    Directory Name Lookup Cache (DNLC)

The nfs_dnlc variable controls the directory name lookup cache. By default, the client maintains a cache of results from recent file system directory lookup operations. Because fewer lookup requests are sent to the server, client performance improves.

To turn off directory name lookup caching, specify the -noac option with the mount command. If -noac is not specified at mount time, you can turn off the nfs_dnlc variable to disable the DNLC.

If files in a directory on the server are changing rapidly, turning nfs_dnlc off can be useful. This avoids some stale file handles by forcing opens to issue lookup calls.

5.7.1.4    Negative Name Cache Lookups (NNC)

The nfs_nnc variable controls the negative name cache. When a client lookup operation fails, the client records the failed lookup in a negative name cache, supplementing the vfs layer caching and further eliminating unnecessary duplicate server lookups over the wire.

By default, nfs_nnc is turned on. For applications where minimal or no NFS directory lookup is done, turning nfs_nnc off can improve application performance.

Similarly, if files in a directory on the server are changing rapidly, turning nfs_nnc off can be useful. This can avoid some stale file handles by forcing opens to issue lookup calls.

5.7.1.5    Specifying File Consistency Across NFS Clients

The nfs_cto variable controls close-to-open (CTO) consistency: when a file is closed, all modified data associated with the file is flushed to the server, and when the file is opened, the client sends a request to the server to validate the client's local caches. This behavior ensures a file's consistency across multiple NFS clients.

When the -nocto option is specified at mount time, the client does not perform the flush on close. This allows the possibility of differences among copies of the same file as stored on multiple clients. For example:

# mount -nocto fubar:/abc /local

By default, nfs_cto is turned on. The benefit of keeping the variable turned on is that it solves the inconsistency of a file being accessed by multiple clients. If the first client writes to the file and closes it, and a second client then opens it, the data on the second client is guaranteed to be up-to-date.

Turning nfs_cto off can improve performance when a specific file system is accessed from only one client.

The client determines whether to use close-to-open consistency at mount time. It first checks the nfs_cto variable and the setting of the mount -nocto option; the client then performs close-to-open consistency if nfs_cto is on or if the -nocto option is not set.

If close-to-open consistency is turned off at mount time with the -nocto option, you can set nfs_cto with dbx -k and the client will perform CTO again on the mounted file system.

5.7.1.6    Changing the NFS Client Behavior When Fetching File Attributes

Setting the nfs_quicker_attr variable to any value other than 0 changes the NFS client behavior when fetching file attributes (for example, with ls -l or stat(2)). By default, NFS waits for the file I/O to finish and executes the fsync() function to get the most up-to-date attributes. However, this can lead to delays when writing a file over a faster-than-disk interface. In a controlled environment, setting this variable causes the client to fetch the cached attributes instead.

Modifying the nfs_quicker_attr variable can be useful in a testing or debugging environment when you want to observe the progress of writing a large file by repeatedly fetching file attributes, for example, using ls -l.
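The following is a minimal sketch of enabling this behavior in the running kernel; add a corresponding patch command if you want the change to persist in /vmunix across reboots:

# dbx -k /vmunix
(dbx) assign nfs_quicker_attr = 1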