There are various ways that you can manage your disk storage. Depending on your performance and availability needs, you can use static disk partitions, the Logical Storage Manager (LSM), hardware RAID, or a combination of these solutions.
The disk storage configuration can have a significant impact on system performance, because disk I/O is used for file system operations and also by the virtual memory subsystem for paging and swapping.
You may be able to improve disk I/O performance by following the configuration and tuning guidelines described in this chapter, which describes the following:
Improving overall disk I/O performance by distributing the I/O load (Section 9.1)
Monitoring the distribution of disk I/O (Section 9.2)
Managing LSM performance (Section 9.3)
Managing hardware RAID subsystem performance (Section 9.4)
Managing Common Access Method (CAM) performance (Section 9.5)
Not all guidelines are appropriate for all disk storage configurations. Before applying any guideline, be sure that you understand your workload resource model, as described in Section 1.8, and the guideline's benefits and tradeoffs.
9.1 Guidelines for Distributing the Disk I/O Load
Distributing the disk I/O load across devices helps to prevent a single disk, controller, or bus from becoming a bottleneck. It also enables simultaneous I/O operations.
For example, if you have 16 GB of disk storage, you may get better performance from sixteen 1-GB disks rather than four 4-GB disks, because using more spindles (disks) may allow more simultaneous operations. For random I/O operations, 16 disks may be simultaneously seeking instead of four disks. For large sequential data transfers, 16 data streams can be simultaneously working instead of four data streams.
Use the following guidelines to distribute the disk I/O load:
Stripe data or disks.
RAID0 (data or disk striping) enables you to efficiently distribute data across the disks. See Section 11.2.1.5 for detailed information about the benefits of striping. Note that availability decreases as you increase the number of disks in a striped array.
To stripe data, use LSM (see Section 9.3). To stripe disks, use a hardware RAID subsystem (see Section 9.4).
As an alternative to data or disk striping, you can use the Advanced File System (AdvFS) to stripe individual files across disks in a file domain. However, do not stripe a file and also the disk on which it resides. See Section 11.2 for more information.
Use RAID5.
RAID5 distributes disk data and parity data across disks in an array to provide high data availability and to improve read performance. However, RAID5 decreases write performance in a nonfailure state, and decreases read and write performance in a failure state. RAID5 can be used for configurations that are mainly read-intensive. As a cost-efficient alternative to mirroring, you can use RAID5 to improve the availability of rarely accessed data.
To create a RAID5 configuration, use LSM (see Section 9.3) or a hardware RAID subsystem (Section 9.4).
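For example, the following commands are a minimal sketch of creating a striped volume and a RAID5 volume with the LSM volassist command; the disk group name, volume names, sizes, and column counts shown here are illustrative, and the exact volassist attributes for your LSM version are described in the Logical Storage Manager documentation:
# volassist -g datadg make stripevol 4g layout=stripe nstripe=4
# volassist -g datadg make raid5vol 4g layout=raid5 nstripe=5
You can then create a file system on each volume or use it as a raw device, as described in Section 9.3.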
Distribute frequently used file systems across disks and, if possible, across different buses and controllers.
Directories that contain executable files or temporary files, such as /var, /usr, and /tmp, are often frequently accessed. If possible, place /usr and /tmp on different disks.
You can use the AdvFS balance command to balance the percentage of used space among the disks in an AdvFS file domain. See Section 11.2.1.4 for information.
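For example, assuming a file domain named data_domain, the following commands sketch how you might check how space is distributed across the volumes in the domain and then rebalance it; the domain name is illustrative:
# showfdmn data_domain
# balance data_domain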
Distribute swap I/O across devices.
To make paging and swapping more efficient and help prevent any single adapter, bus, or disk from becoming a bottleneck, distribute swap space across multiple disks. Do not put multiple swap partitions on the same disk.
You can also use LSM to mirror your swap space. See Section 9.3 for more information.
See Section 12.2 for more information about configuring swap devices for high performance.
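As a sketch, the following commands show one way to add a second swap device at run time and then display the resulting swap configuration; the device name dsk2b is illustrative, and Section 12.2 describes how to make additional swap devices permanent:
# swapon /dev/disk/dsk2b
# swapon -s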
Section 9.2 describes how to monitor the distribution of disk I/O.
9.2 Monitoring the Distribution of Disk I/O
Table 9-1 describes some commands that you can use to determine if your disk I/O is being distributed.
Table 9-1: Disk I/O Distribution Monitoring Tools
| Tool | Description | Reference |
| showfdmn | Displays information about AdvFS file domains, which you can use to determine if files are evenly distributed across AdvFS volumes. | Section 11.2 |
| advfsstat | Displays information about AdvFS file domain and fileset usage, and provides performance statistics for AdvFS file domains and filesets that you can use to determine if the file system I/O is evenly distributed. | Section 11.2 |
| swapon -s | Displays the swap space configuration and usage, including the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. | Section 12.2 |
| volstat | Displays performance statistics for LSM objects and provides information about LSM volume and disk usage that you can use to characterize and understand your I/O workload, including the read/write ratio, the average transfer size, and whether disk I/O is evenly distributed. | Section 9.3 or the Logical Storage Manager documentation |
| iostat | Displays disk I/O statistics and provides information about which disks are being used the most. | Section 9.2.1 |
9.2.1 Displaying Disk Usage by Using the iostat Command
For the best performance, disk I/O should be evenly distributed across disks. Use the iostat command to determine which disks are being used the most. The command displays disk I/O statistics for disks, in addition to terminal and CPU statistics.
The following example of the iostat command displays output in one-second intervals:
# /usr/ucb/iostat 1
tty floppy0 dsk0 dsk1 cdrom0 cpu
tin tout bps tps bps tps bps tps bps tps us ni sy id
1 73 0 0 23 2 37 3 0 0 5 0 17 79
0 58 0 0 47 5 204 25 0 0 8 0 14 77
0 58 0 0 8 1 62 1 0 0 27 0 27 46
The iostat command output displays the following information:
The first line of the iostat command output is the average since boot time, and each subsequent report is for the last interval.
For each disk (dskn), the number of KB transferred per second (bps) and the number of transfers per second (tps).
For the system (cpu), the percentage of time the CPU has spent in user state running processes either at their default priority or at a preferred priority (us), in user mode running processes at a less favored priority (ni), in system mode (sy), and in idle mode (id). This information enables you to determine how disk I/O is affecting the CPU. User mode includes the time the CPU spent executing library routines; system mode includes the time the CPU spent executing system calls.
The iostat command can help you to do the following:
Determine which disk is being used the most and which is being used the least. This information will help you determine how to distribute your file systems and swap space. Use the swapon -s command to determine which disks are used for swap space.
Determine if the system is disk bound. If the iostat command output shows a lot of disk activity and a high system idle time, the system may be disk bound. You may need to balance the disk I/O load, defragment disks, or upgrade your hardware.
Determine if an application is written efficiently. If a disk is doing a large number of transfers (the tps field) but reading and writing only small amounts of data (the bps field), examine how your applications are doing disk I/O. The application may be performing a large number of I/O operations to handle only a small amount of data. You may want to rewrite the application if this behavior is not necessary.
9.3 Managing LSM Performance
The Logical Storage Manager (LSM) provides flexible storage management, improved disk I/O performance, and high data availability, with little additional overhead. Although any type of system can benefit from LSM, it is especially suited for configurations with large numbers of disks or configurations that regularly add storage.
LSM allows you to set up unique pools of storage that consist of multiple disks. From these disk groups, you can create virtual disks (LSM volumes), which are used in the same way as disk partitions. You can create UFS or AdvFS file systems on a volume, use a volume as a raw device, or create volumes on top of RAID storage sets.
Because there is no direct correlation between an LSM volume and a physical disk, file system or raw I/O can span disks. You can easily add disks to and remove disks from a disk group, balance the I/O load, and perform other storage management tasks.
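For example, the following commands are a minimal sketch of creating an AdvFS file domain on an LSM volume and mounting a fileset from it; the disk group, volume, domain, fileset, and mount point names are illustrative:
# mkfdmn /dev/vol/datadg/vol01 data_domain
# mkfset data_domain fset1
# mount -t advfs data_domain#fset1 /data
To use UFS instead, you could run the newfs command on the corresponding raw device, for example, newfs /dev/rvol/datadg/vol01.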
In addition, LSM provides high performance and high availability by using RAID technology; LSM is often referred to as software RAID. LSM configurations can be more cost-effective and less complex than a hardware RAID subsystem. Note that the LSM RAID features require a license.
9.3.1 LSM Features
LSM provides the following basic disk management features that do not require a license:
Disk concatenation enables you to create a large volume from multiple disks.
Load balancing transparently distributes data across disks.
Configuration database load-balancing automatically maintains an optimal number of LSM configuration databases in appropriate locations without manual intervention.
The volstat command provides detailed LSM performance information.
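For example, the following commands sketch how volstat might be used to display volume statistics at five-second intervals and a one-time summary of disk statistics; the disk group name is illustrative, and the available options are described in volstat(8):
# volstat -g datadg -i 5 -v
# volstat -g datadg -d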
The following LSM features require a license:
RAID0 (striping) distributes data across disks in an array. Striping is useful if you need to transfer large amounts of data quickly, and it also enables you to balance the I/O load from multi-user applications across multiple disks. LSM striping provides significant I/O performance benefits with little impact on the CPU.
RAID1 (mirroring) maintains copies of data on different disks and reduces the chance that a single disk failure will cause the data to be unavailable.
RAID5 (parity RAID) provides data availability through the use of parity and distributes data and parity across disks in an array.
Mirrored root file system and swap space improves availability.
Hot-spare support provides an automatic reaction to I/O failures on mirrored or RAID5 objects by relocating the affected objects to spare disks or other free space.
Dirty-region logging (DRL) improves the recovery time of mirrored volumes after a system failure.
A graphical user interface (GUI) enables easy disk management and provides detailed performance information.
To obtain the best LSM performance, follow the configuration and tuning guidelines described in the Logical Storage Manager manual.
9.4 Managing Hardware RAID Subsystem Performance
Hardware RAID subsystems provide RAID functionality for high performance and high availability, relieve the CPU of disk I/O overhead, and enable you to connect many disks to a single I/O bus or in some cases, multiple buses. There are various types of hardware RAID subsystems with different performance and availability features, but they all include a RAID controller, disks in enclosures, cabling, and disk management software.
RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features, such as write-back caches and complete component redundancy.
Hardware RAID subsystems use disk management software, such as the RAID Configuration Utility (RCU) and the StorageWorks Command Console (SWCC) utility, to manage the RAID devices. Menu-driven interfaces allow you to select RAID levels.
Use hardware RAID to combine multiple disks into a single storage set that the system sees as a single unit. A storage set can consist of a simple set of disks, a striped set, a mirrored set, or a RAID set. You can create LSM volumes, AdvFS file domains, or UFS file systems on a storage set, or you can use the storage set as a raw device.
The following sections discuss these hardware RAID topics:
Hardware RAID features (Section 9.4.1)
Hardware RAID products (Section 9.4.2)
Guidelines for hardware RAID configurations (Section 9.4.3)
See the hardware RAID product documentation for detailed configuration
information.
9.4.1 Hardware RAID Features
Hardware RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features. All hardware RAID subsystems provide you with the following features:
A RAID controller that relieves the CPU of the disk I/O overhead
Increased disk storage capacity
Hardware RAID subsystems allow you to connect a large number of disks to a single I/O bus or, in some cases, multiple buses. In a typical storage configuration, you attach a disk storage shelf to a system by using a SCSI bus connected to a host bus adapter installed in an I/O bus slot. However, you can connect only a limited number of disks to a SCSI bus, and systems have a limited number of I/O bus slots.
In contrast, hardware RAID subsystems contain multiple internal SCSI buses that can be connected to a system by using a single I/O bus slot.
Read cache
A read cache improves I/O read performance by holding data that it anticipates the host will request. If a system requests data that is already in the read cache (a cache hit), the data is immediately supplied without having to read the data from disk. Subsequent data modifications are written both to disk and to the read cache (write-through caching).
Write-back cache
Hardware RAID subsystems support write-back caches (as a standard or an optional feature), which can improve I/O write performance while maintaining data integrity. A write-back cache decreases the latency of many small writes, and can improve Internet server performance because writes appear to be written immediately. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache, consolidated, and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache must have an uninterruptible power source (UPS) to protect against data loss and corruption.
RAID support
All hardware RAID subsystems support RAID0 (disk striping), RAID1 (disk mirroring), and RAID5. High-performance RAID array subsystems also support RAID3 and dynamic parity RAID. See Section 1.3.1 for information about RAID levels.
Non-RAID disk array capability or "just a bunch of disks" (JBOD)
Component hot swapping and hot sparing
Hot-swap support allows you to replace a failed component while the system continues to operate. Hot-spare support allows you to automatically use previously installed components if a failure occurs.
Graphical user interface (GUI) for easy management and monitoring
9.4.2 Hardware RAID Products
There are different types of hardware RAID subsystems, which provide various degrees of performance and availability at various costs. HP supports the following hardware RAID subsystems:
Backplane RAID array storage subsystems
These entry-level subsystems, such as those utilizing the RAID Array 230/Plus storage controller, provide a low-cost hardware RAID solution and are designed for small and midsize departments and workgroups.
A backplane RAID array storage controller is installed in a PCI bus slot and acts as both a host bus adapter and a RAID controller.
Backplane RAID array subsystems provide RAID functionality (0, 1, 0+1, and 5), an optional write-back cache, and hot-swap functionality.
High-performance RAID array subsystems
These subsystems, such as the RAID Array 450 subsystem, provide extensive performance and availability features and are designed for client/server, data center, and medium to large departmental environments.
A high-performance RAID array controller, such as an HSZ80 controller, is connected to a system through an ultrawide differential SCSI bus and a high-performance host bus adapter installed in an I/O bus slot.
High-performance RAID array subsystems provide RAID functionality (0, 1, 0+1, 3, 5, and dynamic parity RAID), dual-redundant controller support, scalability, storage set partitioning, a standard UPS write-back cache, and components that can be hot-swapped.
Enterprise Storage Arrays (ESA)/Modular storage array (MSA)
These preconfigured high-performance hardware RAID subsystems, such as the RAID Array 12000, provide the highest performance, availability, and disk capacity of any RAID subsystem. They are used for transaction-intensive applications and high-bandwidth decision-support applications.
ESAs support all major RAID levels, including dynamic parity RAID; fully redundant components that can be hot-swapped; a standard UPS write-back cache; and centralized storage management.
See the HP Logical Storage Manager Version 5.1B QuickSpecs for detailed information about hardware RAID subsystem features.
9.4.3 Hardware RAID Configuration Guidelines
Table 9-2 describes the hardware RAID subsystem configuration guidelines and lists performance benefits as well as tradeoffs.
Table 9-2: Hardware RAID Subsystem Configuration Guidelines
| Guideline | Performance Benefit | Tradeoff |
| Evenly distribute disks in a storage set across different buses (Section 9.4.3.1) | Improves performance and helps to prevent bottlenecks | None |
| Use disks with the same data capacity in each storage set (Section 9.4.3.2) | Simplifies storage management | None |
| Use an appropriate stripe size (Section 9.4.3.3) | Improves performance | None |
| Mirror striped sets (Section 9.4.3.4) | Provides availability and distributes the disk I/O load | Increases configuration complexity and may decrease write performance |
| Use a write-back cache (Section 9.4.3.5) | Improves write performance, especially for RAID5 storage sets | Cost of hardware |
| Use dual-redundant RAID controllers (Section 9.4.3.6) | Improves performance, increases availability, and prevents I/O bus bottlenecks | Cost of hardware |
| Install spare disks (Section 9.4.3.7) | Improves availability | Cost of disks |
| Replace failed disks promptly (Section 9.4.3.7) | Improves performance | None |
The following sections describe some of these guidelines. See your RAID subsystem documentation for detailed configuration information.
9.4.3.1 Distributing Storage Set Disks Across Buses
You can improve performance and help to prevent bottlenecks by distributing storage set disks evenly across different buses.
In addition, make sure that the first member of each mirrored set is on a different bus.
9.4.3.2 Using Disks with the Same Data Capacity
Use disks with the same capacity in a storage set.
This simplifies storage management and reduces wasted disk space.
9.4.3.3 Choosing the Correct Hardware RAID Stripe Size
You must understand how your applications perform disk I/O before you can choose the stripe (chunk) size that will provide the best performance benefit. See Section 1.8 for information about identifying a resource model for your system.
Here are some guidelines for stripe sizes:
If the stripe size is large compared to the average I/O size, each disk in a stripe set can respond to a separate data transfer. I/O operations can then be handled in parallel, which increases sequential write performance and throughput. This can improve performance for environments that perform large numbers of I/O operations, including transaction processing, office automation, and file services environments, and for environments that perform multiple random read and write operations.
If the stripe size is smaller than the average I/O operation, multiple disks can simultaneously handle a single I/O operation, which can increase bandwidth and improve sequential file processing. This is beneficial for image processing and data collection environments. However, do not make the stripe size so small that it will degrade performance for large sequential data transfers.
For example, if you use an 8-KB stripe size, small data transfers will be distributed evenly across the member disks, but a 64-KB data transfer will be divided into at least 8 data transfers.
In addition, the following guidelines can help you choose the correct stripe size:
Raw disk I/O operations
If your applications are doing I/O to a raw device and not a file system, use a stripe size that distributes a single data transfer evenly across the member disks. For example, if the typical I/O size is 1 MB and you have a four-disk array, you could use a 256-KB stripe size. This would distribute the data evenly among the four member disks, with each doing a single 256-KB data transfer in parallel.
Small file system I/O operations
For small file system I/O operations, use a stripe size that is a multiple of the typical I/O size (for example, four to five times the I/O size). This will help to ensure that the I/O is not split across disks.
I/O to a specific range of blocks
Choose a stripe size that will prevent any particular range of blocks from becoming a bottleneck. For example, if an application often uses a particular 8-KB block, you may want to use a stripe size that is slightly larger or smaller than 8 KB or is a multiple of 8 KB to force the data onto a different disk.
9.4.3.4 Mirroring Striped Sets
Striped disks improve I/O performance by distributing the disk I/O load. However, striping decreases availability because a single disk failure will cause the entire stripe set to be unavailable. To make a stripe set highly available, you can mirror the stripe set.
9.4.3.5 Using a Write-Back Cache
RAID subsystems support, either as a standard or an optional feature, a nonvolatile (battery-backed) write-back cache that can improve disk I/O performance while maintaining data integrity. A write-back cache improves performance for systems that perform large numbers of writes and for RAID5 storage sets. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache improves performance, especially for Internet servers, because writes appear to be written immediately. If a failure occurs, upon recovery, the RAID controller detects any unwritten data that still exists in the write-back cache and writes the data to disk before enabling normal controller operations.
A write-back cache must be backed up with an uninterruptible power source (UPS) to protect against data loss and corruption.
If you are using an HSZ40, HSZ50, HSZ70, or HSZ80 RAID controller with a write-back cache, the following guidelines may improve performance:
Set CACHE_POLICY to B.
Set CACHE_FLUSH_TIMER to a minimum of 45 (seconds).
Enable the write-back cache (WRITEBACK_CACHE) for each unit, and set the value of MAXIMUM_CACHED_TRANSFER_SIZE to a minimum of 256.
See the RAID subsystem documentation for more information about using the write-back cache.
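As a sketch, the following commands show how these settings might be entered at an HSZ-series controller CLI (for example, through a maintenance terminal connection); the unit name D101 is illustrative, and the exact syntax can vary with the controller firmware version:
HSZ> SET THIS_CONTROLLER CACHE_POLICY=B
HSZ> SET THIS_CONTROLLER CACHE_FLUSH_TIMER=45
HSZ> SET D101 WRITEBACK_CACHE
HSZ> SET D101 MAXIMUM_CACHED_TRANSFER_SIZE=256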
9.4.3.6 Using Dual-Redundant Controllers
If supported by your RAID subsystem, you can use a dual-redundant controller configuration and balance the number of disks across the two controllers. This can improve performance, increase availability, and prevent I/O bus bottlenecks.
9.4.3.7 Using Spare Disks to Replace Failed Disks
Install predesignated spare disks on separate controller ports and storage shelves. This will help you to maintain data availability and recover quickly if a disk failure occurs.
9.5 Managing CAM Performance
The Common Access Method (CAM) is the operating system interface to the hardware. CAM maintains pools of buffers that are used to perform I/O. Each buffer takes approximately 1 KB of physical memory. Monitor these pools and tune them if necessary.
You may be able to modify the following io subsystem attributes to improve CAM performance:
cam_ccb_pool_size
The initial size of the buffer pool free list at boot time. The default is 200.
cam_ccb_low_water
The number of buffers in the pool free list at which more buffers are allocated from the kernel. CAM reserves this number of buffers to ensure that the kernel always has enough memory to shut down runaway processes. The default is 100.
cam_ccb_increment
The number of buffers added to or removed from the buffer pool free list. Buffers are allocated on an as-needed basis to handle immediate demands, but are released in a more measured manner to guard against spikes. The default is 50.
If the I/O pattern associated with your system tends to have intermittent bursts of I/O operations (I/O spikes), increasing the values of the cam_ccb_pool_size and cam_ccb_increment attributes may improve performance.
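As a sketch, you might first query the current values with the sysconfig command and then add an io stanza to /etc/sysconfigtab; the new values shown here are illustrative, and whether a change takes effect immediately or requires a reboot depends on whether the attribute is dynamic on your system:
# sysconfig -q io cam_ccb_pool_size cam_ccb_increment
A corresponding /etc/sysconfigtab stanza might look like this:
io:
    cam_ccb_pool_size = 400
    cam_ccb_increment = 100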
You may be able to diagnose CAM performance problems by using dbx to examine the ccmn_bp_head data structure, which provides statistics on the buffer structure pool that is used for raw disk I/O. The information provided is the current size of the buffer structure pool (num_bp) and the wait count for buffers (bp_wait_cnt). For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print ccmn_bp_head
struct {
num_bp = 50
bp_list = 0xffffffff81f1be00
bp_wait_cnt = 0
}
(dbx)
If the value of the bp_wait_cnt field is not 0, CAM has run out of buffer pool space. If this situation persists, you may be able to eliminate the problem by changing one or more of the CAM subsystem attributes described in this section.