There are various ways that you can manage your disk storage. Depending on your performance and availability needs, you can use static disk partitions, the Logical Storage Manager (LSM), hardware RAID, or a combination of these solutions.
The disk storage configuration can have a significant impact on system performance, because disk I/O is used for file system operations and also by the virtual memory subsystem for paging and swapping.
You may be able to improve disk I/O performance by following the configuration and tuning guidelines described in this chapter, which describes the following:
Improving overall disk I/O performance by distributing the I/O load (Section 9.1)
Monitoring the distribution of disk I/O (Section 9.2)
Managing LSM performance (Section 9.3)
Managing hardware RAID subsystem performance (Section 9.4)
Managing Common Access Method (CAM) performance (Section 9.5)
Not all guidelines are appropriate for all disk storage configurations. Before applying any guideline, be sure that you understand your workload resource model, as described in Section 1.8, and the guideline's benefits and tradeoffs.
9.1 Guidelines for Distributing the Disk I/O Load
Distributing the disk I/O load across devices helps to prevent a single disk, controller, or bus from becoming a bottleneck. It also enables simultaneous I/O operations.
For example, if you have 16 GB of disk storage, you may get better performance from sixteen 1-GB disks rather than four 4-GB disks, because using more spindles (disks) may allow more simultaneous operations. For random I/O operations, 16 disks may be simultaneously seeking instead of four disks. For large sequential data transfers, 16 data streams can be simultaneously working instead of four data streams.
Use the following guidelines to distribute the disk I/O load:
Stripe data or disks.
RAID0 (data or disk striping) enables you to efficiently distribute data across the disks. See Section 11.2.1.5 for detailed information about the benefits of striping. Note that availability decreases as you increase the number of disks in a striped array.
To stripe data, use LSM (see Section 9.3). To stripe disks, use a hardware RAID subsystem (see Section 9.4).
As an alternative to data or disk striping, you can use the Advanced File System (AdvFS) to stripe individual files across disks in a file domain. However, do not stripe a file and also the disk on which it resides. See Section 11.2 for more information.
Use RAID5.
RAID5 distributes disk data and parity data across disks in an array to provide high data availability and to improve read performance. However, RAID5 decreases write performance in a nonfailure state, and decreases read and write performance in a failure state. RAID5 can be used for configurations that are mainly read-intensive. As a cost-efficient alternative to mirroring, you can use RAID5 to improve the availability of rarely accessed data.
To create a RAID5 configuration, use LSM (see Section 9.3) or a hardware RAID subsystem (Section 9.4).
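For example, the following commands are a minimal sketch of creating a striped volume and a RAID5 volume with the LSM volassist command; the disk group name, volume names, sizes, and column counts shown here are illustrative, and the exact volassist attributes for your LSM version are described in the Logical Storage Manager documentation:
# volassist -g datadg make stripevol 4g layout=stripe nstripe=4
# volassist -g datadg make raid5vol 4g layout=raid5 nstripe=5
You can then create a file system on each volume or use it as a raw device, as described in Section 9.3.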
Distribute frequently used file systems across disks and, if possible, across different buses and controllers.
Directories that contain executable files or temporary files, such as /var, /usr, and /tmp, are often frequently accessed. If possible, place /usr and /tmp on different disks.
You can use the AdvFS balance command to balance the percentage of used space among the disks in an AdvFS file domain. See Section 11.2.1.4 for information.
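For example, assuming a file domain named data_domain, the following commands sketch how you might check how space is distributed across the volumes in the domain and then rebalance it; the domain name is illustrative:
# showfdmn data_domain
# balance data_domain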
Distribute swap I/O across devices.
To make paging and swapping more efficient and help prevent any single adapter, bus, or disk from becoming a bottleneck, distribute swap space across multiple disks. Do not put multiple swap partitions on the same disk.
You can also use LSM to mirror your swap space. See Section 9.3 for more information.
See Section 12.2 for more information about configuring swap devices for high performance.
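As a sketch, the following commands show one way to add a second swap device at run time and then display the resulting swap configuration; the device name dsk2b is illustrative, and Section 12.2 describes how to make additional swap devices permanent:
# swapon /dev/disk/dsk2b
# swapon -s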
Section 9.2 describes how to monitor the distribution of disk I/O.
9.2 Monitoring the Distribution of Disk I/O
Table 9-1 describes some commands that you can use to determine if your disk I/O is being distributed.
Table 9-1: Disk I/O Distribution Monitoring Tools
| Tool | Description | Reference |
| showfdmn | Displays information about AdvFS file domains, which you can use to determine if files are evenly distributed across AdvFS volumes. | Section 11.2 |
| advfsstat | Displays information about AdvFS file domain and fileset usage, and provides performance statistics for AdvFS file domains and filesets that you can use to determine if the file system I/O is evenly distributed. | Section 11.2 |
| swapon -s | Displays the swap space configuration and usage, including the total amount of allocated swap space, the amount of swap space that is being used, and the amount of free swap space. | Section 12.2 |
| volstat | Displays performance statistics for LSM objects and provides information about LSM volume and disk usage that you can use to characterize and understand your I/O workload, including the read/write ratio, the average transfer size, and whether disk I/O is evenly distributed. | Section 9.3 or the Logical Storage Manager documentation |
| iostat | Displays disk I/O statistics and provides information about which disks are being used the most. | Section 9.2.1 |
9.2.1 Displaying Disk Usage by Using the iostat Command
For the best performance, disk I/O should be evenly distributed across disks. Use the iostat command to determine which disks are being used the most. The command displays disk I/O statistics for disks, in addition to terminal and CPU statistics.
The following example of the iostat command displays output in one-second intervals:
# /usr/ucb/iostat 1
tty floppy0 dsk0 dsk1 cdrom0 cpu
tin tout bps tps bps tps bps tps bps tps us ni sy id
1 73 0 0 23 2 37 3 0 0 5 0 17 79
0 58 0 0 47 5 204 25 0 0 8 0 14 77
0 58 0 0 8 1 62 1 0 0 27 0 27 46
The iostat command output displays the following information:
The first line of the iostat command output is the average since boot time, and each subsequent report is for the last interval.
For each disk (dskn), the number of KB transferred per second (bps) and the number of transfers per second (tps).
For the system (cpu), the percentage of time the CPU has spent in user state running processes either at their default priority or at a preferred priority (us), in user mode running processes at a less favored priority (ni), in system mode (sy), and in idle mode (id). This information enables you to determine how disk I/O is affecting the CPU. User mode includes the time the CPU spent executing library routines; system mode includes the time the CPU spent executing system calls.
The iostat command can help you to do the following:
Determine which disk is being used the most and which is being used the least. This information will help you determine how to distribute your file systems and swap space. Use the swapon -s command to determine which disks are used for swap space.
Determine if the system is disk bound. If the iostat command output shows a lot of disk activity and a high system idle time, the system may be disk bound. You may need to balance the disk I/O load, defragment disks, or upgrade your hardware.
Determine if an application is written efficiently. If a disk is doing a large number of transfers (the tps field) but reading and writing only small amounts of data (the bps field), examine how your applications are doing disk I/O. The application may be performing a large number of I/O operations to handle only a small amount of data. You may want to rewrite the application if this behavior is not necessary.
9.3 Managing LSM Performance
The Logical Storage Manager (LSM) provides flexible storage management, improved disk I/O performance, and high data availability, with little additional overhead. Although any type of system can benefit from LSM, it is especially suited for configurations with large numbers of disks or configurations that regularly add storage.
LSM allows you to set up unique pools of storage that consist of multiple disks. From these disk groups, you can create virtual disks (LSM volumes), which are used in the same way as disk partitions. You can create UFS or AdvFS file systems on a volume, use a volume as a raw device, or create volumes on top of RAID storage sets.
Because there is no direct correlation between an LSM volume and a physical disk, file system or raw I/O can span disks. You can easily add disks to and remove disks from a disk group, balance the I/O load, and perform other storage management tasks.
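For example, the following commands are a minimal sketch of creating an AdvFS file domain on an LSM volume and mounting a fileset from it; the disk group, volume, domain, fileset, and mount point names are illustrative:
# mkfdmn /dev/vol/datadg/vol01 data_domain
# mkfset data_domain fset1
# mount -t advfs data_domain#fset1 /data
To use UFS instead, you could run the newfs command on the corresponding raw device, for example, newfs /dev/rvol/datadg/vol01.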
In addition, LSM provides high performance and high availability by using RAID technology; LSM is often referred to as software RAID. LSM configurations can be more cost-effective and less complex than a hardware RAID subsystem. Note that the LSM RAID features require a license.
9.3.1 LSM Features
LSM provides the following basic disk management features that do not require a license:
Disk concatenation enables you to create a large volume from multiple disks.
Load balancing transparently distributes data across disks.
Configuration database load-balancing automatically maintains an optimal number of LSM configuration databases in appropriate locations without manual intervention.
The volstat command provides detailed LSM performance information.
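For example, the following commands sketch how volstat might be used to display volume statistics at five-second intervals and a one-time summary of disk statistics; the disk group name is illustrative, and the available options are described in volstat(8):
# volstat -g datadg -i 5 -v
# volstat -g datadg -d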
The following LSM features require a license:
RAID0 (striping) distributes data across disks in an array. Striping is useful if you need to transfer large amounts of data quickly, and it also enables you to balance the I/O load from multi-user applications across multiple disks. LSM striping provides significant I/O performance benefits with little impact on the CPU.
RAID1 (mirroring) maintains copies of data on different disks and reduces the chance that a single disk failure will cause the data to be unavailable.
RAID5 (parity RAID) provides data availability through the use of parity and distributes data and parity across disks in an array.
Mirrored root file system and swap space improves availability.
Hot-spare support provides an automatic reaction to I/O failures on mirrored or RAID5 objects by relocating the affected objects to spare disks or other free space.
Dirty-region logging (DRL) improves the recovery time of mirrored volumes after a system failure.
A graphical user interface (GUI) enables easy disk management and provides detailed performance information.
To obtain the best LSM performance, follow the configuration and tuning guidelines described in the Logical Storage Manager manual.
9.4 Managing Hardware RAID Subsystem Performance
Hardware RAID subsystems provide RAID functionality for high performance and high availability, relieve the CPU of disk I/O overhead, and enable you to connect many disks to a single I/O bus or in some cases, multiple buses. There are various types of hardware RAID subsystems with different performance and availability features, but they all include a RAID controller, disks in enclosures, cabling, and disk management software.
RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features, such as write-back caches and complete component redundancy.
Hardware RAID subsystems use disk management software, such as the RAID Configuration Utility (RCU) and the StorageWorks Command Console (SWCC) utility, to manage the RAID devices. Menu-driven interfaces allow you to select RAID levels.
Use hardware RAID to combine multiple disks into a single storage set that the system sees as a single unit. A storage set can consist of a simple set of disks, a striped set, a mirrored set, or a RAID set. You can create LSM volumes, AdvFS file domains, or UFS file systems on a storage set, or you can use the storage set as a raw device.
The following sections discuss these hardware RAID topics:
Hardware RAID features (Section 9.4.1)
Hardware RAID products (Section 9.4.2)
Guidelines for hardware RAID configurations (Section 9.4.3)
See the hardware RAID product documentation for detailed configuration
information.
9.4.1 Hardware RAID Features
Hardware RAID storage solutions range from low-cost backplane RAID array controllers to cluster-capable RAID array controllers that provide extensive performance and availability features. All hardware RAID subsystems provide you with the following features:
A RAID controller that relieves the CPU of the disk I/O overhead
Increased disk storage capacity
Hardware RAID subsystems allow you to connect a large number of disks to a single I/O bus or, in some cases, multiple buses. In a typical storage configuration, you attach a disk storage shelf to a system by using a SCSI bus connected to a host bus adapter installed in an I/O bus slot. However, you can connect only a limited number of disks to a SCSI bus, and systems have a limited number of I/O bus slots.
In contrast, hardware RAID subsystems contain multiple internal SCSI buses that can be connected to a system by using a single I/O bus slot.
Read cache
A read cache improves I/O read performance by holding data that it anticipates the host will request. If a system requests data that is already in the read cache (a cache hit), the data is immediately supplied without having to read the data from disk. Subsequent data modifications are written both to disk and to the read cache (write-through caching).
Write-back cache
Hardware RAID subsystems support write-back caches (as a standard or an optional feature), which can improve I/O write performance while maintaining data integrity. A write-back cache decreases the latency of many small writes, and can improve Internet server performance because writes appear to be written immediately. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache, consolidated, and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache must have an uninterruptible power source (UPS) to protect against data loss and corruption.
RAID support
All hardware RAID subsystems support RAID0 (disk striping), RAID1 (disk mirroring), and RAID5. High-performance RAID array subsystems also support RAID3 and dynamic parity RAID. See Section 1.3.1 for information about RAID levels.
Non-RAID disk array capability or "just a bunch of disks" (JBOD)
Component hot swapping and hot sparing
Hot-swap support allows you to replace a failed component while the system continues to operate. Hot-spare support allows you to automatically use previously installed components if a failure occurs.
Graphical user interface (GUI) for easy management and monitoring
9.4.2 Hardware RAID Products
There are different types of hardware RAID subsystems, which provide various degrees of performance and availability at various costs. HP supports the following hardware RAID subsystems:
Backplane RAID array storage subsystems
These entry-level subsystems, such as those utilizing the RAID Array 230/Plus storage controller, provide a low-cost hardware RAID solution and are designed for small and midsize departments and workgroups.
A backplane RAID array storage controller is installed in a PCI bus slot and acts as both a host bus adapter and a RAID controller.
Backplane RAID array subsystems provide RAID functionality (0, 1, 0+1, and 5), an optional write-back cache, and hot-swap functionality.
High-performance RAID array subsystems
These subsystems, such as the RAID Array 450 subsystem, provide extensive performance and availability features and are designed for client/server, data center, and medium to large departmental environments.
A high-performance RAID array controller, such as an HSZ80 controller, is connected to a system through an ultrawide differential SCSI bus and a high-performance host bus adapter installed in an I/O bus slot.
High-performance RAID array subsystems provide RAID functionality (0, 1, 0+1, 3, 5, and dynamic parity RAID), dual-redundant controller support, scalability, storage set partitioning, a standard UPS write-back cache, and components that can be hot-swapped.
Enterprise Storage Arrays (ESA)/Modular storage array (MSA)
These preconfigured high-performance hardware RAID subsystems, such as the RAID Array 12000, provide the highest performance, availability, and disk capacity of any RAID subsystem. They are used for transaction-intensive applications and high-bandwidth decision-support applications.
ESAs support all major RAID levels, including dynamic parity RAID; fully redundant components that can be hot-swapped; a standard UPS write-back cache; and centralized storage management.
See the HP Logical Storage Manager Version 5.1B QuickSpecs for detailed information about hardware RAID subsystem features.
9.4.3 Hardware RAID Configuration Guidelines
Table 9-2 describes the hardware RAID subsystem configuration guidelines and lists performance benefits as well as tradeoffs.
Table 9-2: Hardware RAID Subsystem Configuration Guidelines
| Guideline | Performance Benefit | Tradeoff |
| Evenly distribute disks in a storage set across different buses (Section 9.4.3.1) | Improves performance and helps to prevent bottlenecks | None |
| Use disks with the same data capacity in each storage set (Section 9.4.3.2) | Simplifies storage management | None |
| Use an appropriate stripe size (Section 9.4.3.3) | Improves performance | None |
| Mirror striped sets (Section 9.4.3.4) | Provides availability and distributes the disk I/O load | Increases configuration complexity and may decrease write performance |
| Use a write-back cache (Section 9.4.3.5) | Improves write performance, especially for RAID5 storage sets | Cost of hardware |
| Use dual-redundant RAID controllers (Section 9.4.3.6) | Improves performance, increases availability, and prevents I/O bus bottlenecks | Cost of hardware |
| Install spare disks (Section 9.4.3.7) | Improves availability | Cost of disks |
| Replace failed disks promptly (Section 9.4.3.7) | Improves performance | None |
The following sections describe some of these guidelines. See your RAID subsystem documentation for detailed configuration information.
9.4.3.1 Distributing Storage Set Disks Across Buses
You can improve performance and help to prevent bottlenecks by distributing storage set disks evenly across different buses.
In addition, make sure that the first member of each mirrored set is on a different bus.
9.4.3.2 Using Disks with the Same Data Capacity
Use disks with the same capacity in a storage set.
This simplifies storage management and reduces wasted disk space.
9.4.3.3 Choosing the Correct Hardware RAID Stripe Size
You must understand how your applications perform disk I/O before you can choose the stripe (chunk) size that will provide the best performance benefit. See Section 1.8 for information about identifying a resource model for your system.
Here are some guidelines for stripe sizes:
If the stripe size is large compared to the average I/O size, each disk in a stripe set can respond to a separate data transfer. I/O operations can then be handled in parallel, which increases sequential write performance and throughput. This can improve performance for environments that perform large numbers of I/O operations, including transaction processing, office automation, and file services environments, and for environments that perform multiple random read and write operations.
If the stripe size is smaller than the average I/O operation, multiple disks can simultaneously handle a single I/O operation, which can increase bandwidth and improve sequential file processing. This is beneficial for image processing and data collection environments. However, do not make the stripe size so small that it will degrade performance for large sequential data transfers.
For example, if you use an 8-KB stripe size, small data transfers will be distributed evenly across the member disks, but a 64-KB data transfer will be divided into at least 8 data transfers.
In addition, the following guidelines can help you choose the correct stripe size:
Raw disk I/O operations
If your applications are doing I/O to a raw device and not a file system, use a stripe size that distributes a single data transfer evenly across the member disks. For example, if the typical I/O size is 1 MB and you have a four-disk array, you could use a 256-KB stripe size. This would distribute the data evenly among the four member disks, with each doing a single 256-KB data transfer in parallel.
Small file system I/O operations
For small file system I/O operations, use a stripe size that is a multiple of the typical I/O size (for example, four to five times the I/O size). This will help to ensure that the I/O is not split across disks.
I/O to a specific range of blocks
Choose a stripe size that will prevent any particular range of blocks from becoming a bottleneck. For example, if an application often uses a particular 8-KB block, you may want to use a stripe size that is slightly larger or smaller than 8 KB or is a multiple of 8 KB to force the data onto a different disk.
9.4.3.4 Mirroring Striped Sets
Striped disks improve I/O performance by distributing the disk I/O load. However, striping decreases availability because a single disk failure will cause the entire stripe set to be unavailable. To make a stripe set highly available, you can mirror the stripe set.
9.4.3.5 Using a Write-Back Cache
RAID subsystems support, either as a standard or an optional feature, a nonvolatile (battery-backed) write-back cache that can improve disk I/O performance while maintaining data integrity. A write-back cache improves performance for systems that perform large numbers of writes and for RAID5 storage sets. Applications that perform few writes will not benefit from a write-back cache.
With write-back caching, data intended to be written to disk is temporarily stored in the cache and then periodically written (flushed) to disk for maximum efficiency. I/O latency is reduced by consolidating contiguous data blocks from multiple host writes into a single unit.
A write-back cache improves performance, especially for Internet servers, because writes appear to be written immediately. If a failure occurs, upon recovery, the RAID controller detects any unwritten data that still exists in the write-back cache and writes the data to disk before enabling normal controller operations.
A write-back cache must be backed up with an uninterruptible power source (UPS) to protect against data loss and corruption.
If you are using an HSZ40, HSZ50, HSZ70, or HSZ80 RAID controller with a write-back cache, the following guidelines may improve performance:
Set CACHE_POLICY to B.
Set CACHE_FLUSH_TIMER to a minimum of 45 (seconds).
Enable the write-back cache (WRITEBACK_CACHE) for each unit, and set the value of MAXIMUM_CACHED_TRANSFER_SIZE to a minimum of 256.
See the RAID subsystem documentation for more information about using the write-back cache.
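As a sketch, the following commands show how these settings might be entered at an HSZ-series controller CLI (for example, through a maintenance terminal connection); the unit name D101 is illustrative, and the exact syntax can vary with the controller firmware version:
HSZ> SET THIS_CONTROLLER CACHE_POLICY=B
HSZ> SET THIS_CONTROLLER CACHE_FLUSH_TIMER=45
HSZ> SET D101 WRITEBACK_CACHE
HSZ> SET D101 MAXIMUM_CACHED_TRANSFER_SIZE=256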
9.4.3.6 Using Dual-Redundant Controllers
If supported by your RAID subsystem, you can use a dual-redundant controller configuration and balance the number of disks across the two controllers. This can improve performance, increase availability, and prevent I/O bus bottlenecks.
9.4.3.7 Using Spare Disks to Replace Failed Disks
Install predesignated spare disks on separate controller ports and storage shelves. This will help you to maintain data availability and recover quickly if a disk failure occurs.
9.5 Managing CAM Performance
The Common Access Method (CAM) is the operating system interface to the hardware. CAM maintains pools of buffers that are used to perform I/O. Each buffer takes approximately 1 KB of physical memory. Monitor these pools and tune them if necessary.
You may be able to modify the following io subsystem attributes to improve CAM performance:
cam_ccb_pool_size
The initial size of the buffer pool free list at boot time. The default is 200.
cam_ccb_low_water
The number of buffers in the pool free list at which more buffers are allocated from the kernel. CAM reserves this number of buffers to ensure that the kernel always has enough memory to shut down runaway processes. The default is 100.
cam_ccb_increment
The number of buffers added to or removed from the buffer pool free list. Buffers are allocated on an as-needed basis to handle immediate demands, but are released in a more measured manner to guard against spikes. The default is 50.
If the I/O pattern associated with your system tends to have intermittent bursts of I/O operations (I/O spikes), increasing the values of the cam_ccb_pool_size and cam_ccb_increment attributes may improve performance.
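As a sketch, you might first query the current values with the sysconfig command and then add an io stanza to /etc/sysconfigtab; the new values shown here are illustrative, and whether a change takes effect immediately or requires a reboot depends on whether the attribute is dynamic on your system:
# sysconfig -q io cam_ccb_pool_size cam_ccb_increment
A corresponding /etc/sysconfigtab stanza might look like this:
io:
    cam_ccb_pool_size = 400
    cam_ccb_increment = 100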
You may be able to diagnose CAM performance problems by using dbx to examine the ccmn_bp_head data structure, which provides statistics on the buffer structure pool that is used for raw disk I/O. The information provided is the current size of the buffer structure pool (num_bp) and the wait count for buffers (bp_wait_cnt). For example:
# /usr/ucb/dbx -k /vmunix /dev/mem
(dbx) print ccmn_bp_head
struct {
num_bp = 50
bp_list = 0xffffffff81f1be00
bp_wait_cnt = 0
}
(dbx)
If the value of the bp_wait_cnt field is not 0, CAM has run out of buffer pool space. If this situation persists, you may be able to eliminate the problem by changing one or more of the CAM subsystem attributes described in this section.