The Logical Storage Manager (LSM) software is an optional integrated, host-based disk storage management application that lets you manage storage devices without disrupting users or applications accessing data on those storage devices. Although any system can benefit from LSM, it is especially suited to configurations with large numbers of disks or configurations that regularly add storage.
LSM uses Redundant Arrays of Independent Disks (RAID) technology to enable you to configure storage devices into a virtual pool of storage from which you create LSM volumes. You can configure new and existing UFS and AdvFS file systems, databases, and applications to use LSM volumes. You can also create LSM volumes on top of RAID storage sets.
The benefits of using an LSM volume instead of a disk partition include:
Data loss protection, through mirroring (RAID 1) or striping with parity (RAID5)
Maximized disk usage, by seamlessly combining storage devices to appear as a single storage device to users and applications
Performance improvements, through striping (RAID 0) over different disks and different buses
Data availability in a TruCluster Server environment
TruCluster Server software makes multiple Tru64 UNIX systems appear as a single system on the network. The systems running the TruCluster Server software become members of the cluster and share resources and data storage. This sharing allows applications, such as LSM, to continue uninterrupted if the cluster member on which it was running fails.
This chapter introduces LSM features, concepts, terminology, and available interfaces. For more information on LSM terms and a list of all the LSM commands, see volintro(8).
1.1 Overview of the LSM Object Hierarchy
LSM uses the following hierarchy of objects to organize storage:
LSM disk: An object that represents a storage device that is initialized exclusively for use by LSM.
Disk group: An object that represents a collection of LSM disks for use by an LSM volume.
Subdisk: An object that represents a contiguous set of blocks on an LSM disk that LSM uses to write volume data.
Plex: An object that represents a subdisk or collection of subdisks to which LSM writes a copy of the volume data or log information.
Volume: An object that represents a hierarchy of LSM objects, including LSM disks, subdisks, and plexes in a disk group. Applications and file systems make read and write requests to the LSM volume.
The following sections describe LSM objects in more detail.
1.1.1 LSM Disk
An LSM disk is any storage device supported by Tru64 UNIX, including disks, disk partitions, and hardware RAID sets, that you configure exclusively for use by LSM. LSM views the storage in the same way as the Tru64 UNIX operating system software views it. For example, if the operating system software treats a RAID set as a single storage device, so does LSM. In addition, LSM recognizes and supports hardware disk clones.
For more information on supported storage devices, see the Tru64 UNIX QuickSpecs web site at the following URL:
http://www.tru64unix.compaq.com/docs/pub_page/spds.html
Figure 1-1 shows a typical hardware configuration that LSM supports.
Figure 1-1: Typical LSM Hardware Configuration
A storage device becomes one of the following LSM disk types when you initialize it for use by LSM:
A sliced disk, which is created when you commit an entire disk to LSM use. In a sliced disk, LSM organizes the storage into two regions on separate partitions: a large public region used for storing data and a private region for storing LSM internal metadata, such as LSM configuration information. The default size of the private region is 4096 blocks.
A simple disk, which is created when you specify a disk partition for LSM use, including the c partition. In a simple disk, LSM organizes the storage into two regions on the same partition: a large public region used for storing data and a private region for storing LSM internal metadata, such as LSM configuration information. The default size of the private region is 4096 blocks.
Whenever possible, initialize the entire disk as a sliced disk instead of configuring individual disk partitions as simple disks. This ensures that the disk's storage is used efficiently and avoids using space for multiple private regions on the same disk.
A nopriv disk, which is created when you encapsulate a disk or disk partition containing data you want to place under LSM control.
In a nopriv disk, LSM creates only a public region for the existing data and no private region.
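For example, you might initialize an entire disk as a sliced LSM disk and then confirm that LSM recognizes it by entering commands similar to the following, where dsk4 is an illustrative disk name (the exact options are described in voldisksetup(8) and voldisk(8)):
# voldisksetup -i dsk4
# voldisk list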
1.1.1.1 Disk Access Name
When you initialize a disk for LSM use, LSM assigns it a disk access name based on the device you specify. For example, if you initialize an entire disk (for example, dsk4), the disk access name is dsk4. If you initialize a disk partition (for example, dsk4b), the disk access name is dsk4b. If you initialize multiple partitions of the same disk as separate LSM disks, each has its own disk access name; for example, dsk2b and dsk2f.
1.1.1.2 Disk Media Name
When you add an LSM disk to a disk group, it gets a disk media name, which can be either the same as the disk access name or a name you assign. Disk media names can include any combination of up to 31 alphanumeric characters but cannot include spaces or a slash ( / ).
For example, a disk with a disk access name of dsk1 can also have a disk media name of dsk1 or a name you assign, such as finance_data_disk.
LSM keeps track of the association of the disk media name and the disk access name. The disk media name provides insulation from operating system naming conventions. This association allows LSM to find the device if you move it to a new location (for example, to a different controller).
If you remove a disk from a disk group, it loses its disk media name. If you add the disk to a different disk group you can give it a different disk media name, or let it use the disk access name by default.
Within a disk group all the disk media names must be unique, but two
different disk groups can have disks with the same disk media name.
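For example, assuming a disk group named finance already exists, you might add the LSM disk dsk1 to it under the media name finance_data_disk with a command similar to the following (see voldg(8) for the exact syntax):
# voldg -g finance adddisk finance_data_disk=dsk1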
1.1.2 Disk Group
A disk group is an object that represents a grouping of LSM disks. LSM disks in a disk group share a common configuration database that identifies all the LSM objects (LSM disks, subdisks, plexes, and volumes) in the disk group. LSM automatically creates and maintains copies of the configuration database in the private region of several LSM sliced or simple disks in each disk group.
The default size of the private region is 4096 blocks, and each LSM object requires one record. Two records fit in one sector (512 bytes). Therefore, the default private region size guarantees space for a configuration database that tracks 8192 objects (LSM disks, subdisks, plexes, and volumes).
LSM distributes these copies across all controllers for redundancy. If all disks in a disk group are located on the same controller, LSM distributes the copies across several disks. LSM automatically records changes to the LSM configuration and, if necessary, changes the number and location of copies of the configuration database for a disk group.
You cannot have a disk group that contains only LSM nopriv disks, because an LSM nopriv disk does not have a private region to store copies of the configuration database.
By default, the LSM software creates a disk group named rootdg. The configuration database for rootdg contains information for itself and all other disk groups that you create.
An LSM volume can use disks only within the same disk group. You can create all of your volumes in the rootdg disk group, or you can create other disk groups. For example, if you dedicate disks to store financial data, you can create and assign those disks to a disk group named finance.
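For example, you might create a disk group named finance from an initialized LSM disk dsk3, assigning the disk the media name finance01, with a command similar to the following sketch (see voldg(8) for the exact syntax):
# voldg init finance finance01=dsk3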
1.1.3 Subdisk
A subdisk is an object that represents a contiguous set of blocks in an LSM disk's public region that LSM uses to store data.
By default, LSM assigns subdisk names using the LSM disk media name followed by a dash (-) and an ascending two-digit number beginning with 01; for example, dsk1-01. Alternatively, you can assign a subdisk name of up to 31 alphanumeric characters that cannot include spaces or the slash ( / ). For example, you can assign a subdisk name of finance_disk-01 on a disk with a disk media name of dsk3.
A subdisk can be:
The entire public region. Figure 1-2 shows that the entire public region of an LSM disk was configured as a subdisk named dsk1-01.
Figure 1-2: Single Subdisk Using a Public Region
A portion of the public region. Figure 1-3 shows a public region of an LSM disk that was configured as two subdisks named dsk2-01 and dsk2-02.
Figure 1-3: Multiple Subdisks Using a Public Region
1.1.4 Data Plex
A data plex is an object that represents a subdisk or collection of subdisks in the same disk group to which LSM writes volume data.
By default, LSM assigns plex names using the volume name followed by a dash (-) and an ascending two-digit number beginning with 01. For example, volume1-01 is the name of the first (or only) plex in a volume named volume1. Alternatively, you can assign a plex name of up to 31 alphanumeric characters that cannot include spaces or the slash ( / ). For example, you can assign a plex name of finance_plex01.
You can use one of three types of data plex depending on how you want LSM to store volume data on disk:
In a concatenated data plex, LSM writes volume data in a linear manner. When the space in one subdisk has been written to, the remaining data goes to the next sequential subdisk in the plex. Section 1.1.4.1 explains this plex type in more detail. A volume can contain two or more concatenated data plexes, in which case the volume is described as concatenated and mirrored.
In a striped data plex, LSM separates data into equal-sized data units (defined by the stripe width) and writes the data units to each disk in the plex. This attempts to balance the load across all disks. Section 1.1.4.2 explains this plex type in more detail. A volume can contain two or more striped data plexes, in which case the volume is described as striped and mirrored.
In a RAID5 data plex, LSM calculates a parity value for the data being written, then separates the data and parity into equal-sized data units (defined by the stripe width), and intersperses the data and parity across all disks. Section 1.1.4.3 explains this plex type in more detail. A volume can contain only one RAID5 data plex, due to internal design constraints.
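You normally choose the plex layout when you create a volume with the volassist command (see Table 1-1). The following sketch assumes a disk group named finance with at least three available disks and shows how a three-column striped volume might be created; the attribute names and the size are illustrative, so check volassist(8) for the options your version supports:
# volassist -g finance make finance_vol 2g layout=stripe nstripe=3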
1.1.4.1 Concatenated Data Plex
In a concatenated data plex, LSM creates a contiguous address space on the subdisks and sequentially writes volume data in a linear manner. If LSM reaches the end of a subdisk while writing data, it continues to write data to the next subdisk, which might be on a different physical disk (Figure 1-4). LSM lets you use space on several disks that otherwise might be unusable. One disk's public region can contain subdisks used in several different volumes.
Figure 1-4: Concatenated Data Plex
A single subdisk failure in a volume with one concatenated data plex will result in LSM volume failure. To prevent this type of failure, you can create multiple plexes (mirrors) on different disks. LSM continuously maintains the data in the mirrors. If a plex becomes unavailable because of a disk failure, the volume continues operating using another plex.
Using disks on different SCSI buses for mirror plexes speeds read requests, because data can be simultaneously read from multiple plexes.
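For example, you might add a mirror (a second data plex) to an existing volume and let LSM choose suitable disks in the same disk group with a command similar to the following (see volassist(8)):
# volassist -g finance mirror finance_vol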
1.1.4.2 Striped Data Plex
In a striped data plex, LSM divides a write request into equal-size data units, defined by the stripe width (64K bytes by default), and writes each data unit to a different disk, creating a stripe of data across the columns (usually, the number of columns equals the number of disks in the plex). You can define a different stripe width (data unit size) to achieve the best division of data across the columns.
LSM can simultaneously write two or more data units if the disks are on different SCSI buses.
Figure 1-5 shows a three-column striped plex. In this type of plex, an I/O write request is divided into equal-size units (A, B, C, D, and so on) and each data unit is written sequentially to a different subdisk (in a different disk column).
Figure 1-5: Volume with a Three-Column Striped Data Plex
If a write request does not complete a stripe (the number of data units is not evenly divisible by the number of columns), then the first data unit of the next write request starts in the next column.
If a write request is not evenly divisible by the data unit size, so that the last data unit in a write request does not map to the end of a column, the next write request completes the column and then continues to subsequent columns.
As in a concatenated data plex, a single disk failure in a volume with one striped data plex will result in volume failure. To prevent this type of failure, you can create multiple data plexes (mirrors) on different disks. LSM continuously maintains the data in the mirrored data plexes. If a plex becomes unavailable because of a disk failure, the volume continues operating using another plex.
Using disks on different SCSI buses for mirror plexes speeds read requests, because data can be simultaneously read from multiple plexes.
1.1.4.3 RAID 5 Data Plex
In a RAID 5 data plex, LSM calculates a parity value for each stripe of data, then separates the data and parity into equal-size units defined by the stripe width (16K bytes by default), and writes the data and parity units to three or more columns of subdisks, creating a stripe of data and parity across the columns. The parity for a given stripe is contained in one data unit, so a single column of disks holds the entire parity value for that stripe.
LSM writes the parity in a different column for each consecutive stripe of data. The parity unit for the first stripe is written to the last column. Each successive parity unit is located in the next column to the left of the previous parity unit location. If there are more stripes than columns, the parity unit placement begins again in the last column.
If a disk in one column fails, LSM continues operating using the data and parity information in the remaining columns to reconstruct the missing data. You can define a different stripe width (data unit size) to achieve the best division of data and parity across the columns.
LSM can simultaneously write the data and parity units if the columns are on different SCSI buses.
Figure 1-6 shows how data and parity information are written in a RAID5 data plex.
Figure 1-6: Volume with a RAID 5 Data Plex
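For example, a RAID5 volume with four columns might be created with a command similar to the following sketch; the volume name, size, and nstripe attribute are illustrative, so check volassist(8) for the exact syntax:
# volassist -g finance make raid_vol 4g layout=raid5 nstripe=4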
1.1.5 Log Plex
A log plex contains information about activity in a volume. After a system failure, LSM recovers only those areas of the volume identified in the log plex as being dirty (written to) at the time of the failure.
By default, LSM creates a log plex for mirrored volumes (volumes with two or more striped or concatenated data plexes) and for volumes that use a RAID5 data plex. Mirrored volumes use a Dirty Region Log (DRL) plex and an optional Fast Plex Attach (FPA) plex. RAID5 volumes use a RAID5 log plex.
Dirty Region Log (DRL) plex
In a DRL plex, LSM keeps track of the regions of a volume that change due to I/O requests. When the system restarts after a crash, LSM resynchronizes only the regions marked as dirty in the log. This greatly reduces the time needed to resynchronize the volume, especially for volumes of hundreds of megabytes or more.
Regions are marked as dirty before the data is written. When the write completes, the region is not immediately marked as clean but instead allowed to stay dirty for a specific length of time. This reduces the overhead of marking the log if another write occurs to the same region. If a dirty region has had no activity for an extended period of time, it is marked as clean.
If you do not use a DRL plex, LSM copies and resynchronizes all the data to each plex to restore the plex consistency when the system restarts after a failure. Although this process occurs in the background and the volume is still available, it can be a lengthy procedure and can result in unnecessarily recovering data, thereby degrading system performance.
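The volassist command normally adds a DRL plex when it creates a mirrored volume. If a mirrored volume lacks one, you can typically attach a log plex with a command similar to the following (see volassist(8)):
# volassist -g finance addlog finance_vol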
Fast Plex Attach (FPA) log plex
A Fast Plex Attach log plex is used to support backups of mirrored volumes. An FPA log tracks the regions of a volume that change while one of its data plexes is detached. The detached plex is used to create a secondary volume for performing backups. When the plex returns to the original volume, only the regions marked in the FPA log plex are written to the returning plex, reducing the time required to resynchronize that plex to the volume.
RAID5 log plex
In a RAID5 log plex, LSM stores a copy of the data and parity for several full stripes of I/O. When a write to a RAID5 volume occurs, the parity is calculated and the data and parity are first written to the RAID5 log, then to the volume. When the system is restarted after a crash, all the writes in the RAID5 log are written (or possibly rewritten) to the volume. The RAID5 log plex uses a special log subdisk.
In addition, for compatibility with Version 4.0, LSM supports a combination data and log plex. This type of plex is not used in Version 5.0 and higher.
1.1.6 LSM Volume
A volume is an object that represents a hierarchy of plexes, subdisks, and LSM disks in a disk group. Applications and file systems make read and write requests to the LSM volume. The LSM volume depends on the underlying LSM objects to satisfy the request.
An LSM volume can use storage from only one disk group.
LSM does not assign default names to volumes; you must assign a name of up to 31 alphanumeric characters that does not include spaces or the slash ( / ). Within a disk group the volume names must be unique, but two different disk groups can have volumes with the same name.
LSM volumes can be either redundant or nonredundant. A redundant volume provides high data availability, either through mirroring (two or more concatenated or striped data plexes) or through parity (a RAID5 data plex).
The following sections describe these properties in more detail.
1.1.6.1 Nonredundant Volumes
A nonredundant volume has one data plex and therefore does not provide any data redundancy. The plex layout can be either striped or concatenated.
A nonredundant volume with one concatenated plex is called a simple volume, which can comprise space on one or more disks. This is the simplest volume type. A simple volume usually has the slowest performance of all the volume types.
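For example, a simple volume is typically what the volassist command creates when you do not specify a layout, as in the following sketch (the volume name and size are illustrative):
# volassist -g rootdg make vol01 1g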
1.1.6.2 Mirrored Volumes
A mirrored volume has two or more data plexes, which are either concatenated or striped, and a log plex (by default). Depending on the plex layout, this type of volume is also called a concatenated and mirrored volume or a striped and mirrored volume. Usually, all the data plexes in a volume have the same layout (all striped or all concatenated), but this is not a restriction.
Each data plex is an instance of the volume data. A mirrored volume provides data redundancy and improved read performance, as data can be read from any mirror. Mirrored volumes can have up to 32 plexes in any combination of data and DRL plexes, but by definition mirrored volumes have at least two data plexes. Mirrored volumes are redundant volumes, because each mirror (plex) contains a complete copy of the volume data.
Figure 1-7 shows a volume with concatenated and mirrored data plexes and a Dirty Region Log (DRL) plex (Section 1.1.5).
Figure 1-7: Volume with Concatenated and Mirrored Data Plexes
Figure 1-8 shows a volume with striped and mirrored data plexes and a DRL plex.
Figure 1-8: Volume with Striped and Mirrored Data Plexes
Different LSM volumes can use disk space on the same disk, but in different subdisks.
Figure 1-9: Two LSM Volumes Using Subdisks on the Same Disk
In Figure 1-9, volume V1 uses space on disk dsk5 in subdisk dsk5-01. Volume V2 uses space on disk dsk5 as well, but in subdisk dsk5-02. Volume V1 is striped and mirrored (uses two striped plexes), and volume V2 is a simple volume (uses one concatenated plex). If disk dsk5 fails, volume V1 continues running using plex V1-1. However, volume V2 will fail completely because it is not redundant.
1.1.6.3 RAID 5 Volumes
A RAID 5 volume has one RAID5 data plex and one RAID5 log plex. You can add multiple RAID5 log plexes to the volume, but one is sufficient. RAID5 volumes are redundant volumes, because the volume preserves data redundancy through the parity information.
Note
You cannot mirror a RAID 5 data plex.
The TruCluster Server software does not support RAID5 volumes.
1.1.6.4 Volume Usage Types
An LSM volume has a usage type that defines a particular class of rules for operating on the volume. The rules are typically based on the expected content of the volume. The LSM usage types include:
fsgen: For volumes that contain file systems. This is the default usage type.
gen: For volumes used for swap space or other applications that do not use the system buffer cache (such as a database).
raid5: For all RAID5 volumes, regardless of what the volume contains.
In addition, LSM uses the following special usage types:
root: For the rootvol volume, created by encapsulating the root partition on a standalone system.
swap: For the primary swap volume, created by encapsulating the primary swap partition on a standalone system, and for the swap volumes of cluster members, created by encapsulating the members' swap devices.
cluroot: For the cluster_rootvol volume, created by migrating the clusterwide root file system domain to an LSM volume in a TruCluster Server cluster.
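The usage type is normally set for you; for example, volassist creates fsgen volumes by default and raid5 volumes when you request a RAID5 layout. If you need a gen volume, for example for secondary swap space, you can typically request the usage type explicitly, as in the following sketch (the -U option and the size shown are assumptions; confirm them in volassist(8)):
# volassist -U gen make swapvol2 512m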
1.1.6.5 Volume Device Interfaces
Like most storage devices, an LSM volume has a block device interface and a character device interface.
A volume's block device interface is located in the /dev/vol/disk_group directory. A volume's character device interface is located in the /dev/rvol/disk_group directory.
Databases, file systems, applications, and secondary swap use an LSM volume in the same manner as a disk partition because these interfaces support the standard UNIX open, close, read, write, and ioctl calls (Figure 1-10).
Figure 1-10: LSM Volumes Used Like Disk Partitions
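For example, you might create and mount a UFS file system on an LSM volume by using these device special files much as you would use a disk partition; the disk group and volume names here are illustrative:
# newfs /dev/rvol/rootdg/vol01
# mount /dev/vol/rootdg/vol01 /mnt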
1.2 Overview of LSM Interfaces
You can create, display, and manage LSM objects using one of the following interfaces:
A command-line interpreter (CLI), where you enter LSM commands at the system prompt. This manual focuses chiefly on LSM CLI commands.
The CLI provides the full functionality of LSM. The other interfaces might not support some LSM operations.
A Java-based graphical user interface (GUI) called LSM Storage Administrator (lsmsa) that displays a hierarchical view of LSM objects and their relationships.
A menu-based, interactive interface called voldiskadm that supports a limited number of LSM operations on disks and disk groups. To perform a procedure through the voldiskadm interface, you choose an operation from the main menu and the interface prompts you for information. The voldiskadm interface provides default values when possible. You can press Return to use the default value, enter a new value, or enter ? at any time to view online help. For more information, see voldiskadm(8).
A bit-mapped GUI called Visual Administrator (dxlsm) that uses the Basic X Environment.
The Visual Administrator lets you view and manage disks and volumes and perform limited file system administration. The Visual Administrator displays windows in which LSM objects are represented as icons.
Note
The Visual Administrator (dxlsm) has been replaced by the Storage Administrator (lsmsa).
For more information, see dxlsm(8X).
In many cases, you can use the LSM interfaces interchangeably. That is, LSM objects created by one interface are usually manageable through and compatible with LSM objects created by other LSM interfaces; however, the Fast Plex Attach feature is available only through the CLI.
1.2.1 LSM Command-Line Interpreter
The LSM command-line interpreter provides you with the most control and specificity in creating and managing LSM objects. The other interfaces (lsmsa, voldiskadm, and dxlsm) do not support all the operations available through the command line.
Most LSM commands fall into two categories: high-level and low-level.
The high-level commands are generally more powerful than the low-level commands and are the recommended method of performing the majority of LSM operations. The high-level commands are shortcuts; they might pass the specified operands or values to several low-level commands, initiating many operations with just one command. The high-level commands sequence the intermediate steps in the correct order and also perform a certain amount of error checking by evaluating the operands you specify and the intended end result of the operation, alerting you to problems. The high-level commands use algorithms and default values that provide the best LSM configuration for the majority of cases.
This manual focuses chiefly on the high-level LSM commands, except where more specificity than they provide is required.
The low-level commands require detailed knowledge and understanding of your particular environment and what you are trying to achieve with your LSM configuration. The low-level commands operate on specific LSM object types. In many cases you must perform several operations, sometimes in a precise order, to accomplish what the high-level commands can do for you in fewer steps and with less risk of error.
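As an illustration of the difference, the single high-level command and the low-level sequence that follow both aim to produce a started volume with one concatenated plex. The low-level operand forms are only a sketch and the names and sizes are illustrative; the exact arguments are documented in volmake(8) and volume(8):
# volassist make vol02 500m
Low-level equivalent (sketch):
# volmake sd dsk1-01 dsk1,0,1024000
# volmake plex vol02-01 sd=dsk1-01
# volmake -U fsgen vol vol02 plex=vol02-01
# volume start vol02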
Table 1-1 lists the LSM commands described in this manual, their functions, and, where applicable, indicates whether the command is considered a high-level or low-level command.
Table 1-1: LSM Commands Described in This Manual
| Command | Function | Command Level (If Applicable) |
| --- | --- | --- |
| Setup and Daemon Commands | | |
| volsetup | Initializes the LSM software by creating the rootdg disk group. | |
| volsave | Backs up the LSM configuration database. | |
| volrestore | Restores the LSM configuration database. | |
| voldctl, vold, voliod | Controls LSM volume configuration and kernel daemon operations. | |
| volwatch | Monitors LSM for failure events and performs hot-sparing if enabled. Typically used only during initial LSM setup to enable the hot-sparing feature. | |
| Object Creation and Management Commands | | |
| volassist | Creates, mirrors, backs up, and moves volumes automatically. | High-level. The most often used LSM command for creating and managing LSM volumes. |
| voldiskadd | Creates LSM disks and disk groups. | High-level. Performs many of the same functions as voldisksetup and voldg in one interactive session. |
| voldisksetup | Adds one or more disks for use with LSM. | High-level |
| voldisk | Administers LSM disks. | Low-level |
| voldg | Administers disk groups. | High-level |
| volume | Administers volumes. | Low-level |
| volplex | Administers plexes. | Low-level |
| volsd | Administers subdisks. | Low-level |
| volmake | Creates LSM objects manually. | Low-level |
| voledit | Creates, modifies, and removes LSM records. | Low-level |
| volrecover | Synchronizes plexes and parity data after a crash or disk failure. | High-level |
| volmend | Mends simple problems in configuration records. | Low-level |
| volevac | Evacuates all volume data from a disk. | High-level |
| Data Migration and Encapsulation Commands | | |
| volencap | Sets up scripts to encapsulate disks or disk partitions to LSM volumes. | |
| volreconfig | Performs the encapsulation scripts set up by volencap. | |
| volrootmir | Mirrors the root and swap volumes. (Not supported in a cluster.) | |
| volunroot | Removes the root and swap volumes. (Not supported in a cluster.) | |
| volmigrate, volunmigrate | Migrates AdvFS domains to or from LSM volumes. | |
| vollogcnvt | Converts volumes with Block Change Logging (pre-Version 5.0) to Dirty Region Logging (Version 5.0 and higher). | |
| Informational Commands | | |
| volprint | Displays LSM configuration information. | |
| voldisk | Displays information about LSM disks. | |
| volinfo | Displays volume status information. | |
| volstat | Displays LSM statistics. | |
| volnotify | Displays LSM configuration events. | |
| Interface Start-Up Commands | | |
| lsmsa | Starts the LSM Storage Administrator GUI. | |
In addition to commands, LSM includes the volmake(4) and vol_pattern(4) file formats. For more information on a command, see the reference page corresponding to its name. For example, for more information on the volassist command, enter:
# man volassist
For a list of LSM commands and files, see volintro(8).
1.2.2 Storage Administrator Interface
The Storage Administrator provides dialog boxes in which you enter information to create or manage LSM objects. Completing a dialog box can be the equivalent of entering several commands. The Storage Administrator lets you manage local or remote systems on which LSM is running. You need an LSM license to use the Storage Administrator.
For more information, see Appendix A.