14    Administering Crash Dumps

This chapter describes how to configure and generate system crash dumps and how to save and store crash dumps and their associated data, either through the graphical user interface or manually. A crash dump is a snapshot of the running kernel, taken automatically when the system shuts down unexpectedly. Crash dumps are most often used when you contact your technical support representative to analyze and correct problems that result in a system crash. However, if you are an experienced system administrator or developer, you may be familiar with crash dump analysis techniques and may want to take and analyze your own dump files.

The following topics are discussed in this chapter:

14.1    Overview of Crash Dumps

When a system shuts down unexpectedly, it writes all or part of the data in physical memory either a) to swap space on disk (the virtual memory space) or b) to memory. Such shutdown events are referred to as system crashes or panics. The stored data and status information is called a crash dump. Crash dumps differ from the core dumps produced by an application, after which the system usually keeps running. After a crash dump, the system is shut down to the console prompt (>>>) and may or may not need to be rebooted, depending on the auto_action Boot Halt Restart option.

During the reboot process, the system moves the crash dump into a file and copies the kernel executable image to another file. Together, these files are the crash dump files and are often required for analysis when a system crashes or during the development of custom kernels (debugging). You may need to supply a crash dump file to your technical support organization to analyze system problems.

To administer dumps, you must understand how crash dump files are created. Also, you must reserve space on disks for the crash dump and crash dump files. The amount of space you reserve depends on your system configuration and the type of crash dump you want the system to perform.

14.1.1    Related Documentation and Utilities

Crash dumps make use of the virtual memory swap space provided on disk. Administering the swap space is described in Chapter 3. System event management is described in Chapter 12, which describes the binlogd and syslogd event management channels.

Additional information on crash dumps and related topics is available in manuals and reference pages.

14.1.1.1    Manuals

The following lists manuals that provide useful information for crash dumps and related topics.

14.1.1.2    Reference Pages

The reference pages listed here provide further information regarding associated utilities.

savecore(8)

The program that copies dump data from swap partitions or from memory to a file.

expand_dump(8)

Decompresses a kernel crash dump file.

dumpsys(8)

Copies a snapshot of memory to a dump file without halting the system. This is known as a continuable dump and is useful for estimating crash dump size during dump configuration planning.

sysconfig(8) and sysconfigdb(8)

Maintains the kernel subsystem configuration and is used to set kernel crash dump attributes that control crash behavior. You can use the Kernel Tuner graphical user interface (/usr/bin/X11/dxkerneltuner) to modify kernel attributes. See dxkerneltuner(8) for information. Online help is also available for this interface. The Kernel Tuner can be launched from CDE and is located in the Application Manager: System Admin folder.

swapon(8)

Specifies additional files for paging and swapping. Use this command if you need to add additional temporary or permanent swap space to produce full dumps.

dbx(1)

The source level debugger.

14.1.1.3    SysMan Menu Applications

Applications for configuring and creating crash dumps are available from the SysMan Menu:

Configure System Dump

Use this application to configure the generic system configuration variables associated with the savecore command.

Create Dump Snapshot

Use this application to configure the dumpsys command, which dumps a snapshot of memory manually.

See Section 14.2 for more information.

14.1.2    Files Used During Crash Dumps

By default, the savecore command copies a crash dump file into the /var/adm/crash directory, although you can redirect crash dumps to any file system that you designate and also to a remote host. In common with many other system directories, the /var/adm/crash directory is a context-dependent symbolic link (CDSL), which facilitates joining systems into clusters. The CDSL for this directory is /var/cluster/members/member0/adm/crash. Within this directory, the following files are created or used:

/var/adm/crash/bounds

A text file specifying the incremental number of the next dump (the n in vmzcore.n)

/var/adm/crash/minfree

A file that specifies the minimum number of kilobytes to be left after crash dump files are written

/var/adm/crash/vmzcore.n

The crash dump file, named vmcore.n if the file is not compressed (no z)

/var/adm/crash/vmunix.n

A copy of the kernel that was running at the time of the crash, typically of /vmunix

/etc/syslog.conf, /etc/binlog.conf, and /etc/evmdaemon.conf

The logging configuration files

14.2    Crash Dump Applications

There are two applications that simplify the processes of configuring crash dumps and creating crash dump files manually. These applications are available from the Support and Services branch of the SysMan Menu.

The first application is Configure System Dump. Its purpose is to configure the parameters of the system dump so that you have the appropriate information for your needs should a crash dump occur in the future.

The second application is Create Dump Snapshot. It allows you to set various options and to take a snapshot of memory, which is stored in a file for examination when you cannot halt the system to generate a crash dump.

14.2.1    Using the Configure System Dump Application

The Configure System Dump application lets you tailor the crash dump data according to your needs. This application allows you to set various options that influence the crash dump file should a crash dump occur in the future.

You can access this application from the SysMan Menu by selecting Support and Services, then selecting Configure System Dump.

Figure 14-1 shows the main window of this application.

Figure 14-1:  Configure System Dump application

After you invoke this application from the SysMan Menu, you can provide the following information:

  1. The first selection, Enable Dumps, requires that you choose one of the following:

    None

    Disables the mechanism to generate a crash dump.

    One

    Enables the mechanism so that one set of crash dump files (the crash dump file and a copy of the kernel) is written, should a crash dump occur.

    Two

    Also enables the mechanism so that a set of crash dump files is written, should a crash dump occur. This option also provides for a subsequent set of crash dump files if an additional system fault occurs while the crash dump files are written.

  2. In the second selection, you can choose a Full or Partial dump.

    A full dump saves the crash dump header information and all physical memory.

    A partial dump saves the crash dump header and a copy of the part of physical memory believed to contain significant information at the time of the system crash.

  3. You may choose to compress the crash dump file with the Enable Compression check box. You should always enable compression unless some reason dictates otherwise.

  4. The next selection, Dump Location, specifies how the crash dump data is stored:

    Disk/Memory on Failure

    Saves the crash dump file to disk. If this fails, a partial compressed memory dump is attempted.

    Memory Only

    Saves the crash dump file to the memory space.

    Disk Only

    Saves the crash dump file to disk; if this fails, no partial compressed memory dump is attempted.

  5. In the final selections, you can specify whether or not the crash dump file should be dumped to exempted memory. If so, select the Use Exempted Memory check box to enable the following two fields:

    Exempted Memory Address

    Specify the starting memory address where the dump should be saved.

    Exempted Memory Size

    Specify the size of the memory region.

    Note

    These fields accept decimal and hexadecimal entries. Be sure to precede all hexadecimal entries with 0x.

The Configure System Dump application offers online help, which provides more information.

14.2.2    Using the Create Dump Snapshot Application

The Create Dump Snapshot application, illustrated in Figure 14-2, allows you to save a snapshot of system memory to a dump file.

You can access this application from the SysMan Menu by selecting Support and Services then selecting Create Dump Snapshot.

Figure 14-2 shows the main window of this application.

Figure 14-2:  Create Dump Snapshot application

After you invoke this application from the SysMan Menu, you can provide the following information:

  1. Designate a full or partial dump.

  2. Specify whether or not you want the data compressed. If so, use the Compression Ratio % slide bar to specify the compression ratio; a lower value increases the compression, if possible.

  3. Indicate whether the utility should suppress contiguous zeroes with the Disable Zero Suppression check box. This suppression is not recommended.

  4. Select the Ignore insufficient space warning check box unless you want the application to warn you if there is not enough space to save the crash dump data.

  5. In the Dump Directory field, enter the full pathname of the directory where you want the crash dump file to be written. The number of megabytes available in that directory is displayed in the Megabytes Available in field. Select Update MB to update that display field.

The Create Dump Snapshot application offers online help, which provides more information.

14.3    Crash Dump Creation

After a system crash, you normally reboot your system by issuing the boot command at the console prompt. During a system reboot, the savecore command moves crash dump information from the swap partitions or memory into a file and copies the kernel that was running at the time of the crash into another file. You can analyze these files to help you determine the cause of a crash. The savecore command also logs the crash in system log files.

You can invoke the savecore command from the command line. See savecore(8) for information.

14.3.1    Setting Dump Kernel Attributes in the Generic Subsystem

You can control the way that a crash dump is taken by setting kernel attributes defined in the generic subsystem, as follows:

dump_savecnt

Limits the number of successful crash dumps that are generated for a single crash and reboot sequence or disables dumping. See Section 14.3.2.

dump_to_memory

Specifies whether primary system core dumps are written to memory or to disk. See Section 14.3.2.

dump_sp_threshold

Controls the partitions to which the crash dump is written. The default value causes the primary swap partition to be used exclusively for crash dumps that are small enough to fit the partition. See Section 14.3.4.

dump_user_pte_pages

Specifies whether or not you want to include user page tables in partial crash dumps. This attribute is off by default. See Section 14.4.2.

expected_dump_compression

Specifies the level of compression that you typically expect the system to achieve. The setting is 500 by default, but can be an integer from 0 to 1000. See Section 14.4.4.

partial_dump

Specifies whether a partial crash dump or a full crash dump is preserved. This attribute is on by default. See Section 14.4.3.

compressed_dump

Specifies whether a dump is compressed to save space. This attribute is on by default. Even if set to off, the value of other dump attributes may cause it to be automatically set to on. See Section 14.4.5 and also Section 14.4.6.

dump_kernel_text

Enables or disables the inclusion of kernel text pages in the dump. Including these pages creates a larger dump file. This attribute only applies when partial dumps are enabled. See Section 14.4.3.

live_dump_dir_name

Specifies the full path to the directory where continuable dumps are written. See Section 14.5.1.

live_dump_zero_suppress

Enables or disables zero compression of continuable dumps. When zero compression is enabled, dump files take slightly longer to create but occupy less space. See Section 14.5.1.

If available, dumping to exempt memory is controlled by the following attributes:

dump_exmem_addr

Identifies the starting address (virtual or physical) for a region of exempt memory used for writing primary dumps.

dump_exmem_size

Specifies the size (in bytes) of the exempt memory region to which dumps are written.

dump_exmem_include

Specifies whether or not exempt memory pages are included in the dump.

See Section 14.4.6 for a description of this feature.

The following command displays typical dump attribute settings:

# sysconfig -q generic | grep dump
compressed_dump = 1
dump_exmem_addr = 0
dump_exmem_size = 0
dump_exmem_include = 0
dump_kernel_text = 0
dump_savecnt = 1
dump_sp_threshold = 4096
dump_to_memory = 0
dump_user_pte_pages = 0
expected_dump_compression = 500
live_dump_zero_suppress = 1
live_dump_dir_name = /var/adm/crash
partial_dump = 1
 
 

See sys_attrs_generic(5) for a description of the dump attributes and settings. See sysconfig(8) and sysconfigdb(8) for information on setting attribute values.
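When you script around these settings, it can help to pull a single value out of a listing like the one shown above. The following is a minimal sketch, assuming the name = value layout of that listing. The function name dump_attr is hypothetical and is not a system command.

```shell
# dump_attr NAME
# Reads "name = value" lines on standard input and prints the value for NAME.
# Returns a nonzero status if NAME is not present.
# (dump_attr is a hypothetical helper for illustration only.)
dump_attr() {
    awk -v name="$1" '$1 == name && $2 == "=" { print $3; found = 1 }
                      END { exit !found }'
}

# Sample input copied from the listing above. On a live system you would
# pipe the output of "sysconfig -q generic" into the function instead.
dump_attr partial_dump <<'EOF'
compressed_dump = 1
dump_savecnt = 1
dump_sp_threshold = 4096
partial_dump = 1
EOF
```

With the sample input, the function prints 1, the value of the partial_dump attribute.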

14.3.2    Crash Dump File Creation

When the savecore command begins running during the reboot process, it determines whether a crash dump occurred and whether the file system contains enough space to save it. (The system saves no crash dump if you shut it down and reboot it; that is, the system saves a crash dump only when it crashes.)

The value of the dump_savecnt attribute controls the number of dumps. Possible values are:

0 (zero)

Never generate a crash dump.

1

Generate a primary crash dump (the default).

2

Generate a secondary crash dump.

The value of the dump_to_memory attribute controls the location of dumps and interacts with the value of the dump_savecnt attribute as follows:

-1

Writing dumps to memory is disabled. This value also disables writing a secondary dump when the value of the dump_savecnt attribute is 2.

0 (zero)

Dumps are written to disk except in the event of disk failure, in which case they are written to memory. This is the default behavior.

1

Dumps are written only to memory when sufficient memory is available. A special case is if secondary dumps are enabled (dump_savecnt=2). See sys_attrs_generic(5) for more information.

Under certain circumstances, dumps in memory may be overwritten. To prevent an overwrite from happening, you can write dumps to a protected region of memory called exempt memory. See Section 14.4.6 for more information.
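The interaction between the two attributes can be summarized as a small lookup. The following sketch only restates the values described above; the function name describe_dump_policy is hypothetical, and the combination of dump_to_memory=1 with dump_savecnt=2 is deliberately left as a pointer to sys_attrs_generic(5) because its behavior is not described here.

```shell
# describe_dump_policy SAVECNT TO_MEMORY
# Prints a one-line summary of the documented behavior for the given
# dump_savecnt and dump_to_memory values.
# (describe_dump_policy is a hypothetical helper for illustration only.)
describe_dump_policy() {
    savecnt=$1
    to_memory=$2
    if [ "$savecnt" -eq 0 ]; then
        echo "no crash dump is generated"
        return 0
    fi
    case $to_memory in
        -1) echo "disk only; no secondary dump" ;;
        0)  echo "disk, falling back to memory on disk failure" ;;
        1)  if [ "$savecnt" -eq 2 ]; then
                echo "memory; special case, see sys_attrs_generic(5)"
            else
                echo "memory only, when sufficient memory is available"
            fi ;;
        *)  echo "unrecognized dump_to_memory value: $to_memory" >&2
            return 1 ;;
    esac
}

describe_dump_policy 1 0    # the default settings
```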

If a crash dump exists and the file system contains enough space to save the crash dump files, the savecore command moves the crash dump and a copy of the kernel into files in the default crash directory, /var/adm/crash. (You can modify the location of the crash directory.)

You can choose to:

The savecore command stores the kernel image in the vmunix.n file, and by default it stores the (compressed) contents of physical memory in the vmzcore.n file.

The n variable specifies the number of the crash, which is recorded in the bounds file in the crash directory. After the first crash, the savecore command creates the bounds file and stores the number 1 in it. The command increments that value for each succeeding crash.
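The naming scheme can be sketched as follows. This is an illustration only: the function name next_dump_files is hypothetical, and the sketch assumes that a missing bounds file is equivalent to the number 0, so that the first saved dump is numbered 0.

```shell
# next_dump_files BOUNDS [COMPRESSED]
# BOUNDS is the number read from the bounds file (use 0 if the file does
# not exist yet). COMPRESSED is 1 (the default) or 0.
# Prints the file names that the next saved dump would use.
# (next_dump_files is a hypothetical helper for illustration only.)
next_dump_files() {
    n=${1:-0}
    if [ "${2:-1}" -eq 1 ]; then
        core=vmzcore.$n       # compressed dump (the default)
    else
        core=vmcore.$n        # noncompressed dump
    fi
    echo "vmunix.$n $core"
}

next_dump_files 0      # prints: vmunix.0 vmzcore.0
next_dump_files 1 0    # prints: vmunix.1 vmcore.1
```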

The savecore command runs early in the reboot process so that little or no system swapping occurs before the command runs. This practice helps ensure that crash dumps are not corrupted by swapping.

14.3.3    Crash Dump Logging

After the savecore command writes the crash dump files, it performs the following steps to log the crash in system log files:

  1. Writes a reboot message to the /var/adm/syslog/auth.log file.

    If the system crashed because of a panic condition, the panic string is included in the log entry.

    You can cause the savecore command to write the reboot message to another file by modifying the auth facility entry in the syslog.conf file. If you remove the auth entry from the syslog.conf file, the savecore command does not save the reboot message.

  2. Attempts to save the kernel message buffer from the crash dump.

    The kernel message buffer contains messages created by the kernel that crashed. These messages may help you determine the cause of the crash.

    The savecore command saves the kernel message buffer in the /var/adm/crash/msgbuf.savecore file, by default. You can change the location to which savecore writes the kernel message buffer by modifying the msgbuf.err entry in the /etc/syslog.conf file. If you remove the msgbuf.err entry from the /etc/syslog.conf file, savecore does not save the kernel message buffer.

    Later in the reboot process, the syslogd daemon starts up, reads the contents of the msgbuf.savecore file, and moves those contents into the /var/adm/syslog/kern.log file, as specified in the /etc/syslog.conf file. The syslogd daemon then deletes the msgbuf.savecore file. See syslogd(8) for more information about how system logging is performed.

  3. Attempts to save the binary event buffer from the crash dump.

    The binary event buffer contains messages that can help you identify the problem that caused the crash, particularly if the crash resulted from a hardware error.

    The savecore command saves the binary event buffer in the /usr/adm/crash/binlogdumpfile file by default. You can change the location to which savecore writes the binary event buffer by modifying the dumpfile entry in the /etc/binlog.conf file. If you remove the dumpfile entry from the /etc/binlog.conf file, savecore does not save the binary event buffer.

    Later in the reboot process, the binlogd daemon starts up, reads the contents of the /usr/adm/crash/binlogdumpfile file, and moves those contents into the /usr/adm/binary.errlog file, as specified in the /etc/binlog.conf file. The binlogd daemon then deletes the binlogdumpfile file. See binlogd(8) for more information about how binary error logging is performed.

  4. The system may crash before all kernel events are handled and posted. In such cases, the savecore program recovers such events and stores them for later processing. This recovery happens only if any such events are available and if the savecore program is able to extract and save the events successfully. By default, the events are stored in the /var/adm/crash/evm.buf file. See savecore(8) and EVM(5) for more information.

14.3.4    Swap Space

When the system creates a crash dump to disk, it writes the dump to the swap partitions. The system uses the swap partitions because the information stored in those partitions has meaning only for a running system. After the system crashes, the information is useless and can be overwritten safely.

Before the system writes a crash dump, it determines how the dump fits into the swap partitions, which are defined in the /etc/sysconfigtab file. For example, the following fragment of the /etc/sysconfigtab file entry shows three swap partitions available:

vm:
   swapdevice=/dev/disk/dsk0b, /dev/disk/dsk3h, /dev/disk/dsk13g
   vm-swap-eager=1

The following list describes how the system determines where to write the crash dump:

Each crash dump contains a header, which the system always writes to the end of the primary swap partition. The header contains information about the size of the dump and where the dump is stored. This information allows savecore to find and save the dump at system reboot time.

In most cases, compressed dumps fit on the primary swap partition. The next section describes dump_sp_threshold, which is relevant in understanding how a crash dump is created. The use of the remaining kernel attributes controls the content of the dump. These attributes are described in Section 14.4.

Controlling the Use of Swap Partitions

You can configure the system so that it fills the secondary swap partitions with dump information before writing any information (except the dump header) to the primary swap partition. The attribute that you use to configure where crash dumps are written first is the dump_sp_threshold attribute.

The value in the dump_sp_threshold attribute indicates the amount of space you normally want available for swapping as the system reboots. By default, this attribute is set to 16,384 blocks, meaning that the system attempts to leave 8 MB of disk space open in the primary swap partition after the dump is written.

Figure 14-3 shows the default setting of the dump_sp_threshold attribute for a 40 MB swap partition. (A 40 MB swap partition is not typical of most systems; the example uses small numbers for the sake of simplicity.)

Figure 14-3:  Default dump_sp_threshold Attribute Setting

The system can write 32 MB of dump information to the primary swap partition shown in Figure 14-3. Therefore, a 30 MB dump fits on the primary swap partition and is written to that partition. However, a 40 MB dump is too large; the system writes the crash dump header to the end of the primary swap partition and writes the rest of the crash dump to secondary swap partitions, if available.
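The arithmetic in this example can be sketched as follows. The function names are hypothetical, and the sketch assumes that a dump fits when it is no larger than the partition size minus the threshold; the actual kernel logic may account for additional overhead such as the dump header.

```shell
BLOCK_BYTES=512                                 # the threshold is in 512-byte blocks
BLOCKS_PER_MB=$((1024 * 1024 / BLOCK_BYTES))    # 2048 blocks per megabyte

# primary_dump_room PARTITION_MB THRESHOLD_BLOCKS
# Prints the space, in MB, available for dump data in the primary swap partition.
primary_dump_room() {
    echo $(( $1 - $2 / BLOCKS_PER_MB ))
}

# dump_fits_primary DUMP_MB PARTITION_MB THRESHOLD_BLOCKS
# Succeeds if the dump fits in the primary partition under the threshold.
dump_fits_primary() {
    [ "$1" -le "$(primary_dump_room "$2" "$3")" ]
}

primary_dump_room 40 16384                                              # prints 32
dump_fits_primary 30 40 16384 && echo "30 MB dump: primary partition"
dump_fits_primary 40 40 16384 || echo "40 MB dump: secondary partitions"
```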

Setting the dump_sp_threshold attribute to a high value causes the system to fill the secondary swap partitions before it writes dump information to the primary swap partition. For example, if you set the dump_sp_threshold attribute to a value that is equal to the size of the primary swap partition, the system fills the secondary swap partitions first. (Setting the dump_sp_threshold attribute is described in Section 14.4.1.) Figure 14-4 shows how a crash dump is written to secondary swap partitions on multiple devices.

Figure 14-4:  Crash Dump Written to Multiple Devices

If a noncompressed crash dump fills partition e in Figure 14-4, the system writes the remaining crash dump information to the end of the primary swap partition. The system fills as much of the primary swap partition as is necessary to store the entire dump. The dump is written to the end of the primary swap partition to attempt to protect it from system swapping. However, the dump can fill the entire primary swap partition and may be corrupted by swapping that occurs as the system reboots.

Estimating Crash Dump Size Using dumpsys

To estimate the size of crash dumps, you can use the dumpsys command, which produces a run time or continuable dump. See Section 14.5.1 for information on using the dumpsys command. You may need to temporarily create file system space to hold the experimental dumps. You can produce both full and partial dumps using this method. Crash dumps are compressed by default unless you specify the dumpsys -u command option. You use the expand_dump command to produce a noncompressed dump from the compressed output of the dumpsys command.

Because the crash dumps written to swap are about the same size as their resulting saved crash dump files, you can easily determine how large a crash dump was by examining the size of the resulting crash dump file. For example, to determine the size of the first crash dump file created by your system, enter the following command:

# ls -s /var/adm/crash/vmzcore.0
20480 vmzcore.0

This command displays the number of 512-byte blocks occupied by the crash dump file. In this case, the file occupies 20,480 blocks, so you know that a crash dump written to the swap partitions also occupies about 20,480 blocks.
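To express a block count in megabytes, multiply by 512 and divide by 1,048,576. A minimal sketch follows; the function name blocks_to_mb is hypothetical, and the result is rounded down to a whole megabyte.

```shell
# blocks_to_mb BLOCKS
# Converts a count of 512-byte blocks to whole megabytes (rounded down).
# (blocks_to_mb is a hypothetical helper for illustration only.)
blocks_to_mb() {
    echo $(( $1 * 512 / 1024 / 1024 ))
}

blocks_to_mb 20480    # prints 10
```

The 20,480-block file in the example therefore corresponds to a crash dump of about 10 MB.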

In some cases, a system contains so much active memory that it cannot store a crash dump on a single disk. For example, suppose your system contains 2 GB of memory but only has several 4 GB disks, most of which are dedicated to storing data. Crash dumps for this system may be too large to fit on a single swap partition on a single device. To cause crash dumps to spread across multiple disks, create secondary (and perhaps tertiary) swap partitions on several disks. The system automatically writes dumps that are too large to fit in the specified portion of the primary swap partition to other available swap partitions.

14.3.5    Planning Crash Dump Space

Because crash dumps are written to the swap partitions on your system, you allow space for crash dumps by adjusting the size of your swap partitions, thereby creating temporary or permanent swap space. See swapon(8) for information about modifying the size of swap partitions.

Note

Be sure to list all permanent swap partitions in the /etc/sysconfigtab file. The savecore command, which copies the crash dump from swap partitions to a file, uses the information in the /etc/sysconfigtab file to find the swap partitions. If you omit a swap partition from the /etc/sysconfigtab file, the savecore command may be unable to find the omitted partition.

Space requirements can vary from system to system. During the installation procedure, the following algorithm is used to calculate the required space in the /var file system:

3 * memsize / 24MB + 3 * 15MB
 

Where memsize is the amount of physical memory in megabytes and 15MB is the approximate size of a custom kernel. This algorithm allows for the preservation of three dumps. The following sections give you guidelines for estimating the amount of space required for partial and full crash dumps on your system. In addition, setting the dump_sp_threshold attribute is described.
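As a worked example, the following sketch evaluates the algorithm for two memory sizes. It assumes that the formula reads as 3 * (memsize / 24) + 3 * 15 with every term in megabytes; that reading, and the function name var_space_mb, are assumptions made for illustration.

```shell
# var_space_mb MEMSIZE_MB
# Prints the estimated /var space, in MB, for preserving three dumps.
# Assumes the formula is 3 * (memsize / 24) + 3 * 15, all terms in MB.
# (var_space_mb is a hypothetical helper for illustration only.)
var_space_mb() {
    echo $(( 3 * $1 / 24 + 3 * 15 ))
}

var_space_mb 96      # prints 57
var_space_mb 1024    # prints 173
```

Under this reading, the three kernel copies account for 45 MB of the total, so the estimate for a system with little memory is dominated by the kernel images rather than by the dumps.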

14.3.6    Planning and Allocating File System Space for Crash Dump Files

Using the information on typical crash dump sizes for your system, you can plan and allocate the file system space that you need for the /var/adm/crash directory.

For example, suppose you save partial crash dumps. Your system has 96 MB of memory and you have reserved 85 MB of disk space for crash dumps and swapping. In this case, you should reserve 20 MB of space in the file system for storing crash dump files. You need to reserve considerably more space if you want to save files from more than one crash dump. If you want to save files from multiple crash dumps, consider archiving older crash dump files. See Section 14.6 for information about archiving crash dump files.

By default, the savecore command writes crash dump files to the /var/adm/crash directory. To reserve space for crash dump files in the default directory, you must mount the /var/adm/crash directory on a file system that has a sufficient amount of disk space. (For information about mounting file systems, see Chapter 6 and mount(8).) If you expect your crash dump files to be large, you may need to use a Logical Storage Manager (LSM) volume to store crash dump files. For information about creating LSM volumes, see the Logical Storage Manager manual.

If your system cannot save crash dump files because of insufficient disk space, the system returns to single-user mode. This return to single-user mode prevents system swapping from corrupting the crash dump. When in single-user mode, you can make space available in the crash directory or change the crash directory. One possibility in this situation is to issue the savecore command at the single-user mode prompt. On the command line, specify the name of a directory that contains a sufficient amount of file space to save the crash dump files. For example, the following savecore command writes crash dump files to the /usr/adm/crash2 directory:

# savecore /usr/adm/crash2

After the savecore command has saved the crash dump files, you can bring up your system in multiuser mode, or bring up the network to dump remotely to another host using the ftp command.

Specifying a directory on the savecore command line changes the crash directory only for the duration of that command. If the system crashes later and the system startup script invokes the savecore script, the savecore command copies the crash dump to files in the default /var/adm/crash directory.

You can control the default location of the crash directory by setting the SAVECORE_DIR variable with the rcmgr command. For example, to save crash dump files in the /usr/adm/crash2 directory by default (at each system startup), issue the following command:

# /usr/sbin/rcmgr set SAVECORE_DIR /usr/adm/crash2

If you want the system to return to multiuser mode, regardless of whether it saved a crash dump, issue the following command:

# /usr/sbin/rcmgr set SAVECORE_FLAGS M

14.4    Choosing the Content and Method of Crash Dumps

Crash dumps are compressed and partial by default, but can be full, noncompressed, or both. Normally, partial crash dumps provide the information that you need to determine the cause of a crash. However, you may want the system to generate full crash dumps if you have a recurring crash problem and partial crash dumps are not helpful in finding the cause of the crash.

A partial crash dump contains the following:

A full crash dump contains the following:

You can modify how crash dumps are taken:

These options are explained in the following sections.

14.4.1    Adjusting the Primary Swap Partition's Crash Dump Threshold

To configure your system so that it writes even small crash dumps to secondary swap partitions before the primary swap partition, use a large value for the dump_sp_threshold attribute. The value you assign to this attribute indicates the amount of space that you normally want available for system swapping after a system crash, as described in Section 14.3.

To adjust the dump_sp_threshold attribute, issue the sysconfig command. For example, suppose your primary swap partition is 40 MB. To raise the value so that the system writes crash dumps to secondary partitions, issue the following command:

# sysconfig -r generic dump_sp_threshold=81920

In the preceding example, the dump_sp_threshold attribute, which is in the generic subsystem, is set to 81,920 512-byte blocks (40 MB). In this example, the system attempts to leave the entire primary swap partition open for system swapping. The system automatically writes the crash dump to secondary swap partitions and the crash dump header to the end of the primary swap partition.
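The conversion from megabytes to 512-byte blocks can be sketched as follows. The function name mb_to_blocks is hypothetical.

```shell
# mb_to_blocks MB
# Converts megabytes to 512-byte blocks, the unit of dump_sp_threshold.
# (mb_to_blocks is a hypothetical helper for illustration only.)
mb_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

mb_to_blocks 40    # prints 81920, the value used in the preceding example
mb_to_blocks 8     # prints 16384, the default threshold
```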

The sysconfig command changes the value of system attributes for the currently running kernel. To store the new value of the dump_sp_threshold attribute in the sysconfigtab database, modify that database by using the sysconfigdb command. For information about the sysconfigtab database and the sysconfigdb command, see sysconfigdb(8).

Note

After the savecore program has copied the crash dump to a file, all swap devices are immediately available for mounting and swapping. The sharing of swap space only occurs for a short time during boot, and usually on systems with a small amount of physical memory.

14.4.2    Including User Page Tables in Partial Crash Dumps

By default, the system omits user page tables from partial crash dumps. These tables do not normally help you determine the cause of a crash and omitting them reduces the size of crash dumps and crash dump files. However, your technical support person may instruct you to include user page tables for crash dump analysis.

To include user page tables in partial crash dumps, set the value of the dump_user_pte_pages attribute to 1. The dump_user_pte_pages attribute is in the generic subsystem. The following example shows the command you issue to set this attribute:

# sysconfig -r generic dump_user_pte_pages=1

The sysconfig command changes the value of system attributes for the currently running kernel. To store the new value of the dump_user_pte_pages attribute in the sysconfigtab database, modify that database by using the sysconfigdb command or use the Kernel Tuner GUI (dxkerneltuner).

To return to the system default of not writing user page tables to partial crash dumps, set the value of the dump_user_pte_pages attribute to 0 (zero).

14.4.3    Selecting Partial or Full Crash Dumps

By default, the system generates partial crash dumps. If you want the system to generate full crash dumps, you can modify the default behavior by setting the kernel's partial_dump variable to 0 (zero) as follows:

# sysconfig -r generic partial_dump=0
partial_dump: reconfigured
# sysconfig -q generic partial_dump
generic:
partial_dump = 0

You can use the Kernel Tuner GUI or the sysconfigdb command to modify kernel entries and preserve the modifications across reboots. To return to partial crash dumps, reset the partial_dump variable to 1.

When partial dumps are enabled, you can enable the dump_kernel_text attribute to include kernel text pages.

14.4.4    Expected Dump Compression

The expected_dump_compression variable indicates how much compression you typically expect to achieve in a dump. By default, the value of expected_dump_compression is 500, the midpoint between the minimum allowed value of 0 (zero) and the maximum allowed value of 1000. The following steps describe how you calculate the appropriate expected_dump_compression value for your system:

  1. Create a compressed dump, using the dumpsys command, as described in Section 14.5.1. Using the ls -s command, record the size of this dump as value a.

  2. Use the expand_dump command to produce a noncompressed version of the dump. Using the ls -s command, record the size of this dump as value b.

  3. Divide a by b to produce the approximate compression ratio.

  4. Repeat the previous steps several times and choose the largest value of the compression ratio. Multiply the compression ratio by 1000 to produce an expected dump value.

  5. Add 10 percent of the expected dump value to create a value for the expected_dump_compression variable.
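
The arithmetic in the preceding steps can be sketched as a small shell function. This is an illustration only: the function name and the compressed:expanded argument format are hypothetical, the sizes are assumed to be block counts as reported by ls -s, and integer division rounds down slightly.

```shell
# Hypothetical helper: compute an expected_dump_compression value from
# one or more "compressed:expanded" size pairs (values a and b above).
expected_dump_compression() {
    best=0
    for pair in "$@"; do
        a=${pair%%:*}                  # size of the compressed dump
        b=${pair##*:}                  # size of the expanded dump
        ratio=$(( a * 1000 / b ))      # compression ratio multiplied by 1000
        if [ "$ratio" -gt "$best" ]; then
            best=$ratio                # keep the largest ratio seen
        fi
    done
    value=$(( best + best / 10 ))      # add 10 percent
    if [ "$value" -gt 1000 ]; then
        value=1000                     # 1000 is the maximum allowed value
    fi
    echo "$value"
}

expected_dump_compression 47296:94592 52000:94592    # prints 603
```

In this sample, the two measurements give ratios of 500 and 549; the larger one plus 10 percent yields 603.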

Set the kernel's expected_dump_compression variable to the required value using the sysconfig command as follows:

# sysconfig -r generic expected_dump_compression=750
expected_dump_compression: reconfigured
# sysconfig -q generic expected_dump_compression
generic:
expected_dump_compression = 750

You can also use the Kernel Tuner GUI or the sysconfigdb command to modify kernel entries and preserve the modifications across reboots.

14.4.5    Selecting and Using Noncompressed Crash Dumps

By default, crash dumps are compressed to save disk space, allowing you to dump a larger crash dump file to a smaller partition. This can offer significant advantages on systems with a large amount of physical memory, particularly if you want to tune the system to discourage swapping for realtime operations. On reboot after a crash, the savecore command runs automatically and detects that the dump is compressed, using information in the crash dump header in the swap partition. It then copies the crash dump file from the swap partition to the /var/adm/crash directory. The compressed crash dump files are identified by the letter z in the file name, to distinguish them from noncompressed crash dump files. For example: vmzcore.1.

Some debugging tools, such as dbx, can read this type of compressed crash dump file directly, which is not true of files compressed by tools such as compress or gzip. If you need to use a tool that does not support compressed crash dump files, you can convert the file to the conventional noncompressed format with the expand_dump utility. The following example shows how you use the expand_dump utility:

# expand_dump vmzcore.2 vmcore.2

You may want to disable compressed dumps if you always use tools or scripts that do not work with the compressed format, and it is not convenient to use the expand_dump command. To disable compressed dumps, use the following sysconfig command:

# sysconfig -r generic compressed_dump=0

The preceding command temporarily changes the mode of dumping to noncompressed and the mode reverts to compressed dumps on the next reboot. To make the change persistent, use the sysconfigdb command to update the value of the compressed_dump attribute in the /etc/sysconfigtab file or use the Kernel Tuner GUI to modify the value in the generic subsystem.

Note

Memory dumps are compressed. If the compressed_dump system attribute is not set, the system automatically enables compression before attempting to write a memory dump.

See savecore(8), expand_dump(8), and sysconfig(8) for more information on crash dump compression and how to produce a noncompressed crash dump file.

14.4.6    Dumping to Exempt Memory

Exempt memory is a region of physical memory that is set aside for a specific purpose. You can create an exempt region of memory by specifying it in the /etc/sysconfigtab file. This causes the exempt region to be created when the system boots. For example:

cma_dd:
 CMA_Option = Size-0x3000000, Alignment - 0, /
 Addrlimit - 0x4000000, Type - 0x96, Flag-0

The preceding /etc/sysconfigtab file entry reserves a region of exempt memory that is 48 MB in size. The Type value of 0x96 specifies M_EXEMPT. The Addrlimit value sets the starting position of the exempt region, which at 0x4000000 is 64 MB into physical memory. Each time the machine boots, it attempts to reserve this same area of physical memory, making it unavailable for any other use.
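
The sizes quoted for this entry can be checked with shell arithmetic. The following is a minimal sketch; the bytes_to_mb function name is hypothetical, and the sketch assumes a shell whose arithmetic expansion accepts hexadecimal constants:

```shell
# Hypothetical helper: convert a byte count (decimal or 0x-prefixed
# hexadecimal) to megabytes.
bytes_to_mb() {
    echo $(( $1 / 1048576 ))
}

bytes_to_mb 0x3000000    # Size value: prints 48
bytes_to_mb 0x4000000    # Addrlimit value: prints 64
```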

Another way of creating exempt regions of memory is by using the contig_malloc() function call with the type M_EXEMPT in a pseudodevice driver. See the malloc.h file for information on the M_EXEMPT type. See contig_malloc(9r) for information on using the function call.

You can use the vmstat command with the -M option to examine exempt memory regions.

To dump to exempt memory, the dump_to_memory attribute must be enabled as described in Section 14.3.2. You also configure the following attributes as required:

dump_exmem_size

Specifies the size (in bytes) of the exempt memory region to which dumps are written. By default, the value is 0 (zero), which disables writing a dump to an exempt memory region.

dump_exmem_addr

Identifies the starting address (virtual or physical) for a region of exempt memory used for writing primary dumps.

dump_exmem_include

Specifies whether or not exempt memory pages are included in the dump. By default, the value is 0 (zero) and exempt memory pages are excluded.

The setting of the dump_exmem_addr attribute has no effect unless you also configure the dump_exmem_size attribute. Ensure that you keep a record of any run-time settings for the attributes so that you are able to find the crash dump after recovery from a system failure.

The following example shows how you reconfigure these attributes:

# sysconfig -q generic dump_to_memory
generic:
dump_to_memory = 0
# sysconfig -r generic dump_to_memory=1
dump_to_memory: reconfigured
# sysconfig -q generic dump_to_memory
generic:
dump_to_memory = 1

Memory dumps are compressed by default. If the compressed_dump system attribute is not enabled, the system enables it automatically. The savecore command uses the vmzcore character special device file to recover the compressed dumps. See savecore(8) and vmzcore(7) for more information.

14.4.7    Dumping to a Remote Host

Use the savecore command with the -r option to write crash dump files from a client host to a remote host using an ftp connection. You can specify either of the following definitions for a remote destination:

For example, the following command specifies a login to the remote host in verbose mode, which enables you to debug the ftp connection.

# savecore -v -r 'soserv:jeffdump:Cr$hDeBuG'

When it connects to the target host, the savecore utility directs the remote ftpd server daemon to create a directory named after the client host name. The crash dump files (bounds, msgbuf.savecore, evm.buf, vmunix.N, and vmcore.N or vmzcore.N) are written to the directory. You must ensure that you have adequate space for the crash dump on the remote device.

See savecore(8) and ftpd(8) for more information and for restrictions when using this feature.

14.5    Generating a Crash Dump Manually

The following sections describe how you can create a crash dump file manually under two conditions:

Continuable dump

Use the dumpsys command to copy a snapshot of the running memory to a dump file without halting the system. (That is, the system continues to run.)

Forced dump

Use the crash console command to cause a crash dump file to be created on a system that is not responding (that is, hung).

It is assumed that you have planned adequate space for the crash dump file and set any kernel parameters as described in the preceding sections.

14.5.1    Continuable Dumps on a Running System

When you cannot halt the system and take a normal crash dump, use the dumpsys command to dump a snapshot of memory. Because the system is running while the dumpsys command takes a snapshot, memory may change as its content is copied. Analysis of the resulting dump can show incomplete linked lists and partially zeroed pages, which are not errors but reflect the transitory state of memory. For this reason, some system problems cannot be detected by using the dumpsys command and you may need to halt the system and force a crash dump as described in Section 14.5.2. By default, the dumpsys command writes the crash dump in the /var/adm/crash directory.

The /var/adm/crash/minfree text file specifies the minimum number of kilobytes that must be left on the file system after the dumpsys command copies the dump. By default, this file does not exist, indicating that no minimum is set. To specify a minimum, create the file and store the number of kilobytes you want reserved in it. You can override the setting in the minfree file by using the -i option. The -s option displays the approximate number of disk blocks that full and partial dumps require. The exact size cannot be determined ahead of time for the following reasons:

The following examples show a dump from a system with 512 MB of physical memory. The examples show a noncompressed crash dump. Dumps are usually compressed by default:

# dumpsys -s
Approximate full dump size = 1048544 disk blocks,
 if compressed, expect about 524272 disk blocks.
Approximate partial dump size = 94592 disk blocks,
 if compressed, expect about 47296 disk blocks.
# dumpsys -i /userfiles
Saving 536797184 bytes of image in /userfiles/vmzcore.0
# ls /userfiles
bounds vmzcore.0 vmunix.0
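
The relationship between the dumpsys -s estimate and the minfree setting can be sketched as a simple check. This is an illustration only: the dump_fits function name is hypothetical, and the sketch assumes that the disk blocks reported by dumpsys -s are 512-byte blocks, which is consistent with the sizes shown in the preceding example.

```shell
# Hypothetical helper: report whether a dump of the estimated size can be
# saved while leaving at least minfree kilobytes on the file system.
# Assumption: the estimate is expressed in 512-byte disk blocks.
dump_fits() {
    free_kb=$1        # kilobytes currently free on the target file system
    dump_blocks=$2    # estimate reported by dumpsys -s
    minfree_kb=$3     # value stored in the minfree file (0 if no file exists)
    dump_kb=$(( (dump_blocks + 1) / 2 ))    # two 512-byte blocks per kilobyte
    if [ $(( free_kb - dump_kb )) -ge "$minfree_kb" ]; then
        echo "fits"
    else
        echo "too large"
    fi
}

dump_fits 60000 94592 20000    # noncompressed partial dump: prints "too large"
dump_fits 60000 47296 20000    # compressed partial dump: prints "fits"
```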

Two attributes in the generic kernel subsystem enable you to control continuable dumps:

live_dump_dir_name

Specifies a path to the directory where the continuable dump files are written. The default value is the /var/adm/crash directory.

live_dump_zero_suppress

Enables or disables zero compression of continuable dumps. Using this option produces files that take longer to create but occupy less space.

See dumpsys(8) and sys_attrs_generic(5) for more information. See the Kernel Debugging manual for information on analyzing the continuable crash dump.

14.5.2    Forcing Crash Dumps on a Hung System

You can force the system to create a crash dump when the system hangs. On most hardware platforms, you force a crash dump by following these steps:

  1. If your system has a switch for enabling and disabling the Halt button, set that switch to the Enable position.

  2. Press the Halt button.

  3. At the console prompt, enter the crash command.

Some systems have no Halt button. In this case, follow these steps to force a crash dump on a hung system:

  1. Type Ctrl/p at the console prompt.

  2. At the console prompt, enter the crash command.

If your system hangs and you force a crash dump, the panic string recorded in the crash dump is the following:

hardware restart

This panic string is always the one recorded when system operation is interrupted by pressing the Halt button or by typing Ctrl/p.

14.6    Storing and Archiving Crash Dump Files

If you are working entirely with compressed (vmzcore.n) crash dump files, they are usually already compressed sufficiently for efficient archiving. The following sections discuss certain special cases.

Section 14.6.1 describes how to compress files for storage or transmission if:

Section 14.6.2 describes how to uncompress partial crash dump files that are compressed from vmcore.n files.

14.6.1    Compressing a Crash Dump File

To compress a vmcore.n crash dump file, use a utility such as gzip, compress, or dxarchiver. For example, the following command creates a compressed file named vmcore.3.gz:

# gzip vmcore.3

A vmzcore.n crash dump file uses a special compression method that makes it readable by debuggers and crash analysis tools without requiring decompression. A vmzcore.n file is substantially smaller than the equivalent vmcore.n file, but not as small as a vmcore.n file that is compressed using a standard UNIX compression utility, such as gzip. Standard compression applied to a vmcore.n file makes the resulting file about 40 percent smaller than the equivalent vmzcore.n file.

If you need to apply the maximum compression possible to a vmzcore.n file, follow these steps:

  1. Uncompress the vmzcore.n file by using the expand_dump command; see expand_dump(8) for more information. The following example creates an uncompressed file named vmcore.3 from the vmzcore.3 file:

    # expand_dump vmzcore.3 vmcore.3

  2. Compress the resulting vmcore.n file using a standard UNIX utility. The following example uses the gzip command to create a compressed file named vmcore.3.gz:

    # gzip vmcore.3

You can uncompress a vmzcore.n file only with the expand_dump command. (Do not use gunzip, uncompress, or any other utility.) After you uncompress a vmzcore.n file into a vmcore.n file by using the expand_dump command, you cannot compress it back into a vmzcore.n file.

14.6.2    Uncompressing a Partial Crash Dump File

This section applies only if you are uncompressing a partial crash dump file that was previously compressed from a vmcore.n file.

If you compress a vmcore.n dump file from a partial crash dump, you must use care when you uncompress it. Using the gunzip or uncompress command with no options results in a vmcore.n file that requires space equal to the size of memory. In other words, the uncompressed file requires the same amount of disk space as a vmcore.n file from a full crash dump.

This situation occurs because the original vmcore.n file contains UNIX File System (UFS) file holes. (UFS files can contain regions, called holes, which have no associated data blocks.) When a process, such as the gunzip or uncompress command, reads from a hole in a file, the file system returns zero-valued data. Thus, memory omitted from the partial dump is added back into the uncompressed vmcore.n file as disk blocks containing all zeros.

To ensure that the uncompressed core file remains at its partial dump size, you must pipe the output from the gunzip or uncompress command with the -c option to the dd command with the conv=sparse option. For example, to uncompress a file named vmcore.0.Z, issue the following command:

# uncompress -c vmcore.0.Z | dd of=vmcore.0 conv=sparse
262144+0 records in
262144+0 records out
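
The same technique applies to a file compressed with gzip. The following self-contained sketch uses a small scratch file in place of a real crash dump; the file names are hypothetical, and it assumes that your gzip, gunzip, and dd commands are available and that dd supports the conv=sparse option:

```shell
# Work in a scratch directory so that no real dump files are touched.
workdir=$(mktemp -d)

# Build a 1 MB stand-in for a dump: mostly zeros, followed by a little data.
dd if=/dev/zero of="$workdir/vmcore.demo" bs=1024 count=1024 2>/dev/null
echo "end of dump" >> "$workdir/vmcore.demo"
cp "$workdir/vmcore.demo" "$workdir/original"

gzip "$workdir/vmcore.demo"    # creates vmcore.demo.gz and removes the input

# Restore the file; conv=sparse lets dd skip zero-filled blocks
# instead of writing them to disk.
gunzip -c "$workdir/vmcore.demo.gz" | dd of="$workdir/vmcore.demo" conv=sparse 2>/dev/null

# The restored file must have the same content as the original.
if cmp -s "$workdir/original" "$workdir/vmcore.demo"; then
    result="identical"
else
    result="different"
fi
echo "$result"

rm -rf "$workdir"
```

The content of the restored file matches the original. Whether the restored file occupies fewer disk blocks depends on whether the file system supports file holes.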