Kernel debugging is a task normally performed by systems engineers writing kernel programs. A kernel program is one that is built as part of the kernel and that references kernel data structures. System administrators might also debug the kernel in the following situations:
A process is hung or stops running unexpectedly
The need arises to examine, and possibly modify, kernel parameters
The system itself hangs, panics, or crashes
This manual describes how to debug kernel programs and the kernel. It also includes information about analyzing crash dump files.
In addition to the information provided here, tracing a kernel problem can require a basic understanding of one or more of the following technical areas:
The hardware architecture
See the Alpha Architecture Handbook for an overview of the Alpha hardware architecture and a description of the 64-bit Alpha RISC instruction set.
The internal design of the operating system at a source code and data structure level
See the Alpha Architecture Reference Manual for information on how the Tru64 UNIX operating system interfaces with the hardware.
This chapter provides an overview of the following topics:
Linking a kernel image before debugging, for systems that run a kernel built at boot time (Section 1.1)
Debugging kernel programs (Section 1.2)
Debugging the running kernel (Section 1.3)
Analyzing a crash dump file (Section 1.4)
1.1 Linking a Kernel Image for Debugging
By default, the kernel is a statically linked image that resides in the file /vmunix. However, your system might be configured so that the kernel is linked at bootstrap time. In that case, rather than being a bootable image, the boot file is a text file that describes the hardware and software that will be present on the running system. Using this information, the bootstrap linker links the modules that are needed to support this hardware and software, building the kernel directly into memory.
You cannot directly debug a bootstrap-linked kernel because you must supply the name of an image to the kernel debugging tools. Without the image, the tools have no access to symbol names, variable names, and so on. Therefore, the first step in any kernel debugging effort is to determine whether your kernel was linked at bootstrap time. If the kernel was linked at bootstrap time, you must then build a kernel image file to use for debugging purposes.
The best way to determine whether your system is bootstrap linked or statically linked is to use the file command to test the type of file from which your system was booted. If your system is a bootstrap-linked system, it was booted from an ASCII text file; otherwise, it was booted from an executable image file. For example, enter the following command to determine the type of file from which your system was booted:
# /usr/bin/file `/usr/sbin/sizer -b`
/etc/sysconfigtab: ascii text
The sizer -b command returns the name of the file from which the system was booted. This file name is passed to the file command, which determines that the system was booted from an ASCII text file. The output shown in the preceding example therefore indicates that the system is bootstrap linked.
If the system had been booted from an executable image file named vmunix, the output from the file command would have appeared as follows:
vmunix: COFF format alpha executable or object module not stripped
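The check described above can be wrapped in a small Bourne shell function. This is a minimal sketch, not a supported tool: classify_boot_file is a hypothetical name, and the function assumes that the output of the file command contains the wording ascii text or executable, as in the two sample outputs shown. Any other output is reported as unknown rather than guessed.

```shell
#!/bin/sh
# classify_boot_file (hypothetical helper): decide, from the output of
#   /usr/bin/file `/usr/sbin/sizer -b`
# whether the running kernel was bootstrap linked or statically linked.
# Assumes the "ascii text" / "executable" wording of the sample outputs.
classify_boot_file() {
    case "$1" in
        *"ascii text"*) echo "bootstrap-linked" ;;   # booted from a text file
        *executable*)   echo "statically linked" ;;  # booted from a COFF image
        *)              echo "unknown" ;;            # unrecognized output
    esac
}

# Example usage on the system to be debugged:
#   classify_boot_file "$(/usr/bin/file `/usr/sbin/sizer -b`)"
```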
If your system is running a bootstrap-linked kernel, enter the following command to build a kernel image that is identical to the bootstrap-linked kernel your system is running:
# /usr/bin/ld -o vmunix.image `/usr/sbin/sizer -m`
The output from the sizer -m command is a list of the exact modules and linker flags used to build the currently running bootstrap-linked kernel. Because this output is passed to the ld command as its arguments, ld creates a kernel image that is identical to the bootstrap-linked kernel running on your system. The kernel image is written to the file named by the -o flag, in this case the vmunix.image file.
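The same ld command can be placed in a small function that refuses to link an empty module list and prints the name of the resulting image. This is a minimal sketch under stated assumptions: build_debug_image is a hypothetical name, and SIZER and LD are variables introduced here only so that the logic can be exercised with stand-in commands; their defaults are the paths used in this chapter. The sketch also assumes that the module names and flags reported by sizer -m contain no spaces or wildcard characters.

```shell
#!/bin/sh
# build_debug_image (hypothetical helper): rebuild a kernel image identical
# to the running bootstrap-linked kernel, as with
#   /usr/bin/ld -o vmunix.image `/usr/sbin/sizer -m`
# SIZER and LD are overridable only for testing; defaults match the text.
SIZER=${SIZER:-/usr/sbin/sizer}
LD=${LD:-/usr/bin/ld}

build_debug_image() {
    image=${1:-vmunix.image}            # output file named by the -o flag
    modules=$("$SIZER" -m) || return 1  # modules and linker flags in use
    [ -n "$modules" ] || return 1       # nothing to link: report failure
    # $modules is deliberately unquoted so that each module and flag becomes
    # a separate argument to ld, as in the backquoted command above.
    "$LD" -o "$image" $modules || return 1
    echo "$image"                       # name to give to dbx or kdbx
}
```

On a bootstrap-linked system, image=$(build_debug_image) && dbx -k "$image" /dev/mem would build the image and then start dbx on the running kernel using that image instead of /vmunix.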
Once you create this image, you can debug the kernel as described in this manual, using the dbx, kdbx, and kdebug debuggers.
When you invoke the dbx or kdbx debugger, remember to specify the name of the kernel image file you created with the ld command, such as the vmunix.image file shown here.
When you are finished debugging the kernel, you can remove the kernel image file you created for debugging purposes.
1.2 Debugging Kernel Programs
Kernel programs can be difficult to debug because you normally cannot control kernel execution. To make debugging kernel programs more convenient, the system provides the kdebug debugger. The kdebug debugger is code that resides inside the kernel and allows you to use the dbx debugger to control execution of a running kernel in the same manner as you control execution of a user space program. To debug a kernel program in this manner, follow these steps:
1. Build your kernel program into the kernel on a test system.
2. Set up the kdebug debugger, as described in Section 2.3.
3. Enter the dbx -remote command on a remote build system, supplying the pathname of the kernel running on the test system.
4. Set breakpoints and enter dbx commands as you normally would.
Section 2.1 describes some of the commands that are useful during kernel debugging. For general information about using dbx, see the Programmer's Guide.
The system also provides the kdbx debugger, which is designed especially for debugging kernel code. This debugger contains a number of special commands, called extensions, that allow you to display kernel data structures in a readable format. Section 2.2 describes using kdbx and its extensions. (You cannot use the kdbx debugger with the kdebug debugger.)
Another feature of kdbx is that you can customize it by writing your own extensions. The system contains a set of kdbx library routines that you can use to create extensions that display kernel data structures in ways that are meaningful to you. Chapter 3 describes writing kdbx extensions.
1.3 Debugging the Running Kernel
When you have problems with a process or set of processes, you can attempt to identify the problem by debugging the running kernel. You might also invoke the debugger on the running kernel to examine the values assigned to system parameters. (You can modify the value of the parameters using the debugger, but this practice can cause problems with the kernel and should be avoided.)
You use the dbx or kdbx debugger to examine the state of processes running on your system and to examine the value of system parameters. The kdbx debugger provides special commands, called extensions, that you can use to display kernel data structures. (Section 2.2.3 describes the extensions.)
To examine the state of processes, you invoke the debugger (as described in Section 2.1 or Section 2.2) using the following command:
# dbx -k /vmunix /dev/mem
This command invokes dbx with the -k (kernel debugging) flag, which maps kernel addresses to make kernel debugging easier. The /vmunix and /dev/mem parameters cause the debugger to operate on the running kernel.
Once in the dbx environment, you use dbx commands to display process IDs (PIDs) and trace execution of processes. You can perform the same tasks using the kdbx debugger.
The following example shows the dbx command you use to display process IDs:
(dbx) kps
PID    COMM
00000  kernel idle
00001  init
00014  kloadsrv
00016  update
.
.
.
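When kps output has been saved to a text file, for example by copying it out of the dbx session, the PID to assign to $pid can be extracted with a short awk filter. This is a minimal sketch: pid_of is a hypothetical name, and it assumes the column layout shown above (PID first, command name second). Because kps pads PIDs with leading zeros, the filter converts the field to a number so that 00014 becomes 14, the value used with set $pid. Multi-word names such as kernel idle are not handled.

```shell
#!/bin/sh
# pid_of (hypothetical helper): read saved kps output on standard input and
# print the PID of the first process whose COMM column equals $1.
# Lines whose first field is not all digits (headers, prompts) are skipped.
# Adding 0 converts the zero-padded PID (00014) to a plain number (14).
pid_of() {
    awk -v name="$1" '$1 ~ /^[0-9]+$/ && $2 == name { print $1 + 0; exit }'
}
```

With the sample output above saved in a hypothetical file kps.txt, pid_of kloadsrv < kps.txt prints 14.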
If you want to trace the execution of the kloadsrv daemon, use the dbx set command to set the $pid symbol to the PID of the kloadsrv daemon. Then, enter the t command:
(dbx) set $pid = 14
(dbx) t
>  0 thread_block() ["/usr/sde/build/src/kernel/kern/sched_prim.c":1623, 0xfffffc000043d77c]
   1 mpsleep(0xffffffff92586f00, 0x11a, 0xfffffc0000279cf4, 0x0, 0x0) ["/usr/sde/build/src/kernel/bsd/kern_synch.c":411, 0xfffffc000040adc0]
   2 sosleep(0xffffffff92586f00, 0x1, 0xfffffc000000011a, 0x0, 0xffffffff81274210) ["/usr/sde/build/src/kernel/bsd/uipc_socket2.c":654, 0xfffffc0000254ff8]
   3 sosbwait(0xffffffff92586f60, 0xffffffff92586f00, 0x0, 0xffffffff92586f00, 0x10180) ["/usr/sde/build/src/kernel/bsd/uipc_socket2.c":630, 0xfffffc0000254f64]
   4 soreceive(0x0, 0xffffffff9a64f658, 0xffffffff9a64f680, 0x8000004300000000, 0x0) ["/usr/sde/build/src/kernel/bsd/uipc_socket.c":1297, 0xfffffc0000253338]
   5 recvit(0xfffffc0000456fe8, 0xffffffff9a64f718, 0x14000c6d8, 0xffffffff9a64f8b8, 0xfffffc000043d724) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":1002, 0xfffffc00002574f0]
   6 recvfrom(0xffffffff81274210, 0xffffffff9a64f8c8, 0xffffffff9a64f8b8, 0xffffffff9a64f8c8, 0xfffffc0000457570) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":860, 0xfffffc000025712c]
   7 orecvfrom(0xffffffff9a64f8b8, 0xffffffff9a64f8c8, 0xfffffc0000457570, 0x1, 0xfffffc0000456fe8) ["/usr/sde/build/src/kernel/bsd/uipc_syscalls.c":825, 0xfffffc000025708c]
   8 syscall(0x120024078, 0xffffffffffffffff, 0xffffffffffffffff, 0x21, 0x7d) ["/usr/sde/build/src/kernel/arch/alpha/syscall_trap.c":515, 0xfffffc0000456fe4]
   9 _Xsyscall(0x8, 0x12001acb8, 0x14000eed0, 0x4, 0x1400109d0) ["/usr/sde/build/src/kernel/arch/alpha/locore.s":1046, 0xfffffc00004486e4]
(dbx) exit
Often, looking at the trace of a process that is hanging or has unexpectedly stopped running reveals the problem. Once you find the problem, you can modify system parameters, restart daemons, or take other corrective actions.
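A long trace is easier to scan when each frame is reduced to its number, function name, and source location. The following is a minimal sketch, assuming the trace has been saved to a text file with one frame per line in the format shown above; summarize_trace is a hypothetical name, and lines that do not look like frames (such as the (dbx) prompts) are ignored.

```shell
#!/bin/sh
# summarize_trace (hypothetical helper): read saved dbx "t" output on
# standard input and print "frame function file:line" for each frame, e.g.
#   0 thread_block /usr/sde/build/src/kernel/kern/sched_prim.c:1623
# Assumes one frame per line: optional ">" marker, frame number, function
# name, argument list, then ["file":line, address].
summarize_trace() {
    sed -n 's/^[> ]*\([0-9][0-9]*\) \([A-Za-z_][A-Za-z0-9_]*\)(.*\["\([^"]*\)":\([0-9][0-9]*\),.*/\1 \2 \3:\4/p'
}
```

Reading the summary from the highest frame number down follows the call from the system call entry (_Xsyscall) to the routine in which the process is waiting (thread_block).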
For more information about the commands you can use to debug the running kernel, see Section 2.1 and Section 2.2.
1.4 Analyzing a Crash Dump File
If your system crashes, you can often find the cause of the crash by using dbx or kdbx to debug or analyze a crash dump file.
The operating system can crash because one of the following occurs:
Hardware exception
Software panic
Hung system
When a system hangs, it is often necessary to force the system to create dumps that you can analyze to determine why the system hung. The System Administration manual describes the procedure for forcing a crash dump of a hung system.
Resource exhaustion
The system crashes or hangs because it cannot continue executing. Normally, even in the case of a hardware exception, the operating system detects the problem. (For example, a machine-checking routine might discover a hardware problem and begin the process of crashing the system.) In general, the operating system performs the following steps when it detects a problem from which it cannot recover:
1. It calls the system panic function.
   The panic function saves the contents of registers and sends the panic string (a message describing the reason for the system panic) to the error logger and the console terminal.
   If the system is a symmetric multiprocessing (SMP) system, the panic function notifies the other CPUs in the system that a panic has occurred. The other CPUs then also execute the panic function and record the following panic string:
   cpu_ip_intr: panic request
   Once each CPU has recorded the system panic, execution continues only on the master CPU. All other CPUs in the SMP system stop execution.
2. It calls the system boot function.
   The boot function records the stack. The dump function copies core memory into the swap partitions, and then the system either stops running or begins the reboot process.
Console environment variables control whether the system reboots automatically. (The System Administration manual describes these environment variables.)
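Because every other CPU in an SMP system records cpu_ip_intr: panic request, saved panic messages from such a system can contain several copies of that secondary string alongside the one that describes the actual cause. The following is a minimal sketch for filtering them out, assuming the messages have been saved to a text file with one message per line; primary_panic is a hypothetical name.

```shell
#!/bin/sh
# primary_panic (hypothetical helper): read saved panic messages on
# standard input and print only those that are not the secondary
# "cpu_ip_intr: panic request" string recorded by the other CPUs.
# -F matches the string literally rather than as a regular expression.
primary_panic() {
    grep -v -F 'cpu_ip_intr: panic request'
}
```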
At system reboot time, the copy of core memory saved in the swap partitions is copied into a file, called a crash dump file. You can analyze the crash dump file to determine what caused the crash. By default, the crash dump is a partial (rather than full) dump and is in compressed form. For complete information about managing crash dumps and crash dump files, including how to change default settings, see the System Administration manual. For examples of analyzing crash dump files, see Chapter 4.