The final executable image for a program bears little resemblance to the source code files from which it was created. One of the principal functions of the symbol table is to track the relationship between the two so that the debugger is able to describe the resulting program in a way that the programmer can recognize.
Source file and line number information provide the data necessary to
convert between locations in source code and the generated machine instructions.
7.1 New or Changed Line Number Features
Tru64 UNIX V5.1 includes the following new or changed features:
A new ESLI command to describe gaps in address ranges (see Section 7.3.1.2)
Version 3.13 of the symbol table includes the following new or changed features:
Extended Source Location Information (see Section 7.3.1.2)
7.2 Structures, Fields, and Values for Line Numbers
Unless otherwise specified, all structures described in this section are declared in the header file sym.h, and all constants are defined in the header file symconst.h.
7.2.1 Line Number Entry (LINER)
Line numbers are represented using two formats: packed and expanded. The packed format is a byte stream that can be interpreted as described in Section 7.3.1.1 to build an expanded table that maps instructions to source line numbers. The LINER type is used to refer to a single entry in the expanded table. It is declared as:
typedef int LINER, *pLINER;
A second, newer form of line number information is located in the optimization symbols section. See Section 6.2.3 and Section 7.3.1.2.
7.3 Line Number Usage
7.3.1 Line Number Information
For a debugger to be effective, a connection must be made between high-level-language statements in source files and the executable machine instructions in object files. Line number entries map executable instructions to source lines. This mapping allows a debugger to present to a programmer the line of source code that corresponds to the code being executed. The line number information is produced by the compiler and should be rewritten if an application such as an instrumentation tool or an optimizer modifies code.
Line number information is emitted in two forms, one found in the line number table and one in the optimization symbol table (see Section 6.3.3).
The line number information found in the optimization symbol table is referred to as ESLI (extended source location information). This is a new form of line number that augments the information in the line number table. ESLI will only be present for procedures that cannot be described accurately by entries in the line number table.
Version Note In symbol table formats earlier than V3.13, line number information is found exclusively in the line number table.
Line number information is generated for each source file that contributes executable code to a program. Within each source file, line numbers are organized by procedure, in the order of appearance in the file. The line number symbol table section is produced only when a program is compiled with limited or greater symbolic information (see Section 7.3.1).
Figure 7-1 illustrates the organization of the line number table. The order outlined in Figure 7-1 is not guaranteed to match the ordering of file descriptors or procedure descriptors in those tables. The starting offset for a procedure's line table entries can be computed by adding the procedure descriptor's cbLineOffset to the containing file descriptor's cbLineOffset. The count of line number entries for a specific procedure can only be determined by finding the starting offset of the next procedure's entries in the line number table. This calculation is illustrated by the proc_pline_count() function in the packed line number programming example in Section 18.1.
Alternate entry points have a starting line number, but they have no specific ending line number. Procedure descriptors for a procedure and each of its associated alternate entry points share a common end offset in the line number table. See Section 11.3.1.9 for more information on alternate entry points.
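The offset arithmetic described above can be sketched as follows. This is only an illustration and is not the proc_pline_count() example from Section 18.1. The mini_fdr and mini_pdr types are hypothetical stand-ins that carry just the fields used here, the cbLine field (size in bytes of a file's packed entries) is an assumption, and alternate entry points are recognized by an lnHigh value of -1 as described at the end of this chapter. Sizes are in bytes of packed data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the descriptors in sym.h. */
typedef struct {
    unsigned long cbLineOffset; /* start of this file's entries in the line table */
    unsigned long cbLine;       /* assumed: size in bytes of this file's entries */
} mini_fdr;

typedef struct {
    unsigned long cbLineOffset; /* offset relative to the file's cbLineOffset */
    int lnHigh;                 /* -1 marks an alternate entry point */
} mini_pdr;

/* Absolute offset of a procedure's first packed entry. */
static unsigned long proc_line_start(const mini_fdr *fdr, const mini_pdr *pdr)
{
    return fdr->cbLineOffset + pdr->cbLineOffset;
}

/* Bytes of packed entries for procedure `self`: the distance to the nearest
 * later starting offset of a main (non-alternate) procedure in the same file,
 * or to the end of the file's entries if there is none. Alternate entries
 * are skipped so that they share the end offset of their main procedure. */
static unsigned long proc_line_bytes(const mini_fdr *fdr, const mini_pdr *pdrs,
                                     size_t npdr, size_t self)
{
    assert(self < npdr);
    unsigned long start = pdrs[self].cbLineOffset;
    unsigned long end = fdr->cbLine;

    for (size_t i = 0; i < npdr; i++) {
        unsigned long off = pdrs[i].cbLineOffset;
        if (pdrs[i].lnHigh == -1)
            continue;
        if (off > start && off < end)
            end = off;
    }
    return end - start;
}
```

The scan over all descriptors, rather than a look at the next array element, reflects the statement that the line table order is not guaranteed to match the descriptor order.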
The line number table has two forms. The "packed" form is used in the object file. The "expanded" form is a more useful representation to programmers and can be derived algorithmically (or by API) from the packed form.
The packed line numbers are stored as a byte stream. Each packed entry is a single byte that consists of two parts: count and delta. The count is the number of instructions generated from a source line. The delta is the number of source lines between the current source line and the previous one that generated executable instructions.
Figure 7-2 shows how these two values are represented.
Figure 7-2: Line Number Byte Format
The four-bit count is interpreted as an unsigned value between 1 and 16 (0 means 1, 1 means 2, and so forth). A count of zero is never needed: when no instructions are generated for a source line, no line number entry exists for that line, so the encoding is biased by one rather than wasting a value.
The four-bit delta is interpreted as a signed value in the range -7 to +7. Code generators may produce instructions that are not in the same order as the corresponding source lines. Therefore, the offset to the "next" source line may be a forward or backward jump.
Either of these quantities may fall outside the representable range. For a delta outside the range, an extended format exists (as shown in Figure 7-3). This extended format can represent delta values in the range -32768 to 32767. Delta values outside of this range are not representable. This is a permanent restriction of the packed line number format.
Figure 7-3: Line Number 3-Byte Extended Format
For a count outside the range, one or more additional entries follow, with the delta set to zero.
If both fields are out of range, the delta is handled first. An extended-format delta representation is followed by an entry with the delta bits set to zero and the remainder of the count contained in the count value.
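The encoding rules above can be sketched as a small packing routine. The bit layout is not stated in the text and is inferred here from the worked example in Table 7-1: the delta occupies the high four bits, the count minus one occupies the low four bits, and a delta field of 8 introduces the extended format with a 16-bit delta stored high byte first. The function name pack_line_entry is hypothetical, and the choice to place the first 16 instructions in the leading entry when the count overflows is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Append the packed encoding of one (delta, count) pair to out[] and return
 * the number of bytes written. The caller supplies a buffer of at least
 * 3 + (count / 16) bytes. */
static size_t pack_line_entry(int delta, unsigned count, unsigned char *out)
{
    assert(count >= 1);
    assert(delta >= -32768 && delta <= 32767);

    size_t n = 0;
    unsigned first = count > 16 ? 16 : count;
    unsigned udelta = (unsigned)delta;      /* two's-complement bit pattern */

    if (delta >= -7 && delta <= 7) {
        /* Short form: signed 4-bit delta, count biased by one. */
        out[n++] = (unsigned char)(((udelta & 0xfu) << 4) | (first - 1));
    } else {
        /* Extended form: escape nibble, then a 16-bit delta. */
        out[n++] = (unsigned char)(0x80u | (first - 1));
        out[n++] = (unsigned char)((udelta >> 8) & 0xffu);
        out[n++] = (unsigned char)(udelta & 0xffu);
    }
    count -= first;

    /* Remaining instructions: additional entries with the delta set to zero. */
    while (count > 0) {
        unsigned chunk = count > 16 ? 16 : count;
        out[n++] = (unsigned char)(chunk - 1);
        count -= chunk;
    }
    return n;
}
```

For example, a delta of 10 with a count of 9 yields the three bytes 88 00 0a, and a delta of 0 with a count of 20 yields 0f 03.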
The packed line number format can be expanded to produce the instruction-to-source-line mapping that is needed for debugging. A sample program is provided in Section 18.1 to illustrate interpretation of packed line numbers.
The following source listing of a file named lines.c provides an example that shows how the compiler assigns line numbers:
 1  #include <stdio.h>
 2  main()
 3  {
 4      char c;
 5
 6      printf("this program just prints input\n");
 7      for (;;) {
 8          if ((c =fgetc(stdin)) != EOF) break;
 9  /* this is a greater than 7-line comment
10   * 1
11   * 2
12   * 3
13   * 4
14   * 5
15   * 6
16   * 7
17   */
18          printf("%c", c);
19      } /* end for */
20  } /* end main */
The compiler generates line numbers only for lines 2, 6, 8, 18, 19, and 20; the remaining lines produce no executable instructions (for example, blank lines, comments, and declarations).
Table 7-1 shows the packed entries' interpretation for each source line.
Table 7-1: Line Number Example
Source Line | LINER contents | Interpretation |
2 | 03 | Delta 0, count 4 |
6 | 44 | Delta 4, count 5 |
8 | 29 | Delta 2, count 10 |
18 [1] | 88 00 0a | Delta 10, count 9 |
19 | 10 | Delta 1, count 1 |
20 | 14 | Delta 1, count 5 |
Table Note:
[1] Extended format (delta is greater than 7 lines).
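The expansion of packed entries into a LINER table can be sketched as follows. This is a minimal illustration and not the sample program from Section 18.1. The nibble assignment (delta in the high four bits, count minus one in the low four bits) and the high-byte-first order of the extended delta are inferred from the bytes in Table 7-1. The function name expand_lines and the start_line parameter, which stands for the procedure's starting line number, are assumptions.

```c
#include <assert.h>
#include <stddef.h>

typedef int LINER;  /* as declared in Section 7.2.1 */

/* Expand packed[0..nbytes) into one LINER per instruction.
 * Returns the number of entries written (at most max_out). */
static size_t expand_lines(const unsigned char *packed, size_t nbytes,
                           int start_line, LINER *out, size_t max_out)
{
    assert(packed != NULL && out != NULL);

    size_t n = 0;
    size_t i = 0;
    int line = start_line;

    while (i < nbytes) {
        unsigned char b = packed[i++];
        unsigned count = (b & 0x0fu) + 1u;  /* stored value 0 means 1 */
        int delta = b >> 4;

        if (delta == 8) {
            /* Extended format: two more bytes hold a signed 16-bit delta. */
            if (i + 2 > nbytes)
                break;                      /* truncated input */
            int wide = (packed[i] << 8) | packed[i + 1];
            if (wide >= 0x8000)
                wide -= 0x10000;
            delta = wide;
            i += 2;
        } else if (delta > 8) {
            delta -= 16;                    /* sign-extend the 4-bit field */
        }

        line += delta;
        while (count-- > 0 && n < max_out)
            out[n++] = line;
    }
    return n;
}
```

Feeding the bytes from Table 7-1 (03 44 29 88 00 0a 10 14) with a starting line of 2 produces runs of 4, 5, 10, 9, 1, and 5 entries for lines 2, 6, 8, 18, 19, and 20 respectively. An entry whose count exceeds 16 needs no special handling, because the follow-on entries carry a delta of zero.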
The compiler generates the following instructions for the example program:
[lines.c: 2] 0x0: ldah gp, 1(t12)
[lines.c: 2] 0x4: lda gp, -32592(gp)
[lines.c: 2] 0x8: lda sp, -16(sp)
[lines.c: 2] 0xc: stq ra, 0(sp)
[lines.c: 6] 0x10: ldq a0, -32720(gp)
[lines.c: 6] 0x14: ldq t12, -32728(gp)
[lines.c: 6] 0x18: jsr ra, (t12), printf
[lines.c: 6] 0x1c: ldah gp, 1(ra)
[lines.c: 6] 0x20: lda gp, -32620(gp)
[lines.c: 8] 0x24: ldq a0, -32736(gp)
[lines.c: 8] 0x28: ldq t12, -32744(gp)
[lines.c: 8] 0x2c: jsr ra, (t12), fgetc
[lines.c: 8] 0x30: ldah gp, 1(ra)
[lines.c: 8] 0x34: lda gp, -32640(gp)
[lines.c: 8] 0x38: and v0, 0xff, t0
[lines.c: 8] 0x3c: stq v0, 8(sp)
[lines.c: 8] 0x40: xor t0, 0xff, t0
[lines.c: 8] 0x44: bne t0, 0x6c
[lines.c: 18] 0x48: ldq t2, 8(sp)
[lines.c: 18] 0x4c: sll t2, 0x38, t2
[lines.c: 18] 0x50: sra t2, 0x38, a1
[lines.c: 18] 0x54: ldq a0, -32752(gp)
[lines.c: 18] 0x58: ldq t12, -32728(gp)
[lines.c: 18] 0x5c: jsr ra, (t12), printf
[lines.c: 18] 0x60: ldah gp, 1(ra)
[lines.c: 18] 0x64: lda gp, -32688(gp)
[lines.c: 19] 0x68: br zero, 0x24
[lines.c: 20] 0x6c: bis zero, zero, v0
[lines.c: 20] 0x70: ldq ra, 0(sp)
[lines.c: 20] 0x74: lda sp, 16(sp)
[lines.c: 20] 0x78: ret zero, (ra), 1
[lines.c: 20] 0x7c: call_pal halt
After expanding packed line numbers, the following instruction-to-source mapping (formatted instruction number.source line number) is produced by odump for the -l option:
 0. 2    1. 2    2. 2    3. 2    4. 6    5. 6    6. 6    7. 6
 8. 6    9. 8   10. 8   11. 8   12. 8   13. 8   14. 8   15. 8
16. 8   17. 8   18. 18  19. 18  20. 18  21. 18  22. 18  23. 18
24. 18  25. 18  26. 19  27. 20  28. 20  29. 20  30. 20  31. 20
Header files included in an object have no associated line numbers recorded in the symbol table. Line number information for included files containing source code is not supported by the packed line number format. The following section describes a more comprehensive line number representation that includes line number information for header files.
7.3.1.2 Extended Source Location Information (ESLI)
Version Note ESLI is supported for symbol table format V3.13 and greater.
The line number table does not correctly describe optimized code or programs with untraditional source files, resulting in images that are difficult to debug. Extended Source Location Information (ESLI) is intended to provide more information to enable debugging of optimized programs, including PC and line number changes, file transitions, and line and column ranges. ESLI is essentially a superset of the older line number table.
ESLI is stored in the optimization symbols section. This information is accessible on a per-procedure basis from the procedure descriptors. See Section 6.3.3 for more detail on accessing information in the optimization symbols section.
ESLI is a byte stream that can be interpreted in two modes: data mode or command mode. Currently, two formats are defined for data mode. These are designated as "Data Mode 1" and "Data Mode 2". Additional data modes may be defined as needed.
Figure 7-4: ESLI Data Mode Bytes
Data Mode 1 is the initial mode for a procedure's ESLI. Data Mode 1 is identical to the packed line number format with the exception of the interpretation of the delta PC escape value 0x80 (which indicates a switch to command mode).
In Data Mode 2, each entry consists of two bytes. The first byte is identical to the encoding and interpretation of Data Mode 1. The second byte is an absolute column number (from 0 to 255), where column number 0 indicates that column information is missing or not meaningful for this entry. The escape from Data Mode 2 to command mode consists of a delta PC escape value set to 0x80 and column number set to 0.
In command mode, each byte is either a command or a command parameter. For a command byte, the low-order six bits are a command code, and the two high bits are used as flags, as shown in Figure 7-5. The "mark" flag, if set, announces that a new state has been established. Several commands may be required to fully describe a new state. The "resume" flag, if set, indicates the end of command mode. The next byte following a command with "resume" set will be a data mode byte. The effective data mode can be changed by SET_DATA_MODE commands in command mode, otherwise the data mode that was in effect prior to the escape to command mode will be resumed. See Table 7-2 for a complete list of commands.
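Splitting a command byte into its code and flags might look like the sketch below. The exact bit positions of the two flags are given by Figure 7-5. The values used here (0x80 for "mark" and 0x40 for "resume") are an inference from the bytes in Table 7-3, where 48 is a set_line command with resume set and 86 is an add_line_pc command with mark set. The names esli_cmd, esli_split, and esli_join are hypothetical and do not come from linenum.h.

```c
#include <assert.h>

#define ESLI_MARK   0x80u   /* assumed bit position of the "mark" flag */
#define ESLI_RESUME 0x40u   /* assumed bit position of the "resume" flag */
#define ESLI_CODE   0x3fu   /* low-order six bits: command code */

typedef struct {
    unsigned code;  /* command code, 0..63 */
    int mark;       /* nonzero if a new state has been established */
    int resume;     /* nonzero if data mode resumes after this command */
} esli_cmd;

/* Decompose a command byte into its code and flags. */
static esli_cmd esli_split(unsigned char b)
{
    esli_cmd c;
    c.code = b & ESLI_CODE;
    c.mark = (b & ESLI_MARK) != 0;
    c.resume = (b & ESLI_RESUME) != 0;
    return c;
}

/* Rebuild a command byte from its parts. */
static unsigned char esli_join(esli_cmd c)
{
    assert(c.code <= ESLI_CODE);
    return (unsigned char)(c.code
                           | (c.mark ? ESLI_MARK : 0u)
                           | (c.resume ? ESLI_RESUME : 0u));
}
```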
Command parameters are stored in LEB128 (Little Endian Base 128) format. See Section 1.4.6 for a description of this data representation. PC deltas are always expressed as machine instruction offsets and must be scaled by the size of a machine instruction before adding to the current PC. No other deltas need to be scaled.
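Reading command parameters can be sketched with the conventional LEB128 decoding loops shown below. Section 1.4.6 remains the authoritative description of the encoding. The function names are hypothetical, the input is assumed to be well formed, and the 4-byte instruction size in advance_pc is an assumption that matches the 4-byte address steps in the instruction listings of this chapter.

```c
#include <assert.h>
#include <stddef.h>

#define INSN_BYTES 4  /* assumed: size of one machine instruction */

/* Decode an unsigned LEB128 value; returns the number of bytes consumed. */
static size_t read_uleb128(const unsigned char *p, unsigned long *value)
{
    unsigned long result = 0;
    unsigned shift = 0;
    size_t i = 0;
    unsigned char b;

    do {
        b = p[i++];
        assert(shift < sizeof result * 8);   /* value must fit */
        result |= (unsigned long)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);                      /* high bit: more bytes follow */

    *value = result;
    return i;
}

/* Decode a signed LEB128 value; returns the number of bytes consumed. */
static size_t read_sleb128(const unsigned char *p, long *value)
{
    unsigned long result = 0;
    unsigned shift = 0;
    size_t i = 0;
    unsigned char b;

    do {
        b = p[i++];
        assert(shift < sizeof result * 8);
        result |= (unsigned long)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);

    /* Bit 6 of the final byte is the sign bit; extend it upward. */
    if ((b & 0x40) && shift < sizeof result * 8)
        result |= ~0UL << shift;

    *value = (long)result;  /* relies on two's-complement conversion */
    return i;
}

/* Apply a PC delta expressed in instructions to a byte address. */
static unsigned long long advance_pc(unsigned long long pc, long delta_insns)
{
    return pc + (unsigned long long)(delta_insns * INSN_BYTES);
}
```

Only PC deltas pass through advance_pc; line and column parameters are used as decoded.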
Table 7-2 shows how to interpret the bytes in command mode. These definitions can be found in the system header file linenum.h.
Name | Value | Parameters by Type |
ADD_PC | 1 | SLEB |
ADD_LINE | 2 | SLEB |
SET_COL | 3 | LEB |
SET_FILE | 4 | LEB |
SET_DATA_MODE | 5 | LEB |
ADD_LINE_PC | 6 | SLEB, SLEB |
ADD_LINE_PC_COL | 7 | SLEB, SLEB, LEB |
SET_LINE | 8 | LEB |
SET_LINE_COL | 9 | LEB, LEB |
SEQUENCE_BREAK | 10 | SLEB |
ADD_PC
Parameter is a signed value to add to the current PC value.
ADD_LINE
Parameter is a signed value to add to the current line number.
SET_COL
Parameter is an unsigned value that represents a new column number. The column number is used to associate the PC with a particular location within a source line. Column number parameters use a zero-based representation that must be adjusted by adding 1.
SET_FILE
Parameter is an unsigned value used to switch file context. This command is typically followed by a SET_LINE command.
SET_DATA_MODE
Parameter is an unsigned value used to set the data mode that will be in effect when data mode is resumed. The only parameter values that are currently accepted are 1 and 2. Additional data modes may be defined in future releases.
ADD_LINE_PC
Both parameters are signed values. The first is added to the line number and the second is added to the PC.
ADD_LINE_PC_COL
The first two parameters are signed values and the third is an unsigned value. The first two are added to the line number and PC respectively. The third is used to set the column number.
SET_LINE
Parameter is an unsigned value that sets the current line number.
SET_LINE_COL
Both parameters are unsigned values. The first represents the line number and the second represents the column number.
SEQUENCE_BREAK
Indicates the end of a contiguous sequence of address descriptions. The value of the parameter is added to the current address, and the resulting address becomes the starting address of the next sequence of address descriptions. The current file and line number continue to apply as the current values for the new sequence as well. (These can, however, be changed using the appropriate commands.)
Version Note The SEQUENCE_BREAK command is supported in Tru64 UNIX V5.1 and greater for symbol table format V3.13 and greater.
A tool reading the ESLI must maintain the current PC value, file number, line number, and column. Taken together, these four values represent the current "state". Consumers must also keep track of the mode in effect to interpret the data properly. A sample program is provided in Section 18.2 to illustrate consumption of ESLI.
Data encoded in ESLI can be represented in tabular format. The PC value and file, line, and column numbers can be stored as a state table. The following example shows how to build this state table.
In this example ESLI will record line numbers for a routine that includes text from a header file.
Source listing for line1.c:
 1  /* ESLI example using included source lines */
 2
 3  main() {
 4      char *msg;
 5
 6      msg = (char *)0;
 7
 8  #include "line2.h"
 9
10      printf("%s", msg);
11  }
Source listing for line2.h:
 1  msg = (char *)malloc(20);
 2  /*
 3   *
 4   *
 5   *
 6   *
 7   *
 8   *
 9   *
10   */
11  strcpy(msg, "Hello\n");
The compiler generates the following instructions for the example program:
main:
[line1.c: 3] 0x1200011d0: ldah gp, 8192(t12)
[line1.c: 3] 0x1200011d4: lda gp, 28336(gp)
[line1.c: 3] 0x1200011d8: lda sp, -16(sp)
[line1.c: 3] 0x1200011dc: stq ra, 0(sp)
[line1.c: 3] 0x1200011e0: stq s0, 8(sp)
[line1.c: 6] 0x1200011e4: bis zero, zero, s0
[line2.h: 1] 0x1200011e8: bis zero, 0x14, a0
[line2.h: 1] 0x1200011ec: ldq t12, -32560(gp)
[line2.h: 1] 0x1200011f0: jsr ra, (t12)
[line2.h: 1] 0x1200011f4: ldah gp, 8192(ra)
[line2.h: 1] 0x1200011f8: lda gp, 28300(gp)
[line2.h: 1] 0x1200011fc: bis zero, v0, s0
[line2.h: 11] 0x120001200: bis zero, s0, a0
[line2.h: 11] 0x120001204: lda a1, -32768(gp)
[line2.h: 11] 0x120001208: ldq t12, -32600(gp)
[line2.h: 11] 0x12000120c: jsr ra, (t12)
[line2.h: 11] 0x120001210: ldah gp, 8192(ra)
[line2.h: 11] 0x120001214: lda gp, 28272(gp)
[line1.c: 10] 0x120001218: ldq_u zero, 0(sp)
[line1.c: 10] 0x12000121c: lda a0, -32760(gp)
[line1.c: 10] 0x120001220: bis zero, s0, a1
[line1.c: 10] 0x120001224: ldq t12, -32552(gp)
[line1.c: 10] 0x120001228: jsr ra, (t12)
[line1.c: 10] 0x12000122c: ldah gp, 8192(gp)
[line1.c: 10] 0x120001230: lda gp, 28244(gp)
[line1.c: 11] 0x120001234: bis zero, zero, v0
[line1.c: 11] 0x120001238: ldq ra, 0(sp)
[line1.c: 11] 0x12000123c: ldq s0, 8(sp)
[line1.c: 11] 0x120001240: lda sp, 16(sp)
[line1.c: 11] 0x120001244: ret zero, (ra)
The ESLI and its interpretation for the generated code are shown in the following table.
Table 7-3: ESLI Example
Column key: the Code, M, and R columns describe the command, where M is the (M)ark flag and R is the (R)esume flag. The PC, F, L, and C columns describe the state, where F is the (F)ile, L is the (L)ine, and C is the (C)olumn.
ESLI bytes (hex) | Mode | Code | M | R | PC (hex) | F | L | C |
Initial State (from PDR) | Data1 | | | | 1200011d0 | 0 | 3 | 0 |
04 | Data1 | | | | 1200011e4 | 0 | 3 | 0 |
30 | Data1 | | | | 1200011e8 | 0 | 6 | 0 |
80 | Data1 | Escape | | | | | | |
04 01 | Cmd | set_file(1) | | | | 1 | | |
48 01 | Cmd | set_line(1) | | R | | | 1 | |
05 | Data1 | | | | 120001200 | 1 | 1 | 0 |
80 | Data1 | Escape | | | | | | |
86 0a 06 | Cmd | add_line_pc(10,6) | M | | 120001218 | 1 | 11 | 0 |
04 00 | Cmd | set_file(0) | | | | 0 | | |
48 0a | Cmd | set_line(10) | | R | | | 10 | |
06 | Data1 | | | | 120001234 | 0 | 10 | 0 |
16 | Data1 | | | | 120001250 | 0 | 11 | 0 |
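The construction of a state table such as Table 7-3 can be sketched as a small interpreter. This is an illustration under stated assumptions and not the sample program from Section 18.2. The flag bits (0x80 for mark, 0x40 for resume) and the nibble layout of data bytes are inferred from the bytes in Table 7-3. The instruction size of 4 bytes is taken from the listing. SEQUENCE_BREAK is assumed to be scaled like other PC deltas. Extended-format deltas in data mode are not handled, and the input is assumed to be well formed. The sketch records one row after each data mode entry and after each command with the mark flag set, which reproduces the rows of Table 7-3 that show a complete state. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define INSN_BYTES 4  /* assumed instruction size (matches the listing) */

typedef struct {
    unsigned long long pc;
    unsigned long file;
    long line;
    unsigned long col;
} esli_state;

/* Read one LEB128 value and advance *pp; sign-extend when is_signed is set. */
static unsigned long leb(const unsigned char **pp, int is_signed)
{
    unsigned long v = 0;
    unsigned shift = 0;
    unsigned char b;
    do {
        b = *(*pp)++;
        assert(shift < sizeof v * 8);
        v |= (unsigned long)(b & 0x7f) << shift;
        shift += 7;
    } while (b & 0x80);
    if (is_signed && (b & 0x40) && shift < sizeof v * 8)
        v |= ~0UL << shift;
    return v;
}
#define ULEB(pp) leb((pp), 0)
#define SLEB(pp) ((long)leb((pp), 1))

/* Advance the PC by a signed count of instructions. */
static void add_pc(esli_state *st, long insns)
{
    st->pc += (unsigned long long)(insns * INSN_BYTES);
}

/* Walk n bytes of ESLI starting from state st. Returns the number of rows
 * recorded, or -1 for input that this sketch does not handle. */
static int esli_walk(const unsigned char *p, size_t n, esli_state st,
                     esli_state *rows, int max_rows)
{
    const unsigned char *end = p + n;
    int nrows = 0, mode = 1, in_cmd = 0;

    while (p < end) {
        unsigned char b = *p++;
        int record = 1;
        if (!in_cmd) {
            unsigned col = 0;
            if (mode == 2) {              /* Data Mode 2: column byte follows */
                if (p == end) return -1;
                col = *p++;
            }
            if (b == 0x80) {              /* escape to command mode */
                in_cmd = 1;
                continue;
            }
            int delta = b >> 4;
            if (delta == 8) return -1;    /* extended delta: not handled here */
            if (delta > 8) delta -= 16;
            st.line += delta;
            add_pc(&st, (b & 0x0f) + 1);
            if (mode == 2) st.col = col;
        } else {
            switch (b & 0x3f) {
            case 1:  add_pc(&st, SLEB(&p)); break;
            case 2:  st.line += SLEB(&p); break;
            case 3:  st.col = ULEB(&p); break;
            case 4:  st.file = ULEB(&p); break;
            case 5:  mode = (int)ULEB(&p);
                     if (mode != 1 && mode != 2) return -1;
                     break;
            case 6:  st.line += SLEB(&p); add_pc(&st, SLEB(&p)); break;
            case 7:  st.line += SLEB(&p); add_pc(&st, SLEB(&p));
                     st.col = ULEB(&p); break;
            case 8:  st.line = (long)ULEB(&p); break;
            case 9:  st.line = (long)ULEB(&p); st.col = ULEB(&p); break;
            case 10: add_pc(&st, SLEB(&p)); break;  /* assumed scaled */
            default: return -1;
            }
            if (b & 0x40) in_cmd = 0;     /* "resume": back to data mode */
            record = (b & 0x80) != 0;     /* "mark": new state established */
        }
        if (record) {
            if (nrows == max_rows) return -1;
            rows[nrows++] = st;
        }
    }
    return nrows;
}
```

Starting from the initial state in Table 7-3 (PC 1200011d0, file 0, line 3, column 0), the bytes 04 30 80 04 01 48 01 05 80 86 0a 06 04 00 48 0a 06 16 yield six rows whose PC, file, and line values match the complete-state rows of the table.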
The handling of alternate entry points differs from the handling of main entry points. Procedure descriptors for alternate entry points are identified by a PDR.lnHigh value of -1. If the PC for an instruction maps to an alternate entry point, the following steps should be taken:
1. Find the procedure descriptor for the corresponding main entry. This is accomplished by searching back in the procedure descriptors until a PDR is found that is not an alternate entry (PDR.lnHigh is not -1).
2. Access the ESLI for the procedure.
3. Read the ESLI until the PC value matches the PDR.adr field of the alternate entry's procedure descriptor.
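The first of these steps can be sketched as a backward scan over the procedure descriptors. The mini_pdr type is a hypothetical stand-in that carries only the two PDR fields named above, and find_main_entry is an illustrative name rather than an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the PDR structure in sym.h. */
typedef struct {
    unsigned long adr;  /* address of the entry point */
    int lnHigh;         /* -1 marks an alternate entry point */
} mini_pdr;

/* Given the index of any procedure descriptor, return the index of the
 * descriptor for its main entry by searching back past alternate entries. */
static size_t find_main_entry(const mini_pdr *pdrs, size_t idx)
{
    while (idx > 0 && pdrs[idx].lnHigh == -1)
        idx--;
    /* A well-formed table never begins with an alternate entry. */
    assert(pdrs[idx].lnHigh != -1);
    return idx;
}
```

With the main entry located, the ESLI for that procedure is read until the current PC equals the adr field of the alternate entry's descriptor, and the state at that point supplies the alternate entry's file, line, and column.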