7    Line Number Information

The final executable image for a program bears little resemblance to the source code files from which it was created. One of the principal functions of the symbol table is to track the relationship between the two so that the debugger is able to describe the resulting program in a way that the programmer can recognize.

Source file and line number information provide the data necessary to convert between locations in source code and the generated machine instructions.

7.1    New or Changed Line Number Features

Tru64 UNIX V5.1 includes the following new or changed features:

Version 3.13 of the symbol table includes the following new or changed features:

7.2    Structures, Fields, and Values for Line Numbers

Unless otherwise specified, all structures described in this section are declared in the header file sym.h, and all constants are defined in the header file symconst.h.

7.2.1    Line Number Entry (LINER)

Line numbers are represented using two formats: packed and expanded. The packed format is a byte stream that can be interpreted as described in Section 7.3.1.1 to build an expanded table that maps instructions to source line numbers. The LINER type is used to refer to a single entry in the expanded table. It is declared as:

typedef int LINER, *pLINER;

A second, newer form of line number information is located in the optimization symbols section. See Section 6.2.3 and Section 7.3.1.2.

7.3    Line Number Usage

7.3.1    Line Number Information

For a debugger to be effective, a connection must be made between high-level-language statements in source files and the executable machine instructions in object files. Line number entries map executable instructions to source lines. This mapping allows a debugger to present to a programmer the line of source code that corresponds to the code being executed. The line number information is produced by the compiler and should be rewritten if an application such as an instrumentation tool or an optimizer modifies code.

Line number information is emitted in two forms, one found in the line number table and one in the optimization symbol table (see Section 6.3.3).

The line number information found in the optimization symbol table is referred to as ESLI (extended source location information). This newer form of line number information augments the information in the line number table. ESLI is present only for procedures that cannot be described accurately by entries in the line number table.


Version Note

In symbol table formats earlier than V3.13, line number information is found exclusively in the line number table.


7.3.1.1    The Line Number Table

Line number information is generated for each source file that contributes executable code to a program. Within each source file, line numbers are organized by procedure, in the order of appearance in the file. The line number symbol table section is produced only when a program is compiled with limited or greater symbolic information (see Section 7.3.1).

Figure 7-1 illustrates the organization of the line number table.

Figure 7-1:  Line Number Table

The order outlined in Figure 7-1 is not guaranteed to match the ordering of file descriptors or procedure descriptors in those tables. The starting offset for a procedure's line table entries can be computed by adding the procedure descriptor's cbLineOffset to the containing file descriptor's cbLineOffset. The count of line number entries for a specific procedure can only be determined by finding the starting offset of the next procedure's entries in the line number table. This calculation is illustrated by the proc_pline_count() function in the packed line number programming example in Section 18.1.
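The offset arithmetic described above can be sketched as follows. This is a minimal illustration, not the proc_pline_count() function from Section 18.1. The structures are hypothetical, trimmed-down stand-ins that carry only the cbLineOffset field named in the text; the real descriptors in sym.h have many more fields. The file_line_bytes parameter (the total size of the file's packed entries) and the restriction to main entry points are assumptions of this sketch. Note that it returns the length in packed bytes, not the number of expanded entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: only the field named in the text is modeled. */
struct file_desc { unsigned long cbLineOffset; };
struct proc_desc { unsigned long cbLineOffset; };

/* Offset of a procedure's first packed entry within the line number table. */
static unsigned long proc_line_start(const struct file_desc *fd,
                                     const struct proc_desc *pd)
{
    assert(fd != NULL && pd != NULL);
    return fd->cbLineOffset + pd->cbLineOffset;
}

/*
 * Length in bytes of the packed entries for procs[which].
 * The procedures of one file may appear in any order, so the end of the
 * range is the smallest start offset greater than this procedure's own,
 * or the end of the file's line bytes if no later procedure exists.
 * Assumes procs[] holds main entry points only (no alternate entries).
 */
static unsigned long proc_line_bytes(const struct proc_desc *procs,
                                     size_t nprocs, size_t which,
                                     unsigned long file_line_bytes)
{
    unsigned long start, end;
    size_t i;

    assert(procs != NULL && which < nprocs);
    start = procs[which].cbLineOffset;
    end = file_line_bytes;
    for (i = 0; i < nprocs; i++) {
        unsigned long s = procs[i].cbLineOffset;
        if (s > start && s < end)
            end = s;
    }
    return end - start;
}
```

The linear scan is used because, as noted above, the order of entries in the line number table is not guaranteed to match the order of the procedure descriptors.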

Alternate entry points have a starting line number, but they have no specific ending line number. Procedure descriptors for a procedure and each of its associated alternate entry points share a common end offset in the line number table. See Section 11.3.1.9 for more information on alternate entry points.

The line number table has two forms. The "packed" form is used in the object file. The "expanded" form is more useful to programmers and can be derived from the packed form algorithmically or through an API.

The packed line numbers are stored as bytes. Each packed entry is a single byte that consists of two parts: count and delta. The count is the number of instructions generated from a source line. The delta is the number of source lines between the current source line and the previous one that generated executable instructions.

Figure 7-2 shows how these two values are represented.

Figure 7-2:  Line Number Byte Format

The four-bit count is interpreted as an unsigned value between 1 and 16 (0 means 1, 1 means 2, and so forth). A count of zero is never needed: when no instructions are generated for a source line, no line number entry exists for that line, so an encoding reserved for zero would be wasted.

The four-bit delta is interpreted as a signed value in the range -7 to +7. Code generators may produce instructions that are not in the same order as the corresponding source lines. Therefore, the offset to the "next" source line may be a forward or backward jump.

Either of these quantities may fall outside the representable range. For a delta outside the range, an extended format exists (as shown in Figure 7-3). This extended format can represent delta values in the range -32768 to 32767. Delta values outside of this range are not representable. This is a permanent restriction of the packed line number format.

Figure 7-3:  Line Number 3-Byte Extended Format

For a count outside the range, one or more additional entries follow, with the delta set to zero.

If both fields are out of range, the delta is handled first. An extended-format delta representation is followed by an entry with the delta bits set to zero and the remainder of the count contained in the count value.
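The encoding rules above can be sketched from the producer's side. The following is an illustration under stated assumptions, not code from the specification: the function name pack_line_entry is hypothetical, and the bit layout (delta in the high four bits, count minus one in the low four bits, and an extended entry formed by a delta nibble of 0x8 followed by a 16-bit delta with the high-order byte first) is inferred from the "88 00 0a" entry in Table 7-1, because the figures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Appends the packed bytes for one source line to out[] and returns the
 * number of bytes written. delta is the line delta; count is the number of
 * instructions generated for the line (at least 1).
 */
static size_t pack_line_entry(int delta, unsigned count,
                              unsigned char *out, size_t cap)
{
    size_t n = 0;
    unsigned chunk;
    unsigned udelta = (unsigned)delta & 0xffffu;  /* two's-complement bits */

    assert(out != NULL && count >= 1);
    assert(delta >= -32768 && delta <= 32767);    /* format restriction */

    /* The delta is handled first, together with up to 16 instructions. */
    chunk = count > 16 ? 16 : count;
    if (delta >= -7 && delta <= 7) {
        assert(cap >= 1);
        out[n++] = (unsigned char)(((udelta & 0x0fu) << 4) | (chunk - 1));
    } else {
        assert(cap >= 3);
        out[n++] = (unsigned char)(0x80u | (chunk - 1)); /* extended form */
        out[n++] = (unsigned char)(udelta >> 8);         /* high byte */
        out[n++] = (unsigned char)(udelta & 0xffu);      /* low byte */
    }
    count -= chunk;

    /* Any remaining count follows in entries whose delta is zero. */
    while (count > 0) {
        chunk = count > 16 ? 16 : count;
        assert(n < cap);
        out[n++] = (unsigned char)(chunk - 1);
        count -= chunk;
    }
    return n;
}
```

Under these assumptions, a delta of 10 with a count of 9 produces the bytes 88 00 0a, and a delta of 1 with a count of 20 produces 1f followed by 03.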

The packed line number format can be expanded to produce the instruction-to-source-line mapping that is needed for debugging. A sample program is provided in Section 18.1 to illustrate interpretation of packed line numbers.
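Independently of the sample in Section 18.1, the expansion can be sketched in a few lines. The function name expand_packed_lines and the start_line parameter (the line number in effect before the first entry) are assumptions of this sketch. The bit layout is inferred from the entries in Table 7-1: delta in the high four bits, count minus one in the low four bits, and a delta nibble of 0x8 introducing a two-byte delta with the high-order byte first.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Expands packed line bytes into one source line number per instruction.
 * Returns the number of instructions described; at most cap entries are
 * stored in lines[].
 */
static size_t expand_packed_lines(const unsigned char *packed, size_t nbytes,
                                  int start_line, int *lines, size_t cap)
{
    size_t i = 0, n = 0;
    int line = start_line;

    assert(packed != NULL && lines != NULL);
    while (i < nbytes) {
        unsigned char b = packed[i++];
        unsigned count = (b & 0x0fu) + 1;   /* stored value 0 means 1 */
        int delta = b >> 4;

        if (delta == 8) {
            /* Extended format: the next two bytes hold a signed delta. */
            if (nbytes - i < 2)
                break;                      /* truncated input */
            delta = (packed[i] << 8) | packed[i + 1];
            if (delta >= 0x8000)
                delta -= 0x10000;
            i += 2;
        } else if (delta > 8) {
            delta -= 16;                    /* sign-extend the nibble */
        }

        line += delta;
        while (count-- > 0) {
            if (n < cap)
                lines[n] = line;
            n++;
        }
    }
    return n;
}
```

An entry with a delta of zero simply extends the run for the current line, which is how a count larger than 16 is carried across several entries.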

The following source listing of a file named lines.c provides an example that shows how the compiler assigns line numbers:

1   #include <stdio.h>
2   main()
3   {
4       char c;
5
6       printf("this program just prints input\n");
7       for (;;) {
8          if ((c =fgetc(stdin)) != EOF) break;
9       /*   this is a greater than 7-line comment
10           * 1
11           * 2
12           * 3
13           * 4
14           * 5
15           * 6
16           * 7
17           */
18           printf("%c", c);
19      } /* end for */
20  } /* end main */

The compiler generates line numbers only for lines 2, 6, 8, 18, 19, and 20; the other lines are blank, contain only comments, or generate no executable instructions.

Table 7-1 shows the packed entries' interpretation for each source line.

Table 7-1:  Line Number Example

Source Line   LINER contents   Interpretation
2             03               Delta 0, count 4
6             44               Delta 4, count 5
8             29               Delta 2, count 10
18 [1]        88 00 0a         Delta 10, count 9
19            10               Delta 1, count 1
20            14               Delta 1, count 5

Table Note:

  1. Extended format (delta is greater than 7 lines).

The compiler generates the following instructions for the example program:

  [lines.c:   2] 0x0:     ldah     gp, 1(t12)
  [lines.c:   2] 0x4:     lda      gp, -32592(gp)
  [lines.c:   2] 0x8:     lda      sp, -16(sp)
  [lines.c:   2] 0xc:     stq      ra, 0(sp)
  [lines.c:   6] 0x10:    ldq      a0, -32720(gp)
  [lines.c:   6] 0x14:    ldq      t12, -32728(gp)
  [lines.c:   6] 0x18:    jsr      ra, (t12), printf
  [lines.c:   6] 0x1c:    ldah     gp, 1(ra)
  [lines.c:   6] 0x20:    lda      gp, -32620(gp)
  [lines.c:   8] 0x24:    ldq      a0, -32736(gp)
  [lines.c:   8] 0x28:    ldq      t12, -32744(gp)
  [lines.c:   8] 0x2c:    jsr      ra, (t12), fgetc
  [lines.c:   8] 0x30:    ldah     gp, 1(ra)
  [lines.c:   8] 0x34:    lda      gp, -32640(gp)
  [lines.c:   8] 0x38:    and      v0, 0xff, t0
  [lines.c:   8] 0x3c:    stq      v0, 8(sp)
  [lines.c:   8] 0x40:    xor      t0, 0xff, t0
  [lines.c:   8] 0x44:    bne      t0, 0x6c
  [lines.c:  18] 0x48:    ldq      t2, 8(sp)
  [lines.c:  18] 0x4c:    sll      t2, 0x38, t2
  [lines.c:  18] 0x50:    sra      t2, 0x38, a1
  [lines.c:  18] 0x54:    ldq      a0, -32752(gp)
  [lines.c:  18] 0x58:    ldq      t12, -32728(gp)
  [lines.c:  18] 0x5c:    jsr      ra, (t12), printf
  [lines.c:  18] 0x60:    ldah     gp, 1(ra)
  [lines.c:  18] 0x64:    lda      gp, -32688(gp)
  [lines.c:  19] 0x68:    br       zero, 0x24
  [lines.c:  20] 0x6c:    bis      zero, zero, v0
  [lines.c:  20] 0x70:    ldq      ra, 0(sp)
  [lines.c:  20] 0x74:    lda      sp, 16(sp)
  [lines.c:  20] 0x78:    ret      zero, (ra), 1
  [lines.c:  20] 0x7c:    call_pal halt

After the packed line numbers are expanded, odump with the -l option produces the following instruction-to-source mapping. Each entry is formatted as the instruction number, a period, and the source line number:

           0.    2         1.    2         2.    2
           3.    2         4.    6         5.    6
           6.    6         7.    6         8.    6
           9.    8        10.    8        11.    8
          12.    8        13.    8        14.    8
          15.    8        16.    8        17.    8
          18.   18        19.   18        20.   18
          21.   18        22.   18        23.   18
          24.   18        25.   18        26.   19
          27.   20        28.   20        29.   20
          30.   20        31.   20

Header files included in an object have no associated line numbers recorded in the symbol table. Line number information for included files containing source code is not supported by the packed line number format. The following section describes a more comprehensive line number representation that includes line number information for header files.

7.3.1.2    Extended Source Location Information (ESLI)


Version Note

ESLI is supported for symbol table format V3.13 and greater.


The line number table does not correctly describe optimized code or programs with nontraditional source files, resulting in images that are difficult to debug. Extended Source Location Information (ESLI) is intended to provide more information to enable debugging of optimized programs, including PC and line number changes, file transitions, and line and column ranges. ESLI is essentially a superset of the older line number table.

ESLI is stored in the optimization symbols section. This information is accessible on a per-procedure basis from the procedure descriptors. See Section 6.3.3 for more detail on accessing information in the optimization symbols section.

ESLI is a byte stream that can be interpreted in two modes: data mode or command mode. Currently, two formats are defined for data mode. These are designated as "Data Mode 1" and "Data Mode 2". Additional data modes may be defined as needed.

Figure 7-4:  ESLI Data Mode Bytes

Data Mode 1 is the initial mode for a procedure's ESLI. Data Mode 1 is identical to the packed line number format with the exception of the interpretation of the delta PC escape value 0x80 (which indicates a switch to command mode).

In Data Mode 2, each entry consists of two bytes. The first byte has the same encoding and interpretation as a Data Mode 1 byte. The second byte is an absolute column number (from 0 to 255), where column number 0 indicates that column information is missing or not meaningful for this entry. The escape from Data Mode 2 to command mode consists of a delta PC escape value set to 0x80 and a column number set to 0.

In command mode, each byte is either a command or a command parameter. For a command byte, the low-order six bits are a command code, and the two high bits are used as flags, as shown in Figure 7-5. The "mark" flag, if set, announces that a new state has been established. Several commands may be required to fully describe a new state. The "resume" flag, if set, indicates the end of command mode. The next byte following a command with "resume" set will be a data mode byte. The effective data mode can be changed by SET_DATA_MODE commands in command mode; otherwise, the data mode that was in effect prior to the escape to command mode is resumed. See Table 7-2 for a complete list of commands.

Figure 7-5:  ESLI Command Byte
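Splitting a command byte into its code and flags can be sketched as follows. The macro and function names are hypothetical and are not taken from linenum.h. The flag positions (mark in bit 7, resume in bit 6) are an inference from the command bytes 86 and 48 in Table 7-3, which that table shows as a marked add_line_pc and a set_line with resume; the command names and values come from Table 7-2.

```c
#include <assert.h>
#include <stddef.h>

#define ESLI_CMD_CODE_MASK 0x3fu   /* low-order six bits: command code */
#define ESLI_CMD_RESUME    0x40u   /* assumed position of "resume" */
#define ESLI_CMD_MARK      0x80u   /* assumed position of "mark" */

struct esli_cmd {
    unsigned code;   /* command code from Table 7-2 */
    int mark;        /* nonzero if a new state has been established */
    int resume;      /* nonzero if data mode resumes after this command */
};

/* Separates the command code from the two flag bits. */
static struct esli_cmd esli_split_command(unsigned char b)
{
    struct esli_cmd c;

    c.code = b & ESLI_CMD_CODE_MASK;
    c.mark = (b & ESLI_CMD_MARK) != 0;
    c.resume = (b & ESLI_CMD_RESUME) != 0;
    return c;
}

/* Returns the Table 7-2 name for a command code, or NULL if undefined. */
static const char *esli_command_name(unsigned code)
{
    static const char *const names[] = {
        NULL, "ADD_PC", "ADD_LINE", "SET_COL", "SET_FILE", "SET_DATA_MODE",
        "ADD_LINE_PC", "ADD_LINE_PC_COL", "SET_LINE", "SET_LINE_COL",
        "SEQUENCE_BREAK"
    };

    assert(code <= ESLI_CMD_CODE_MASK);
    return code < sizeof names / sizeof names[0] ? names[code] : NULL;
}
```

For example, the byte 0x86 splits into code 6 (ADD_LINE_PC) with mark set and resume clear, and 0x48 splits into code 8 (SET_LINE) with resume set and mark clear.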

Command parameters are stored in LEB128 (Little Endian Base 128) format. See Section 1.4.6 for a description of this data representation. PC deltas are always expressed as machine instruction offsets and must be scaled by the size of a machine instruction before they are added to the current PC. No other deltas need to be scaled.
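As an illustration of how parameters might be read, the following sketch decodes unsigned (LEB) and signed (SLEB) values using the conventional LEB128 rules: seven payload bits per byte, least significant group first, with the high bit of each byte marking continuation. Section 1.4.6 remains the authoritative description. The function names are hypothetical, and the instruction size of 4 bytes used for scaling PC deltas is an assumption based on the disassembly listings in this chapter.

```c
#include <assert.h>
#include <stddef.h>

#define ESLI_INSN_BYTES 4   /* assumed machine instruction size */

/* Decodes an unsigned value; returns bytes consumed, or 0 if truncated. */
static size_t leb128_read_unsigned(const unsigned char *p, size_t n,
                                   unsigned long long *out)
{
    unsigned long long v = 0;
    unsigned shift = 0;
    size_t i;

    assert(p != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (shift < 64)
            v |= (unsigned long long)(p[i] & 0x7f) << shift;
        shift += 7;
        if (!(p[i] & 0x80)) {       /* last byte of this value */
            *out = v;
            return i + 1;
        }
    }
    return 0;
}

/* Decodes a signed value; returns bytes consumed, or 0 if truncated. */
static size_t leb128_read_signed(const unsigned char *p, size_t n,
                                 long long *out)
{
    unsigned long long v = 0;
    unsigned shift = 0;
    size_t i;

    assert(p != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        if (shift < 64)
            v |= (unsigned long long)(p[i] & 0x7f) << shift;
        shift += 7;
        if (!(p[i] & 0x80)) {
            /* Bit 6 of the final byte is the sign bit. */
            if ((p[i] & 0x40) && shift < 64)
                v |= ~0ULL << shift;
            *out = (long long)v;
            return i + 1;
        }
    }
    return 0;
}

/* Converts a PC delta in instructions to a delta in bytes. */
static long long esli_scale_pc_delta(long long insns)
{
    return insns * ESLI_INSN_BYTES;
}
```

Returning the number of bytes consumed lets a caller step through a command's parameters one after another, which suits commands such as ADD_LINE_PC that take more than one parameter.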

Table 7-2 shows how to interpret the bytes in command mode. These definitions can be found in the system header file linenum.h.

Table 7-2:  ESLI Commands

Name Value Parameters by Type
ADD_PC 1 SLEB
ADD_LINE 2 SLEB
SET_COL 3 LEB
SET_FILE 4 LEB
SET_DATA_MODE 5 LEB
ADD_LINE_PC 6 SLEB, SLEB
ADD_LINE_PC_COL 7 SLEB, SLEB, LEB
SET_LINE 8 LEB
SET_LINE_COL 9 LEB, LEB
SEQUENCE_BREAK 10 SLEB

ADD_PC

Parameter is a signed value to add to the current PC value.

ADD_LINE

Parameter is a signed value to add to the current line number.

SET_COL

Parameter is an unsigned value that represents a new column number. The column number is used to associate the PC with a particular location within a source line. Column number parameters use a zero-based representation that must be adjusted by adding 1.

SET_FILE

Parameter is an unsigned value used to switch file context. This command is typically followed by a SET_LINE command.

SET_DATA_MODE

Parameter is an unsigned value used to set the data mode that will be in effect when data mode is resumed. The only parameter values that are currently accepted are 1 and 2. Additional data modes may be defined in future releases.

ADD_LINE_PC

Both parameters are signed values. The first is added to the line number and the second is added to the PC.

ADD_LINE_PC_COL

The first two parameters are signed values and the third is an unsigned value. The first two are added to the line number and PC respectively. The third is used to set the column number.

SET_LINE

Parameter is an unsigned value that sets the current line number.

SET_LINE_COL

Both parameters are unsigned values. The first represents the line number and the second represents the column number.

SEQUENCE_BREAK

Indicates the end of a contiguous sequence of address descriptions. The value of the parameter is added to the current address, and the resulting address becomes the starting address of the next sequence of address descriptions. The current file and line number continue to apply as the current values for the new sequence as well. (These can, however, be changed using the appropriate commands.)


Version Note

The SEQUENCE_BREAK command is supported in Tru64 UNIX V5.1 and greater for symbol table format V3.13 and greater.


A tool reading the ESLI must maintain the current PC value, file number, line number, and column. Taken together, these four values represent the current "state". Consumers must also keep track of the mode in effect to interpret the data properly. A sample program is provided in Section 18.2 to illustrate consumption of ESLI.

Data encoded in ESLI can be represented in tabular format. The PC value and file, line, and column numbers can be stored as a state table. The following example shows how to build this state table.

In this example ESLI will record line numbers for a routine that includes text from a header file.

Source listing for line1.c:

1   /* ESLI example using included source lines */
2   
3   main() {
4      char *msg;
5   
6      msg = (char *)0;
7   
8   #include "line2.h"
9   
10     printf("%s", msg);
11  }

Source listing for line2.h:

1   msg = (char *)malloc(20);
2   /*
3    *
4    *
5    *
6    *
7    *
8    *
9    *
10   */
11  strcpy(msg, "Hello\n");

The compiler generates the following instructions for the example program:

      main:
[line1.c:   3] 0x1200011d0:     ldah    gp, 8192(t12)
[line1.c:   3] 0x1200011d4:     lda     gp, 28336(gp)
[line1.c:   3] 0x1200011d8:     lda     sp, -16(sp)
[line1.c:   3] 0x1200011dc:     stq     ra, 0(sp)
[line1.c:   3] 0x1200011e0:     stq     s0, 8(sp)
[line1.c:   6] 0x1200011e4:     bis     zero, zero, s0
[line2.h:   1] 0x1200011e8:     bis     zero, 0x14, a0
[line2.h:   1] 0x1200011ec:     ldq     t12, -32560(gp)
[line2.h:   1] 0x1200011f0:     jsr     ra, (t12)
[line2.h:   1] 0x1200011f4:     ldah    gp, 8192(ra)
[line2.h:   1] 0x1200011f8:     lda     gp, 28300(gp)
[line2.h:   1] 0x1200011fc:     bis     zero, v0, s0
[line2.h:  11] 0x120001200:     bis     zero, s0, a0
[line2.h:  11] 0x120001204:     lda     a1, -32768(gp)
[line2.h:  11] 0x120001208:     ldq     t12, -32600(gp)
[line2.h:  11] 0x12000120c:     jsr     ra, (t12)
[line2.h:  11] 0x120001210:     ldah    gp, 8192(ra)
[line2.h:  11] 0x120001214:     lda     gp, 28272(gp)
[line1.c:  10] 0x120001218:     ldq_u   zero, 0(sp)
[line1.c:  10] 0x12000121c:     lda     a0, -32760(gp)
[line1.c:  10] 0x120001220:     bis     zero, s0, a1
[line1.c:  10] 0x120001224:     ldq     t12, -32552(gp)
[line1.c:  10] 0x120001228:     jsr     ra, (t12)
[line1.c:  10] 0x12000122c:     ldah    gp, 8192(gp)
[line1.c:  10] 0x120001230:     lda     gp, 28244(gp)
[line1.c:  11] 0x120001234:     bis     zero, zero, v0
[line1.c:  11] 0x120001238:     ldq     ra, 0(sp)
[line1.c:  11] 0x12000123c:     ldq     s0, 8(sp)
[line1.c:  11] 0x120001240:     lda     sp, 16(sp)
[line1.c:  11] 0x120001244:     ret     zero, (ra)

The ESLI and its interpretation for the generated code are shown in the following table.

Table 7-3:  ESLI Example

Key: (M)ark, (R)esume, (F)ile, (L)ine, (C)olumn

                                 Command                 State
ESLI bytes (hex)          Mode   Code               M R  PC (hex)   F  L   C
Initial State (from PDR)  Data1                          1200011d0  0  3   0
04                        Data1                          1200011e4  0  3   0
30                        Data1                          1200011e8  0  6   0
80                        Data1  Escape
04 01                     Cmd    set_file(1)                        1
48 01                     Cmd    set_line(1)          R                1
05                        Data1                          120001200  1  1   0
80                        Data1  Escape
86 0a 06                  Cmd    add_line_pc(10,6)  M    120001218  1  11  0
04 00                     Cmd    set_file(0)                        0
48 0a                     Cmd    set_line(10)         R                10
06                        Data1                          120001234  0  10  0
16                        Data1                          120001250  0  11  0
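
The State columns of Table 7-3 can be reproduced with a small interpreter. The following is a sketch under stated assumptions, not the sample from Section 18.2. All names are hypothetical. The flag bits (mark 0x80, resume 0x40), the 4-byte instruction size, and the rule of recording a row after each data-mode entry and after each marked command are inferred from Table 7-3. Only the commands ADD_PC, ADD_LINE, SET_FILE, ADD_LINE_PC, and SET_LINE are handled; column commands, Data Mode 2, SET_DATA_MODE, SEQUENCE_BREAK, and extended deltas are rejected.

```c
#include <assert.h>
#include <stddef.h>

#define ESLI_ESCAPE 0x80u  /* data-mode byte that switches to command mode */
#define ESLI_MARK   0x80u  /* command flag bit, inferred from Table 7-3 */
#define ESLI_RESUME 0x40u  /* command flag bit, inferred from Table 7-3 */
#define INSN_BYTES  4      /* assumed fixed instruction size */

struct esli_state {
    unsigned long long pc;
    long long file, line, col;
};

/* Reads one LEB128 parameter at p[*i], sign-extending if is_signed. */
static long long read_leb(const unsigned char *p, size_t n, size_t *i,
                          int is_signed)
{
    unsigned long long v = 0;
    unsigned shift = 0;
    unsigned char b = 0;

    while (*i < n) {
        b = p[(*i)++];
        if (shift < 64)
            v |= (unsigned long long)(b & 0x7f) << shift;
        shift += 7;
        if (!(b & 0x80))
            break;
    }
    if (is_signed && (b & 0x40) && shift < 64)
        v |= ~0ULL << shift;
    return (long long)v;
}

/*
 * Replays an ESLI byte stream starting from state st (taken from the
 * procedure descriptor). Returns the number of rows produced, or -1 on a
 * byte this sketch does not handle; at most max_rows rows are stored.
 */
static int esli_replay(const unsigned char *p, size_t n, struct esli_state st,
                       struct esli_state *rows, int max_rows)
{
    size_t i = 0;
    int nrows = 0, in_cmd = 0;

    assert(p != NULL && rows != NULL);
    while (i < n) {
        unsigned char b = p[i++];

        if (!in_cmd) {                       /* Data Mode 1 */
            int delta = b >> 4;

            if (b == ESLI_ESCAPE) { in_cmd = 1; continue; }
            if (delta == 8) return -1;       /* extended delta: not handled */
            if (delta > 8) delta -= 16;      /* sign-extend the nibble */
            st.line += delta;
            st.pc += (unsigned long long)((b & 0x0f) + 1) * INSN_BYTES;
        } else {                             /* command mode */
            switch (b & 0x3f) {
            case 1:                          /* ADD_PC */
                st.pc += (unsigned long long)(read_leb(p, n, &i, 1) * INSN_BYTES);
                break;
            case 2: st.line += read_leb(p, n, &i, 1); break;  /* ADD_LINE */
            case 4: st.file = read_leb(p, n, &i, 0); break;   /* SET_FILE */
            case 6:                          /* ADD_LINE_PC */
                st.line += read_leb(p, n, &i, 1);
                st.pc += (unsigned long long)(read_leb(p, n, &i, 1) * INSN_BYTES);
                break;
            case 8: st.line = read_leb(p, n, &i, 0); break;   /* SET_LINE */
            default: return -1;              /* not handled in this sketch */
            }
            if (b & ESLI_RESUME) in_cmd = 0;
            if (!(b & ESLI_MARK)) continue;  /* no new state announced */
        }
        if (nrows < max_rows) rows[nrows] = st;
        nrows++;
    }
    return nrows;
}
```

Given the bytes from Table 7-3 and the initial state PC 0x1200011d0, file 0, line 3, this sketch records six rows whose PC, file, and line values match the rows of the table that show a PC.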

The handling of alternate entry points differs from the handling of main entry points. Procedure descriptors for alternate entry points are identified by a PDR.lnHigh value of -1. If the PC for an instruction maps to an alternate entry point, the following steps should be taken: