From:	SMTP%"jnelson@gauche.zko.dec.com" 23-SEP-1993 16:35:09.82
To:	EVERHART
CC:	
Subj:	Re: Delineating C Functions in VMS Object Files

From: jnelson@gauche.zko.dec.com (Jeff E. Nelson)
X-Newsgroups: comp.os.vms
Subject: Re: Delineating C Functions in VMS Object Files
Date: 22 Sep 1993 13:53:36 GMT
Organization: Digital Equipment Corporation
Lines: 115
Distribution: world
Message-ID: <27pld0$eve@usenet.pa.dec.com>
Reply-To: jnelson@gauche.zko.dec.com (Jeff E. Nelson)
NNTP-Posting-Host: gauche.zko.dec.com
Summary: it's not easy, nor is it complete
X-Newsreader: mxrn 6.18-5
To: Info-VAX@KL.SRI.COM
X-Gateway-Source-Info: USENET


In article <278jhm$4ha@skates.gsfc.nasa.gov>, alex@tpocc.gsfc.nasa.gov (Alex Measday) writes:
|>I've got a program, OFLOW, that generates CFLOW-style, textual structure
|>charts by extracting symbol information from VMS object files (C object
|>files, in this case).  OFLOW reads an object file and
|>
|>    (1) Extracts the function name (i.e., the "caller") from the
|>        module header record (MHD$C_MHD).
|>
|>    (2) Extracts global symbol references (TIR$C_STA_GBL commands)
|>        from the text information and relocation records (OBJ$C_TIR).
|>        These global symbols are the "callees"; i.e., the functions
|>        called by the function defined in (1).
|>
|>OFLOW was derived from a similar program I wrote years ago that was
|>mostly used on FORTRAN object files.  I vaguely recollect that the scheme
|>described above worked well with FORTRAN object files: each subroutine
|>defined in the source file appeared as a separate module in the object
|>file.
|>
|>OFLOW works well if the C source files have one function defined per
|>file, but not very well if there are multiple functions defined in a
|>source file or if the external file name doesn't match the internal
|>function name.  The external file name appears to be used as the module
|>name in the module header record and a C object file only contains one
|>module, regardless of how many functions are defined in the source file.
|>
|>QUESTION: Is there any way I can tell where the code for a function
|>starts and ends in a C object file?  Assume no optimization; I realize
|>(I discovered!) the optimizer can eliminate whole routines via inlining.
|>
|>I have pored over ANALYZE/OBJECT reports, but nothing obvious has
|>jumped out at me.   Is there something in the debugger information
|>I could use and, if so, how do I decode the debugger information?

DEBUG records are a possibility, but this approach is not easy. In an .OBJ
file, the debug information is recorded in OBJ$C_TBT and OBJ$C_DBG records,
and information in each of these is further coded into individual TIR$C_xxx
and "Store Immediate" commands. You'd have to understand the LINKER object
language first, and then understand the format of the DEBUG symbol table
records. For example, here is part of the ANALYZE/OBJECT output that
contains the DEBUG record defining a routine named "thread_action":

12.  TRACEBACK INFORMATION (OBJ$C_TBT), 274 bytes

        1)  Store Immediate, 10 bytes:
                  7  6  5  4  3  2  1  0          01234567
                ------------------------          --------
                 14 00 00 01 94 00 BF 06|  0000  |.¿......|
                                   00 BE|  0008  |š.      |

        2)  TIR$C_STA_PL (6, %X'06')                            stack depth: 1
                psect: 0
                value: 404 (%X'00000194')

        3)  TIR$C_STO_PIDR (27, %X'1B')                         stack depth: 0

        4)  Store Immediate, 17 bytes:
                  7  6  5  4  3  2  1  0          01234567
                ------------------------          --------
                 5F 64 61 65 72 68 74 0D|  0000  |.thread_|
                 B0 16 6E 6F 69 74 63 61|  0008  |action.°|
                                      00|  0010  |.       |

	...more OBJ$C_TBT stuff...
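
To make the second "Store Immediate" command above easier to read, here is a
minimal Python sketch that turns the listing rows back into bytes and pulls
out the routine name. It assumes the name is stored as a count byte followed
by that many ASCII characters, which is only inferred from the dump shown
(%X'0D' followed by the 13 characters of "thread_action"); the real DEBUG
symbol table record layout has more fields and is not modelled here. The
function names are made up for illustration.

```python
# Rows copied from the ANALYZE/OBJECT listing above. Each row shows
# bytes 7..0, so the lowest-addressed byte is the rightmost one.
ROWS = [
    "5F 64 61 65 72 68 74 0D|  0000",
    "B0 16 6E 6F 69 74 63 61|  0008",
    "                     00|  0010",
]


def rows_to_bytes(rows):
    """Return the dumped bytes in ascending address order."""
    data = bytearray()
    for row in rows:
        hex_part = row.split("|")[0]           # drop the offset column
        data.extend(reversed(bytes.fromhex(hex_part)))
    return bytes(data)


def counted_ascii(data, offset=0):
    """Read a count byte followed by that many ASCII characters.

    The counted-string layout is an assumption based on the dump above.
    """
    count = data[offset]
    return data[offset + 1 : offset + 1 + count].decode("ascii")


if __name__ == "__main__":
    print(counted_ascii(rows_to_bytes(ROWS)))
```

The three bytes left over after the name (16 B0 00) belong to whatever follows
in the record; interpreting them requires the DEBUG symbol table format.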

A second approach is to look for the actual OBJ records that define
routines. For example, here is the information which defines the C main()
routine:

45.  GLOBAL SYMBOL DIRECTORY (OBJ$C_GSD), 370 bytes

	...intervening GSD records omitted...

        13)  Entry Point Symbol and Mask Definition (GSD$C_EPM)
                data type: DSC$K_DTYPE_Z (0)
                symbol flags:
                        (0)  GSY$V_WEAK       0
                        (1)  GSY$V_DEF        1
                        (2)  GSY$V_UNI        0
                        (3)  GSY$V_REL        1
                        (4)  GSY$V_COMM       0
                psect: 0
                value: 0 (%X'00000000')
                entry mask: <R2,R3>
                symbol: "MAIN"

	...more GSD records...

You can tell that this is a routine because of the symbol type (GSD$C_EPM).
You can tell that this is a symbol definition as opposed to a reference
because the GSY$V_DEF bit is set.
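
As a rough sketch of that test in Python: the flag bit positions below are the
ones printed in the listing above, but the dictionary layout for a decoded GSD
entry is hypothetical (it assumes you have already parsed the OBJ$C_GSD record
by some other means), and the PRINTF entry is invented purely as a contrasting
example.

```python
# Bit positions as listed under "symbol flags" in the listing above.
GSY_V_WEAK, GSY_V_DEF, GSY_V_UNI, GSY_V_REL, GSY_V_COMM = range(5)


def bit_is_set(flags, bit):
    """True if the given bit position is set in the flags word."""
    return bool((flags >> bit) & 1)


def defined_routines(entries):
    """Names of entries that are GSD$C_EPM records with GSY$V_DEF set."""
    return [
        e["symbol"]
        for e in entries
        if e["type"] == "GSD$C_EPM" and bit_is_set(e["flags"], GSY_V_DEF)
    ]


# Hypothetical, already-decoded GSD entries. Only MAIN comes from the
# listing above (DEF=1, REL=1 -> flags 0b01010); PRINTF is invented
# here as an example of a plain symbol reference.
ENTRIES = [
    {"type": "GSD$C_EPM", "flags": 0b01010, "symbol": "MAIN"},
    {"type": "GSD$C_SYM", "flags": 0b00000, "symbol": "PRINTF"},
]

if __name__ == "__main__":
    print(defined_routines(ENTRIES))
```

Matching on the record type name rather than a numeric code keeps the sketch
independent of the actual GSD type values, which you should take from the
Linker manual.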

There is one unfortunate drawback with the second approach: you won't find any
information about non-global (i.e., local) routines. If you think about it,
this makes sense, since the linker is used to resolve external references; all
of the internal references are already resolved by the compiler. Depending
upon your needs, this second approach may in fact be what you're looking for.

For more information, you should take a good look at the Linker Reference
Manual. One entire chapter is devoted to the Linker's Object Language. You
need to be very familiar with it should you choose either approach.

Oh yes, one more thing: the linker's object language is different on the
OpenVMS AXP platform. Not radically different, but different enough. The DEBUG
symbol table records are different, too. The above examples are taken
from the OpenVMS VAX platform.

Hope this helps.

-Jeff E. Nelson
-Digital Equipment Corporation
-Internet: jnelson@gauche.zko.dec.com
-Affiliation given for identification purposes only
-Not an official statement of Digital Equipment Corporation