From: deleyd@gte.net
Sent: Saturday, February 12, 2000 5:07 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Please help more: Repair an RMS-file

I'm not sure if this news post worked correctly so I'll email it too.
-D.D.

RMS INTERNAL DATA STRUCTURES
by David W. Deley
©1993

Preface

This documentation examines the internal structures of RMS files on disk.

Intended Audience

The user should already have knowledge of the three file organizations: Sequential, Relative, and Indexed, and how they are used. See the VMS manual 'Guide to File Applications' for basic background information about these three file organizations. The user should also have a basic understanding of the ASCII character set and how each character is equivalent to a hexadecimal number. The user should have a basic understanding of what a file is, what a directory is, and what a disk is.

Associated Documents

The following documents contain further related information:

* Guide to VMS File Applications
* VMS File System Internals, by Kirby McCoy, DIGITAL Press.

-----------------------------------------------------------------------

RMS INTERNAL DATA STRUCTURES

PART I: FILES ON DISKS (ODS-2)

For our analysis we shall use the following familiar nursery rhyme and see how it looks inside each allowed type of file:

    Roses are red,          [14 chars]
    Violets are blue,       [17 chars]
    Sugar is sweet          [14 chars]
    And so are you!         [15 chars]

This piece of familiar prose was chosen not so much for its literary value but rather because it is short and has some lines with an even number of characters and some lines with an odd number of characters. Note for future reference the length of each line.

To begin let's use our favorite text editor such as EVE, EDT, or EDX and create a text file containing the above nursery rhyme. Call the file ROSES.DAT . You may try this as we go along to get a hands-on feel for what goes on. The file should contain exactly 4 lines, with no leading or trailing spaces.
After creating the file and exiting the editor do a $ DIR/FULL to see what we got. Note for future reference the fields marked by the three chevrons '>>>':

$ DIR/FULL ROSES.DAT

Directory DISK3:[DELEYD.RMSDOC]

>>> ROSES.DAT;1                  >>> File ID:  (18227,76,0)
>>> Size:            1/3             Owner:    [SCN,DELEYD]
    Created:   6-MAR-1993 21:58:21.41
    Revised:   3-OCT-1993 22:59:40.06 (2)
    Expires:
    Backup:
>>> File organization:  Sequential
>>> File attributes:    Allocation: 3, Extend: 0, Global buffer count: 0
                        No version limit
>>> Record format:      Variable length, maximum 17 bytes
>>> Record attributes:  Carriage return carriage control
    RMS attributes:     None
    Journaling enabled: None
    File protection:    System:RWED, Owner:RWED, Group:RWED, World:RWED
    Access Cntrl List:  None

Total of 1 file, 1/3 blocks.

Items to note are:

1. The file name is ROSES.DAT;1
2. The file ID is (18227,76,0)
3. The file size is 1 block used, 3 blocks allocated.
4. The file organization is 'Sequential'
5. The record format is 'Variable Length'
6. The record attributes are 'Carriage return carriage control'

All of this information ABOUT the file is stored in the file's header. The file header is a block of data ABOUT the file in addition to the actual data records which comprise the file itself. We can examine the contents of the file header with the following command (try this out on your ROSES.DAT file):

$ DUMP/HEADER/BLOCKS=(COUNT:0)/NOFORMAT ROSES.DAT

Dump of file DISK3:[DELEYD.RMSDOC]ROSES.DAT;1
File ID (18227,76,0)   End of file block 1 / Allocated 3

                             File Header

00000000 004C4733 02010000 FFFF6428 (d......3GL..... 000000
00010000 00030000 00110202 00000000 ................ 000010
00000000 00000000 00000000 00000046 F............... 000020
0015000B 00020000 00000000 00000000 ................ 000030
00000004 00000000 0000006B 46230000 ..#Fk........... 000040
20202020 20313B54 41442E53 45534F52 ROSES.DAT;1      000050
FDC00096 91F2EACE 4C200002 20202020     .. LÎêò...Àý 000060
00000000 00000000 00000097 37C9709C .pÉ7............ 000070
20202020 20202020 20200000 00000000 ......           000080
20202020 20202020 20202020 20202020                  000090
20202020 20202020 20202020 20202020                  0000A0
20202020 20202020 20202020 20202020                  0000B0
00000000 14174B02 20202020 20202020         .K...... 0000C0
00000000 00000000 00000000 00000000 ................ 0000D0
00000000 00000000 00000000 00000000 ................ 0000E0
00000000 00000000 00000000 00000000 ................ 0000F0
00000000 00000000 00000000 00000000 ................ 000100
00000000 00000000 00000000 00000000 ................ 000110
00000000 00000000 00000000 00000000 ................ 000120
00000000 00000000 00000000 00000000 ................ 000130
00000000 00000000 00000000 00000000 ................ 000140
00000000 00000000 00000000 00000000 ................ 000150
00000000 00000000 00000000 00000000 ................ 000160
00000000 00000000 00000000 00000000 ................ 000170
00000000 00000000 00000000 00000000 ................ 000180
00000000 00000000 00000000 00000000 ................ 000190
00000000 00000000 00000000 00000000 ................ 0001A0
00000000 00000000 00000000 00000000 ................ 0001B0
00000000 00000000 00000000 00000000 ................ 0001C0
00000000 00000000 00000000 00000000 ................ 0001D0
00000000 00000000 00000000 00000000 ................ 0001E0
CA660000 00000000 00000000 00000000 ..............fÊ 0001F0

The layout of the header block is described in appendix A. It's also more thoroughly described in the book "VMS File System Internals" by Kirby McCoy, order number EY-F575E-DP From Digital Press. Fortunately we don't have to delve into deciphering the contents of this block.
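If you are curious how the raw dump relates to the header fields, here is a minimal Python sketch (not part of any VMS tool) that decodes the first row of the dump above. It assumes that DUMP prints the four longwords of each row right to left, lowest address rightmost, with little-endian bytes, and that the first 13 bytes of the header hold, in order: four one-byte area offsets, a segment-number word, the structure version and level bytes, and the file number, sequence number, and relative volume number. The Python names are made up for illustration, and the assumed layout is checked only against the values that DIR/FULL and DUMP/HEADER/FORMAT report for this one file.

```python
import struct

# First row of the header dump shown above, copied as printed.
FIRST_ROW = "00000000 004C4733 02010000 FFFF6428"


def row_to_bytes(row):
    """Turn one DUMP row into bytes in ascending address order.

    Assumption: longwords are printed right to left, each little-endian.
    """
    longwords = [int(word, 16) for word in row.split()]
    return b"".join(lw.to_bytes(4, "little") for lw in reversed(longwords))


def decode_header_start(raw):
    """Decode the first 13 header bytes under the layout assumed above."""
    (id_offset, map_offset, acl_offset, reserved_offset, segment_number,
     struct_version, struct_level, file_number, file_sequence,
     relative_volume) = struct.unpack_from("<4BH2B2HB", raw, 0)
    return {
        "identification_area_offset": id_offset,
        "map_area_offset": map_offset,
        "access_control_area_offset": acl_offset,
        "reserved_area_offset": reserved_offset,
        "extension_segment_number": segment_number,
        "structure_level_and_version": (struct_level, struct_version),
        "file_id": (file_number, file_sequence, relative_volume),
    }


header = decode_header_start(row_to_bytes(FIRST_ROW))
for name, value in header.items():
    print(f"{name:30} {value}")
```

The area offsets appear to be counted in 16-bit words rather than bytes: an identification area offset of 40 words is byte 000050, which is the row of the dump where the file name ROSES.DAT;1 shows up.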
We can get a formatted output of the block's contents by using the /FORMAT qualifier as follows (try this out on your ROSES.DAT file):

$ DUMP/HEADER/BLOCKS=(COUNT:0)/FORMAT ROSES.DAT

Dump of file DISK3:[DELEYD.RMSDOC]ROSES.DAT;1
File ID (18227,76,0)   End of file block 1 / Allocated 3

                             File Header

Header area
    Identification area offset:           40
    Map area offset:                      100
    Access control area offset:           255
    Reserved area offset:                 255
    Extension segment number:             0
    Structure level and version:          2, 1
>>> File identification:                  (18227,76,0)
    Extension file identification:        (0,0,0)
    VAX-11 RMS attributes
>>>     Record type:                      Variable
>>>     File organization:                Sequential
>>>     Record attributes:                Implied carriage control
        Record size:                      17
>>>     Highest block:                    3
>>>     End of file block:                1
>>>     End of file byte:                 70
        Bucket size:                      0
        Fixed control area size:          0
        Maximum record size:              0
        Default extension size:           0
        Global buffer count:              0
        Directory version limit:          0
    File characteristics:
    Map area words in use:                2
    Access mode:                          0
    File owner UIC:                       [SCN,DELEYD]
    File protection:                      S:RWED, O:RWED, G:RWED, W:RWED
    Back link file identification:        (17955,107,0)
    Journal control flags:
    Active recovery units:                None
    Highest block written:                3

Identification area
>>> File name:                            ROSES.DAT;1
    Revision number:                      2
    Creation date:                         6-MAR-1993 21:58:21.41
    Revision date:                         3-OCT-1993 22:59:40.06
    Expiration date:
    Backup date:

Map area
    Retrieval pointers
        Count:          3        LBN:     726039

Checksum:                                 51814

Items to note are:

1. The file ID is (18227,76,0)
2. The record type is 'Variable' (same as "record format: Variable Length")
3. The file organization is 'Sequential'
4. The record attributes are 'Implied carriage control' (same as "Record attributes: Carriage return carriage control")
5. The highest block is 3. The end of file block is 1. (same as "Size: 1/3")
6. The file name is ROSES.DAT;1

Note that the $ DIR/FULL command gets all its information ABOUT a file from the file's header. File headers themselves are not stored as part of the file itself.
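The 'End of file byte: 70' value in the formatted header can be cross-checked against the line lengths we noted at the start. The Python sketch below assumes the variable-length record layout summarized in Part II: a 2-byte count field, then the data bytes, then one pad byte whenever the data length is odd. The function name is hypothetical.

```python
# The four lines of ROSES.DAT, with no leading or trailing spaces.
RHYME = [
    "Roses are red,",
    "Violets are blue,",
    "Sugar is sweet",
    "And so are you!",
]


def stored_size(data_length):
    """Bytes one variable-length record is assumed to occupy on disk.

    2-byte count field + data + 1 pad byte if the data length is odd,
    so that the next record starts on an even byte.
    """
    return 2 + data_length + (data_length % 2)


sizes = [stored_size(len(line)) for line in RHYME]
first_free_byte = sum(sizes)

print("bytes per record:", sizes)
print("first free byte :", first_free_byte)
```

Under these assumptions the four records occupy bytes 0 through 69 of block 1, so the first unused byte is byte 70, which is the value the header reports.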
All file headers are stored in file [000000]INDEXF.SYS . The file ID number gives us the information needed to locate our desired file header in INDEXF.SYS . From our example above, the file ID is (18227,76,0). A file ID consists of 3 parts:

Part 1 is the file number. The file number is the portion of the file ID that is required for calculating the offset of the file's header in INDEXF.SYS . In our example the file number is 18227.

Part 2 is the file sequence number. When the file that a header block points to is deleted, the header block becomes free to be reused. When the header block is reused to point to a new file, the sequence number is incremented by one. Thus after using the file number part of the file ID to locate the header block in INDEXF.SYS, a check of the file sequence number ensures that our file hasn't been deleted and that the file header is not now pointing to some new file. In our example the file sequence number is 76, indicating that this file header block has been used to point to 76 different files since INDEXF.SYS was created.

Part 3 is the relative volume number of the file. If the file is resident on a bound volume set, this number indicates the relative volume where the file physically resides. In our example the relative volume number is 0, indicating our disk is not part of a multiple volume set.

(Identify the file number, sequence number, and relative volume number for your ROSES.DAT file.)

We need two more pieces of information about the disk itself on which our file resides before we can calculate the offset into INDEXF.SYS of our file's header.
These two pieces of information are:

1: The cluster size
2: The maximum number of files allowed on the disk

This information is easily obtained with the following DCL command:

$ SHOW DEVICE/FULL ddcu:

Continuing our example:

$ SHOW DEVICE/FULL DISK3

Disk $1$DUC13: (SB3), device type RA70, is online, mounted, file-oriented
    device, shareable, served to cluster via MSCP Server, error logging is
    enabled.

    Error count                    0    Operations completed          1824293
    Owner process                 ""    Owner UIC               [SYS,SYSBOOT]
    Owner process ID        00000000    Dev Prot  S:RWED,O:RWED,G:RWED,W:RWED
    Reference count               19    Default buffer size               512
    Total blocks             1133160    Sectors per track                  60
    Total cylinders             1349    Tracks per cylinder                14
    Allocation class               1

    Volume label             "DISK3"    Relative volume number              0
 >>>Cluster size                   3    Transaction count                  44
    Free blocks               109143 >>>Maximum files allowed          141645
    Extend quantity                5    Mount count                         3
    Mount status              System    Cache name        "_$1$DIA0:XQPCACHE"
    Extent cache size             64    Maximum blocks in extent cache  10914
    File ID cache size            64    Blocks currently in extent cache 10692
    Quota cache size               0    Maximum buffers in FCP cache     1752

  Volume status: subject to mount verification, write-through caching enabled.
  Volume is also mounted on SB8, SB12.

From this we see that 'cluster size' = 3, and 'Maximum files allowed' = 141645.

(Try this out. Find the values for 'cluster size' and 'Maximum files allowed' for your disk which has your ROSES.DAT file.)

The formula for calculating the offset of our file's header block in INDEXF.SYS is:

                                    Maximum_files_allowed
    offset = (4 * cluster_size) + RUP( --------------------- ) + file_number
                                            4096

(Here RUP means round up to the nearest integer.)

The first two terms give the number of blocks in INDEXF.SYS we need to skip over before we come to the file header blocks. First we skip over the first 4 clusters of the file which are used for the boot block and home blocks. Then we skip over the index file bitmap, the size of which is dependent upon the maximum number of files allowed for the disk.
The bitmap is large enough to hold one bit for each file header indicating whether it is currently being used or not. There are 8 bits in a byte, and 512 bytes in a block, so each bitmap block accounts for 8*512=4096 files. Hence the size of the bitmap in blocks is the maximum number of files permitted on the disk divided by 4096, and rounded up to the nearest integer.

For our example we have:

                                   141645
    block_offset = (4 * 3) + RUP( ------ ) + 18227
                                    4096

                 = 12 + 35 + 18227

                 = 18274

(Calculate the offset using your values for cluster_size, Maximum_files_allowed, and file_number.)

Having performed the above calculation we can now directly access the header block for our file in INDEXF.SYS using the command:

$ DUMP/BLOCKS=(START:block_offset,COUNT:1)/FILE_HEADER ddcu:[000000]INDEXF.SYS

Try this out using your calculated value for block_offset. Verify that the header information is the same as the information obtained earlier when we used the command

$ DUMP/HEADER/BLOCKS=(COUNT:0)/FORMAT ROSES.DAT

Now that we've located the file header we can locate the file itself on the disk. At the end of the file header is the Map Area where the Retrieval Pointers are located. The Retrieval Pointers tell us where the file is located on the disk. For our example, the file starts at Logical Block Number (LBN) 726039 and extends for a count of 3 blocks. We can directly examine those blocks on the disk to verify that they do contain the contents of our file. (The following command requires LOG_IO or PHY_IO privilege.
Try this out if you can, using for START the LBN value listed at the end of your formatted file header dump):

$ DUMP/BLOCKS=(START:726039,COUNT:1) DISK3:

Dump of device DISK3: on 24-MAR-1993 20:27:24.60

Logical block number 726039 (000B1417), 512 (0200) bytes

2C646572 20657261 20736573 6F52000E ..Roses are red, 000000
6C622065 72612073 74656C6F 69560011 ..Violets are bl 000010
73207369 20726167 7553000E 002C6575 ue,...Sugar is s 000020
65726120 6F732064 6E41000F 74656577 weet..And so are 000030
00000000 00000000 FFFF0021 756F7920  you!........... 000040
00000000 00000000 00000000 00000000 ................ 000050

Note: Some systems may have a disk defragmenter program such as Diskeeper running in the background. These programs defragment the disk by silently rearranging the files on the disk when nobody's looking. If you notice your file's starting LBN has unexpectedly changed it may be because a disk defragmenter program has moved it to a new location.

We have now found the actual contents of our file. We can get the same dump of our ROSES.DAT file by using the simple command:

$ DUMP ROSES.DAT

Dump of file DISK3:[DELEYD.RMSDOC]ROSES.DAT;1
File ID (18227,76,0)   End of file block 1 / Allocated 3

Virtual block number 1 (00000001), 512 (0200) bytes

2C646572 20657261 20736573 6F52000E ..Roses are red, 000000
6C622065 72612073 74656C6F 69560011 ..Violets are bl 000010
73207369 20726167 7553000E 002C6575 ue,...Sugar is s 000020
65726120 6F732064 6E41000F 74656577 weet..And so are 000030
00000000 00000000 00000021 756F7920  you!........... 000040
00000000 00000000 00000000 00000000 ................ 000050

So far we have seen how to locate a file using the file ID. However, the usual way of accessing a file is by specifying the name of the file and what directory it's in. In our example, our file name is ROSES.DAT and it's in directory [DELEYD.RMSDOC]. This is where directory files (*.DIR) come in. The very top level directory on an ODS-2 disk is [000000]000000.DIR;1 .
This file always has file ID (4,4,0). So all we need to do is locate the header for it in the INDEXF.SYS file.

At this point you may wonder how we access the INDEXF.SYS file itself. The answer is that the location of the INDEXF.SYS file is determined when the disk is mounted and saved in the Volume Control Block (VCB). Once the disk is mounted the INDEXF.SYS file is always accessible.

So, given the file name [DELEYD.RMSDOC]ROSES.DAT;1 , we can access the contents of that file as follows:

1. Access INDEXF.SYS and obtain the file header for file ID (4,4,0) which is [000000]000000.DIR;1 . From the file header obtain the location of this file on the disk. Look in the file for a record giving the file ID of file [000000]DELEYD.DIR;1 .

2. Using the file ID just obtained for file [000000]DELEYD.DIR;1, access INDEXF.SYS and get the file header for this file, then from the file header obtain the location of this file on the disk. Look in this file for a record giving the file ID of file [DELEYD]RMSDOC.DIR;1 .

3. Using the file ID just obtained for file [DELEYD]RMSDOC.DIR;1, access INDEXF.SYS and get the file header for this file, then from the file header obtain the location of this file on the disk. Look in this file for a record giving the file ID of file [DELEYD.RMSDOC]ROSES.DAT;1 .

4. Using the file ID just obtained for file [DELEYD.RMSDOC]ROSES.DAT;1, access INDEXF.SYS and get the file header for this file, then from the file header obtain the location of this file on the disk.

We have now found the location of our desired file on the disk.

Note: This process of locating a file is a major part of what happens when a program requests to OPEN a file. Once the file is opened (i.e. the location of the file is found), the program can easily access the contents of the file.

=======================================================================

PART II: INTERNAL STRUCTURE OF RMS FILES

There are three possible file organizations:

1. Sequential
2. Relative
3. Indexed

Within each file organization there are seven possible record formats:

1. Variable
2. Fixed
3. Stream
4. Stream_CR
5. Stream_LF
6. Undefined
7. VFC (Variable-Length with Fixed-Length Control Field)

Within each record there are four possible record attributes:

1. Carriage_control Carriage_return
2. Fortran
3. Print
4. None

The following abbreviations are used throughout the rest of this paper:

Abbreviations:

File Formats:
    SEQ - Sequential
    REL - Relative
    IND - Indexed

Record Formats:
    VAR - Variable length
    FIX - Fixed length
    STR - Stream
    SCR - Stream_CR
    SLF - Stream_LF
    UND - Undefined
    VFC - Variable with Fixed length control

Record Attributes:
    CAR - Carriage_control Carriage_return
    FOR - Fortran
    PRT - Print
    NON - None

Not all combinations of file organization, record format, and record attributes are allowed. The tables below list all the allowable combinations. 'X' marks an allowable combination:

Table of allowable combinations:

    SEQUENTIAL
            VAR  FIX  STR  SCR  SLF  UND  VFC
      CAR    X    X    X    X    X    X    X
      FOR    X    X    X    X    X    X    X
      PRT    .    .    .    .    .    .    X
      NON    X    X    X    X    X    X    X

    RELATIVE
            VAR  FIX  STR  SCR  SLF  UND  VFC
      CAR    X    X    .    .    .    .    X
      FOR    X    X    .    .    .    .    X
      PRT    .    .    .    .    .    .    X
      NON    X    X    .    .    .    .    X

    INDEXED
            VAR  FIX  STR  SCR  SLF  UND  VFC
      CAR    X    X    .    .    .    .    .
      FOR    X    X    .    .    .    .    .
      PRT    .    .    .    .    .    .    .
      NON    X    X    .    .    .    .    .

There are a total of 38 different possible combinations (22 sequential, 10 relative, and 6 indexed).

The Guide to VMS File Applications explains in detail the different file organizations, record formats, and attributes. A brief summary is given here:

FILE ORGANIZATIONS:

SEQUENTIAL: In a sequential file the records follow one another in sequential order. Records are accessed in sequential order, starting with the first record and continuing through the file until the last record is reached. Attempting to access a record after the last record results in an End Of File status.

RELATIVE: In a relative file the records are placed in fixed length cells which follow one another in sequential order.
The records may be accessed sequentially as in a sequential file, or the records may be accessed directly via record number.

INDEXED: In an indexed file the placement of the records themselves is transparent to the user. RMS stores and retrieves the records for the user. A fixed portion of each record is declared to be the primary key. For example, the first 10 bytes of each record could be declared the primary key. Records may then be retrieved by specifying the primary key portion of the desired record. Records may also be retrieved in sequential order, with the records being sorted according to the primary key.

RECORD FORMAT:

VARIABLE: The data portion of each record is preceded with a 2-byte binary count field that specifies the length of the record in bytes, excluding the count field itself. Each new record begins on an even byte (WORD aligned). A null character is appended to the end of the previous record if necessary so the next record will begin on a word boundary.

FIXED: All records are of a fixed length.

STREAM: The data portion of each record is terminated by two characters: a carriage-return (0x0D byte) followed by a line-feed (0x0A byte). This sequence of characters may not appear within the data portion of a record as it indicates the end of a record.

STREAM_CR: The data portion of each record is terminated by a carriage-return character (0x0D byte). This character may not appear within the data portion of a record as it indicates the end of a record.

STREAM_LF: The data portion of each record is terminated by a line-feed character (0x0A byte). This character may not appear within the data portion of a record as it indicates the end of a record.

UNDEFINED: There are no records per se; the file is just a stream of bytes starting with the first byte and ending with the last byte.

VFC: Variable-Length with Fixed-Length Control Field. The data portion of each record is preceded by four bytes.
[Diagram fragment: file header layout, hex offsets 14 through 1FC]

---------------------------------------------------------------------
| FAT$W_RSIZE                     | FAT$B_RATTRIB  | FAT$B_RTYPE    | 14  <-- FH2$W_RECATTR (offset to record attributes block)
---------------------------------------------------------------------
| FAT$L_HIBLK                                                       | 18
---------------------------------------------------------------------
| FAT$L_EFBLK                                                       | 1C
---------------------------------------------------------------------
| FAT$B_VFCSIZE  | FAT$B_BKTSIZE  | FAT$W_FFBYTE                    | 20
---------------------------------------------------------------------
| FAT$W_DEFEXT                    | FAT$W_MAXREC                    | 24
---------------------------------------------------------------------
|                                 | FAT$W_GBC                       | 28
---------------------------------------------------------------------
|                                                                   | 2C
---------------------------------------------------------------------
| FAT$W_VERSIONS                  |                                 | 30
---------------------------------------------------------------------
|                                                                   | 34
---------------------------------------------------------------------
| FH2$B_ACC_MODE | FH2$B_MAP_INUSE| FH2$W_RECPROT                   | 38
---------------------------------------------------------------------
| (FH2$W_UICGROUP)         FH2$L_FILEOWNER         (FH2$W_UICMEMBER)| 3C
---------------------------------------------------------------------
| FH2$W_BACKLINK                  | FH2$W_FILEPROT                  | 40
---------------------------------------------------------------------
| FH2$B_BK_FIDNMX| FH2$B_BK_FIDRVN| FH2$W_BK_FIDSEQ                 | 44
---------------------------------------------------------------------
|                                 | FH2$B_RU_ACTIVE| FH2$B_JOURNAL  | 48
---------------------------------------------------------------------
| FH2$L_HIGHWATER                                                   | 4C
---------------------------------------------------------------------
//                                                                 \\
\\                                                                 //
---------------------------------------------------------------------
| FH2$W_CHECKSUM                  |                                 | 1FC
---------------------------------------------------------------------

Bit layout of FH2$W_FILEPROT:

                       111111
                       5432109876543210
                      ------------------
    FH2$W_FILEPROT -> |DEWRDEWRDEWRDEWR|
                      ------------------

The first two bytes (WORD) give the length of the record in bytes. The third byte is the PREFIX byte which contains carriage control information for a printer or output device to be executed prior to printing the record. The fourth byte is the SUFFIX byte which contains carriage control information for a printer or output device to be executed after prin