
HP OpenVMS RTL Library (LIB$) Manual



ROUTINE TEST( TPARSE_ARGUMENT_BLOCK : REF BLOCK[ , BYTE ] ) = 
BEGIN 
 
TPARSE_ARGUMENT_BLOCK[ TPA$V_ABBREV ] = 1 
 
END; 

  • On VAX systems, LIB$TPARSE uses a nonstandard linkage that establishes the address of the argument block as the routine's actual argument pointer. Therefore, an action routine can reference fields in the argument block by their symbolic offsets relative to the AP (argument pointer) register.
    For example:


    ROUTINE TEST = 
    BEGIN 
     
    BUILTIN 
        AP; 
     
    BIND 
        TPARSE_ARGUMENT_BLOCK = AP : REF BLOCK[ , BYTE ]; 
     
    TPARSE_ARGUMENT_BLOCK[ TPA$V_ABBREV ] = 1 
     
    END; 
    

    3.1.2 Action Routine Return Values
    The action routine returns a value to LIB$T[ABLE_]PARSE in R0 that controls execution of the current state transition. If the action routine returns success (low bit set in R0), LIB$T[ABLE_]PARSE proceeds with the execution of the state transition. If the action routine returns failure (low bit clear in R0), LIB$T[ABLE_]PARSE rejects the transition that was being processed and acts as if the symbol type of that transition had not matched. It then evaluates the remaining transitions in that state for eligibility.

    Note

    Prior to calling an action routine, LIB$T[ABLE_]PARSE sets the low bit of R0 to make it easier for the action routine to return success.

    If an action routine returns a nonzero failure status to LIB$T[ABLE_]PARSE and no subsequent transitions in that state match, LIB$T[ABLE_]PARSE returns the status of the action routine rather than LIB$_SYNTAXERR. When the action routine is written as a longword-valued function in a high-level language, its function value is the status returned in R0.
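    The following C sketch models these rules. It is not the LIB$T[ABLE_]PARSE implementation. The structure, the function names, and the status values MODEL_SUCCESS and MODEL_SYNTAXERR are hypothetical. Each transition simply records whether its symbol type matched, rather than being evaluated against a real input string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values used only in this model. As on OpenVMS,
   an odd value means success and an even value means failure. */
#define MODEL_SUCCESS   1u
#define MODEL_SYNTAXERR 0x100u  /* stands in for LIB$_SYNTAXERR */

typedef uint32_t (*action_routine)(void);

/* One transition of a state: whether its symbol type matched the
   front of the input string, and an optional action routine. */
struct model_transition {
    bool symbol_matches;
    action_routine action;  /* NULL when no action routine is given */
};

/* Evaluate the transitions of one state in order. On success, store the
   index of the transition taken and return MODEL_SUCCESS. Otherwise return
   the most recent nonzero failure status from an action routine, or
   MODEL_SYNTAXERR if there was none. */
static uint32_t evaluate_state(const struct model_transition *t, size_t n,
                               size_t *taken)
{
    uint32_t action_failure = 0;

    assert(t != NULL && taken != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!t[i].symbol_matches)
            continue;
        if (t[i].action != NULL) {
            uint32_t status = t[i].action();
            if ((status & 1u) == 0) {    /* low bit clear: reject */
                if (status != 0)
                    action_failure = status;
                continue;                /* try the next transition */
            }
        }
        *taken = i;
        return MODEL_SUCCESS;
    }
    return action_failure != 0 ? action_failure : MODEL_SYNTAXERR;
}

/* Sample action routines for the model. */
static uint32_t reject_with_status(void) { return 0x2A0u; }  /* even, nonzero */
static uint32_t accept_transition(void)  { return MODEL_SUCCESS; }
```

In this model, a rejecting action routine lets a later matching transition be taken. If nothing else matches, the action routine's status is reported instead of MODEL_SYNTAXERR.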

    3.1.3 Using an Action Routine to Reject a Transition
    An action routine can intentionally return a failure status to force LIB$T[ABLE_]PARSE to reject a transition. This allows you to implement symbol types specific to particular applications. To recognize a specialized symbol type, code a state transition using a LIB$T[ABLE_]PARSE symbol type that describes a superset of the desired set of possible tokens. The associated action routine then performs the additional discrimination necessary and returns success or failure to LIB$T[ABLE_]PARSE, which then accordingly executes or fails to execute the transition.

    A pure finite-state machine, for instance, has difficulty recognizing strings that are shorter than some maximum length or accepting numeric values confined to some particular range.
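    The following C sketch shows two action routines of this kind. One narrows a transition coded with a decimal-number symbol type to the range 1 to 31. The other accepts a string token only if it is shorter than a limit. The model_arg_block structure, its two fields (loose stand-ins for the argument block's token count and NUMBER fields), and the limit MAX_NAME_LENGTH are hypothetical. A real action routine would use the argument block layout defined for your language.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, much-reduced stand-in for the argument block. */
struct model_arg_block {
    uint32_t token_count;  /* length of the token just matched */
    uint32_t number;       /* value of a numeric token */
};

#define MAX_NAME_LENGTH 8u  /* hypothetical limit for the example */

/* Action routine for a transition that accepts any decimal number (the
   superset). It accepts only days of the month by returning success
   (low bit set) or failure (low bit clear). */
static uint32_t check_day_of_month(const struct model_arg_block *ab)
{
    assert(ab != NULL);
    return (ab->number >= 1 && ab->number <= 31) ? 1u : 0u;
}

/* Action routine for a transition that accepts any string (the superset).
   It rejects strings that are not shorter than MAX_NAME_LENGTH. */
static uint32_t check_short_name(const struct model_arg_block *ab)
{
    assert(ab != NULL);
    return ab->token_count < MAX_NAME_LENGTH ? 1u : 0u;
}
```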

    3.2 Blanks in the Input String
    The default mode of operation in LIB$T[ABLE_]PARSE is to treat blanks as separators. That is, they can appear between any two tokens in the string being parsed without being called for by transitions in the state table. Because blanks are significant in some situations, LIB$T[ABLE_]PARSE processes blanks if you have set the bit TPA$V_BLANKS in the options longword of the argument block. The following input string shows the difference in operation:


    ABC  DEF 
    

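    LIB$T[ABLE_]PARSE recognizes this string by different sequences of state transitions, depending on the state of the blanks control flag. With TPA$V_BLANKS clear, two consecutive states, each with a TPA$_STRING transition, accept ABC and DEF, and the intervening blanks are discarded automatically. With TPA$V_BLANKS set, an additional state with a TPA$_BLANK transition must accept the blanks between the two strings.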

    Your action routines can set or clear TPA$V_BLANKS as LIB$T[ABLE_]PARSE enters or leaves sections of the state table in which blanks are significant. LIB$T[ABLE_]PARSE always checks the blanks control flag as it enters a state. If the flag is clear, it removes any space or tab characters present at the front of the input string before it proceeds to evaluate transitions. Note that when the TPA$V_BLANKS flag is clear, the TPA$_BLANK symbol type will never match. If TPA$V_BLANKS is set, you must explicitly process blanks.
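    The blank-handling rule can be sketched as a small C model that reports the sequence of symbol types a state table would have to accept for ABC  DEF. The enum, the function names, and the simplified matching are assumptions of this model, not the real TPA$_ codes or matching rules. Here an alphanumeric run counts as a string and a run of spaces and tabs counts as a blank.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified symbol types for this model; not the real TPA$_ codes. */
enum model_symbol { SYM_STRING, SYM_BLANK, SYM_OTHER, SYM_EOS };

static bool is_blank(char c) { return c == ' ' || c == '\t'; }

/* Called on entry to each state. If the blanks flag is clear, leading
   spaces and tabs are removed first, so a blank can never be matched. */
static enum model_symbol next_symbol(const char **input, bool blanks_flag)
{
    const char *p = *input;

    if (!blanks_flag)
        while (is_blank(*p))
            p++;
    if (*p == '\0') {
        *input = p;
        return SYM_EOS;
    }
    if (is_blank(*p)) {                /* reachable only with the flag set */
        while (is_blank(*p))
            p++;
        *input = p;
        return SYM_BLANK;
    }
    if (isalnum((unsigned char)*p)) {
        while (isalnum((unsigned char)*p))
            p++;
        *input = p;
        return SYM_STRING;
    }
    *input = p + 1;                    /* any other single character */
    return SYM_OTHER;
}

/* Record up to max symbols (excluding end of string); return the count. */
static size_t symbol_sequence(const char *input, bool blanks_flag,
                              enum model_symbol *out, size_t max)
{
    size_t n = 0;
    enum model_symbol s;

    assert(input != NULL && out != NULL);
    while (n < max && (s = next_symbol(&input, blanks_flag)) != SYM_EOS)
        out[n++] = s;
    return n;
}
```

With the flag clear, the model reports two strings. With the flag set, it reports a string, a blank, and a string.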

    3.3 Special Characters in the Input String
    Not all members of the ASCII character set can be entered directly in the state table definitions. Examples include the single quotation mark and all control characters.

    In MACRO state tables, you can specify such a character as the symbol type by writing any assembler expression whose value is the ASCII code of the character, without enclosing single quotes. For example, you could code a transition to match a backspace character as follows:


    BACKSPACE = 8 
       .
       .
       .
    $TRAN BACKSPACE, ... 
    

    MACRO places extra restrictions on the use of a comma in macro arguments; often a comma must be enclosed in one or more pairs of angle brackets. Using a symbolic name for the comma avoids such difficulties.

    To build a transition matching such a single character in a BLISS state table, you can use the %CHAR lexical function as follows:


    LITERAL BACKSPACE = 8; 
       .
       .
       .
    $STATE (label, 
           (%CHAR (BACKSPACE), ... ) 
            ); 
    

    3.4 Abbreviating Keywords
    The default mode of LIB$T[ABLE_]PARSE is exact match: each keyword in the input string must match the spelling, length, and case of a keyword in the state table. However, many languages (command languages in particular) allow you to abbreviate keywords. For this reason, LIB$T[ABLE_]PARSE has three abbreviation facilities that permit the recognition of abbreviated keywords when the state table lists only the full spellings. All three are controlled by flags and options defined in the argument block OPTIONS field. Table lib-11 describes these flags.

    Table lib-11 Keyword Abbreviation Flags
    Flag Description
    TPA$B_MCOUNT
    TPA64$B_MCOUNT
    By setting a value in the MCOUNT argument block field, the calling program or action routine specifies a minimum number of characters from the abbreviated keyword that must be present for a match to occur. For example, setting the byte to the value 4 would allow the keyword DEASSIGN to appear in an input string as DEAS, DEASS, DEASSI, DEASSIG, or DEASSIGN.

    LIB$T[ABLE_]PARSE checks every character of the keyword string present in the input, so incorrect spellings beyond the minimum abbreviation are not permitted. For example, with the minimum set to 4, DEASX does not match DEASSIGN.

    TPA$V_ABBRFM
    TPA64$V_ABBRFM
    If you set the ABBRFM flag in the argument block OPTIONS field, LIB$T[ABLE_]PARSE recognizes any leftmost substring of a keyword as a match for that keyword. LIB$T[ABLE_]PARSE does not check for ambiguity; it matches the first keyword listed in the state table of which the input token is a leftmost substring.

    For proper recognition of ambiguous keywords, the keywords in each state must be arranged in alphabetical order by the ASCII collating sequence as follows:

    Dollar sign ($)
    Numerics
    Uppercase alphabetics
    Underscore (_)
    Lowercase alphabetics
    TPA$V_ABBREV
    TPA64$V_ABBREV
    If you set the ABBREV flag in the argument block OPTIONS field, LIB$T[ABLE_]PARSE recognizes any abbreviation of a keyword as long as it is unambiguous among the keywords in that state.

    If LIB$T[ABLE_]PARSE finds that the front of the input string contains an ambiguous keyword string, it sets the AMBIG flag in the OPTIONS field and refuses to recognize any keyword transitions in that state. (It still accepts other symbol types.) The AMBIG flag can be checked by an action routine that is called when coming out of that state, or by the calling program if LIB$T[ABLE_]PARSE returns with a syntax error status. LIB$T[ABLE_]PARSE clears the flag when it enters the next state.

    If both the ABBRFM and ABBREV flags are set, ABBRFM takes precedence.

    Note

    Using a keyword abbreviation option can permit short abbreviations or ambiguity, which restricts the extensibility of a language. Adding a new keyword can make a formerly valid abbreviation ambiguous.
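    The three abbreviation modes can be compared with the following C sketch. It is an illustrative model, not the LIB$T[ABLE_]PARSE algorithm, and the function names are hypothetical. Two details are assumptions:

    • Under MCOUNT, the full keyword always matches, and an MCOUNT of 0 means exact match only.
    • Under ABBREV, an exact match is accepted even when it is also the start of a longer keyword.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if token is a nonempty leftmost substring (prefix) of keyword. */
static bool is_prefix(const char *token, const char *keyword)
{
    size_t tlen = strlen(token);
    return tlen > 0 && tlen <= strlen(keyword) &&
           strncmp(token, keyword, tlen) == 0;
}

/* MCOUNT model: at least mcount characters must be present, and every
   character present must match the keyword. */
static bool match_mcount(const char *keyword, const char *token, size_t mcount)
{
    size_t tlen = strlen(token);

    if (!is_prefix(token, keyword))
        return false;
    return tlen == strlen(keyword) || (mcount > 0 && tlen >= mcount);
}

/* ABBRFM model: the first keyword in table order of which the token is a
   leftmost substring wins; there is no ambiguity check. -1 if none. */
static int match_abbrfm(const char *const *kw, size_t n, const char *token)
{
    for (size_t i = 0; i < n; i++)
        if (is_prefix(token, kw[i]))
            return (int)i;
    return -1;
}

/* ABBREV model: the token must abbreviate exactly one keyword in the
   state; if it abbreviates several, *ambig is set and no keyword matches. */
static int match_abbrev(const char *const *kw, size_t n, const char *token,
                        bool *ambig)
{
    int found = -1;
    size_t hits = 0;

    assert(ambig != NULL);
    *ambig = false;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(token, kw[i]) == 0)
            return (int)i;               /* assumption: exact match wins */
        if (is_prefix(token, kw[i])) {
            hits++;
            found = (int)i;
        }
    }
    if (hits > 1) {
        *ambig = true;                   /* like setting the AMBIG flag */
        return -1;
    }
    return found;
}
```

Take the keywords DEASSIGN, DEFINE, and DELETE, listed in ASCII order. Under ABBRFM, the input DE matches DEASSIGN because it is listed first. Under ABBREV, the same input is ambiguous.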

    3.5 Using Subexpressions
    LIB$T[ABLE_]PARSE subexpressions are analogous to subroutines within the state table. You can use subexpressions as you would use subroutines in any program:

    A subexpression call is indicated with the MACRO expression !label or the BLISS expression (label) as the transition type argument. Transfer of control to a subexpression causes LIB$T[ABLE_]PARSE to call itself recursively, using the same argument block and keyword table as the original call, and using the specified state label as a starting state.

    The following statement is an example of a $TRAN macro that calls a subexpression:


    $TRAN !Q_STRING,,,,Q_DESCRIPTOR 
    
    In this example, Q_STRING is the label of another state, a subexpression, in the same state table.

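    When LIB$T[ABLE_]PARSE evaluates a transition that transfers control to a subexpression, it evaluates the subexpression's transitions against the remaining input string. If the subexpression reaches TPA$_EXIT, the calling transition is treated as a match for all of the characters the subexpression accepted. If the subexpression reaches TPA$_FAIL, LIB$T[ABLE_]PARSE backs up the input string and treats the calling transition as not matching.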
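    The backup behavior can be sketched in C as a save-and-restore of the input position around the call. The parser structure, the function names, and the example subexpression (digits followed by a colon) are hypothetical. The real routine calls itself recursively with the same argument block and keyword table rather than through a C function pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parse context: the input string and the current position. */
struct model_parser {
    const char *input;
    size_t pos;
};

/* A subexpression either exits (returns true) or fails (returns false),
   consuming input as it goes. */
typedef bool (*subexpression)(struct model_parser *);

/* Call a subexpression as the symbol type of a transition. On exit, the
   token spans everything the subexpression consumed; on failure, the
   input is backed up, as if the transition had not matched. */
static bool call_subexpression(struct model_parser *p, subexpression sub,
                               size_t *token_start, size_t *token_length)
{
    assert(p != NULL && sub != NULL);
    size_t saved = p->pos;

    if (sub(p)) {
        *token_start = saved;
        *token_length = p->pos - saved;
        return true;
    }
    p->pos = saved;   /* back up the input string */
    return false;
}

/* Example subexpression: one or more digits followed by a colon. */
static bool digits_then_colon(struct model_parser *p)
{
    size_t start = p->pos;

    while (isdigit((unsigned char)p->input[p->pos]))
        p->pos++;
    if (p->pos == start || p->input[p->pos] != ':')
        return false;    /* fail, even though digits were consumed */
    p->pos++;
    return true;         /* exit */
}
```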

    3.5.1 Using Action Routines and Storing Data in a Subexpression
    Be careful when designing subexpressions whose transitions provide action routines or use the mask and msk-adr arguments. As LIB$T[ABLE_]PARSE processes the state transitions of a subexpression, it calls the specified action routines and performs the stores specified by the mask and msk-adr arguments. If the subexpression fails, LIB$T[ABLE_]PARSE backs up the input string and resumes processing in the calling state. However, any effect that an action routine has had on the caller's database cannot be undone.

    If subexpressions are used only as state table subroutines, there is usually no harm done, because when a subexpression fails in this mode, the parse generally fails. This is not true of pushdown or nondeterministic parsing. In applications where you expect subexpressions to fail, design action routines to store results in temporary storage. You can then make these results permanent at the main level, where the flow of control is deterministic.
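    A minimal C sketch of the temporary-storage pattern follows; all names are hypothetical. It makes three assumptions:

    • One action routine copies the committed results into scratch storage before a subexpression that may fail.
    • Action routines inside the subexpression write only to the scratch copy.
    • The action routine on the main-level transition that calls the subexpression commits the scratch copy. This routine is assumed to run only if the subexpression exits successfully.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VALUES 8

/* Hypothetical results the caller keeps. */
struct results {
    int count;
    int values[MAX_VALUES];
};

/* Hypothetical context reachable from the action routines. */
struct parse_state {
    struct results committed;  /* permanent: changed only at main level */
    struct results scratch;    /* temporary: written inside subexpressions */
};

/* Action routine run before a subexpression that may fail:
   start the scratch copy from the committed results. */
static unsigned begin_tentative(struct parse_state *ps)
{
    assert(ps != NULL);
    ps->scratch = ps->committed;
    return 1u;
}

/* Action routine inside the subexpression: records only in scratch. */
static unsigned record_value(struct parse_state *ps, int value)
{
    if (ps->scratch.count >= MAX_VALUES)
        return 0u;                 /* failure: reject the transition */
    ps->scratch.values[ps->scratch.count++] = value;
    return 1u;
}

/* Action routine on the main-level transition that called the
   subexpression: make the tentative results permanent. */
static unsigned commit_results(struct parse_state *ps)
{
    ps->committed = ps->scratch;
    return 1u;
}
```

If the subexpression fails, only the scratch copy is affected. The next call to begin_tentative discards it.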

    3.5.2 An Example: Parsing a Quoted String
    The following example is an excerpt of a state table that parses a string quoted by an arbitrary character. The table interprets the first character that appears as a quote character. Many text editors and some programming languages contain this sort of construction.

    LIB$T[ABLE_]PARSE processes a transition that invokes a subexpression as it would any other transition:


    ;+ 
    ; Main level state table. The first transition accepts and 
    ; stores the quoting character. 
    ;- 
         $STATE    STRING 
         $TRAN     TPA$_ANY,,,,Q_CHAR 
    ;+ 
    ; Call the subexpression to accept the quoted string and store 
    ; the string descriptor. Note that the descriptor spans all 
    ; the characters accepted by the subexpression. 
    ;- 
         $STATE 
         $TRAN     !Q_STRING,,,,Q_DESCRIPTOR 
         $TRAN     TPA$_LAMBDA,TPA$_FAIL 
    ;+ 
    ; Accept the trailing quote character, left behind by the 
    ; subexpression 
    ;- 
         $STATE 
         $TRAN     TPA$_ANY,NEXT 
    ;+ 
    ; Subexpression to scan the quoted string. The second transition 
    ; matches until it is rejected by the action routine. The subexpression 
    ; should never encounter the end of string before the final quoting 
    ; character. 
    ;- 
         $STATE     Q_STRING 
         $TRAN     TPA$_EOS,TPA$_FAIL 
         $TRAN     TPA$_ANY,Q_STRING,TEST_Q 
         $TRAN     TPA$_LAMBDA,TPA$_EXIT 
    ;+ 
    ; The following MACRO subroutine compares the current character 
    ; with the quoting character and returns failure if it matches. 
    ;- 
     
    TEST_Q: .WORD     0                     ; null entry mask 
            CMPB      TPA$B_CHAR(AP),Q_CHAR ; check the character 
            BNEQ      10$                   ; note R0 is already 1 
            CLRL      R0                    ; match - reject transition 
    10$:    RET 
    
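    The following C function walks the same steps as this table directly, to make its control flow concrete. It is an illustrative sketch only. The function and the descriptor structure are hypothetical, and the function treats every character literally rather than modelling blank skipping (TPA$V_BLANKS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal descriptor (address and length), in the spirit of Q_DESCRIPTOR. */
struct model_descriptor {
    const char *pointer;
    size_t length;
};

/* Models the table: STRING accepts the quoting character, the Q_STRING
   subexpression accepts characters until TEST_Q rejects the quote, and the
   next state accepts the trailing quote. On success, *desc spans the
   quoted text and *rest points just past the closing quote. */
static bool parse_quoted(const char *input, struct model_descriptor *desc,
                         const char **rest)
{
    assert(input != NULL && desc != NULL && rest != NULL);

    char q_char = input[0];            /* TPA$_ANY stores Q_CHAR */
    if (q_char == '\0')
        return false;

    const char *start = input + 1;     /* subexpression starts here */
    const char *p = start;
    for (;;) {
        if (*p == '\0')
            return false;              /* TPA$_EOS,TPA$_FAIL */
        if (*p == q_char)
            break;                     /* TEST_Q rejects; TPA$_LAMBDA exits */
        p++;                           /* TPA$_ANY,Q_STRING accepted */
    }
    desc->pointer = start;             /* spans the subexpression's input */
    desc->length = (size_t)(p - start);
    *rest = p + 1;                     /* main level accepts trailing quote */
    return true;
}
```

For the input /abc/ tail, the descriptor covers abc, and the remaining input starts at the blank after the closing slash. An unterminated string fails, as the TPA$_EOS,TPA$_FAIL transition does.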

    3.5.3 An Example: Parsing a Complex Grammar
    The following example is an excerpt from a state table that shows how to use subexpressions to parse a complex grammar. The state table accepts a number followed by a keyword qualifier. Depending on the keyword, the table interprets the number as decimal, octal, or hexadecimal. The state table accepts strings such as the following:

    10/OCTAL
    32768/DECIMAL
    77AF/HEX

    This sort of grammar is difficult to parse with a deterministic finite-state machine. Using a subexpression look-ahead of two states permits a simpler expression of the state tables.


    ;+ 
    ; Main state table entry. Accept a number of some type and store 
    ; its value at the location NUMBER. 
    ;- 
         $STATE 
         $TRAN     !OCT_NUM,NEXT,,,NUMBER 
         $TRAN     !DEC_NUM,NEXT,,,NUMBER 
         $TRAN     !HEX_NUM,NEXT,,,NUMBER 
    ;+ 
    ; Subexpressions to accept an octal number followed by the OCTAL 
    ; qualifier. 
    ;- 
         $STATE     OCT_NUM 
         $TRAN      TPA$_OCTAL 
         $STATE 
         $TRAN      '/' 
         $STATE 
         $TRAN      'OCTAL',TPA$_EXIT 
    ;+ 
    ; Subexpression to accept a decimal number followed by the DECIMAL 
    ; qualifier. 
    ;- 
         $STATE     DEC_NUM 
         $TRAN      TPA$_DECIMAL 
         $STATE 
         $TRAN      '/' 
         $STATE 
         $TRAN      'DECIMAL',TPA$_EXIT 
    ;+ 
    ; Subexpression to accept a hex number followed by the HEX 
    ; qualifier. 
    ;- 
         $STATE     HEX_NUM 
         $TRAN      TPA$_HEX 
         $STATE 
         $TRAN      '/' 
         $STATE 
         $TRAN      'HEX',TPA$_EXIT 
    

    Note that the transitions that follow a match with a numeric token do not disturb the NUMBER field in the argument block. This allows the main level subexpression call to retrieve it when the subexpression returns.
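    The look-ahead can be imitated in C by trying each number-and-qualifier alternative in turn and abandoning it if any part fails. This sketch rests on three assumptions, and its names are hypothetical:

    • strtoul stands in loosely for TPA$_OCTAL, TPA$_DECIMAL, and TPA$_HEX. It also accepts leading white space, a sign, and a 0x prefix in base 16, which those symbol types may not.
    • The qualifier must end the input.
    • Each later alternative restarts from the beginning of the string, which models the backup of the input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One main-level alternative: the radix of the number and the qualifier
   that must follow it (mirrors OCT_NUM, DEC_NUM, and HEX_NUM). */
struct number_form {
    int base;
    const char *qualifier;
};

static const struct number_form forms[] = {
    { 8, "OCTAL" }, { 10, "DECIMAL" }, { 16, "HEX" },
};

/* Try the alternatives in the order of the main-level transitions. An
   alternative succeeds only if the number, the slash, and its qualifier
   all match; otherwise the next one starts again from the beginning. */
static bool parse_number(const char *input, unsigned long *number)
{
    for (size_t i = 0; i < sizeof forms / sizeof forms[0]; i++) {
        char *end;
        unsigned long value = strtoul(input, &end, forms[i].base);

        if (end == input || *end != '/')
            continue;                    /* number or '/' did not match */
        if (strcmp(end + 1, forms[i].qualifier) != 0)
            continue;                    /* wrong qualifier: back up */
        *number = value;                 /* numeric value left undisturbed */
        return true;
    }
    return false;
}
```

For 32768/DECIMAL, the octal attempt stops at the digit 8, so the slash does not match and the decimal alternative is tried next. 19/OCTAL is rejected by all three alternatives.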

    3.6 LIB$T[ABLE_]PARSE and Modularity
    To use LIB$T[ABLE_]PARSE in a modular and shareable fashion:

    4 Data Representation
    This section describes the binary representation and allocation of a LIB$T[ABLE_]PARSE state table and a keyword table. While this information is not required to use LIB$T[ABLE_]PARSE, it may be useful in debugging your program.

    4.1 State Table Representation
    Each state consists of its transitions concatenated in memory. LIB$T[ABLE_]PARSE equates the state label to the address of the first byte of the first transition. A marker in the last transition identifies the end of the state. The LIB$T[ABLE_]PARSE table macros build the state table in the PSECT _LIB$STATE$.

    Each transition in a state consists of 2 to 23 bytes containing the arguments of the transition. The state table generation macros do not allocate storage for arguments not specified in the transition macro. This allows simple transitions to be represented efficiently. For example, the following transition, which simply accepts the character "?" and falls through to the next state, is represented in two bytes:

    $TRAN '?'

    In this section, pointers described as self-relative are signed displacements from the address following the end of the pointer (this is identical to branch displacements in the OpenVMS VAX instruction set).
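    The following C sketch works through the self-relative rule, resolving 16-bit and 32-bit self-relative pointers stored in a byte image of a table. The function names are hypothetical. The sketch assumes little-endian byte order (as on VAX, Alpha, and I64 systems) and returns offsets within the image rather than addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Resolve a 16-bit signed self-relative pointer stored at byte offset
   `at`. The displacement is measured from the byte that follows the
   2-byte pointer. Returns the target as an offset within the image. */
static long long resolve_word_pointer(const uint8_t *table, size_t at)
{
    assert(table != NULL);
    uint32_t raw = (uint32_t)table[at] | ((uint32_t)table[at + 1] << 8);
    long long displacement = raw >= 0x8000u ? (long long)raw - 0x10000
                                            : (long long)raw;
    return (long long)(at + 2) + displacement;
}

/* The same rule for a 32-bit self-relative pointer (4-byte longword). */
static long long resolve_longword_pointer(const uint8_t *table, size_t at)
{
    assert(table != NULL);
    uint32_t raw = (uint32_t)table[at]
                 | ((uint32_t)table[at + 1] << 8)
                 | ((uint32_t)table[at + 2] << 16)
                 | ((uint32_t)table[at + 3] << 24);
    long long displacement = raw >= 0x80000000u ? (long long)raw - 0x100000000LL
                                                : (long long)raw;
    return (long long)(at + 4) + displacement;
}
```

For example, a word pointer at offset 4 containing -6 (bytes FA FF) resolves to offset 4 + 2 - 6 = 0.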

    Table lib-12 describes the elements of a state transition in the order in which they appear, if present, in the transition. If a transition does not include a specific option, no bytes are assigned to the option within the transition.

    Table lib-12 Binary Representation of a LIB$T[ABLE_]PARSE State Transition
    Transition Element No. of Bytes Description
    Symbol type 1 The first byte of a transition always contains the binary coding of the symbol type accepted by this transition. Bit 0 of the first flags byte controls the interpretation of the type byte. If the bit is clear, the type byte represents a single character (the 'x' construct). If the bit is set, the type byte is one of the other type codes (keyword, number, and so on). The following table lists the symbol types accepted by LIB$T[ABLE_]PARSE:
    Symbol Type Binary Encoding
    'x' ASCII code of the character (8 bits)
    'keyword' The keyword index (0 to 219)
    TPA$_DECIMAL_64 (Alpha and I64 only) 228
    TPA$_OCTAL_64 (Alpha and I64 only) 229
    TPA$_HEX_64 (Alpha and I64 only) 230
    TPA$_NODE_ACS 231
    TPA$_NODE_PRIMARY 232
    TPA$_NODE 233
    TPA$_FILESPEC 234
    TPA$_UIC 235
    TPA$_IDENT 236
    TPA$_ANY 237
    TPA$_ALPHA 238
    TPA$_DIGIT 239
    TPA$_STRING 240
    TPA$_SYMBOL 241
    TPA$_BLANK 242
    TPA$_DECIMAL 243
    TPA$_OCTAL 244
    TPA$_HEX 245
    TPA$_LAMBDA 246
    TPA$_EOS 247
    TPA$_SUBEXPR 248 (subexpression call)
      (Other codes are reserved for expansion)
        Use of the TPA$_FILESPEC, TPA$_NODE, TPA$_NODE_PRIMARY, or TPA$_NODE_ACS symbol type results in calls to the $FILESCAN system service. Use of the symbol type TPA$_IDENT results in calls to the $ASCTOID system service. If your application of LIB$T[ABLE_]PARSE runs in an environment other than OpenVMS user mode, you must carefully evaluate whether use of these services is consistent with your environment.
    First flags byte 1 This byte contains the following bits, which specify the options of the transition. It is always present.
    Bit Description
    0 Set if the type byte is not a single character.
    1 Set if the second flags byte is present.
    2 Set if this is the last transition in the state.
    3 Set if a subexpression pointer is present.
    4 Set if an explicit target state is present.
    5 Set if the mask longword is present.
    6 Set if the msk-adr longword is present.
    7 Set if an action routine address is present.
    Second flags byte 1 This byte is present if any of its flag bits is set. It contains an additional flag describing the transition. It is used as follows:
    Bit Description
    0 Set if the action routine argument is present.
    Subexpression pointer 2 This word is present in transitions that are subexpression calls. It is a 16-bit signed self-relative pointer to the starting state of the subexpression.
    Argument longword 4 This longword field contains the 32-bit action routine argument, when specified.
    Action routine address 4 This longword contains a self-relative pointer to the action routine, when specified.
    Bit mask 4 This longword contains the mask argument, when specified.
    Mask address 4 This longword, when specified, contains a self-relative pointer through which the mask, or data that depends on the symbol type, is to be stored. Because the pointer is self-relative, when it points to an absolute location, the state table is not PIC (position-independent code).
    Transition target 2 This word, when specified, contains the address of the target state of the transition. The address is stored as a 16-bit signed self-relative pointer. The final state TPA$_EXIT is coded as a word whose value is -1; the failure state TPA$_FAIL is coded as a word whose value is -2.
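    Every optional element is flagged in the first or second flags byte. The length of a transition, and therefore the start of the next one, can therefore be computed from the flags alone. The following C sketch does this using the bit assignments and element sizes in Table lib-12. The constant and function names are hypothetical, and the sketch does not check for malformed data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in the first flags byte, from Table lib-12. */
enum {
    TF_TYPE_NOT_CHAR = 1 << 0,
    TF_FLAGS2        = 1 << 1,
    TF_LAST          = 1 << 2,
    TF_SUBEXPR       = 1 << 3,
    TF_TARGET        = 1 << 4,
    TF_MASK          = 1 << 5,
    TF_MSKADR        = 1 << 6,
    TF_ACTION        = 1 << 7
};
#define TF2_ARGUMENT 0x01u  /* bit 0 of the second flags byte */

/* Length in bytes of the transition starting at t. */
static size_t transition_length(const uint8_t *t)
{
    assert(t != NULL);
    uint8_t flags = t[1];
    size_t len = 2;                          /* type byte + first flags */

    if (flags & TF_FLAGS2) {
        len += 1;                            /* second flags byte */
        if (t[2] & TF2_ARGUMENT)
            len += 4;                        /* argument longword */
    }
    if (flags & TF_SUBEXPR) len += 2;        /* subexpression pointer */
    if (flags & TF_ACTION)  len += 4;        /* action routine address */
    if (flags & TF_MASK)    len += 4;        /* bit mask */
    if (flags & TF_MSKADR)  len += 4;        /* mask address */
    if (flags & TF_TARGET)  len += 2;        /* transition target */
    return len;
}

/* Count the transitions of one state; the last one has TF_LAST set. */
static size_t count_transitions(const uint8_t *state)
{
    size_t n = 0;

    for (;;) {
        n++;
        if (state[1] & TF_LAST)
            return n;
        state += transition_length(state);
    }
}
```

A transition with every element present comes to the 23-byte maximum: 1 + 1 + 1 + 2 + 4 + 4 + 4 + 4 + 2.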

    4.2 Keyword Table Representation
    The keyword table is a vector of 16-bit signed pointers that address locations in the keyword string area, relative to the start of the keyword vector. It is the structure to which the $INIT_STATE macro equates its second argument.
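    The following C sketch builds a tiny keyword vector and string area in memory. It then looks up a keyword by its index, the value held in the type byte of a keyword transition. The structure and function names are hypothetical. The zero byte that terminates each keyword string is an assumption of this sketch; this section does not describe how the real string area delimits keywords.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model memory image: a keyword vector of 16-bit signed offsets, measured
   from the start of the vector, followed by the keyword string area. */
struct model_keyword_image {
    int16_t vector[3];
    char    strings[32];
};

/* Fill the image with up to three keywords (zero-terminated by assumption). */
static void build_image(struct model_keyword_image *img,
                        const char *const *words, size_t n)
{
    size_t pos = 0;

    assert(img != NULL && n <= 3);
    memset(img, 0, sizeof *img);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(words[i]) + 1;   /* include the terminator */
        assert(pos + len <= sizeof img->strings);
        memcpy(img->strings + pos, words[i], len);
        img->vector[i] =
            (int16_t)(offsetof(struct model_keyword_image, strings) + pos);
        pos += len;
    }
}

/* Look up keyword number `index`: the vector entry is an offset relative
   to the start of the vector itself. */
static const char *keyword_at(const void *vector_base, unsigned index)
{
    const int16_t *vector = (const int16_t *)vector_base;
    return (const char *)vector_base + vector[index];
}
```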

