HP OpenVMS MACRO Compiler Porting and User's Guide


2.10 Using Floating-Point Instructions

All floating-point instructions and directives, with the exception of POLYx, EMODx and all H_floating instructions, are supported.

These instructions are emulated by means of subroutine calls. This support is provided to allow hands-off compatibility for most existing VAX MACRO modules and is not designed for fast floating-point performance.

Besides the overhead of the emulation routine call, on OpenVMS Alpha systems all floating-point operands must be passed through memory, because the Alpha architecture does not have instructions to move values directly from the integer registers to the floating-point registers. In addition, on the first floating-point instruction, the FEN (floating-point enable) bit is set for the process, which causes the entire floating-point register set to be saved and restored on every context switch for the life of the image.

2.10.1 Differences Between the OpenVMS VAX and OpenVMS Alpha/I64 Implementations

The differences between the implementations on OpenVMS VAX and OpenVMS Alpha/I64 systems are noted in the following list:

2.10.2 Impact on Routines in Other Languages

This support does not make the floating-point register set visible to the compiler. It simply allows floating-point operations to be done on the integer registers. This means that routines in other languages that want to interface with a VAX MACRO routine, either calling it or being called by it, must not expect any floating-point values as inputs or outputs. Compilers for other languages will pass these values in the floating-point registers. Floating-point arguments can be passed into or out of a VAX MACRO routine only by pointer.

Calls to run-time library (RTL) routines of other languages fall into this category. For example, a call to MTH$RANDOM returns a floating value in floating-point register F0. The compiler cannot directly read F0. You need to either create a jacket routine (in another language), which makes the call to MTH$RANDOM and then moves the result to R0, or write a separate routine that only does the move.
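
A jacket routine of the kind described above might look like the following C sketch. The names random_jacket, rtl_fn, and demo_rtl are hypothetical, and the RTL routine is passed in as a function pointer so that the sketch does not assume any particular MTH$ declaration; a real jacket would call the RTL routine directly. The float is handed back as its raw 32-bit pattern in an integer return value and, optionally, through a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

/* Hypothetical shape of an RTL routine that returns a float by value
   and takes its seed by reference. */
typedef float (*rtl_fn)(uint32_t *seed);

/* Jacket: call the routine, then return the result as raw bits in an
   integer and, optionally, store it through a pointer. */
uint32_t random_jacket(rtl_fn rtl, uint32_t *seed, float *result_out)
{
    assert(rtl != NULL && seed != NULL);
    float f = rtl(seed);              /* result arrives as a floating value */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* copy the bit pattern, no conversion */
    if (result_out != NULL)
        *result_out = f;              /* by-pointer alternative */
    return bits;                      /* integer return value */
}

/* Stand-in used only to exercise the jacket; not a real RTL routine. */
float demo_rtl(uint32_t *seed)
{
    *seed += 1;
    return 0.5f;
}
```

The VAX MACRO caller would then treat the returned longword as the floating value, or pass the address of a longword to receive it.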

2.11 Preserving VAX Atomicity and Granularity

The VAX architecture includes instructions that perform a read-modify-write memory operation so that it appears to be a single, noninterruptible operation in a uniprocessing system. Atomicity is the term used to describe the ability to modify memory in one operation. Because the complexity of such instructions severely limits performance, read-modify-write operations on an Alpha or I64 system can be performed only by nonatomic, interruptible instruction sequences.

Furthermore, VAX instructions can address single, aligned or unaligned byte, word, and longword locations in memory without affecting the surrounding memory locations. (A data item is considered aligned if its address is an exact multiple of the item's size in bytes.) Granularity is the term used to describe the ability to independently write to portions of aligned longwords.
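
The alignment rule in the parenthetical above can be expressed as a one-line check. This is a minimal sketch; the function name is_aligned is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of size. The size must be a power of two
   (1, 2, 4, or 8 for byte, word, longword, or quadword). */
bool is_aligned(uintptr_t addr, unsigned size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (uintptr_t)(size - 1)) == 0;
}
```

For example, address 0x1002 is aligned for a word but not for a longword.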

Because byte, word, and unaligned longword access also severely limits performance, an OpenVMS Alpha system can only access aligned longword or quadword locations. Therefore, a sequence of instructions to write a single byte, word, or unaligned longword causes some of the surrounding bytes to be read and rewritten.

While Itanium has instructions for accessing bytes and words, there is a performance penalty if they are unaligned.

These architectural differences can cause data to become corrupted under certain conditions.

In an OpenVMS Alpha system, atomicity and granularity preservation are not provided by locking out other threads from modifying memory, but by providing a way to determine if a piece of memory may have been modified during the read-modify-write operation. In this case, the read-modify-write operation is retried.

In an OpenVMS I64 system, atomicity is achieved by retrying the operation as for OpenVMS Alpha.

To ensure data integrity, the compiler provides certain qualifiers and directives to be used for the conditions described in the following sections.

2.11.1 Preserving Atomicity

On OpenVMS VAX, OpenVMS Alpha, and OpenVMS I64 multiprocessing systems, an application in which multiple, concurrent threads can modify shared data in a writable global section must have some way of synchronizing their access to that data. On an OpenVMS VAX single-processor system, a memory modification instruction is sufficient to provide synchronized access to shared data. However, it is not sufficient on OpenVMS Alpha or OpenVMS I64 systems.

The compiler provides the /PRESERVE=ATOMICITY option to guarantee the integrity of read-modify-write operations for VAX instructions that have a memory modify operand. Alternatively, you can insert the .PRESERVE ATOMICITY and .NOPRESERVE ATOMICITY directives in sections of VAX MACRO source code as required to enable and disable atomicity.

For instance, assume the following instruction, which requires a read, modify, and write sequence on the data pointed to by R1:


INCL (R1) 

In an OpenVMS VAX system, the microcode performs these three operations. Therefore, an interrupt cannot occur until the sequence is fully completed.

In an OpenVMS Alpha system, the following three instructions are required to perform the one VAX instruction:


LDL     R27, (R1) 
ADDL    R27, 1, R27 
STL     R27, (R1) 

Similarly, in an OpenVMS I64 system, the following four instructions are required:


ld4     r22 = [r9] 
sxt4    r22 = r22 
adds    r22 = 1, r22 
st4     [r9] = r22 

The problem with these Alpha and Itanium code sequences is that an interrupt can occur between any two of the instructions. If the interrupt causes an AST routine to execute, or causes another process to be scheduled, between the load and the store (the LDL and the STL in the Alpha sequence), and the AST or other process updates the data pointed to by R1, the store writes a result to (R1) that is based on stale data.
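
The lost update can be reproduced deterministically by modeling the three steps in C and running a second update between the load and the store. This is only a sketch of the interleaving, not generated code; the function names, including the ast_increment stand-in for an AST routine, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three steps of the nonatomic sequence, kept separate so that
   other code can run between them. */
int32_t step_load(const int32_t *p)       { assert(p != NULL); return *p; } /* LDL  */
int32_t step_add(int32_t v)               { return v + 1; }                 /* ADDL */
void    step_store(int32_t *p, int32_t v) { assert(p != NULL); *p = v; }    /* STL  */

/* Hypothetical stand-in for an AST routine that increments the same cell. */
void ast_increment(int32_t *p) { *p += 1; }

/* INCL with an "interrupt" delivered between the load and the store. */
int32_t incl_with_interrupt(int32_t *p)
{
    int32_t tmp = step_load(p);   /* mainline reads the old value */
    ast_increment(p);             /* AST updates memory */
    tmp = step_add(tmp);          /* mainline adds to its stale copy */
    step_store(p, tmp);           /* the AST's update is overwritten */
    return *p;
}
```

Starting from 10, two increments have run but the cell holds 11, because the mainline store discards the AST's update.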

When an atomic operation is required, and /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified, the compiler generates the following Alpha instruction sequence for INCL (R1):


Retry:  LDL_L   R28,(R1) 
        ADDL    R28,#1,R28 
        STL_C   R28,(R1) 
 
        BEQ     R28, fail 
         . 
         . 
         . 
fail:   BR      Retry 

and the following Itanium instruction sequence:


$L3:    ld4            r23 = [r9] 
        mov.m          apccv = r23 
        mov            r22 = r23 
        sxt4           r23 = r23 
        adds           r23 = 1, r23 
        cmpxchg4.acq   r23, [r9] = r23 
        cmp.eq         pr0, pr6 = r22, r23 
  (pr6) br.cond.dpnt.few $L3   

On the OpenVMS Alpha system, if (R1) is modified by any other code thread on the current or any other processor during this sequence, the Store Longword Conditional instruction (STL_C) will not update (R1), but will indicate an error by writing 0 into R28. In this case, the code branches back and retries the operation until it completes without interference.

The BEQ Fail and BR Retry are done instead of a BEQ Retry because the branch prediction logic of the Alpha architecture assumes that backward conditional branches will be taken. Since this operation will rarely need to be retried, it is more efficient to make a forward conditional branch which is assumed not to be taken.

Because of the way atomicity is preserved on OpenVMS Alpha systems, this guarantee of atomicity applies to both uniprocessor and multiprocessor systems. This guarantee applies only to the actual modify instruction and does not extend interlocking to subsequent or previous memory accesses (see Section 2.11.6).

The OpenVMS I64 version of the code uses the compare-exchange instruction (cmpxchg) to implement the locked access, but the effect is the same: If other code has modified the location being modified here, then this code will loop back and retry the operation.
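
The same retry pattern can be sketched in portable C11, where atomic_compare_exchange_weak plays the role of the conditional store (STL_C) or cmpxchg4. This is an analogy under that assumption, not the code the compiler emits; the function name atomic_incl is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Increment *p with a load / compute / conditional-store retry loop.
   Returns the value that was stored. */
uint32_t atomic_incl(_Atomic uint32_t *p)
{
    assert(p != NULL);
    uint32_t old = atomic_load(p);
    /* On failure, compare_exchange refreshes 'old' with the current
       contents, so each retry recomputes old + 1 from fresh data. */
    while (!atomic_compare_exchange_weak(p, &old, old + 1)) {
        /* another thread changed *p (or the store failed spuriously) */
    }
    return old + 1;
}
```

As in the generated sequences, the loop body is normally executed once; it repeats only when the location changed between the load and the store.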

You should take special care in porting an application to an OpenVMS Alpha or OpenVMS I64 system if it involves multiple processes that modify shared data in a writable global section, even if the application executes only on a single processor. Additionally, you should examine any application in which a mainline process routine modifies data in process space that can also be modified by an asynchronous system trap (AST) routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications (footnote 1) for a more complete discussion of the programming issues involved in read-modify-write operations in an Alpha system.

Warning

When preserving atomicity, the compiler generates aligned memory instructions that cannot be handled by the Alpha PALcode unaligned fault handler. They will cause a fatal reserved operand fault on unaligned addresses. Therefore, all memory references for which .PRESERVE ATOMICITY is specified must be to aligned addresses (see Section 2.11.5).

2.11.2 Preserving Granularity

To preserve the granularity of a VAX MACRO memory write instruction on a byte, word, or unaligned longword on Alpha means to guarantee that the instruction executes successfully on the specified data and preserves the integrity of the surrounding data.

The VAX architecture includes instructions that perform independent access to byte, word, and unaligned longword locations in memory so two processes can write simultaneously to different bytes of the same aligned longword without interfering with each other.

The Alpha architecture, as originally implemented, defined instructions that could address only aligned longword and quadword operands. However, byte and word operands for load and store were later added.

On Alpha, code that writes a data field to memory that is less than a longword in length or is not aligned can do so only by using an interruptible instruction sequence that involves a quadword load, an insertion of the modified data into the quadword, and a quadword store. In this case, two processes that intend to write to different bytes in the same quadword will actually load, perform operations on, and store the whole quadword. Depending on the timing of the load and store operations, one of the byte writes could be lost.
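
The lost byte write can be shown with a small C model in which two writers both load the quadword before either one stores it. The function names are hypothetical, and byte 0 is taken to be the least significant byte.

```c
#include <assert.h>
#include <stdint.h>

/* Replace byte 'idx' (0 = least significant) of quadword q with v. */
uint64_t insert_byte(uint64_t q, unsigned idx, uint8_t v)
{
    assert(idx < 8);
    uint64_t mask = (uint64_t)0xFF << (8 * idx);
    return (q & ~mask) | ((uint64_t)v << (8 * idx));
}

/* Two writers target different bytes of the same quadword, but both
   load before either stores. Returns the final memory contents. */
uint64_t interleaved_byte_writes(uint64_t mem)
{
    uint64_t a = mem;                 /* writer A: quadword load */
    uint64_t b = mem;                 /* writer B: quadword load */
    a = insert_byte(a, 0, 0xAA);      /* A inserts byte 0 */
    b = insert_byte(b, 1, 0xBB);      /* B inserts byte 1 */
    mem = a;                          /* A: quadword store */
    mem = b;                          /* B: quadword store, byte 0 reverts */
    return mem;
}
```

Starting from zero, the result is 0xBB00 rather than 0xBBAA: writer A's byte is lost even though the two writers never touched the same byte.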

The Itanium architecture has byte, word, longword, and quadword addressability, so granularity of access is easily obtained without having to request it with an option or declaration.

The compiler provides the /PRESERVE=GRANULARITY option to guarantee the integrity of byte, word, and unaligned longword writes. The /PRESERVE=GRANULARITY option causes the compiler to generate Alpha instructions that provide granularity preservation for any VAX instructions that write to bytes, words, or unaligned longwords. Alternatively, you can insert the .PRESERVE GRANULARITY and .NOPRESERVE GRANULARITY directives in sections of VAX MACRO source code as required to enable and disable granularity preservation.

For example, the instruction MOVB R1, (R2) generates the following Alpha code sequence:


LDQ_U     R23, (R2) 
INSBL     R1, R2, R22 
MSKBL     R23, R2, R23 
BIS       R23, R22, R23 
STQ_U     R23, (R2) 

If any other code thread modifies part of the data pointed to by (R2) between the LDQ_U and the STQ_U instructions, that data will be overwritten and lost.

The following Itanium code sequence is generated:


st1     [r28] = r9 

If you have specified that granularity be preserved for the same instruction, by either the command qualifier or the directive, the Alpha code sequence becomes the following:


          BIC       R2,#^B0111,R24 
RETRY:    LDQ_L     R28,(R24) 
          MSKBL     R28,R2,R28 
          INSBL     R1,R2,R25 
          BIS       R25,R28,R25 
          STQ_C     R25,(R24) 
          BEQ       R25, FAIL 
           . 
           . 
           . 
FAIL:     BR        RETRY 

In this case, if the data pointed to by (R2) is modified by another code thread, the operation will be retried.

The Itanium code sequence would be unchanged, because the code is already only writing to the affected memory locations.
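
The granularity-preserving Alpha sequence can be modeled in C11 as a compare-exchange loop on the enclosing quadword. This is a sketch of the technique, with atomic_compare_exchange_weak standing in for the LDQ_L/STQ_C pair; all function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Split a byte address into the enclosing quadword's base and the byte
   offset within it, as the BIC with #^B0111 does for the base. */
uintptr_t quad_base(uintptr_t addr)   { return addr & ~(uintptr_t)7; }
unsigned  quad_offset(uintptr_t addr) { return (unsigned)(addr & 7); }

/* Store byte v at offset 'off' of *q without losing concurrent
   updates to the other seven bytes. */
void store_byte_preserving(_Atomic uint64_t *q, unsigned off, uint8_t v)
{
    assert(q != NULL && off < 8);
    uint64_t mask = (uint64_t)0xFF << (8 * off);
    uint64_t old = atomic_load(q);
    uint64_t new_val;
    do {
        /* mask out the old byte and insert the new one (MSKBL, INSBL, BIS) */
        new_val = (old & ~mask) | ((uint64_t)v << (8 * off));
        /* on failure 'old' is refreshed and the merge is redone */
    } while (!atomic_compare_exchange_weak(q, &old, new_val));
}
```

Because the merge is recomputed from the refreshed quadword on every retry, a concurrent write to a neighboring byte is carried forward instead of being overwritten.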

For a MOVW R1,(R2) instruction, the Alpha code generated to preserve granularity depends on whether the register R2 is currently assumed to be aligned by the compiler's register alignment tracking. If R2 is assumed to be aligned, the compiler generates essentially the same code as in the preceding MOVB example, except that it uses INSWL and MSKWL instructions instead of INSBL and MSKBL, and it uses #^B0110 in the BIC of the R2 address. If R2 is assumed to be unaligned, the compiler generates two separate LDQ_L/STQ_C pairs to ensure that the word is correctly written even if it crosses a quadword boundary.

Similarly, for Itanium, the compiler will simply generate st2 [r28] = r9 if the address is word-aligned.
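
The case that forces two LDQ_L/STQ_C pairs on Alpha is a word whose two bytes fall in different quadwords. A minimal check for that condition, with the hypothetical name crosses_quadword, might be:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if an item of 'size' bytes starting at addr spans two quadwords. */
bool crosses_quadword(uintptr_t addr, unsigned size)
{
    assert(size >= 1 && size <= 8);
    return (addr & 7) + size > 8;
}
```

A word at an address ending in binary 111 crosses the boundary; a word at any other offset within the quadword does not, even if it is unaligned.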

Warning

The code generated for an aligned word write, with granularity preservation enabled, will cause a fatal reserved operand fault at run time if the address is not aligned. If the address being written to could ever be unaligned, inform the compiler that it should generate code that can write to an unaligned word by using the compiler directive .SET_REGISTERS UNALIGNED=Rn immediately before the write instruction.

To preserve the granularity of a MOVL R1,(R2) instruction, the compiler always writes whole longwords with a STL instruction, even if the address to which it is writing is assumed to be unaligned. If the address is unaligned, the STL instruction will cause an unaligned memory reference fault. The PALcode unaligned fault handler will then do the loads, masks, and stores necessary to write the unaligned longword. Because PALcode is noninterruptible, the surrounding memory locations are not corrupted.

When porting an application to an OpenVMS Alpha system, you should determine whether the application performs byte, word, or unaligned longword writes to memory that is shared either with processes executing on the local processor, or with processes executing on another processor in the system, or with an AST routine or condition handler. See Migrating to an OpenVMS AXP System: Recompiling and Relinking Applications for a more complete discussion of the programming issues involved in granularity operations in an OpenVMS Alpha system.

Note

INSV instructions do not generate code that correctly preserves granularity when granularity is turned on.

2.11.3 Precedence of Atomicity Over Granularity

If you enable the preservation of both granularity and atomicity, and the compiler encounters VAX code that requires that both be preserved, atomicity takes precedence over granularity.

For example, the instruction INCW 1(R0), when compiled with .PRESERVE GRANULARITY, retries the write of the new word value if it is interrupted. However, when compiled with .PRESERVE ATOMICITY, it will also refetch the initial value and increment it again if interrupted. If both options are specified, it will do the latter.

In addition, while the compiler can successfully generate code for unaligned words and longwords that preserves granularity, it cannot generate code for unaligned words or longwords that preserves atomicity. If both options are specified, all memory references must be to aligned addresses.
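
The difference between the two retry strategies in the INCW example can be modeled in C11. This is a behavioral sketch only, not the compiler's output: the quadword is an atomic 64-bit value, the interruption is injected through a hypothetical hook, and every name below is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook standing in for an interruption that modifies memory. */
typedef void (*interrupt_fn)(_Atomic uint64_t *q);

static uint16_t get_word(uint64_t q, unsigned off) { return (uint16_t)(q >> (8 * off)); }
static uint64_t put_word(uint64_t q, unsigned off, uint16_t w)
{
    uint64_t mask = (uint64_t)0xFFFF << (8 * off);
    return (q & ~mask) | ((uint64_t)w << (8 * off));
}

/* Granularity only: the new word is computed once; only the write retries. */
void incw_granularity(_Atomic uint64_t *q, unsigned off, interrupt_fn intr)
{
    assert(q != NULL && off <= 6);
    uint16_t w = (uint16_t)(get_word(atomic_load(q), off) + 1);
    if (intr) intr(q);                        /* interruption after the read */
    uint64_t old = atomic_load(q);
    while (!atomic_compare_exchange_weak(q, &old, put_word(old, off, w))) { }
}

/* Atomicity: every retry refetches the word and increments it again. */
void incw_atomicity(_Atomic uint64_t *q, unsigned off, interrupt_fn intr)
{
    assert(q != NULL && off <= 6);
    uint64_t old = atomic_load(q);
    if (intr) intr(q);                        /* interruption after the read */
    while (!atomic_compare_exchange_weak(
               q, &old, put_word(old, off, (uint16_t)(get_word(old, off) + 1)))) { }
}

/* Example interruption: adds 1 to the word at offset 0 and sets byte 7. */
void bump_word0_and_byte7(_Atomic uint64_t *q)
{
    uint64_t v = atomic_load(q);
    v = put_word(v, 0, (uint16_t)(get_word(v, 0) + 1));
    atomic_store(q, v | ((uint64_t)0x80 << 56));
}
```

Starting from a word value of 5 with the interruption above, the granularity-only version keeps byte 7 but ends with 6 in the word, so one increment is lost. The atomicity version ends with 7 and also keeps byte 7.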

2.11.4 When Atomicity Cannot Be Guaranteed

Because compiler atomicity guarantees only affect memory modification operands in VAX instructions, you should take special care in examining VAX MACRO sources for coding problems that /PRESERVE=ATOMICITY cannot resolve on OpenVMS Alpha or OpenVMS I64 systems.

For example, consider the following VAX instruction:


ADDL2 (R1),4(R1) 

For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:


        LDL     R28,(R1) 
Retry:  LDL_L   R24,4(R1) 
        ADDL    R28,R24,R24 
        STL_C   R24,4(R1) 
        BEQ     R24,fail 
        . 
        . 
        . 
fail:   BR      Retry 

Note that, in this code sequence, when the STL_C fails, only the modify operand is reread before the add. The data (R1) is not reread. This behavior differs slightly from VAX behavior. In an OpenVMS VAX system, the entire instruction would execute without interruption; in an OpenVMS Alpha or OpenVMS I64 system, only the modify operand is updated atomically.

As a result, code that requires the read of the data (R1) to be atomic must use another method, such as a lock, to obtain that level of synchronization.

For this instruction, the compiler generates an Itanium code sequence such as the following:


        ld4     r19 = [r9] 
        sxt4    r19 = r19 
        adds    r16 = 4, r9 
$L4:    ld4     r17 = [r16] 
        mov.m   apccv = r17 
        mov     r15 = r17 
        sxt4    r17 = r17 
        add     r17 = r19, r17 
        cmpxchg4.acq r17, [r16] = r17 
        cmp.eq  pr0, pr7 = r15, r17 
(pr7)   br.cond.dpnt.few $L4 
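
The limited scope of the guarantee for ADDL2 (R1),4(R1) can be seen in a C11 model of the sequences above: the source operand is read once, outside the retry loop, and only the modify operand takes part in the compare-exchange. The function name addl2_model and the two-element array layout are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Model of ADDL2 (R1),4(R1): cell[0] is the source operand and
   cell[1] the modify operand. Returns the value stored into cell[1]. */
uint32_t addl2_model(_Atomic uint32_t cell[2])
{
    assert(cell != NULL);
    uint32_t src = atomic_load(&cell[0]);   /* read once, never reread */
    uint32_t old = atomic_load(&cell[1]);
    while (!atomic_compare_exchange_weak(&cell[1], &old, old + src)) {
        /* a retry refreshes only 'old'; 'src' may be stale by now */
    }
    return old + src;
}
```

If cell[0] changes while the loop is retrying, the stored sum still uses the earlier value, which is why a lock or similar mechanism is needed when the source read must also be synchronized.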

Consider another VAX instruction:


MOVL    (R1),4(R1) 

For this instruction, the compiler generates an Alpha code sequence such as the following whether or not atomicity preservation was turned on:


LDL     R28,(R1) 
STL     R28,4(R1) 

The VAX instruction in this example is atomic on a single VAX CPU, but the Alpha instruction sequence is not atomic on a single Alpha CPU. Because the 4(R1) operand is a write operand and not a modify operand, the operation is not made atomic by the use of the LDL_L and STL_C.

On OpenVMS I64 systems, the code sequence would be something like the following:


ld4     r14 = [r9] 
sxt4    r14 = r14 
adds    r24 = 4, r9 
st4     [r24] = r14 

Finally, consider a more complex VAX INCL instruction:


INCL    @(R1) 

For this instruction, the compiler generates an Alpha code sequence such as the following, when /PRESERVE=ATOMICITY (or .PRESERVE ATOMICITY) is specified:


        LDL     R28,(R1) 
Retry:  LDL_L   R24,(R28) 
        ADDL    R24,#1,R24 
        STL_C   R24,(R28) 
        BEQ     R24,fail 
        . 
        . 
        . 
fail:   BR      Retry 

Here, only the update of the modify data is atomic. The fetch required to obtain the address of the modify data is not part of the atomic sequence.

On OpenVMS I64 systems, the code sequence would be similar to the following:


        ld4     r16 = [r9] 
        sxt4    r16 = r16 
$L5:    ld4     r14 = [r16] 
        mov.m   apccv = r14 
        mov     r24 = r14 
        sxt4    r14 = r14 
        adds    r14 = 1, r14 
        cmpxchg4.acq r14, [r16] = r14 
        cmp.eq  pr0, pr8 = r24, r14 
(pr8)   br.cond.dpnt.few $L5 

Note

1 This manual has been archived but is available on the OpenVMS Documentation CD-ROM.

