HP OpenVMS MACRO Compiler Porting and User's Guide

HP OpenVMS MACRO Compiler
Porting and User's Guide

Contents

Index

Note

If the code is being locked because the IPL will be raised above 2, where page faults cannot occur, make sure that the delimited code does not call run-time library routines or other procedures. The VAX MACRO compiler generates calls to routines to emulate certain VAX instructions. An image that uses these macros must link against the system base image so that references to these routines are resolved by code in a nonpageable executive image.

Image Initalization-Time Lockdown

For image initialization-time lockdown, three macros are used:

$LOCKED_PAGE_START
$LOCKED_PAGE_END
$LOCK_PAGE_INIT

The macros $LOCKED_PAGE_START and $LOCKED_PAGE_END mark the beginning and end of a code segment which may be locked. The code delineated by these macros must contain complete routines---execution cannot fall through either macro, nor can the locked code be branched into or out of. Any attempt to branch into or out of the locked code section, or to fall through the macros will be flagged by the compiler with the following error message:

%AMAC-E-MULTLKSEC, Routines which share code must use the same linkage psect.

$LOCKED_PAGE_END has an optional parameter, LINK_SECT, which is used to specify the linkage psect to return to after the routine is executed. It is only used if the linkage psect in effect when the $LOCKED_PAGE_START macro was executed was not the default linkage psect, $LINKAGE.

The macro $LOCK_PAGE_INIT must be executed in the initialization routines of an image which is using $LOCKED_PAGE_START and $LOCKED_PAGE_END to delineate areas to be locked. It creates the necessary psects and issues the $LKWSET calls to lock the code and linkage sections into the working set. R0 and R1 are destroyed by this macro.

$LOCK_PAGE_INIT has an optional parameter, ERROR, which is an error address to which to branch if one of the $LKWSET calls fail. If this address is reached, R0 reflects the status of the failed call, and R1 contains 0 if the call to lock the code failed, or 1 if that call succeeded but the call to lock the linkage section failed.

Note that since psects are used to identify code to be locked, the $LOCK_PAGE_INIT macro need not be in the same module as the code delineated by the $LOCKED_PAGE_START and $LOCKED_PAGE_END macros. The invocation of $LOCK_PAGE_INIT locks all delineated code in the entire image.

Table 3-1 shows the code changes necessary for using these macros. The delineating labels are replaced by the $LOCKED_PAGE_START and $LOCKED_PAGE_END macros. The descriptor is eliminated, and the $LKWSET call in the initialization code is replaced by $LOCK_PAGE_INIT.

Table 3-1 Image Initialization-Time Lockdown
Code Section On OpenVMS VAX Systems On OpenVMS Alpha Systems

Data declaration
LOCK_DESCRIPTOR:
.ADDRESS LOCK_START
.ADDRESS LOCK_END

Nothing. Eliminate the descriptor altogether.

Initialization
$LKWSET_S LOCK_DESCRIPTOR
BLBC R0,ERROR

$LOCK_PAGE_INIT ERROR

Main code
LOCK_START:
Routine_A:
.
.
.
RSB
LOCK_END:

$LOCKED_PAGE_START
Routine_A:
.
.
.
RSB
$LOCKED_PAGE_END

**Table 3-1 Image Initialization-Time Lockdown**
Code Section	On OpenVMS VAX Systems	On OpenVMS Alpha Systems
Data declaration	LOCK_DESCRIPTOR: .ADDRESS LOCK_START .ADDRESS LOCK_END	Nothing. Eliminate the descriptor altogether.
Initialization	$LKWSET_S LOCK_DESCRIPTOR BLBC R0,ERROR	$LOCK_PAGE_INIT ERROR
Main code	LOCK_START: Routine_A: . . . RSB LOCK_END:	$LOCKED_PAGE_START Routine_A: . . . RSB $LOCKED_PAGE_END

Locking Code Written in Other Languages

Code written in other programming languages can also be locked down by using the $LOCK_PAGE_INIT macro in a VAX MACRO module. Any code in any module written in any language will be locked by this macro if the psect $LOCK_PAGE_2 is used for the generated code and the psect $LOCK_LINKAGE_2 is used for the generated linkage section.

On-the-Fly Lockdown

For on-the-fly lockdown, $LOCK_PAGE and $UNLOCK_PAGE, respectively, mark the beginning and end of a section of code to be locked. The marked code becomes a separate routine in the locked psect, where all code locked anywhere in the image is placed.

$LOCK_PAGE locks the pages and linkage section of the locked routine into the working set and JSRs to it. This macro is placed inline in executable code. All code between this macro and the matching $UNLOCK_PAGE macro is included in the locked routine and is locked down.

$UNLOCK_PAGE returns from the locked routine and then unlocks the pages and linkage section from the working set. The macro is placed inline in executable code at some point after a $LOCK_PAGE macro.

$LOCK_PAGE and $UNLOCK_PAGE both have an optional parameter, ERROR, which is an error address to which to branch if the $LKWSET or $ULWSET calls fail. $UNLOCK_PAGE has a second optional parameter, LINK_SECT. LINK_SECT is a linkage psect to which to return if the linkage psect in effect when the $LOCK_PAGE macro was executed was not the default linkage psect, $LINKAGE.

All registers are preserved by both macros unless the error address parameter is present and one of the calls fail, in which case R0 reflects the status of the failed call. R1 then contains 0 if the call to lock or unlock the code failed, and 1 if that call succeeded but the call to lock or unlock the linkage section failed.

Control must enter the code through the $LOCK_PAGE macro, and must leave through the $UNLOCK_PAGE macro. The local symbol block that is in effect when the $LOCK_PAGE macro is executed is restored when the $UNLOCK_PAGE macro is executed, but since the locked code becomes a separate routine, the locked code itself is a separate local symbol block. Even if named symbols are used, branches into or out of the locked code section are not allowed, and will be flagged by the compiler with the following error:

%AMAC-E-MULTLKSEC, Routines which share code must use the same linkage psect.

Note that since the locked code is made into a separate routine, any references to local stack storage within the routine will have to be changed, as the stack context is no longer the same.

Note

Because on-the-fly lockdown requires the overhead of four system service calls plus an extra subroutine call every time it is executed, it is recommended that this be changed to initialization-time lockdown if the lockdown is done for any performance-critical code. If other routines in the image use initialization-time lockdown, then you must change the on-the-fly lockdown to initialization-time lockdown.

Table 3-2 shows the code changes required to use these macros for on-the-fly lockdown. Note that the $UNLOCK_PAGE macro precedes the RSB, so that it is executed. Any status being passed by the routine in R0 and R1 remains intact because $UNLOCK_PAGE preserves these registers.

Table 3-2 On-the-Fly Lockdown
Code Section On VAX Systems On Alpha Systems

Main code
Routine_A:
.
.
.
SETIPL 100$
.
.
.
RSB
100$: .LONG IPL$SYNCH

Routine_A:
.JSB_ENTRY
.
.
.
$LOCK_PAGE
.
.
.
$UNLOCK_PAGE
RSB

**Table 3-2 On-the-Fly Lockdown**
Code Section	On VAX Systems	On Alpha Systems
Main code	Routine_A: . . . SETIPL 100$ . . . RSB 100$: .LONG IPL$SYNCH	Routine_A: .JSB_ENTRY . . . $LOCK_PAGE . . . $UNLOCK_PAGE RSB

Table 3-3 shows the same original code and the changes necessary for initialization-time lockdown.

Table 3-3 Image Initialization-Time Lockdown with the Same Code
Code Section On VAX Systems On Alpha Systems

Initialization

Nothing.
$LOCK_PAGE_INIT

Main code
Routine_A:
.
.
.
SETIPL 100$
.
.
.
RSB
100$: .LONG IPL$SYNCH

$LOCKED_PAGE_START
Routine_A:
.JSB_ENTRY
.
.
.
RSB
$LOCKED_PAGE_END

**Table 3-3 Image Initialization-Time Lockdown with the Same Code**
Code Section	On VAX Systems	On Alpha Systems
Initialization	Nothing.	$LOCK_PAGE_INIT
Main code	Routine_A: . . . SETIPL 100$ . . . RSB 100$: .LONG IPL$SYNCH	$LOCKED_PAGE_START Routine_A: .JSB_ENTRY . . . RSB $LOCKED_PAGE_END

3.11 Synchronization

The following information about synchronization is relevant when porting code from OpenVMS VAX to OpenVMS Alpha or OpenVMS I64 systems:

Code that issues longword operations to aligned longwords in memory continues to work on Alpha and Itanium systems without additional synchronization required. This is architecturally guaranteed.
The Alpha and Itanium architecture extends this guarantee to include quadword operations to aligned quadwords in memory. However, this is not backwards-compatible to VAX systems. Only Alpha or Itanium code can depend on this feature.
Interlocked instructions (BBSSI, BBCCI, and ADAWI) still work. However, keep the following in mind when you use them:
- When compiling these instructions, the MACRO compiler provides memory barrier functionality implicitly.
- These instructions assume a byte granularity environment. If the data segment on which these instructions operate can be concurrently written by different threads, you may need to impose additional synchronization of the data segment using the MACRO compiler's PRESERVE feature.
- Another way to address the byte granularity problem and achieve greater performance at the same time is to restructure the data segments to be unpacked. That is, the bit that is changed by BBSSI or BBCCI, or the word that is modified by ADAWI, should reside in a longword where the other portions of the longword are not modified by an independent and concurrent instruction thread.
  Further separation of the data in question, such that independent and concurrent access to any location in the aligned 128-byte lock range that contains the data is not occurring, will result in additional performance gains on many Alpha implementations of the Load-locked/Store-conditional instructions.
The VAX interlocked queue instructions work unchanged on OpenVMS Alpha or OpenVMS I64 systems and result in the PALcode equivalents being called which incorporate the necessary interlocks and memory barriers.
Note that the noninterlocked queue instructions are also compiled to their PALcode equivalents and that they are still atomic on a single processor.

The VAX synchronization tools work unchanged on OpenVMS Alpha and OpenVMS I64. All of the following mechanisms use interlocked instructions directly or indirectly for synchronization (the interlocked instructions that are used provide memory barriers transparently):

Event flags---all of the system services that manipulate them
Spin locks---all of the acquisition and release operators (LOCK and UNLOCK, FORKLOCK and FORKUNLOCK, DEVICELOCK and DEVICEUNLOCK)
Mutexes---protected by spin locks

Lock manager---protected by spin locks

Note

This synchronization guarantee is only true for multiprocessing systems. The uniprocessing version of spin locks does not use interlocked instructions. As a result, memory barriers are not provided in uniprocessor spin lock, mutex, and lock manager synchronization.

Regarding ASTs, concurrent threads and atomicity, one must either redesign the code or force atomicity using features provided by the compilers. The MACRO compiler provides the PRESERVE feature.

On OpenVMS Alpha, code that modifies exception handlers may require changes if it is possible for an outstanding arithmetic trap or a machine abort or both to occur asynchronously. The TRAPB instruction provides the synchronization mechanisms that are required. If you want to force synchronization when changing handlers, you must manually add these to your program as shown in the following example:

ADDL3 R1, R2, 4(R3) ; Save total EVAX_TRAPB ; Insure any arithmetic traps handled by ; existing handler MOVAB HANDLER2, 0(FP) ; Enable new condition handler

When writing OpenVMS Alpha assembly language code, make sure that you understand the read/write ordering of the Alpha architecture. Encode MB instructions where necessary.

Chapter 4
Improving the Performance of Ported Code

This chapter describes how you can improve the performance of your ported code.

This chapter contains the following topics:

4.1 Aligning Data

An unaligned data reference will work but will be slow on OpenVMS Alpha or OpenVMS I64, because the system must take an unaligned address fault to complete the unaligned reference. If it is known that a data reference is unaligned, the compiler can generate unaligned quadword loads and masks to manually extract the data. This is slower than an aligned load but much faster than taking an alignment fault. Global data labels that are not longword or quadword aligned are flagged with information-level messages.

In addition, unaligned memory modification references cannot be made atomic with /PRESERVE=ATOMICITY or .PRESERVE ATOMICITY. If this is attempted, it will cause a fatal reserved operand fault.

4.1.1 Alignment Assumptions

By default, the compiler assumes the following:

Addresses in registers used as base pointers are longword aligned at routine entry.
External references are longword aligned.
Addresses that resulted from certain types of instructions, such as DIVL, are assumed unaligned.

Every time a register is changed, the compiler determines whether the base address in the register is still aligned. If the register and specified offset result in an aligned address, the compiler uses an aligned load or store for a memory reference. The compiler attempts to track register usage in terms of whether the base address remains aligned. When a stored memory address is loaded, for instance, MOVL 4(R1),R0, or used indirectly for instance, MOVL @4(R1),R0, the compiler assumes the resulting address is aligned.

For quadword memory references such as MOVQ instructions, the compiler assumes the base address is quadword aligned, unless it has determined by means of its register tracking code that the address may not be longword aligned. In other words, quadword register alignment is not tracked---only longword alignment.

Quadword references in OpenVMS Alpha or OpenVMS I64 built-ins, such as those in the following example, will be in new code, where alignment should be correct. Therefore all memory references in the following example will use aligned quadword load/stores:

EVAX_LDQ R1, (R2) EVAX_ADDQ (R1), #1, (R3)

If an OpenVMS Alpha or OpenVMS I64 built-in (other than EVAX_LDQU or EVAX_STQU) is used on an address that is not quadword aligned, an alignment fault will occur at run time.

4.1.2 Directives and Qualifier for Changing Alignment Assumptions

The compiler provides two directives and one qualifier for changing the compiler's alignment assumptions. Both directives enable the compiler to produce more efficient code. The .SET_REGISTERS directive allows you to specify whether a register is aligned or unaligned. This directive should be used when the result of an operation is the reverse of what the compiler expects. It also allows you to declare registers that the compiler would not otherwise detect as input or output registers.

The .SYMBOL_ALIGNMENT directive allows you to specify the alignment of any memory reference that uses a symbolic offset. This directive should be used when you know the data will be aligned for every use of the symbolic offset.

These directives are described in Appendix B. The examples in each description show how to use them.

The /UNALIGN qualifier to the MACRO/MIGRATION command tells the compiler to assume unaligned all the time for all register-based memory references rather than try to track the alignment. This does not affect stack-based or static references where the compiler knows the alignment.

This qualifier is described in Appendix A.

4.1.3 Precedence of Alignment Controls

The order of precedence of the compiler's alignment controls, from strongest (.SYMBOL_ALIGNMENT) to weakest (built-in assumptions and tracking mechanisms), follows:

.SYMBOL_ALIGNMENT directive
.SET_REGISTER directive
/UNALIGN qualifier
Built-in assumptions and tracking mechanisms

4.1.4 Recommendations for Aligning Data

The following recommendations are provided for aligning data:

If references to the data must be made atomic with /PRESERVE=ATOMICITY or .PRESERVE ATOMICITY, the data must be aligned.
Do not fix alignment problems in public interfaces; this could break existing programs.
For data in internal or privileged interfaces, do not automatically make changes to improve data alignment. You should consider the frequency with which the data structure is accessed, the amount of work involved in realigning the structure, and the risk that things might go wrong. In judging the amount of work involved, make sure you know all accesses to the data; do not merely guess. If you own all accesses in the code for which you are responsible and if you are making changes in the module (or modules) anyway, then it is safe to fix the alignment problem.
Do not routinely unpack byte and word data into longwords or quadwords. The time to do this is when you are fixing an alignment problem (word not on word boundary), subject to the aforementioned cautions and constraints, or if you know the data granularity is a problem.
If you do not own all the accesses to the data, there still may be circumstances under which fixing alignment is appropriate. If the data is frequently accessed, if performance is a real issue, and if you must unavoidably scramble the data structure anyway, it makes sense to align the structure at the same time.
It is important that you notify other programmers whose code may be affected. Do not assume in such cases that all related modules will recompile or that program documentation will help others detect errant data cell separation assumptions. Always assume that changes like this will reveal irregular programming practices and not go smoothly.

4.2 Code Flow and Branch Prediction

The Alpha and Itanium architectures are pipelined, which means that before completing the current instruction, they start to execute several instructions beyond it. By tailoring the code to keep the pipeline filled, you can make the code run significantly faster.

On each conditional branch, the Alpha and Itanium architectures attempt to predict whether or not the branch is taken so that they can correctly fill the instruction pipeline with the next instruction to be executed. The Alpha architecture predicts that forward conditional branches will not be taken and backward conditional branches will be taken. The Itanium architecture has branch-prediction hints in the branch instructions. A mispredicted branch costs extra time because the pipeline must be flushed, and, in addition, the instruction at the branch destination may not be in the instruction cache.

The compiler tries to follow the flow of the VAX MACRO code to generate Alpha and Itanium code that has the most common code path in a contiguous block, to allow the pipelined Alpha and Itanium architecture to process the code with the greatest efficiency. However, in some situations, the compiler's default rules do not generate the most efficient code. In performance sensitive code sections, you can often improve the efficiency of the generated code by giving the compiler information about which code paths will most likely be taken.

Contents

Index

HP OpenVMS MACRO Compiler Porting and User's Guide

3.11 Synchronization

Chapter 4Improving the Performance of Ported Code

4.1 Aligning Data

4.1.1 Alignment Assumptions

4.1.2 Directives and Qualifier for Changing Alignment Assumptions

4.1.3 Precedence of Alignment Controls

4.1.4 Recommendations for Aligning Data

4.2 Code Flow and Branch Prediction

HP OpenVMS MACRO Compiler
Porting and User's Guide

Chapter 4
Improving the Performance of Ported Code