From: JARETH::MSHERLOCK "Margie Sherlock" 15-OCT-1996 15:33:53.04
To:   GLENN
CC:   MS
Subj: FAST_IO/FAST_PATH.txt

1
_________________________________________________________________

Options for Improving I/O Performance Features

OpenVMS Alpha Version 7.0 includes two features that provide dramatically improved I/O performance: Fast I/O and Fast Path. These features are designed to promote OpenVMS as a leading platform for database systems. The performance improvement results from reducing the CPU cost per I/O request and improving the symmetric multiprocessing (SMP) scaling of I/O operations. The CPU cost per I/O is reduced by optimizing code for high-volume I/O and by making better use of the SMP CPU memory caches. SMP scaling of I/O is increased by reducing the number of spinlocks taken per I/O and by substituting finer-granularity spinlocks for global spinlocks.

The improvements follow a natural division that already exists between the device-independent and device-dependent layers in the OpenVMS I/O subsystem. The device-independent overhead is addressed by Fast I/O, a set of lean system services that can substitute for certain $QIO operations. Using these services requires some coding changes in existing applications, but the changes are usually modest and well contained. The device-dependent overhead is addressed by Fast Path, an optional performance feature that creates a "fast path" to the device and requires no application changes.

Fast I/O and Fast Path can be used independently. Together, however, they can provide a 45% reduction in CPU cost per I/O on uniprocessor systems and a 52% reduction on multiprocessor systems.

1.1 Fast I/O

Fast I/O is a set of three system services that were developed as a $QIO alternative built for speed. These services are not a $QIO replacement; $QIO is unchanged, and $QIO interoperation with these services is fully supported. Rather, the services substitute for a subset of $QIO operations: only the high-volume read/write I/O requests.

The Fast I/O services support 64-bit addresses for data transfers to and from disk and tape devices.

While the Fast I/O services are available on OpenVMS VAX, the performance advantage applies only to OpenVMS Alpha. OpenVMS VAX has a run-time library (RTL) compatibility package that translates the Fast I/O service requests to $QIO system service requests, so one set of source code can be used on both VAX and Alpha systems.

1.1.1 Fast I/O Benefits

The performance benefits of Fast I/O result from streamlining high-volume I/O requests. The Fast I/O system service interfaces are optimized to avoid the overhead of general-purpose services. For example, I/O request packets (IRPs) are now permanently allocated and used repeatedly for I/O rather than allocated and deallocated anew for each I/O.

The greatest benefits stem from having user data buffers and user I/O status structures permanently locked down and mapped using system space. This allows Fast I/O to do the following:

o  For direct I/O, avoid per-I/O buffer lockdown and unlocking.

o  For buffered I/O, avoid allocation and deallocation of a separate system buffer, because the user buffer is always addressable.

o  Complete Fast I/O operations at IPL 8, thereby avoiding the interrupt chaining usually required by the more general-purpose $QIO system service. For each I/O, this eliminates the IPL 4 IOPOST interrupt and a kernel AST.
In total, the Fast I/O services eliminate four spinlock acquisitions per I/O (two for the MMG spinlock and two for the SCHED spinlock). The reduction in CPU cost per I/O is 20% for uniprocessor systems and 10% for multiprocessor systems.

1.1.2 Using Buffer Objects

The lockdown of user-process data structures is accomplished by buffer objects. A "buffer object" is process memory whose physical pages have been locked in memory and double-mapped into system space. After a buffer object is created, the process remains fully pageable and swappable, and the process retains normal virtual memory access to the pages in the buffer object.

If the buffer object contains process data structures to be passed to an OpenVMS system service, the OpenVMS system can use the buffer object to avoid the probing, lockdown, and unlocking overhead associated with these process data structures. Additionally, the double-mapping into system space gives the OpenVMS system direct access to the process memory from system context. To date, only the $QIO system service and the Fast I/O services have been changed to accept buffer objects.

For example, a buffer object allows a programmer to eliminate I/O memory management overhead. Without buffer objects, each page of a user data buffer is probed and then locked down on I/O initiation and unlocked on I/O completion. Instead of incurring this overhead on each I/O, it is incurred once, at buffer object creation time. Subsequent I/O operations involving the buffer object completely avoid this memory management overhead.

Two system services are used to create and delete buffer objects, respectively, and can be called from any access mode. To create a buffer object, call the $CREATE_BUFOBJ system service. This service expects as input an existing process memory range and returns a buffer handle for the buffer object. The buffer handle is an opaque identifier used to identify the buffer object on future I/O requests. The $DELETE_BUFOBJ system service deletes the buffer object and accepts the buffer handle as input. Although image rundown deletes all existing buffer objects, it is good form for the application to clean up properly.

A 64-bit equivalent version of the $CREATE_BUFOBJ system service ($CREATE_BUFOBJ_64) can be used to create buffer objects from the new 64-bit P2 or S2 regions. The $DELETE_BUFOBJ system service can be used to delete 32-bit or 64-bit buffer objects.
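The following C fragment is a minimal sketch of this calling sequence. It is not taken from the OpenVMS examples: the buffer size, the use of LIB$GET_VM_PAGE to obtain page-aligned memory, and the two-longword form of the buffer handle are illustrative assumptions.

    #include <ssdef.h>
    #include <starlet.h>          /* sys$create_bufobj, sys$delete_bufobj */
    #include <lib$routines.h>     /* lib$get_vm_page                      */

    #define BUF_PAGES 16          /* assumes 8192-byte Alpha pages        */
    #define PAGE_SIZE 8192

    unsigned int demo_bufobj(void)
    {
        unsigned int status, npages = BUF_PAGES, base;
        unsigned int inadr[2], retadr[2];
        unsigned int bufhandle[2];  /* opaque handle; 2 longwords assumed */

        /* Page-aligned memory easily satisfies the 512-byte alignment
           requirement that Fast I/O places on data buffers.             */
        status = lib$get_vm_page(&npages, &base);
        if (!(status & 1)) return status;

        inadr[0] = base;                            /* first byte of range */
        inadr[1] = base + BUF_PAGES*PAGE_SIZE - 1;  /* last byte of range  */

        /* Lock the pages and double-map them into system space; the
           handle identifies this buffer object on future I/O requests.  */
        status = sys$create_bufobj((void *)inadr, (void *)retadr, 0, 0,
                                   (void *)bufhandle);
        if (!(status & 1)) return status;

        /* ... issue many I/Os against memory inside the buffer object ... */

        /* Unlock and unmap the pages when the buffer pool is retired.   */
        return sys$delete_bufobj((void *)bufhandle);
    }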
Buffer objects require system management. Because buffer objects tie up physical memory, extensive use of buffer objects requires system management planning. All the bytes of memory in a buffer object are deducted from a systemwide SYSGEN parameter called MAXBOBMEM (maximum buffer object memory). System managers must set this parameter correctly for the application loads that run on their systems. The MAXBOBMEM parameter defaults to 100 Alpha pages, but for applications with large buffer pools it will likely be set much larger.

To prevent user-mode code from tying up excessive physical memory, user-mode callers of $CREATE_BUFOBJ must have a new system identifier, VMS$BUFFER_OBJECT_USER, assigned. This identifier is automatically created in an OpenVMS Version 7.0 upgrade if the file SYS$SYSTEM:RIGHTSLIST.DAT is present. The system manager can use the DCL command SET ACL to assign this identifier to a protected subsystem or application that creates buffer objects from user mode. It may also be appropriate to grant the identifier to a particular user with the Authorize utility command GRANT/IDENTIFIER (for example, to a programmer who is working on a development system).

There is currently a restriction on the type of process memory that can be used for buffer objects: global section memory cannot be made into a buffer object.

1.1.3 Differences Between Fast I/O Services and $QIO

The precise definition of the high-volume I/O operations optimized by the Fast I/O services is important. I/O that does not comply with this definition either is not possible with the Fast I/O services or is not optimized. The characteristics of the high-volume I/O optimized by the Fast I/O services can be seen by contrasting the operation of the Fast I/O system services with the $QIO system service, as follows:

o  The $QIO system service I/O status block (IOSB) is replaced by an I/O status area (IOSA) that is larger and quadword aligned. The transfer byte count returned in the IOSA is 64 bits, and the field is aligned on a quadword boundary. Unlike the IOSB, which is optional, the IOSA is required.

o  User data buffers must be aligned on a 512-byte boundary.

o  All user process structures passed to the Fast I/O system services must reside in buffer objects. This includes the user data buffer and the IOSA.

o  Only transfers that are multiples of 512 bytes are supported.

o  Only the following function codes are supported: IO$_READVBLK, IO$_READLBLK, IO$_WRITEVBLK, and IO$_WRITELBLK.

o  Only I/O to disk and tape devices is optimized for performance.

o  No event flags are used with the Fast I/O services. If application code must use an event flag in relation to a specific I/O, then the Event No Flag EFN (EFN$C_ENF) can be used. This event flag is a no-overhead EFN for situations in which an EFN is required by a system service interface but has no meaning to the application. For example, the Fast I/O services do not use EFNs, so the application cannot specify a valid EFN associated with the I/O to the $SYNCH system service with which to synchronize I/O completion. To resolve this issue, the application can call the $SYNCH system service passing as arguments EFN$C_ENF and the address of the appropriate IOSA (see the sketch following this list). Specifying EFN$C_ENF signifies to $SYNCH that no EFN is involved in the synchronization of the I/O. Once the IOSA has been written with a status and byte count, the $SYNCH call returns. The IOSA is thus the central point of synchronization for a given Fast I/O and is the only way to determine whether the asynchronous I/O is complete.

o  To minimize argument-passing overhead, the $QIO parameters P3 through P6 are replaced by a single argument that is passed directly by the Fast I/O system services to device drivers. For disk-like devices, this argument is the media address (VBN or LBN) of the transfer. For drivers with complex parameters, this argument is the address of a descriptor or of a buffer specific to the device and function.

o  Segmented transfers are supported by Fast I/O but are not fully optimized. There are two major causes of segmented transfers. The first is disk fragmentation. While this can be an issue, it is assumed that sites seeking maximum performance have eliminated the overhead of segmenting I/O due to fragmentation. The second cause is issuing an I/O that exceeds the port's maximum limit for a single transfer. Transfers beyond the port maximum limit are segmented into several smaller transfers. Some ports limit transfers to 64K bytes. If the application limits its transfers to less than 64K bytes, this type of segmentation should not be a concern.
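The fragment below sketches this synchronization pattern in C. The wrapper function and the cast are illustrative; the IOSA is assumed to have been supplied to a preceding $IO_PERFORM call and to reside in a buffer object.

    #include <efndef.h>    /* EFN$C_ENF - the "no event flag" EFN     */
    #include <iosbdef.h>   /* struct _iosb, the type $SYNCH expects   */
    #include <starlet.h>   /* sys$synch                               */

    /* Wait for one outstanding Fast I/O whose IOSA is at iosa_p.
       EFN$C_ENF tells $SYNCH that no event flag participates; the
       service instead waits for the IOSA to be written with a
       status and byte count.                                         */
    unsigned int wait_for_fast_io(void *iosa_p)
    {
        return sys$synch(EFN$C_ENF, (struct _iosb *)iosa_p);
    }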
1.1.4 Using Fast I/O Services

The three Fast I/O system services are:

o  $IO_SETUP - Sets up an I/O.

o  $IO_PERFORM[W] - Performs an I/O request.

o  $IO_CLEANUP - Cleans up an I/O request.

1.1.4.1 Using Fandles

A key concept behind the operation of the Fast I/O services is the file handle, or fandle. A fandle is an opaque token that represents a "setup" I/O. A fandle is needed for each I/O outstanding from a process.

All possible setup, probing, and validation of arguments is performed off the mainline code path, during application startup, with calls to the $IO_SETUP system service. The I/O function, the AST address, the buffer object for the data buffer, and the IOSA buffer object are specified on input to the $IO_SETUP service, and a fandle representing this setup is returned to the application.

To perform an I/O, the $IO_PERFORM system service is called, specifying the fandle, the channel, the data buffer address, the IOSA address, the length of the transfer, and the media address (VBN or LBN) of the transfer.

If the asynchronous version of this system service, $IO_PERFORM, is used to issue the I/O, then the application can wait for I/O completion using a $SYNCH call specifying EFN$C_ENF and the appropriate IOSA. The synchronous form of the system service, $IO_PERFORMW, is used to issue an I/O and wait for it to complete. Optimum performance comes when the application uses AST completion; that is, the application does not issue an explicit wait for I/O completion.

To clean up a fandle, pass it to the $IO_CLEANUP system service.
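The following C sketch puts the three services together for a single synchronous write. It is illustrative rather than taken from IO_PERFORM.C: the handle and fandle types, the external buffer-object handles, and the assumption that buf and iosa already lie inside those buffer objects are all simplifications.

    #include <iodef.h>     /* IO$_WRITEVBLK                               */
    #include <ssdef.h>
    #include <starlet.h>   /* sys$io_setup, sys$io_performw, sys$io_cleanup */

    /* Handles from earlier $CREATE_BUFOBJ calls (2-longword form assumed):
       one buffer object holds the data buffers, one holds the IOSAs.     */
    extern unsigned int data_bufh[2], iosa_bufh[2];

    unsigned __int64 write_fandle;   /* fandles are 64-bit opaque tokens  */

    /* Startup: probe and validate the function, buffer objects, and
       (null) AST once, off the mainline path.  The returned fandle is
       reused for every subsequent write.                                 */
    unsigned int setup_write(void)
    {
        return sys$io_setup(IO$_WRITEVBLK,        /* function code        */
                            (void *)data_bufh,    /* data buffer's bufobj */
                            (void *)iosa_bufh,    /* IOSA's bufobj        */
                            0,                    /* no completion AST    */
                            0,                    /* flags                */
                            &write_fandle);
    }

    /* Mainline: one write (512-byte multiple, 512-byte aligned) to
       virtual block vbn, waiting for completion.                         */
    unsigned int write_blocks(unsigned short chan, void *buf, void *iosa,
                              unsigned int nbytes, unsigned int vbn)
    {
        return sys$io_performw(write_fandle, chan, iosa, buf, nbytes, vbn);
    }

    /* Shutdown: release the fandle. */
    unsigned int cleanup_write(void)
    {
        return sys$io_cleanup(write_fandle);
    }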
1.1.4.2 Modifying Existing Applications

Modifying an application to use the Fast I/O services requires a few source-code changes. For example:

1. A programmer adds code to create buffer objects for the IOSAs and data buffers.

2. The programmer changes the application to use the Fast I/O services. Not all $QIOs need to be converted; only the high-volume read/write I/O requests should be changed.

   A simple example is a "database writer" program, which writes modified pages back to the database. Suppose the writer can handle up to 16 simultaneous writes. At application startup, the programmer adds code to create 16 fandles with 16 $IO_SETUP system service calls.

3. In the main processing loop of the database writer, the programmer replaces the $QIO calls with $IO_PERFORM calls. Each $IO_PERFORM call uses one of the 16 available fandles. While the I/O is in progress, the selected fandle is unavailable for use with other I/O requests. The database writer is probably using AST completion, recycling the fandle, data buffer, and IOSA once the completion AST arrives. If the database writer routine cannot return until all dirty buffers are written (that is, it must wait for all I/O completions), then $IO_PERFORMW can be used. Alternatively, each $IO_PERFORM call can be followed by a $SYNCH system service call passing EFN$C_ENF to await the I/O completion. The database writer will run faster and scale better because its I/O requests now use less CPU time.

4. When the application exits, an $IO_CLEANUP system service call is done for each fandle returned by a prior $IO_SETUP system service call. Then the buffer objects are deleted. Image rundown performs fandle and buffer object cleanup on behalf of the application, but it is good form for the application to clean up properly.

1.1.4.3 I/O Status Area (IOSA)

The central point of synchronization for a given Fast I/O is its IOSA. The IOSA replaces the $QIO system service's IOSB argument and is larger: its byte count field is 64 bits and quadword aligned. Unlike the $QIO system service, the Fast I/O services require the caller to supply an IOSA and require the IOSA to be part of a buffer object.

The IOSA context field can be used in place of the $QIO system service ASTPRM argument. The $QIO ASTPRM argument is typically used to pass a pointer back to the application on the completion AST to locate the user context needed to resume a stalled user thread. However, for the $IO_PERFORM system service, the ASTPRM on the completion AST is always the IOSA address. Because there is no user-settable ASTPRM, an application can instead store a pointer to the user-thread context for this I/O in the IOSA context field and retrieve the pointer from the IOSA in the completion AST.

1.1.4.4 $IO_SETUP

The $IO_SETUP system service performs the setup of an I/O and returns a unique identifier for this setup I/O, called a fandle, to be used on future I/Os. The $IO_SETUP arguments used to create a given fandle remain fixed throughout the life of the fandle. This has implications for the number of fandles needed in an application. For example, a single fandle can be used only for reads or only for writes. If an application module has up to 16 simultaneous reads or writes pending, then potentially 32 fandles are needed to avoid any $IO_SETUP calls during mainline processing.

The $IO_SETUP system service supports an expedite flag, which can be used to boost the priority of an I/O among the other I/O requests that have been handed off to the controller. Unrestrained use of this flag is self-defeating: if all I/O is expedited, nothing is expedited. Note that this flag requires the ALTPRI and PHY_IO privileges.

1.1.4.5 $IO_PERFORM[W]

The $IO_PERFORM[W] system service accepts a fandle and five other variable I/O parameters for the high-performance I/O operation. The fandle remains in use by the application until $IO_PERFORMW returns or, if $IO_PERFORM is used, until a completion AST arrives.

The CHAN argument contains the data channel returned to the application by a previous file operation. This argument gives the application the flexibility of using the same fandle for different open files on successive I/Os. However, if the fandle is used repeatedly for the same file or channel, then $IO_PERFORM takes an internal optimization.

Note that $IO_PERFORM was designed to have no more than six arguments to take advantage of the OpenVMS Calling Standard, which specifies that calls with up to six arguments can be passed entirely in registers.
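The fragment below sketches the context-field technique of Section 1.1.4.3 for an application using $IO_PERFORM with AST completion. The IOSA layout shown is illustrative only (a real program should take the structure definition from the system headers), and the writer_ctx type and resume_thread routine are hypothetical.

    #include <starlet.h>

    /* Illustrative IOSA layout; field names and offsets are assumptions,
       not the system definition.  The structure must be quadword aligned
       and must reside in a buffer object.                                */
    typedef struct {
        unsigned __int64 count;      /* 64-bit transfer byte count        */
        unsigned int     status;     /* completion status                 */
        unsigned int     reserved;
        void            *context;    /* application-owned context field   */
    } DEMO_IOSA;

    typedef struct writer_ctx WRITER_CTX;   /* hypothetical user context  */
    void resume_thread(WRITER_CTX *ctx);    /* hypothetical continuation  */

    /* For $IO_PERFORM the AST parameter is always the IOSA address, so
       the stalled thread's context is recovered from the context field.  */
    void write_done_ast(void *astprm)
    {
        DEMO_IOSA  *iosa = (DEMO_IOSA *)astprm;
        WRITER_CTX *ctx  = (WRITER_CTX *)iosa->context;

        resume_thread(ctx);    /* recycle the fandle, buffer, and IOSA    */
    }

    /* Before issuing the I/O, stash the context pointer in the IOSA. */
    void start_write(DEMO_IOSA *iosa, WRITER_CTX *ctx)
    {
        iosa->context = ctx;
        /* ... sys$io_perform(fandle, chan, iosa, buf, nbytes, vbn) ... */
    }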
1.1.4.6 $IO_CLEANUP

A fandle can be cleaned up by passing the fandle to the $IO_CLEANUP system service.

1.1.4.7 Fast I/O FDT Routine (ACP_STD$FASTIO_BLOCK)

Because $IO_PERFORM supports only four function codes, this system service does not use the generalized function decision table (FDT) dispatching contained in the $QIO system service. Instead, $IO_PERFORM uses a single vector in the driver dispatch table, called DDT$PS_FAST_FDT, for all four supported functions. The DDT$PS_FAST_FDT field is an FDT routine vector that indicates whether the device driver called by $IO_PERFORM is set up to handle Fast I/O operations. A nonzero value for this field indicates that the device driver supports Fast I/O operations and that the I/O can be fully optimized. If the DDT$PS_FAST_FDT field is zero, then the driver is not set up to handle Fast I/O operations. The $IO_PERFORM system service tolerates such device drivers, but the I/O is only slightly optimized in this circumstance.

The OpenVMS disk and tape drivers that ship as part of OpenVMS Version 7.0 have added the following line to their driver dispatch table (DDTAB) macro:

    FAST_FDT=ACP_STD$FASTIO_BLOCK,-      ; Fast-IO FDT routine

This line initializes the DDT$PS_FAST_FDT field to the address of the standard Fast I/O FDT routine, ACP_STD$FASTIO_BLOCK. If you have a disk or tape device driver that can handle Fast I/O operations, you can add this DDTAB macro line to your driver. If you cannot use the standard Fast I/O FDT routine, you can develop your own based on the model presented in ACP_STD$FASTIO_BLOCK.

1.1.5 Additional Information

For complete information about the following Fast I/O system services, see the OpenVMS System Services Reference Manual:

    $CREATE_BUFOBJ
    $DELETE_BUFOBJ
    $CREATE_BUFOBJ_64
    $IO_SETUP
    $IO_PERFORM
    $IO_CLEANUP

To see a sample program that demonstrates the use of buffer objects and the Fast I/O system services, refer to the IO_PERFORM.C program in the SYS$EXAMPLES directory.

1.2 Fast Path

Fast Path is an optional, high-performance feature designed to improve I/O performance. By restructuring and optimizing the class and port device driver code around the high-volume I/O code paths, Fast Path creates a streamlined path to the device. Fast Path is of interest to any application where enhanced I/O performance is desirable. Two examples are database systems and real-time applications, where the speed of transferring data to disk is often a vital concern.

Using Fast Path features does not require source-code changes. Minor interface changes are available for expert programmers who want to maximize Fast Path benefits.

In OpenVMS Alpha Version 7.0, Fast Path supports only disk I/O for the CIXCD port, which provides access to CI storage for XMI-based systems. Fast Path is not currently available on the OpenVMS VAX operating system.

1.2.1 Fast Path Features and Benefits

Fast Path achieves dramatic performance gains by reducing CPU time for I/O requests on both uniprocessor and SMP systems. These savings are on the order of 25% less CPU cost per I/O request on a uniprocessor system and 35% less on a multiprocessor system.
The performance benefits are produced by:

o  Reducing code paths through streamlining for the case of high-volume I/O

o  Substituting port-specific spinlocks for global I/O subsystem spinlocks

o  Affinitizing an I/O request for a given port to a specific CPU

The performance improvement can best be seen by contrasting the traditional OpenVMS I/O scheme with the new Fast Path scheme. Although transparent to an OpenVMS user, each disk and tape device is tied to a specific port interconnect, and all I/O for a device is sent out over its assigned port. Under the traditional OpenVMS I/O scheme, I/O on a multiprocessor system can be initiated on any CPU, but I/O completion must occur on the primary CPU. Under Fast Path, all I/O for a given port is affinitized to a specific CPU, eliminating the requirement to complete the I/O on the primary CPU. This means that the entire I/O can be initiated and completed on a single CPU. Because I/O operations are no longer split among different CPUs, performance increases as memory cache thrashing between CPUs decreases.

Fast Path also removes a possible SMP bottleneck on the primary CPU. If the primary CPU must be involved in all I/O, then once this CPU becomes saturated, no further increase in I/O throughput is possible. Spreading the I/O load evenly among the CPUs in a multiprocessor system provides greater maximum I/O throughput.

With most of the I/O code path executing under port-specific spinlocks and with each port assigned to a specific CPU, a scalable SMP model of parallel operation exists. Given multiple ports and CPUs, I/O can be issued in parallel to a large degree.

1.2.2 Using Fast Path

This section describes the parameters and interfaces that control Fast Path.

FAST_PATH

FAST_PATH is a SYSGEN parameter that enables (1) or disables (0) the Fast Path performance features for all Fast Path capable ports. Fast Path is disabled by default.

Preferred CPU

Each Fast Path capable port is affinitized to a specific CPU, called the preferred CPU. All I/O for all devices serviced by a port initiates and completes on the port's preferred CPU.

Processes issuing I/O to a port on the port's preferred CPU have an inherent advantage: the overhead of affinitizing the I/O to the preferred CPU is avoided. An application process can use the $PROCESS_AFFINITY system service to affinitize itself to the preferred CPU of the device to which the majority of its I/O is sent. With proper attention to affinity, a process's execution need never leave the preferred CPU. This presents a scalable process and I/O scheme for maximizing multiprocessor system operation. Like most RISC systems, Alpha system performance is highly dependent on the performance of the CPU memory caches. Process affinity and preferred CPU affinity are two keys to minimizing memory stalls in the application and in the operating system, thereby maximizing multiprocessor system throughput.

IO_PREFER_CPUS

IO_PREFER_CPUS is a SYSGEN parameter containing a CPU bit mask that controls the initial assignment of Fast Path capable ports to CPUs. Assigning a Fast Path port to a CPU means that the CPU cannot be stopped with the STOP/CPU command. If you want to preserve the ability to stop certain CPUs even when Fast Path is enabled, use IO_PREFER_CPUS.

IO_PREFER_CPUS specifies the CPUs that can serve as preferred CPUs and that can therefore be assigned a Fast Path port by the default assignment algorithm. CPUs whose bits are clear in the IO_PREFER_CPUS bit mask are not assigned Fast Path ports and can be stopped. IO_PREFER_CPUS defaults to -1, which specifies that all CPUs are eligible to be assigned Fast Path ports.

The initial assignment spreads Fast Path ports evenly among the available CPUs in round-robin fashion, making sure that the primary CPU is the last CPU to receive a port. Assigning the primary CPU last slightly offloads it, because it might be busy processing non-Fast Path I/O.

$QIO IO$_SETPRFPATH ! IO$M_PREFERRED_CPU

You can change the assignment of a Fast Path port to a CPU by issuing a $QIO IO$_SETPRFPATH (Set Preferred Path) request to the port device, for example, PNA0. The IO$M_PREFERRED_CPU modifier must be set, and the $QIO argument P1 must be set to a 32-bit CPU bit mask with a bit set indicating the new preferred CPU. On return from the I/O, the port and its associated devices are all affinitized to the new preferred CPU. Note that explicitly setting the preferred CPU overrides any default assignment of Fast Path ports to CPUs. This interface gives you the flexibility to load balance I/O activity over multiple CPUs in an SMP system. This is important because I/O activity can change over the course of a day or week.
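A hedged C sketch of this request follows. The port name and CPU number are illustrative, error handling is abbreviated, and the example assumes that P1 carries the CPU bit mask directly, as the text above describes.

    #include <descrip.h>
    #include <iodef.h>     /* IO$_SETPRFPATH, IO$M_PREFERRED_CPU (V7.0)  */
    #include <iosbdef.h>
    #include <ssdef.h>
    #include <starlet.h>

    /* Move Fast Path port PNA0 (illustrative) to preferred CPU 2. */
    unsigned int set_preferred_cpu(void)
    {
        $DESCRIPTOR(port_dsc, "PNA0:");
        unsigned short chan;
        struct _iosb   iosb;
        unsigned int   cpu_mask = 1 << 2;  /* bit 2 = new preferred CPU */
        unsigned int   status;

        status = sys$assign(&port_dsc, &chan, 0, 0);
        if (!(status & 1)) return status;

        /* P1 carries the 32-bit CPU bit mask, per the description above. */
        status = sys$qiow(0, chan, IO$_SETPRFPATH | IO$M_PREFERRED_CPU,
                          &iosb, 0, 0,
                          (void *)cpu_mask, 0, 0, 0, 0, 0);
        if (status & 1) status = iosb.iosb$w_status;

        sys$dassgn(chan);
        return status;
    }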
$GETDVI DVI$_PREFERRED_CPU & SDA SHOW DEVICE

For an application seeking optimal Fast Path benefits, you can code each application process to run on the preferred CPU where the majority of the process's I/O activity occurs. To identify the preferred CPU for any Fast Path capable device when Fast Path is enabled, use the SDA command SHOW DEVICE, which displays the preferred CPU ID for a port or disk device. Alternatively, the $GETDVI system service or the DCL lexical function F$GETDVI returns the preferred CPU for a given device or file. The $GETDVI system service item code is DVI$_PREFERRED_CPU, and the F$GETDVI item code string argument is PREFERRED_CPU. The return argument is a 32-bit CPU bit mask with a bit set indicating the preferred CPU. A return argument containing a bit mask of zero indicates that no preferred CPU exists, either because Fast Path is disabled or because the device is not a Fast Path capable device. The return argument is designed to serve as a CPU bit mask input argument to the $PROCESS_AFFINITY system service, which can be used to affinitize an application process to the optimal preferred CPU.

A high-availability feature of VMSclusters is that dual-pathed devices automatically fail over to a secondary path if the primary path becomes inoperable. Because a Fast Path device could fail over to another path or port, and thereby to another preferred CPU, an application can occasionally reissue the $GETDVI in a timer thread to check that its process affinity is still optimal.
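The following C sketch combines the two services as described above. It is illustrative: the device name is made up, the item list is the conventional hand-built form, and the $PROCESS_AFFINITY mask widths and argument passing reflect this section's description rather than a verified recipe.

    #include <descrip.h>
    #include <dvidef.h>    /* DVI$_PREFERRED_CPU */
    #include <iosbdef.h>
    #include <ssdef.h>
    #include <starlet.h>

    /* Classic hand-built $GETDVI item list entry. */
    struct item_list {
        unsigned short buflen, itmcod;
        void *bufadr;
        unsigned short *retlen;
    };

    /* Affinitize the calling process to the preferred CPU of DKA100:
       (an illustrative device name).                                   */
    unsigned int bind_to_preferred_cpu(void)
    {
        $DESCRIPTOR(dev_dsc, "DKA100:");
        unsigned int     cpu_mask = 0;    /* 32-bit preferred-CPU mask  */
        unsigned short   retlen;
        struct _iosb     iosb;
        unsigned int     status;
        unsigned __int64 select_mask, modify_mask, prev_mask;

        struct item_list itmlst[] = {
            { sizeof cpu_mask, DVI$_PREFERRED_CPU, &cpu_mask, &retlen },
            { 0, 0, 0, 0 }                          /* terminator       */
        };

        status = sys$getdviw(0, 0, &dev_dsc, itmlst, &iosb, 0, 0, 0);
        if (!(status & 1)) return status;

        if (cpu_mask == 0)   /* Fast Path off, or not a Fast Path device */
            return SS$_NORMAL;

        /* Select the CPUs named in the mask and set affinity to them;
           quadword masks passed by reference are an assumption here.    */
        select_mask = modify_mask = cpu_mask;
        return sys$process_affinity(0, 0, &select_mask, &modify_mask,
                                    &prev_mask, 0);
    }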
1.2.3 Fast Path Restrictions

Fast Path restrictions include the following:

o  Only high-volume I/O is optimized. Fast Path streamlines the operation of high-volume I/O; I/O that does not meet the definition of high-volume is not optimized. A high-volume Fast Path I/O is characterized as follows:

   1. A virtual, logical, or physical read or write I/O without special I/O modifiers.

   2. An I/O request that is less than 64K bytes in size.

   3. An I/O issued when all the I/O resources needed to perform the I/O are available.

o  The send-credits resource must be managed. Applications seeking maximum performance must ensure the availability of sufficient I/O resources. The only I/O resource that a Fast Path user needs to be concerned about is send credits. Send credits are extended by DSA controllers to host systems and represent the maximum number of I/Os that can be outstanding at any given point in time. If an application sends an unlimited number of simultaneous I/Os to a controller, it is likely that some I/O will back up waiting for send credits.

   You can tell whether the send-credit limit is being exceeded by using the DCL command SHOW CLUSTER/CONTINUOUS followed by an ADD CONNECTIONS, CR_WAIT command. Rapidly increasing credit-wait counts for the disk-class driver connections (a LOC_PROC_NAME name of VMS$DISK_CL_DRVR) are a sign that an application may be incurring send-credit waits.

   To ensure sufficient send credits, some controllers, such as the HSC and HSJ, allow the number of send credits to vary. However, not all controllers have this flexibility, and different controllers have different send-credit limits. The best approach is to know your application's access patterns and look for send-credit waits. If the send credits on one node are being exhausted, add another controller to spread the load over multiple controllers. An alternative is to rework the application to load balance controller activity throughout the cluster, spreading a given controller's disk load over multiple nodes and allowing the application to exceed the send credits allotted to one node.