$set noon
$write sys$output "Ignore the error about no command above."
$document/noprint 'f$environment("procedure")' report ps
$exit
(VMS Project Plan for\ Multipath Failover<line><line> DIGITAL CONFIDENTIAL)
<REVISION_INFO>(X1.0.2)
<AUTHOR>(Glenn C. Everhart)
<DATE>
<ABSTRACT>(Abstract)
This project plan describes how failover will be implemented between multiple paths to devices where more than one path is presented by hardware (or, later, by multiple servers).
<ENDABSTRACT>
<HEAD2>(Review Team)
<p>
Pat St. Laurent<line>
Tom Coughlan<line>
Lenny Szubowicz<line>
John Croll
<ENDTITLE_PAGE>
<FINAL_CLEANUP>(PAGE_BREAK)
<SUBHEAD1>(Change History)
<LIST>(NUMBERED)
<LE>X001--10-Jan-1997--Initial Draft
<ENDLIST>
<COMMENT>
This template contains the SDML coding necessary to produce a PROJECT PLAN using VAX DOCUMENT 1.0 or later. Follow the comments in this file as a guide to entering text.

To process the file, use:

$ DOCUMENT filename REPORT device-keyword

or, to get a table of contents,

$ DOCUMENT/CONTENTS filename REPORT device-keyword

"device-keyword" should be:

MAIL to produce a .txt file
POST to get a PostScript file
LN03 to get a file formatted for the LN03
TERM will give you output you can read on your terminal screen, useful for checking to see if the plan looks like you want.

You can refer to the VAX DOCUMENT user manuals for additional information on command line qualifiers and SDML tags.

You should give hardcopies of the drafts of the plan to your active reviewers; other reviewers can be pointed to on-line copies. The drafts of the plan and the final plan should be copied to the directory in which other documents concerning the projects are kept. If no such project directory exists, put the plan in STAR::DOCD$:[ANALYSIS_REVIEW.PROJECTS], and refer reviewers to this pathname.
<ENDCOMMENT>
<ENDFRONT_MATTER>
<RUNNING_TITLE>(Multipath Failover Project Plan\Digital Confidential)
<HEAD1>(Overview)
<P>
The multipath failover project will permit disk I/O to fail over from one path to a device to another, regardless of the underlying device drivers, when a disk enters mount verification and its path appears to have been lost (or when the equivalent happens to a shadowset member). Rather than inserting code into many drivers, the project will add I/O switching processing that runs, logically, at the beginning of the existing drivers.
<COMMENT>
Briefly describe the project, stating what its main changes or problematic areas are. You can raise risks or dependencies here briefly.
<ENDCOMMENT>
<HEAD1>(Business Justification)
<P>
The work is required for three reasons.
<list>(numbered)
<le>In the near future OpenVMS will connect to a number of controller types that provide more than one path to storage. These include the HSZ4x/5x/7x systems and Fibre Channel connectors, which will generally have two loops and thus two paths to every device. These paths must be usable so that customers gain the expected fault tolerance; at present that fault tolerance cannot be obtained.
<le>Users of shared SCSI buses expect to reach a disk by either a direct or a served path, and therefore expect their systems to tolerate access faults. This is not currently true, and it is causing customer complaints.
<le>When QIOserver becomes functional, failover among QIOserver paths, direct paths, and MSCP-served paths will be required, as will a means to let the rest of the system use a single invariant name for each storage device and only one path at a time.
<endlist>
<p>
Currently, the only path failover in clusters is for disks that use the MSCP protocol and for the MSCP server.
The failover code in these areas depends critically on the switching ability built into the MSCP protocol, an ability not shared by present or contemplated SCSI systems.
<COMMENT>
Describe the major areas of change. Give a brief assessment of the most important areas to investigate for performance impact. Also, list where the project's on-line plans, functional specs, design specs and schedules can be found. If there is no directory for this purpose, state this.
<ENDCOMMENT>
<p>
Project information for multipath failover can be found in:
<list>(unnumbered)
<LE>DS-MULTIPATH-SWITCH.WRITE (High level design document)
<LE>DS-MULTIPATH-FAILOVER.SDML
<LE>IR-MULTIPATH-FAILOVER.WRITE
<le>PP-MULTIPATH-FAILOVER.SDML
<LE>Pointers to any other documents (schedules?)
<ENDLIST>
<head1>(Staffing)
<p>
The project will need one engineer (Glenn Everhart). In addition, testing support will be needed from QTV and from within the SCSI group, 6 person-weeks each.
<head1>(Schedule Milestones)
<table>(Schedule, All Work Sequential)
<table_setup>(2\12)
<table_row>(12/31/1996\1st draft of design specification for review)
<table_row>(1/25/1997\1st review complete (Being held with Tom, Lenny, and John Croll on 1/21/1997; your comments also need to be factored in.))
<table_row>(2/1/1997\2nd draft of design specification for review (This will be a second 4-day limited review.))
<table_row>(2/14/1997\2nd review complete)
<table_row>(2/14/1997\Distribute design specification for group review. (I'd suggest sending it to OPENVMS_BASE_IO and BASE_OS_CLUSTER, at least.))
<table_row>(3/4/1997\Coding starts on moving existing functionality to a fully general form (n paths))
<table_row>(4/2/1997\Coding (but not debugging or testing) of the driver should be done.)
<table_row>(4/3/1997\Coding of system support routines begins.)
<table_row>(4/10/1997\Coding of the most essential support routines (search, UCB hiding, postprocessing hooks, DK "REQCOM bug" fixes, DK/DU "look for other path" code) done. (Not all of the support code, but the part most vital for early testing.))
<table_row>(4/11/1997\Coding starts on control program for manual setup.)
<table_row>(4/18/1997\Control program coded enough to do manual setup of sets of paths or issue failover requests.)
<table_row>(4/21/1997\Combined unit testing of the parts built so far begins.)
<table_row>(5/20/1997\Unit test of driver done. (Partial unit test of other code necessarily done also.) At this point, switching functionality exists in barest form.)
<table_row>(5/21/1997\Coding of remaining exec modifications begins)
<table_row>(5/28/1997\Coding of exec modifications done; coding of remainder of control and display program starts.)
<table_row>(6/4/1997\Control program coded and built.)
<table_row>(6/5/1997\Server coding begins (keepalives, switch on HSC signal, initial scan of devices, recognizing pairs))
<table_row>(6/19/1997\Server coded and unit tested. New functionality completed (as outlined in the project plan, in that order). Some unit testing runs concurrently (asking Bill Clogher for a trial or three).)
<table_row>(6/19/1997\Unit testing, whatever else is needed)
<table_row>(7/21/1997\Code inspection and unit testing complete)
<table_row>(7/28/1997\Finish update of design document to reflect actual code)
<table_row>(7/29/1997\Send out final design document for comment, and possibly schedule a review if changes have been unexpectedly extensive.)
<table_row>(7/29/1997\Distribute code to other groups for testing also.)
<table_row>(8/1/1997\Testing complete)
<endtable>
<p>
Note: This schedule is ambitious and presumes that no serious glitches will occur. The "server" entries include the relevant code in other modules that must handle "the other end" of the server's communications. The working testbed code suggests that this schedule is less far-fetched than such schedules often are, but unknown issues are, after all, unknown.
<p>
Implementing partial function early, however, is intended to ensure that if functions must be trimmed, at least basic functionality will still be usable.
<p>
If the driver is written in parallel with the other code in the project, the job splits roughly into two halves. The schedule in that case is:
<table>(Schedule, Switching Driver Only)
<table_setup>(2\12)
<table_row>(3/4/1997\Test plan drafted and reviewed. Coding starts on driver (and should begin on the rest also).)
<table_row>(4/2/1997\Coding (but not debugging or testing) of the driver should be done.)
<table_row>(4/3/1997\Unit testing of driver begins; REQUIRES that the control program be able to set up path pairs manually and request switching manually by this time.)
<table_row>(5/6/1997\Unit test of driver done. (Partial unit test of other code necessarily done also.) At this point, switching functionality exists in barest form.)
<table_row>(5/6/1997\Schedule final code inspection and review)
<table_row>(6/2/1997\Code inspection and unit testing complete)
<table_row>(6/15/1997\Finish update of design document to reflect actual code)
<table_row>(6/15/1997\Send out final design document for comment, and possibly schedule a review if changes have been unexpectedly extensive.)
<table_row>(6/2/1997\Distribute code to other groups for testing also.)
<table_row>(7/3/1997\Testing complete)
<endtable>
<HEAD1>(Schedule Discussion)
<P>
A two-path implementation of switchover, written largely in Macro-32, already exists; it provides basic switchover and support for manual switchover. Its driver has around 4500 lines of code. The final code, however, has been directed to be written in C. The Macro version will be used as a guide to reduce design errors.
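<p>
The C driver generalizes the existing two-path code to "n" paths. Purely as an illustration, the plan's initial policy of round-robin choice of the next usable path might look like the following minimal C sketch. The types <code>path_t</code> and <code>path_set_t</code>, the function <code>select_next_path</code>, and the limit <code>MAX_PATHS</code> are hypothetical; the real driver works with UCBs and other I/O database structures that are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound; the plan expects "n" to be small
   (fewer than 8 or so). */
#define MAX_PATHS 8

/* Hypothetical per-path state. The real driver would track UCBs and
   other I/O database structures, which are not modelled here. */
typedef struct {
    const char *name;   /* illustrative label only */
    bool usable;        /* false once the path is considered lost */
} path_t;

typedef struct {
    path_t paths[MAX_PATHS];
    int npaths;         /* number of paths actually in use */
    int current;        /* index of the path now carrying I/O */
} path_set_t;

/* Round-robin failover: starting after the current path, pick the next
   usable path and make it current. Returns its index, or -1 (leaving
   the current path unchanged) if no other path is usable. */
int select_next_path(path_set_t *set)
{
    assert(set->npaths >= 0 && set->npaths <= MAX_PATHS);
    for (int step = 1; step < set->npaths; step++) {
        int candidate = (set->current + step) % set->npaths;
        if (set->paths[candidate].usable) {
            set->current = candidate;
            return candidate;
        }
    }
    return -1;
}
```

For example, with three paths and path 0 current and marked unusable, <code>select_next_path</code> moves I/O to path 1; if every other path is also unusable it returns -1 and leaves the current path unchanged. Controller-specific policy, which the plan defers, would replace the simple round-robin step.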
<p>
Time estimates for components of the project (at 50 lines/day for C, 100 lines/day for Macro32):
<list>(numbered)
<le>Switching driver: 2100 new lines of C (plus 1400 already in hand), 42 days
<le>Driver and I/O exec modifications, 1000 lines of Macro, 10 days
<le>Utility to switch paths, report statistics, and set up connections for tests, 1000 lines of C, 20 days
<le>Server, keepalive and failure notice recognition only, 500 lines of C, 10 days
<le>Integration testing, 20 days
<le>Documentation updates, 10 days
<endlist>
<comment>
The switching driver is partially done (1300+ lines of code); perhaps another 2000 lines remain to be written to duplicate existing functionality, and perhaps another 100 for some other needed functions (e.g., forcing idle). The two-path code provides a detailed design framework for the bulk of this, and most of the rest can use example code in some other drivers I have available for guidance.
<p>
Utility code must also be written to switch paths, report statistics, and make or break connections manually. Much of this function also exists in the Macro32 code but must be redone in C. The Macro version is about 1600 lines long. The new C version, however, will not need all of the driver setup provided in the Macro32 version, since that functionality is being moved into the driver. Switching to the next available path is also in the driver, so some further shrinkage may occur. The estimate for this utility code is therefore 1000 lines.
<p>
A number of modifications are also needed: to the boot path, to ensure that devices get the same names; and to DUdriver (actually DUTUSUBS), DKdriver, MKdriver, and possibly the MSCP server, to ensure that all paths to a disk have the same name, that each loads and connects the switching code if needed, and that any UCBs other than the first one seen are hidden. IOSUBNPAG will also need some additions to its common search code. The current DKdriver code that recognizes MSCP-served paths takes 256 lines of source.
For the overall magnitude of these changes, we will therefore presume about 1000 lines of code. Because these modifications are to Macro32 code, they will be written in Macro.
<p>
The server code, which enforces policy globally, may not even be needed initially (the review has not yet taken place) if we can live without keepalives, controller-specific path policy, or performance-enhancing preemptive switching. However, these features seem worth having in order to get the server included in the architecture. For an initial build I would leave controller-specific policy out (and simply use round-robin path choice) until switching is better understood in practice. Then only a small server that responds to HSZ controller failure would need to be built, perhaps 500 lines or so. Even this may be deferred after review.
<p>
Thus the lines of code still to be written are 2100 + 1000 + 1000 + 500, or 4600 in all: 3600 lines of C and 1000 lines of Macro. At 50 lines/day for workable C code and 100 lines/day for Macro, this is 72 + 10 = ~82 days' effort. As a rough guesstimate, 50% of this time is for coding itself and the other 50% for bug squashing.
<p>
The driver will contain the logic for all of the actual path switching and path setup, so that other modules need only issue specialized $QIO calls to control it. It is therefore entirely feasible for someone else to write the process code for manual switching, hooking up paths, and reporting statistics. If that approach is followed and the "real" test code is not available early, some throwaway test driver code would need to be built to exercise the new driver early in the game. This might, however, be just a rework of some of the existing Macro code and take a couple of days to put together. The server code could also be split off.
Splitting off the utility and server code would move perhaps 1500 lines of code (crudely 30 days of effort) to where it might be done in parallel. Moving the path-searching code and other support code would be more involved (since the switching driver needs to know those interfaces fairly intimately to perform its lookups and switching), but it can also be done; this is an estimated 10 days (2 work weeks) of effort.
<p>
If the schedule is run in parallel, development time thus works out to 42 days for the driver code and 40 days for the rest.
<endcomment>
<p>
In mixed clusters, once QIOserver client access exists, the switching problem will arise even on VAXen, so the switching driver and the client end of cluster path detection will be needed there as well. When this is to be supported, the Macro switching driver and the Macro utility code may need to be brought back and their updates completed. The Alpha driver has been ruled to be done in C. However, the actual VAX work (code modification and tests) will be deferred until the Alpha code is ready, since the dual-path situation can be avoided on the VAX platform until QIOserver paths exist. The server component will be written in C for portability and easier debugging.
<p>
An additional month is needed for "integration" testing, which must verify that dual-path configurations work in all combinations of the following:
<list>(numbered)
<le>Standalone uniprocessor
<le>Standalone multiprocessor
<le>Clustered uniprocessor
<le>Clustered multiprocessor
<le>Dual-path HSZ connections on single systems
<le>Dual-path HSZ connections on shared SCSI buses
<le>Dual-path HSZ connections on unshared SCSI buses on different cluster machines
<le>DK to MSCP server failover/failback
<le>Clients seeing dual served paths (only after QIOserver exists; not feasible until then)
<endlist>
<p>
For some of the configurations described, the dual-path HSZ tests will cover served as well as local access (see the pictures in the investigation report). Another month or so of QTV testing is expected to be needed after unit test.
<p>
Documentation will be finalized concurrently with the QTV-driven testing and will take at least 2 weeks. Because this is new functionality, the code must be documented well enough to make it easy to add special-case code for other types of controllers, and the I/O interception scheme must be well described for the benefit of any third parties who may use it. (A compatible intercept is to our mutual benefit: we gain assurance that the I/O posting path can be made shorter, and they gain assurance that our intercepts don't interfere with theirs.)
<p>
The approximate order of implementation will be:
<list>(numbered)
<le>The switching driver
<le>I/O exec infrastructure (including naming)
<le>The control/display program code, productizing the existing functions
<le>The rest of the control/display program
<le>The server, starting with probing of HSZ functions
<le>The DKdriver modifications to report HSZ-specific messages
<le>Tests of server run late in system operation
<le>Final bits of DKdriver and DUdriver support and server handling of messages
<endlist>
<p>
The move to a C driver means that no function testing can begin until at least the driver and the more vital system support routines are done, so these will be done first. Coding of the control program will be interleaved with this work, the idea being to first produce something that can manually set up paths or request a path switch, so that the driver functions can be debugged. Once this basic functionality works, the rest of the package can be added, with testing at each stage to guard against breakage. This should minimize unpleasant surprises later and expose any problems early.
We also envision occasionally calling on SCSI group test support to run a go/no-go test or two to confirm that things are still sane. This is a risk-minimizing strategy.
<HEAD1>(Testing Plans)
<p>
The system must be tested incrementally during the build, as well as stress tested after completion. The design specification is fairly detailed, since it draws on lessons learned while building a working test model of the switching logic. Nevertheless, a complete design document covering the final system will be written and reviewed, and the code will be subject to review, though its size is likely to make a formal review infeasible. Since the QIOserver developers are likely to be working in this code as it is built, however, it is expected to receive considerably more informal scrutiny during construction than code normally does.
<p>
Modifications to the rest of OVMS are minimal and will generally be done first, giving them long exposure in normal Raven testing. It is therefore further expected that the proposed testing will be all that is needed to verify correct functioning, and that no cross-interactions with other OVMS components will be seen.
<p>
In addition to the reviews of the design specification that will follow the current limited review of the first draft, a final formal design review will be held as soon as the final design documents can be prepared, as a last chance to identify design problems.
<HEAD1>(Performance Issues)
<P>
The performance of the subsystem must be measured relative to unswitched paths. This is simply done by inserting or removing the switching intercept, which is designed to support this. The time cost of the switching code must be below 0.1% of the time needed for main-path I/O. Also, the setup code must not insert the switching code where multiple paths do not exist, so that no overhead is added there.
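<p>
The two performance rules above can be expressed as a minimal C sketch. It assumes the comparison uses average per-I/O times measured on the same workload with the intercept inserted and then removed; that measurement method, and the function names <code>switch_overhead</code>, <code>overhead_within_budget</code>, and <code>should_insert_intercept</code>, are hypothetical illustrations rather than part of the design.

```c
#include <assert.h>
#include <stdbool.h>

/* Budget stated in the plan: switching cost below 0.1% of the time
   needed for main-path I/O. */
#define SWITCH_OVERHEAD_LIMIT 0.001

/* Fractional overhead of the switched path relative to the unswitched
   path. Inputs are average per-I/O times in any consistent unit,
   assumed to be measured on the same workload with the intercept
   inserted and then removed. */
double switch_overhead(double unswitched_time, double switched_time)
{
    assert(unswitched_time > 0.0);
    return (switched_time - unswitched_time) / unswitched_time;
}

/* True if the measured overhead is strictly below the 0.1% budget. */
bool overhead_within_budget(double unswitched_time, double switched_time)
{
    return switch_overhead(unswitched_time, switched_time)
           < SWITCH_OVERHEAD_LIMIT;
}

/* Setup rule: insert the switching intercept only where a device has
   more than one path, so single-path devices see no added overhead. */
bool should_insert_intercept(int npaths)
{
    return npaths > 1;
}
```

For instance, an average of 10000 time units per I/O unswitched against 10005 switched is a 0.05% overhead and passes; 10020 switched is 0.2% and fails.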
<HEAD1>(Project Dependencies)
<p>
The initial work of this project has only one serious dependency: it MUST be able to make some exec changes (and preferably a couple of upward-compatible structure changes as well, without which meeting the performance goals will be difficult).
<p>
The "REQCOM" problem (which in DKdriver also manifests as not all IRPs calling the driver's start_io routine) must be solved.
<p>
Note that failover support for /FOREIGN-mounted disks is to be limited to manual failover on command. Thus no mount verification dependencies are known.
<p>
The SHDRIVER interactions are understood only in general terms at this point, and it may be necessary to pass some additional information between SHDRIVER and the failover code to make path failover of shadowset members work properly.
<p>
Once QIOserver is complete, testing with QIOserver-served paths will be needed at a minimum, and some QIOserver code may need to be merged into the server defined here. Subsequently, the extended device naming project will also need to use some of the server's routines for its cluster name arbitration. This will probably be most efficient if that name arbitration loop is merged into the loop this project needs to find duplicate paths. (The routines will be set up to facilitate this.) The two projects can thus avoid some duplicated work.
<p>
The estimated number of defects in the code is:
<list>(unnumbered)
<le>5500 lines of new code, at one defect per 25 lines: 220 defects.
<endlist>
<p>
The total is 220 defects. These figures are quite crude.
<head1>(Project Risks)
<p>
The chief risks in this project are schedule risks. Unforeseen problems, particularly in interoperation with hardware, could cause delays, as could other problems.
Path switching has run in "test jig" setups, validating the basic approach, but changes to mount verification, fast path code, or MSCP, or issues that may arise with QIOserver, could add complexity not currently foreseen. Furthermore, the coding speed and size estimates are rough. (The size estimate may be high, but so may the coding speed estimate; the two errors may cancel each other out.)
<p>
To track progress, the size of the completed code will be measured each day or week and compared with the expected total size reported here. The presumption is that moving or slightly reworking existing code can proceed very rapidly: that code already presumes "n" paths in a number of places (where "n" is assumed to be small, i.e., fewer than 8 or so), and many of the needed cleanups are already identified in comments. Once this is done, tracking can be against "new" code, permitting early discovery of any schedule problems.
<HEAD1>(Project Deliverables)
<P>
<list>(numbered)
<le>Design documents for the completed work
<le>Switching driver (including support for all utility and server functions)
<le>Utility program to switch paths manually and to set up paths to be switched manually.
<le>Complete server process code able to handle HSZ50 messages, to detect and match duplicate paths for devices on HSZ controllers or with obtainable WW IDs, and to use keepalive messages to discover failures.
<le>Driver and I/O exec definition changes and code modifications to support interceptions and ensure that multipath device names are presented the same way to the system. (Also to fix the longstanding "REQCOM" kernel stack problem if not handled elsewhere.)
<endlist>
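<p>
The progress tracking described under Project Risks (measuring completed code against the expected end size) can be sketched in C using the component sizes and coding rates from the time estimates in the Schedule Discussion. The type <code>component_t</code>, the table <code>plan</code>, and the function <code>remaining_days</code> are hypothetical illustrations, not project tools. With nothing yet written, the sketch reproduces the plan's 82-day coding total.

```c
#include <assert.h>
#include <stddef.h>

/* Coding rates assumed in the plan (lines per day). */
#define C_LINES_PER_DAY     50.0
#define MACRO_LINES_PER_DAY 100.0

typedef enum { LANG_C, LANG_MACRO32 } lang_t;

typedef struct {
    const char *name;
    lang_t lang;
    int estimated_lines;   /* expected end size from the plan */
    int lines_done;        /* measured so far (updated daily or weekly) */
} component_t;

static double rate_for(lang_t lang)
{
    return lang == LANG_C ? C_LINES_PER_DAY : MACRO_LINES_PER_DAY;
}

/* Remaining effort in days: lines still to be written, at the plan's
   rate for each language. Components already over their estimate
   contribute zero. */
double remaining_days(const component_t *c, size_t n)
{
    assert(c != NULL || n == 0);
    double days = 0.0;
    for (size_t i = 0; i < n; i++) {
        int left = c[i].estimated_lines - c[i].lines_done;
        if (left > 0)
            days += left / rate_for(c[i].lang);
    }
    return days;
}

/* The four components still to be written, per the time estimates. */
component_t plan[] = {
    { "switching driver",            LANG_C,       2100, 0 },
    { "driver and I/O exec changes", LANG_MACRO32, 1000, 0 },
    { "control/report utility",      LANG_C,       1000, 0 },
    { "server (keepalive/notices)",  LANG_C,        500, 0 },
};
```

Recording, say, 1050 completed driver lines drops the remaining estimate from 82 to 61 days; comparing that figure with the calendar is what would give early warning of schedule slips.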