$ DOCUMENT/CONTENTS PP-sample.SDML REPORT PS Intro/history problem stmt (brief) methods used (group history & procedures) alternative chosen schedule estimates - 6 month bullets - what IS or IS NOT doneness - completion date for HL doc - project plan (ie, the ~6 "tall pole" things to fix) - long term guesstimated resources/sched resource estimates what is NOT covered (ie, 7.1 code) ------------------------------------------ (PROJECT_NAME\SCSI Subsystem Proactive Maintenance) (Investigation Report for\ <REFERENCE>(PROJECT_NAME) <line><line> DIGITAL CONFIDENTIAL) <REVISION_INFO>(X0.1) <AUTHOR>(Glenn C. Everhart, PhD.) <DATE> <ABSTRACT>(Abstract) This is the Investigation Report for <REFERENCE>(PROJECT_NAME). This investigation report is in: <p> EVMS::DOCD$:[EVMS.SCSI.REFLIB]pa_maint_invrpt.PS and .TXT <ENDABSTRACT> <comment> <HEAD2>(Review Team) <p> <SIGNATURES> <byline>(Ken Munsell\Group Manager) <endcomment> <ENDTITLE_PAGE> <COPYRIGHT_PAGE> <COPYRIGHT_DATE>(1995) <ENDCOPYRIGHT_PAGE> <CONTENTS_FILE> <ENDFRONT_MATTER> <RUNNING_TITLE>(Digital Equipment Corporation--Investigation Rpt\Digital Confidential) <HEAD1>(Introduction) <p> The VMS SCSI subsystem originated as a simple system to support a few devices on what was viewed as a low-end system. Gradually it has grown over time as SCSI became popular beyond the wildest imaginations of the first implementors, and been ported (bug for bug compatible) to Alpha from VAX. The costs for maintaining the SCSI system have been growing during this time, and after a particularly difficult system enhancement, management has decided to commission some preventive maintenance. The group held a retrospective and a problem definition series of meetings and arrived at a problem statement summarized below. <HEAD1>(PROBLEM STATEMENT) <p> The OpenVMS SCSI subsystem has become difficult to maintain, understand, and extend. These problem areas, which are visible to internal users and to customers, need to be simplified and improved. 
<p> The problems in the system are: <p> There is no comprehensive OpenVMS SCSI design. Little was ever written down, and the people who had some oral tradition about the top level design have left. This has meant that system upgrades were made with no top-level understanding or structure; the result is inconsistent interfaces and information passed by side effect in many places in the code. Changes are fragile and the environment is increasingly hard to understand, since the only reference is generally the code itself, which has lost some of whatever coherency it initially had. <p> This situation makes it hard to fix or enhance the SCSI code base. It means it takes longer than it should to learn the system, so that new people cannot quickly contribute. It breeds new bugs. <p> As a result, customers see features they want delayed or denied. They also see VMS able to adapt only slowly to SCSI enhancements such as wide SCSI, added LUNs, new device types, and commodity device support. Finally, third parties have trouble getting information they need to add device support of their own. <HEAD1>(Goals:) <list>(unnumbered) <le>Make the VMS SCSI subsystem easier to maintain <le>Make the VMS SCSI subsystem easier to understand <le>Make the VMS SCSI subsystem easier to extend <le>Not break SCSI functionality <le>Provide a path to implement <le>Not break code written by other groups <endlist> <HEAD1>(Non-Goals:) <list>(unnumbered) <le>Rewrite code solely to make it "pretty" <endlist> <p> <HEAD1>(Methods Used (and background in what has been done):) <p> The conclusion was drawn initially that a new architecture for the SCSI system should be developed, in writing this time, based on the ideas of the SCSI group. Initially this approach was not constrained, and some fairly major modification was investigated. <p> One particular approach, investigated individually before the full SCSI group became involved, was to port the DEC Unix CAM implementation to VMS. 
Such a port has some attractions, in that it might make it possible to support devices with one driver for two operating systems. While there is a great deal of superficial similarity in approaches, the Unix CAM code would have needed the entire suite of VMS scheduling primitives added, and its internal scheduling model would also have needed to be revisited, since Unix schedules processes, including driver parts, very differently from VMS. It was believed that this effort would be lengthy and high risk (owing to the need to port an unknown fraction of other Unix internals to VMS), and the code so ported might still not be much closer to what Unix uses than VMS drivers now are, so the gains would probably not be realized. Thus this idea was abandoned. <p> Several sources of information were used to discover additional possible improvements to SCSI besides inspection of code from VMS and Unix. These included a sizable list of requests for new functions from various group members (including some who have since left but who held a larger share of the design lore than remains in the group), a SCSI retrospective in which the entire group reviewed what had gone well and badly in the Zeta production effort, discussions with group members about what they believed would be useful, and a notes conference. The SCSI ARCHITECTURE notes conference was set up in early July and serves as a written repository of ideas and concepts from the entire SCSI group about how SCSI should be implemented on VMS. It has provided much of the content for a very high level SCSI architecture document. The SCSI retrospective also produced a written report which outlines what the group found to have been lacking and to have caused difficulties in the Zeta (VMS 6.2) development effort in SCSI (which introduced tagged command queueing and a number of other upgrades to SCSI). 
<p> The architecture notes file effort occupied much of the SCSI team for around 3 months as each member was asked to describe how SCSI should work, in fairly unconstrained form, on VMS. In this effort, it was specified that the current code base design was an acceptable answer provided it was arrived at after consideration of possible alternatives and provided the description arrived at was coherent. Responses varied greatly in closeness to the Zeta code base design, but the group converged on substantial agreement about many design features through discussions in the notes file, at meetings, and individually. <p> The results of the investigations and discussions mentioned above constitute the background of the present investigation, and will serve as resources in its accomplishment. <p> <HEAD1>(Approach:) <HEAD2>(Rejected Approaches:) <p> The following approaches were considered by the full SCSI group, in the light of the history mentioned above, and rejected: <p> The first approach to dealing with SCSI system maintainability problems was to simply document the existing SCSI code base as is. This was rejected because, in the opinion of the SCSI group, the existing code base lacks a consistent and comprehensive design to document. Some parts of the base do have such designs, but there is no overall principle which ensures they are mutually consistent. This rules out confining the work entirely to the existing base. If a consistent design existed for the full current code base, one could document it and improve maintainability and extensibility in this way. However, documenting the current code base would only codify the inconsistencies whose presence demonstrates that a coherent top level picture cannot be drawn using only the current code base. A coherent top level picture of the code base must go beyond what is there now if only to eliminate inconsistencies. 
<p> The second approach is an opposite of the first, namely to rewrite the entire SCSI subsystem to a new and unconstrained design. While the process of creating such a thing can produce a complete document set and drivers consistent with it, and would cause third party impact only once, it provides no benefit for a long time. New functionality would be delayed and the current code base would all have to be maintained as is for the full development period, with no short term reduction in the effort needed to do this. This approach also would be most vulnerable to the continual flood of new SCSI implementations from vendor devices. To the extent current code had to be patched to handle these, a from scratch approach would accumulate a larger number of to-be-addressed issues which would have to be handled before deployment than other approaches, which involve less of a delay before first code. <HEAD2>(Selected approach:) <p> The selected approach is a hybrid. It begins with a high level design document which is broad but not deep in details, which has rules of thumb for handling known SCSI issues, but little specific internals information. This document represents a complete vision of what the best way to implement SCSI on VMS is, subject to the constraint that it be possible to implement incrementally. It must, finally, contain external interface descriptions and data descriptions in a high-level form. <p> Once the high level document is defined sufficiently to proceed, a series of follow-on investigations will be chosen, based on CLDs, QARs, and other maintenance experience and based on the interface descriptions. Approximately 6 such areas will be chosen. These areas will then be subjects of subsidiary projects whose objectives will be to rework those areas so they are consistent with the high level documentation and to document the design of the areas reworked for the future. 
The scope of this project will be deemed satisfied once the high level document has definitions of interfaces and is released in initial form and when these projects are also done, though it is expected that future SCSI work will reflect, and be reflected in, the high level document. (Note: choice of the projects can overlap the document completion, considering that much work on a high level document has been done already.) <p> The initial selection of projects is described below. The list supplied includes more items than are likely to be possible by 7.2 code freeze, and is intended to show a set of to-be-done items the first several of which may be possible to finish by then as well as to give some idea of the items needing work over several releases. <p> The overall project result will be a SCSI documentation set covering high level design principles, selected areas of SCSI which need work most, and reworked code for the "worst problem" areas in the system (plus glue as needed so that the rest of the SCSI subsystem will continue to work). <p> This approach has the advantages that all work will have a top level design available. Some areas will be reworked for each new VMS release, and customer impact will be localized to change areas, and made minimal in any case due to the constraint that it be possible to use existing code along with the new. <p> The major theoretical drawback to this approach is that the constraint on high level design could preclude some conceptual breakthrough. Since the group has sought such and not found it, though, the risk of this is small. More practically, the SCSI code in this approach may never fully match the design, though it will approach it, and the time to overhaul every bit of the system may in principle be greater than a "clean sweep" rework due to need to keep un-updated components working. (You don't need to write glue code in a clean sweep approach.) 
<p> In practice, some parts of the SCSI system might never need to be updated, so the "overall time" issue may be a red herring. At any rate, feature enhancements and fixes are needed sooner than a clean sweep approach could deliver them. <HEAD2>(Specific Implementation Approach) <p> The high level document will be derived from the existing architecture document with the addition of greater detail about data structures and the port-class interface and discussion of how to "glue" existing drivers in with this design. This document proposes a set of rules of thumb but is intended to permit incremental implementation of its ideas and thus is suitable as a starting point. This document is expected to be complete by the end of CY 1995. However, it is also expected to be modified by subsequent implementation projects so that as parts of the SCSI subsystem are reworked over time, they will approach a commonly documented high level design (and the high level design document will approach the code). To accomplish this, each project should attempt to implement features of the high level document bearing on areas the project covers, and the high level document should be adjusted when problems with it are found. <p> The subsequent projects each need to follow the LOP cycle so that they are reviewable individually; the choice list in this document is intended to show what the long term group plan is. Because the project is intended to produce a change in long-term development policy in the direction of having a high level document which is kept consistent with code base pieces as they are modified, it cannot be said to be "over" in the same way a code rewrite can. For purposes of discussion of a project, however, this project can be said to have deliverables of a high level document and a set of projects to begin addressing the most urgent issues identified specifically. Once these are delivered, one can speak of THIS project as "done". 
A SCSI document set cannot be said to be complete and consistent until at least example code exists at all levels of the high level design and either the entire system conforms to a single design or at least glue code exists where needed so that components deemed "end of life for maintenance" can continue to be used with the rest of the system. The fact that this project involves a commitment and intention to produce such documentation as an ongoing part of future development does not imply a perpetual project. Rather, it will provide a source for future projects, but this project itself should be considered to terminate with delivery of its high level document (in its initial state), and with delivery of the first group of implementation projects. Further projects are expected to be proposed on an ongoing basis, but these will be considered in a timeframe past the VMS 7.2 period. <p> The project choices made at this time are weighted most heavily by what will address CLDs, QARs, and needs of other VMS components, but in the future, documentation of project designs, and ensuring conformance between those designs and the high level documents, must be a part of each project, with a design document as a deliverable from each project that produces code. Each project should make some contribution to making the SCSI system more maintainable as well as possibly add new functionality, and projects further out can be derived from what is necessary to implement an entire execution suite to the high level design, from class driver, through layers of common code, to adapter specific port code. The basic rule is that when a component is changed, it is changed consistently with the overall architecture (both are adjusted as more is learned) so that after several releases, it is expected that most of the system will have been revised, and thus will conform, without the need to do rewrites solely to bring about conformity. 
<p> Using their knowledge and the existing architecture document, plus review of the QAR and CLD databases and of recent fixes in code, as inputs the group has come up with a set of specific projects which attempt to provide specific implementation actions which will best improve the maintainability and extensibility of SCSI on VMS and reduce the maintenance burden on the group. <p> It is a goal in all the following projects to consult other groups affected to ensure no problems are caused. <p> This list includes the following projects, in the group's consensus priority order: <list>(numbered) <le> "Extended SCSI Address Space" <p> Modify driver structures and code to support big SCSI IDs. Since it makes little sense to change data structures several times, incorporate data structure cleanup here also, documenting SCSI data structures consistent with high level doc & defining access rules for at least the most frequently used fields. (It should be added that if we have to edit data structures for some other reason first, we need to do the cleanup then, though code to use all new areas added may not be done for a while.) <p> This project is intended to support SCSI IDs 0 to 15 on wide busses, and LUNs 0-31. For serial or fibre channel SCSI, larger IDs and LUNs as large as needed to cover the address range of these busses will be supported. <p> Duration: TBD <le> "SCSI Feature Control" <p> Add an external control interface to permit outside control of the use of SCSI features by (mostly 3rd-party) devices. Among the work is: <list>(unnumbered) <le> Control synchronous and fast SCSI use (and ensure drivers can support fast SCSI properly) <le> Control wide SCSI and ensure drivers can use this feature if present and enabled. <le> Control tagged command queueing use <le> Control timeout values per device <le> Control use of 10-byte modesense messages <le> Implement and add controls for diagnostic ring buffer code to capture SCSI information when enabled to do so. 
<endlist> Eventually this interface would be able to use the registry facility now in design to control these features of boot devices also. <p> Duration: 3 man-weeks to translate existing control interface into C. <p> Rest: TBD <p> This feature control will permit much easier handling of SCSI devices from commodity sources and permit many QAR or CLD issues to be handled by customers running a configuration utility, rather than needing engineering time to develop new drivers. <le> "SCSI Doc set" This project is designed to improve documentation available for the current SCSI implementation and to ensure future documentation is present and useful. <list>(unnumbered) <le> Have driver maintainers write up an intro/internals document describing how the driver works. <le> What documents are needed & what release each is for needs to be planned; this involves 2 sets of manuals: <list>(numbered) <le> A quick set (the cheat sheets about current drivers) <le> A slow set (full design docs for drivers using perhaps a template design document from the documentation people.) <endlist> <p> Duration: TBD <endlist> <le> "Enhanced Diagnostic Features & Tools" To make it easier and faster to diagnose errors and system error states, add the following features: <list>(unnumbered) <le> Make error log entries informative & consistent (including unique type/subtype) <le> Make bugchecks unique so one can find where they come from <le> Add a diagnose interface to port drivers (functionally; if this can be done most efficiently by implementing class driver disconnect and using GKdriver, it can be done that way.) <endlist> <p> Duration: Estimated 890-1140 lines of code in 11 modules (in Macro, Bliss, and C, mostly Macro). Guesstimate ~1 month. 
<le> "Common routines" To simplify the SCSI code base in some ways, create common routines to: <list>(unnumbered) <le> Construct SCSI commands (so class drivers don't have each to "know" how) <le> Handle memory management functions to do address and mapping translations drivers need in a standard and central way (rather than separately in each driver). <endlist> <p> Duration: TBD <le> "Utility Application" <p> Write a utility to issue SCSI commands, collect ring buffer info, output from commands, etc. Depends on IO$_DIAGNOSE for function. <p> Duration: TBD <le> "Target Mode" <p> Implement target mode/AEN code needed for clusters in other port drivers as needed. Implement a complete target mode either using a new class driver or keeping things in port driver common code. This supports SCSI clusters now, but may be needed for communications or other functions in the future. <p> Duration: TBD <le> "Flow Issues" Make the queue manager optional so ports not needing it will not get it. This will make considering flow control necessary and should be the occasion for adding externally controllable flow control in if not done already. While handling these issues, remaining known problems with reset and flow control issues for power management should be addressed. <p> Such functions will make tuning I/O on shared SCSI busses simpler as well as speed up I/O where the software queue manager is not needed, on several SCSI adapters. <p> Duration: TBD <le> "Restructure Drivers" Restructure drivers to: <list>(unnumbered) <le> Have more common code <le> Isolate unique device support cases better <le> Adapt as appropriate to device SCSI capabilities (discussed more fully in the high level design document) <endlist> <p> Duration: TBD <le> "Gentler Error Recovery" <p> Implement command cancel via Abort Tag, Abort, Bus Device Reset, Bus Reset stopping at the first success. Also allow I/O cancel to cancel operations more promptly, especially long commands. 
This may involve a rework of flow control and so should address adding some externally specified quotas, runtime tunable, to better regulate flow if so. (Whether this happens depends on whether another project has reworked flow control to allow external controls first.) <p> Such handling will provide less disruption on SCSI clusters and is essential if shared access to non disk devices is to be supported (as opposed to served access). <p> Duration: TBD <le> "Larger I/Os" Remove port limitations so that I/O requests larger than 64K can be enabled if ports support them. Add control via control interface to allow this. (This is useful for firmware loading etc., but may impact timeouts and the like if default is to allow very large transfers. Thus make it an exception.) <p> Duration: TBD <endlist> <p> It should be noted that as of this writing it is expected the first few of these projects may be doable by VMS 7.2 code freeze. Beyond those, it is also expected that the list will be reviewed. Hence time estimates for the latter part of the list are not included here, since experience will probably cause other proposals to surface and give rise to change in the ordering. The longer list is presented here since it can be said to cover the requirements of the high level document in the sense that if all these projects are completed, at least one wholly compliant execution path from class driver, through common code, to port driver adapter code will exist. <p> The high level design document's prescriptions are intended to accomplish business purposes, though, as are each of the foregoing projects, and work on each particular area of the system must pass muster as the most valuable work possible at the time it is done, consistent with VMS goals and resources at that time. 
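<p> As an illustration of the "Gentler Error Recovery" item in the list above, the escalation from Abort Tag through Abort and Bus Device Reset to Bus Reset, stopping at the first success, might be sketched in C roughly as follows. This is only a minimal sketch under stated assumptions: every identifier here (cancel_step, cancel_attempt_fn, cancel_with_escalation, simulated_attempt) is hypothetical and does not come from the existing drivers, and the callback stands in for whatever port driver code would actually issue each action.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: none of these identifiers come from the
 * OpenVMS SCSI sources. The order mirrors the list item, with the least
 * disruptive action first and the bus reset last. */
typedef enum {
    CANCEL_ABORT_TAG = 0,     /* aimed at a single tagged command */
    CANCEL_ABORT,             /* aimed at the I/O outstanding to one device */
    CANCEL_BUS_DEVICE_RESET,  /* resets a single target */
    CANCEL_BUS_RESET,         /* disturbs every device on the bus */
    CANCEL_STEP_COUNT         /* sentinel: also means "nothing worked" */
} cancel_step;

/* Callback that attempts one recovery step and reports whether it worked.
 * Assumption: a real port driver would supply this; here it is abstract. */
typedef bool (*cancel_attempt_fn)(cancel_step step, void *context);

/* Try each step in order and stop at the first success.
 * Returns the step that succeeded, or CANCEL_STEP_COUNT if all failed. */
cancel_step cancel_with_escalation(cancel_attempt_fn attempt, void *context)
{
    assert(attempt != NULL);
    for (int s = CANCEL_ABORT_TAG; s < CANCEL_STEP_COUNT; s++) {
        if (attempt((cancel_step)s, context)) {
            return (cancel_step)s;
        }
    }
    return CANCEL_STEP_COUNT;
}

/* Stand-in for hardware, for illustration only: context points at the
 * gentlest step that "works"; every gentler step reports failure. */
bool simulated_attempt(cancel_step step, void *context)
{
    const cancel_step *first_working = context;
    return step >= *first_working;
}
```

<p> Passing the attempt routine in as a callback keeps the ordering policy in one common place, in the spirit of the "Common routines" item, while leaving the adapter specific details to each port driver. Whether the real code would be structured this way is an open design question for the subproject.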
<HEAD1>(Schedule) <list>(unnumbered) <le>High level document available for reference but incomplete in detail: 8/31/1995 <le>High level document complete: 12/31/1995 <le>Subproject initial list complete: 11/12/1995 <le>Subproject LOP investigations begun for first projects: 11/20/1995 <le>Subproject coding begun: 2/2/1996 <le>Some Subprojects completed: 7/1996 <endlist> <HEAD1>(Resources Available) <p> Resources available for writing these documents are considerable. There exist design documents for parts of some of the drivers which, while out of date and incomplete, reflect parts of the designs. Also a top level design document exists and much detail was assembled in connection with deriving a new SCSI architecture. While it has not described interfaces or data structures in detail, it does describe some general features and satisfies the constraint on incremental implementability. Group members have also listed a number of areas needing improvement in the past. Since there is considerable overlap in these problem areas, selection of a few top candidates should be straightforward. It should be possible for the project to begin its subordinate projects by the end of 1995 so that some new code can be in place for the projected VMS 7.2 release. <p> There are also considerable code resources available, in the form of already existing control interface code, already-designed driver control areas to allow external control of SCSI features, designs for some of the wide-address problem, and of course a code base which does function correctly in most circumstances and which is well commented. <p> As to human resources, it cannot be assumed the entire SCSI group is available, since QARs and CLDs need to be handled on an ongoing basis. Therefore the number of engineers available for these projects is probably in the 2 to 3 range over the period in question. 
<HEAD1>(Next Steps) <p> A more detailed project plan will follow to cover time estimates, and LOP plans for the chosen subprojects will be begun by the beginning of 1996, in order that finer details on time and resource requirements can be ascertained. These estimates may dictate changing the order of implementation, but in any case execution of subprojects should begin by early 1996, with the intent of having several completed by VMS 7.2 code freeze.