From:	SMTP%"RELAY-INFO-VAX@CRVAX.SRI.COM" 24-SEP-1993 11:28:48.88
To:	EVERHART
CC:	
Subj:	Re: Deleting users stuck in RWAST

From: jeh@cmkrnl.com
X-Newsgroups: comp.os.vms
Subject: Re: Deleting users stuck in RWAST
Message-Id: <1993Sep21.122935.2755@cmkrnl.com>
Date: 21 Sep 93 12:29:35 PDT
Distribution: world
Organization: Kernel Mode Systems, San Diego, CA
Lines: 162
To: Info-VAX@kl.sri.com
X-Gateway-Source-Info: USENET

In article <1993Sep20.231543.1@alien.gici.com>, laut@alien.gici.com writes:
> In article <1993Sep20.095900.2748@cmkrnl.com>, jeh@cmkrnl.com writes:
>> In article <1993Sep10.225830.1@alien.gici.com>, laut@alien.gici.com writes:
>>> The problem arises when, for some reason, the thread's transition from IPL#4 
>>> to IPL#2 doesn't correctly finish, and so CCB$W_IOC doesn't get decremented.
>>> The result is a hung process.
>> 
>> The usual reason for this is that the outstanding I/O request on the 
>> "problem" channel couldn't be cancelled or completed.  
>> 
> [...]
> since VMS is _trusting_ the driver to "clean up after 
> itself," a driver that -CAN'T- cancel and/or complete I/O has no business
> being loaded into your system, because it has a design fault in it. 

Unfortunately, some such drivers are shipped with VMS!  It's not as though we
have a choice about them. 

The hardware can be at fault too.  This used to be a very famous "feature" of
certain Massbus tape drives:  (Am I dating myself, or what?)  It was possible
to lose an interrupt that would otherwise signal the end of a long
non-data-transfer operation, such as a rewind, if someone took the drive
off-line during the rewind.  For a while the VMS folks were telling the TC03
folks "fix the controller to give us the interrupt anyway" and the TC03 folks
were saying "we can't anytime soon, fix the driver to allow cancels"... 

There are also devices that will lose an interrupt after being programmed for a
DMA transfer.  The standard rule (written long ago, when Massbus and Unibus
were all that existed, and the smartest terminal mux supported on VMS was a
DZ11) is that "DMA transfers, once started, cannot be cancelled, you must wait
for the interrupt".  The only safe way that a driver can violate this rule is
to tell the controller "don't do that one" and get a confirmation from the
controller that it hears and will obey.  
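
To make the "get a confirmation" rule concrete, here is a toy model in Python.
It is only a sketch, not driver code: ToyController, request_abort, and cancel_io
are hypothetical names invented for illustration, and a real controller would
answer through registers and an interrupt rather than a return value.

```python
from enum import Enum, auto


class CancelResult(Enum):
    CONFIRMED = auto()   # controller acknowledged that it dropped the transfer
    MUST_WAIT = auto()   # no acknowledgement; the transfer may still complete


class ToyController:
    """Hypothetical controller model, not a real device interface."""

    def __init__(self, supports_abort):
        self.supports_abort = supports_abort
        self.transfer_active = False

    def start_dma(self):
        self.transfer_active = True

    def request_abort(self):
        """Return True only if the controller confirms it dropped the transfer."""
        if self.supports_abort and self.transfer_active:
            self.transfer_active = False
            return True
        return False


def cancel_io(controller):
    """Give up on a transfer only after the controller confirms the abort."""
    if controller.request_abort():
        return CancelResult.CONFIRMED
    # Without confirmation the device may still write into the buffer,
    # so the request has to stay outstanding until the interrupt arrives.
    return CancelResult.MUST_WAIT
```

With ToyController(supports_abort=False), cancel_io returns MUST_WAIT and the
transfer stays active, which is the situation the standard rule was written for.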

In many older (dumb) controllers, and some new controllers too, the only way to
do this is via a hard reset on the controller to make it forget *everything*
that's happened since system boot.  Lots of driver writers are afraid to do
this.  If you have to wait for a response from the device after the reset, the
programming can get tricky, esp. if the driver's cancel I/O routine was written
as an afterthought instead of being designed in from the start.  (With a device
of almost any realistic level of complexity, the driver model offered in DEC's
template driver just isn't adequate.)  Many driver writers, including some who
apparently worked for DEC at one time, just punted the whole issue and assumed
that every DMA transfer would complete. 

There are lots of other examples.  My point here is that while it's fine to say
"you shouldn't have any buggy inner-mode code in your machine", this is
difficult to achieve in practice, except by not booting!  

>>> The problem arises when, for some reason, the thread's transition from IPL#4 
>>> to IPL#2 doesn't correctly finish, and so CCB$W_IOC doesn't get decremented.
>>> The result is a hung process.

For the record, I've yet to see a case where the IPL 4 -> IPL 2 transition just
"doesn't correctly finish".  Invariably something else is going on -- namely,
the target process is waiting at IPL 2, preventing the delivery of ASTs. The
process rundown code is supposed to be set up so that anything that might need
a wait at IPL 2 will be over and done with before it's time to $CANCEL all
outstanding I/Os.  Obviously this doesn't always happen. 
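
A toy Python model of that situation may help.  It assumes, as the quoted text
does, that the channel's I/O count is decremented by the completion AST; the
names ToyProcess, deliver_asts, and io_completion_ast are hypothetical, and
io_count merely stands in for CCB$W_IOC.

```python
from collections import deque


class ToyProcess:
    """Hypothetical stand-in for a process; not a VMS data structure."""

    def __init__(self):
        self.ipl = 0
        self.io_count = 0            # stands in for CCB$W_IOC
        self.pending_asts = deque()

    def queue_ast(self, routine):
        self.pending_asts.append(routine)

    def deliver_asts(self):
        """Run queued ASTs, but only while the process is below IPL 2."""
        delivered = 0
        while self.pending_asts and self.ipl < 2:
            self.pending_asts.popleft()(self)
            delivered += 1
        return delivered


def io_completion_ast(proc):
    """Toy completion AST: the request is done, so drop the I/O count."""
    proc.io_count -= 1
```

Set io_count to 1, queue io_completion_ast, and raise ipl to 2: deliver_asts
returns 0 and io_count stays at 1 for as long as the process sits at IPL 2.
Drop ipl back to 0 and the same call delivers the AST and the count reaches 0.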

>> (Note that "threads" -- not an official VMS term, in this context anyway --
>> don't really transition from IPL 4 to IPL 2.  The IPL 4 interrupt service
>> routine runs in system context.  It arranges for I/O completion ASTs to be
>> delivered to the process that requested the I/O.  These ASTs run in the context
>> of the target process with a completely different register set, P0 address
>> space mapping, etc. and aren't, properly speaking, a continuation of the
>> "thread" that was running at IPL 4.) 
> 
> <groan>  The I/O-Post Processor starts at IPL#4, in system context, to 
> complete its system-specific stuff.  Then, it converts the IRP into an ACB 
> and queues it to the target process, in order to get the process context 
> mapped, so that it can finish up with things like propagating the IOSB; and 
> in cases of buffered I/O, copying the system buffer into the user's buffer.
> 
> The procedure is identical in concept to "forking" within system context, 
> whereby the THREAD is started as a device interrupt, and then forks down to 
> its Fork IPL so as not to block any other incoming interrupts.
> 
> Really now, Jamie.  If I were Ehud, I would likely be accusing you of having 
> answered my post as you did for your own personal gain or ego gratification. :)

No, I answered your post as I did because I believe (based on some years spent
full-time and the subsequent years spent part-time teaching and writing about
VMS internals and device drivers) that the analogy to forking will lead some
readers to form incorrect mental models of this part of VMS. 

I feel very strongly about getting the terminology and the models right and
carefully specifying what they do and do not apply to.  Incorrect models and
terminology invariably lead to incorrect conclusions about other aspects of
VMS's operation. 

The way I see it, the IPL 4 -> IPL 2 transition is *not* identical to forking,
not just because of the different context (system vs. process), but because the
IPL 4 code is not over and done with once an IRP has been queued to the
requesting process as an ACB.  Rather, the IPL 4 code sits in a loop looking
for more IRPs on its input queue. 
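
The shape of that loop can be sketched in Python.  This is a simplified model,
not the VMS code: io_post_loop is a hypothetical name, the (pid, request_id)
tuples stand in for IRPs, and the per-process lists stand in for AST queues.

```python
from collections import deque


def io_post_loop(post_queue, ast_queues):
    """Drain the post-processing queue, handing each request to its process.

    post_queue: deque of (pid, request_id) tuples standing in for IRPs.
    ast_queues: dict mapping pid to a list standing in for that process's
                AST queue.
    Returns the number of requests handled in this one activation.
    """
    handled = 0
    while post_queue:
        pid, request_id = post_queue.popleft()
        # "Convert the IRP to an ACB" and queue it to the requesting process.
        ast_queues.setdefault(pid, []).append(("io_done", request_id))
        handled += 1
    return handled
```

One activation serves every request on the queue, whichever process it belongs
to; it does not end when the first request has been handed off.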

The operation of the IPL 4 code *is* very similar to what happens in the *fork
dispatching routines* (the IPL 6, 8, 9, 10, and 11 ISRs)... if you want to draw
an analogy to forking, that's the one to point to.  (I've yet to hear anybody
claim that a fork routine was a continuation of the IPL n fork dispatching
thread -- it isn't; it's a continuation of the thread that called EXE$FORK.) 

I think I see where you're getting your view of things, though.  The IPL 2 I/O
posting special kernel AST routine is a continuation of the IPL 4 thread, in
that the IPL 4 thread specifies the code to be executed at IPL 2.  This isn't
the case in the fork dispatchers. 

Still, there are enough differences between this and forking that I think this is a
very misleading analogy to draw.  Unless of course you take the time to explain
the details of where the analogy is valid and where it isn't.  

(Speaking of fork processes -- damn, I wish they *had* called them "fork
threads"!  The use of the word "process" in this term leads to far more
confusion than we're concerned with here.  The only way to deal with this, I've
found, is to hit it head-on the first time you introduce the term, and say that
while "fork process" is the official term, these things have nothing to do with
"processes" as we usually think of them.  The "comparison charts" in the DEC
Ed. Svcs. materials, contrasting "processes" and "fork processes", do more harm
than good...) 

>> You forgot to mention that if you do this, and THEN the outstanding I/O request
>> decides to complete, VMS will most likely crash.  This is the reason for the
>> "don't delete processes until their I/Os are finished" rule.  
>> 
>> 	--- Jamie Hanrahan, Kernel Mode Systems, San Diego CA
> 
> I did.  _You_ deleted it from my post.  

I didn't see it.  Perhaps I was responding to a previous followup that had
quoted your post.  

> Also, you _forgot_ what the original 
> poster said, namely that this occurs under an ORACLE application coming in 
> on an RTAn: device, and that the only way he has of fixing the problem is to 
> reboot the machine.  Either way, his problem gets fixed.

*grin*  yep.  This reminds me of something that happened a couple of weeks ago,
in a completely different "domain".  A client called me in a panic:  "I just
dumped most of a cup of coffee into my keyboard!"  (And, I know that this
particular client uses artificial sweetener and creamer.)  I couldn't deal with
it right then, so I said, "Take the kb to the lunchroom and flood it with warm
running water, for at least ten minutes."  "Won't that hurt it?"  "Most likely
not, but it's certainly a lot better than what's in there now; if we do nothing
the kb will be a total loss, so let's do SOMEthing..." 

> However, for the record, just to ensure there is no confusion out there in 
> net.land, let me reiterate the importance of everyone realizing that 
> what I suggested above is truly a LAST-DITCH method of fixing the problem,
> because it is pulling an unexpected (ahem) "thread" from the fabric of things.
> (%^/)
> 
> -- 
> Bill Laut			Internet:  laut@alien.gici.com
> Gull Island Consultants, Inc.	Phone:     (616) 780-3321
> Muskegon, MI  49440		>> "Usual disclaimers, apply within" <<

	--- Jamie Hanrahan, Kernel Mode Systems, San Diego CA
drivers, internals, networks, applications, and training for VMS and Windows NT
uucp 'g' protocol weenie and release coordinator, VMSnet (DECUS uucp) W.G., and 
Chair, Programming and Internals Working Group, U.S. DECUS VMS Systems SIG 
Internet:  jeh@cmkrnl.com (JH645)  Uucp: uunet!cmkrnl!jeh  CIS: 74140,2055