From:	CSBVAX::CSBVAX::MRGATE::"SMTP::PRUNE.SRV.CS.CMU.EDU::CMU-TEK-TCP-REQUEST" 13-MAR-1989 17:41
To:	MRGATE::"ARISIA::EVERHART"
Subj:	Processes hung in RWAST states

Received: from CS.CMU.EDU by PRUNE.SRV.CS.CMU.EDU; 13 Mar 89 15:06:30 EST
Received: from DUPHY4.DREXEL.EDU by CS.CMU.EDU; 13 Mar 89 15:03:58 EST
Date: Mon, 13 Mar 89 08:38:56 EST
From: lane (Charles Lane) @ DUPHY4.Drexel.Edu
Message-Id: <890313083853.1196@DUPHY4.Drexel.Edu>
Subject: Processes hung in RWAST states
To: cmu-tek @ DUPHY4.Drexel.Edu
cc: lane @ DUPHY4.Drexel.Edu
Comment: Drexel University Particle Physics

Here are another couple of hints for dealing with processes (particularly
our favorite TCP/IP code) hung in RWAST states.

All of the cases that I've seen so far with NAMSRV, IPACP, etc. hung in
RWAST were a result of IP device driver/ACP problems. Do a `show dev/full IP'
and look at all the various IP channels that are open. Do they say `busy'
for the channels assigned to hung processes?

When a process is STOPped, or tries to exit, VMS makes sure that there
aren't any outstanding i/o requests on its i/o channels. If there is
outstanding i/o, the process is put in a RWAST state until the i/o completes.
Mind you, this is not the only way to get into RWAST, just a common way!

Now, if IPACP is busy looping or otherwise misbehaving, it won't process
i/o requests and so your i/o will never complete. One easily detected
symptom of this is IPACP sitting in a COM state. If you then do a NETSTAT,
you find yourself hung... and when you try to control-Y out of NETSTAT,
whammo! into RWAST. [don't do this from a terminal you'll need later...
better yet, come in over DECNET to try this] This sounds a lot like what
you've got.

Another thing that will get you in RWAST is if the IPACP does too much...
say it does i/o completion on a $CANCEL request. This decrements the
outstanding i/o count on the channel to -1 ...
when the process tries to exit the system checks the i/o count, says
`this guy has outstanding i/o requests...put him in RWAST'. For this
scenario, most everything will be okay until the time comes to deallocate
the IP channel.

How to see if you've got this problem? ANALYZE/SYSTEM (as mentioned in
another posting) is the way to go. You want to do a SHOW SUMMARY to see
what processes are around, then SET PROCESS XYZ [if XYZ is the process
that's hung]. Now do a SHOW PROC/CHAN to see the i/o channels. Look for
something that is suspiciously `busy'. Note the channel number of the
busy channel.

                               +----channel number noted before...
Okay, now do a :               \/
        EXAM @CTL$GL_CCBBASE - 0060 ; 0010

You'll get 4 longwords, in the usual right-to-left form of VMS dumps.
This is the Channel Control Block for that channel [VMS internals p 471
& guide to writing device drivers]. The format is like:

      +-------------------------------------------------+
 CCB: |                    CCB$L_UCB                    |
      +-------------------------------------------------+
      |                    CCB$L_WIND                   |
      +-----------------------+-----------+-------------+
      |       CCB$W_IOC       |CCB$B_AMOD |  CCB$B_STS  |
      +-----------------------+-----------+-------------+
      |                    CCB$L_DIRP                   |
      +-------------------------------------------------+

Which in the dump looks something like

      00000000 00010401 00000000 7f123456
        dirp   ----       wind      ucb
               ioc

The IOC is the only thing we want... if it is one (as here) it means there
is one outstanding i/o request on the channel. For example, NAMSRV, in
normal circumstances, keeps an outstanding read request on its IP channel.
If IOC is FFFF (that is, -1 as a 16-bit word), then the IPACP has returned
one more request than was asked, and you've got trouble.

You can get this latter effect by a mismatch of one of my `$CANCEL'
modified IPDRIVERs with an old, unmodified IPACP. Sounds like you got
both right off of the distribution tape, however, which wouldn't have a
mismatched IPDRIVER and IPACP.

Good hunting...
				--Chuck Lane
				  lane@duphy4.drexel.edu
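
A footnote on reading that dump line: if picking the IOC word out by eye gets tedious, the decoding can be sketched in a few lines of Python. This is only an illustration, assuming the CCB field layout drawn in the diagram above (STS in the low byte, AMOD in the next byte, IOC in the high word of the third longword). The function name `decode_ccb` and the dictionary keys are made up for this example and are not part of SDA or the CMU-TEK kit.

```python
def decode_ccb(dump_line):
    """Decode one 4-longword dump line into CCB fields.

    Assumes the line is printed right-to-left (lowest address on the
    right) and that the fields are laid out as in the diagram above.
    """
    words = [int(w, 16) for w in dump_line.split()]
    if len(words) != 4:
        raise ValueError("expected exactly four longwords")

    # Reverse the printed order so the lowest address comes first.
    ucb, wind, packed, dirp = reversed(words)

    ioc = (packed >> 16) & 0xFFFF      # high word: outstanding i/o count
    if ioc >= 0x8000:                  # treat the word as signed 16-bit
        ioc -= 0x10000

    return {
        "ucb": ucb,
        "wind": wind,
        "sts": packed & 0xFF,          # low byte
        "amod": (packed >> 8) & 0xFF,  # next byte
        "ioc": ioc,
        "dirp": dirp,
    }


ccb = decode_ccb("00000000 00010401 00000000 7f123456")
print(ccb["ioc"])   # 1: one outstanding request, as in the example dump
```

An IOC that decodes to -1 (FFFF in the dump) is the over-completed case described above.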