From:	CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 26-NOV-1989 22:16
To:	MRGATE::"ARISIA::EVERHART"
Subj:	Re: Fix for Ethermon

Received: From WARBUCKS.AI.SRI.COM by CRVAX.SRI.COM with TCP; Sat, 25 NOV 89 10:52:19 PDT
Received: from TGV.COM by Warbucks.AI.SRI.COM with DECNET ; Sat, 25 Nov 89 10:50:00 PST
Date: Sat, 25 Nov 89 09:31:37 PST
From: adelman@TGV.COM (Kenneth Adelman)
Reply-To: Adelman@TGV.COM (Kenneth Adelman)
Message-Id: <891125091048.20800055@TGV.COM>
Subject: Re: Fix for Ethermon
To: mahendra%ureginav.bitnet%ugw.utcs.utoronto.ca%TGV.COM@Warbucks.AI.SRI.COM
Cc: OBERMAN@ICDC.LLNL.GOV, info-vax@sri.com, info-multinet@TGV.COM

>>
>> From: "Kevin Oberman, LLNL, (415)422-6955"
>> Subject: Fix for Ethermon
>> Sender: INFO-VAX Discussion
>>
>> The first is that after I exit ETHERMON on my 8650, the DEUNA stops
>> receiving. DECnet, LAT, and TCP/IP all die. MultiNet reports repeated
>> DEUNA resets. Any attempt to shut down DECnet or even to SHOW NETWORK
>> hangs the session. Even powering the DEUNA up to reset it doesn't
>> help. Only a reboot has worked. I have no clues on this one.
>>
> Kevin:
> I am experiencing similar problems too with ETHERMON, but only after
> I replaced CMU's TCP/IP with MultiNet on our VAX 6320 and VAX-11/750.
> Prior to that I could enter and exit ETHERMON at will and didn't have
> any problems. Could MultiNet be doing something that ETHERMON doesn't
> like or vice versa? I would be curious too to know what is going on
> since I have removed ETHERMON from my system and no one is allowed to
> use it until we know what is going on.
>
> Does anybody have any insights?

    Yes, MultiNet is opening a lot of ports (5) to the ethernet driver
with the auto-restart (NMA$C_PCLI_RES) feature enabled. There seems to
be a bug in DEC's ethernet drivers: when a lot of users (ports) use
this feature, the device driver fails to recover from a device reset.
    There seems to be a separate problem where the devices reset under
high system load or on highly loaded networks. I'll guess that putting
the device in promiscuous mode increases the odds of a reset (I've seen
it happen), but the real problem is that the VMS driver doesn't recover
from the reset.

    I've investigated one particular case of the problem further. The
DELQA resetting-and-wedging problem is worse under VMS V5.2 than under
earlier versions. Apparently when a reset occurs the device driver
clears the "RECEIVE ENABLE" bit in the CSR and never reenables it,
preventing the device from seeing any more packets. If you patch out
the BICLW2 instruction at offset ^X0C3A in XQDRIVER, the problem goes
away (but I'm not sure what the side effects of this may be, if any).
If you deposit into the CSR directly after the device wedges and enable
this bit, it starts running again.

    The problem is most definitely related to the use of the
auto-restart feature. I have a short test program that duplicates the
problem without using MultiNet or the FFI or ALTSTART interfaces. It
opens 30 ports using this feature and then tweaks the device CSR to
cause a reset, and the device rarely recovers (the more ports, the less
likely the recovery).

    I don't have one of the doomed environments which causes the resets
to occur, so I need to trip them artificially, but I have been working
with Colorado Support and one of our customers on the problem for over
a month now. Even with all this information, Colorado Support's last
answer (to call # C89-1003-1029) was that they didn't think the problem
was their responsibility (????) and that they were referring it to the
Santa Clara office for resolution. I haven't heard from the Santa Clara
office, but if their technical people are anything like their sales
people this isn't going anywhere.
We can't even get a correct invoice out of Santa Clara for a
shrink-wrap 3100 model 30 we purchased months ago; and we can't get
them to order the software maintenance. I don't think they know how.
The footnote to this is that because of the Santa Clara office I would
never consider buying something from DEC (like disks) if I could get it
from another vendor.

    If anyone has any ideas about how I can get DEC to pay attention to
this problem I'd appreciate it.

						Kenneth Adelman
						TGV