From: CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 12-NOV-1989 14:44
To: MRGATE::"ARISIA::EVERHART"
Subj: Re: MONITOR ETHERNET (a warning)

Message-Id: <8911121903.AA17274@crdgw1.ge.com>
Received: From NSFNET-RELAY.AC.UK by CRVAX.SRI.COM with TCP; Sat, 11 NOV 89 10:02:17 PDT
Received: from sun.nsfnet-relay.ac.uk by vax.NSFnet-Relay.AC.UK via Janet with NIFTP id aa04780; 11 Nov 89 13:27 GMT
Date: Sat, 11 Nov 89 13:31 BST
From: Nick de Smith <"PSI%JANET.000000000040::NICK%ncdlab.ulcc.ac.uk"@NSFnet-Relay.AC.UK>
To: INFO-VAX <@NSFnet-Relay.AC.UK:INFO-VAX@crvax.sri.com>
Subject: Re: MONITOR ETHERNET (a warning)

Hi,

"Rob Wright ", writes:

> Thank you for the V5.x patches to Monitor. They reveal some interesting things.
> A little knowledge being a dangerous thing, I wonder whether some kind soul can
> tell me what some of the output means...

I am puzzled by the MONITOR ETHERNET class. The way it is implemented is a
decidedly odd hack - at each scan, the Ethernet device's port driver has its
line counters read with a modifier that also zeroes the counters. MONITOR,
when it next reads the counters, knows the interval and can thus compute the
averages. I am not sure how it does this, *but the values seem incorrect in
places*. Also, having the counters zeroed each time loses important
information, such as the cumulative error statistics.

I base this statement on having written my own "MONITOR ETHERNET" (supported,
on the DECUS tapes in [.ATG.XE], see below) which uses the same
(undocumented) counters. I could of course be wrong, but my method, which
does not involve zeroing the counters at each scan, gives statistics that
differ substantially from those of MONITOR ETHERNET and that agree with pen,
pencil and calculator computations and with a third-party (PC-based) Ethernet
monitor. If there are any users of my XE package out there (I know you are
there somewhere!), please let me know whether or not you agree with this.
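For illustration only, here is a minimal sketch of the non-zeroing approach:
read the cumulative counter at each scan, leave it alone, and take
differences between successive readings. This is not the XE source; the
function names and the sample figures are hypothetical, and it assumes the
counter never wraps or gets reset between scans.

```python
def interval_rates(samples):
    """Per-interval rates from (time_in_seconds, cumulative_count) readings.

    The readings must be in time order. The counter is never zeroed, so
    each rate is simply the change in the counter over the elapsed time.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        rates.append((c1 - c0) / dt)
    return rates


def summarize(samples):
    """Average rate, peak rate and cumulative total for a run of readings."""
    rates = interval_rates(samples)
    if not rates:
        raise ValueError("need at least two readings")
    elapsed = samples[-1][0] - samples[0][0]
    counted = samples[-1][1] - samples[0][1]
    return {
        "average": counted / elapsed,  # events per second over the whole run
        "peak": max(rates),            # worst single interval
        "total": samples[-1][1],       # still available: nothing was zeroed
    }


# Hypothetical readings taken every 5 seconds.
samples = [(0, 100), (5, 110), (10, 150), (15, 150)]
summary = summarize(samples)
```

Because the counter is only ever read, the cumulative total survives
alongside the average and peak rates, and a missed or late scan only widens
one interval rather than discarding counts.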
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
I would therefore be VERY WARY of MONITOR ETHERNET, however nice it may
seem. IT IS UNSUPPORTED and (I think) for a very good reason: it gives
wrong values.
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

> On one node (heavily loaded 6230) the "Buffer Unavailable" statistic averages
> around 16 per second, with an observed maximum of 83. Is this important?

Buffer unavailable is a strange statistic. There are two main types of
"Buffer error":

1) System buffer unavailable

   This refers to packets read. There is a buffer used by the Ethernet
   drivers to store data before placing it in the user's buffers. If a
   system buffer is unavailable, the packet will be lost. It is the
   responsibility of the overlying protocol to detect this and to re-request
   the packet from the sender. DECnet will do this for you (most of the
   time!).

   If this counter has a high average value (high meaning > 1/sec) then the
   Ethernet interface is overloading the VAX with data. Use the
   NCP SET LINE line RECEIVE BUFFERS n command, where 12 < n < 32. There is
   not much penalty for increasing the line buffers (they come from
   non-paged pool). The default value of 6 that DECnet uses is totally
   useless on loaded networks, especially if you use PCSA or a lot of DECnet
   (extra especially if you have a lot of routing nodes) or LAT.

   The consequence of this counter being large is that there will be a large
   number of re-requests for data by DECnet etc., which you can and should
   avoid. This can actually be seen by users of DECservers: when you type,
   the character echo seems to stutter or take a long time (this is over and
   beyond the LAT circuit timer values).

2) User buffer unavailable

   If the average of this counter is high (> 1/sec), then try modifying your
   applications to use overlapped I/O. This counter reflects the number of
   times the system wanted to give your application a packet, but you had no
   available buffer for the data. Not generally as serious as system buffer
   unavailable.
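As a rough sketch of the rule of thumb above (an average over 1/sec counts
as high), the check below turns two average rates into the corresponding
advice. The function name, argument names and message wording are
hypothetical; only the 1/sec threshold, the NCP command and the
overlapped I/O suggestion come from the text.

```python
HIGH_RATE_PER_SEC = 1.0  # "high" as used above: more than one event per second


def buffer_advice(system_unavailable_rate, user_unavailable_rate,
                  threshold=HIGH_RATE_PER_SEC):
    """Return a list of suggestions for the two 'buffer unavailable' averages.

    Both arguments are average rates in events per second, as produced by
    whatever monitoring tool you trust.
    """
    advice = []
    if system_unavailable_rate > threshold:
        advice.append(
            "system buffer unavailable is high: raise the receive buffers "
            "(NCP SET LINE line RECEIVE BUFFERS n, with 12 < n < 32)"
        )
    if user_unavailable_rate > threshold:
        advice.append(
            "user buffer unavailable is high: consider overlapped I/O "
            "in the application"
        )
    return advice


# Hypothetical averages: a busy node and a quiet one.
busy = buffer_advice(16.0, 0.1)
quiet = buffer_advice(0.5, 0.5)
```

An empty list means neither average crosses the threshold; peaks are worth
checking separately, since a low average can hide short bursts.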
> On another node (heavily loaded 8600) the same statistic shows a very few of
> these (average 0.11, max 2.25), but the "Internal Buffer Error" averages 0.99,
> maximum 29.66. This sounds serious.

See above. The maximum is not good. Increase the receive buffers.

>
> On two other nodes (heavily loaded 8600, not so busy 3600) there is no sign of
> buffering problems (counts all zero).
>
> The configuration for Ethernet lines does not concern itself with any sort of
> buffer, so is there any way in which the system manager can influence these
> statistics?

See above. Use NCP SHOW LINE line CHAR to see the line configuration. Check
that the buffer size for any Ethernet device is at least 1498 and that there
are at least 12 receive buffers. Check SYSGEN MAXBUF to be at least 1700
(this is the minimum value for VMS V5.2 - use 2048 or higher).

Note that the important metrics for counters are the AVERAGE and PEAK RATES,
not necessarily the total. This is where a *reliable* monitoring tool wins
over NCP SHOW LINE line COUNTERS.

If anyone wants a copy of XE.C (which gives far more information than
MONITOR), let me know. As usual, if there is sufficient interest I'll post
it. XE is based on an original idea (in Pascal) by Kevin Carosso and Dan
Newman in PAGESWAPPER, November 1984.

Note that the interface used to the DEC Ethernet drivers is NOT ON THE FICHE
(as of VMS V5.0). DEC have produced "conditional" fiche that does not include
undocumented features. Thanks, guys. However, it's all on the V4.7 stuff.

regards,
        nick

NICK@NCDLAB.ULCC.AC.UK

=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
| Nick de Smith                 | Voice: +44 892 511000             |
| Applied Telematics Group Ltd  | PSImail: +234213300154::NICK      |
| Telematics House              | Fax: +44 892 38556 (G3)           |
| Tunbridge Wells, Kent         | Telex: 95398 TELEMA G (UK)        |
| TN1 1DJ, England              | Internet: NICK@NCDLAB.ULCC.AC.UK  |
| Janet: NICK@UK.AC.ULCC.NCDLAB | (NICK%NCDLAB.ULCC.AC.UK@UKACRL)   |
| "Though the wise man may make his home in a glass house, it's the |
| happy man who drops his anchor in the bay of discontent"          |
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=