From:	CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 12-NOV-1989 14:44
To:	MRGATE::"ARISIA::EVERHART"
Subj:	Re: MONITOR ETHERNET (a warning)

Message-Id:  <8911121903.AA17274@crdgw1.ge.com>
Received: From NSFNET-RELAY.AC.UK by CRVAX.SRI.COM with TCP; Sat, 11 NOV 89 10:02:17 PDT
Received: from sun.nsfnet-relay.ac.uk by vax.NSFnet-Relay.AC.UK 
           via Janet with NIFTP  id aa04780; 11 Nov 89 13:27 GMT
Date:           Sat, 11 Nov 89  13:31 BST
From: Nick de Smith <"PSI%JANET.000000000040::NICK%ncdlab.ulcc.ac.uk"@NSFnet-Relay.AC.UK>
To: INFO-VAX <@NSFnet-Relay.AC.UK:INFO-VAX@crvax.sri.com>
Subject:        Re: MONITOR ETHERNET (a warning)

Hi,

"Rob Wright <munnari.oz.au!uniwa!vax6!crobw@net.uu.uunet>", writes:

> Thank you for the V5.x patches to Monitor. They reveal some interesting things.
> A little knowledge being a dangerous thing, I wonder whether some kind soul can
> tell me what some of the output means...

I am puzzled by the MONITOR ETHERNET class. The way it is implemented is a
decidedly odd hack - at each scan, the ethernet device's port driver has its line
counters read with a modifier that also zeroes the counters. MONITOR, when it
next reads the counters, knows the interval and can thus compute the averages. I
am not sure how it does this, * but the values seem incorrect in places *. Also,
having the counters zeroed each time loses important information, such as the
cumulative error statistics.
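
To illustrate the alternative (reading the counters without zeroing them), here
is a minimal sketch in Python. It is not the code of XE or of MONITOR; the
function name and the (timestamp, count) sample format are assumptions made for
the example. Rates come from the difference between successive cumulative
readings, so the running totals are never lost. Counter overflow is not handled.

```python
def interval_rates(samples):
    """Per-interval rates from cumulative counter readings.

    samples: list of (timestamp_seconds, cumulative_count) in time order.
    Returns one rate (events per second) for each pair of adjacent samples.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # The counter is never reset, so the delta is the interval's activity.
        rates.append((c1 - c0) / dt)
    return rates


readings = [(0, 100), (5, 180), (10, 180)]
rates = interval_rates(readings)   # one rate per 5-second interval
total = readings[-1][1]            # cumulative total is still available
```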

I base this statement on having written my own "MONITOR ETHERNET" (supported, on
the DECUS tapes in [.ATG.XE], see below) which uses the same (undocumented)
counters. I could of course be wrong, but my method, which does not involve
zeroing the counters at each scan, gives statistics substantially different from
those of MONITOR ETHERNET. Mine, however, agree with pen, pencil and calculator
computations and with a third party (PC based) Ethernet monitor. If there are any
users of my XE package out there (I know you are there somewhere!), please let
me know if you agree or not with this.

		/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

I would therefore be VERY WARY of MONITOR ETHERNET, however nice it may seem. IT
IS UNSUPPORTED and (I think) for a very good reason: it gives wrong values.

		/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/

> On one node (heavily loaded 6230) the "Buffer Unavailable" statistic averages
> around 16 per second, with an observed maximum of 83. Is this important?

Buffer unavailable is a strange statistic. There are two main types of "Buffer
error":

1) System buffer unavailable

This refers to packets read. There is a buffer used by the ethernet drivers to
store data before placing it in the user's buffers. If a system buffer is
unavailable, the packet will be lost. It is the responsibility of the overlying 
protocol to detect this and to re-request the packet from the sender. DECnet
will do this for you (most of the time!).

If this counter has a high average value (high > 1/sec) then the ethernet
interface is overloading the VAX with data. Use the NCP SET LINE line RECEIVE
BUFFERS n command, where 12 < n < 32. There is not much penalty for increasing
the line buffers (they come from non-paged pool). The default value of 6 that
DECnet uses is totally useless on loaded networks, especially if you use PCSA or
a lot of DECnet (extra especially if you have a lot of routing nodes) or LAT.

The consequence of this value being large is that there will be a large number
of re-requests for data by DECnet etc, which you can and should avoid.

This can actually be seen by users of DECservers: when you type, the character
echo seems to stutter or take a long time (this is over and beyond the LAT
circuit timer values).

2) User buffer unavailable

If the average of this counter is high ( > 1/sec), then try modifying your
applications to use overlapped I/O. This counter reflects the number of times
the system wanted to give your application a packet, but you had no available
buffer for the data. Not generally as serious as system buffer unavailable.
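
Why overlapped I/O helps can be shown with a toy model in Python. This is only
an illustration, not the behaviour of the real driver: the function name, the
tick-based timing and the assumption that each delivered packet ties up one
buffer for a fixed time are all invented for the example. The more reads an
application keeps outstanding, the fewer packets arrive to find no buffer.

```python
def count_user_buffer_unavailable(arrival_ticks, posted_buffers, service_ticks):
    """Count packets that arrive while every user buffer is still in use.

    arrival_ticks: packet arrival times, in increasing order.
    posted_buffers: number of reads the application keeps outstanding.
    service_ticks: ticks the application needs before a buffer is re-posted.
    """
    busy_until = []   # re-post time of each buffer currently in use
    unavailable = 0
    for t in arrival_ticks:
        # Buffers whose re-post time has been reached are free again.
        busy_until = [b for b in busy_until if b > t]
        if len(busy_until) < posted_buffers:
            busy_until.append(t + service_ticks)
        else:
            unavailable += 1
    return unavailable


burst = [0, 1, 2, 3, 10, 11]
single = count_user_buffer_unavailable(burst, posted_buffers=1, service_ticks=3)
overlapped = count_user_buffer_unavailable(burst, posted_buffers=3, service_ticks=3)
```

In this model, one outstanding read misses part of the burst, while three
outstanding reads absorb it.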

> On another node (heavily loaded 8600) the same statistic shows a very few of
> these (average 0.11, max 2.25), but the "Internal Buffer Error" averages 0.99,
> maximum 29.66. This sounds serious.

See above. The maximum is not good. Increase the receive buffers.

> 
> On two other nodes (heavily loaded 8600, not so busy 3600) there is no sign of
> buffering problems (counts all zero).
> 
> The configuration for Ethernet lines does not concern itself with any sort of
> buffer, so is there any way in which the system manager can influence these
> statistics?

See above. Use NCP SHOW LINE line CHAR to see the line configuration. Check that
the buffer size for any ethernet device is at least 1498 and that there are at
least 12 receive buffers. Check that SYSGEN MAXBUF is at least 1700 (this is the
minimum value for VMS V5.2 - use 2048 or higher).
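
The three checks above can be written down as a small Python sketch. The
function and parameter names are hypothetical; the values have to be read by
hand from NCP SHOW LINE line CHAR and from SYSGEN, as nothing here talks to NCP
or SYSGEN. The thresholds are the ones quoted above.

```python
MIN_BUFFER_SIZE = 1498      # line buffer size, from the text above
MIN_RECEIVE_BUFFERS = 12    # receive buffers, from the text above
MIN_MAXBUF = 1700           # SYSGEN MAXBUF minimum quoted for VMS V5.2


def check_line_config(buffer_size, receive_buffers, maxbuf):
    """Return a list of complaints; an empty list means all checks passed."""
    problems = []
    if buffer_size < MIN_BUFFER_SIZE:
        problems.append(f"buffer size {buffer_size} is below {MIN_BUFFER_SIZE}")
    if receive_buffers < MIN_RECEIVE_BUFFERS:
        problems.append(
            f"receive buffers {receive_buffers} is below {MIN_RECEIVE_BUFFERS}")
    if maxbuf < MIN_MAXBUF:
        problems.append(f"MAXBUF {maxbuf} is below {MIN_MAXBUF}")
    return problems


# A line left at the DECnet default of 6 receive buffers fails one check.
default_line = check_line_config(buffer_size=1498, receive_buffers=6, maxbuf=2048)
```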

Note that the important metrics for counters are the AVERAGE and PEAK RATES, not
necessarily the total. This is where a *reliable* monitoring tool wins over NCP
SHOW LINE line COUNTERS.
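
As a rough sketch of what "average and peak rates" means in practice, the
Python below summarises a list of per-interval rates and flags a high average
using the 1/sec rule of thumb given earlier. The function name and the shape of
the result are assumptions for the example, not the output of any real tool.

```python
def summarize_rates(rates, high_threshold=1.0):
    """Average and peak of per-second rates, plus a 'high average' flag."""
    if not rates:
        raise ValueError("need at least one rate sample")
    average = sum(rates) / len(rates)
    peak = max(rates)
    return {
        "average": average,
        "peak": peak,
        # "High" here means the average exceeds the threshold (1/sec default).
        "high": average > high_threshold,
    }


quiet = summarize_rates([0.0, 0.5, 2.5])
busy = summarize_rates([16.0, 83.0, 4.0])
```

Note that the quiet node in this example still has a peak well above its
average, which a total count alone would hide.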

If anyone wants a copy of XE.C (which gives far more information than MONITOR),
let me know. As usual, if there is sufficient interest I'll post it. XE is based
on an original idea (in Pascal) by Kevin Carosso and Dan Newman in PAGESWAPPER,
November 1984.

Note that the interface used to the DEC Ethernet drivers is NOT ON THE FICHE
(as of VMS V5.0). DEC have produced "conditional" fiche that does not include
undocumented features. Thanks, guys. However, it's all on the V4.7 stuff.

regards,

nick	NICK@NCDLAB.ULCC.AC.UK

=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
| Nick de Smith                  |  Voice:   +44 892 511000         |
| Applied Telematics Group Ltd   |  PSImail: +234213300154::NICK    |
| Telematics House               |  Fax:     +44 892 38556 (G3)     |
| Tunbridge Wells, Kent          |  Telex:   95398 TELEMA G (UK)    |
| TN1 1DJ, England               | Internet: NICK@NCDLAB.ULCC.AC.UK |
| Janet:  NICK@UK.AC.ULCC.NCDLAB | (NICK%NCDLAB.ULCC.AC.UK@UKACRL)  |
| "Though the wise man may make his home in a glass house, its the  |
|      happy man who drops his anchor in the bay of discontent"     |
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=