From: CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 30-JUN-1989 19:08
To: MRGATE::"ARISIA::EVERHART"
Subj: RE: Best TCP/IP for a VAX with VMS

Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Thu, 29 JUN 89 16:35:18 PDT
Received: from IU.AI.SRI.COM by KL.SRI.COM with TCP; Thu, 29 Jun 89 16:36:51 PDT
Date: Thu 29 Jun 89 15:32:13-PST
From: David L. Kashtan
Subject: RE: Best TCP/IP for a VAX with VMS
To: randy@TWG.COM
Cc: info-vax@kl.sri.com
Message-Id: <615162733.760000.KASHTAN@IU.AI.SRI.COM>
Mail-System-Version:
Postal-Address: 15139 Old Ranch Rd; Los Gatos, CA 95030
Phone: (415) 859-5830 (SRI); (408) 353-1643 (Home)

> Architectural choices usually end up being driven by a trade-off
> between competing criteria by which the result is measured.  Our
> decision to implement per-connection spin-locks to support VMS
> SMP systems was a result of our desire to produce a product that
> performs well across the entire spectrum of VAX systems, even if
> that meant we had to work harder at the software to get there.
> In order to make the decision we tested prototypes of each
> approach.

I think you missed the point I was trying to make.  Correct me if I am
wrong, but I gather that you are NOT making an argument for
fine-granularity locking in order to get better concurrency in the
network protocol code -- but in order to reduce the amount of
"spinning" that other CPUs are doing.  If you ARE claiming more
concurrency in the network protocol code, I will be more than happy to
prove to you that you will get none (this is an artifact of the way
the 4.3bsd protocol code is structured -- I would be happy to
elaborate on this further if you are interested).  If you are just
concerned with the problem of other CPUs "spinning," there are much
simpler means than fine-granularity locking that can be used to
completely eliminate the "spinning" of other CPUs (in fact, we use one
in MultiNet -- see the P.S. below for the general idea).

What I WAS saying was that your method (if done correctly) will have
no more overhead than the one we used in MultiNet but will also have
no performance advantage WHATSOEVER!  There is no performance
disadvantage to your method (assuming you don't acquire/release locks
too often), but it DOES have the disadvantage of being harder to
implement/maintain, and it certainly diverges from the 4.3bsd
networking code.

> The prototypes we tested which did not provide per-connection
> locking incurred a higher multi-processor synchronization overhead
> on systems with more than two processors.  This effect was
> disastrous under load on 6360 systems, where a heavy telnet-driven
> user load would busy out the primary processor while other
> processors spun on the lock(s) held by the primary.  At the
> saturation point on the primary processor, interrupts were lost and
> the entire system hung waiting for synchronization.

Could you give us some figures on how your prototype implementations
compared with respect to "spinning"?  We have "spinlock" meters in our
network code that we use when checking spin contention and can provide
some hard numbers -- do you have something similar?  (A sketch of what
such a meter looks like appears below.)  I thought you guys went
directly from trying to use processor affinity to "keep the code
running on a single processor" to your lock-per-connection scheme.  My
tests say that network spinning with coarse-grained locking is no
worse than VMS scheduler spinning.  An interesting aside -- I did take
a look through our code and found a 5-line change that will reduce
that already very low network spinning by a factor of 2 to 10.
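To be concrete about the metering, here is the flavor of the thing.
This is a simplified sketch, NOT the actual MultiNet source; it is
written with ISO C <stdatomic.h> atomics so it stands on its own,
whereas the real code uses the VAX interlocked bit instructions
(BBSSI/BBCCI) at elevated IPL, and every name here (metered_lock,
net_lock, and so on) is invented for the illustration.  The idea is
simply that every failed test-and-set bumps a per-lock counter, so
spin contention can be read straight off the lock:

#include <stdatomic.h>
#include <stdio.h>

struct metered_lock {
    atomic_flag  busy;      /* the lock bit itself                      */
    atomic_ulong acquires;  /* successful acquisitions                  */
    atomic_ulong spins;     /* failed test-and-sets = spin iterations   */
};

#define METERED_LOCK_INIT { ATOMIC_FLAG_INIT, 0, 0 }

static void
metered_lock_acquire(struct metered_lock *lp)
{
    /* Each failed test-and-set is one turn around the spin loop;
     * counting them IS the meter. */
    while (atomic_flag_test_and_set_explicit(&lp->busy,
                                             memory_order_acquire))
        atomic_fetch_add_explicit(&lp->spins, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&lp->acquires, 1, memory_order_relaxed);
}

static void
metered_lock_release(struct metered_lock *lp)
{
    atomic_flag_clear_explicit(&lp->busy, memory_order_release);
}

static void
metered_lock_report(const char *name, struct metered_lock *lp)
{
    unsigned long a = atomic_load(&lp->acquires);
    unsigned long s = atomic_load(&lp->spins);

    printf("%s: %lu acquisitions, %lu spin iterations (%.2f per acquire)\n",
           name, a, s, a ? (double)s / (double)a : 0.0);
}

int
main(void)
{
    static struct metered_lock net_lock = METERED_LOCK_INIT;

    metered_lock_acquire(&net_lock);
    /* ... protocol processing would go here ... */
    metered_lock_release(&net_lock);
    metered_lock_report("net_lock", &net_lock);
    return 0;
}

Spin iterations divided by acquisitions is a direct figure of merit
for contention on the coarse network lock -- that is the kind of hard
number I am proposing we compare.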
> Conversely, we saw very little adverse effect of per-connection
> locking on systems with one or two processors, and much more
> graceful performance under load with the 6360 as a result of the
> granularity of the locking.  We thus concluded that this approach
> provided a more stable implementation with good performance over a
> wide range of system configurations and loads.

Once again -- your scheme is just fine, but NO BETTER than properly
implemented coarse-grained locking.  If you still don't believe me, I
would be more than happy to have someone run both your implementation
and ours and measure the spinning.

> As for other details of our implementation environment, we too have
> enjoyed the performance benefits of moving to newer compiler
> technology.

I am pleased to see that my work in porting GCC to VAX/VMS is being
widely used!

David
-------
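P.S.  A rough sketch of the sort of "much simpler means" of
eliminating spinning that I alluded to above.  Again, this is a
simplification and NOT the actual MultiNet code -- the real thing also
has to worry about IPL and interrupt context, and would use the VAX
interlocked-queue instructions (INSQTI/REMQHI) rather than a
compare-and-swap list -- and all of the names are invented for the
illustration.  The idea: a CPU that finds the single coarse network
lock busy does not spin at all; it queues its work on an interlocked
list for the current lock holder, who drains the list before letting
go of the lock.

#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct work {
    struct work *next;
    void (*func)(struct work *);
};

static atomic_flag net_lock = ATOMIC_FLAG_INIT;
static struct work *_Atomic pending;    /* interlocked hand-off list */

/* Lock-free push (LIFO here for brevity; a real queue keeps order). */
static void
queue_pending(struct work *wp)
{
    wp->next = atomic_load_explicit(&pending, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(&pending, &wp->next, wp,
               memory_order_release, memory_order_relaxed))
        ;   /* CAS failure reloads wp->next; just retry */
}

/* Run everything on the hand-off list.  Caller must hold net_lock. */
static void
drain_pending(void)
{
    struct work *wp;

    while ((wp = atomic_exchange_explicit(&pending, NULL,
                                          memory_order_acquire)) != NULL) {
        while (wp != NULL) {
            struct work *next = wp->next;
            wp->func(wp);
            wp = next;
        }
    }
}

/* Entry point: queue the work; whoever holds (or can get) the lock
 * does everyone's work.  No CPU ever busy-waits. */
void
net_input(struct work *wp)
{
    queue_pending(wp);

    while (!atomic_flag_test_and_set_explicit(&net_lock,
                                              memory_order_acquire)) {
        drain_pending();
        atomic_flag_clear_explicit(&net_lock, memory_order_release);

        /* Re-check: an item queued after our drain but before our
         * release must not be stranded with nobody holding the lock. */
        if (atomic_load_explicit(&pending, memory_order_acquire) == NULL)
            break;
    }
}

/* Trivial single-threaded demonstration of the uncontended path. */
static void
process_packet(struct work *wp)
{
    (void)wp;
    printf("processed one packet under the network lock\n");
}

int
main(void)
{
    struct work w = { NULL, process_packet };
    net_input(&w);
    return 0;
}

Done this way, coarse-grained locking keeps the 4.3bsd code structure
intact and no CPU ever spins -- which is exactly why fine-granularity
locking buys you nothing.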