From: CSBVAX::CSBVAX::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 21-FEB-1989 00:17
To: MRGATE::"ARISIA::EVERHART"
Subj: Thoughts On Multi-Processor Selection And Related Issues...
Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Mon, 20 FEB 89 20:53:16 PDT
Received: from central.cis.upenn.edu by KL.SRI.COM with TCP; Mon, 20 Feb 89 19:17:23 PST
Received: from LINC.CIS.UPENN.EDU by central.cis.upenn.edu id AA26332; Mon, 20 Feb 89 22:19:16 -0500
Received: from XRT.UPENN.EDU by linc.cis.upenn.edu id AA22651; Mon, 20 Feb 89 22:26:06 EST
Posted-Date: Mon, 20 Feb 89 22:23 EDT
Message-Id: <8902210326.AA22651@linc.cis.upenn.edu>
Date: Mon, 20 Feb 89 22:23 EDT
From: "Clayton, Paul D."
Subject: Thoughts On Multi-Processor Selection And Related Issues...
To: INFO-VAX@KL.SRI.COM
X-Vms-To: @INFOVAX

The recent bout of messages concerning the VUP rating of a VAXcluster, and also the rating of a multiprocessor VAX, has touched a nerve. It is for that reason that I offer the following for consideration and, if appropriate, comment.

I have always held, ever since people started using the 'cumulative math function' to determine the available 'power', that a number of MIS shops were going to get into trouble. I have been proven correct too many times, and at a significant cost to the companies needing the power.

During my tenure at TSO, my hardest job was trying to explain to upper management (they're the ones who think only in terms of how much money you are costing them) why to go for the BIG boxes. My boss two levels up, in the MIS department, always wanted to populate a VAXcluster with 8250s, or better yet from his viewpoint 11/750s, because they were cheap. Going through the VMS internals, in as non-technical a fashion as possible, was of no use. He relied on the notion that a 'homogeneous' cluster means that anything can run anywhere, and that if the processing is 'spread' out over the cluster then everything should be great.
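That 'spread it out' arithmetic, and why it fails for any single job, can be sketched in a few lines. This is only an illustration: the VUP figures are round numbers rather than official ratings, and elapsed_hours is a hypothetical helper, not any VMS interface.

```python
# Illustrative VUP figures only (round numbers, not official ratings):
# call an 8810-class CPU roughly 6 VUPs and an 11/750-class CPU roughly 0.6.

def elapsed_hours(work_vup_hours, cpu_vups):
    """Elapsed time for ONE compute-bound job. It runs on one CPU at a
    time, so only the per-processor rating matters, never the sum across
    the cluster."""
    return work_vup_hours / cpu_vups

job = 6.0  # a job worth 6 VUP-hours of work

print(elapsed_hours(job, 6.0))  # 1.0  -- one big box
print(elapsed_hours(job, 0.6))  # 10.0 -- one cheap node, even though ten
                                #         of them "add up" to 6 VUPs
```

Ten slow nodes may 'total' the same VUPs on paper, but each individual job still crawls.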
He ignored the cluster overhead, the record sharing, and the low VUP rating of the processors in the cluster. I succeeded in spite of this, installing three 8810s in one year, by showing the savings in maintenance costs for both hardware and software. Helping to get my point across, several corrupted RMS ISAM files of 1.1+ million records (900K+ blocks per file) had to be rebuilt due to hardware failures. On a standalone 8810, with sole ownership of the file and scratch disks, the elapsed time was in the 6 to 15 HOUR range. The high number was the first time a file went bad, and it had been more than a year since the previous CONVERT to clean up the RMS internals. The low number was the second and third time (sigh). Granted, the CPU time was less than the elapsed time (anywhere from 30 to 60 percent of it), but the point remains: the processor MUST turn the I/O requests around QUICKLY. You cannot do that on a low-VUP box.

In order to set the stage, the following needs to be said.

1. Under VMS 4.x, the ASMP abilities were VERY restricted. No system service routines and no I/O initiation could be run on the secondary processor; only IPL 0 user code could. The primary processor handled the system services, all aspects of I/O, and the general VMS overhead chores. Most systems spent little time, if any, using the secondary processor.

2. Under VMS 5.x, SMP comes to town, and all the processors can now do I/O initiation, system services and the like. Very little remains solely for the primary processor. This is good news. Non-IPL-0 code can go almost anywhere. The only ways to get a process off an auxiliary processor are for its quantum to end, for it to issue an I/O request, or for it to stop itself. Compute-bound jobs have now found a home.

3. Under SMP, the abilities of parallel programming, and therefore parallel processing, have been made available. The sad part, at least for me, is that it sounds great but there are several side issues. The first is that only the first steps of the boot process do not have a 'process context'.
Ignoring that area, there are two other cases of 'process context': a 'full' context, as for interactive users, or a 'partial' context, as for I/O fork processes. Either way, there is a context, and it must therefore be under the control of the job scheduler. The subtle implication here is that on a multiprocessor system, such as a 6360, the scheduler is deciding which 'COM' state process to put on which processor. There is NOTHING that tells the scheduler that, for a process using the PPL$ (parallel processing library) functions, all the processing MUST be done 'IN PARALLEL' using as many processors as are available. In fact, there is nothing preventing the case of all the 'parallel segments' running one behind the other on the same processor. This could be considered the 'worst case', and it would take longer than if the code had not used the PPL$ routines at all, which were supposed to 'save' time. In order to create the parallel threads, the function PPL$SPAWN is used. This 'spawns' processes, and all the problems that the DCL command SPAWN has with quotas and the like apply here as well. I look at the PPL$ routines as an extension of the INSTALL/SHARE and event flag abilities that we have had for many versions of VMS.

What this boils down to, then, is that if the applications being run on the processor(s) have moderate I/O, such as word processing, and little compute, then multiprocessing is for you on just about any size box. The 'compute queue', as shown by SPM reports, would govern how many processors to put in a box: a high compute queue calls for more processors. If there are moderate to significant compute-bound processes, then the decision has to be made based on the 'acceptable' execution times. That is directly related to the VUP rating of EACH processor, and MUST be based on using a SINGLE processor only. Having a multiprocessor box only cuts down on the CPU queue, NOT CPU time.
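That distinction between queue length and CPU time can be shown with a toy run-to-completion schedule. This is a minimal sketch, assuming identical processors and FIFO dispatch; finish_times is a hypothetical helper, not VMS scheduler code.

```python
def finish_times(n_jobs, service_secs, n_cpus):
    """Completion time of each of n_jobs, all queued at once and run to
    completion FIFO across n_cpus identical processors."""
    # Job i waits through i // n_cpus full service rounds before running;
    # its own CPU time is service_secs no matter how many CPUs exist.
    return [(i // n_cpus + 1) * service_secs for i in range(n_jobs)]

jobs, cpu_secs = 8, 60.0
one_cpu  = finish_times(jobs, cpu_secs, 1)
four_cpu = finish_times(jobs, cpu_secs, 4)

print(max(one_cpu))   # 480.0 -- the last job queues behind seven others
print(max(four_cpu))  # 120.0 -- the queue largely disappears...
print(min(four_cpu))  # 60.0  -- ...but each job still costs 60 CPU seconds
```

Adding processors shortens the wait in the 'COM' queue; the 60 CPU seconds per job only shrink if each individual processor is faster.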
Granted, I am ignoring the abilities of PPL$ here, but then very little currently uses them.

The next question, and one of great significance, is this: 'Do we add more processors to the current box, or get another box in the cluster?' I have always answered this with another question that usually dictates the answer: in the event that you have a single large box, say a 6360, and it goes down, can you live with up to 24 hours of downtime? I say 24 hours here, but it could be more or less, depending on how long it takes on average to get a new board express-mailed the next day. If you cannot live with the downtime, then it's new-box time. If you can, and will be able to for the plannable future, you have a choice, and cost would then be a factor. In the event of a new box, I always shoot for the new box to be able to support a 'significant' amount of the total workload. The definition of 'significant' is a local decision, based on how many users/departments can be locked out of the system during an extended outage.

Those are my thoughts on the subject; I hope they help provide some guidance on how to approach processor selection.

pdc

Still alive and kicking... ;-)

Paul D. Clayton
Address - CLAYTON%XRT@RELAY.UPENN.EDU

Disclaimer: All thoughts and statements here are my own and NOT those of my employer, and are also not based on, and do not contain, restricted information.