From: CSBVAX::CSBVAX::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 21-FEB-1989 00:17
To: MRGATE::"ARISIA::EVERHART"
Subj: Thoughts On Multi-Processor Selection And Related Issues...
Received: From KL.SRI.COM by CRVAX.SRI.COM with TCP; Mon, 20 FEB 89 20:53:16 PDT
Received: from central.cis.upenn.edu by KL.SRI.COM with TCP; Mon, 20 Feb 89 19:17:23 PST
Received: from LINC.CIS.UPENN.EDU by central.cis.upenn.edu id AA26332; Mon, 20 Feb 89 22:19:16 -0500
Received: from XRT.UPENN.EDU by linc.cis.upenn.edu id AA22651; Mon, 20 Feb 89 22:26:06 EST
Posted-Date: Mon, 20 Feb 89 22:23 EDT
Message-Id: <8902210326.AA22651@linc.cis.upenn.edu>
Date: Mon, 20 Feb 89 22:23 EDT
From: "Clayton, Paul D."
Subject: Thoughts On Multi-Processor Selection And Related Issues...
To: INFO-VAX@KL.SRI.COM
X-Vms-To: @INFOVAX

The recent bout of messages concerning the VUP rating of a VAXcluster, and also the rating of a multiprocessor VAX, has touched a nerve. It is for that reason that I offer the following for consideration and, if appropriate, comment.

I have always held, ever since people started using the 'cumulative math function' to determine the available 'power', that a number of MIS shops were going to get into trouble. I have been proven correct too many times, and at a significant cost to the companies needing the power.

During my tenure at TSO, my hardest job was trying to explain to upper management (they're the ones who think only in terms of how much money you are costing them) why to go for the BIG boxes. My boss two levels up, in the MIS department, always wanted to populate a VAXcluster with 8250s, or better yet from his viewpoint 11/750s, because they were cheap. Going through the VMS internals, in as non-technical a fashion as possible, was of no use. He relied on the notion that a 'homogeneous' cluster means that anything can run anywhere, and that if the processing is 'spread' out over the cluster then everything should be great.
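That 'spread it out' arithmetic, and why it fails for any single job, can be sketched in a few lines. This is only an illustration: the VUP figures are round numbers rather than official ratings, and elapsed_hours is a hypothetical helper, not any VMS interface.

```python
# Illustrative VUP figures only (round numbers, not official ratings):
# call an 8810-class CPU roughly 6 VUPs and an 11/750-class CPU roughly 0.6.

def elapsed_hours(work_vup_hours, cpu_vups):
    """Elapsed time for ONE compute-bound job. It runs on one CPU at a
    time, so only the per-processor rating matters, never the sum across
    the cluster."""
    return work_vup_hours / cpu_vups

job = 6.0  # a job worth 6 VUP-hours of work

print(elapsed_hours(job, 6.0))  # 1.0  -- one big box
print(elapsed_hours(job, 0.6))  # 10.0 -- one cheap node, even though ten
                                #         of them "add up" to 6 VUPs
```

Ten slow nodes may 'total' the same VUPs on paper, but each individual job still crawls.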
He ignored the cluster overhead, the record sharing, and the low VUP rating of the processors in the cluster. I succeeded in spite of this, installing three 8810s in one year, by showing the savings in maintenance costs for both hardware and software. Helping to get my point across, several corrupted RMS ISAM files of 1.1+ million records (900K+ blocks per file) had to be rebuilt due to hardware failures. On a standalone 8810, with sole ownership of the file and scratch disks, the elapsed time was in the 6 to 15 HOUR range. The high number was the first time a file went bad, and it had been more than a year since the previous CONVERT to clean up the RMS internals. The low number was the second and third time (sigh). Granted, the CPU time was less than the elapsed time (anywhere from 30 to 60 percent of it), but the point remains: the processor MUST turn the I/O requests around QUICKLY. You cannot do that on a low-VUP box.

In order to set the stage, the following needs to be said.

1. Under VMS 4.x, the ASMP abilities were VERY restricted. No system service routines and no I/O initiation could be run on the secondary processor; only IPL 0 user code could. The primary processor handled the system services, all aspects of I/O, and the general VMS overhead chores. Most systems spent little time, if any, using the secondary processor.

2. Under VMS 5.x, SMP comes to town, and all the processors can now do I/O initiation, system services and the like. Very little remains solely for the primary processor. This is good news. Non-IPL-0 code can go almost anywhere. The only ways to get a process off an auxiliary processor are for its quantum to end, for it to issue an I/O request, or for it to stop itself. Compute-bound jobs have now found a home.

3. Under SMP, the abilities of parallel programming, and therefore parallel processing, have been made available. The sad part, at least for me, is that it sounds great but there are several side issues. The first is that only the first steps of the boot process do not have a 'process context'.
Ignoring that area, there are two other cases of 'process context': a 'full' context, as for interactive users, or a 'partial' context, as for I/O fork processes. Either way, there is a context, and it must therefore be under the control of the job scheduler. The subtle implication here is that on a multiprocessor system, such as a 6360, the scheduler is deciding which 'COM' state process to put on which processor. There is NOTHING that tells the scheduler that, for a process using the PPL$ (parallel processing library) functions, all the processing MUST be done 'IN PARALLEL' using as many processors as are available. In fact, there is nothing preventing the case of all the 'parallel segments' running one behind the other on the same processor. This could be considered the 'worst case', and it would take longer than if the code had not used the PPL$ routines at all, which were supposed to 'save' time. In order to create the parallel threads, the function PPL$SPAWN is used. This 'spawns' processes, and all the problems that the DCL command SPAWN has with quotas and the like apply here as well. I look at the PPL$ routines as an extension of the INSTALL/SHARE and event flag abilities that we have had for many versions of VMS.

What this boils down to, then, is that if the applications being run on the processor(s) have moderate I/O, such as word processing, and little compute, then multiprocessing is for you on just about any size box. The 'compute queue', as shown by SPM reports, would govern how many processors to put in a box: a high compute queue calls for more processors. If there are moderate to significant compute-bound processes, then the decision has to be made based on the 'acceptable' execution times. That is directly related to the VUP rating of EACH processor, and MUST be based on using a SINGLE processor only. Having a multiprocessor box only cuts down on the CPU queue, NOT CPU time.
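That distinction between queue length and CPU time can be shown with a toy run-to-completion schedule. This is a minimal sketch, assuming identical processors and FIFO dispatch; finish_times is a hypothetical helper, not VMS scheduler code.

```python
def finish_times(n_jobs, service_secs, n_cpus):
    """Completion time of each of n_jobs, all queued at once and run to
    completion FIFO across n_cpus identical processors."""
    # Job i waits through i // n_cpus full service rounds before running;
    # its own CPU time is service_secs no matter how many CPUs exist.
    return [(i // n_cpus + 1) * service_secs for i in range(n_jobs)]

jobs, cpu_secs = 8, 60.0
one_cpu  = finish_times(jobs, cpu_secs, 1)
four_cpu = finish_times(jobs, cpu_secs, 4)

print(max(one_cpu))   # 480.0 -- the last job queues behind seven others
print(max(four_cpu))  # 120.0 -- the queue largely disappears...
print(min(four_cpu))  # 60.0  -- ...but each job still costs 60 CPU seconds
```

Adding processors shortens the wait in the 'COM' queue; the 60 CPU seconds per job only shrink if each individual processor is faster.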
Granted, I am ignoring the abilities of PPL$ here, but then very little currently uses them.

The next question, and one of great significance, is this: 'Do we add more processors to the current box, or get another box in the cluster?' I have always answered this with another question that usually dictates the answer: in the event that you have a single large box, say a 6360, and it goes down, can you live with up to 24 hours of downtime? I say 24 hours here, but it could be more or less, depending on how long it takes on average to get a new board express-mailed the next day. If you cannot live with the downtime, then it's new-box time. If you can, and will be able to for the plannable future, you have a choice, and cost would then be a factor. In the event of a new box, I always shoot for the new box to be able to support a 'significant' amount of the total workload. The definition of 'significant' is a local decision, based on how many users/departments can be locked out of the system during an extended outage.

Those are my thoughts on the subject; I hope they help provide some guidance on how to approach processor selection.

pdc

Still alive and kicking... ;-)

Paul D. Clayton
Address - CLAYTON%XRT@RELAY.UPENN.EDU

Disclaimer: All thoughts and statements here are my own and NOT those of my employer, and are also not based on, and do not contain, restricted information.