From: CRDGW2::CRDGW2::MRGATE::"SMTP::CRVAX.SRI.COM::RELAY-INFO-VAX" 30-OCT-1990 15:34:32.95
To: MRGATE::"ARISIA::EVERHART"
CC:
Subj: Re: System disk performance on large clusters

Received: by crdgw1.ge.com (5.57/GE 1.76)
    id AA17214; Tue, 30 Oct 90 15:14:52 EST
Received: From UCBVAX.BERKELEY.EDU by CRVAX.SRI.COM with TCP; Tue, 30 OCT 90 10:45:45 PDT
Received: by ucbvax.Berkeley.EDU (5.63/1.42)
    id AA09052; Tue, 30 Oct 90 10:36:57 -0800
Received: from USENET by ucbvax.Berkeley.EDU with netnews
    for info-vax@kl.sri.com (info-vax@kl.sri.com)
    (contact usenet@ucbvax.Berkeley.EDU if you have questions)
Date: 30 Oct 90 15:43:57 GMT
From: usc!wuarchive!rex!uflorida!mlb.semi.harris.com!rtpark.rtp.semi.harris.com!rlb@ucsd.edu
Organization: Harris Semiconductor, Microelectronics Center
Subject: Re: System disk performance on large clusters
Message-Id: <1990Oct30.114357.216@rtpark.rtp.semi.harris.com>
References: <5746@mwk.uucp>
Sender: info-vax-request@kl.sri.com
To: info-vax@kl.sri.com

In article <5746@mwk.uucp>, gleason@mwk.uucp (Lee K. Gleason, Control-G Consultants) writes:
>
> I would like comments from other people managing large VMS CI clusters...
>
> I have a CI cluster consisting of...
..
> We have, of course, already moved everything off of the system disk
> that we can - SYSUAF,RIGHTSLIST, JBCSYSQUE, ACCOUNTNG, etc - it's down
> to just images and libraries, almost 100% read accesses. In like wise,
> the XQP caches are sized to take maximum advantage of caching.
>
> We invite comment from any other large configuration sites that have
> experienced IO bandwidth problems to the system disk - what have your
> experiences been? How did you attack the problem? What were your
> results?

You didn't mention whether you had moved the operator log files, network
event log files, accounting data files, audit log files, or the default
DECnet account's directory to other disks. These are all candidates.
We've done all of these here, and I can point you to the code we use to
manage them.
The bulk of it is available from acfcluster.nyu.edu as AUDIT_LOG_KIT via
anonymous FTP or MAILSERV.

We follow a procedure which creates a new accounting file each midnight.
The files are named by node and date, which makes it easy to track down
old accounting info. We are also running ARSAP, which requires some
trickery to manage this, but it does work just fine. If you're not
running one of the accounting packages that does active collection by
placing itself between the JOB_CONTROL process and the accounting data
file, you can do this by redefining the ACCOUNTNG logical name with
DEFINE/SYSTEM and then issuing SET ACCOUNTING/NEW. If you're interested
in the procedure we use to do this, I will be glad to make a copy
available.

You can also move the NET*.DAT files from the system directory to
somewhere else by defining the logical names NETCIRC, NETCONF, NETLINE,
NETLOGING, NETNODE_LOCAL, NETNODE_REMOTE, and NETOBJECT.

There is another option that we have not resorted to, since we have been
able to balance our I/O load without it, although some system managers
at other sites have used it. The basic idea is to create alternative
locations for some of the most frequently accessed executables, help
libraries, message files, and anything else that isn't totally bolted
down to the system disk. The basic approach is this:

1. Create alternate directories on other, more moderately loaded disk(s).

2. Copy the most heavily used images/files to these directories. These
   should be files that do not change except when software is updated.

3. Modify the search list(s) for SYS$SYSTEM, SYS$SHARE, SYS$LIBRARY,
   SYS$HELP, SYS$MESSAGE, SYS$MANAGER, etc., so that each new definition
   includes a reference to the "copy" directory before the original
   value.

It is incredibly important in any large configuration of computers to
document a comprehensive software update checklist/procedure that is
available in hardcopy.
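The copy directory has to come before the original value because of how search lists resolve. When a logical name is a search list, RMS opens an existing file by trying each translation in order and using the first directory in which the file is found. The Python sketch below models only that lookup order. It is not VMS code. DISK$COPY:[SYSEXE] is a hypothetical copy directory, SYS$SYSROOT:[SYSEXE] stands in for the usual translation of SYS$SYSTEM, and a dict of sets stands in for the disks.

```python
# Python model of search-list lookup order; this is not VMS code.
# DISK$COPY:[SYSEXE] is a hypothetical copy directory, and the dict
# below is a stand-in for the disks and the files they hold.


def resolve(search_list, filename, disks):
    """Return the first directory in search_list that contains filename.

    Models RMS opening an existing file through a search-list logical
    name: translations are tried in order and the first hit wins.
    Returns None when no directory in the list holds the file.
    """
    for directory in search_list:
        if filename in disks.get(directory, set()):
            return directory
    return None


# Only a few heavily used images were copied to the less loaded disk.
disks = {
    "DISK$COPY:[SYSEXE]": {"LOGINOUT.EXE", "DIRECTORY.EXE"},
    "SYS$SYSROOT:[SYSEXE]": {"LOGINOUT.EXE", "DIRECTORY.EXE", "AUTHORIZE.EXE"},
}

original = ["SYS$SYSROOT:[SYSEXE]"]
patched = ["DISK$COPY:[SYSEXE]"] + original  # copy directory goes first

print(resolve(patched, "LOGINOUT.EXE", disks))   # DISK$COPY:[SYSEXE]
print(resolve(patched, "AUTHORIZE.EXE", disks))  # SYS$SYSROOT:[SYSEXE]
print(resolve(original, "LOGINOUT.EXE", disks))  # SYS$SYSROOT:[SYSEXE]
```

Anything that was not copied (AUTHORIZE.EXE in this example) still falls through to the system disk. That is why the original translation stays at the end of the list instead of being replaced.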
During any software update, a fresh copy of that checklist is printed
and each step is checked off before proceeding with the actual update.
I've developed a checklist like this in my head, but have seldom written
it down. Unfortunately, this has sometimes bitten me in the rear because
I would forget one or two little things.

If someone should decide to try out this idea, here are some things to
watch for:

1. Software updates must only be done with the original logicals in
   place. During any update, it is probably best to put the logicals
   back to their original values on all members of the cluster to avoid
   problems. I would suggest creating a command procedure with two modes
   of operation. The first mode prepends the copy directory specs to the
   appropriate system logical names' translation values, turning them
   into search lists. The second mode removes the extra value from the
   front of each search list, returning it to its original value. I have
   a command procedure that does these two operations for a general
   logical name search list, which I'm appending to this posting. The
   comments explain how to use it. If you have any questions about it,
   please send me a message or give me a call.

2. After any software update, fresh copies of all the relevant files
   should be placed in the copy directories. Because of this, it is best
   to manage the copies with a command procedure and a data file that
   control what gets copied and where it goes.

3. It might be a good idea to delete the copies before starting any
   update, so that users on other nodes in the cluster will not
   accidentally get an old version of a file.

4. Document everything you do in your site procedures documentation so
   that, in the event of a catastrophe involving you and/or the rest of
   your system support staff, there will be a record of what you decided
   and how it works: which command procedures do what, and when and how
   to use them.

5. If images are installed before these logical names are modified in
   the system startup, you should include code to REINSTALL the images
   so that the copies, rather than the originals on the system disk,
   become the KNOWN images. I suggest that the code for copying the files
   also include a section that does the REINSTALL as an optional
   function. That way you can include the necessary invocation in your
   startup procedures without adding a lot of code there.

NOTE: I seriously doubt that DEC will condone this idea. Be very careful
when and if you try it. If you have a workstation or some other form of
test system, I would suggest trying it out there, where it can't hurt
too badly if something goes a little haywire. Unless you like to fix
things in a panic, I wouldn't try this with your users on the system. I
would also suggest moving one file at a time until you are confident
it's working OK before doing a mass transfer.

Here's LNMINSREM.COM in VMS_SHARE format:

$! ------------------ CUT HERE -----------------------
$ v='f$verify(f$trnlnm("SHARE_VERIFY"))'
$!
$! This archive created by VMS_SHARE Version 7.2-007 22-FEB-1990
$! On 30-OCT-1990 11:37:58.97 By user RLB
$!
$! This VMS_SHARE Written by:
$! Andy Harper, Kings College London UK
$!
$! Acknowledgements to:
$! James Gray - Original VMS_SHARE
$! Michael Bednarek - Original Concept and implementation
$!
$! TO UNPACK THIS SHARE FILE, CONCATENATE ALL PARTS IN ORDER
$! AND EXECUTE AS A COMMAND PROCEDURE ( @name )
$!
$! THE FOLLOWING FILE(S) WILL BE CREATED AFTER UNPACKING:
$! 1. LNMINSREM.COM;16
$!
$set="set"
$set symbol/scope=(nolocal,noglobal)
$f=f$parse("SHARE_TEMP","SYS$SCRATCH:.TMP_"+f$getjpi("","PID"))
$e="write sys$error ""%UNPACK"", "
$w="write sys$output ""%UNPACK"", "
$ if f$trnlnm("SHARE_LOG") then $ w = "!"
$ ve=f$getsyi("version")
$ if ve-f$extract(0,1,ve) .ges. "4.4" then $ goto START
$ e "-E-OLDVER, Must run at least VMS 4.4"
$ v=f$verify(v)
$ exit 44
$UNPACK: SUBROUTINE ! P1=filename, P2=checksum
$ if f$search(P1) .eqs. "" then $ goto file_absent
$ e "-W-EXISTS, File ''P1' exists. Skipped."
$ delete 'f'*
$ exit
$file_absent:
$ if f$parse(P1) .nes. "" then $ goto dirok
$ dn=f$parse(P1,,,"DIRECTORY")
$ w "-I-CREDIR, Creating directory ''dn'."
$ create/dir 'dn'
$ if $status then $ goto dirok
$ e "-E-CREDIRFAIL, Unable to create ''dn'. File skipped."
$ delete 'f'*
$ exit
$dirok:
$ w "-I-PROCESS, Processing file ''P1'."
$ if .not. f$verify() then $ define/user sys$output nl:
$ EDIT/TPU/NOSEC/NODIS/COM=SYS$INPUT 'f'/OUT='P1'
PROCEDURE Unpacker ON_ERROR ENDON_ERROR;SET(FACILITY_NAME,"UNPACK");SET(
SUCCESS,OFF);SET(INFORMATIONAL,OFF);f:=GET_INFO(COMMAND_LINE,"file_name");b:=
CREATE_BUFFER(f,f);p:=SPAN(" ")@r&LINE_END;POSITION(BEGINNING_OF(b));
LOOP EXITIF SEARCH(p,FORWARD)=0;POSITION(r);ERASE(r);ENDLOOP;POSITION(
BEGINNING_OF(b));g:=0;LOOP EXITIF MARK(NONE)=END_OF(b);x:=ERASE_CHARACTER(1);
IF g=0 THEN IF x="X" THEN MOVE_VERTICAL(1);ENDIF;IF x="V" THEN APPEND_LINE;
MOVE_HORIZONTAL(-CURRENT_OFFSET);MOVE_VERTICAL(1);ENDIF;IF x="+" THEN g:=1;
ERASE_LINE;ENDIF;ELSE IF x="-" THEN IF INDEX(CURRENT_LINE,"+-+-+-+-+-+-+-+")=
1 THEN g:=0;ENDIF;ENDIF;ERASE_LINE;ENDIF;ENDLOOP;t:="0123456789ABCDEF";
POSITION(BEGINNING_OF(b));LOOP r:=SEARCH("`",FORWARD);EXITIF r=0;POSITION(r);
ERASE(r);x1:=INDEX(t,ERASE_CHARACTER(1))-1;x2:=INDEX(t,ERASE_CHARACTER(1))-1;
COPY_TEXT(ASCII(16*x1+x2));ENDLOOP;WRITE_FILE(b,GET_INFO(COMMAND_LINE,
"output_file"));ENDPROCEDURE;Unpacker;QUIT;
$ delete/nolog 'f'*
$ CHECKSUM 'P1'
$ IF CHECKSUM$CHECKSUM .eqs. P2 THEN $ EXIT
$ e "-E-CHKSMFAIL, Checksum of ''P1' failed."
$ ENDSUBROUTINE
$START:
$ create 'f'
X$ vfl = f$VER(0.or.f$TRNLNM("debug$dcl"))
X$! modify logical name list
X$! p1 -- function INSERT or REMOVE
X$! p2 -- equivalence name to insert/remove
X$! p3 -- Logical name to affect ( no default )
X$! p4 -- table ( defaults to LNM$PROCESS )
X$! p5 -- if function = REMOVE logical value determines if the last
X$! member of list may be removed or not.
X$! if function = INSERT determines relative position for INSERT
X$! p6 -- mode ( defaults to SUPERVISOR )
X$! p7 -- other qualifiers
X$!-----------------------------------------
X$! author: R.L. Boyd, September 1985
X$!-----------------------------------------
X$! insert a new name at the head or remove from the middle
X$ if p3.eqs."" then $ goto ERR_NONAME
X$ if p1.eqs."" then $ p1 = "INSERT"
X$ if p4.eqs."" then $ p4 = "LNM$PROCESS"
X$ if p1.eqs."REMOVE" .and. f$TRNLNM(p3,p4).eqs."" then $ goto EXIT
X$ if p6.eqs."" then $ p6 = "SUPERVISOR"
X$
X$ mode = "/"+p6
X$! initialize
X$ com = ","
X$ num = "#"
X$ null = ""`20
X$ val_list = null
X$ in_list = 0
X$ val_eq = null
X$ on error then $ goto EXIT
X$! see if it exists and how long it is
X$ exists = f$TRNLNM(p3,p4,,p6).nes.""
X$ max_index = f$TRNLNM(p3,p4,,p6,,"max_index")
X$ if .not.exists then $ goto COMMAND
X$! let's get the existing list first
X$ name_attributes == ""
X$ if f$TRNLNM(p3,p4,,p6,,"no_alias") then -
X$`09call ADD_LIST name_attributes no_alias
X$ if f$TRNLNM(p3,p4,,p6,,"confine") then -
X$`09call ADD_LIST name_attributes confine
X$ if name_attributes.nes."" then -
X$`09name_attributes = "/name_attribute=("+name_attributes+")"
X$ cnt = 0
X$VAL_GET:
X$ next_val = f$TRNLNM(p3,p4,cnt,p6)
X$ add_val = next_val
X$ translation_attributes == ""
X$ if f$TRNLNM(p3,p4,cnt,p6,,"concealed") then -
X$`09call ADD_LIST translation_attributes concealed
X$ if f$TRNLNM(p3,p4,cnt,p6,,"terminal") then -
X$`09call ADD_LIST translation_attributes terminal
X$ if translation_attributes.nes."" then -
X$`09add_val=add_val+"/translation_attribute=("+translation_attributes+")"
X$ if cnt.gt.0 then $ add_val = num+add_val
X$ if next_val.nes.null then $ val_list = val_list+add_val
X$ if next_val.eqs.p2`20
X$ then`20
X$ val_eq = cnt
X$ val_match = add_val-num
X$ in_list = 1`20
X$ endif
X$ cnt = cnt+1
X$ if cnt.le.max_index then $ goto VAL_GET
X$COMMAND:
X$ goto 'p1'
X$REMOVE:
X$ if .not.in_list then $ goto EXIT
X$ if val_eq.lt.max_index then $ goto REMOVE_MIDDLE
X$ if .not. p5 then $ goto EXIT
X$ remove_string = val_match
X$ if max_index.gt.0 then $ remove_string = num+remove_string
X$ val_list = val_list - remove_string
X$ goto REMOVE_DONE
X$REMOVE_MIDDLE:
X$ val_list = val_list -(val_match+num)
X$REMOVE_DONE:
X$ if val_list.eqs.null then $ deassign/table='p4''mode' 'p3'
X$ gosub FINAL_DELIMITERS
X$ if val_list.nes.null then -
X$`09define/table='p4''mode''name_attributes' 'p3' 'val_list'
X$ goto EXIT
X$INSERT:
X$ if p5.eqs.null then $ p5 = 0
X$ if p5.gt.max_index then $ p5 = max_index+1
X$ if in_list .and. val_eq.le.p5 then $ goto INSERT_DONE
X$ if .not.in_list then $ goto INSERT_GO
X$! it's in the list but too early -- we'll have to delete it first.
X$ val_list = val_list - (p2+num)
X$INSERT_GO:
X$! now we have to figure out how to stick it in at the right place
X$! 3 different ways -- front, end, middle
X$! set up for default
X$ pre_list = null
X$ post_list = val_list
X$ if p5.eq.0 then $ goto INSERT_DO ! see if it is front (default)
X$ if p5.gt.max_index then $ goto INSERT_END`20
X$! yuk -- have to put it in the middle -- now we have to split the list
X$ pre_cnt = 0
X$PRE_LOOP:
X$ next_val = f$ELEMENT(pre_cnt,num,val_list)
X$ pre_cnt = pre_cnt+1
X$ if pre_cnt.lt.p5 then $ next_val = next_val+num
X$ pre_list = pre_list+next_val
X$ if pre_cnt.lt.p5 then $ goto PRE_LOOP
X$! we now have all the first p5 values in pre_list
X$ post_list = val_list - (pre_list+num) ! lop off the front of it
X$ goto INSERT_DO
X$INSERT_END:`20
X$ pre_list = val_list
X$ post_list = null
X$INSERT_DO:
X$ val_ins = p2+p7
X$ if pre_list.nes.null then $ val_ins = num+val_ins
X$ if post_list.nes.null then $ val_ins = val_ins+num
X$! now we stick the list back together
X$ val_list = pre_list+val_ins+post_list
X$ gosub FINAL_DELIMITERS
X$ define/table='p4''mode''name_attributes' 'p3' 'val_list'
X$INSERT_DONE:
X$EXIT:
X$ if vfl then $ show log 'p3'/table='p4'
X$ exit ! 'f$VER(vfl)'
X$ERR_NONAME:
X$ ss$_argreq = "%X00038268"
X$ write sys$error f$MESSAGE(ss$_argreq)," -- P3 (= logical name) "
X$ exit
X$! replace # signs with ,'s in the VAL_LIST
X$FINAL_DELIMITERS:
X$ if vfl.or.f$VER() then $ show symbol /loc/all
X$ out_list == ""
X$ pre_cnt = 0
X$FINAL_LOOP:
X$ next_val = f$ELEMENT(pre_cnt,num,val_list)
X$ if next_val.nes.num`20
X$ then
X$ `09pre_cnt = pre_cnt+1
X$ call ADD_LIST out_list "''next_val'"
X$`09goto FINAL_LOOP
X$ endif`20
X$ val_list = out_list
X$ if vfl.or.f$VER() then $ show symbol val_list
X$ return
X$!
X$! ADD_LIST: add a value to a list contained in a global symbol
X$! p1 global symbol name containing the list
X$! p2 value to add to the list
X$! p3 delimiter to use to separate values in the list, default ","
X$ADD_LIST: subroutine
X$ list = p1
X$ value = p2
X$ delimiter = p3
X$ if delimiter.eqs."" then $ delimiter = ","
X$ if f$TYPE('list').eqs.""`20
X$ then`20
X$ `09'list' == value
X$ else
X$`09if 'list'.eqs.""`20
X$`09then`20
X$`09`09'list' == value
X$`09else
X$`09`09'list' == 'list'+delimiter+value
X$`09endif
X$ endif
X$ exit
X$endsubroutine ! add_list
X$!Last Modified: 22-MAR-1990 13:04:18.66, By: RLB`20
$ CALL UNPACK LNMINSREM.COM;16 254021580
$ v=f$verify(v)
$ EXIT

Bob
-----------------------------------------------------------------
Bob Boyd
Voice: (919)549-3627
Harris Semiconductor Microelectronics Center
E-Mail Address: rlb@rtpark.rtp.semi.harris.com
Disclaimer? Datclaimer? Reclaimer?
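You can check what the INSERT and REMOVE functions of LNMINSREM.COM do without a VMS system by using a small Python model of the logic, as read from the DCL in the archive. This is a sketch of the procedure's intent, not a replacement for it. It assumes a logical name's translations can be treated as a plain list. It ignores name/translation attributes, access modes, and some edge cases of the DCL string handling. It only renders the DEFINE command as a string. DISK$COPY:[SYSEXE] is again a hypothetical copy directory.

```python
# Python model of LNMINSREM.COM's INSERT and REMOVE functions.
# Translations are a plain list; nothing here talks to VMS.


def lnm_insert(values, new, position=0):
    """INSERT: place `new` at `position` (the procedure's P5; 0 = front).

    A position past the end appends. If `new` is already at or before
    the requested position the list is left alone; if it sits later,
    it is moved forward to the requested position.
    """
    values = list(values)
    position = min(position, len(values))
    if new in values:
        if values.index(new) <= position:
            return values
        values.remove(new)
    values.insert(position, new)
    return values


def lnm_remove(values, old, allow_last=False):
    """REMOVE: drop `old` from the list.

    The final translation is only removed when allow_last (P5) is true,
    which protects the original value that the copy directories were
    placed in front of. An empty result corresponds to the DEASSIGN.
    """
    values = list(values)
    if old not in values:
        return values
    idx = values.index(old)
    if idx == len(values) - 1 and not allow_last:
        return values
    del values[idx]
    return values


def define_command(name, values, table="LNM$PROCESS", mode="SUPERVISOR"):
    """Render the DEFINE line the procedure builds from P3, P4, and P6."""
    return f"DEFINE/TABLE={table}/{mode} {name} {','.join(values)}"


copy_dir = "DISK$COPY:[SYSEXE]"  # hypothetical copy directory
original = ["SYS$SYSROOT:[SYSEXE]"]

patched = lnm_insert(original, copy_dir)  # mode 1: copy directory first
print(define_command("SYS$SYSTEM", patched, "LNM$SYSTEM", "EXECUTIVE"))
print(lnm_remove(patched, copy_dir) == original)  # mode 2 restores it
print(lnm_remove(original, "SYS$SYSROOT:[SYSEXE]"))  # last member kept
```

The DCL builds the list with "#" as its internal separator and converts to commas only in FINAL_DELIMITERS. This is apparently because a translation can carry a /translation_attribute=(concealed,terminal) qualifier, and that qualifier's own comma would otherwise split an element in F$ELEMENT. The list model sidesteps the problem by keeping each translation as a single Python string.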