Article 163050 of comp.os.vms:

moroney@world.std.com (Michael Moroney) writes:

>In article , greg@indiana.edu (Gregory Travis) wrote:
>> campbejr@phu989.mms.sbphrd.com (John R. Campbell) writes:
>> >4> Instructions were 15/30/60 bit so you could have up to 4 instructions
>> per word.
>The 60 bit instructions (character manipulation) were added as an
>afterthought.  Before these, the machine was RISC long before RISC became
>"popular", and there were only 78 instructions.  Opcodes were the first 6
>bits except for two "opcodes" which used the next 3 bits to determine which
>of 8 instructions it was.

You're talking about the "CMU" (Compare and Move Unit), which implemented
VAX-like (close enough) character-manipulation instructions.  They allowed
you (in assembly) to work on byte (12-bit (6-bit?)) sized data instead of
having to grab whole words.  I don't remember exactly, but I don't believe
you could put a CMU in a 6600.  I know it certainly was an option on our
later/slower/smaller Cyber 172, though.

>Minor nit: CIO (combined input/output) was only one of several PPU programs
>that could be requested.  CIO was the most common PPU program run since it
>did all disk (and other?) I/O.  Other PPU programs were somewhat similar
>to VMS system services.  Which PPU program was requested was determined by
>the 3 letter code at the high 18 bits of word at address 1, the lower bits
>specified parameters or addresses of various sorts.

Absolutely correct - I oversimplified.  As Mike points out, there was one
PPU (PPU 0) which always ran just one program.  The other PPUs were
available for running any PPU program.  The program in PPU 0 is closest to
what we would today call the "kernel".  Its duties, if I remember
correctly, were roughly:

1. Make sure the CPU had not halted (because a program had executed the
   "PS" (program stop) instruction).
2. Timeslice among the running processes (i.e. interrupt the CPU).
3. Scan each program's address 1 (as Mike pointed out) to see if the
   program was requesting a system service (CIO/etc.)
4. Schedule PPU programs on the other PPUs and monitor their health.

In addition to the program in PPU 0, there was a small system "kernel" in
CPU memory as well.  PPU 0 would schedule this code to perform certain
operations that the CPU could do faster than a PPU (such as memory copies)
and other stuff.  The PPUs could stop/start the CPU because they could
write the CPU's program counter.  I cannot, for the life of me, remember
the name of the program which ran in PPU 0.

The number of PPUs that a system had varied - you could order more if you
wanted, up to a maximum of 20 (not quite sure on the number).  Each PPU
consisted of only 4K of 12-bit bytes.  Many of the larger PPU programs,
towards the end of the system's development, were not fitting into 4K.
The solution to this was PPU overlays.

Any PPU could attach to any "channel".  A channel was a datapath to a
device.  When PPUs were idle they looped on a certain channel waiting for
data.  The other end of this channel would get connected to PPU 0 when it
wanted to load a PPU program into a given PPU.

Deadstart (IPL to you IBM boys, or "reboot" to UNIX people) on the 6600
was something people talked about in low whispers.  On the 6600 there was
a bay which could be opened up.  Inside the bay was a matrix of 12 x 12
toggle switches and a pushbutton labelled "deadstart".  The switches
represented 12 words of 12-bit PPU instructions.  When the deadstart
switch was pressed, the contents of the switch matrix were dumped, via
channel 0, into PPU 0's memory starting at location 0.  The instruction
matrix represented a simple bootstrap program which loaded a larger
bootstrap from a disk or tape, beginning the system startup procedure.
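The deadstart panel dump is simple enough to model.  Here is a minimal
Python sketch; the bit ordering (most-significant switch first) and the
function name are my own assumptions, not anything from the manuals:

```python
# Toy model of the 6600 deadstart panel: a 12 x 12 matrix of toggle
# switches, each row holding one 12-bit PPU word.  Pressing "deadstart"
# dumps the rows into PPU 0's memory at locations 0..11 (via channel 0).
# Assumption: each row is given most-significant switch first.

def deadstart(panel_rows):
    """panel_rows: 12 rows of 12 switch settings (0 or 1).
    Returns the 12 words written to PPU 0 locations 0..11."""
    assert len(panel_rows) == 12 and all(len(r) == 12 for r in panel_rows)
    ppu0_memory = []
    for row in panel_rows:
        word = 0
        for bit in row:
            word = (word << 1) | (bit & 1)   # pack 12 switches into a word
        ppu0_memory.append(word)             # each word is 0..0o7777
    return ppu0_memory

# Row 0 with all switches up, the rest down:
panel = [[1] * 12] + [[0] * 12] * 11
print(oct(deadstart(panel)[0]))              # 0o7777
```

From here the real panel's 12 words had to be a bootstrap loop that read
the larger bootstrap in from disk or tape - there was no room for anything
else.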
>For some reason unconditional branches were always compiled as "conditional"
>branches where register B0 was compared to register B0 and if the same
>the branch was taken, despite the existence of an unconditional Jump
>instruction.  B0 was always equal to B0, of course.  (B0 was also always 0,
>this was done so fewer instructions were needed)  I *believe* this was done
>since the unconditional Jump instruction wiped out this cache, or was slower
>somehow.

Yes, as Mike pointed out, there was the "JP Bn, Address" instruction,
which jumped to Address + contents of the B register.  The most common
form was probably just:

  JP Address

which Compass generated as:

  JP B0, Address

Since B0 was hardwired to the value 0, this was an unconditional jump to
Address.  However, there were also comparison jumps which could be done on
B registers, such as:

  EQ B4, B5, Address

which jumped to Address if B4 and B5 were equal.  This meant that:

  EQ B0, B0, Address

was functionally equivalent to "JP Address".  However, "EQ" was actually a
little faster than "JP" (by 100ns), so everyone, including the compilers,
always coded an unconditional jump as:

  EQ B0, B0, Address
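The two jump forms can be sketched in a few lines of Python (a toy model;
the register values and addresses below are made up, and no timing is
modeled):

```python
# Toy model of the two 6600 jump forms.  All B registers start at 0;
# B0 is hardwired to 0, which is what makes "EQ B0,B0,K" an
# unconditional branch.

B = {i: 0 for i in range(8)}      # B registers (B0 stays 0)
B[4] = 8                          # arbitrary example value

def jp(b, target):
    """JP Bn,K -- jump to K + (Bn)."""
    return target + B[b]

def eq(pc, b1, b2, target):
    """EQ Bi,Bj,K -- branch to K if (Bi) == (Bj), else fall through."""
    return target if B[b1] == B[b2] else pc + 1

print(jp(0, 64))                  # 64: JP B0,K is an unconditional jump to K
print(eq(10, 0, 0, 64))           # 64: EQ B0,B0,K always branches
print(eq(10, 0, 4, 64))           # 11: B0 != B4, so fall through to pc+1
```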
>> Since 6600 had no JUMP TO <register>
>> instruction, I constructed

>The JP instruction didn't specify a B register as one of its inputs?  Been
>so long now.

You are right, I just looked it up in "Assembly Language Programming."
Now I'm wondering what rationale I had, if any, for constructing the jump
instruction on the fly as I did; it was eighteen years ago :-(.

>Another interesting feature of this machine was no stack.  Also subroutines
>were implemented such that the HARDWARE performed self-modifying code!
>(The first word of a subroutine was left blank and the subroutine call
>instruction wrote a Branch to the instruction after the subroutine call into
>this cell!  The subroutine "returned" by branching to this cell!)

Yes, this was the only real weak point of the machine.  The fact that the
hardware stored the return address IN the subroutine made it damn
difficult to implement recursive or reentrant subroutines.

Some more "stuff" about the 6600.  As most people know, the machine's
memory had no hardware error detection at all.  Cray's famous quote at the
time was "Parity is for farmers."  When parity was added to the later 7600
series, Cray was asked what made him change his mind.  His response:
"Farmers buy a lot of computers."

Anyway, it was still important to diagnose and locate failing memory.
There was a system CPU program (whose name I forget) whose whole purpose
in life was to write patterns into memory and then read them back,
flagging an error if it didn't get what it expected.  This program ran at
a low priority level and would write and read a pattern and then ask the
system to "roll it out" (what we would call a swapout today).  When it got
rolled back in, it was almost always rolled back in at a different
physical address.  In that way it eventually got around to probing all of
memory.

There was a separate program, much like the memory checker, which did the
following all day.  He would RANDOMLY generate a set of instructions and
then INTERPRET those instructions and note the result.
Then he would execute the same set of instructions on the hardware.  If
the results that the hardware came up with were different from what the
interpreter produced, then the program would flag a failing CPU.  I can't
remember what a divide-by-zero did on the CPU hardware.  I don't recall
there being any mechanism (until the later XJ instruction) for trapping
it.

As for the speed of the machines, here are some numbers to chew on.

It became apparent that the 18-bit address size of the 6600 (and its
descendant, the big 7600) was a constraint in big systems.  So CDC came
out with ECS - extended core storage.  ECS was simply bulk RAM to which
the contents of CPU memory could be quickly stored or retrieved.
Instructions could not be executed directly out of ECS.

The 6600 could transfer information to and from the ECS at 10 million
words per second, or roughly 75MB/s.  This was late 1960s technology.
The 7600 could transfer information to and from the ECS at 36 million
words per second, or roughly 270MB/s.  This was early 1970s tech!

Instruction cycle times on the 6600 were expressed in either minor or
major cycles.  A minor cycle was 100 nanoseconds, a major cycle a
microsecond.  The 6600 could add two 60-bit REALs in four minor cycles
(400ns) - thus it could do 2.5 million 60-bit REAL adds per second.  A
multiply took quite a while longer - a whole major cycle - and a divide
took nearly three major cycles.

Most 6600 instructions typically took either 3 or 4 minor cycles.
Branches took a little longer (branches were very expensive on the 6600).
Thus the 6600 was nominally about 3 million instructions per second (I
earlier wrote 1 mips).  However, the instruction unit was capable of
fetching an instruction from memory every minor cycle (100ns), giving a
MAXIMUM speed of 10 million instructions per second.  Not bad for 1964.

It was possible to get much higher speeds than the cycle time of any one
instruction because of the multiple function units.
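The rates quoted above hang together arithmetically.  A quick Python
sanity check (the input numbers are the ones from the text; the helper
name is mine):

```python
# Sanity-check the ECS transfer rates and the floating-add rate quoted
# above.  A 60-bit word is 7.5 eight-bit bytes.

BYTES_PER_WORD = 60 / 8                    # 7.5

def ecs_mb_per_s(words_per_second):
    """Convert an ECS transfer rate in words/s to MB/s."""
    return words_per_second * BYTES_PER_WORD / 1e6

print(ecs_mb_per_s(10_000_000))            # 6600: 75.0 MB/s
print(ecs_mb_per_s(36_000_000))            # 7600: 270.0 MB/s

# A 60-bit REAL add in four 100ns minor cycles:
adds_per_second = 1e9 / (4 * 100)          # ns per add -> adds per second
print(adds_per_second)                     # 2500000.0, i.e. 2.5 million/s
```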
Instructions were issued to the functional units based on their type, and
different functional units could run in parallel if they did not conflict
in their register usage.  Parallelism occurred more often than not.  Here
is the breakdown of the 6600/7600 functional units:

  Branch
  Boolean
  Shift
  Long (60 bit) add
  Floating add
  Divide
  Multiply (there were TWO of these, both identical)
  Increment (again, TWO increment units)

Thus:

  SB4 B3+40    (Set B4 to the contents of B3 + 40)
  SB4 B4+20    (Increment B4 by 20)

took a total of 600ns (because of the conflict on B4, and each increment
is a 300ns instruction), but:

  SB1 B2+10
  SB3 B4+5

took only 300ns, because there was no register conflict and each
instruction could be issued to one of the two increment units in parallel.

Likewise, since there were two floating-point multiply units, and since a
60-bit FP multiply nominally took a major cycle (1000ns), if you coded
right you could get two of them going in parallel, giving an effective
60-bit FP multiply time of 500ns (2 million per second).

Also, remember that loading a value into an "A" register caused the value
of the corresponding memory location to be loaded into the associated X
register:

  SA4 10    (load the contents of location 10 into X4)

This instruction took 300ns to execute, plus an additional 500ns to get
the word from memory.  Thus it was bad practice to load a value and then
IMMEDIATELY use it:

  SA4 10
  IX4 X4+X4    (Add X4 to itself)

This ties up the machine, since IX4 can't start until the load finishes.
It was much better to anticipate the load a few instructions before using
it:

  SA4 10
  IX3 X2+X5
  SB3 10
  IX4 X4+X4

which pretty much filled up the functional unit pipeline.  The Fortran
compilers were QUITE adept at scheduling this kind of thing, as were the
various math libraries.

The 7600 was quite a bit faster than the 6600 - it could do a floating
point multiply in 137.5ns (see memory timing below) - or over 7 million
60-bit floating multiplies per second.
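The increment-unit timings above can be reproduced with a toy issue model.
This is only a sketch, not a cycle-accurate 6600 scoreboard: it assumes
two identical 300ns increment units, and makes an instruction wait for any
source register that an earlier instruction is still producing.

```python
# Toy model of functional-unit parallelism: two identical increment
# units, 300ns each, with an instruction stalling until its source
# registers are ready.  (Issue-slot timing and other 6600 scoreboard
# details are deliberately ignored.)

def schedule(instrs, unit_count=2, latency=300):
    """instrs: list of (dest_reg, [src_regs]).  Returns total time in ns."""
    units_free = [0] * unit_count          # time each unit becomes free
    reg_ready = {}                         # time each result register is ready
    finish = 0
    for dest, srcs in instrs:
        start = min(units_free)            # wait for a free unit...
        for r in srcs:
            start = max(start, reg_ready.get(r, 0))   # ...and for operands
        end = start + latency
        units_free[units_free.index(min(units_free))] = end
        reg_ready[dest] = end
        finish = max(finish, end)
    return finish

# SB4 B3+40 ; SB4 B4+20 -- second reads B4, so it must wait: 600ns
print(schedule([("B4", ["B3"]), ("B4", ["B4"])]))   # 600
# SB1 B2+10 ; SB3 B4+5  -- no conflict, both units run in parallel: 300ns
print(schedule([("B1", ["B2"]), ("B3", ["B4"])]))   # 300
```

The same model, with a 1000ns latency and two units, also gives the
effective 500ns-per-multiply figure for back-to-back independent FP
multiplies.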
The 7600 could do a 60-bit integer add in 55ns (18 million per second) vs
300ns for the 6600.

I was incorrect when I wrote that the 6600 had four banks of memory.  It
had thirty-two banks of core, which gave the 6600 the ability to do 10
million word reads or writes per second.  The 7600 was faster, with an
access time of 137.5 nanoseconds and a full cycle time of 275 nanoseconds.
Again, these are for 60-bit words.

>instruction execution) even if much seemed kludgy.  Mr. Cray (RIP) was a
>genius.

I can't think of anything in the 6600 architecture that I could remotely
consider kludgy, even the subroutine calling convention.  The instruction
set was remarkably rational.  The CMU instructions and the CMU's use were
awkward and kludgy, but that was a Control Data add-on and didn't come
from Seymour.  Well, the PPU instruction set was a little weird.

Seymour Cray is the only person in the computer industry who I considered
absolutely infallible.  The 6600 was a tour-de-force, and its existence
makes machines like the S/360 family simply incomprehensible.

I can't believe I wrote this much,
greg
-- 
greg greg@indiana.edu http://gtravis.ucs.indiana.edu/