From: Jerry Leichter [leichter@smarts.com]
Sent: Friday, July 16, 1999 7:30 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Linux in OpenVMS Galaxy (was Re: "Compaq Strategic Directions"

| Consider the following totally mythical conversation:
|
| Hardware guy: Hey, cool! We just figured out how to build a
| 128-processor shared memory system! We're going to ship on
| xx/yy/zz. Got some software?

Actually, this first occurred (except for the "we're going to ship" part) in the mid-1980's. The machine - an experimental thing called the Andromeda; two were built - had up to (I think) 128 MicroVAX II nodes and fully shared memory. The team that built it did a special version of VMS for it; when SMP code was added to VMS, they ported that to the machine. Since the VMS SMP code only supported 32 nodes, the experimenters created an in-memory communications channel and partitioned the machine into several members of a cluster (see the first sketch below).

| VMS kernel guy: Oh, #&(%. $@(*. %*%(. Do you know how many places
| there are in the kernel where we assumed back in the early 80s,
| when this thing was designed, that no SMP would *ever* get bigger
| than N? [N = 8, 16, 32, some number <<128.] It's built into the
| kernel's DNA, for &deity.'s sake! And the thing's NUMA, too --
| more changes! Argh! What do we do?

The choice is, indeed, wired in rather deeply - but that choice was not made blindly. There were real MP machines to try it out on - including the Andromedas. The limit of 32 - which I think, but am not certain, turned into a limit of 64 in Alpha-VMS - was based on experimentation that indicated it was unlikely that SMP configurations larger than 32 (or 64) nodes would be useful (the second sketch below shows how such a limit gets wired into kernel data structures). Further, the Andromeda experiments had already demonstrated that in-memory clustering was a viable way to extend the system. (BTW, to inject a personal note, I was asked my opinion of the 32-node limitation at the time the code was developed. I thought it was a reasonable tradeoff, and I still think so.)

There's little evidence to indicate that this decision was wrong. Lock contention and the like become a real problem when the machine gets that large. One way or another, you end up having to partition your resources. Then, to keep the machine flexible, you have to find ways to move resources from partition to partition. Galaxy is one such solution. You can do the same thing in other ways, with other tradeoffs. We'll see which method works out best.

(Then again, maybe a combination of hardware and software techniques will make full SMPs with 128 or more nodes practical. It's interesting, though, that IBM, which with years of research on very fast interconnects is probably in a better position to build such a thing than almost anyone else, builds SP machines with a separate system image per node.)

-- Jerry
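
[Sketch 1] A minimal sketch of what an "in-memory communications channel" between partitions can look like, assuming a one-way, single-producer/single-consumer ring buffer in memory shared by two system images. All names and details here are invented for illustration; the actual Andromeda implementation is not described in this message.

    /* Sketch only: a one-way message channel in memory shared by two
     * partitions, each running its own system image.  With a single
     * producer and a single consumer, no locks are needed - just
     * ordered updates of the head and tail indices. */
    #include <stdint.h>
    #include <string.h>

    #define SLOTS     64
    #define SLOT_SIZE 512

    struct channel {
        volatile uint32_t head;          /* advanced by sender only   */
        volatile uint32_t tail;          /* advanced by receiver only */
        uint8_t data[SLOTS][SLOT_SIZE];  /* message payloads          */
    };

    /* Sender side; returns 0 if the ring is full or the message is
     * too large for a slot. */
    int chan_send(struct channel *ch, const void *msg, uint32_t len)
    {
        uint32_t h = ch->head;
        if (h - ch->tail == SLOTS || len > SLOT_SIZE)
            return 0;
        memcpy(ch->data[h % SLOTS], msg, len);
        /* A real implementation needs a memory barrier here, so the
         * payload is globally visible before the index that
         * publishes it. */
        ch->head = h + 1;
        return 1;
    }

    /* Receiver side; returns 0 if the ring is empty. */
    int chan_recv(struct channel *ch, void *buf, uint32_t len)
    {
        uint32_t t = ch->tail;
        if (t == ch->head)
            return 0;
        memcpy(buf, ch->data[t % SLOTS], len < SLOT_SIZE ? len : SLOT_SIZE);
        ch->tail = t + 1;
        return 1;
    }

Layered under the cluster's communication services, a channel like this lets each 32-CPU partition look like an ordinary cluster node, with shared memory standing in for the network wire.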
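
[Sketch 2] And a minimal sketch of how a CPU-count limit gets "built into the kernel's DNA", assuming a kernel that represents a set of CPUs as a single machine word (again, invented names, not VMS source):

    /* Sketch only: once "set of CPUs" is one 32-bit word, the 32-CPU
     * limit is embedded in every structure and interface that uses
     * the type. */
    #include <stdint.h>

    typedef uint32_t cpu_set_t;          /* one bit per CPU: 32 max  */

    struct spinlock {
        volatile int locked;
        cpu_set_t    spinners;           /* CPUs busy-waiting here   */
    };

    struct process_ctl {
        cpu_set_t    affinity;           /* CPUs the process may use */
    };

    static void cpu_set_add(cpu_set_t *set, unsigned cpu)
    {
        *set |= (cpu_set_t)1u << cpu;    /* breaks once cpu >= 32    */
    }

    static int cpu_set_member(cpu_set_t set, unsigned cpu)
    {
        return (set >> cpu) & 1u;
    }

Widening the word to 64 bits (the natural size on Alpha) raises the limit to 64, but going beyond one machine word means touching every structure, saved context, and interface that embeds the type - which is why such limits are expensive to raise, and why partitioning plus in-memory clustering was the cheaper path to 128 processors.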