From: Jerry Leichter [leichter@smarts.com]
Sent: Friday, July 16, 1999 7:30 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Linux in OpenVMS Galaxy (was Re: "Compaq Strategic Directions"

| Consider the following totally mythical conversation:
|
| Hardware guy: Hey, cool! We just figured out how to build a
| 128-processor shared memory system! We're going to ship on
| xx/yy/zz. Got some software?

Actually, this first occurred (except for the "we're going to ship" part) in the mid-1980's. The machine - an experimental thing called the Andromeda; two were built - had up to (I think) 128 MicroVAX II nodes and fully shared memory. The team that built it did a special version of VMS for it; when SMP code was added to VMS, they ported that to the machine. Since the VMS SMP code only supported 32 nodes, the experimenters created an in-memory communications channel and partitioned the machine into several members of a cluster (see the first sketch below).

| VMS kernel guy: Oh, #&(%. $@(*. %*%(. Do you know how many places
| there are in the kernel where we assumed back in the early 80s,
| when this thing was designed, that no SMP would *ever* get bigger
| than N? [N = 8, 16, 32, some number <<128.] It's built into the
| kernel's DNA, for &deity.'s sake! And the thing's NUMA, too --
| more changes! Argh! What do we do?

The choice is, indeed, wired in rather deeply - but that choice was not made blindly. There were real MP machines to try it out on - including the Andromedas. The limit of 32 - which I think, but am not certain, turned into a limit of 64 in Alpha-VMS - was based on experimentation that indicated it was unlikely that SMP configurations larger than 32 (or 64) nodes would be useful (the second sketch below shows how such a limit gets wired into kernel data structures). Further, the Andromeda experiments had already demonstrated that in-memory clustering was a viable way to extend the system. (BTW, to inject a personal note, I was asked my opinion of the 32-node limitation at the time the code was developed. I thought it was a reasonable tradeoff, and I still think so.)

There's little evidence to indicate that this decision was wrong. Lock contention and the like become a real problem when the machine gets that large. One way or another, you end up having to partition your resources. Then, to keep the machine flexible, you have to find ways to move resources from partition to partition. Galaxy is one such solution. You can do the same thing in other ways, with other tradeoffs. We'll see which method works out best.

(Then again, maybe a combination of hardware and software techniques will make full SMPs with 128 or more nodes practical. It's interesting, though, that IBM, which with years of research on very fast interconnects is probably in a better position to build such a thing than almost anyone else, builds SP machines with a separate system image per node.)

-- Jerry
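
[Sketch 1] A minimal sketch of what an "in-memory communications channel" between partitions can look like, assuming a one-way, single-producer/single-consumer ring buffer in memory shared by two system images. All names and details here are invented for illustration; the actual Andromeda implementation is not described in this message.

    /* Sketch only: a one-way message channel in memory shared by two
     * partitions, each running its own system image.  With a single
     * producer and a single consumer, no locks are needed - just
     * ordered updates of the head and tail indices. */
    #include <stdint.h>
    #include <string.h>

    #define SLOTS     64
    #define SLOT_SIZE 512

    struct channel {
        volatile uint32_t head;          /* advanced by sender only   */
        volatile uint32_t tail;          /* advanced by receiver only */
        uint8_t data[SLOTS][SLOT_SIZE];  /* message payloads          */
    };

    /* Sender side; returns 0 if the ring is full or the message is
     * too large for a slot. */
    int chan_send(struct channel *ch, const void *msg, uint32_t len)
    {
        uint32_t h = ch->head;
        if (h - ch->tail == SLOTS || len > SLOT_SIZE)
            return 0;
        memcpy(ch->data[h % SLOTS], msg, len);
        /* A real implementation needs a memory barrier here, so the
         * payload is globally visible before the index that
         * publishes it. */
        ch->head = h + 1;
        return 1;
    }

    /* Receiver side; returns 0 if the ring is empty. */
    int chan_recv(struct channel *ch, void *buf, uint32_t len)
    {
        uint32_t t = ch->tail;
        if (t == ch->head)
            return 0;
        memcpy(buf, ch->data[t % SLOTS], len < SLOT_SIZE ? len : SLOT_SIZE);
        ch->tail = t + 1;
        return 1;
    }

Layered under the cluster's communication services, a channel like this lets each 32-CPU partition look like an ordinary cluster node, with shared memory standing in for the network wire.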
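
[Sketch 2] And a minimal sketch of how a CPU-count limit gets "built into the kernel's DNA", assuming a kernel that represents a set of CPUs as a single machine word (again, invented names, not VMS source):

    /* Sketch only: once "set of CPUs" is one 32-bit word, the 32-CPU
     * limit is embedded in every structure and interface that uses
     * the type. */
    #include <stdint.h>

    typedef uint32_t cpu_set_t;          /* one bit per CPU: 32 max  */

    struct spinlock {
        volatile int locked;
        cpu_set_t    spinners;           /* CPUs busy-waiting here   */
    };

    struct process_ctl {
        cpu_set_t    affinity;           /* CPUs the process may use */
    };

    static void cpu_set_add(cpu_set_t *set, unsigned cpu)
    {
        *set |= (cpu_set_t)1u << cpu;    /* breaks once cpu >= 32    */
    }

    static int cpu_set_member(cpu_set_t set, unsigned cpu)
    {
        return (set >> cpu) & 1u;
    }

Widening the word to 64 bits (the natural size on Alpha) raises the limit to 64, but going beyond one machine word means touching every structure, saved context, and interface that embeds the type - which is why such limits are expensive to raise, and why partitioning plus in-memory clustering was the cheaper path to 128 processors.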