From: young_r@eisner.decus.org
Sent: Thursday, July 15, 1999 1:33 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Whither VMS?

In article <7mh6hh$b66$1@pyrite.mv.net>, "Bill Todd" writes:

> Rob Young wrote in message
>
>> It is true the Coupling Facility has been expanded to hold 16 Gigabytes of cache and it is a wonderfully tuned locking mechanism, superior to the VMS DLM, yadda, yadda. Point is, though, MVS is trapped in a 31-bit (32, 33 depends on who you run across) world. The Alpha systems of the near future will contain several hundred processors and 1+ Terabyte of memory. MVS loses as it won't be able to address much past its hacked-up 16 Gigabytes, and while that CF is a wonderfully tuned caching/locking beast, it too is a bottleneck (a mainframe bottleneck, the CF is a mainframe).
>
> Leaving aside my suspicion that the IBM hardware types could find a way to add more address bits to the architecture if they felt it necessary, and the fact that alternatively they could address arbitrarily large amounts of local memory similarly to solid-state disk to create as large a local cache as you might wish for (and possibly map it dynamically into their more limited physical address space if the difference in access time made this worthwhile), the fact that the shared CF memory allows sharing of dirty data without funneling it through the physical disk could make existing Sysplex facilities faster than Alpha clusters with infinite memory for some workloads: shared cache isn't useful just for reads.

Yes... and in another thread, without directly addressing this issue, Hein writes:

Before RMS global buffers go Galactic, I'd like to see them first go really big, outside of the working set, mapped with granularity hints (4MB superpages). That should be a super CPU win already.

Next, I'd be inclined to think along the lines of what Oracle has called 'cache fusion'. That is, if a bucket is in some buffer on some system in the cluster, and needed somewhere else, then do NOT 'ping' it through the disk, but use a better communication mechanism to get it across. That mechanism might be a memory channel, galactic memory, or a kernel-assist process-to-process copy in a single system. Yeah, you might still want to write to the disk, but the reader should not have to wait for it to come back from the disk, loading up the IO bus/controller/adapter twice.

----

This isn't the first time I've heard about eliminating (or more accurately: "negating the effects of") the ping to disk.
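To make that concrete, here is a toy sketch in C of the handoff Hein is describing. It is nothing like the real RMS, Oracle, or VMS lock-manager code; every type, routine, and name below is invented for illustration. The point is simply why shipping the buffer over the interconnect beats the two trips through the I/O subsystem that a disk ping costs.

/*
 * Toy sketch of the buffer handoff described above.  NOT real RMS,
 * Oracle, or VMS lock-manager code: every type, routine, and name
 * here is invented for illustration only.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BUCKET_SIZE 8192

struct bucket {
    uint64_t lbn;                  /* disk address of the bucket        */
    int      dirty;                /* modified since last disk write?   */
    uint8_t  data[BUCKET_SIZE];
};

/* Stand-ins for the real I/O and interconnect primitives.             */
static int disk_write(uint64_t lbn, const void *buf)
{ (void)buf; printf("  disk write, LBN %llu\n", (unsigned long long)lbn); return 0; }
static int disk_read(uint64_t lbn, void *buf)
{ (void)buf; printf("  disk read,  LBN %llu\n", (unsigned long long)lbn); return 0; }
static int interconnect_send(int node, const void *buf, size_t len)
{ (void)buf; printf("  send %zu bytes to node %d over the interconnect\n", len, node); return 0; }

/* The classic "ping": the holder flushes the bucket to disk and the
 * requester reads it back, so one logical handoff loads up the I/O
 * bus/controller/adapter twice and the reader waits on both trips.    */
static int ping_through_disk(struct bucket *held, struct bucket *wanted)
{
    if (held->dirty && disk_write(held->lbn, held->data) != 0)
        return -1;
    held->dirty = 0;
    wanted->lbn = held->lbn;
    return disk_read(wanted->lbn, wanted->data);
}

/* Cache fusion: ship the buffer image straight to the requesting node
 * (memory channel, galactic memory, or an in-system kernel copy).  The
 * disk write can still happen, but lazily, off the reader's path.     */
static int cache_fusion_transfer(struct bucket *held, int requester_node)
{
    if (interconnect_send(requester_node, held->data, sizeof held->data) != 0)
        return -1;
    /* a write-behind of `held` could be queued here */
    return 0;
}

int main(void)
{
    struct bucket held = { .lbn = 42, .dirty = 1 }, wanted;
    memset(&wanted, 0, sizeof wanted);
    puts("ping through disk:");
    ping_through_disk(&held, &wanted);
    puts("cache fusion:");
    cache_fusion_transfer(&held, 2);
    return 0;
}

Note that the disk write doesn't disappear in the second path; it just comes off the reader's critical path.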
>> As the VMS DLM moves into shared memory and is distributed (surely must be, my conjecture) across several nodes (see Galaxy Locks thread), the CF suddenly isn't so hot after all.
>
> If you look at the relative performance of CF vs. locks in Galaxy shared memory (including the mechanisms required to keep one crashed system from bringing down the entire lock database), they're likely about equal. But the CF still has the advantage that it supports shared caching as well.

But isn't it a natural move to migrate the XFC (eXtended File Cache) into shared memory?

>> So when you talk about "going up against"... (hee-hee-hee) the IBM mainframe folks, we'll see Alpha systems with Galaxy in the next several years that will be truly monstrous in comparison to large Sysplexes.
>
> As I suggested above, maybe, and maybe not. Not to mention RS/6000 clusters, which aren't likely to stand still in the interim (and, of course, are already a full 64-bit architecture): they are not an unreasonable alternative to VMS today for many (not all: Unix still doesn't treat in-process asynchrony very well) applications, and the Monterey initiative is going to make them significantly more attractive in terms of providing a compatible application architecture from the x86 on up. Give them a top-notch cluster file system and the (appropriately modified) Sysplex version of DB2 (or just run Oracle, since they can already run Oracle Parallel Server on [truly] shared disks the same way VMS can) and they may well be somewhat superior to VMS for the majority of applications - they already support distributed shared memory and a high-performance interconnect which can doubtless be improved if seriously challenged by the Galaxy shared-memory speed.

Yes... and of course at that point IBM must make a choice. As RS/6000 moves into and surpasses mainframe performance and what-not... what to do, what to do.

>> > The underlying VMS cluster facilities are up to the task, but the file system falls a bit short in areas of performance
>>
>> Think we've beat on that one a while before, but it's worth mentioning again. If you have a Terabyte of memory, isn't the filesystem mostly for writes? And if I have VCC_WRITEDELAY and VCC_WRITEBACK enabled, I'll race you, okay? :-)
>
> I'd be more than happy to race (metaphorically), as long as you let me pull the power switch on both systems in the middle so we can see how their robustness compares. What's that? You didn't write back your file system meta-data and didn't have it logged? Too bad... When it comes to performance and availability, I prefer to have my cake and eat it too - especially if I can get better performance on a given amount of hardware simply by bringing my software up to contemporary designs.

If I go back 2 years and change, and my memory serves me correctly (unwilling to do a Deja search, gotta get up at 5), a fellow mentioned to this group that much of the trickier work in Galaxy was "error-pathing". I believe that fellow reads these threads and may chime in.

If I were to design the VCC_WRITE* stuff, I would take advantage of fault-tolerant memory. Galactic slides from the DFWLUG of May 98 point out that memory allocation (in a future Galaxy phase) will include "fault tolerant" among the choices.

Let's run a scenario... I've got VCC_ turned on... you pull the plug. I've got redundant battery backup, and I also have my VCC_ designed such that it uses fault-tolerant memory allocation (typically 64 to 128 MByte is all that is needed, let's say)... as soon as my batteries kick in, the VCC_ master node flushes writes to recovery log(s) while attempting to post the writes. Ah, you say... let's introduce a total power outage so you also lose your disk farm. Okay, but weren't you supposed to make sure that your recovery log(s) (which is a search list of files) were on locally attached disk(s) powered by the batteries? I'm not a designer... but I am sure they are thinking along these lines.
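Here is roughly what I have in mind, as a toy sketch in C. None of this is the real VCC/XFC design or interface; the fault-tolerant cache array, the recovery-log routine, and the posting routine are all invented stand-ins. What matters is the ordering: secure the dirty data in the log on the battery-backed local disk first, then try to post it, and only mark it clean once the post succeeds, so anything left over is replayable from the log after the lights come back on.

/*
 * Toy sketch of the power-fail flush in the scenario above.  This is
 * NOT the real VCC/XFC design or interface; the fault-tolerant cache
 * array, log routine, and posting routine are invented stand-ins.
 */
#include <stdint.h>
#include <stdio.h>

#define ENTRY_SIZE 8192
#define MAX_DIRTY  128

struct cache_entry {
    uint64_t volume_id, lbn;       /* where the data belongs            */
    int      dirty;
    uint8_t  data[ENTRY_SIZE];
};

/* Dirty entries live in (hypothetical) fault-tolerant Galaxy memory.   */
static struct cache_entry ft_cache[MAX_DIRTY];
static int ft_count;

/* Recovery log on a battery-backed, locally attached disk.             */
static int append_to_recovery_log(const struct cache_entry *e)
{ printf("  log:  vol %llu, LBN %llu\n",
         (unsigned long long)e->volume_id, (unsigned long long)e->lbn); return 0; }

/* Normal write to the home volume; may fail if the disk farm is dark.  */
static int post_write_to_home_disk(const struct cache_entry *e)
{ printf("  post: vol %llu, LBN %llu\n",
         (unsigned long long)e->volume_id, (unsigned long long)e->lbn); return 0; }

/* Called when the batteries kick in: secure every dirty entry in the
 * recovery log first, then attempt the real write.  Only a posted entry
 * is marked clean, so whatever didn't make it to its home disk can be
 * replayed from the log when power returns.                            */
static void vcc_powerfail_flush(void)
{
    for (int i = 0; i < ft_count; i++) {
        struct cache_entry *e = &ft_cache[i];
        if (!e->dirty)
            continue;
        if (append_to_recovery_log(e) != 0)
            continue;              /* entry stays dirty in FT memory     */
        if (post_write_to_home_disk(e) == 0)
            e->dirty = 0;
    }
}

int main(void)
{
    ft_cache[0] = (struct cache_entry){ .volume_id = 1, .lbn = 100, .dirty = 1 };
    ft_cache[1] = (struct cache_entry){ .volume_id = 1, .lbn = 200, .dirty = 1 };
    ft_count = 2;
    puts("batteries kicked in, flushing:");
    vcc_powerfail_flush();
    return 0;
}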
> Then again, this isn't the first time that people have prophesied that increasing amounts of memory would make file systems effectively write-only - e.g., this was a large part of the rationale behind log-structured file systems. Recent papers seem to be backing away from this position, after having evaluated the LSFSs that are available for inspection.

Point me to a paper where they are talking about a Terabyte of memory and hundreds of Gigabytes of shared cache, not distributed. Where do they describe the drawbacks of that? I am curious and willing to read such a paper.

I have followed Spiralog's ups and downs. I don't understand all that took place. I thought for sure Spiralog was the next great thing. Apparently, read caching was a pain or a limiter (my fuzzy recollection). Sometimes much can be learned from failures or abandoned projects. Project Prism's cancellation was the ashes that Alpha rose from. *Apparently* the next wave of IO caching for VMS (VCC) is an outgrowth of Spiralog, IIRC.

So maybe my crutch is large memories. So? Big systems of the future (2-5 years) will have massive memory and hundreds of CPUs, unless you subscribe to the "attack of the killer 8-CPU node cluster" school. Give me 2 systems and I can run the New York Stock Exchange off of them. Maybe not so far-fetched. There is server consolidation and then there is SERVER CONSOLIDATION coming down the pike.

> And, of course, there are people who would just as soon make do with a lot less memory, as long as they could get equivalent performance out of it: in the amounts you're postulating it's by far the dominant element of the system hardware cost (either that, or you've got so much disk out there that it's not going to give you much better cache performance than current systems do).

Yes. And at 1 to 2 Terabytes of memory it hits a very sweet spot for Data Mining, government style... no doubt.

> To digress a bit at the end here:
>
> Monterey looks like about the only potentially serious general-purpose contender to MS in the OS arena these days, and may well gather popularity simply because of that. Aside from being a low-end-to-high-end standard, however, it's a very respectable system in its own right. If it is truly based on AIX, then its file system is (currently) JFS: a good single-node log-backed file system that exports that node's (fail-over-able) portion of the file system to the rest of the cluster - certainly not ideal, and not quite as desirable as Tru64's Cluster File System (if I've correctly interpreted the few bits of data I've been able to come up with on Tru64 CFS: it certainly looks like it runs AdvFS on locally-owned portions of the file system, but instead of exporting them like a file server it may export the meta-data information that allows other nodes to access the data directly on shared disks - again not ideal, but somewhat more scalable than JFS-style exporting), but nothing to be sneezed at.
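For what it's worth, the difference between those two export styles is easy to sketch (toy C below, not AIX JFS or Tru64 CFS code; every routine name is invented): JFS-style "function shipping" funnels every byte of data through the node that owns that piece of the file system, while the meta-data-shipping guess about CFS only pulls the extent map across the interconnect and then reads the data straight off the shared disk.

/*
 * Toy contrast of the two export styles.  NOT AIX JFS or Tru64 CFS
 * code: every routine name here is invented for illustration.
 */
#include <stdint.h>
#include <stdio.h>

struct extent { uint64_t disk_lbn; uint32_t blocks; };

/* Stand-ins for the cluster services, stubbed to show the traffic.    */
static int rpc_read_via_owner(int owner, const char *path, void *buf, size_t len)
{ (void)buf; printf("  RPC to node %d: read %zu bytes of %s (data funnels through the owner)\n",
                    owner, len, path); return 0; }
static int rpc_get_extent_map(int owner, const char *path, struct extent *map)
{ printf("  RPC to node %d: extent map for %s\n", owner, path);
  map->disk_lbn = 5000; map->blocks = 16; return 0; }
static int shared_disk_read(uint64_t lbn, void *buf, size_t len)
{ (void)buf; printf("  direct shared-disk read: LBN %llu, %zu bytes\n",
                    (unsigned long long)lbn, len); return 0; }

/* JFS-style exporting: every byte of data moves through the node that
 * owns that portion of the file system, like a file server.           */
static int read_function_shipping(int owner, const char *path, void *buf, size_t len)
{ return rpc_read_via_owner(owner, path, buf, len); }

/* CFS-style (as guessed above): only meta-data crosses the
 * interconnect; the bulk data goes straight to the shared disk.       */
static int read_metadata_shipping(int owner, const char *path, void *buf, size_t len)
{
    struct extent map;
    if (rpc_get_extent_map(owner, path, &map) != 0)
        return -1;
    return shared_disk_read(map.disk_lbn, buf, len);
}

int main(void)
{
    char buf[8192];
    puts("function shipping:");
    read_function_shipping(1, "[DATA]FILE.DAT", buf, sizeof buf);
    puts("meta-data shipping:");
    read_metadata_shipping(1, "[DATA]FILE.DAT", buf, sizeof buf);
    return 0;
}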
and "Is >it possible that we'd sell more Alpha systems if they ran Monterey on them, >given that Alpha seems likely to enjoy a performance advantage over >competing hardware for at least the immediate future?" and "If we do sell >Monterey on Alpha, how much additional revenue will Tru64 bring in?" and "If >we had a foot in the Monterey camp, what unique software (all right, you >know I'm talking about file systems, but Compaq likely doesn't know that >this specific opportunity - and it's likely not the only one - exists) could >we add transparently that would make our Monterey systems better, but still >'standard'?" > >Customers want standardization and consolidation in the industry - >preferably while maintaining multiple sources (as Monterey will have), >though they put up with MS for lack of any real choice in the matter. Given >a platform that offers standardization and multiple sources, they will >choose it over a platform lacking such perceived advantages unless the >competing platform is significantly superior in price and/or performance >and/or necessary features. > >That tends to suggests industry consolidation into Windows on the desktop >(and perhaps higher), Monterey in the server space, and, if Monterey can't >take over the high end, S/390 and VMS (ignoring true niche markets and those >customers who will continue using whatever they're using today because >changes are too painful: they produce revenue, but of the diminishing >variety). > Very nice read. And perhaps why near term (4 to 5 years) NT will need an ultimate high-end back-end. And if SQL can truly warp through a "Galactic worm-hole" then a Galaxy is that high-end. I think Unix has lost its momentum. NT is rocketing and unless the government pulls the plug NT will dominate the high-end 5 years and out. Marketing, market-share , development, etc. 8-CPU boxes today, more nodes in the cluster for NT. Desktop to DataCenter one OS. Unix lost the desktop (never had it), is losing midrange penetration hand over fist (file and print) the DataCenter is only a matter of time. > >Being pushed by improving Monterey facilities on one side and the IBM >behemoth on the other, can VMS really afford to pass up opportunities for >significant improvements? If VMS does pass them up, can Compaq convince >customers that it's really backing VMS for the long term? > Absolutely. There are still many things VMS does that others hope to approach. Niches but very important big revenue niches. Cell phones and automated toll takers (EzPass). As Kerry Main points out time and again, eCommerce is suddenly very seriously in need of non-stop solutions... VMS is headed there. One initiative mentioned at L.A Decus in October was drastically limiting reboots for patches. Simplistic, but a big plus for non-stop action. Do you think EBay engineers are very happy out there? How many more outages before they ink a contract with someone else to assist in a transition to a much more stable "solution" ala ETrade? 2, 5, 10, outages? Re-reading this: >behemoth on the other, can VMS really afford to pass up opportunities for >significant improvements? If VMS does pass them up, can Compaq convince >customers that it's really backing VMS for the long term? What do you mean? Can you cite a specific example? What opportunities has VMS passed up at an engineering level that would make VMS stronger? Rob