From: Rob Young [young_r@encompasserve.org]
Sent: Tuesday, September 24, 2002 5:38 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: VMS future (oh not not another one of these)

In article , "Bill Todd" writes:
>
> What EV7 brings to *any* system is its on-chip memory controller and its
> 1.75 MB of on-chip cache. What EV7 brings to *any MP* system is its on-chip
> routing. Both reduce surrounding 'glue' chips (including the 8 - 16 MB of
> off-chip cache used with EV6 processors: when you have EV7's 75 ns main
> memory latency and a large, fast on-chip cache, off-chip cache that's only
> about twice as fast as main memory ceases to be cost-effective) and hence
> offset the higher cost of the EV7 chip itself.
>

Ah... who cares about cost? Lower cost is simply a benefit that follows from the L3 being unnecessary.

For instance... taking the POWER4 and googling for this, we find:

The chart was a log-log plot of the size of cache/memory that can be accessed at various latencies. The latencies are in nanoseconds and are accurate only to the nearest 10 nsecs from 10-100 nsecs, and to the nearest 100 nsecs above that point. Also, some of my eyeballing is involved here, introducing more room for error. There are also several possible definitions of 'latency', and the slide didn't specify which one it used.

Size of cache    Latency
or memory        (nsecs)
-------------    -------
16 KB*                 1
1.4 MB                10
5.6 MB                60
128 MB                80
512 MB               100
64 GB                300

* I know the L1 D-cache size is 32 KB per core, but the slide indicated 16 KB.

The other slides were referring to a 1.1 GHz POWER4, so I assume that clock rate was also assumed for this chart. You can do the conversion from nsecs to cycles from there. Clearly these times (at least through the cache hierarchy) would be lower for a 1.3 GHz clock rate.

My interpretation of the sizes: The 1.4 MB is the on-chip L2. The 5.6 MB is for accessing the 3 other L2 caches on the same MCM (plus the local one). The 128 MB is all the L3s on the same MCM. The 512 MB includes the L3s on the other MCMs in a 4-MCM system.
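The nsecs-to-cycles conversion mentioned above is just latency (ns) times clock rate (GHz). A minimal Python sketch, assuming the 1.1 GHz clock guessed for the chart and taking the eyeballed latencies at face value (so the results inherit the same ±10 ns / ±100 ns error); the row labels are only this sketch's shorthand for the interpretation given above:

```python
# Minimal sketch: convert the eyeballed POWER4 chart latencies from
# nanoseconds to processor cycles. The 1.1 GHz clock is the assumption
# stated in the post; the latency values carry the chart's error bars.
CHART_LATENCIES_NS = [
    ("16 KB (L1, per slide)", 1),
    ("1.4 MB (on-chip L2)", 10),
    ("5.6 MB (L2s on same MCM)", 60),
    ("128 MB (L3s on same MCM)", 80),
    ("512 MB (L3s, 4-MCM system)", 100),
    ("64 GB (main memory)", 300),
]


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """A clock of f GHz runs f cycles per nanosecond, so cycles = ns * GHz."""
    return latency_ns * clock_ghz


def cycle_table(clock_ghz: float) -> list[tuple[str, int]]:
    """Round each chart latency to the nearest whole cycle."""
    return [(label, round(ns_to_cycles(ns, clock_ghz)))
            for label, ns in CHART_LATENCIES_NS]


if __name__ == "__main__":
    for label, cycles in cycle_table(1.1):
        print(f"{label:28s} ~{cycles:4d} cycles")
```

At 1.1 GHz the 300 ns main-memory figure works out to roughly 330 cycles and the 80 ns local-MCM L3 to roughly 88. Rerunning at 1.3 GHz with the same nanosecond values would overstate the cycle counts, since (as noted above) the cache latencies themselves would shrink at the higher clock.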
I assume the 64 GB of memory is the maximum for an MCM. I recollect from IBM public docs that latency to another MCM's memory is only about 10% higher than local memory latency.

So local-MCM access to the 128 MByte L3 is 80 ns. Contrast that with EV7: with 200 open 2 MByte pages (1) (and larger page sizes than that are supported; maybe make it fun and use 512 MByte pages?), EV7 gives you 400 MByte of "L3" that is faster and larger than POWER4's 128 MByte L3 shared by the cores.

There are many ways to slice and dice this. For instance, with 5 EV7 CPUs you have an effective L2 of 5 * 1.75 MByte, where the "remote" L2 is slower of course (15 ns router overhead, etc.) yet still faster than a 75 ns memory access. But maybe more impressive would be the 2000 MByte "L3" (5 * 400 MByte) with 75 ns local and 100 ns remote access, for an average access of 95 ns: roughly 4 times larger than POWER4's 512 MByte L3 and just as fast.

But the overarching point is... an L3 makes no sense for EV7, and the benefit is lower (a relative term) system cost. (It would certainly cost more if they stuffed a bunch of 128 MByte SRAM L3s in there ;-)

By the way, when POWER4 has to go to main memory, as you can see, you get a flat 300 ns access. That is wonderful in many respects, but *MUCH* better, of course, is NUMA, where memory access is 75 ns local and 283 ns *WORST* case (for 51 CPUs, and the average is much better than 300 ns) (2).

Rob

Men with walkie-talkie
I'm home again to you babe
Men with flashlights waving
You know it makes me wonder
Up upon the tower
Sittin' in the quiet slipstream
The clock reads daylight savin'
Rollin' in the thunder
  -- Neil Young

(1) http://www.eecs.umich.edu/vlsi_seminar/f01/slides/bannon.pdf, see page 12
(2) http://www.eecs.umich.edu/vlsi_seminar/f01/slides/bannon.pdf, see page 30
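The EV7 "L3" arithmetic in the reply is easy to reproduce. Below is a minimal Python sketch that takes the post's figures at face value (200 open 2 MByte pages per CPU, 75 ns local, 100 ns remote, 5 CPUs). It assumes that accesses are spread evenly across all five CPUs' memory, which is one plausible reading of how the 95 ns average was reached; the post does not spell this out.

```python
# Minimal sketch reproducing the EV7 "L3" arithmetic from the post.
# All inputs are the post's figures; the uniform-access assumption in
# average_latency_ns() is an assumption made here, not stated in the post.
OPEN_PAGES_PER_CPU = 200   # open 2 MByte pages per EV7, per the post
PAGE_MBYTE = 2
N_CPUS = 5
LOCAL_NS = 75              # local memory access
REMOTE_NS = 100            # remote memory access in the 5-CPU example
POWER4_SYSTEM_L3_MBYTE = 512


def open_page_mbyte(n_cpus: int, pages: int = OPEN_PAGES_PER_CPU,
                    page_mbyte: int = PAGE_MBYTE) -> int:
    """Total memory reachable through open pages across n_cpus EV7s."""
    return n_cpus * pages * page_mbyte


def average_latency_ns(n_cpus: int, local_ns: float, remote_ns: float) -> float:
    """Assumes accesses spread evenly over every CPU's memory:
    1 of the n_cpus memories is local, the other n_cpus - 1 are remote."""
    return (local_ns + (n_cpus - 1) * remote_ns) / n_cpus


if __name__ == "__main__":
    per_cpu = open_page_mbyte(1)
    total = open_page_mbyte(N_CPUS)
    avg = average_latency_ns(N_CPUS, LOCAL_NS, REMOTE_NS)
    print(f"per CPU: {per_cpu} MByte, {N_CPUS} CPUs: {total} MByte")
    print(f"average access: {avg:.0f} ns")
    print(f"size vs POWER4 512 MByte L3: {total / POWER4_SYSTEM_L3_MBYTE:.1f}x")
```

With these inputs the sketch gives 400 MByte per CPU, 2000 MByte for five CPUs, a 95 ns average ((75 + 4 * 100) / 5), and a size ratio of about 3.9, which the post rounds to 4.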