From: Rob Young [young_r@encompasserve.org]
Sent: Tuesday, September 24, 2002 5:38 PM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: VMS future (oh not not another one of these)

In article <oC2k9.250930$z91.10889890@bin3.nnrp.aus1.giganews.com>, "Bill Todd" <billtodd@metrocast.net> writes:

> 
> What EV7 brings to *any* system is its on-chip memory controller and its
> 1.75 MB of on-chip cache.  What EV7 brings to *any MP* system is its on-chip
> routing.  Both reduce surrounding 'glue' chips (including the 8 - 16 MB of
> off-chip cache used with EV6 processors:  when you have EV7's 75 ns main
> memory latency and a large, fast on-chip cache, off-chip cache that's only
> about twice as fast as main memory ceases to be cost-effective) and hence
> offset the higher cost of the EV7 chip itself.
>

	Ah... who cares about cost?  The lower cost is a side benefit of the
	L3 being unnecessary.  For instance, taking the Power4 and googling
	for its latencies, we find:

<QUOTE>

The chart was log-log of size of cache/memory that can be
accessed at various latencies.  The latencies are in nanoseconds
and are accurate only to the nearest 10 nsecs from 10-100 nsecs.
and to the nearest 100 nsecs above that point.  Also some of my
eyeballing is involved here, introducing more room for error. 
There are also several definitions possible for 'latency' and it
wasn't specified on the slide.

Size of cache    Latency in   
 or memory        nsecs
16 KB*             1
1.4 MB             10              
5.6 MB             60
128 MB             80
512 MB             100
64  GB             300

* I know the L1 Dcache size is 32 KB per core, but the slide
indicated 16 KB

The other slides were referring to a 1.1 Ghz POWER4 so I assume
that is what was assumed for this chart.  You can do the
conversion from nsecs. to cycles then.  Clearly these times (at
least through the cache hierarchy would reduce if for a 1.3Ghz
clock rate.

My interpretation of the sizes:
The 1.4 MB is for on-chip L2.  The 5.6 MB is for accessing the 3
other caches on the same MCM (plus the local one).  The 128 MB is
for all L3's on the same MCM.  The 512 MB includes the L3's on
other MCMs in a 4 MCM system.  I assume the 64 GB of memory is
the maximum for an MCM.  I recollect from IBM public docs that
latency to the other MCM's memory is only about 10% higher than
the local memory latency.

<ENDQUOTE>
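
	The nsecs-to-cycles conversion mentioned in the quote can be sketched
	roughly as below.  The table values are the eyeballed ones from the
	quote, and the 1.1 GHz clock is the quoted poster's assumption.  The
	function names are made up for illustration, and rescale_ns() assumes
	a latency stays a fixed number of cycles when the clock changes, which
	would only hold (if at all) for the on-chip part of the hierarchy.

```python
# Latency table as transcribed in the quoted post (approximate, eyeballed values).
POWER4_LATENCY_NS = [
    ("16 KB", 1),
    ("1.4 MB", 10),
    ("5.6 MB", 60),
    ("128 MB", 80),
    ("512 MB", 100),
    ("64 GB", 300),
]


def ns_to_cycles(latency_ns, clock_ghz):
    """1 GHz is one cycle per nanosecond, so cycles = ns * GHz."""
    return latency_ns * clock_ghz


def latency_table_in_cycles(clock_ghz):
    """Return (size, rounded cycle count) pairs for the table above."""
    return [(size, round(ns_to_cycles(ns, clock_ghz)))
            for size, ns in POWER4_LATENCY_NS]


def rescale_ns(latency_ns, old_ghz, new_ghz):
    """ASSUMPTION: the latency is a fixed cycle count, so ns scale with 1/clock."""
    return latency_ns * old_ghz / new_ghz


if __name__ == "__main__":
    for size, cycles in latency_table_in_cycles(1.1):
        print(f"{size:>7}  about {cycles} cycles at 1.1 GHz")
```

	Since the source figures are only good to the nearest 10 nsecs (or
	100 nsecs above 100), the cycle counts are no more precise than that.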

	So local MCM access to the 128 MByte L3 is 80 ns.  Contrast that with
	EV7: with 200 open pages of 2 MByte each (1) (larger page sizes than
	that are supported, so maybe make it fun and use 512 MByte pages?) you
	get 400 MByte of "L3" at 75 ns that is both faster and larger than the
	128 MByte L3 shared by the Power4 cores.

	There are many ways to slice and dice this.  For instance, with 5 EV7
	CPUs you have an effective L2 of 5 * 1.75 MByte, where the "remote" L2
	is slower of course (15 ns router overhead, etc.) yet still faster
	than a 75 ns memory access.  But maybe more impressive would be the
	2000 MByte "L3" (5 * 400 MByte) that is 75 ns local and 100 ns remote
	access, for an average access of 95 ns: about 4 times larger than
	Power4's 512 MByte L3 and just as fast.
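
	For anyone checking the arithmetic, here is a rough sketch.  The
	constants are the figures used in this post.  The function names are
	hypothetical, and the average assumes accesses are spread evenly over
	the memory of all five CPUs, with every remote access costing a flat
	100 ns (the real number depends on hop count).

```python
PAGE_MB = 2          # open page size used in the post
OPEN_PAGES = 200     # open pages per EV7, per the post
LOCAL_NS = 75        # local memory latency
REMOTE_NS = 100      # ASSUMPTION: flat latency for any remote CPU's memory
POWER4_L3_MB = 512   # Power4 L3 across 4 MCMs, from the quoted table


def open_page_capacity_mb(cpus, open_pages=OPEN_PAGES, page_mb=PAGE_MB):
    """Total memory sitting in open pages across `cpus` processors."""
    return cpus * open_pages * page_mb


def average_latency_ns(cpus, local_ns=LOCAL_NS, remote_ns=REMOTE_NS):
    """Mean latency when 1/cpus of accesses are local and the rest remote."""
    return (local_ns + (cpus - 1) * remote_ns) / cpus


if __name__ == "__main__":
    capacity = open_page_capacity_mb(5)
    print(capacity, "MByte in open pages")
    print(average_latency_ns(5), "ns average")
    print(round(capacity / POWER4_L3_MB, 2), "times the Power4 L3")
```

	The ratio works out to a bit under 4 (2000 / 512), hence "about 4
	times".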

	But the overarching point is that L3 makes no sense for EV7, and
	the benefit is cheaper (a relative term) systems (they would certainly
	cost more if they stuffed a bunch of 128 MByte SRAM L3s in there ;-).

	By the way, when Power4 has to go to main memory, as you can see,
	you have a flat 300 ns access.  That is wonderful in many respects, but
	*MUCH* better of course is NUMA, where memory access is 75 ns local and
	283 ns *WORST* case (for 51 CPUs, and the average is much better than
	300 ns) (2).

				Rob

Men with walkie-talkie                  I'm home again to you babe
Men with flashlights waving             You know it makes me wonder
Up upon the tower                       Sittin' in the quiet slipstream
The clock reads daylight savin'         Rollin' in the thunder

                                -- Neil Young

(1) http://www.eecs.umich.edu/vlsi_seminar/f01/slides/bannon.pdf
    See page 12

(2) http://www.eecs.umich.edu/vlsi_seminar/f01/slides/bannon.pdf
    See page 30