From: peter@abbnm.com
Sent: Thursday, January 10, 2002 12:23 AM
To: Info-VAX@Mvb.Saic.Com
Subject: Re: Compaq still tries to spin Alphacide both ways

Let's try this again. I had an attack of the stupids in the last version
of this article, and mixed up horizontal and vertical microcode in
several places in it. Pardon the confusion (and if I still have them
wrong, feel free to send me nasty email). I also took the opportunity to
expand a little near the end.

In article , Felger Carbon wrote:
> Well, it's been a while since the RISC-CISC wars were fought in this NG.
> What I'm asking is, "Has there _ever_ been a true CISC x86 processor?

They all are. CISC doesn't describe the implementation, it describes the
instruction set.

A RISC instruction set is very close to what used to be known as
"vertical microcode", where each micro-op performed one simple operation
very quickly. The alternative was "horizontal microcode", where each
microcode word had many fields to operate on every part of the processor
simultaneously. For small processors, and they were all small in the
'80s by our standards, horizontal microcode was a significant win, since
each microinstruction took about the same amount of time whether it was
a vertical micro-op or a horizontal one. Once pipelining was developed,
and even more with superscalar designs, vertical microcode became the
thing to do, because the fewer side effects a micro-op had, the more
easily it could be run in parallel with other micro-ops.

Part of the background for RISC was the realization that a vertical
microcode micro-op wasn't that far removed from the individual
operations of simpler microprocessors... and people didn't have a lot of
trouble writing code for them, so why not just use them directly? So you
had a lot of fiddling with the design, with different kinds of RISC
designs being floated, from the extremely raw MIPS designs (they didn't
even have hardware interlocks on the pipeline, so you had to put delays
in after branches to let the pipeline refill: the branch didn't actually
occur until a couple of instructions after the branch opcode) to the
complexities of the SPARC. Things have tended towards the MIPS end of
the spectrum, but they've had to add interlocks... what happens when the
pipeline's longer in the second generation?

Meanwhile, some people decided to try and see if horizontal microcode
could also be used directly. This is where VLIW and EPIC and the IA64
come from. The problem is, writing horizontal microcode is tough. And it
still makes superscalar implementations hard: Intel has switched to a
vertical microcode (what they call their RISC core) in the x86 to get
performance up.

So why did they think they could win with an explicit horizontal design?
Well, the idea was that you could get the compiler to handle all the
scheduling decisions and dump a stream of pre-decoded horizontal
micro-ops that could just be fed into the CPU. You wouldn't need a
superscalar design, you'd just make the instruction word wide enough
that all the potential parallelism was already handled. Sort of like
feeding an athlete on pure predigested proteins and energy drinks
instead of beef and beans. There are two main issues with this:

1. You need a heck of a compiler.

2. What happens when the second generation unit comes out?
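To make the vertical/horizontal distinction concrete before going on,
here's a minimal sketch in C. The field names, widths, and functional
units are all made up for illustration; they're not taken from any real
machine:

    #include <stdint.h>
    #include <stdio.h>

    /* A vertical micro-op: one narrow word, one simple operation.
       This is the style a RISC instruction set resembles. */
    typedef struct {
        uint8_t opcode;     /* the single operation to perform */
        uint8_t dst;        /* destination register            */
        uint8_t src1;       /* first source register           */
        uint8_t src2;       /* second source register          */
    } vertical_uop;

    /* A horizontal microcode word: one wide word with a field for
       every functional unit, all acting in the same cycle. */
    typedef struct {
        unsigned alu_op     : 4;  /* what the ALU does this cycle       */
        unsigned alu_src_a  : 3;  /* register port feeding ALU input A  */
        unsigned alu_src_b  : 3;  /* register port feeding ALU input B  */
        unsigned shift_op   : 2;  /* what the shifter does this cycle   */
        unsigned mem_read   : 1;  /* drive a memory read?               */
        unsigned mem_write  : 1;  /* drive a memory write?              */
        unsigned reg_write  : 1;  /* latch the result back?             */
        unsigned next_uaddr : 8;  /* where in the microstore to go next */
    } horizontal_word;

    int main(void) {
        printf("vertical: %u bytes, horizontal: %u bytes\n",
               (unsigned)sizeof(vertical_uop),
               (unsigned)sizeof(horizontal_word));
        return 0;
    }

Every field in the horizontal word is wired to a specific piece of one
particular implementation, which is exactly why the second-generation
problem bites: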
With horizontal microcode, there's no reason two versions of a CPU have
to have the same fundamental units, and if you change the
implementation, changing the rules on what operations happen when, how
long they take, even how many subunits there are, well, only the
microcode programmers see it. You can't DO that when the horizontal word
is the instruction set: it's like the MIPS situation, except now you
have to add interlocks all over the place, run code in an emulator, or
have a hardware mode that does some kind of interpretation on the old
instructions so they can feed the new hardware.

So what do you do? You compromise. You abstract the underlying hardware
somewhat and put a little bit of interpretation in. There are still
timing issues, so old code may run slower than a dog with no legs, but
at least the instruction set doesn't change every time you tweak the
design. You probably want to recompile everything for every new CPU, but
you don't *have* to. That makes upgrading to a new machine possible
without cross-compiling. Always good.

But you still need a hell of a compiler, and you've also thrown away
most of the potential advantages of the VLIW model, since you still have
to have an instruction processor. And looking at the IA64 they've
compromised an awful lot: the EPIC instruction now looks like a bunch of
RISC instructions crammed asymmetrically into a single word. They think
they can get enough of a win from compiler technology that it'll end up
much faster than a traditional RISC design. But it's telling that people
are seriously talking about the EV8 team putting an Alpha-style
RISC/vertical-microcode engine under the hood of the great-grandson of
Itanium.

> If so, why are current x86 processors not considered to be CISCs,

Current x86 processors *are* CISCs. They use a microcode that's closer
to the kinds of microcodes RISC instruction sets were derived from now,
but that's an implementation detail. What goes on under the instruction
set has only ever been part of how you classify the instruction set in
marketing literature.

However, it still means that RISC design (that is, something like the
internal microcode your Pentium IV's 'RISC CORE' runs) has proven itself
the best design to build a high performance CPU. Everything between the
Pentium IV's 'RISC CORE' and the compiler has two effects:

1. It adds hardware that has to be designed, taped out, tested,
   verified, and so on. It adds hardware that reduces yield, increases
   costs, and takes resources away from hardware that actually makes
   the secret high performance instruction set run fast.

2. It makes it harder for the compiler to schedule instructions,
   because the code it generates doesn't actually get executed by the
   processor. About all it can do is pick instructions that run fast,
   and let the dynamic instruction scheduler do the work.

In addition:

3. The 'RISC CORE' has to be better at scheduling micro-ops than
   regular RISC processors, which means that it's got to be more
   complex, and thus slower, than one that can use a simpler scheduler.
   So to get equivalent performance you have to spend more resources on
   design and fabrication.

> aside from the fact that there are a lot of former Taliban - excuse
> me, former RISC supporters - out there who don't like the fact that
> they wound up on the wrong side, as their worthless stock options
> conclusively prove?"

How do you figure it's the wrong side?

> This is comp.arch. Isn't it true that a micro's instruction set
> architecture, and not its implementation, defines the micro?

Absolutely.
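To put that split in concrete terms, here's a toy sketch of the kind of
translation the 'RISC CORE' discussion above is describing: one CISC
instruction cracked into simple micro-ops. The micro-op names and
encodings are invented for the example; they're not Intel's:

    #include <stdio.h>

    /* Hypothetical micro-ops an x86-style decoder might emit. */
    typedef enum { UOP_LOAD, UOP_ADD, UOP_STORE } uop_kind;

    typedef struct {
        uop_kind kind;
        int dst, src1, src2;   /* register numbers (or a temp) */
    } uop;

    /* Crack "add [mem], reg" -- one CISC instruction that reads
       memory, adds, and writes memory -- into three simple
       micro-ops through an internal temp register. */
    static int crack_add_mem_reg(int addr_reg, int src_reg, uop out[3]) {
        const int tmp = 100;                             /* internal temp */
        out[0] = (uop){ UOP_LOAD,  tmp, addr_reg, 0 };   /* tmp <- [addr] */
        out[1] = (uop){ UOP_ADD,   tmp, tmp, src_reg };  /* tmp += reg    */
        out[2] = (uop){ UOP_STORE, 0, addr_reg, tmp };   /* [addr] <- tmp */
        return 3;
    }

    int main(void) {
        uop u[3];
        int n = crack_add_mem_reg(5, 2, u);
        for (int i = 0; i < n; i++)
            printf("uop %d: kind=%d dst=%d src1=%d src2=%d\n",
                   i, u[i].kind, u[i].dst, u[i].src1, u[i].src2);
        return 0;
    }

The instruction set the programmer sees stays CISC no matter what the
decoder turns it into, which is the whole point: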
The x86 is a CISC, and its performance is due to some truly brilliant
implementations and amazing heroics, and the fact that Intel has been
able to throw orders of magnitude more engineering into making it run
fast than everyone else put together. It's telling that Digital/Compaq,
using a fraction of Intel's resources, has been able to stay ahead of
them... usually significantly so... in CPU performance for almost all
of the last decade.

I would love to see what Intel could do with a good design in their
pocket. If Intel had, say, adopted something like the Alpha in 1995
instead of striking out on their traditional path of trying to build a
complex design that'll knock everyone's socks off (iAPX432), at least
once the compiler technology comes together (i860, IA64), we'd already
be using them for all our high-performance systems.

-- 
 `-_-'   In hoc signo hack, Peter da Silva.
  'U`    "A well-rounded geek should be able to geek about anything."
             -- nicolai@esperi.org
Disclaimer: WWFD?