Historical Notes on CISC and RISC

(NB: I originally wrote this for some of the other students in an assembly language class I took in Fall 2006. While it is not about OS-dev per se, it may help clarify some of the puzzling aspects of assembly language for some members. - User:schol-r-lea)

It is somewhat ironic that many assembly language instructors today have found it easier to teach assembly programming on a RISC architecture such as MIPS or ARM than on the ubiquitous but far more complex x86 PC system: historically, it was the CISC (Complex Instruction Set Computer) designs that were associated with extensive assembly language programming, while the RISC (Reduced Instruction Set Computer) designs were explicitly intended to make compiling code from a high-level language easier, with the expectation that assembly language would rarely be used outside of the operating system itself.

To understand why and how this reversal came about, you need to know a bit of history. When stored-program computers based on the Von Neumann architecture were first being developed in the late 1940s and early 1950s, the size and complexity of the hardware (which was much more diverse than today's, using such disparate technologies as mercury delay lines, CRT-phosphor memories, magnetic drums, wire and tape recorders, ferromagnetic cores, and the ever-present vacuum tubes) meant that capacities were extremely limited; it was not uncommon for even a large machine to have only one or two registers and a main memory of a few thousand words (which varied in size from machine to machine). The instruction sets were equally constrained, and often had special-purpose instructions for connecting to the peripherals, taking up the already small instruction space.

For example, the 18-bit Lincoln Labs TX-0 (one of the first transistor-based systems designed) had a grand total of four primary instructions (though one of these was a kind of escape code that told it to use the following word as an instruction from a different group of special-purpose operations) and two registers, X and Y. Many of these designs had what would today be considered poorly developed instruction sets, full of special cases, lacking useful instructions (see footnote), and including instructions that were of little use.

Generally speaking, these early designs had only a single special-purpose register, called the accumulator, on which the machine could perform most of the common operations such as addition or subtraction; the other operand, if any, was usually either in main memory or in another special-purpose register called the source, and the result would remain in the accumulator (hence the name).
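
To make the accumulator model concrete, here is a rough sketch in C; the operation names and the memory size are invented for illustration, and no particular historical machine is being modelled:

 #include <stdio.h>
 
 /* A rough model of a single-accumulator machine.  One operand is
    implicitly the accumulator, the other comes from main memory, and
    the result stays in the accumulator until it is explicitly stored. */
 static int accumulator;
 static int memory[1024];
 
 void op_load (int addr) { accumulator  = memory[addr]; }  /* AC <- mem[addr] */
 void op_add  (int addr) { accumulator += memory[addr]; }  /* AC  + mem[addr] */
 void op_sub  (int addr) { accumulator -= memory[addr]; }  /* AC  - mem[addr] */
 void op_store(int addr) { memory[addr] = accumulator;  }  /* mem[addr] <- AC */
 
 int main(void)
 {
     memory[0] = 30;
     memory[1] = 12;
     op_load(0);                  /* AC = 30        */
     op_add(1);                   /* AC = 42        */
     op_store(2);                 /* memory[2] = 42 */
     printf("%d\n", memory[2]);
     return 0;
 }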

As the designers began to develop more effective architectural principles based on the successes and failures of earlier efforts, hardware became smaller and less expensive, which in turn allowed greater freedom in CPU design, and the complexity of the systems grew accordingly; by 1966, the PDP-6, a typical design of the time, had sixteen 36-bit general-purpose registers, plus a stack register and a frame register. The instruction sets became more complex as well: since a significant percentage of programming even into the 1970s was done in assembly language, it was thought that the instruction sets should be 'rich', that is, they should have a special instruction for nearly every common operation the programmer would want to use.

This culminated in the 32-bit VAX-11 series of minicomputers and mainframes, which had 256 primary instructions, including ones for polynomial evaluation, trigonometric functions, and CRC calculation. Many of these instructions had multiple addressing modes, so that they could operate on registers, on main memory, or on some combination of the two; while this increased the convenience of assembly programming, it also meant there were a number of special cases the programmer had to be aware of, and it complicated the CPU design substantially.

To provide these instructions, designers began to use 'microcode': in effect, a set of firmware-encoded macro-instructions interpreted by the CPU itself. Each complex instruction would be broken down internally into simpler steps, which the programmer never needed to be aware of, and executed as if it were a fixed part of the hardware. Such instructions would often take several system clock cycles to run, and in most processors there was no pipelining in the modern sense - the system had to wait until each instruction was finished before it began processing the next one.
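
As a rough illustration of the idea - not the microcode of any real processor - a memory-to-memory add might be expanded internally into a short sequence of simpler steps, something like the following C sketch (all the names here are invented):

 #include <stdint.h>
 #include <stdio.h>
 
 /* Invented micro-operations that a memory-to-memory "ADD dst, src"
    instruction might be expanded into. */
 enum micro_op { UOP_LOAD_A, UOP_LOAD_B, UOP_ADD, UOP_STORE };
 
 static uint16_t memory[256];     /* toy main memory                     */
 static uint16_t temp_a, temp_b;  /* hidden internal registers, not      */
                                  /* visible to the assembly programmer  */
 
 /* Execute one "macro" instruction by stepping through its micro-program.
    Each micro-op would cost at least one clock cycle on a real machine,
    which is why such instructions took several cycles to complete. */
 static void exec_add_mem_mem(uint8_t dst, uint8_t src)
 {
     static const enum micro_op uprogram[] = {
         UOP_LOAD_A, UOP_LOAD_B, UOP_ADD, UOP_STORE
     };
     for (int i = 0; i < 4; i++) {
         switch (uprogram[i]) {
         case UOP_LOAD_A: temp_a = memory[dst];                  break;
         case UOP_LOAD_B: temp_b = memory[src];                  break;
         case UOP_ADD:    temp_a = (uint16_t)(temp_a + temp_b);  break;
         case UOP_STORE:  memory[dst] = temp_a;                  break;
         }
     }
 }
 
 int main(void)
 {
     memory[10] = 40;
     memory[11] = 2;
     exec_add_mem_mem(10, 11);              /* "ADD [10], [11]" */
     printf("%u\n", (unsigned)memory[10]);  /* prints 42        */
     return 0;
 }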

About the time the VAX was being designed, four other things were happening that would change this attitude. First, high-level languages were becoming the primary method of programming, meaning that the baroque instruction sets of the then-current CPUs were becoming unnecessary.

Second, assembly language programmers were noting that in the majority of programs, whether in assembly or in a compiled language, only a handful of instructions were being used: few programs needed a CRC instruction, and hardly any programmers were familiar enough with the entire VAX instruction set to know that the instruction existed and how to use it. A side aspect of this is that the more complex instruction sets were increasingly difficult to learn and to teach.

Third, computer hardware design had advanced to the point where graduate courses in CPU design were being offered; such courses naturally enough stuck to very simple and regular designs, which the students and researchers found actually outperformed comparable complex systems (though professionally-constructed CPUs often used techniques which were patented or trade secrets, skewing the results when comparing the student designs to the commercial ones). They especially noted that operations which worked directly on memory were generally slower than if one instead loaded the values into registers, performed several operations on those registers, and then stored the results back into memory.
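
The following sketch shows the contrast in C rather than in any particular assembly language; the variable names are invented for the example, and the access counts assume each named global really is fetched or written in main memory every time (a modern optimizing compiler would, of course, generate the same code for both functions):

 #include <stdio.h>
 
 int a, b, c, d, sum;
 
 /* Memory-operand style: the running total is re-read and re-written in
    memory at every step -- roughly 11 memory accesses in all. */
 void sum_in_memory(void)
 {
     sum  = a;   /* read a,            write sum */
     sum += b;   /* read sum, read b,  write sum */
     sum += c;   /* read sum, read c,  write sum */
     sum += d;   /* read sum, read d,  write sum */
 }
 
 /* Load/store style: fetch each operand once, do the arithmetic in a
    register, store the result once -- roughly 5 memory accesses. */
 void sum_in_register(void)
 {
     register int r = a;  /* read a    */
     r += b;              /* read b    */
     r += c;              /* read c    */
     r += d;              /* read d    */
     sum = r;             /* write sum */
 }
 
 int main(void)
 {
     a = 1; b = 2; c = 3; d = 4;
     sum_in_memory();   printf("%d\n", sum);   /* prints 10 */
     sum_in_register(); printf("%d\n", sum);   /* prints 10 */
     return 0;
 }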

Finally, the advent of single chip microprocessors meant that it would soon be possible to mass-produce inexpensive computers for a fraction of the cost of minis and mainframes. Early on, these were limited by the capabilities of the hardware in much the same ways the first generation of CPUs were (and some of the design mistakes of the first generation were repeated as well), but the technology soon grew beyond anyone's expectations, and in some ways, beyond the abilities of the designers to predict what they would later need to support.

This last part had some major repercussions, especially regarding some of the design decisions made as microprocessors became more powerful. Chief amongst these was the decision by Intel to try to extend the 8-bit 8080 design into a new 16-bit design, the 8086; they bent over backwards to make the CPU as familiar as possible to programmers of the older chip, with the result that the design ended up with some unnecessary complications and limitations, especially in how it addressed larger sections of memory. Also, because the designers had given the system only a small number of registers, most of the operations were given several complicated addressing modes to make up for the shortage of registers.
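
As a concrete example of the addressing complications: in what later came to be called 'real mode', the 8086 forms a 20-bit physical address from a 16-bit segment and a 16-bit offset, so many different segment:offset pairs name the same byte. A small C sketch of the calculation (the function name is mine, not Intel's):

 #include <stdint.h>
 #include <stdio.h>
 
 /* 8086 real-mode address calculation: physical = segment * 16 + offset,
    wrapping at 1 MiB as on the original chip (which had no A20 line). */
 static uint32_t phys(uint16_t segment, uint16_t offset)
 {
     return (((uint32_t)segment << 4) + offset) & 0xFFFFF;
 }
 
 int main(void)
 {
     /* Two different segment:offset pairs naming the same physical byte. */
     printf("%05X\n", (unsigned)phys(0x1234, 0x0005));  /* prints 12345 */
     printf("%05X\n", (unsigned)phys(0x1230, 0x0045));  /* prints 12345 */
     return 0;
 }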

The fact that this processor was selected for the IBM PC, which would soon become the dominant platform, meant that these weaknesses were of critical importance to millions of users. This was further exacerbated when Intel extended the design still further with the 80286, which needed a separate 'protected mode' to access its full abilities while retaining 'real mode' for backwards compatibility. Some of the design flaws were resolved in the next design, the 80386, but at the cost of greatly increased complexity, both in the chip itself and in assembly programming for it.

Meanwhile, other chip designers were heading in different directions. One, Motorola, started over with a classic 'big' design, the 68000, which in many ways resembled a scaled-down VAX, with 16 general-purpose registers and a complicated instruction set. This would become the CPU for several successful workstations, as well as for the original Apple Macintosh line. Later design extensions would complicate it, though never to the extent seen in the 80x86 line.

Still, it was becoming clear that the complex instruction sets were counter-productive. Thus, many chip designers decided to go in the opposite direction: minimal instruction sets, no microcoding, load/store architectures with few if any operations working directly on memory, large register sets which could be used to avoid accessing main memory whenever possible, and an emphasis on supporting high-level languages rather than assembly programming. This new approach, which was called RISC (Reduced Instruction Set Computer), would be the basis of several new CPU designs, including the MIPS, the SPARC, the ARM, the PowerPC, and the Alpha. Of these, all but the Alpha remain in use today in certain specialized areas, and the ARM in particular has become the de facto standard for mobile computing, though for the most part the dominance of the Windows-x86 combination has kept them out of the market for home and business systems.

1) In principle, a single operation, 'subtract two memory values and branch if the result is negative' (or one of several variants on this), is sufficient to allow a Random Access Machine to perform all Turing-computable calculations. There are even (simulated) machines designed on this principle, such as the OISC. In practice, of course, such a system would be both tedious and wasteful, especially for the more commonly used operations such as integer arithmetic.
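
For the curious, here is a minimal sketch of such a machine in C. It follows the common SUBLEQ formulation ('subtract and branch if the result is less than or equal to zero'), which differs slightly from the 'branch if negative' wording above, and the tiny sample program is invented for the example rather than taken from any real OISC:

 #include <stdio.h>
 
 /* A minimal SUBLEQ machine.  Each instruction is three memory words
    a, b, c meaning:
        mem[b] -= mem[a];  if (mem[b] <= 0) goto c;
    A negative branch target halts the machine.  The memory size, halting
    convention and sample program are all chosen for illustration. */
 
 #define MEM_SIZE 64
 
 int main(void)
 {
     /* Sample program: compute 3 + 4.  It subtracts both values from a
        zero-initialised scratch cell (leaving -(3+4)), subtracts that
        scratch cell from the result cell (leaving +7), then halts.
        Data: mem[12] = 3, mem[13] = 4, mem[14] = scratch, mem[15] = result,
        mem[16] = always zero (used for the unconditional halt). */
     int mem[MEM_SIZE] = {
         12, 14,  3,     /* 0: mem[14] -= mem[12]                        */
         13, 14,  6,     /* 3: mem[14] -= mem[13]                        */
         14, 15,  9,     /* 6: mem[15] -= mem[14]  (result is now +7)    */
         16, 16, -1,     /* 9: mem[16] -= mem[16], branch to -1: halt    */
     };
     mem[12] = 3;
     mem[13] = 4;
 
     int pc = 0;                                /* program counter */
     while (pc >= 0 && pc + 2 < MEM_SIZE) {
         int a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
         mem[b] -= mem[a];
         pc = (mem[b] <= 0) ? c : pc + 3;
     }
 
     printf("3 + 4 = %d\n", mem[15]);           /* prints 3 + 4 = 7 */
     return 0;
 }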