IA32 Architecture Family: Difference between revisions

Solar (talk | contribs)
→‎Advanced Micro Device Intel-compatible Processors: Removed "advertising" and "CPU history lore", see discussion.
Line 264: Line 264:


==Advanced Micro Device Intel-compatible Processors==
==Advanced Micro Device Intel-compatible Processors==
The biggest competitor to Intel at this time (2004 August). They came into being slightly after Cyrix with a 5k86 (being a 486 compatible similar to the 5x86, don't confuse them) and then followed it up by a K6 processor. This one was faster than the Pentiums, and more popular than the Cyrix ones because they both didn't rate it (afaik), and they didn't overheat (as was claimed, untrue, for the Cyrixes).
AMD came onto the scene slightly after Cyrix with the 5k86 (a 486-compatible chip similar to the 5x86; don't confuse the two), followed by the K6 processor.


The CPUID identifier string is "AuthenticAMD"
The CPUID identifier string is "AuthenticAMD"
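
As an illustration of how the identifier string is obtained: CPUID leaf 0 returns the twelve characters packed into EBX, EDX and ECX (in that order, least significant byte first). The following is a minimal sketch of the unpacking step only; the helper names `cpuid_vendor_string` and `is_authentic_amd` are made up for this example, and the register values are passed in as plain integers so the code does not depend on any particular compiler intrinsic.

```c
#include <stdint.h>
#include <string.h>

/* Assemble the 12-character vendor string from the register values that
 * CPUID leaf 0 returns. The bytes are stored in the order EBX, EDX, ECX,
 * least significant byte first. `out` must have room for 13 bytes. */
static void cpuid_vendor_string(uint32_t ebx, uint32_t edx, uint32_t ecx,
                                char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++) {
        for (int b = 0; b < 4; b++) {
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFF);
        }
    }
    out[12] = '\0';
}

/* Hypothetical helper: returns 1 if the registers spell "AuthenticAMD". */
static int is_authentic_amd(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    char vendor[13];
    cpuid_vendor_string(ebx, edx, ecx, vendor);
    return strcmp(vendor, "AuthenticAMD") == 0;
}
```

With GCC or Clang on an x86 target, the three register values can be fetched with `__get_cpuid(0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`; inside a kernel, a small inline-assembly wrapper around the `cpuid` instruction does the same job.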


It is important to note that the "SSE" used by AMD and the "SSE" used by Intel are actually not compatible, not fully at least (somebody verify this?). This causes lots of confusion.
It is important to note that the "SSE" used by AMD and the "SSE" used by Intel are actually not compatible, not fully at least. This causes lots of confusion. (Somebody verify this claim?)


{| {{wikitable}}
{| {{wikitable}}
Line 296: Line 296:
| {{No}}
| {{No}}
| {{No}}
| {{No}}
| AMD's 486 clone, 2x the cache size of most of Intel's 486 chips.
| AMD's 486 clone was a success because it was on par with Intel's processors while being a lot cheaper. Eventually, the higher-clocked Am5x86 became the fastest processor that would fit a 486-based motherboard. With its 4x multiplier and twice the cache size of most of Intel's 486 chips, it was able to outperform the low end of the Pentium line.
|-
|-
! K5
! K5
Line 308: Line 308:
| {{No}}
| {{No}}
| {{No}}
| {{No}}
| AMD's first try at a Pentium-compatible CPU. Not a huge success, mostly due to low clock speeds.
| AMD's first try at a Pentium-compatible CPU.
|-
|-
! K6
! K6
Line 320: Line 320:
| {{No}}
| {{No}}
| {{No}}
| {{No}}
| Actually designed by NexGen (who AMD took over), the K6 is a fully Pentium-compatible CPU. It outperformed the Pentium in several areas. One notable instruction was the LOOPcc instruction, which executed in 2 cycles compared to a Pentium's 18, causing problems.
| Actually designed by NexGen (taken over by AMD), the K6 is a fully Pentium-compatible CPU. One notable instruction was the LOOPcc instruction, which executed in 2 cycles compared to a Pentium's 18, causing timing problems.
|-
|-
! K6-2
! K6-2
Line 332: Line 332:
| {{No}}
| {{No}}
| {{No}}
| {{No}}
| AMD had a lesson learned there, don't make your processor too fast in some instructions. Not that they were put off by that, they just added 16 wait states to the execution of the LOOPcc and thus caused it to slow to the speed of a Pentium. AMD didn't just do this however. They added a special case (speculation, might be coincidence) for the DEC (E)CX; Jcc combination, which is semantically equivalent with the LOOPcc instruction, but this semantic equivalence and the loop being faster on Intels caused the loop instruction to always be used. Nobody used the DEC/Jcc combo. They kept the original speed for this combo and specified in their optimization manuals that this was the preferred method over the loopcc instruction.
| AMD added 16 wait states to the execution of LOOPcc, slowing it to the speed of a Pentium. They also added a special case (speculation, might be coincidence) for the DEC (E)CX; Jcc combination, which is semantically equivalent to the LOOPcc instruction. Since LOOPcc was faster on Intel processors, nobody used the DEC/Jcc combination there. AMD kept the original speed for this combination and specified in their optimization manuals that it was the preferred method over the LOOPcc instruction.


It also featured a new technology, the 3DNOW! technology, which was MMX using floating point numbers, and multiplexed (again) on the floating point registers. The K6-2 was quite popular, and scaled higher than the P1 ever did. It was largely compatible with the P2, but (afaik) not completely.
The K6-2 also featured 3DNow! technology, which was "MMX using floating point numbers" and was multiplexed (again) on the floating point registers. It was largely compatible with the P2, but (afaik) not completely.
|-
|-
! K6-3
! K6-3
Line 346: Line 346:
| {{No}}
| {{No}}
| {{No}}
| {{No}}
| They started this design off with the concept of not making it underpowered in any place, and to make it at least P2 compatible. It was fully P2 compatible.
| This design was fully P2 compatible.


The K6-3 suffered from a bottleneck at the instruction decode unit (which converts the x86 instructions to native instructions). While it did have 3 execution units of each type (ALU / MMX / loadstore), they were not used much at all since the instruction decode unit could not keep up.
The K6-3 was not too popular, mainly because the K6-2 did very well and people didn't see why they should buy a more expensive K6-3 at the same clock speed. This of course was a joke, same as it is to call a 2 GHz Opteron slower than a 2.2 GHz Celeron.

A little known fact about the K6-3 is that it is in fact an Athlon, minus a few instructions, and minus one very important piece. The K6-3 suffered from a bottleneck at the instruction decode unit (which converts the X86 instructions to native instructions). It could only handle 2 in a cycle, which it made during about 20-30% of the cycles for average software. For optimized software you could bring it to 100% easily, and still want another channel. This wasn't too weird, because it did have 3 execution units of each type (ALU / MMX / loadstore) which were not used much at all. Note that these units are units executing the native instructions, so making 3 of each is not a stupid idea. They needed a new front end, and of course a new copy of instructions from Intel.
|-
|-
! Athlon
! Athlon
Line 374: Line 372:
| {{No}}
| {{No}}
| {{No}}
| {{No}}
| Athlon XP (..starting with Palomino) introduced SSE, SMP capable chips were branded as Athlon MP.
| Athlon XP (starting with Palomino) introduced SSE. SMP capable chips were branded as Athlon MP.
|-
|-
! Athlon 64
! Athlon 64
Line 413: Line 411:
|-
|-
|}
|}
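
The timing problem noted in the K6 row can be illustrated with simple arithmetic: a delay loop built on LOOPcc that assumes a fixed cost per iteration finishes sooner when the instruction executes faster. The sketch below uses the 2-cycle and 18-cycle figures quoted in the table; the 200 MHz clock and the iteration count are made-up values, and `delay_loop_us` is a hypothetical helper, not something taken from any AMD or Intel manual.

```c
#include <stdint.h>

/* Duration in microseconds of a busy-wait loop that runs `iterations`
 * times, where each iteration costs `cycles_per_iteration` clock cycles
 * on a CPU clocked at `clock_mhz` (cycles per microsecond). */
static uint64_t delay_loop_us(uint64_t iterations,
                              uint32_t cycles_per_iteration,
                              uint32_t clock_mhz)
{
    return iterations * cycles_per_iteration / clock_mhz;
}
```

Under these assumptions, one million iterations take 90000 microseconds at 18 cycles per iteration but only 10000 microseconds at 2 cycles, so code that hard-codes the iteration count waits nine times less than intended.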

===Athlon (first try)===

The first models of the Athlon were distinct, they were the first time that a competitor to Intel actually had a faster processor, without Intel having a backup plan. It was poised against the PIII, which at that time was their top model and best-running one too. The Athlon beat them to the 1 GHz mark, and at that time the 1 GHz had become completely irrelevant. It just meant that they had a new size to mark their processors with. Intel missed the point here, and they did until very shortly ago. The GHz myth had been broken, the Athlon at 1.1 GHz was still faster than the PIII at 1.3 GHz, and people knew. They didn't go for a P3 if a faster athlon was available at a lower clock speed, and at a lower price.

===Athlon XP / MP / Duron (new style)===

AMD switched to a big offensive, trying to persuade the buyers to demand AMD CPU's instead of being OK with Intels. The new versions of these processors were all just a tad better than the previous one, could do a slight number of instructions more (the Athlons started with not even SSE1, and from model 6 (both Athlon and Duron) they supported it). The processors also advanced very slightly in each other direction, making each new type just a tad faster than the previous one. In the end of the GHz wars (past year, about) the fastest Athlon was running at 2.2 GHz, but outperformed the better half of the 3 GHz P4's.


===AMD64 based CPU's===
===AMD64 based CPU's===
This is slightly offtopic here, but still quite relevant, since these processors all support the entire IA32 family natively. AMD created a new processor, with 64-bit (actually 48-bit, but who notices those 16 bits?) memory addressing and 64-bit calculations, being very compatible with the old style CPU's. So compatible, that the core for 32-bit and 64-bit is essentially equal, aside from the size of calculations and the support of a few encodings that were in effect redundant. They removed a few 1-byte opcodes (about 20 in total, including all 1-byte INC and 1-byte DEC instructions) to make place for a new REX prefix. They modified it to use 16 registers instead of 8, added a load of new names, got the old software working, and optimized the 32-bit performance to unprecedented levels. These CPU's outperform the P4 at any clock speed, in almost (1/20 programs not) any calculation-intensive program.
These processors all support the entire IA32 family natively. AMD created a new processor with 48-bit memory addressing and 64-bit calculations that is very compatible with the old-style CPUs: so compatible that the core for 32-bit and 64-bit is essentially identical, aside from the size of calculations and the support of a few encodings that were effectively redundant. They removed a few 1-byte opcodes (about 20 in total, including all 1-byte INC and 1-byte DEC instructions) to make room for a new REX prefix. They modified the core to use 16 registers instead of 8, added a load of new register names, got the old software working, and optimized the 32-bit performance.
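
To make the REX prefix and the 48-bit addressing more concrete, here is a minimal sketch of how a disassembler or emulator might treat them. In 64-bit mode the byte values 0x40 to 0x4F, which encode the one-byte INC and DEC in 32-bit code, are read as a REX prefix with the bit layout 0100WRXB. The struct and function names are invented for this example, and the canonical-address check assumes the 48-bit virtual address width mentioned above.

```c
#include <stdint.h>

/* Fields of a REX prefix byte (bit layout 0100WRXB). */
struct rex_fields {
    unsigned w; /* 1 = 64-bit operand size */
    unsigned r; /* extends the ModRM reg field */
    unsigned x; /* extends the SIB index field */
    unsigned b; /* extends ModRM r/m, SIB base, or the opcode reg field */
};

/* In 64-bit mode, 0x40..0x4F are REX prefixes. In 32-bit mode the same
 * bytes are the one-byte INC (0x40..0x47) and DEC (0x48..0x4F). */
static int is_rex_prefix(uint8_t byte)
{
    return (byte & 0xF0) == 0x40;
}

/* Split a REX byte into its four flag bits. */
static struct rex_fields decode_rex(uint8_t byte)
{
    struct rex_fields f;
    f.w = (byte >> 3) & 1;
    f.r = (byte >> 2) & 1;
    f.x = (byte >> 1) & 1;
    f.b = byte & 1;
    return f;
}

/* Combine a 3-bit register field with its REX extension bit, giving a
 * register number in the range 0..15 (hence 16 registers instead of 8). */
static unsigned extend_reg(unsigned ext_bit, unsigned field3)
{
    return ((ext_bit & 1) << 3) | (field3 & 7);
}

/* Assuming 48-bit virtual addresses: an address is canonical when
 * bits 63..47 are all zeros or all ones. */
static int is_canonical_48(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 uppermost bits */
    return top == 0 || top == 0x1FFFF;
}
```

For example, the prefix byte 0x4D decodes to W=1, R=1, X=0, B=1, and `extend_reg(1, 5)` yields register number 13.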


== Other CPU vendors making similar chips==
== Other CPU vendors making similar chips==