CPU Bugs

Computers are made by humans, and thus inherently prone to errors. This page describes known bugs for various models and brands.

x86 misfeatures

ESP is not cleared

The x86 IRET will not clear upper bits of the stack register (32:16) when returning to 16-bit mode. As the result, the kernel high 16bit of ESP may be leaked to the userspace. Same is true for 64-bit kernel to 16-bit userspace transition.

NULL selector load may not clear MSR_GS_BASE

Intel CPUs do not specify what happens with MSR_GS_BASE if NULL selector is loaded. The Intel CPUs seem to load it with zero, AMD CPUs preserve the previous values (now documented in the AMD64 Architecture Programmer's Manual Volume 2: System Programming). This detail needs to be taken into account for the context switches, if kernel tries to optimize the slow MSR operations.

FXSAVE/FNSAVE

The Intel and AMD differ in what context is saved/restored. AMD CPUs do not save/restore certain parts (FIP/FOP) only when exception is pending (see CVE-2006-1056)

SYSRET

The Intel CPUs do not handle properly the non-canonical return address. If a non-canonical address is present in RCX when executing SYSRET, a General Protection Fault will be taken in CPL0 with CPL3 registers. (see CVE-2006-0744)

SS selector

On AMD CPU, SS selector may become unusable when in-kernel interrupt arrives (sets SS to NULL) and thread is switched and returned to userspace via SYSRET. The numerical SS value is correct however the descriptor cache is wrong. This affects only the 32-bit compatibility mode usage of SS.

Nesting of NMI interrupt

If CPU is executing the NMI interrupt handler, CPU guarantees to keep NMI masked until the IRET is executed. However if for some reason NMI triggers some other exception, which executes IRET then the NMI may trigger again, possibly overwriting its own stack as on AMD64 it runs with IST stack (fun starts if SMI is triggering IRET for some reason).

Intel

Transactional Synchronization eXtensions (TSX) Bug

In August 2014, Intel announced that a bug exists in the TSX implementation on Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX feature on affected CPUs via a microcode update. The bug was fixed in F-0 steppings of the vPro-enabled Core M-5Y70 Broadwell CPU in November 2014.

Extended Page Table (EPT) Bug

A MOV to CR3 when EPT is enabled may lead to an unexpected page fault or an incorrect page translation.

Affected processors:

Intel Xeon E5-#### v2, where #### is a 4-digit number, optionally followed by a letter.
Intel Xeon E7-#### v2, where #### is a 4-digit number.
Intel Xeon E3-12## v2, where ## is a 2-digit number, optionally followed by a letter.

F00F Bug

Affects: Intel i586 series (Pentium 1, Pentium MMX, Pentium Overdrive, Pentium MMX Overdrive)

This bug is caused by executing LOCK CMPXCHG8B eax (F0 0F C7 C8) By containing two opcode errors, an unallowed lock and a non-memory target, together with trying to cache the results, it confuses the cpu to enter a deadlock state, locking up the entire computer involved.

To fix this bug, the IDT entry containing the invalid opcode should be marked as uncacheable or writethrough to eliminate one necessary factor, or by marking the same page as not-writable which further confuses the processor, this time into the pagefault handler instead of into a deadlock. If paging is to be left disabled, the only workaround is to disable the cpu's caches, which is far from efficient. Further discussion of various solutions is presented here.

We can check, if the processor is Pentium through the CPUID instruction. Calling it with EAX=1 will return the CPU signature in EAX. We can extract the Family Number from the CPU signature and compare it with 5, because the Pentium belongs to Family 5.

FDIV bug

The Pentium FDIV bug is a bug in the Intel P5 Pentium floating point unit (FPU). Because of the bug, the processor can return incorrect decimal results, an issue troublesome for the precise calculations needed in fields like math and science. Discovered by Professor Thomas R. Nicely at Lynchburg College, Intel attributed the error to missing entries in the lookup table used by the floating-point division circuitry.

This problem occurs only on some models of the original Pentium processor. Any Pentium family processor with a clock speed of at least 120 MHz is new enough not to have this bug.

Buggy HLT

Some of the first 100 MHz Intel DX chips had a buggy HLT state, prompting the developers of Linux to implement a "no-hlt" option for use when running on those chips, but this was fixed in later chips.

Core-microarchitecture Bugs

[1]

'Meltdown' Page Table Bug

Modern (1995 and upwards) Intel x86 chips contain a bug in the out-of-order execution hardware that allows unprivileged userland software to gain access to kernel memory when the kernel is mapped into the userland address space. To avoid vulnerability, it is recommended that the kernel and userland page tables remain separate (i.e: PTI, Page Table Isolation). For more details, visit this page.

AMD

DragonFly BSD Heavy Load Crash

AMD has confirmed that some of its processors contain a bug that could cause program errors under certain specific conditions. The bug was initially discovered by Matt Dillon, a DragonFly BSD developer.

Consecutive back-to-back pops and (near) return instructions can create a condition where the processor incorrectly updates the stack pointer. The specific manifestations in DragonFly were random segmentation faults under heavy load.

A program exception has been identified in previous generations of the AMD Opteron processor that occurs in certain environments that leverage a very specific GCC compiler build. A workaround has been identified for the small segment of customers this could potentially impact. Also, this marginal erratum impacts the previous four generations of AMD Opteron processors which include the AMD Opteron 2300, 8300 ("Barcelona" and "Shanghai",) 2400, 8400 ("Istanbul",) and 4100, 6100 ("Lisbon" and "Magny-Cours") series processors.

Ryzen Bug

AMD has confirmed that some of its processors contain a bug that could cause program errors under certain specific conditions when executing code near the canonical address boundary. Insert a guard page (unmapped 4K page, or larger page) before canonical address boundary.

CPUID Bugs

For older K5 CPUs, the feature flags returned by "CPUID 0x00000001" in EDX are dodgy - bit 9 is used to indicate support for PGE (and not used to indicate support for the local APIC).

Cyrix

Coma Bug

Affects: Cyrix 6x86 series

This bug is caused when several implicitly locked instructions are pipelined into an infinite loop. In effect when an instruction completes, the following locked instruction is executed directly afterward, maintaining bus lock and inhibiting interrupts. In an infinite loop, this will lock all interrupts on the processor, rendering it useless.

To fix this bug, one must write to the cyrix registers and set the NO-LOCK bit in CCR1, which disables all but the most essential bus locks. The downside of this is that read-modify-write atomicity is no longer guaranteed on multiprocessor systems. Source code that should prevent this condition: (untested)

MOV AL, 0xC1   ; 0xC1 refers to CCR1
OUT 0x22, AL   ; Select Register
IN 0x23, AL    ; Load Contents
OR AL, 0x10    ; Set No-Lock bit
MOV AH, AL     ;
MOV AL, 0xC1   ; 0xC1 refers to CCR1
OUT 0x22, AL   ; Select register
MOV AL, AH     ; Load new contents
OUT 0x23, AL   ; Write new CCR1 with No-Lock set