CPU Bugs
 
The x86 IRET instruction does not clear the upper 16 bits of the stack register when returning to a 16-bit stack segment. As a result, the high 16 bits of the kernel's ESP may be leaked to userspace. The same is true for the 64-bit kernel to 16-bit userspace transition.
 
==== Mitigations ====
One mitigation would be to simply not allow 16-bit code in user mode. Linux, however, works around the issue as follows: In 32-bit mode, it reserves a segment for fixing up the stack pointer. If it detects a return to userspace with a 16-bit stack, it sets the base address of that segment such that the leaked high bits of ESP equal the bits userspace originally had, without moving the actual stack pointer, and then loads that segment as the stack segment. Since the leaked bits equal the original bits, no harm is done.
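The base-address fix-up can be sketched arithmetically. This is a minimal illustration in C, not the actual Linux code; the function name and the example addresses are invented:

```c
#include <assert.h>
#include <stdint.h>

/* Pick the ESP value the CPU will hold at IRET time so that its high 16
 * bits match the bits userspace already had, then derive the segment base
 * that makes base + esp still point at the real kernel stack frame.
 * (Segment arithmetic wraps modulo 2^32, as the hardware's does.) */
static uint32_t espfix_base(uint32_t kframe, uint32_t user_esp,
                            uint32_t *fixed_esp)
{
    /* High 16 bits: whatever userspace had; low 16 bits: offset of the
     * real frame, so the 16-bit SP loaded by IRET still works. */
    *fixed_esp = (user_esp & 0xFFFF0000u) | (kframe & 0x0000FFFFu);
    return kframe - *fixed_esp; /* base + *fixed_esp == kframe */
}
```

With the segment base chosen this way, the CPU executes the IRET on the real kernel stack, yet the stale high bits it leaves in ESP are exactly the user's own.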
 
In 64-bit mode, this is not possible for lack of segmentation support, so Linux uses another method and instead always leaks bits that were chosen randomly at boot. During boot, Linux allocates 48 bytes per CPU to serve as an ESPFIX stack (as long as there is space left on the same page, all CPUs use the same page, just at different offsets). These bytes are mapped into kernel space at an address whose leaked ESP bits are entirely random. If a return to a 16-bit stack is detected, the return frame is copied onto the ESPFIX stack, the stack is switched over to that address, and the return is performed. Since the IRET may still fail, the ESPFIX stack is mapped read-only (the copy is done through a separate writable mapping of the same page). If the IRET succeeds, nothing is ever written through the read-only mapping and everything works out. If something goes wrong, however, the CPU is already running in ring 0 and will not switch stacks: it tries to write an interrupt frame onto the read-only stack, which fails and causes a double fault. The double-fault handler (running on an IST stack) then rewrites the saved state so that it looks like a general protection fault occurred on the IRET instruction, and the GPF handler just does its normal thing.
 
This way, the leaked bits will not equal the bits userspace had at the start of the interrupt, but they will be fixed random values for each boot. And since each ESPFIX stack is so small, a large number of CPUs share the same page and therefore the same leaked ESP bits, so no important per-CPU data is leaked.
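The per-CPU slot packing described above can be sketched as plain address arithmetic. The 64-byte slot rounding and the flat region layout are illustrative assumptions, not Linux's actual ESPFIX layout:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE      4096u
#define SLOT_SIZE      64u  /* 48 bytes of frame, rounded up for alignment (assumption) */
#define SLOTS_PER_PAGE (PAGE_SIZE / SLOT_SIZE)

/* Address of a CPU's ESPFIX slot: CPUs fill a page before a new one is
 * mapped, so all CPUs on the same page share the same leaked high bits. */
static uint64_t espfix_slot_addr(uint64_t region_base, unsigned cpu)
{
    uint64_t page = cpu / SLOTS_PER_PAGE;
    uint64_t off  = (uint64_t)(cpu % SLOTS_PER_PAGE) * SLOT_SIZE;
    return region_base + page * PAGE_SIZE + off;
}
```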
 
=== NULL selector load may not clear MSR_GS_BASE ===
 
It is not well specified what happens to MSR_GS_BASE when a NULL selector is loaded into GS. Intel CPUs seem to load it with zero, while AMD CPUs preserve the previous value (this is now documented in the AMD64 Architecture Programmer's Manual Volume 2: System Programming). This detail needs to be taken into account on context switches if the kernel tries to optimize away the slow MSR operations.
 
==== Mitigations ====
Don't load a NULL selector into GS expecting it to change the GS.base register. Always use the MSR operations, unless running on a CPU with the FSGSBASE feature, in which case the GS.base address can be set with the WRGSBASE instruction.
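The rule can be sketched as follows. `wrmsr` and `wrgsbase` are stubbed here with a plain variable so the selection logic is testable; in a real kernel they would be the privileged instructions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_GS_BASE 0xC0000101u  /* architectural MSR number for GS.base */

static uint64_t fake_gs_base;    /* stand-in for the real register */

static void wrmsr(uint32_t msr, uint64_t v)
{
    if (msr == MSR_GS_BASE)
        fake_gs_base = v;
}

static void wrgsbase(uint64_t v)
{
    fake_gs_base = v;
}

/* Never rely on loading a NULL selector to clear GS.base: Intel CPUs zero
 * it, AMD CPUs keep the old value. Always write the base explicitly. */
static void set_gs_base(bool cpu_has_fsgsbase, uint64_t base)
{
    if (cpu_has_fsgsbase)
        wrgsbase(base);            /* fast path: WRGSBASE instruction */
    else
        wrmsr(MSR_GS_BASE, base);  /* slow but works everywhere */
}
```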
 
=== FXSAVE/FNSAVE ===
 
Intel CPUs do not properly handle a non-canonical return address. If a non-canonical address is present in RCX when executing SYSRET, a general protection fault is taken in CPL0 while the CPL3 registers are still loaded (see CVE-2006-0744).
 
==== Mitigations ====
The only possible way to execute a SYSCALL instruction such that its return address is non-canonical is to put the (two-byte) instruction at address 0x00007ffffffffffe, so that the address of the following instruction is 0x0000800000000000. To prevent this from occurring, it is possible to disallow mapping an executable page at address 0x00007ffffffff000. Alternatively, the problem can be handled directly: if the return address has any of the top 16 bits set, return to userspace with the IRET instruction instead. Or construct the necessary stack frame and jump directly to your GPF handler.
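The canonical-address test this mitigation relies on can be written as below. This is a sketch assuming 48-bit virtual addresses; `is_canonical` is our own name, not a standard API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With 48-bit virtual addresses, a canonical address has bits 63..47
 * all equal to bit 47, i.e. sign-extending from bit 47 is a no-op. */
static bool is_canonical(uint64_t addr)
{
    int64_t sext = (int64_t)(addr << 16) >> 16; /* sign-extend from bit 47 */
    return (uint64_t)sext == addr;
}
```

The SYSRET path would check the saved RCX with this function and divert to the IRET path when it returns false.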
 
=== SS selector ===
 
On AMD CPUs, the SS selector may become unusable when an in-kernel interrupt arrives (setting SS to NULL), the thread is then switched, and the new thread returns to userspace via SYSRET. The numerical SS value is correct, but the cached descriptor is wrong. This affects only the 32-bit compatibility-mode usage of SS.

==== Mitigation ====
Don't switch threads after an in-kernel interrupt (that is: Only switch threads from the outermost stack layer, when returning to userspace or handling a system call). Alternatively, if you did switch threads after an in-kernel interrupt, always return via IRET. You need to jump to the slower IRET return path after a system call in some cases anyway, so you may as well make this one of the cases.
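The decision can be reduced to one piece of per-thread bookkeeping. The structure and field below are hypothetical, purely for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread bookkeeping, not a real kernel field. */
struct thread {
    bool resumed_from_interrupt; /* set when a context switch happened while
                                    this thread was stopped inside an
                                    in-kernel interrupt */
};

/* SYSRET is only safe when SS's hidden descriptor state is known good;
 * otherwise fall back to the slower IRET return path. */
static bool may_use_sysret(const struct thread *t)
{
    return !t->resumed_from_interrupt;
}
```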
 
=== PUSH selector ===
 
On Intel CPUs running in 32-bit protected mode, pushing a segment selector writes only the low 16 bits of the stack slot; the high 16 bits remain unmodified. AMD CPUs write the full 32 bits. This can have security implications, since part of the stack is left uninitialized.

Note: This applies to every time a selector is pushed to the stack. So both a PUSH instruction and an implicit push following an interrupt of some sort are affected.
 
==== Mitigation ====
When reading a selector value off the stack, always mask off everything but the bits you are interested in. Very often the important information is just the RPL of the selector (which already tells you whether the selector is a kernel or a user one), or the TI bit; the actual table index is rarely needed. Doing this also hardens all parts of the kernel that read selectors against later modifications of the GDT.
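For example, a privilege check on a selector slot read from an interrupt frame might look like this (a sketch; the macro and function names are ours):

```c
#include <assert.h>
#include <stdint.h>

#define SEL_RPL(sel) ((uint16_t)(sel) & 0x3)  /* requested privilege level */
#define SEL_TI(sel)  ((uint16_t)(sel) & 0x4)  /* table indicator: 0=GDT, 4=LDT */

/* Decide whether an interrupt came from userspace, given the 32-bit stack
 * slot holding the pushed CS. On Intel the high 16 bits may be stale stack
 * garbage, so truncate to 16 bits before looking at anything. */
static int came_from_user(uint32_t pushed_cs_slot)
{
    uint16_t sel = (uint16_t)pushed_cs_slot;
    return SEL_RPL(sel) == 3;
}
```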
 
=== Nesting of NMI interrupt ===
This bug is triggered by executing LOCK CMPXCHG8B EAX (F0 0F C7 C8). The instruction contains two encoding errors, a disallowed LOCK prefix and a register (non-memory) target; together with the attempt to cache the result, this confuses the CPU into a deadlock state, locking up the entire machine.
 
To fix this bug, the page holding the [[IDT]]'s invalid-opcode entry should be marked as uncacheable or write-through, eliminating one necessary factor. Alternatively, marking that same page as non-writable confuses the processor further, this time into the page-fault handler instead of a deadlock. If paging is to be left disabled, the only workaround is to disable the CPU's caches, which is far from efficient. Further discussion of various solutions is presented [http://web.archive.org/web/20070927195343/http://www.x86.org/errata/dec97/f00fbug.htm here].
 
We can check whether the processor is a Pentium through the [[CPUID]] instruction: calling it with EAX=1 returns the CPU signature in EAX. We can extract the family number from the signature and compare it with 5, because the Pentium belongs to family 5.
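Extracting the family number from the signature is a simple bit-field read. The signature values in the test below are samples for illustration, not taken from a specific machine:

```c
#include <assert.h>
#include <stdint.h>

/* CPUID leaf 1 returns the signature in EAX:
 * bits 3:0 stepping, 7:4 model, 11:8 family, 13:12 type.
 * (Families >= 15 also use the extended-family field; that is omitted
 * here since we only need to recognize family 5.) */
static unsigned cpu_family(uint32_t signature_eax)
{
    return (signature_eax >> 8) & 0xF;
}
```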
=== Core-microarchitecture Bugs ===
 
[http://web.archive.org/web/20190804131349if_/https://www.geek.com/images/geeknews/2006Jan/core_duo_errata__2006_01_21__full.gif See a list of known bugs as of 2006]
 
=== 'Meltdown' Page Table Bug ===
To fix this bug, one must write to the Cyrix configuration registers and set the NO-LOCK bit in CCR1, which disables all but the most essential bus locks. The downside of this is that read-modify-write atomicity is no longer guaranteed on multiprocessor systems. Source code that should prevent this condition (untested):
 
<syntaxhighlight lang="asm">
MOV AL, 0xC1  ; 0xC1 refers to CCR1
OUT 0x22, AL  ; Select Register
IN  AL, 0x23  ; Read current CCR1 contents
OR  AL, 0x10  ; Set the NO-LOCK bit (bit 4)
MOV AH, AL    ; Save new contents
MOV AL, 0xC1
OUT 0x22, AL  ; Reselect CCR1 (each access to 0x23 needs a fresh select)
MOV AL, AH    ; Load new contents
OUT 0x23, AL  ; Write new CCR1 with No-Lock set
</syntaxhighlight>
[[Category:X86 CPU]]