Revision as of 12:26, 9 November 2018 view source osdev>Ruik Add new PUSH selector issue ← Older edit		Revision as of 10:00, 9 April 2020 view source osdev>Nullplan →‎ESP is not cleared: Added explanation of both ESPFIX implementations in Linux Newer edit →
Line 12: The x86 IRET will not clear upper bits of the stack register (32:16) when returning to 16-bit mode. As the result, the kernel high 16bit of ESP may be leaked to the userspace. Same is true for 64-bit kernel to 16-bit userspace transition. ==== Mitigations ==== One mitigation would be to simply not allow 16-bit code in user mode. Linux, however, does the following to work around the issue: In 32-bit mode, it reserves a segment for fixing up the stack pointer. If it detects a return to userspace with a 16-bit stack, it sets the base address of that segment such that the leaked bits can be set in ESP without moving the actual stack pointer, and then loads that segment as stack segment. Therefore the leaked bits will equal the original bits, so no harm is done. In 64-bit mode, this is not possible for lack of segmentation support. So instead Linux adopts another method, and will always leak bits chosen randomly at boot. During boot, Linux allocates 48 bytes for each CPU to serve as ESPFIX stack (actually, as long as there is space on the same page left, all CPUs will use the same page, just different offsets on it). These 48 bytes are mapped into kernel space, such that the leaked bits of ESP are entirely random. If a return to a 16-bit stack is detected, the return frame is copied into the ESPFIX stack, before switching the stack over to that address and performing the return. Since that may fail, the ESPFIX stack is mapped read-only (for the copy, a writable mapping of the same page is used). So if the IRET succeeds, nothing is ever written to the ESPFIX stack and everything works out. If something goes wrong, however, since the CPU is already running in Ring 0, it would not switch stacks back. It would try to write an interrupt frame onto the stack, but since the stack is read-only, that will fail. Therefore, a double-fault is caused, and the double-fault handler (running with an IST stack) changes the information such that it looks like a general protection fault occurred on the IRET instruction. And then the GPF handler just does its normal thing. This way, the leaked bits will not equal the bits userspace had at the start of the interrupt, but they will be random for each boot. Since the ESPFIX stack is so small, a large number of CPUs will have the same leaked ESP bits, so no important data is leaked. === NULL selector load may not clear MSR_GS_BASE ===

CPU Bugs: Difference between revisions