The x86 FPU was originally an optional addition to the processor that performed floating point math in hardware. It has since been integrated into the CPU proper and has, over the years, collected the majority of math-heavy instructions. "FPU" has become a legacy term for what are actually the vector processing units, which just happen to include the original floating point operations.

x86 FPU Legacy

Originally, the FPU was a dedicated coprocessor chip installed alongside the actual processor. Since it performed calculations asynchronously from the core logic, its results would only become available after the main processor had executed several other instructions. Since errors would also be reported asynchronously, the original PC had the error line of the FPU wired to the interrupt controller. When the 486 added multiprocessor support, it became impossible to tell which of the FPUs had raised an exception, so the FPU was integrated on-die and an option was added to signal a regular exception rather than an interrupt. To provide backwards compatibility, the 486 was given a pin to replace the original FPU error line, which would be routed to the PIC and then back into the CPU's IRQ line to simulate the original setup with a dedicated coprocessor. This has the unfortunate consequence that, by default, floating point exceptions do not operate as recommended by the manual.

FPU configuration

Due to the many forms of FPUs and vector units, some logic is required to get them in the expected state.

Detecting an FPU

On 386s, FPUs were external and strictly optional. The 486 came in an FPU-included and an FPU-less package, with the "FPU upgrade" being just a modified 486 that disabled its lesser counterpart. From the Pentium onwards, FPUs were always integrated and present. To make things more tricky, 386s were capable of operating with both a 287 (the 286's FPU) and a 387 (the intended FPU).

There are several ways to detect an FPU:

Check the FPU bit in CPUID.
Check the EM bit in CR0. If it is set, the FPU is not meant to be used.
Check the ET bit in CR0. If it is clear, the CPU did not detect an 80387 on boot.
Probe for an FPU.

The correct order is a bit doubtful. The current official manuals state that attempts to use the FPU when one is not present will lock up the CPU. There are, however, many sources that contain probing code of varying complexity, with the common consensus that FWAIT and actual calculations are not to be performed. Similarly, the EM and ET bits can be modified by code and might not have the right values. Different wirings on actual hardware may also cause 386s to not detect an FPU as an 80387, causing the ET bit to have the wrong value on boot.
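The non-probing checks from the list above can be sketched as plain bit tests. This is only one possible ordering, since the article itself notes that the correct order is doubtful. The bit positions are the documented ones (CPUID leaf 1 EDX bit 0, CR0 bit 2 and bit 4), but the names fpu_hint and fpu_detect_hint are hypothetical, and the caller is assumed to have already read CPUID and CR0.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EDX_FPU (1u << 0)  /* CPUID leaf 1, EDX bit 0: x87 FPU on chip */
#define CR0_EM        (1u << 2)  /* emulation: FPU is not meant to be used   */
#define CR0_ET        (1u << 4)  /* extension type: 387 detected on a 386    */

/* Hypothetical result type: the checks only give a hint, not a guarantee. */
typedef enum { FPU_ABSENT, FPU_PRESENT, FPU_NEEDS_PROBE } fpu_hint;

/*
 * cpuid_ok:  whether the CPUID instruction exists at all
 *            (it does not on a 386 or an early 486).
 * cpuid_edx: EDX as returned by CPUID leaf 1 (ignored if !cpuid_ok).
 * cr0:       the value read from CR0.
 */
static inline fpu_hint fpu_detect_hint(bool cpuid_ok, uint32_t cpuid_edx,
                                       uint32_t cr0)
{
    if (cpuid_ok)
        return (cpuid_edx & CPUID_EDX_FPU) ? FPU_PRESENT : FPU_ABSENT;
    if (cr0 & CR0_EM)
        return FPU_ABSENT;       /* someone asked for emulation */
    if (cr0 & CR0_ET)
        return FPU_PRESENT;      /* a 387 was detected on boot */
    /* ET clear: could still be a 287 or a miswired 387, so probe. */
    return FPU_NEEDS_PROBE;
}
```

Because EM and ET can be changed by earlier code, a cautious kernel might treat every result other than a CPUID answer as a reason to probe anyway.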

The common way of testing for the presence of an FPU is to have it write its status somewhere and then check whether it actually did.

                                        ; assumes CR0_EM = 1 << 2 and CR0_TS = 1 << 3 are defined elsewhere
MOV EDX, CR0                            ; Start probe, get CR0
AND EDX, (-1) - (CR0_TS + CR0_EM)       ; clear TS and EM to force fpu access
MOV CR0, EDX                            ; write the modified value back to CR0
FNINIT                                  ; load defaults to FPU
FNSTSW [.testword]                      ; store status word
CMP word [.testword], 0                 ; after FNINIT a real FPU reports a status word of 0
JNE .nofpu                              ; garbage still there: the FPU hasn't written anything (i.e. it's not there)
JMP .hasfpu

.testword: DW 0x55AA                    ; store garbage to be able to detect a change

FPU control

If an FPU is found to be present, you should set up the control registers so that it can be used directly. If an FPU is not present, the registers still need to be set up, so that FPU instructions trap and can be emulated instead.

CR0.EM

If the EM bit is set, FPU and vector operations will fault (#NM for x87 instructions, #UD for MMX and SSE instructions) so they can be EMulated in software. It should be off to actually be able to use the FPU.

CR0.ET

This bit is used on the 386 to tell it how to communicate with the coprocessor: 0 for a 287, and 1 for a 387 or later. This bit is hardwired on 486+.

CR0.NE

When set, enables native exception handling: FPU errors are reported through the FPU exception (#MF). When cleared, the error is signalled via the interrupt controller instead. Should be on for 486+, but not on 386s because they lack that bit.

CR0.TS

Task switched. The FPU state is designed to be switched lazily to save read and write cycles. If set, all meaningful FPU operations will cause an #NM exception so that the OS can back up the FPU state. This bit is set automatically on a hardware task switch and can be cleared with the CLTS instruction. An OS that does software task switching may want to set this bit manually on a reschedule if it wants to store the FPU state lazily.
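The CR0 bits described above can be combined into a single value before it is written back. The following is a minimal sketch, assuming the caller reads and writes CR0 itself and already knows whether an FPU exists and whether the CPU is a 486 or later. The function names cr0_for_fpu and cr0_on_task_switch are hypothetical; the bit positions (EM = bit 2, TS = bit 3, NE = bit 5) are the documented ones.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM (1u << 2)  /* emulation: FPU instructions trap            */
#define CR0_TS (1u << 3)  /* task switched: next FPU use raises #NM      */
#define CR0_NE (1u << 5)  /* native FPU exceptions (486 and later only)  */

/* Compute the CR0 value for the detected hardware (hypothetical helper). */
static inline uint32_t cr0_for_fpu(uint32_t cr0, bool has_fpu,
                                   bool is_486_or_later)
{
    if (!has_fpu)
        return cr0 | CR0_EM;       /* trap FPU instructions for emulation */

    cr0 &= ~(CR0_EM | CR0_TS);     /* allow direct FPU use */
    if (is_486_or_later)
        cr0 |= CR0_NE;             /* the 386 has no NE bit */
    return cr0;
}

/* Software task switch with lazy FPU saving: arm the #NM trap. */
static inline uint32_t cr0_on_task_switch(uint32_t cr0)
{
    return cr0 | CR0_TS;
}
```

In the #NM handler, the OS would then clear TS (for example with CLTS), save the previous owner's FPU state and load the current task's state before returning.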

Vector unit

MMX, 3DNow! and the rare EMMI reuse the old FPU registers as vector units, aliasing them onto 64-bit data registers. This means that they can be used safely without modifying the FPU handling. SSE, however, adds a whole new set of registers and is therefore disabled by default. To allow SSE instructions, CR4.OSFXSR must be set. Be careful, though: writing it on a processor without SSE support causes an exception. When SSE is enabled, FXSAVE and FXRSTOR should be used to store and restore the entire FPU and vector register file. It is good practice to enable the other SSE bit (CR4.OSXMMEXCPT) as well, so that unmasked SSE exceptions are routed to the #XF handler instead of being reported as #UD.
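Since setting the CR4 bits on an unsupported processor faults, the update is best guarded by the CPUID feature bits. A minimal sketch, assuming the caller supplies EDX from CPUID leaf 1 and performs the actual CR4 read and write; cr4_for_sse is a hypothetical helper name. It is deliberately conservative and requires both FXSR and SSE before touching either bit.

```c
#include <stdint.h>

#define CPUID_EDX_FXSR (1u << 24) /* FXSAVE/FXRSTOR supported */
#define CPUID_EDX_SSE  (1u << 25) /* SSE supported            */
#define CR4_OSFXSR     (1u << 9)  /* 0x200: OS uses FXSAVE/FXRSTOR, enables SSE */
#define CR4_OSXMMEXCPT (1u << 10) /* 0x400: OS handles SIMD FP exceptions       */

/* Return the CR4 value to write back; unchanged if SSE is not available. */
static inline uint32_t cr4_for_sse(uint32_t cr4, uint32_t cpuid_edx)
{
    const uint32_t need = CPUID_EDX_FXSR | CPUID_EDX_SSE;

    if ((cpuid_edx & need) != need)
        return cr4;  /* leave CR4 alone: setting the bits would fault */
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```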

FPU state

When the FPU is configured, the only thing left to do is to initialize its registers to their proper states. FNINIT will reset the user-visible part of the FPU stack.

Original reference

The factual accuracy of this article is disputed.

Example:

 
 #include <stddef.h>
 #include <stdint.h>
 
 // cpuid_features is assumed to be filled in elsewhere from CPUID
 
 void set_fpu_control_word(const uint16_t cw)
 {
     if (cpuid_features.FPU) // checks for the FPU flag
     {
         // FLDCW = Load FPU Control Word
         asm volatile("fldcw %0" // sets the FPU control word to "cw"
                      :: "m"(cw));
     }
 }
 
 void setup_x87_fpu()
 {
     size_t cr4; // working copy of CR4
 
     if (cpuid_features.FPU) // checks for the FPU flag
     {
         // read CR4 into our variable
         __asm__ __volatile__("mov %%cr4, %0" : "=r"(cr4));
 
         // set the OSFXSR bit; this is only valid on CPUs with SSE support,
         // writing it on older CPUs causes an exception
         cr4 |= 0x200;
 
         // reload CR4 and INIT the FPU (FINIT)
         __asm__ __volatile__("mov %0, %%cr4; finit" : : "r"(cr4));
 
         // set the FPU Control Word
         set_fpu_control_word(0x37F);
     }
 }

The control word is set to 0x37F, but what exactly does this mean when loading the FPU control word? Bits 0-5 of the control word are the exception masks. When set, they mask the corresponding exception from reaching the CPU and (most likely) causing a fault if proper handling doesn't exist. Bit 7 is the IEM (Interrupt Enable Mask), an 8087-era bit that is left clear here. Bits 8-11 hold two fields: the PC (Precision Control) in bits 8-9 and the RC (Rounding Control) in bits 10-11. In our case, the PC is fully set (the default), which corresponds to a 64-bit significand (extended precision). The RC is set to the default of 0, which rounds to the nearest value and picks the even value on a tie.
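To make the field layout concrete, the control word can be split into its parts with a few shifts and masks. This is a small illustrative sketch; the type fpu_cw_fields and the function fpu_cw_decode are hypothetical names, not part of any existing API.

```c
#include <stdint.h>

/* Decoded view of the x87 control word (hypothetical helper type). */
typedef struct {
    uint8_t exception_masks; /* bits 0-5: one mask bit per FPU exception      */
    uint8_t precision;       /* bits 8-9: 0 = 24-bit, 2 = 53-bit,
                                3 = 64-bit significand (1 is reserved)        */
    uint8_t rounding;        /* bits 10-11: 0 = nearest (ties to even),
                                1 = down, 2 = up, 3 = toward zero             */
} fpu_cw_fields;

static inline fpu_cw_fields fpu_cw_decode(uint16_t cw)
{
    fpu_cw_fields f;

    f.exception_masks = (uint8_t)(cw & 0x3F);
    f.precision       = (uint8_t)((cw >> 8) & 0x3);
    f.rounding        = (uint8_t)((cw >> 10) & 0x3);
    return f;
}
```

Decoding 0x37F this way gives all six exception masks set, a precision field of 3 and a rounding field of 0, matching the description above.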

References

http://www.website.masmforum.com/tutorials/fptute/fpuchap1.htm
