{{Floats}}
The x86 FPU was originally an optional add-on to the processor that performed floating point math in hardware. It has since been integrated into the CPU proper and, over the years, has accumulated most of the math-heavy instructions. Today "FPU" is largely a legacy term for the vector processing units, which happen to also include the original floating point operations.
 
== x86 FPU Legacy ==
 
Originally, the FPU was a dedicated coprocessor chip placed alongside the actual processor. Because it performed calculations asynchronously from the core logic, its results would only become available after the main processor had executed several other instructions. Since errors were also reported asynchronously, the original PC had the FPU's error line wired to the [[PIC|interrupt controller]]. When the 486 added multiprocessor support, it became impossible to detect which of the FPUs had raised an exception, so the FPU was integrated on-die and an option was added to signal a regular exception rather than an interrupt. For backwards compatibility, the 486 was given a pin (FERR#) to replace the original FPU error line; it is routed to the PIC and then back into the CPU's interrupt input (as IRQ 13) to simulate the original setup with a dedicated coprocessor. The unfortunate consequence is that, by default (with CR0.NE clear), floating point exceptions are not reported the way the manual recommends.
 
== FPU configuration ==
 
== Detecting an FPU ==
On x86 processors up to the 386, FPUs were external and strictly optional. They allowed the use of different floating-point units, including ones that did not strictly correspond to the processor's generation. For example, the 386 could operate with both a 287 (the FPU corresponding to the 286) and a 387 (the contemporary FPU). The 486 line of microprocessors was split into the 486DX, which included an on-chip floating-point unit, and the 486SX, which did not. The external 487 coprocessor was essentially a modified 486DX that disabled the installed CPU. All x86 CPUs from the Pentium onward include an integrated FPU (excluding the [https://en.wikipedia.org/wiki/NexGen NexGen 5x86]).
 
There are two ways to detect an FPU:
 
The common way to test for the presence of an FPU is to have it write its status word to memory and then check whether it actually did.
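<syntaxhighlight lang="asm">
MOV EDX, CR0 ; Start probe, get CR0
AND EDX, (-1) - (CR0_TS + CR0_EM) ; clear TS and EM to force fpu access
MOV CR0, EDX ; store the modified CR0
FNINIT ; reset the FPU: a present FPU now has a status word of 0
FNSTSW [.testword] ; store the status word over the garbage value
CMP WORD [.testword], 0 ; did the FPU write a zero status word?
JNE .nofpu ; no: the garbage is still there, so there is no FPU
JMP .hasfpu ; yes: an FPU is present
.testword: DW 0x55AA ; store garbage to be able to detect a change
</syntaxhighlight>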
 
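To distinguish a 287 from a 387 FPU, you can check whether it sees a difference between +infinity and -infinity: after initialization the 287 uses projective infinity, where the two compare equal, while the 387 always uses affine infinity, where they differ.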
:If the EM bit is set, x87 FPU instructions raise #NM (device not available) and MMX/SSE instructions raise #UD, so they can be '''EM'''ulated in software. It should be clear to actually be able to use the FPU.
'''CR0.ET''' (bit 4)
:This bit is used on the 386 to tell it how to communicate with the coprocessor: 0 for a 287, and 1 for a 387 or later. This bit is hardwired to 1 on the 486 and later.
'''CR0.NE''' (bit 5)
:When set, enables '''N'''ative '''E'''xception handling, which reports FPU errors through the #MF exception (vector 16). When cleared, errors are reported through the legacy path via the interrupt controller (IRQ 13). It should be set on the 486 and later, but not on 386s, because they lack that bit.
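Taken together, the bits above determine what a kernel writes back to CR0 before using the FPU. The following is a minimal C sketch of that decision, not code from any particular kernel: it assumes the bit positions listed here plus CR0.TS (bit 3), which the probe code above also clears, and the helper name cr0_enable_fpu and its flags are hypothetical. Actually reading and writing the register (MOV reg, CR0 / MOV CR0, reg) is left to the caller.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>

/* CR0 bit positions as described above (TS is the bit the probe clears). */
#define CR0_EM (UINT32_C(1) << 2) /* EMulation: trap FPU instructions */
#define CR0_TS (UINT32_C(1) << 3) /* Task Switched: lazy FPU switching */
#define CR0_ET (UINT32_C(1) << 4) /* Extension Type: 387 protocol on a 386 */
#define CR0_NE (UINT32_C(1) << 5) /* Native Exceptions: #MF instead of IRQ 13 */

/* Hypothetical helper: compute a CR0 value that lets the FPU be used.
 * is_486_or_later: NE only exists (and ET is hardwired) on 486 and later.
 * has_387: on a 386, selects the 387 protocol instead of the 287 one. */
static uint32_t cr0_enable_fpu(uint32_t cr0, bool is_486_or_later, bool has_387)
{
    cr0 &= ~(CR0_EM | CR0_TS);   /* execute FPU instructions natively */
    if (is_486_or_later) {
        cr0 |= CR0_NE;           /* report FPU errors through #MF */
    } else if (has_387) {
        cr0 |= CR0_ET;           /* 386 paired with a 387 */
    } else {
        cr0 &= ~CR0_ET;          /* 386 paired with a 287 */
    }
    return cr0;                  /* all other bits (PE, PG, ...) are preserved */
}
</syntaxhighlight>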
== FPU state ==
Once the FPU is configured, the only thing left to do is to initialize its registers to their proper states. FNINIT resets the user-visible part of the FPU state: it sets precision to extended (64-bit significand) and rounding to nearest, which should be correct for most operations, and it masks all floating point exceptions. You can change the control word by issuing an FLDCW. To diagnose broken code, you usually want to unmask invalid operations, which include stack overflows and underflows, by clearing bit 0. Clearing bit 2 lets you catch divisions by zero as well. Some examples:
<syntaxhighlight lang="asm">; FLDCW requires a 16-bit memory operand, immediates do not work
FLDCW [value_37F] ; writes 0x37F into the control word: the value written by F(N)INIT
FLDCW [value_37E] ; writes 0x37E: the default, but with invalid operation exceptions unmasked
FLDCW [value_37A] ; writes 0x37A: both division by zero and invalid operations cause exceptions

value_37F: DW 0x037F
value_37E: DW 0x037E
value_37A: DW 0x037A</syntaxhighlight>
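The three values above can also be built from the individual control word fields. This C sketch assumes the standard layout (exception masks in bits 0 to 5, precision control in bits 8 and 9, rounding control in bits 10 and 11, and reserved bit 6, which reads as 1 after FNINIT); all macro, enum and function names are made up for illustration.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Exception mask bits: a set bit masks (suppresses) that exception. */
#define X87_CW_IM 0x0001u        /* invalid operation (includes stack faults) */
#define X87_CW_DM 0x0002u        /* denormal operand */
#define X87_CW_ZM 0x0004u        /* division by zero */
#define X87_CW_OM 0x0008u        /* overflow */
#define X87_CW_UM 0x0010u        /* underflow */
#define X87_CW_PM 0x0020u        /* precision (inexact result) */
#define X87_CW_ALL_MASKS 0x003Fu
#define X87_CW_BIT6 0x0040u      /* reserved, reads as 1 after FNINIT */

enum x87_precision { X87_PC_SINGLE = 0, X87_PC_DOUBLE = 2, X87_PC_EXTENDED = 3 };
enum x87_rounding { X87_RC_NEAREST = 0, X87_RC_DOWN = 1, X87_RC_UP = 2, X87_RC_ZERO = 3 };

/* Build a value for FLDCW; 'unmasked' lists the exceptions that should trap. */
static uint16_t x87_control_word(uint16_t unmasked, enum x87_precision pc,
                                 enum x87_rounding rc)
{
    unsigned cw = X87_CW_BIT6 | (X87_CW_ALL_MASKS & (uint16_t)~unmasked);
    cw |= (unsigned)pc << 8;     /* precision control, bits 8-9 */
    cw |= (unsigned)rc << 10;    /* rounding control, bits 10-11 */
    return (uint16_t)cw;
}
</syntaxhighlight>

For instance, x87_control_word(X87_CW_IM | X87_CW_ZM, X87_PC_EXTENDED, X87_RC_NEAREST) yields 0x37A, the last value written in the example above.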
 
 
 
== Programming the FPU ==
The x87 FPU contains 8 floating point registers. Each floating point register holds an 80-bit extended double value (1-bit sign, 15-bit exponent, and 64-bit significand with an explicit integer bit), and each register has a matching 2-bit "tag" value in the Tag Register that acts as that register's flags. The Tag Register records whether each register is empty, holds a valid non-zero value, holds zero, or holds a special value such as "infinity", "NaN" (Not a Number), or a denormal.
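The Tag Register packs the eight 2-bit tags into one 16-bit word, two bits per physical register R0 to R7. The sketch below decodes and updates such a word in C, assuming the documented encoding (00 valid, 01 zero, 10 special, 11 empty); the enum and function names are hypothetical. Note that tags are indexed by physical register, not by stack position.

<syntaxhighlight lang="c">
#include <stdint.h>

enum x87_tag {
    X87_TAG_VALID   = 0, /* finite, non-zero value */
    X87_TAG_ZERO    = 1, /* +0.0 or -0.0 */
    X87_TAG_SPECIAL = 2, /* NaN, infinity, denormal or unsupported format */
    X87_TAG_EMPTY   = 3  /* register holds no value */
};

/* Extract the tag of physical register R0..R7 from a full 16-bit tag word,
 * as stored by FNSTENV/FNSAVE (FXSAVE uses an abridged 8-bit form instead). */
static enum x87_tag x87_get_tag(uint16_t tag_word, unsigned phys_reg)
{
    return (enum x87_tag)((tag_word >> (2 * (phys_reg & 7u))) & 3u);
}

/* Return a copy of tag_word with the tag of phys_reg replaced. */
static uint16_t x87_set_tag(uint16_t tag_word, unsigned phys_reg, enum x87_tag tag)
{
    unsigned shift = 2 * (phys_reg & 7u);
    tag_word &= (uint16_t)~(3u << shift);            /* clear the old tag */
    tag_word |= (uint16_t)((unsigned)tag << shift);  /* insert the new one */
    return tag_word;
}
</syntaxhighlight>

After FNINIT the tag word is 0xFFFF, so every register decodes as X87_TAG_EMPTY.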
 
The 8 floating point registers are organized in a "stack" configuration, and most FPU instructions operate on the current "top" of the stack, ST(0). The index of the physical register that is currently the top is stored in the TOP field of the FPU Status Register (it is 0 after FNINIT), and is updated automatically by the FPU whenever a value is pushed (for example by FLD, which decrements TOP) or popped (for example by FSTP, which increments it). When all 8 stack registers are in use (i.e. none of the 8 tags is marked "empty") and another value is pushed, the FPU signals a stack overflow, which is reported as an invalid operation exception with the Stack Fault flag set. If that exception is unmasked, the main CPU is also notified (through #MF or the interrupt controller, depending on CR0.NE).
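The interaction between TOP and the tags can be modelled in a few lines of C. This is a simplified software model for illustration only, not how the hardware is built: it tracks just the 3-bit TOP field and the tag word, the names are hypothetical, a stack overflow is reported as a return value instead of an invalid operation exception, and stack underflow is not modelled.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>

#define X87_TAG_VALID 0u
#define X87_TAG_EMPTY 3u

/* Simplified model: TOP (3 bits of the status word) plus the 16-bit tag word. */
struct x87_stack_model {
    unsigned top;      /* physical register currently holding ST(0) */
    uint16_t tag_word; /* 2 bits per physical register R0..R7 */
};

/* State after FNINIT: TOP = 0 and every register empty. */
static void x87_model_init(struct x87_stack_model *s)
{
    s->top = 0;
    s->tag_word = 0xFFFF;
}

static unsigned x87_model_tag(const struct x87_stack_model *s, unsigned phys)
{
    return (s->tag_word >> (2 * phys)) & 3u;
}

/* ST(i) refers to physical register (TOP + i) mod 8. */
static unsigned x87_model_phys(const struct x87_stack_model *s, unsigned st_index)
{
    return (s->top + st_index) & 7u;
}

/* Push (as FLD does): TOP is decremented, then the new ST(0) is filled.
 * Returns false on stack overflow, i.e. the target register was not empty. */
static bool x87_model_push(struct x87_stack_model *s)
{
    unsigned new_top = (s->top - 1u) & 7u;   /* 0 wraps around to 7 */
    if (x87_model_tag(s, new_top) != X87_TAG_EMPTY)
        return false;
    s->tag_word &= (uint16_t)~(3u << (2 * new_top));
    s->tag_word |= (uint16_t)(X87_TAG_VALID << (2 * new_top));
    s->top = new_top;
    return true;
}

/* Pop (as FSTP does): mark ST(0) empty, then increment TOP. */
static void x87_model_pop(struct x87_stack_model *s)
{
    s->tag_word |= (uint16_t)(X87_TAG_EMPTY << (2 * s->top));
    s->top = (s->top + 1u) & 7u;
}
</syntaxhighlight>

In this model the first push after initialization lands in physical register 7, and the ninth push without an intervening pop fails.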
 
Because the original x87 FPU was a separate processor (with its own clock), it could execute FPU instructions in parallel with the CPU executing its own instructions. Applications that use the x87 FPU therefore issue one or more FPU instructions and, at some later point, instruct the main CPU to wait for the FPU (FWAIT) to ensure that those instructions have finished. Several x87 control instructions come in a "wait" form and a "no-wait" form (for example FINIT/FNINIT and FSTSW/FNSTSW), so that the programmer can specify at which points in the application the two processors need to be synchronized; the wait form is encoded as FWAIT followed by the no-wait instruction. After FWAIT (or a "wait" form instruction) is executed, any results computed by the FPU, and any exceptions it has detected, can be handled by the application at that point.
 
Sending data to, and retrieving data from, the 8 FPU registers ST(0) through ST(7) must go through system memory: it is not possible to copy values directly between a CPU register and an FPU register. The FPU can copy data from/to system memory in the following formats: 16-bit, 32-bit, and 64-bit integer, 32-bit float (single), 64-bit float (double), and 80-bit float (extended double). It also supports reading and writing an 80-bit packed Binary Coded Decimal (BCD) format, which contains a single sign bit, 7 unused bits, and 18 decimal digits stored as 4-bit values, two per byte.
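The packed BCD layout can be illustrated by encoding an integer in software into the same 10 bytes that FBSTP stores. This C sketch assumes the layout described above, with the least significant digit pair in the lowest byte and the sign in bit 7 of the last byte; x87_bcd_encode is a hypothetical name, and values with more than 18 digits are simply rejected rather than producing the indefinite value the FPU would store.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Encode value in the x87 80-bit packed BCD format (10 bytes, little endian):
 * bytes 0..8 hold 18 decimal digits, two per byte (low digit in low nibble),
 * byte 9 holds the sign in bit 7; its other 7 bits are unused and left zero.
 * Returns false if |value| needs more than 18 decimal digits. */
static bool x87_bcd_encode(int64_t value, uint8_t out[10])
{
    /* Use the magnitude as unsigned so INT64_MIN does not overflow. */
    uint64_t mag = value < 0 ? (uint64_t)0 - (uint64_t)value : (uint64_t)value;
    if (mag > UINT64_C(999999999999999999))
        return false;

    memset(out, 0, 10);
    for (int i = 0; i < 9; i++) {
        uint8_t low = (uint8_t)(mag % 10);   /* less significant digit */
        mag /= 10;
        uint8_t high = (uint8_t)(mag % 10);  /* more significant digit */
        mag /= 10;
        out[i] = (uint8_t)((high << 4) | low);
    }
    if (value < 0)
        out[9] = 0x80;                       /* sign bit */
    return true;
}
</syntaxhighlight>

Encoding 1234 this way gives 0x34 in byte 0 and 0x12 in byte 1, with all remaining bytes zero.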
 
When reading values from system memory, the extended double format is copied directly into the FPU register, while the other formats are converted to the 80-bit extended double format before being stored in the FPU register; this conversion is exact, since every value in those formats can be represented in the extended format. When writing values to system memory, the 80-bit value is copied directly when storing the extended double format, and is converted to the appropriate structure for the other formats. That conversion includes rounding the value according to the current rounding settings in the FPU Control Register.
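To show how the fields move when a smaller format is widened, the following C sketch performs the same exact conversion in software for an IEEE 754 double, given as its raw 64-bit pattern. It is an illustration only: the struct and function names are hypothetical, and NaNs are simplified (their payload is kept, but the signalling-NaN handling the FPU performs on load is not modelled).

<syntaxhighlight lang="c">
#include <stdint.h>

/* Fields of an x87 80-bit extended value. */
struct x87_extended {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* 15 bits, bias 16383 */
    uint64_t significand; /* 64 bits, bit 63 is the explicit integer bit */
};

/* Widen a double's bit pattern (1/11/52 bits, bias 1023) to the extended
 * layout. No rounding is ever needed: every double fits exactly. */
static struct x87_extended x87_widen_double(uint64_t bits)
{
    struct x87_extended x;
    unsigned biased_exp = (unsigned)(bits >> 52) & 0x7FFu;
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);

    x.sign = (unsigned)(bits >> 63);
    if (biased_exp == 0x7FF) {              /* infinity or NaN */
        x.exponent = 0x7FFF;
        x.significand = (UINT64_C(1) << 63) | (frac << 11);
    } else if (biased_exp == 0) {
        if (frac == 0) {                    /* +0.0 or -0.0 */
            x.exponent = 0;
            x.significand = 0;
        } else {                            /* double denormal: normal here */
            int e = -1022;
            uint64_t sig = frac << 11;
            while (!(sig >> 63)) {          /* normalize until bit 63 is set */
                sig <<= 1;
                e--;
            }
            x.exponent = (unsigned)(e + 16383);
            x.significand = sig;
        }
    } else {                                /* normal: rebias the exponent */
        x.exponent = biased_exp + (16383 - 1023);
        x.significand = (UINT64_C(1) << 63) | (frac << 11);
    }
    return x;
}
</syntaxhighlight>

For example, 1.0 (bit pattern 0x3FF0000000000000) becomes exponent 0x3FFF with significand 0x8000000000000000.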
 
== Rent-a-coder ==
The following function can be used with GCC (or TCC) to perform an FPU operation without resorting to dedicated assembly:
<syntaxhighlight lang="c">#include <stdint.h>

void fpu_load_control_word(const uint16_t control)
{
    asm volatile("fldcw %0;"::"m"(control));
}</syntaxhighlight>
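A matching function for reading the control word back can be written the same way. This is a sketch assuming GCC or a compatible compiler targeting x86 or x86-64; fpu_store_control_word is a hypothetical name, the load function is repeated (with __asm__ __volatile__ spelling so it also builds in strict ISO modes) to keep the example self-contained, and FNSTCW is the no-wait form, so it does not first check for pending unmasked exceptions.

<syntaxhighlight lang="c">
#include <stdint.h>

/* Same as above: load the x87 control word from memory with FLDCW. */
void fpu_load_control_word(const uint16_t control)
{
    __asm__ __volatile__("fldcw %0" : : "m"(control));
}

/* Hypothetical companion: store the current control word with FNSTCW. */
uint16_t fpu_store_control_word(void)
{
    uint16_t control;
    __asm__ __volatile__("fnstcw %0" : "=m"(control));
    return control;
}
</syntaxhighlight>

A typical use is to save the current control word, load a stricter one such as 0x37A while debugging, and restore the saved value afterwards.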
 
==See Also==
===External Links===
* [http://www.website.masmforum.com/tutorials/fptute/ Simply FPU], a practical guide covering the FPU basics from a userland perspective
* [http://www.ragestorm.net/downloads/387intel.txt Intel 80387 Programmer's Reference Manual], complete with example code
* [http://developer.amd.com/documentation/guides/pages/default.aspx#manuals AMD Programmer's Manuals], has FPU instruction reference conveniently ordered by processor component.
 
[[Category:X86]]
[[de:FPU_(x86)]]