SSE: Difference between revisions

1,411 bytes added ,  12 years ago
Merged the SSE content
m (Corrected jnz to jz)
Line 7:
 
SSE provides 8 XMM registers (XMM0-7) in 32-bit mode and 16 (XMM0-15) in 64-bit mode; each is 128 bits wide. Certain SSE instructions (movntdqa, movdqa, movdqu, etc.) can load 16 bytes from memory or store 16 bytes to memory in a single operation. SSE also introduces non-temporal hint instructions (movntdqa and movntdq) that mark one-shot memory accesses as non-temporal, so that data touched only once does not pollute the small on-chip caches.
 
Because SSE added new registers, it is disabled by default: the typical operating system of that time was not yet able to save those registers on a task switch. To support SSE, you will need to implement separate code paths for saving and restoring SSE state (the instructions involved cause an exception on processors that do not support them), as well as handlers for the new exceptions. Once those are in place, you can tell the CPU to enable SSE use in userland tasks.
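The enabling step comes down to a handful of control-register bits. The following is only a minimal sketch, not a complete enable routine: it computes the values to write back and leaves the privileged reads and writes of CR0 and CR4 to the kernel. The helper names are hypothetical. The bit positions (CR0.MP = bit 1, CR0.EM = bit 2, CR0.TS = bit 3, CR4.OSFXSR = bit 9, CR4.OSXMMEXCPT = bit 10) are taken from the processor manuals rather than from this page, and setting MP and OSXMMEXCPT is an assumption about what a typical kernel wants.

```c
#include <stdint.h>

/* Bit positions as documented in the processor manuals (assumed here). */
#define CR0_MP          (1ull << 1)   /* monitor coprocessor */
#define CR0_EM          (1ull << 2)   /* x87 emulation; must be 0 for SSE */
#define CR0_TS          (1ull << 3)   /* task switched; must be 0 to run SSE */
#define CR4_OSFXSR      (1ull << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (1ull << 10)  /* OS handles SIMD FP exceptions */

/* Hypothetical helper: CR0 value with EM and TS cleared and MP set. */
static inline uint64_t cr0_for_sse(uint64_t cr0)
{
    cr0 &= ~(CR0_EM | CR0_TS);
    cr0 |= CR0_MP;
    return cr0;
}

/* Hypothetical helper: CR4 value with OSFXSR and OSXMMEXCPT set. */
static inline uint64_t cr4_for_sse(uint64_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

A kernel would read CR0 and CR4, pass them through these helpers, and write the results back, but only after confirming via CPUID that the processor supports SSE.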
 
=== Checking for SSE ===
Line 50 ⟶ 52:
The MXCSR register holds all of the masking and flag information for SSE floating-point operations. Just as with the x87 FPU control word, MXCSR must be modified if you want to mask certain exceptions or select a rounding mode. Bits 16-31 are reserved and will cause a #GP exception if set. LDMXCSR loads the MXCSR register from memory and STMXCSR stores it to memory; both require a 32-bit memory operand. SSE support must already be set up before either instruction is used (CR4.OSFXSR = 1, CR0.EM = 0, and CR0.TS = 0). Bits 0-5 are exception status flags, each set when the corresponding exception has occurred. Bits 7-12 are the exception mask bits; if all of them are set, all SSE floating-point exceptions are masked. Bits 13-14 are the RC (Rounding Control) bits: RC:0 = to nearest, RC:1 = down, RC:2 = up, RC:3 = truncate.
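The bit layout described above can be expressed as a few masks. This is a minimal sketch in C that only manipulates a 32-bit value; the helper names are hypothetical, and actually loading the result would still take an LDMXCSR instruction (or a compiler intrinsic) once SSE is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

#define MXCSR_STATUS_FLAGS    0x0000003Fu   /* bits 0-5: exception status flags */
#define MXCSR_EXCEPTION_MASKS 0x00001F80u   /* bits 7-12: exception mask bits */
#define MXCSR_RC_SHIFT        13            /* bits 13-14: rounding control */
#define MXCSR_RC_FIELD        (3u << MXCSR_RC_SHIFT)
#define MXCSR_RESERVED        0xFFFF0000u   /* bits 16-31: #GP if set */

enum mxcsr_rounding {
    MXCSR_RC_NEAREST  = 0,
    MXCSR_RC_DOWN     = 1,
    MXCSR_RC_UP       = 2,
    MXCSR_RC_TRUNCATE = 3
};

/* Set bits 7-12 so that every SSE floating-point exception is masked. */
static inline uint32_t mxcsr_mask_all(uint32_t mxcsr)
{
    return mxcsr | MXCSR_EXCEPTION_MASKS;
}

/* Replace the RC field (bits 13-14) with the requested rounding mode. */
static inline uint32_t mxcsr_set_rounding(uint32_t mxcsr, enum mxcsr_rounding rc)
{
    return (mxcsr & ~MXCSR_RC_FIELD) | ((uint32_t)rc << MXCSR_RC_SHIFT);
}

/* True if none of the reserved bits 16-31 are set. */
static inline bool mxcsr_reserved_clear(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_RESERVED) == 0;
}

/* Extract the exception status flags (bits 0-5). */
static inline uint32_t mxcsr_status(uint32_t mxcsr)
{
    return mxcsr & MXCSR_STATUS_FLAGS;
}
```

For instance, `mxcsr_set_rounding(mxcsr_mask_all(0), MXCSR_RC_TRUNCATE)` produces a value with all exceptions masked and truncation selected; checking it with `mxcsr_reserved_clear` before loading guards against the #GP case.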
== Updates to SSE ==
Later processors added more instructions that perform different kinds of work on the vector registers. Once SSE support is in place, supporting them requires no additional effort on the part of the OS. The code that actually uses those instructions should, however, check that they exist.
=== CPUID bits ===
====SSE2====
The bit for SSE2 can be found in CPUID leaf 1 (EAX = 1), in EDX bit 26.
====SSE3====
The bit for SSE3 can be found in CPUID leaf 1, in ECX bit 0.
====SSE4====
The bit for SSE4.1 can be found in CPUID leaf 1, in ECX bit 19.
 
The bit for SSE4.2 can be found in CPUID leaf 1, in ECX bit 20.
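The feature bits listed above can be decoded as follows. This is a minimal sketch: the struct and function names are hypothetical, and the function takes the ECX and EDX values returned by CPUID with EAX = 1 as plain integers so that it stays independent of how CPUID is invoked. With GCC or Clang, one way to obtain them would be `__get_cpuid` from `<cpuid.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the feature bits discussed above. */
struct sse_features {
    bool sse2;    /* EDX bit 26 */
    bool sse3;    /* ECX bit 0  */
    bool sse4_1;  /* ECX bit 19 */
    bool sse4_2;  /* ECX bit 20 */
};

/* Decode the ECX/EDX values returned by CPUID with EAX = 1. */
static inline struct sse_features decode_sse_features(uint32_t ecx, uint32_t edx)
{
    struct sse_features f;
    f.sse2   = ((edx >> 26) & 1u) != 0;
    f.sse3   = ((ecx >> 0)  & 1u) != 0;
    f.sse4_1 = ((ecx >> 19) & 1u) != 0;
    f.sse4_2 = ((ecx >> 20) & 1u) != 0;
    return f;
}
```

Code that wants to use, say, an SSE4.2 instruction would check `decode_sse_features(ecx, edx).sse4_2` first and fall back to a plain implementation otherwise.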
 
===X86_64===
When the [[X86-64]] architecture was introduced, AMD required a minimum level of SSE support in order to simplify OS code. Any system capable of long mode supports at least SSE and SSE2, which means that a 64-bit kernel does not need to carry the old FPU-only save code.
 
==See Also==
* [[MMX]]
* The [[User:01000101/optlib/|optimisation library]] of 01000101, containing example code
* [[SSE2]]
 
=== References ===
* The [http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions Wikipedia article] on SSE
 
[[Category:X86]]
[[Category:SSE]]