SSE: Difference between revisions

=== Checking for SSE ===
To check for SSE, CPUID.01h:EDX.SSE[bit 25] needs to be set:
<syntaxhighlight lang="asm">
mov eax, 0x1
cpuid
; [...]
jz .noSSE
;SSE is available
</syntaxhighlight>
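The same check can be written in C. The following is a minimal sketch, assuming GCC or Clang on x86; the helper names <code>cpu_has_sse</code>, <code>cpuid</code> and <code>running_cpu_has_sse</code> are hypothetical and do not come from the article. The bit test is kept separate from the CPUID call so that it can be exercised with known register values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the EDX value returned by CPUID.01h
   has the SSE bit (bit 25) set. */
static inline bool cpu_has_sse(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & (UINT32_C(1) << 25)) != 0;
}

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical wrapper: executes CPUID for the given leaf and subleaf
   and stores EAX, EBX, ECX, EDX in regs[0..3] (GCC/Clang inline asm). */
static inline void cpuid(uint32_t leaf, uint32_t subleaf, uint32_t regs[4])
{
    __asm__ volatile("cpuid"
                     : "=a"(regs[0]), "=b"(regs[1]), "=c"(regs[2]), "=d"(regs[3])
                     : "a"(leaf), "c"(subleaf));
}

/* Queries the CPU this code is running on. */
static inline bool running_cpu_has_sse(void)
{
    uint32_t regs[4];
    cpuid(1, 0, regs);
    return cpu_has_sse(regs[3]); /* regs[3] holds EDX */
}
#endif
```

A kernel would typically call <code>running_cpu_has_sse()</code> once during early boot and refuse to enable SSE if it returns false.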
 
=== Adding support ===
 
Here is an asm example:
<syntaxhighlight lang="asm">
;now enable SSE and the like
mov eax, cr0
; [...]
mov cr4, eax
ret
</syntaxhighlight>
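The control-register updates involved in enabling SSE can also be expressed as pure functions on the CR0 and CR4 values. This is a minimal sketch, assuming the bit positions documented in the Intel and AMD manuals (CR0.MP is bit 1, CR0.EM is bit 2, CR4.OSFXSR is bit 9, CR4.OSXMMEXCPT is bit 10); the macro and function names are hypothetical, and reading or writing the registers themselves is left to the surrounding kernel code.

```c
#include <stdint.h>

/* Bit positions as documented in the Intel SDM (assumption of this sketch). */
#define CR0_MP          (UINT64_C(1) << 1)   /* monitor coprocessor */
#define CR0_EM          (UINT64_C(1) << 2)   /* x87 emulation */
#define CR4_OSFXSR      (UINT64_C(1) << 9)   /* OS supports FXSAVE/FXRSTOR */
#define CR4_OSXMMEXCPT  (UINT64_C(1) << 10)  /* OS handles unmasked SIMD FP exceptions */

/* Returns the CR0 value to write back: emulation off, monitoring on. */
static inline uint64_t cr0_for_sse(uint64_t cr0)
{
    cr0 &= ~CR0_EM;
    cr0 |= CR0_MP;
    return cr0;
}

/* Returns the CR4 value to write back: both SSE-related OS bits set. */
static inline uint64_t cr4_for_sse(uint64_t cr4)
{
    return cr4 | CR4_OSFXSR | CR4_OSXMMEXCPT;
}
```

Keeping the bit manipulation separate from the privileged <code>mov</code> instructions makes it possible to check the masks outside of ring 0.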
 
=== FXSAVE and FXRSTOR ===
 
Example usage:
<syntaxhighlight lang="c">
char fxsave_region[512] __attribute__((aligned(16)));
/* fxsave writes to the region, so it is declared as an output operand */
asm volatile("fxsave %0" : "=m"(fxsave_region));
</syntaxhighlight>
or in asm:
<syntaxhighlight lang="asm">
segment .code
SaveFloats:
; [...]
align 16
SavedFloats: TIMES 512 db 0
</syntaxhighlight>
Pitfall: with a single static save area, only one level of saving is supported; a second save overwrites the first.
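One way around the single-buffer limitation is to give every context its own save area. The sketch below assumes GCC or Clang and a C11 compiler; <code>struct fx_area</code>, <code>struct task</code>, <code>fx_save</code> and <code>fx_restore</code> are hypothetical names used only for illustration, not an established API.

```c
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical per-context save area: FXSAVE needs 512 bytes,
   aligned to 16 bytes. */
struct fx_area {
    alignas(16) uint8_t bytes[512];
};

/* Hypothetical task structure: each task owns its own area, so saving
   one task's state does not overwrite another's. */
struct task {
    struct fx_area fpu_state;
    int id;
};

#if defined(__i386__) || defined(__x86_64__)
/* Stores the current x87/SSE state into the given area. */
static inline void fx_save(struct fx_area *area)
{
    __asm__ volatile("fxsave %0" : "=m"(*area));
}

/* Reloads the x87/SSE state from the given area. */
static inline void fx_restore(const struct fx_area *area)
{
    __asm__ volatile("fxrstor %0" : : "m"(*area));
}
#endif
```

On a context switch, a scheduler built this way would call <code>fx_save(&old->fpu_state)</code> followed by <code>fx_restore(&new->fpu_state)</code>.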
 
The bit for XSAVE (needed to manage extended processor states) can be found on CPUID page 1, in ECX bit 26
=====AVX2=====
The bit for AVX2 can be found on CPUID page 7, subleaf 0, in EBX bit 5.
=====AVX-512=====
The XCR0 state bits needed for AVX-512 are reported as supported in CPUID page 0x0D, subleaf 0, EAX bits 5-7.
 
AVX-512 is split into separate features, which are detected in CPUID page 7, subleaf 0. Basic support is indicated by the AVX512F bit (AVX-512 Foundation), EBX bit 16. The other AVX-512 features are reported by the same CPUID function; the bits are listed [[w:CPUID|here]].
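The feature bits above can be collected into small predicates. This is a minimal sketch, assuming the caller has already executed CPUID and passes in the relevant register value; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID page 1: ECX bit 26 reports XSAVE. */
static inline bool has_xsave(uint32_t leaf1_ecx)
{
    return ((leaf1_ecx >> 26) & 1u) != 0;
}

/* CPUID page 7, subleaf 0: EBX bit 5 reports AVX2. */
static inline bool has_avx2(uint32_t leaf7_ebx)
{
    return ((leaf7_ebx >> 5) & 1u) != 0;
}

/* CPUID page 7, subleaf 0: EBX bit 16 reports AVX512F. */
static inline bool has_avx512f(uint32_t leaf7_ebx)
{
    return ((leaf7_ebx >> 16) & 1u) != 0;
}

/* CPUID page 0x0D, subleaf 0: EAX bits 5-7 must all be set before the
   matching XCR0 bits may be enabled. */
static inline bool xcr0_supports_avx512(uint32_t leaf0d_eax)
{
    return (leaf0d_eax & 0xE0u) == 0xE0u;
}
```

Taking register values as parameters keeps the predicates independent of how CPUID is invoked, whether through inline assembly or a compiler intrinsic.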
===X86_64===
When the [[X86-64]] architecture was introduced, AMD required a minimum level of SSE support to simplify OS code. Any system capable of long mode supports at least SSE and SSE2, so the kernel does not need to care about the old FPU save code.
Here is an example of assembly code enabling AVX after SSE has been enabled (you should check AVX and XSAVE are supported first, see above):
 
<syntaxhighlight lang="asm">
enable_avx:
push rax
; [...]
pop rax
ret
</syntaxhighlight>
 
To enable AVX-512, set the OPMASK (bit 5), ZMM_Hi256 (bit 6), and Hi16_ZMM (bit 7) bits of XCR0. You must ensure that these bits are valid first (see above).
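The XCR0 value for AVX-512 can be computed as shown in the following sketch. It assumes the XCR0 layout from the Intel SDM, in which bit 0 is x87 state, bit 1 is SSE state and bit 2 is AVX state, all of which must be set alongside the AVX-512 bits. The macro names and <code>xcr0_with_avx512</code> are hypothetical, and the <code>supported</code> argument is assumed to come from CPUID page 0x0D, subleaf 0.

```c
#include <stdint.h>

/* XCR0 bit positions as documented in the Intel SDM (assumption of this sketch). */
#define XCR0_X87        (UINT64_C(1) << 0)
#define XCR0_SSE        (UINT64_C(1) << 1)
#define XCR0_AVX        (UINT64_C(1) << 2)
#define XCR0_OPMASK     (UINT64_C(1) << 5)
#define XCR0_ZMM_HI256  (UINT64_C(1) << 6)
#define XCR0_HI16_ZMM   (UINT64_C(1) << 7)

/* Returns the XCR0 value with AVX-512 state enabled, or the unchanged
   value if the CPU does not report every required bit as supported. */
static inline uint64_t xcr0_with_avx512(uint64_t xcr0, uint64_t supported)
{
    const uint64_t wanted = XCR0_X87 | XCR0_SSE | XCR0_AVX |
                            XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM;

    if ((supported & wanted) != wanted)
        return xcr0;            /* leave XCR0 alone on older CPUs */
    return xcr0 | wanted;
}
```

The result would then be written with XSETBV (ECX = 0), in the same way as when enabling AVX.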