AVX2: Difference between revisions
Revision as of 23:32, 19 September 2022
Devc1, unfinished.
Advanced Vector Extensions
AVX (Advanced Vector Extensions) is a set of extensions to the x86 architecture introduced by Intel with the [SandyBridge] microarchitecture. AVX adds 86 instructions to the CPU instruction set and extends the 128-bit XMM registers to 256-bit YMM registers. Each XMM register aliases the lower half of the corresponding YMM register, so XMMx contains the low 128 bits of YMMx. The wider registers increase the amount of data moved per memory transfer and the number of floating-point values processed in parallel by a single instruction. Effective use of these extensions can substantially improve the performance of your program.
AVX2
AVX2 expands the AVX instruction set. It widens the SSE vector integer instructions to 256 bits and adds gather support, vector shifts and more.
 ; Available in AVX
 vmulpd ymm0, ymm0, ymm1 ; Floating-point computation: multiply the 4 packed doubles in YMM0 by those in YMM1 and store the result in YMM0
 vpaddb xmm0, xmm1, xmm2 ; Add the 16 byte integers in XMM1 to the byte integers in XMM2 and store the result in XMM0
 ; Requires AVX2
 vpaddb ymm0, ymm1, ymm2 ; Add the 32 byte integers in YMM1 to the byte integers in YMM2 and store the result in YMM0
Integer Arithmetic
To introduce AVX2, we should first look at how these registers work.
Let's take the previous instruction, VPADDB/W/D/Q, as an example.