X86-64 Instruction Encoding

From OSDev.wiki

Revision as of 23:30, 22 July 2008

This article describes how X86 and X86-64 instructions are encoded.

General Overview

An X86 instruction may be at most 15 bytes long. It consists of the following components in the given order, with the prefixes at the lowest memory address:

  • Legacy prefixes (optional):
    • Lock prefix (1 byte)
    • Repeat prefix (1 byte)
    • Segment override prefix (1 byte)
    • Operand-size override prefix (1 byte)
    • Address-size override prefix (1 byte)
  • REX prefix (1 byte, 64-bit only)
    • b0100 (4 bits)
    • W (1 bit)
    • R (1 bit)
    • X (1 bit)
    • B (1 bit)
  • Opcode (1, 2 or 3 bytes, required)
  • ModR/M (1 byte, if required)
    • Mod (2 bits)
    • Reg/Opcode (3 bits)
    • R/M (3 bits)
  • SIB (1 byte, if required)
    • Scale (2 bits)
    • Index (3 bits)
    • Base (3 bits)
  • Displacement (1, 2 or 4 bytes, if required)
  • Immediate (1, 2, 4 or 8 bytes, if required)

If an instruction exceeds the 15-byte length limit (for example, because of redundant prefixes), a General Protection Fault (#GP) occurs.
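The component list above can be turned into simple length arithmetic. The sketch below uses a hypothetical `struct insn_layout` (not taken from any real decoder) that records how many bytes each component occupies, and checks the total against the 15-byte limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of how many bytes each component of one instruction
   occupies; 0 means the component is absent. */
struct insn_layout {
    size_t legacy_prefixes; /* 0-4 */
    size_t rex;             /* 0 or 1, 64-bit mode only */
    size_t opcode;          /* 1, 2 or 3 */
    size_t modrm;           /* 0 or 1 */
    size_t sib;             /* 0 or 1 */
    size_t displacement;    /* 0, 1, 2 or 4 */
    size_t immediate;       /* 0, 1, 2, 4 or 8 */
};

#define MAX_INSN_LENGTH 15

/* Total encoded length: the components simply follow one another. */
static size_t insn_length(const struct insn_layout *l)
{
    assert(l->opcode >= 1 && l->opcode <= 3);
    assert(l->sib == 0 || l->modrm == 1); /* a SIB byte always follows a ModR/M byte */
    return l->legacy_prefixes + l->rex + l->opcode + l->modrm + l->sib +
           l->displacement + l->immediate;
}

/* False when the encoding exceeds the limit, which the CPU rejects with #GP. */
static bool insn_length_ok(const struct insn_layout *l)
{
    return insn_length(l) <= MAX_INSN_LENGTH;
}
```

For example, `mov rax, [rbx + rcx*4 + 0x10]` encodes as 48 8B 44 8B 10 (REX, opcode, ModR/M, SIB, disp8), so its layout is { 0, 1, 1, 1, 1, 1, 0 } and `insn_length` returns 5.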

Legacy Prefixes

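Each instruction can have up to four legacy prefixes, and it is only useful to include at most one prefix from each of the groups below. Some instructions use a prefix as a mandatory part of their encoding, in which case the prefix loses its original meaning (for example, PAUSE is encoded as F3 90). The following prefixes can be used: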

  • Prefix group 1
    • 0xF0: LOCK prefix
    • 0xF2: REPNE/REPNZ prefix
    • 0xF3: REP or REPE/REPZ prefix
  • Prefix group 2
    • 0x2E: CS segment override
    • 0x36: SS segment override
    • 0x3E: DS segment override
    • 0x26: ES segment override
    • 0x64: FS segment override
    • 0x65: GS segment override
    • 0x2E: Branch not taken
    • 0x3E: Branch taken
  • Prefix group 3
    • 0x66: Operand-size override prefix
  • Prefix group 4
    • 0x67: Address-size override prefix
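A decoder typically recognizes legacy prefixes by their byte values. This is a minimal sketch of that lookup, assuming the byte values listed above; the function names are hypothetical. It does not handle mandatory prefixes, which have to be resolved together with the opcode.

```c
#include <assert.h>
#include <stdint.h>

/* Legacy prefix group (1-4) of a byte, or 0 if the byte is not a legacy prefix. */
static int prefix_group(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:
        return 1;                     /* LOCK, REPNE/REPNZ, REP/REPE/REPZ */
    case 0x2E: case 0x36: case 0x3E:
    case 0x26: case 0x64: case 0x65:
        return 2;                     /* segment overrides (0x2E/0x3E also branch hints) */
    case 0x66:
        return 3;                     /* operand-size override */
    case 0x67:
        return 4;                     /* address-size override */
    default:
        return 0;
    }
}

/* Number of consecutive legacy prefix bytes at the start of p, looking at no
   more than len bytes (an instruction is at most 15 bytes long). */
static int count_legacy_prefixes(const uint8_t *p, int len)
{
    assert(len >= 0 && len <= 15);
    int n = 0;
    while (n < len && prefix_group(p[n]) != 0)
        n++;
    return n;
}
```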

LOCK prefix

With the LOCK prefix, certain read-modify-write instructions are executed atomically. The LOCK prefix can only be used with the following instructions, and only in forms whose destination operand is a memory operand; otherwise an Invalid Opcode Exception (#UD) occurs: ADC, ADD, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XADD, XCHG and XOR. XCHG with a memory operand is always locked, even without the prefix.

REPNE/REPNZ, REP and REPE/REPZ prefixes

The repeat prefixes cause string handling instructions to be repeated.

The REP prefix repeats the associated instruction up to CX times (ECX or RCX when the address size is 32 or 64 bits), decrementing the counter with every repetition. It can be used with the INS, LODS, MOVS, OUTS and STOS instructions.

REPE and REPZ are synonyms: they repeat the instruction until the counter reaches 0 or ZF is cleared (ZF = 0). They can be used with the CMPS, CMPSB, CMPSD, CMPSW, SCAS, SCASB, SCASD and SCASW instructions.

REPNE and REPNZ are also synonyms: they repeat the instruction until the counter reaches 0 or ZF is set (ZF = 1). They can be used with the CMPS, CMPSB, CMPSD, CMPSW, SCAS, SCASB, SCASD and SCASW instructions.
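The REPE/REPZ termination rule can be modelled in a few lines of C. This is a minimal sketch, assuming the direction flag is clear (addresses increase) and representing the counter (CX, ECX or RCX) as a `size_t`; `repe_cmpsb` is a hypothetical name, not a real API. REPNE/REPNZ is identical except that the loop stops when ZF becomes 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models REPE CMPSB with DF = 0. Compares the byte at si with the byte at di,
   advancing both, until *count reaches 0 or a mismatch clears ZF.
   Returns the final ZF; *count holds the remaining repetitions. */
static int repe_cmpsb(const uint8_t *si, const uint8_t *di, size_t *count, int zf)
{
    assert(si != NULL && di != NULL && count != NULL);
    while (*count != 0) {          /* the counter is tested before each iteration */
        zf = (*si++ == *di++);     /* CMPSB sets ZF when the two bytes are equal */
        (*count)--;
        if (!zf)                   /* REPE stops as soon as ZF is 0 */
            break;
    }
    return zf;                     /* with an initial count of 0, ZF is unchanged */
}
```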

CS, SS, DS, ES, FS and GS segment override prefixes

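Segment override prefixes select the segment used for a memory operand. By default the segment is implied by the instruction and addressing mode (SS for addresses based on BP/EBP or SP/ESP, DS otherwise); an override forces the use of the specified segment instead. Some accesses cannot be overridden, such as the ES:[DI] destination of string instructions and the implicit stack accesses of PUSH and POP.

In 64-bit mode, the CS, SS, DS and ES segment overrides are ignored; only the FS and GS overrides take effect.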

Branch taken/not taken prefixes

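{Stub} When placed before a conditional jump (Jcc), 0x2E hints that the branch is not taken and 0x3E that it is taken. Before other instructions, the same bytes act as the CS and DS segment overrides.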

Operand-size and address-size override prefix

The default operand size and address size can be overridden using these prefixes, as shown in the following table:

Operating mode           CS.D     0x66     0x67     REX.W  Operand size  Address size
Real mode                N/A      no       no       N/A    16-bit        16-bit
                         N/A      no       yes      N/A    16-bit        32-bit
                         N/A      yes      no       N/A    32-bit        16-bit
                         N/A      yes      yes      N/A    32-bit        32-bit
Virtual 8086 mode        N/A      no       no       N/A    16-bit        16-bit
                         N/A      no       yes      N/A    16-bit        32-bit
                         N/A      yes      no       N/A    32-bit        16-bit
                         N/A      yes      yes      N/A    32-bit        32-bit
Protected mode           0        no       no       N/A    16-bit        16-bit
                         0        no       yes      N/A    16-bit        32-bit
                         0        yes      no       N/A    32-bit        16-bit
                         0        yes      yes      N/A    32-bit        32-bit
                         1        no       no       N/A    32-bit        32-bit
                         1        no       yes      N/A    32-bit        16-bit
                         1        yes      no       N/A    16-bit        32-bit
                         1        yes      yes      N/A    16-bit        16-bit
Long compatibility mode  0        no       no       N/A    16-bit        16-bit
                         0        no       yes      N/A    16-bit        32-bit
                         0        yes      no       N/A    32-bit        16-bit
                         0        yes      yes      N/A    32-bit        32-bit
                         1        no       no       N/A    32-bit        32-bit
                         1        no       yes      N/A    32-bit        16-bit
                         1        yes      no       N/A    16-bit        32-bit
                         1        yes      yes      N/A    16-bit        16-bit
Long 64-bit mode         ignored  no       no       0      32-bit        64-bit
                         ignored  no       yes      0      32-bit (1)    32-bit
                         ignored  yes      no       0      16-bit        64-bit
                         ignored  yes      yes      0      16-bit        32-bit
                         ignored  ignored  no       1      64-bit (1)    64-bit
                         ignored  ignored  yes      1      64-bit        32-bit

CS.D: D bit of the CS segment descriptor. 0x66 / 0x67: whether the operand-size / address-size override prefix is present. REX.W: W bit of the REX prefix.

1: The following instructions default to (or are fixed at) 64-bit operands and do not need the REX prefix for this: CALL (near), ENTER, Jcc, JrCXZ, JMP (near), LEAVE, LGDT, LIDT, LLDT, LOOP, LOOPcc, LTR, MOV CR(n), MOV DR(n), POP reg/mem, POP reg, POP FS, POP GS, POPFQ, PUSH imm8, PUSH imm32, PUSH reg/mem, PUSH reg, PUSH FS, PUSH GS, PUSHFQ and RET (near).
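The protected-mode, compatibility-mode and 64-bit rows of the table above follow a simple rule: outside 64-bit mode each prefix toggles its size between 16 and 32 bits relative to the CS D bit, while in 64-bit mode REX.W takes precedence over 0x66 and 0x67 selects 32-bit addressing. A minimal sketch with hypothetical names; real and virtual 8086 mode and the footnoted instructions that default to 64-bit operands are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

struct sizes {
    int operand; /* bits */
    int address; /* bits */
};

/* Effective operand and address size. long64 selects 64-bit mode; otherwise
   protected or compatibility mode is assumed, with cs_d the D bit of CS. */
static struct sizes effective_sizes(bool long64, bool cs_d, bool has66, bool has67, bool rex_w)
{
    struct sizes s;
    if (long64) {
        s.operand = rex_w ? 64 : (has66 ? 16 : 32); /* REX.W wins over 0x66 */
        s.address = has67 ? 32 : 64;                 /* CS.D is ignored */
        return s;
    }
    assert(!rex_w);                                  /* REX exists only in 64-bit mode */
    s.operand = (cs_d != has66) ? 32 : 16;           /* 0x66 toggles the default */
    s.address = (cs_d != has67) ? 32 : 16;           /* 0x67 toggles the default */
    return s;
}
```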

REX prefix

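The REX prefix enables 64-bit-specific features: a 64-bit operand size and access to the additional registers R8-R15 (and XMM8-XMM15). It is only available in 64-bit mode, where the bytes 0x40-0x4F act as REX prefixes instead of the one-byte INC and DEC instructions. A REX prefix must immediately precede the opcode (or the 0x0F escape byte), after any legacy prefixes. The layout is as follows:

 7  6  5  4  3  2  1  0
+--+--+--+--+--+--+--+--+
|   b0100   | W| R| X| B|
+--+--+--+--+--+--+--+--+

The REX prefix starts (bits 7-4) with the binary value b0100.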

REX.W

When this 1-bit value is 0, the default operand size is used (which is 32-bit for most instructions; see the operand-size table above). When it is 1, a 64-bit operand size is used.

REX.R

This 1-bit value is an extension to the reg field of the ModR/M byte. See 64-bit addressing.

REX.X

This 1-bit value is an extension to the index field of the SIB byte. See 64-bit addressing.

REX.B

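This 1-bit value is an extension to the rm field of the ModR/M byte, the base field of the SIB byte, or the register field encoded in the low three bits of the opcode byte (as in PUSH r64 or MOV r64, imm64). See 64-bit addressing.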
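Building and recognizing a REX byte is plain bit arithmetic on the layout above. A minimal sketch with hypothetical function names: `make_rex` packs the four flag bits, `is_rex` tests the b0100 pattern (only meaningful in 64-bit mode) and `extend_reg` shows how REX.R, REX.X or REX.B becomes the fourth, most significant bit of a register number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Packs W, R, X and B (each 0 or 1) below the fixed b0100 pattern. */
static uint8_t make_rex(int w, int r, int x, int b)
{
    assert(w >= 0 && w <= 1 && r >= 0 && r <= 1);
    assert(x >= 0 && x <= 1 && b >= 0 && b <= 1);
    return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}

/* In 64-bit mode, every byte from 0x40 to 0x4F is a REX prefix. */
static bool is_rex(uint8_t byte)
{
    return (byte & 0xF0) == 0x40;
}

/* Prepends a REX extension bit to a 3-bit ModR/M, SIB or opcode register
   field, giving a register number from 0 (RAX) to 15 (R15). */
static int extend_reg(int rex_bit, int field)
{
    assert(rex_bit >= 0 && rex_bit <= 1 && field >= 0 && field <= 7);
    return (rex_bit << 3) | field;
}
```

For instance, 0x48 (only REX.W set) is the prefix that turns 8B /r into a 64-bit MOV.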

Opcode

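Opcodes are 1, 2 or 3 bytes long. Two-byte opcodes start with the escape byte 0x0F, and three-byte opcodes start with 0x0F 0x38 or 0x0F 0x3A. Some instructions also use a byte that normally acts as a prefix (0x66, 0xF2 or 0xF3) as a mandatory part of their opcode, in which case the byte does not carry its usual prefix meaning.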
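Once the prefixes have been skipped, the escape bytes alone tell how long the opcode is. A minimal sketch with a hypothetical function name; it only considers the 0x0F, 0x0F 0x38 and 0x0F 0x3A escapes and ignores mandatory prefixes and other extensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of opcode bytes starting at p (prefixes already skipped);
   avail is how many bytes are readable at p. */
static int opcode_bytes(const uint8_t *p, size_t avail)
{
    assert(p != NULL && avail >= 1);
    if (p[0] != 0x0F)
        return 1;                              /* one-byte opcode */
    assert(avail >= 2);
    if (p[1] == 0x38 || p[1] == 0x3A)
        return 3;                              /* 0x0F 0x38 xx or 0x0F 0x3A xx */
    return 2;                                  /* 0x0F xx */
}
```

For example, NOP (90) is a one-byte opcode, CPUID (0F A2) a two-byte opcode and PSHUFB (0F 38 00 /r) a three-byte opcode.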

ModR/M and SIB bytes

The ModR/M and SIB bytes are used to encode the register and memory operands of an instruction.

ModR/M

The ModR/M byte encodes a register or an opcode extension, and a register or a memory address. It is only present when /r or /0 through /7 appears in the instruction's opcode description. It has the following fields:

 7  6  5  4  3  2  1  0
+--+--+--+--+--+--+--+--+
| mod |  reg   |   rm   |
+--+--+--+--+--+--+--+--+
  • mod (2 bits)
  • reg (3 bits)
  • rm (3 bits)

The mod and rm fields are explained in the addressing tables below. The reg field has one of two meanings:

  • A 3-bit opcode extension, which carries no operand information and only distinguishes the instruction from others that share the same opcode byte. When /0 through /7 appears in the opcode description, use that value as the value of the reg field.
  • A 3-bit register reference, which can be the source or the destination of the instruction (depending on the instruction). In this case /r appears in the opcode description. Which register is meant depends on the operand size of the instruction and the instruction itself:
    • b000 - AL, AX, EAX, MM0 or XMM0
    • b001 - CL, CX, ECX, MM1 or XMM1
    • b010 - DL, DX, EDX, MM2 or XMM2
    • b011 - BL, BX, EBX, MM3 or XMM3
    • b100 - AH, SP, ESP, MM4 or XMM4
    • b101 - CH, BP, EBP, MM5 or XMM5
    • b110 - DH, SI, ESI, MM6 or XMM6
    • b111 - BH, DI, EDI, MM7 or XMM7
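Packing and unpacking the ModR/M fields is a matter of shifts and masks. A minimal sketch with hypothetical helper names. As a check, ADD r/m32, r32 has opcode 01 /r, so `add ecx, eax` puts EAX (b000) in reg and ECX (b001) in rm with mod = b11, giving the bytes 01 C1.

```c
#include <assert.h>
#include <stdint.h>

/* mod goes to bits 7-6, reg to bits 5-3 and rm to bits 2-0. */
static uint8_t make_modrm(int mod, int reg, int rm)
{
    assert(mod >= 0 && mod <= 3);
    assert(reg >= 0 && reg <= 7 && rm >= 0 && rm <= 7);
    return (uint8_t)((mod << 6) | (reg << 3) | rm);
}

/* Field extraction, the inverse of make_modrm. */
static int modrm_mod(uint8_t m) { return m >> 6; }
static int modrm_reg(uint8_t m) { return (m >> 3) & 7; }
static int modrm_rm(uint8_t m)  { return m & 7; }
```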

SIB

The SIB byte has the following fields:

  • Scale (2 bits)
  • Index (3 bits)
  • Base (3 bits)
The scale field gives the scaling factor s used in the tables, where $s = 2^{\text{scale}}$: scale = 0 means s = 1, scale = 1 means s = 2, scale = 2 means s = 4 and scale = 3 means s = 8.
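The SIB byte is packed the same way as ModR/M, and applying the scaling factor is a left shift by the scale field. A minimal sketch with hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* scale goes to bits 7-6, index to bits 5-3 and base to bits 2-0. */
static uint8_t make_sib(int scale, int index, int base)
{
    assert(scale >= 0 && scale <= 3);
    assert(index >= 0 && index <= 7 && base >= 0 && base <= 7);
    return (uint8_t)((scale << 6) | (index << 3) | base);
}

/* The scaling factor s = 2^scale: 1, 2, 4 or 8. */
static int sib_scale_factor(uint8_t sib)
{
    return 1 << (sib >> 6);
}
```

For example, in 32-bit mode `mov eax, [ebx + ecx*4 + 0x10]` (MOV r32, r/m32 is 8B /r) encodes as 8B 44 8B 10: ModR/M 0x44 (mod = b01, reg = EAX, rm = b100, meaning a SIB byte follows), SIB 0x8B (scale = b10, index = ECX, base = EBX) and the disp8 0x10.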

16-bit addressing

These are the meanings of the Mod and R/M fields for 16-bit addressing, with R/M listed vertically and Mod horizontally. (The SIB byte is not used in 16-bit addressing.)

R/M    Mod=b00      Mod=b01             Mod=b10              Mod=b11
b000   DS:[BX+SI]   DS:[BX+SI]+disp8    DS:[BX+SI]+disp16    AL, AX
b001   DS:[BX+DI]   DS:[BX+DI]+disp8    DS:[BX+DI]+disp16    CL, CX
b010   SS:[BP+SI]   SS:[BP+SI]+disp8    SS:[BP+SI]+disp16    DL, DX
b011   SS:[BP+DI]   SS:[BP+DI]+disp8    SS:[BP+DI]+disp16    BL, BX
b100   DS:[SI]      DS:[SI]+disp8       DS:[SI]+disp16       AH, SP
b101   DS:[DI]      DS:[DI]+disp8       DS:[DI]+disp16       CH, BP
b110   DS:disp16    SS:[BP]+disp8       SS:[BP]+disp16       DH, SI
b111   DS:[BX]      DS:[BX]+disp8       DS:[BX]+disp16       BH, DI
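The memory rows of the 16-bit table reduce to "base + index + displacement" with one special case (mod = b00, R/M = b110). A minimal sketch with a hypothetical function name and register indices; it computes only the 16-bit offset and leaves out the default segment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical indices into the register array passed to ea16. */
enum { R_BX, R_BP, R_SI, R_DI };

/* Offset selected by a 16-bit ModR/M memory operand (mod b00-b10).
   regs holds BX, BP, SI and DI; disp is the displacement as a 16-bit value
   (a disp8 must already be sign-extended). The default segment (DS, or SS for
   BP-based forms) is not applied here. */
static uint16_t ea16(int mod, int rm, const uint16_t regs[4], uint16_t disp)
{
    static const int base[8] = { R_BX, R_BX, R_BP, R_BP, -1,   -1,   R_BP, R_BX };
    static const int idx[8]  = { R_SI, R_DI, R_SI, R_DI, R_SI, R_DI, -1,   -1   };
    assert(mod >= 0 && mod <= 2 && rm >= 0 && rm <= 7);
    if (mod == 0 && rm == 6)
        return disp;                          /* disp16 alone, no register */
    uint16_t ea = (mod == 0) ? 0 : disp;      /* mod b00 has no displacement */
    if (base[rm] >= 0)
        ea += regs[base[rm]];
    if (idx[rm] >= 0)
        ea += regs[idx[rm]];
    return ea;                                /* the sum wraps around at 64 KiB */
}
```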

32-bit addressing

These are the meanings of the Mod and R/M fields for 32-bit addressing, with R/M listed vertically and Mod horizontally:

R/M    Mod=b00   Mod=b01       Mod=b10        Mod=b11
b000   [EAX]     [EAX]+disp8   [EAX]+disp32   EAX, MM0, XMM0
b001   [ECX]     [ECX]+disp8   [ECX]+disp32   ECX, MM1, XMM1
b010   [EDX]     [EDX]+disp8   [EDX]+disp32   EDX, MM2, XMM2
b011   [EBX]     [EBX]+disp8   [EBX]+disp32   EBX, MM3, XMM3
b100   SIB       SIB+disp8     SIB+disp32     ESP, MM4, XMM4
b101   disp32    [EBP]+disp8   [EBP]+disp32   EBP, MM5, XMM5
b110   [ESI]     [ESI]+disp8   [ESI]+disp32   ESI, MM6, XMM6
b111   [EDI]     [EDI]+disp8   [EDI]+disp32   EDI, MM7, XMM7
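A decoder walking an instruction needs to know, from the 32-bit table, whether a SIB byte and how many displacement bytes follow the ModR/M byte. A minimal sketch with hypothetical names; it also covers the one SIB-dependent case, a base field of b101 with mod = b00, which the 32-bit SIB table maps to disp32 with no base register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What follows a ModR/M byte when 32-bit addressing is used. */
struct follow32 {
    bool has_sib;   /* a SIB byte comes next */
    int disp_bytes; /* 0, 1 or 4 */
};

/* sib is only examined when the ModR/M byte says a SIB byte follows. */
static struct follow32 modrm32_follow(uint8_t modrm, uint8_t sib)
{
    int mod = modrm >> 6, rm = modrm & 7;
    struct follow32 f = { false, 0 };
    if (mod == 3)
        return f;                                 /* register operand: nothing follows */
    f.has_sib = (rm == 4);
    if (mod == 1)
        f.disp_bytes = 1;                         /* disp8 */
    else if (mod == 2)
        f.disp_bytes = 4;                         /* disp32 */
    else if (rm == 5 || (f.has_sib && (sib & 7) == 5))
        f.disp_bytes = 4;                         /* mod b00: disp32 without a base register */
    assert(f.disp_bytes == 0 || f.disp_bytes == 1 || f.disp_bytes == 4);
    return f;
}
```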

32-bit SIB byte

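The meaning of the SIB byte with 32-bit addressing is as follows. The ModR/M byte's Mod field and the SIB byte's index field are used vertically, the SIB byte's base field horizontally; the Mod value is given only on the first row of each group of eight. s is the scaling factor, and an entry such as [EAX] + ([ECX] * s) denotes the address EAX + ECX * s.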

Mod Index, Base= b000 b001 b010 b011 b100 b101 b110 b111
b00 b000 [EAX] + ([EAX] * s) [ECX] + ([EAX] * s) [EDX] + ([EAX] * s) [EBX] + ([EAX] * s) [ESP] + ([EAX] * s) ([EAX] * s) + disp32 [ESI] + ([EAX] * s) [EDI] + ([EAX] * s)
b001 [EAX] + ([ECX] * s) [ECX] + ([ECX] * s) [EDX] + ([ECX] * s) [EBX] + ([ECX] * s) [ESP] + ([ECX] * s) ([ECX] * s) + disp32 [ESI] + ([ECX] * s) [EDI] + ([ECX] * s)
b010 [EAX] + ([EDX] * s) [ECX] + ([EDX] * s) [EDX] + ([EDX] * s) [EBX] + ([EDX] * s) [ESP] + ([EDX] * s) ([EDX] * s) + disp32 [ESI] + ([EDX] * s) [EDI] + ([EDX] * s)
b011 [EAX] + ([EBX] * s) [ECX] + ([EBX] * s) [EDX] + ([EBX] * s) [EBX] + ([EBX] * s) [ESP] + ([EBX] * s) ([EBX] * s) + disp32 [ESI] + ([EBX] * s) [EDI] + ([EBX] * s)
b100 [EAX] [ECX] [EDX] [EBX] [ESP] disp32 [ESI] [EDI]
b101 [EAX] + ([EBP] * s) [ECX] + ([EBP] * s) [EDX] + ([EBP] * s) [EBX] + ([EBP] * s) [ESP] + ([EBP] * s) ([EBP] * s) + disp32 [ESI] + ([EBP] * s) [EDI] + ([EBP] * s)
b110 [EAX] + ([ESI] * s) [ECX] + ([ESI] * s) [EDX] + ([ESI] * s) [EBX] + ([ESI] * s) [ESP] + ([ESI] * s) ([ESI] * s) + disp32 [ESI] + ([ESI] * s) [EDI] + ([ESI] * s)
b111 [EAX] + ([EDI] * s) [ECX] + ([EDI] * s) [EDX] + ([EDI] * s) [EBX] + ([EDI] * s) [ESP] + ([EDI] * s) ([EDI] * s) + disp32 [ESI] + ([EDI] * s) [EDI] + ([EDI] * s)
b01 b000 [EAX] + ([EAX] * s) + disp8 [ECX] + ([EAX] * s) + disp8 [EDX] + ([EAX] * s) + disp8 [EBX] + ([EAX] * s) + disp8 [ESP] + ([EAX] * s) + disp8 [EBP] + ([EAX] * s) + disp8 [ESI] + ([EAX] * s) + disp8 [EDI] + ([EAX] * s) + disp8
b001 [EAX] + ([ECX] * s) + disp8 [ECX] + ([ECX] * s) + disp8 [EDX] + ([ECX] * s) + disp8 [EBX] + ([ECX] * s) + disp8 [ESP] + ([ECX] * s) + disp8 [EBP] + ([ECX] * s) + disp8 [ESI] + ([ECX] * s) + disp8 [EDI] + ([ECX] * s) + disp8
b010 [EAX] + ([EDX] * s) + disp8 [ECX] + ([EDX] * s) + disp8 [EDX] + ([EDX] * s) + disp8 [EBX] + ([EDX] * s) + disp8 [ESP] + ([EDX] * s) + disp8 [EBP] + ([EDX] * s) + disp8 [ESI] + ([EDX] * s) + disp8 [EDI] + ([EDX] * s) + disp8
b011 [EAX] + ([EBX] * s) + disp8 [ECX] + ([EBX] * s) + disp8 [EDX] + ([EBX] * s) + disp8 [EBX] + ([EBX] * s) + disp8 [ESP] + ([EBX] * s) + disp8 [EBP] + ([EBX] * s) + disp8 [ESI] + ([EBX] * s) + disp8 [EDI] + ([EBX] * s) + disp8
b100 [EAX] + disp8 [ECX] + disp8 [EDX] + disp8 [EBX] + disp8 [ESP] + disp8 [EBP] + disp8 [ESI] + disp8 [EDI] + disp8
b101 [EAX] + ([EBP] * s) + disp8 [ECX] + ([EBP] * s) + disp8 [EDX] + ([EBP] * s) + disp8 [EBX] + ([EBP] * s) + disp8 [ESP] + ([EBP] * s) + disp8 [EBP] + ([EBP] * s) + disp8 [ESI] + ([EBP] * s) + disp8 [EDI] + ([EBP] * s) + disp8
b110 [EAX] + ([ESI] * s) + disp8 [ECX] + ([ESI] * s) + disp8 [EDX] + ([ESI] * s) + disp8 [EBX] + ([ESI] * s) + disp8 [ESP] + ([ESI] * s) + disp8 [EBP] + ([ESI] * s) + disp8 [ESI] + ([ESI] * s) + disp8 [EDI] + ([ESI] * s) + disp8
b111 [EAX] + ([EDI] * s) + disp8 [ECX] + ([EDI] * s) + disp8 [EDX] + ([EDI] * s) + disp8 [EBX] + ([EDI] * s) + disp8 [ESP] + ([EDI] * s) + disp8 [EBP] + ([EDI] * s) + disp8 [ESI] + ([EDI] * s) + disp8 [EDI] + ([EDI] * s) + disp8
b10 b000 [EAX] + ([EAX] * s) + disp32 [ECX] + ([EAX] * s) + disp32 [EDX] + ([EAX] * s) + disp32 [EBX] + ([EAX] * s) + disp32 [ESP] + ([EAX] * s) + disp32 [EBP] + ([EAX] * s) + disp32 [ESI] + ([EAX] * s) + disp32 [EDI] + ([EAX] * s) + disp32
b001 [EAX] + ([ECX] * s) + disp32 [ECX] + ([ECX] * s) + disp32 [EDX] + ([ECX] * s) + disp32 [EBX] + ([ECX] * s) + disp32 [ESP] + ([ECX] * s) + disp32 [EBP] + ([ECX] * s) + disp32 [ESI] + ([ECX] * s) + disp32 [EDI] + ([ECX] * s) + disp32
b010 [EAX] + ([EDX] * s) + disp32 [ECX] + ([EDX] * s) + disp32 [EDX] + ([EDX] * s) + disp32 [EBX] + ([EDX] * s) + disp32 [ESP] + ([EDX] * s) + disp32 [EBP] + ([EDX] * s) + disp32 [ESI] + ([EDX] * s) + disp32 [EDI] + ([EDX] * s) + disp32
b011 [EAX] + ([EBX] * s) + disp32 [ECX] + ([EBX] * s) + disp32 [EDX] + ([EBX] * s) + disp32 [EBX] + ([EBX] * s) + disp32 [ESP] + ([EBX] * s) + disp32 [EBP] + ([EBX] * s) + disp32 [ESI] + ([EBX] * s) + disp32 [EDI] + ([EBX] * s) + disp32
b100 [EAX] + disp32 [ECX] + disp32 [EDX] + disp32 [EBX] + disp32 [ESP] + disp32 [EBP] + disp32 [ESI] + disp32 [EDI] + disp32
b101 [EAX] + ([EBP] * s) + disp32 [ECX] + ([EBP] * s) + disp32 [EDX] + ([EBP] * s) + disp32 [EBX] + ([EBP] * s) + disp32 [ESP] + ([EBP] * s) + disp32 [EBP] + ([EBP] * s) + disp32 [ESI] + ([EBP] * s) + disp32 [EDI] + ([EBP] * s) + disp32
b110 [EAX] + ([ESI] * s) + disp32 [ECX] + ([ESI] * s) + disp32 [EDX] + ([ESI] * s) + disp32 [EBX] + ([ESI] * s) + disp32 [ESP] + ([ESI] * s) + disp32 [EBP] + ([ESI] * s) + disp32 [ESI] + ([ESI] * s) + disp32 [EDI] + ([ESI] * s) + disp32
b111 [EAX] + ([EDI] * s) + disp32 [ECX] + ([EDI] * s) + disp32 [EDX] + ([EDI] * s) + disp32 [EBX] + ([EDI] * s) + disp32 [ESP] + ([EDI] * s) + disp32 [EBP] + ([EDI] * s) + disp32 [ESI] + ([EDI] * s) + disp32 [EDI] + ([EDI] * s) + disp32
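The 32-bit ModR/M and SIB tables collapse into a short effective-address routine. A minimal sketch with a hypothetical function name and register array (indexed by the 3-bit register numbers EAX = 0 through EDI = 7); segment bases are ignored, and the caller is assumed to have sign-extended a disp8 to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of a 32-bit ModR/M memory operand (mod b00-b10).
   regs[0..7] hold EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI; sib is only read
   when rm == b100; disp is ignored where the tables list no displacement.
   Arithmetic wraps modulo 2^32. */
static uint32_t ea32(int mod, int rm, uint8_t sib, const uint32_t regs[8], uint32_t disp)
{
    assert(mod >= 0 && mod <= 2 && rm >= 0 && rm <= 7);
    uint32_t ea = (mod == 0) ? 0 : disp;       /* mod b01/b10 add disp8/disp32 */
    if (rm != 4) {                             /* no SIB byte */
        if (mod == 0 && rm == 5)
            return disp;                       /* disp32 only */
        return ea + regs[rm];
    }
    int scale = sib >> 6, idx = (sib >> 3) & 7, base = sib & 7;
    if (idx != 4)                              /* index b100: no index register */
        ea += regs[idx] << scale;              /* index * s, with s = 2^scale */
    if (mod == 0 && base == 5)
        ea += disp;                            /* no base register, disp32 instead */
    else
        ea += regs[base];
    return ea;
}
```

For example, mod = b01, rm = b100 and sib = 0x8B select EBX + ECX * 4 + disp8.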

64-bit addressing

{Stub}

Displacement

The displacement value, if any, follows the ModR/M and SIB bytes discussed above. When the ModR/M or SIB tables or the instruction's documentation call for a disp8, this means a 1-byte displacement; disp16 is a 2-byte displacement and disp32 a 4-byte displacement. Displacements are stored little-endian (least significant byte first), and a disp8 is sign-extended before it is added, so it covers offsets from -128 to +127.
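Reading a displacement from the instruction stream means assembling little-endian bytes and sign-extending the result. A minimal sketch with a hypothetical function name; it sign-extends every size to 32 bits and relies on the two's-complement conversion of out-of-range values to signed types that mainstream compilers perform (implementation-defined before C23).

```c
#include <assert.h>
#include <stdint.h>

/* Reads a 1-, 2- or 4-byte little-endian displacement at p and sign-extends it. */
static int32_t read_disp(const uint8_t *p, int size)
{
    switch (size) {
    case 1:
        return (int8_t)p[0];
    case 2:
        return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
    case 4:
        return (int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
    default:
        assert(!"displacement size must be 1, 2 or 4");
        return 0;
    }
}
```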

Immediate

Some instructions require an immediate value. The instruction (and the operand-size column in the table above) determines the length of the immediate value: imm8 (or an 8-bit operand size) means a 1-byte immediate, imm16 (or a 16-bit operand size) a 2-byte immediate, imm32 (or a 32-bit operand size) a 4-byte immediate and imm64 an 8-byte immediate. With a 64-bit operand size, most instructions still take a 4-byte immediate that is sign-extended to 64 bits; only a few, such as MOV r64, imm64 (REX.W + B8+r), take a full 8-byte immediate. Like displacements, immediates are stored little-endian.
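As a small encoding example, MOV r32, imm32 (opcode B8+rd) has no ModR/M byte: the register number is added to the opcode byte and the 4-byte immediate follows, least significant byte first. A minimal sketch with a hypothetical function name, covering only this one instruction form.

```c
#include <assert.h>
#include <stdint.h>

/* Encodes MOV r32, imm32 for register number reg (0 = EAX ... 7 = EDI).
   Writes 5 bytes to out and returns the instruction length. */
static int encode_mov_r32_imm32(int reg, uint32_t imm, uint8_t out[5])
{
    assert(reg >= 0 && reg <= 7);
    out[0] = (uint8_t)(0xB8 + reg);              /* register is part of the opcode */
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(imm >> (8 * i));  /* little-endian imm32 */
    return 5;
}
```

So `mov eax, 0x12345678` encodes as B8 78 56 34 12. With REX.W in front (48 B8+rd), the same opcode takes an 8-byte immediate instead.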

See Also

External References