X86-64 Instruction Encoding: Difference between revisions

Revision as of 23:30, 22 July 2008

This article describes how X86 and X86-64 instructions are encoded.

General Overview

An X86 instruction may be at most 15 bytes in length. It consists of the following components in the given order, where the prefixes are at the least-significant (lowest) address in memory:

Legacy prefixes (optional):
- Lock prefix (1 byte)
- Repeat prefix (1 byte)
- Segment override prefix (1 byte)
- Operand-size override prefix (1 byte)
- Address-size override prefix (1 byte)
REX prefix (1 byte, 64-bit only)
- b0100 (4 bits)
- W (1 bit)
- R (1 bit)
- X (1 bit)
- B (1 bit)
Opcode (1, 2 or 3 bytes, required)
ModR/M (1 byte, if required)
- Mod (2 bits)
- Reg/Opcode (3 bits)
- R/M (3 bits)
SIB (1 byte, if required)
- Scale (2 bits)
- Index (3 bits)
- Base (3 bits)
Displacement (1, 2 or 4 bytes, if required)
Immediate (1, 2, 4 or 8 bytes, if required)

When the 15-byte instruction length limit is exceeded (for example by using redundant prefixes), a General Protection Fault (#GP) occurs.

Legacy Prefixes

Each instruction can have up to four legacy prefixes, one from each of the groups below. Some instructions require a particular prefix as part of their encoding, in which case the prefix loses its original meaning. The following prefixes can be used:

Prefix group 1
- 0xF0: LOCK prefix
- 0xF2: REPNE/REPNZ prefix
- 0xF3: REP or REPE/REPZ prefix
Prefix group 2
- 0x2E: CS segment override
- 0x36: SS segment override
- 0x3E: DS segment override
- 0x26: ES segment override
- 0x64: FS segment override
- 0x65: GS segment override
- 0x2E: Branch not taken
- 0x3E: Branch taken
Prefix group 3
- 0x66: Operand-size override prefix
Prefix group 4
- 0x67: Address-size override prefix
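As a rough illustration of the grouping above, the following Python sketch collects the legacy prefix bytes at the start of an instruction. The names `PREFIX_GROUPS` and `split_legacy_prefixes` are invented for this example. It does not check that at most one prefix per group is present, and it knows nothing about REX or about prefixes that are a required part of an opcode.

```python
# Legacy prefix bytes, keyed to the group numbers used in the list above.
PREFIX_GROUPS = {
    0xF0: 1, 0xF2: 1, 0xF3: 1,
    0x2E: 2, 0x36: 2, 0x3E: 2, 0x26: 2, 0x64: 2, 0x65: 2,
    0x66: 3,
    0x67: 4,
}


def split_legacy_prefixes(code: bytes):
    """Return (prefixes, rest): the leading legacy prefix bytes and what follows."""
    i = 0
    while i < len(code) and code[i] in PREFIX_GROUPS:
        i += 1
    return code[:i], code[i:]


# 66 F3 A5 is REP MOVSW in 32-bit code:
# operand-size override, REP, then the MOVS opcode byte.
prefixes, rest = split_legacy_prefixes(bytes([0x66, 0xF3, 0xA5]))
print([hex(p) for p in prefixes], rest.hex())
```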

LOCK prefix

With the LOCK prefix, certain read-modify-write instructions are executed atomically. The LOCK prefix can only be used with the following instructions, and only when the destination operand is a memory operand; otherwise an Invalid Opcode Exception occurs: ADC, ADD, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XADD, XCHG and XOR.
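An assembler or disassembler might check this rule with a simple lookup. The sketch below is only an illustration: `LOCKABLE` and `lock_is_valid` are hypothetical names, and the `dest_is_memory` flag assumes the caller has already worked out whether the destination operand is a memory operand (LOCK with a register destination is invalid as well).

```python
# Mnemonics listed above as accepting the LOCK prefix.
LOCKABLE = frozenset({
    "ADC", "ADD", "AND", "BTC", "BTR", "BTS", "CMPXCHG", "CMPXCHG8B",
    "CMPXCHG16B", "DEC", "INC", "NEG", "NOT", "OR", "SBB", "SUB",
    "XADD", "XCHG", "XOR",
})


def lock_is_valid(mnemonic: str, dest_is_memory: bool) -> bool:
    """True if LOCK may prefix this instruction; otherwise expect Invalid Opcode."""
    return dest_is_memory and mnemonic.upper() in LOCKABLE


print(lock_is_valid("xadd", True))   # lockable instruction with a memory destination
print(lock_is_valid("mov", True))    # MOV is not in the list
```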

REPNE/REPNZ, REP and REPE/REPZ prefixes

The repeat prefixes cause string handling instructions to be repeated.

The REP prefix repeats the associated instruction up to CX times, decrementing CX after every repetition (the count register is CX, ECX or RCX, depending on the address size). It can be used with the INS, LODS, MOVS, OUTS and STOS instructions.

REPE and REPZ are synonyms. They repeat the instruction until CX reaches 0 or ZF is cleared to 0. They can be used with the CMPS, CMPSB, CMPSD, CMPSW, SCAS, SCASB, SCASD and SCASW instructions.

REPNE and REPNZ are also synonyms. They repeat the instruction until CX reaches 0 or ZF is set to 1. They can be used with the CMPS, CMPSB, CMPSD, CMPSW, SCAS, SCASB, SCASD and SCASW instructions.
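The termination rules are easier to see in a small model. The Python sketch below imitates REPNE SCASB under simplifying assumptions: the direction flag is clear (so DI counts upward), segments are ignored, and memory is just a `bytes` object. The function name and its parameters are invented for this example.

```python
def repne_scasb(memory: bytes, di: int, cx: int, al: int, zf: int = 0):
    """Model REPNE SCASB with DF=0; return (di, cx, zf) after the loop."""
    while cx != 0:                          # nothing executes when the count is 0
        zf = 1 if memory[di] == al else 0   # SCASB compares AL with the byte at [DI]
        di += 1                             # DF=0, so DI advances
        cx -= 1                             # the count drops on every repetition
        if zf == 1:                         # REPNE stops as soon as ZF is set
            break
    return di, cx, zf


# Search for the terminating zero byte, as a strlen-style loop would.
print(repne_scasb(b"hello\x00world", di=0, cx=11, al=0))
```

Swapping the stop condition to `zf == 0` gives the REPE/REPZ behaviour.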

CS, SS, DS, ES, FS and GS segment override prefixes

Segment overrides are used with instructions that reference non-stack memory. The default segment is implied by the instruction, and using a specific override forces the use of the specified segment for memory operands.

In 64-bit mode, the CS, SS, DS and ES segment overrides are ignored.

Branch taken/not taken prefixes

{Stub} Branch prediction prefixes.

Operand-size and address-size override prefix

The default operand-size and address-size can be overridden using these prefixes. See the following table:

Operating mode	CS segment D-bit	0x66 operand-size override prefix	0x67 address-size override prefix	REX prefix W-bit	Operand-size	Address-size
Real mode	N/A	N/A	N/A	N/A	16-bit	16-bit
Virtual 8086 mode	N/A	N/A	N/A	N/A	16-bit	16-bit
Protected mode	0	no	no	N/A	16-bit	16-bit
	0	no	yes	N/A	16-bit	32-bit
	0	yes	no	N/A	32-bit	16-bit
	0	yes	yes	N/A	32-bit	32-bit
	1	no	no	N/A	32-bit	32-bit
	1	no	yes	N/A	32-bit	16-bit
	1	yes	no	N/A	16-bit	32-bit
	1	yes	yes	N/A	16-bit	16-bit
Long compatibility mode	0	no	no	N/A	16-bit	16-bit
	0	no	yes	N/A	16-bit	32-bit
	0	yes	no	N/A	32-bit	16-bit
	0	yes	yes	N/A	32-bit	32-bit
	1	no	no	N/A	32-bit	32-bit
	1	no	yes	N/A	32-bit	16-bit
	1	yes	no	N/A	16-bit	32-bit
	1	yes	yes	N/A	16-bit	16-bit
Long 64-bit mode	ignored	no	no	0	32-bit	64-bit
	ignored	no	yes	0	32-bit [1]	32-bit
	ignored	yes	no	0	16-bit	64-bit
	ignored	yes	yes	0	16-bit	32-bit
	ignored	ignored	no	1	64-bit [1]	64-bit
	ignored	ignored	yes	1	64-bit	32-bit

1: The following instructions default to (or are fixed at) 64-bit operands and do not need the REX prefix for this: CALL (near), ENTER, Jcc, JrCXZ, JMP (near), LEAVE, LGDT, LIDT, LLDT, LOOP, LOOPcc, LTR, MOV CR(n), MOV DR(n), POP reg/mem, POP reg, POP FS, POP GS, POPFQ, PUSH imm8, PUSH imm32, PUSH reg/mem, PUSH reg, PUSH FS, PUSH GS, PUSHFQ and RET (near).
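The protected-mode, compatibility-mode and 64-bit-mode rows of the table can be condensed into a few lines of Python. This is a sketch, not a reference implementation: the mode strings `"protected"`, `"compat"` and `"long64"` and the function name are made up for the example, real and virtual 8086 mode are left out, and the instructions from note 1 that default to 64-bit operands are not modelled.

```python
def effective_sizes(mode: str, d_bit: int = 0, op_override: bool = False,
                    addr_override: bool = False, rex_w: int = 0):
    """Return (operand_size, address_size) in bits for the table rows above."""
    if mode in ("protected", "compat"):
        default = 32 if d_bit else 16      # the CS D-bit picks the default size
        other = 16 if d_bit else 32        # each override prefix selects the other size
        operand = other if op_override else default
        address = other if addr_override else default
        return operand, address
    if mode == "long64":                   # the D-bit is ignored here
        if rex_w:
            operand = 64                   # REX.W wins over the 0x66 prefix
        elif op_override:
            operand = 16
        else:
            operand = 32
        address = 32 if addr_override else 64
        return operand, address
    raise ValueError(f"mode not covered by this sketch: {mode}")


print(effective_sizes("protected", d_bit=1, op_override=True))
print(effective_sizes("long64", rex_w=1, addr_override=True))
```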

REX prefix

The REX prefix enables 64-bit specific features. The layout is as follows:

  7  6  5  4  3  2  1  0
+--+--+--+--+--+--+--+--+
| 0  1  0  0| W| R| X| B|
+--+--+--+--+--+--+--+--+

The REX prefix starts (bits 7-4) with the binary value b0100.

REX.W

When this 1-bit value is 0, the default operand size is used (which is 32-bit for most instructions, see the table under Operand-size and address-size override prefix). When 1, a 64-bit operand size is used.

REX.R

This 1-bit value is an extension to the reg field of the ModR/M byte. See 64-bit addressing.

REX.X

This 1-bit value is an extension to the index field of the SIB byte. See 64-bit addressing.

REX.B

This 1-bit value is an extension to the r/m field of the ModR/M byte or the base field of the SIB byte. See 64-bit addressing.
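Extracting the four flag bits is a matter of shifting and masking. The following Python sketch assumes the byte is being decoded in 64-bit mode, where 0x40 through 0x4F act as REX; `decode_rex` is a name chosen for this example.

```python
def decode_rex(byte: int):
    """Split a REX prefix into its W, R, X and B bits, or return None if not REX."""
    if byte >> 4 != 0b0100:      # bits 7-4 must be b0100
        return None
    return {
        "W": (byte >> 3) & 1,    # 64-bit operand size
        "R": (byte >> 2) & 1,    # extends ModR/M reg
        "X": (byte >> 1) & 1,    # extends SIB index
        "B": byte & 1,           # extends ModR/M r/m or SIB base
    }


# 0x48 is the prefix in 48 89 D8 (mov rax, rbx): only W is set.
print(decode_rex(0x48))
```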

Opcode

There are 1, 2 and 3-byte opcodes. Some opcodes begin with a byte that is also a legacy prefix (such as 0x66, 0xF2 or 0xF3); in that position the byte is part of the opcode and does not carry the usual meaning of the prefix.

ModR/M and SIB bytes

The ModR/M and SIB bytes are used to encode the source and destination register or memory offset of an instruction.

ModR/M

The ModR/M byte encodes a register or an opcode extension, and a register or a memory address. It is only required when /r or /0 through /7 is specified in the opcode syntax. It has the following fields:

  7  6  5  4  3  2  1  0
+--+--+--+--+--+--+--+--+
| mod |  reg   |   rm   |
+--+--+--+--+--+--+--+--+

mod (2 bits)
reg (3 bits)
rm (3 bits)

The mod and rm fields are explained in the addressing tables below. The reg field can have one of two meanings:

A 3-bit opcode extension, which some instructions use purely to distinguish themselves from other instructions that share the same opcode byte. When /0 through /7 is specified in the opcode syntax, use that digit as the value of the reg field.
A 3-bit register reference, which can be used as the source or the destination of an instruction (depending on the instruction). When this is the case, /r is specified in the opcode syntax. Which register is meant depends on the operand-size of the instruction and the instruction itself:
- b000 - AL, AX, EAX, MM0 or XMM0
- b001 - CL, CX, ECX, MM1 or XMM1
- b010 - DL, DX, EDX, MM2 or XMM2
- b011 - BL, BX, EBX, MM3 or XMM3
- b100 - AH, SP, ESP, MM4 or XMM4
- b101 - CH, BP, EBP, MM5 or XMM5
- b110 - DH, SI, ESI, MM6 or XMM6
- b111 - BH, DI, EDI, MM7 or XMM7
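Splitting a ModR/M byte into its three fields can be sketched in Python as follows. The helper name `split_modrm` and the `REG32` list (the 32-bit column of the register list above) are introduced only for this example.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_modrm(byte: int):
    """Return the (mod, reg, rm) fields of a ModR/M byte."""
    mod = (byte >> 6) & 0b11     # bits 7-6
    reg = (byte >> 3) & 0b111    # bits 5-3
    rm = byte & 0b111            # bits 2-0
    return mod, reg, rm


# 0xD8 is b11 011 000: register-direct, reg=EBX, rm=EAX.
mod, reg, rm = split_modrm(0xD8)
print(mod, REG32[reg], REG32[rm])
```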

SIB

The SIB byte has the following fields:

Scale (2 bits)
Index (3 bits)
Base (3 bits)

The scale field indicates the scaling factor, where s (as used in the tables) equals 2^scale. That is, scale=0 means s=1, scale=1 means s=2, scale=2 means s=4 and scale=3 means s=8.
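The same shift-and-mask approach works for the SIB byte. In this Python sketch the function name `split_sib` is an assumption of the example, and the scale field is returned already converted to the factor s.

```python
def split_sib(byte: int):
    """Return (s, index, base), where s is the scaling factor 2**scale."""
    scale = (byte >> 6) & 0b11   # bits 7-6
    index = (byte >> 3) & 0b111  # bits 5-3
    base = byte & 0b111          # bits 2-0
    return 1 << scale, index, base


# 0x88 is b10 001 000: s=4, index=1 (ECX), base=0 (EAX).
print(split_sib(0x88))
```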

16-bit addressing

These are the meanings of the Mod (vertically) and R/M (horizontally) bits for 16-bit addressing: (The SIB-byte is not used in 16-bit addressing.)

Mod, R/M=	b000	b001	b010	b011	b100	b101	b110	b111
b00	DS:[BX+SI]	DS:[BX+DI]	SS:[BP+SI]	SS:[BP+DI]	DS:[SI]	DS:[DI]	DS:disp16	DS:[BX]
b01	DS:[BX+SI]+disp8	DS:[BX+DI]+disp8	SS:[BP+SI]+disp8	SS:[BP+DI]+disp8	DS:[SI]+disp8	DS:[DI]+disp8	SS:[BP]+disp8	DS:[BX]+disp8
b10	DS:[BX+SI]+disp16	DS:[BX+DI]+disp16	SS:[BP+SI]+disp16	SS:[BP+DI]+disp16	DS:[SI]+disp16	DS:[DI]+disp16	SS:[BP]+disp16	DS:[BX]+disp16
b11	AL, AX	CL, CX	DL, DX	BL, BX	AH, SP	CH, BP	DH, SI	BH, DI
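The table can also be read as a lookup keyed on mod and rm. The Python sketch below returns the table cell as text together with the number of displacement bytes that follow. `MEM16`, `REGS16` and `describe_modrm16` are names invented for this illustration.

```python
# Row contents copied from the table above, indexed by the rm field.
MEM16 = ["DS:[BX+SI]", "DS:[BX+DI]", "SS:[BP+SI]", "SS:[BP+DI]",
         "DS:[SI]", "DS:[DI]", "SS:[BP]", "DS:[BX]"]
REGS16 = ["AL, AX", "CL, CX", "DL, DX", "BL, BX",
          "AH, SP", "CH, BP", "DH, SI", "BH, DI"]


def describe_modrm16(modrm: int):
    """Return (operand text, displacement size in bytes) for 16-bit addressing."""
    mod, rm = (modrm >> 6) & 0b11, modrm & 0b111
    if mod == 0b11:
        return REGS16[rm], 0          # register operand, no displacement
    if mod == 0b00:
        if rm == 0b110:
            return "DS:disp16", 2     # special case: direct address instead of [BP]
        return MEM16[rm], 0
    if mod == 0b01:
        return MEM16[rm] + "+disp8", 1
    return MEM16[rm] + "+disp16", 2   # mod == b10


print(describe_modrm16(0x46))
```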

32-bit addressing

These are the meanings of the Mod (vertically) and R/M (horizontally) bits for 32-bit addressing:

Mod, R/M=	b000	b001	b010	b011	b100	b101	b110	b111
b00	[EAX]	[ECX]	[EDX]	[EBX]	SIB	disp32	[ESI]	[EDI]
b01	[EAX]+disp8	[ECX]+disp8	[EDX]+disp8	[EBX]+disp8	SIB+disp8	[EBP]+disp8	[ESI]+disp8	[EDI]+disp8
b10	[EAX]+disp32	[ECX]+disp32	[EDX]+disp32	[EBX]+disp32	SIB+disp32	[EBP]+disp32	[ESI]+disp32	[EDI]+disp32
b11	EAX, MM0, XMM0	ECX, MM1, XMM1	EDX, MM2, XMM2	EBX, MM3, XMM3	ESP, MM4, XMM4	EBP, MM5, XMM5	ESI, MM6, XMM6	EDI, MM7, XMM7
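When measuring the length of an instruction, a decoder mainly needs to know whether a SIB byte follows and how many displacement bytes to expect. A minimal Python sketch of that reading of the table is shown here. The function name is hypothetical, and one case is deliberately left to the SIB decoder: with mod=b00, a SIB base of b101 also implies a disp32.

```python
def modrm32_layout(modrm: int):
    """Return (needs_sib, disp_size) for a ModR/M byte under 32-bit addressing."""
    mod, rm = (modrm >> 6) & 0b11, modrm & 0b111
    if mod == 0b11:
        return False, 0               # register operand
    needs_sib = rm == 0b100           # rm=b100 selects the SIB byte
    if mod == 0b00:
        disp = 4 if rm == 0b101 else 0  # rm=b101 means disp32 only
    elif mod == 0b01:
        disp = 1
    else:
        disp = 4
    return needs_sib, disp


print(modrm32_layout(0x44))  # mod=b01, rm=b100: SIB byte plus disp8
```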

32-bit SIB byte

The meaning of the SIB byte while using 32-bit addressing is as follows. The ModR/M byte's Mod field and the SIB byte's index field are used vertically, the SIB byte's base field horizontally. The s is the scaling factor.

Mod	Index, Base=	b000	b001	b010	b011	b100	b101	b110	b111
b00	b000	[EAX] + ([EAX] * s)	[ECX] + ([EAX] * s)	[EDX] + ([EAX] * s)	[EBX] + ([EAX] * s)	[ESP] + ([EAX] * s)	([EAX] * s) + disp32	[ESI] + ([EAX] * s)	[EDI] + ([EAX] * s)
	b001	[EAX] + ([ECX] * s)	[ECX] + ([ECX] * s)	[EDX] + ([ECX] * s)	[EBX] + ([ECX] * s)	[ESP] + ([ECX] * s)	([ECX] * s) + disp32	[ESI] + ([ECX] * s)	[EDI] + ([ECX] * s)
	b010	[EAX] + ([EDX] * s)	[ECX] + ([EDX] * s)	[EDX] + ([EDX] * s)	[EBX] + ([EDX] * s)	[ESP] + ([EDX] * s)	([EDX] * s) + disp32	[ESI] + ([EDX] * s)	[EDI] + ([EDX] * s)
	b011	[EAX] + ([EBX] * s)	[ECX] + ([EBX] * s)	[EDX] + ([EBX] * s)	[EBX] + ([EBX] * s)	[ESP] + ([EBX] * s)	([EBX] * s) + disp32	[ESI] + ([EBX] * s)	[EDI] + ([EBX] * s)
	b100	[EAX]	[ECX]	[EDX]	[EBX]	[ESP]	disp32	[ESI]	[EDI]
	b101	[EAX] + ([EBP] * s)	[ECX] + ([EBP] * s)	[EDX] + ([EBP] * s)	[EBX] + ([EBP] * s)	[ESP] + ([EBP] * s)	([EBP] * s) + disp32	[ESI] + ([EBP] * s)	[EDI] + ([EBP] * s)
	b110	[EAX] + ([ESI] * s)	[ECX] + ([ESI] * s)	[EDX] + ([ESI] * s)	[EBX] + ([ESI] * s)	[ESP] + ([ESI] * s)	([ESI] * s) + disp32	[ESI] + ([ESI] * s)	[EDI] + ([ESI] * s)
	b111	[EAX] + ([EDI] * s)	[ECX] + ([EDI] * s)	[EDX] + ([EDI] * s)	[EBX] + ([EDI] * s)	[ESP] + ([EDI] * s)	([EDI] * s) + disp32	[ESI] + ([EDI] * s)	[EDI] + ([EDI] * s)

b01	b000	[EAX] + ([EAX] * s) + disp8	[ECX] + ([EAX] * s) + disp8	[EDX] + ([EAX] * s) + disp8	[EBX] + ([EAX] * s) + disp8	[ESP] + ([EAX] * s) + disp8	[EBP] + ([EAX] * s) + disp8	[ESI] + ([EAX] * s) + disp8	[EDI] + ([EAX] * s) + disp8
	b001	[EAX] + ([ECX] * s) + disp8	[ECX] + ([ECX] * s) + disp8	[EDX] + ([ECX] * s) + disp8	[EBX] + ([ECX] * s) + disp8	[ESP] + ([ECX] * s) + disp8	[EBP] + ([ECX] * s) + disp8	[ESI] + ([ECX] * s) + disp8	[EDI] + ([ECX] * s) + disp8
	b010	[EAX] + ([EDX] * s) + disp8	[ECX] + ([EDX] * s) + disp8	[EDX] + ([EDX] * s) + disp8	[EBX] + ([EDX] * s) + disp8	[ESP] + ([EDX] * s) + disp8	[EBP] + ([EDX] * s) + disp8	[ESI] + ([EDX] * s) + disp8	[EDI] + ([EDX] * s) + disp8
	b011	[EAX] + ([EBX] * s) + disp8	[ECX] + ([EBX] * s) + disp8	[EDX] + ([EBX] * s) + disp8	[EBX] + ([EBX] * s) + disp8	[ESP] + ([EBX] * s) + disp8	[EBP] + ([EBX] * s) + disp8	[ESI] + ([EBX] * s) + disp8	[EDI] + ([EBX] * s) + disp8
	b100	[EAX] + disp8	[ECX] + disp8	[EDX] + disp8	[EBX] + disp8	[ESP] + disp8	[EBP] + disp8	[ESI] + disp8	[EDI] + disp8
	b101	[EAX] + ([EBP] * s) + disp8	[ECX] + ([EBP] * s) + disp8	[EDX] + ([EBP] * s) + disp8	[EBX] + ([EBP] * s) + disp8	[ESP] + ([EBP] * s) + disp8	[EBP] + ([EBP] * s) + disp8	[ESI] + ([EBP] * s) + disp8	[EDI] + ([EBP] * s) + disp8
	b110	[EAX] + ([ESI] * s) + disp8	[ECX] + ([ESI] * s) + disp8	[EDX] + ([ESI] * s) + disp8	[EBX] + ([ESI] * s) + disp8	[ESP] + ([ESI] * s) + disp8	[EBP] + ([ESI] * s) + disp8	[ESI] + ([ESI] * s) + disp8	[EDI] + ([ESI] * s) + disp8
	b111	[EAX] + ([EDI] * s) + disp8	[ECX] + ([EDI] * s) + disp8	[EDX] + ([EDI] * s) + disp8	[EBX] + ([EDI] * s) + disp8	[ESP] + ([EDI] * s) + disp8	[EBP] + ([EDI] * s) + disp8	[ESI] + ([EDI] * s) + disp8	[EDI] + ([EDI] * s) + disp8

b10	b000	[EAX] + ([EAX] * s) + disp32	[ECX] + ([EAX] * s) + disp32	[EDX] + ([EAX] * s) + disp32	[EBX] + ([EAX] * s) + disp32	[ESP] + ([EAX] * s) + disp32	[EBP] + ([EAX] * s) + disp32	[ESI] + ([EAX] * s) + disp32	[EDI] + ([EAX] * s) + disp32
	b001	[EAX] + ([ECX] * s) + disp32	[ECX] + ([ECX] * s) + disp32	[EDX] + ([ECX] * s) + disp32	[EBX] + ([ECX] * s) + disp32	[ESP] + ([ECX] * s) + disp32	[EBP] + ([ECX] * s) + disp32	[ESI] + ([ECX] * s) + disp32	[EDI] + ([ECX] * s) + disp32
	b010	[EAX] + ([EDX] * s) + disp32	[ECX] + ([EDX] * s) + disp32	[EDX] + ([EDX] * s) + disp32	[EBX] + ([EDX] * s) + disp32	[ESP] + ([EDX] * s) + disp32	[EBP] + ([EDX] * s) + disp32	[ESI] + ([EDX] * s) + disp32	[EDI] + ([EDX] * s) + disp32
	b011	[EAX] + ([EBX] * s) + disp32	[ECX] + ([EBX] * s) + disp32	[EDX] + ([EBX] * s) + disp32	[EBX] + ([EBX] * s) + disp32	[ESP] + ([EBX] * s) + disp32	[EBP] + ([EBX] * s) + disp32	[ESI] + ([EBX] * s) + disp32	[EDI] + ([EBX] * s) + disp32
	b100	[EAX] + disp32	[ECX] + disp32	[EDX] + disp32	[EBX] + disp32	[ESP] + disp32	[EBP] + disp32	[ESI] + disp32	[EDI] + disp32
	b101	[EAX] + ([EBP] * s) + disp32	[ECX] + ([EBP] * s) + disp32	[EDX] + ([EBP] * s) + disp32	[EBX] + ([EBP] * s) + disp32	[ESP] + ([EBP] * s) + disp32	[EBP] + ([EBP] * s) + disp32	[ESI] + ([EBP] * s) + disp32	[EDI] + ([EBP] * s) + disp32
	b110	[EAX] + ([ESI] * s) + disp32	[ECX] + ([ESI] * s) + disp32	[EDX] + ([ESI] * s) + disp32	[EBX] + ([ESI] * s) + disp32	[ESP] + ([ESI] * s) + disp32	[EBP] + ([ESI] * s) + disp32	[ESI] + ([ESI] * s) + disp32	[EDI] + ([ESI] * s) + disp32
	b111	[EAX] + ([EDI] * s) + disp32	[ECX] + ([EDI] * s) + disp32	[EDX] + ([EDI] * s) + disp32	[EBX] + ([EDI] * s) + disp32	[ESP] + ([EDI] * s) + disp32	[EBP] + ([EDI] * s) + disp32	[ESI] + ([EDI] * s) + disp32	[EDI] + ([EDI] * s) + disp32
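The whole table reduces to two special cases: index=b100 means no index register, and base=b101 with mod=b00 means no base register. The Python sketch below computes the resulting address from a dictionary of register values. The function name, the `regs` dictionary and the sample register contents are assumptions made for this example, and the displacement is passed in already decoded.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def sib_effective_address(mod: int, sib: int, regs: dict, disp: int = 0) -> int:
    """Compute base + index * s + disp for 32-bit addressing with a SIB byte."""
    s = 1 << ((sib >> 6) & 0b11)
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    addr = disp
    if index != 0b100:                       # index=b100: no index register
        addr += regs[REG32[index]] * s
    if not (mod == 0b00 and base == 0b101):  # that combination has no base, only disp32
        addr += regs[REG32[base]]
    return addr & 0xFFFFFFFF                 # 32-bit addresses wrap around


regs = {"EAX": 0x1000, "ECX": 4, "ESP": 0x2000}
# SIB 0x88 is s=4, index=ECX, base=EAX, i.e. the [EAX] + ([ECX] * s) cell.
print(hex(sib_effective_address(0b00, 0x88, regs)))
```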

64-bit addressing

{Stub}

Displacement

The displacement value, if any, follows the ModR/M and SIB bytes discussed above. When the ModR/M or SIB tables or the instruction's documentation state that a disp8 value is required, this means a 1 byte displacement. Also, disp16 is a two byte displacement and disp32 a four byte displacement.
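Displacements are stored in little-endian byte order, and a disp8 is sign-extended before it is added to the address. A short Python sketch of reading one, using the standard `struct` module, follows. `read_displacement` is a name chosen for the example, and it treats all three sizes as signed.

```python
import struct

# struct format codes for signed little-endian values of each size in bytes.
_DISP_FORMATS = {1: "<b", 2: "<h", 4: "<i"}


def read_displacement(code: bytes, offset: int, size: int) -> int:
    """Read a signed little-endian displacement of `size` bytes at `offset`."""
    return struct.unpack_from(_DISP_FORMATS[size], code, offset)[0]


# 8B 45 FC is mov eax, [ebp-4]: ModR/M 0x45 asks for a disp8, here 0xFC.
print(read_displacement(bytes([0x8B, 0x45, 0xFC]), 2, 1))
```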

Immediate

Some instructions require an immediate value. The instruction (together with the operand-size column in the table above) determines the length of the immediate value. The imm8 mnemonic (or 8-bit operand-size) means a one-byte immediate value, imm16 (or 16-bit operand-size) a two-byte value, imm32 (or 32-bit operand-size) a four-byte value and imm64 an eight-byte value. Note that with a 64-bit operand-size most instructions still take a four-byte immediate that is sign-extended to 64 bits; only MOV to a register accepts a full imm64.
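Going the other way, an assembler has to emit the immediate in little-endian order after the rest of the instruction. The Python sketch below does this with the standard `struct` module. `encode_immediate` is a hypothetical helper, and negative values are simply truncated to the requested width.

```python
import struct

# struct format codes for unsigned little-endian values of each width in bits.
_IMM_FORMATS = {8: "<B", 16: "<H", 32: "<I", 64: "<Q"}


def encode_immediate(value: int, bits: int) -> bytes:
    """Encode `value` as a little-endian immediate of the given width."""
    mask = (1 << bits) - 1                  # wrap negatives into two's complement
    return struct.pack(_IMM_FORMATS[bits], value & mask)


# mov eax, 0x12345678 is opcode B8 followed by an imm32.
print((bytes([0xB8]) + encode_immediate(0x12345678, 32)).hex())
```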

