Setting Up Long Mode: Difference between revisions

From OSDev.wiki
m (Bot: Replace deprecated source tag with syntaxhighlight)
 
(41 intermediate revisions by 18 users not shown)
Line 17: Line 17:
=== Detection of CPUID ===
=== Detection of CPUID ===


Basically, detecting whether CPUID is supported by your processor is covered [[CPUID|here]], but we will show how to do it here. CPUID is supported when the ID-bit in the FLAGS-register can be flipped. So lets try that, then:
Basically, detecting whether CPUID is supported by your processor is covered [[CPUID|here]], but we will show how to do it here. CPUID is supported when the ID-bit in the FLAGS-register can be flipped. So let's try that, then:


<source lang="asm">
<syntaxhighlight lang="asm">
; Check if CPUID is supported by attempting to flip the ID bit (bit 21) in
; Store the FLAGS-register.
; the FLAGS register. If we can flip it, CPUID is available.

; Copy FLAGS in to EAX via stack
pushfd
pushfd
; Restore the A-register.
pop eax
pop eax

; Copy to ECX as well for comparing later on
; Set the C-register to the A-register.
mov ecx, eax
mov ecx, eax

; Flip the ID-bit, which is the 21st bit.
; Flip the ID bit
xor eax, 00000000001000000000000000000000b
xor eax, 1 << 21

; Store the A-register.
; Copy EAX to FLAGS via the stack
push eax
push eax
; Restore the FLAGS-register.
popfd
popfd

; Copy FLAGS back to EAX (with the flipped bit if CPUID is supported)
; Store the FLAGS-register.
pushfd
pushfd
; Restore the A-register.
pop eax
pop eax

; Restore FLAGS from the old version stored in ECX (i.e. flipping the ID bit
; Store the C-register.
; back if it was ever flipped).
push ecx
push ecx
; Restore the FLAGS-register.
popfd
popfd

; Compare EAX and ECX. If they are equal then that means the bit wasn't
; Do a XOR-operation on the A-register and the C-register.
; flipped, and CPUID isn't supported.
xor eax, ecx
xor eax, ecx
; The zero flag is set, no CPUID.
jz .NoCPUID
jz .NoCPUID
ret
</syntaxhighlight>
; CPUID is available for use.
</source>


=== x86 or x86-64 ===
=== x86 or x86-64 ===
Line 63: Line 57:
Now that CPUID is available we have to check whether long mode can be used or not. Long mode can only be detected using the extended functions of CPUID (> 0x80000000), so we have to check if the function that determines whether long mode is available or not is actually available:
Now that CPUID is available we have to check whether long mode can be used or not. Long mode can only be detected using the extended functions of CPUID (> 0x80000000), so we have to check if the function that determines whether long mode is available or not is actually available:


<source lang="asm">
<syntaxhighlight lang="asm">
; Set the A-register to 0x80000000.
mov eax, 0x80000000 ; Set the A-register to 0x80000000.
cpuid ; CPU identification.
mov eax, 0x80000000
cmp eax, 0x80000001 ; Compare the A-register with 0x80000001.
jb .NoLongMode ; It is less, there is no long mode.
; CPU identification.
</syntaxhighlight>
cpuid
; Compare the A-register with 0x80000001.
cmp eax, 0x80000001
; It is less, there is no long mode.
jb .NoLongMode
</source>


Now that we know that extended function is available we can use it to detect long mode:
Now that we know that extended function is available we can use it to detect long mode:


<source lang="asm">
<syntaxhighlight lang="asm">
; Set the A-register to 0x80000001.
mov eax, 0x80000001 ; Set the A-register to 0x80000001.
cpuid ; CPU identification.
mov eax, 0x80000001
test edx, 1 << 29 ; Test if the LM-bit, which is bit 29, is set in the D-register.
jz .NoLongMode ; They aren't, there is no long mode.
; CPU identification.
</syntaxhighlight>
cpuid
; Test if the LM-bit, which is the 29th bit, is set in the D-register.
test edx, 00100000000000000000000000000000b
; They aren't, there is no long mode.
jz .NoLongMode
</source>


Now that we know if long mode is actually supported by the processor, we can actually use it.
Now that we know if long mode is actually supported by the processor, we can actually use it.
Line 97: Line 77:
== Entering Long Mode ==
== Entering Long Mode ==


Entering long mode can be both done from real mode and protected mode, however only protected mode is covered in the AMD64 manuals and we got real mode to long mode thanks to Brendan
Entering long mode can be done from both real mode and protected mode, however only protected mode is covered in the Intel and AMD64 manuals. Early AMD documentation explains this process works from real mode as well.


Before anything, it is recommended that you enable the [[A20 Line]]; otherwise only odd MiBs can be accessed.
=== From Real Mode ===

Basically I'm going to assume that the [[A20]] is already enabled by you (or the BIOS) and that paging is still disabled as well as protected mode.

=== From Protected Mode ===

I'm going to assume that the [[A20]] is already enabled by your (or the BIOS) and that paging is either enabled or disabled and that protected mode is set up with a 32-bit [[GDT]].


=== Setting up the Paging ===
=== Setting up the Paging ===
Line 111: Line 85:
Before we actually cover up the new paging used in x86-64 we should disable the old paging first (you can skip this if you never set up paging in protected mode) as this is required.
Before we actually cover up the new paging used in x86-64 we should disable the old paging first (you can skip this if you never set up paging in protected mode) as this is required.


<source lang="asm">
<syntaxhighlight lang="asm">
; Set the A-register to control register 0.
mov eax, cr0 ; Set the A-register to control register 0.
and eax, 01111111111111111111111111111111b ; Clear the PG-bit, which is bit 31.
mov eax, cr0
mov cr0, eax ; Set control register 0 to the A-register.
</syntaxhighlight>
; Clear the PG-bit, which is the 32nd bit.
and eax, 01111111111111111111111111111111b
; Set control register 0 to the A-register.
mov cr0, eax
</source>


Now that paging is disabled, we can actually take a look at how paging is set up in x86-64 (It's recommended to read Chapter 5.3 of the AMD64 Architecture Programmer's Manual, Volume 2). First of all, long mode uses PAE paging and therefore you have the page-directory pointer table (PDPT), the page-directory table (PDT) and the page table (PT). There's also another table which now forms the root (instead of the PDPT or the PDT) and that is page-map level-4 table (PML4T). In protected mode a page table entry was only four bytes long, so you had 1024 entries per table. In long mode, however, you only have 512 entries per table as each entry is eight bytes long. This means that one entry in a PT can address 4kB, one entry in a PDT can address 2MB, one entry in a PDPT can address 1GB and one entry in a PML4T can address 512GB. This means that only 256TB can be addressed. The way these tables work is that each entry in the PML4T point to a PDPT, each entry in a PDPT to a PDT and each entry in a PDT to a PT. Each entry in a PT then points to the physical address, that is, if it is marked as present. So how does the MMU/processor know which physical address should be used with which virtual address? Basically each table has 512 entries ranging from 0 to 511. These entries each refer to a specific domain of memory. Like index 0 of the PML4T refers to the first 512GB in virtual memory, index 1 refers to the next 512GB and so on. The same applies to the PDPT, PDT and the PT (except with smaller sizes; 1GB, 2MB and 4kB as seen above). The last gigabyte of virtual memory would be described in the table referred to by 511th index of the PDPT which is referred to by the 511th index of the PML4T or in psuedo-C:
Now that paging is disabled, we can actually take a look at how paging is set up in x86-64 (It's recommended to read Chapter 5.3 of the AMD64 Architecture Programmer's Manual, Volume 2). First of all, long mode uses PAE paging and therefore you have the page-directory pointer table (PDPT), the page-directory table (PDT) and the page table (PT). There's also another table which now forms the root (instead of the PDPT or the PDT) and that is page-map level-4 table (PML4T). In protected mode a page table entry was only four bytes long, so you had 1024 entries per table. In long mode, however, you only have 512 entries per table as each entry is eight bytes long. This means that one entry in a PT can address 4kB, one entry in a PDT can address 2MB, one entry in a PDPT can address 1GB and one entry in a PML4T can address 512GB. This means that only 256TB can be addressed. The way these tables work is that each entry in the PML4T point to a PDPT, each entry in a PDPT to a PDT and each entry in a PDT to a PT. Each entry in a PT then points to the physical address, that is, if it is marked as present. So how does the MMU/processor know which physical address should be used with which virtual address? Basically each table has 512 entries ranging from 0 to 511. These entries each refer to a specific domain of memory. Like index 0 of the PML4T refers to the first 512GB in virtual memory, index 1 refers to the next 512GB and so on. The same applies to the PDPT, PDT and the PT (except with smaller sizes; 1GB, 2MB and 4kB as seen above). The last gigabyte of virtual memory would be described in the table referred to by 511th index of the PDPT which is referred to by the 511th index of the PML4T or in psuedo-C:


<source lang="c">
<syntaxhighlight lang="c">
pagedir_t* PDT = PML4[511]->PDPT[511];
pagedir_t* PDT = PML4[511]->PDPT[511];
</syntaxhighlight>
</source>


Basically, what pages you want to set up and how you want them to be set up is up to you, but I'd identity map the first megabyte and then map some high memory to the memory after the first megabyte (a.k.a. 0x100000 - ???), however, this may be pretty difficult to set up first. So lets identity map the first two megabytes. We'll set up four tables at 0x1000 (assuming that this is free to use): a PML4T, a PDPT, a PDT and a PT. Basically we want to identity map the first two megabytes so:
Basically, what pages you want to set up and how you want them to be set up is up to you, but I'd identity map the first megabyte and then map some high memory to the memory after the first megabyte, however, this may be pretty difficult to set up first. So let's identity map the first two megabytes. We'll set up four tables at 0x1000 (assuming that this is free to use): a PML4T, a PDPT, a PDT and a PT. Basically we want to identity map the first two megabytes so:


*PML4T[0] -> PDPT.
*PML4T[0] -> PDPT.
Line 136: Line 105:


First we will clear the tables:
First we will clear the tables:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the destination index to 0x1000.
mov edi, 0x1000 ; Set the destination index to 0x1000.
mov cr3, edi ; Set control register 3 to the destination index.
mov di, 0x1000
xor eax, eax ; Nullify the A-register.
; Nullify the A-register.
mov ecx, 4096 ; Set the C-register to 4096.
rep stosd ; Clear the memory.
xor ax, ax
mov edi, cr3 ; Set the destination index to control register 3.
</syntaxhighlight>
; Set the C-register to 16384.
mov cx, 16384
; Clear the memory.
rep stosb
</source>


Now that the page are clear we're going to set up the tables, the page tables are going to be located at these addresses:
Now that the page are clear we're going to set up the tables, the page tables are going to be located at these addresses:
Line 158: Line 122:


So lets make PML4T[0] point to the PDPT and so on:
So lets make PML4T[0] point to the PDPT and so on:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the word at the destination index to 0x2003.
mov DWORD [edi], 0x2003 ; Set the uint32_t at the destination index to 0x2003.
add edi, 0x1000 ; Add 0x1000 to the destination index.
mov WORD [di], 0x2003
mov DWORD [edi], 0x3003 ; Set the uint32_t at the destination index to 0x3003.
; Add 0x1000 to the destination index.
add edi, 0x1000 ; Add 0x1000 to the destination index.
mov DWORD [edi], 0x4003 ; Set the uint32_t at the destination index to 0x4003.
add di, 0x1000
add edi, 0x1000 ; Add 0x1000 to the destination index.
</syntaxhighlight>
; Set the word at the destination index to 0x3003.
mov WORD [di], 0x3003
; Add 0x1000 to the destination index.
add di, 0x1000
; Set the word at the destination index to 0x4003.
mov WORD [di], 0x4003
; Add 0x1000 to the destination index.
add di, 0x1000
</source>


If you haven't noticed already, I used a three. This simply means that the first two bits should be set. These bits indicate that the page is present and that it is readable as well as writable. Now all that's left to do is identity map the first two megabytes:
If you haven't noticed already, I used a three. This simply means that the first two bits should be set. These bits indicate that the page is present and that it is readable as well as writable. Now all that's left to do is identity map the first two megabytes:


<source lang="asm">
<syntaxhighlight lang="asm">
; Set the destination index to 0x4000.
mov ebx, 0x00000003 ; Set the B-register to 0x00000003.
mov ecx, 512 ; Set the C-register to 512.
add di, 0x4000
; Set the B-register to 0x00000003.
mov ebx, 0x00000003
; Set the C-register to 512.
mov cx, 512
.SetEntry:
.SetEntry:
; Set the double word at the destination index to the B-register.
mov DWORD [edi], ebx ; Set the uint32_t at the destination index to the B-register.
add ebx, 0x1000 ; Add 0x1000 to the B-register.
mov DWORD [di], ebx
add edi, 8 ; Add eight to the destination index.
; Add 0x1000 to the B-register.
loop .SetEntry ; Set the next entry.
</syntaxhighlight>
add ebx, 0x1000
; Add eight to the destination index.
add di, 8
; Set the next entry.
loop .SetEntry
</source>


Now we should enable PAE-paging by setting the PAE-bit in the fourth control register:
Now we should enable PAE-paging by setting the PAE-bit in the fourth control register:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the A-register to control register 4.
mov eax, cr4 ; Set the A-register to control register 4.
or eax, 1 << 5 ; Set the PAE-bit, which is the 6th bit (bit 5).
mov eax, cr4
mov cr4, eax ; Set control register 4 to the A-register.
</syntaxhighlight>
; Set the PAE-bit, which is the 6th bit.
or eax, 00000000000000000000000000100000b
; Set control register 4 to the A-register.
mov cr4, eax
</source>

And we should set the third control register to the PML4T:
<source lang="asm">
; Set the A-register to 0x00004000.
or eax, 0x00004000
; Set control register 3 to the A-register.
mov cr3, eax
</source>


Now paging is set up, but it isn't enabled yet.
Now paging is set up, but it isn't enabled yet.

=== Future of x86-64 - the PML5 ===
In November 2016, Intel released a white paper [https://cdrdv2-public.intel.com/671442/5-level-paging-white-paper.pdf] about 5-level paging, and started supporting it with Ice Lake processors in 2019. These processors support a 128 PiB address space, and furthermore up to 4 PiB of physical memory (far above the 4-level 256 TiB/64 TiB limits). Support for this is detected as follows:
<syntaxhighlight lang="asm">
mov eax, 0x7 ; You might want to check for page 7 first!
xor ecx, ecx
cpuid
test ecx, (1<<16)
jnz .5_level_paging
</syntaxhighlight>
The paging structures are identical to the 4 level versions, there's just an added layer of indirection. Your recursive address (if you use recursive mappings) will change.
5 level paging is enabled if CR4.LA57=1, and EFER.LMA=1.
<syntaxhighlight lang="asm">
BITS 32
mov eax, cr4
or eax, (1<<12) ;CR4.LA57
mov cr4, eax
</syntaxhighlight>
Note that attempting to set CR4.LA57 while EFER.LMA=1 causes a #GP general protection fault. You therefore need to drop into protected mode or set up 5 level paging before entering long mode in the first place.


=== The Switch from Real Mode ===
=== The Switch from Real Mode ===
Line 232: Line 177:


So we first set the LM-bit:
So we first set the LM-bit:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the C-register to 0xC0000080, which is the EFER MSR.
mov ecx, 0xC0000080 ; Set the C-register to 0xC0000080, which is the EFER MSR.
rdmsr ; Read from the model-specific register.
mov ecx, 0xC0000080
or eax, 1 << 8 ; Set the LM-bit which is the 9th bit (bit 8).
; Read from the model-specific register.
wrmsr ; Write to the model-specific register.
</syntaxhighlight>
rdmsr
; Set the LM-bit which is the 9th bit.
or eax, 00000000000000000000000100000000b
; Write to the model-specific register.
wrmsr
</source>


Enabling paging and protected mode:
Enabling paging and protected mode:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the A-register to control register 0.
mov eax, cr0 ; Set the A-register to control register 0.
or eax, 1 << 31 | 1 << 0 ; Set the PG-bit, which is the 31nd bit, and the PM-bit, which is the 0th bit.
mov eax, cr0
mov cr0, eax ; Set control register 0 to the A-register.
</syntaxhighlight>
; Set the PG-bit, which is the 32nd bit, and the PM-bit, which is the 1st bit.
or eax, 10000000000000000000000000000001b
; Set control register 0 to the A-register.
mov cr0, eax
</source>


Now we're in compatibility mode.
Now we're in compatibility mode.
Line 265: Line 198:


So we first set the LM-bit:
So we first set the LM-bit:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the C-register to 0xC0000080, which is the EFER MSR.
mov ecx, 0xC0000080 ; Set the C-register to 0xC0000080, which is the EFER MSR.
rdmsr ; Read from the model-specific register.
mov ecx, 0xC0000080
or eax, 1 << 8 ; Set the LM-bit which is the 9th bit (bit 8).
; Read from the model-specific register.
wrmsr ; Write to the model-specific register.
</syntaxhighlight>
rdmsr
; Set the LM-bit which is the 9th bit.
or eax, 00000000000000000000000100000000b
; Write to the model-specific register.
wrmsr
</source>


Enabling paging:
Enabling paging:
<source lang="asm">
<syntaxhighlight lang="asm">
; Set the A-register to control register 0.
mov eax, cr0 ; Set the A-register to control register 0.
or eax, 1 << 31 ; Set the PG-bit, which is the 32nd bit (bit 31).
mov eax, cr0
mov cr0, eax ; Set control register 0 to the A-register.
</syntaxhighlight>
; Set the PG-bit, which is the 32nd bit.
or eax, 10000000000000000000000000000000b
; Set control register 0 to the A-register.
mov cr0, eax
</source>


Now we're in compatibility mode.
Now we're in compatibility mode.
Line 295: Line 216:
== Entering the 64-bit Submode ==
== Entering the 64-bit Submode ==


Now that we're in long mode, there's one issue left: we are in the IA32e submode and we actually wanted to enter 64-bit long mode. This isn't a hard thing to do. We should load just load a GDT with the 64-bit flags set in the code and data selectors.
Now that we're in long mode, there's one issue left: we are in the 32-bit compatibility submode and we actually wanted to enter 64-bit long mode. This isn't a hard thing to do. We should load just load a GDT with the 64-bit flags set in the code and data selectors.


Our GDT (see chapter 4.8.1 and 4.8.2 of the AMD64 Architecture Programmer's Manual Volume 2) should look like this:
Our GDT (see chapter 4.8.1 and 4.8.2 of the AMD64 Architecture Programmer's Manual Volume 2) should look like this:
<source lang="asm">
<syntaxhighlight lang="asm">
; Access bits
; Global Descriptor Table (64-bit).
PRESENT equ 1 << 7
GDT64:
; The null descriptor.
NOT_SYS equ 1 << 4
.Null equ $ - GDT64
EXEC equ 1 << 3
DC equ 1 << 2
RW equ 1 << 1
; Limit (low).
dw 0
ACCESSED equ 1 << 0

; Flags bits
; Base (low).
GRAN_4K equ 1 << 7
dw 0
SZ_32 equ 1 << 6
; Base (middle)
LONG_MODE equ 1 << 5

db 0
GDT:
.Null: equ $ - GDT
; Access.
db 0
dq 0
.Code: equ $ - GDT
dd 0xFFFF ; Limit & Base (low, bits 0-15)
; Granularity.
db 0 ; Base (mid, bits 16-23)
db 0
db PRESENT | NOT_SYS | EXEC | RW ; Access
db GRAN_4K | LONG_MODE | 0xF ; Flags & Limit (high, bits 16-19)
; Base (high).
db 0 ; Base (high, bits 24-31)
db 0
.Data: equ $ - GDT
dd 0xFFFF ; Limit & Base (low, bits 0-15)
; The code descriptor.
db 0 ; Base (mid, bits 16-23)
.Code equ $ - GDT64
db PRESENT | NOT_SYS | RW ; Access
db GRAN_4K | SZ_32 | 0xF ; Flags & Limit (high, bits 16-19)
; Limit (low).
db 0 ; Base (high, bits 24-31)
dw 0
.TSS: equ $ - GDT
; Base (low).
dd 0x00000068
dw 0
dd 0x00CF8900
; Base (middle).
db 0
; Access.
db 10011000b
; Granularity.
db 00100000b
; Base (high).
db 0
; The data descriptor.
.Data equ $ - GDT64
; Limit (low).
dw 0
; Base (low).
dw 0
; Base (middle).
db 0
; Access.
db 10010000b
; Granularity.
db 00000000b
; Base (high).
db 0
; The GDT-pointer.
.Pointer:
.Pointer:
dw $ - GDT - 1
; Limit.
dq GDT
</syntaxhighlight>
dw $ - GDT64 - 1
; Base.
dq GDT64
</source>


Now the only thing left to do is load it and make the jump to 64-bit:
Notice that we set a 4gb limit for code. This is needed because the processor will make a last limit check before the jump, and having a limit of 0 will cause a #GP (tested in bochs). After that, the limit will be ignored. Now the only thing left to do is load it and make the jump to 64-bit:
<source lang="asm">
<syntaxhighlight lang="asm">
; Load the 64-bit global descriptor table.
lgdt [GDT64.Pointer] ; Load the 64-bit global descriptor table.
jmp GDT64.Code:Realm64 ; Set the code segment and enter 64-bit long mode.
lgdt [GDT64.Pointer]
</syntaxhighlight>
; Set the code segment and enter 64-bit long mode.
jmp GDT64.Code:Realm64
</source>


=== Sample ===
=== Sample ===
Line 387: Line 266:
Now that we're in 64-bit, we want to do something that is actually 64-bit. In this sample I'm just going with an ordinary clear the screen:
Now that we're in 64-bit, we want to do something that is actually 64-bit. In this sample I'm just going with an ordinary clear the screen:


<source lang="asm">
<syntaxhighlight lang="asm">
; Use 64-bit.
; Use 64-bit.
[BITS 64]
[BITS 64]


Realm64:
Realm64:
; Clear the interrupt flag.
cli ; Clear the interrupt flag.
mov ax, GDT64.Data ; Set the A-register to the data descriptor.
cli
mov ds, ax ; Set the data segment to the A-register.
; Set the A-register to the data descriptor.
mov es, ax ; Set the extra segment to the A-register.
mov fs, ax ; Set the F-segment to the A-register.
mov ax, GDT64.Data
mov gs, ax ; Set the G-segment to the A-register.
; Set the data segment to the A-register.
mov ss, ax ; Set the stack segment to the A-register.
mov edi, 0xB8000 ; Set the destination index to 0xB8000.
mov ds, ax
mov rax, 0x1F201F201F201F20 ; Set the A-register to 0x1F201F201F201F20.
; Set the extra segment to the A-register.
mov ecx, 500 ; Set the C-register to 500.
rep stosq ; Clear the screen.
mov es, ax
hlt ; Halt the processor.
</syntaxhighlight>
; Set the F-segment to the A-register.
mov fs, ax
; Set the G-segment to the A-register.
mov gs, ax
; Set the destination index 0x00000000000B8000.
mov rdi, 0x00000000000B8000
; Set the A-register to 0x1F201F201F201F20.
mov rax, 0x1F201F201F201F20
; Set the C-register to 500.
mov rcx, 500
; Clear the screen.
rep movsq
; Halt the processor.
hlt
</source>


It is very important that you don't set the stack segment and that you don't enable the interrupts (unless you have set up a 64-bit IDT of course).
It is very important that you don't enable the interrupts (unless you have set up a 64-bit IDT of course).


== Articles ==
== See also ==
=== Articles ===
* [[EM64T|Intel EM64T]]
* [[EM64T|Intel EM64T]]
* [[X86-64]]
* [[X86-64]]
* [[Creating a 64-bit kernel]]
* [[Creating a 64-bit kernel]]
* [[Entering Long Mode Directly]]


=== Threads ===
=== Threads ===
Line 438: Line 299:
* [http://forum.osdev.org/viewtopic.php?f=1&t=21772 Setting up the stack after the switch to long mode] on the stack segment note.
* [http://forum.osdev.org/viewtopic.php?f=1&t=21772 Setting up the stack after the switch to long mode] on the stack segment note.


== Wikipedia ==
=== Wikipedia ===
* [[Wikipedia:AMD64|AMD64]]
* [[Wikipedia:AMD64|AMD64]]
* [[Wikipedia:64-bit|64-bit]]
* [[Wikipedia:64-bit|64-bit]]

[[Category:X86-64‏]]
[[Category:Operating Modes]]

Latest revision as of 05:46, 9 June 2024

Overview

  • Covering long mode.
  • How to detect long mode (Recommended read: CPUID).
  • How to set up paging for long mode (Recommended reads: Setting up Paging and Setting up Paging using PAE).
  • How to enter long mode.
  • How to set up the GDT for long mode (Recommended read: GDT).

Introduction

What is long mode and why set it up? The x86-64 processors (AMD64, Intel 64 (a.k.a. EM64T), VIA Nano) introduced a new mode called long mode. Long mode consists of two submodes: the actual 64-bit mode and compatibility mode (32-bit; note that Intel's manuals refer to long mode as a whole as IA-32e mode). What we are interested in is the 64-bit mode, as it provides a lot of new features: the registers are extended to 64 bits (rax, rcx, rdx, rbx, rsp, rbp, rip, etc.), and there are eight new general-purpose registers (r8 - r15) as well as eight new multimedia registers (xmm8 - xmm15). 64-bit mode is basically a new world, as it is almost completely void of the segmentation that was used on the 8086 processors, and the GDT, the IDT, paging, etc. are also somewhat different compared to the old 32-bit mode (a.k.a. protected mode).

Detecting the Presence of Long Mode

There are only three processor vendors so far who have made processors capable of entering and using long mode: AMD, Intel and VIA. Intel originally tried to bring 64-bit processors to market with its own IA-64 (Itanium) architecture, which is not x86-compatible, and later adopted AMD's x86-64 architecture under the name EM64T (now Intel 64). This means that using 64-bit on an Intel processor is (almost) identical to using 64-bit on an AMD processor (and VIA should be identical as well). We can detect the presence of long mode by using the CPUID instruction.

Detection of CPUID

Detecting whether CPUID is supported by your processor is covered in the CPUID article, but we will show how to do it here as well. CPUID is supported when the ID bit (bit 21) in the FLAGS register can be flipped. So let's try that:

    ; Check if CPUID is supported by attempting to flip the ID bit (bit 21) in
    ; the FLAGS register. If we can flip it, CPUID is available.

    ; Copy FLAGS into EAX via the stack
    pushfd
    pop eax

    ; Copy to ECX as well for comparing later on
    mov ecx, eax

    ; Flip the ID bit
    xor eax, 1 << 21

    ; Copy EAX to FLAGS via the stack
    push eax
    popfd

    ; Copy FLAGS back to EAX (with the flipped bit if CPUID is supported)
    pushfd
    pop eax

    ; Restore FLAGS from the old version stored in ECX (i.e. flipping the ID bit
    ; back if it was ever flipped).
    push ecx
    popfd

    ; Compare EAX and ECX. If they are equal then that means the bit wasn't
    ; flipped, and CPUID isn't supported.
    xor eax, ecx
    jz .NoCPUID
    ret

x86 or x86-64

Now that CPUID is available, we have to check whether long mode can be used. Long mode can only be detected using the extended functions of CPUID (> 0x80000000), so we first have to check that the extended function that reports it (0x80000001) exists. Function 0x80000000 returns the highest supported extended function in EAX:

    mov eax, 0x80000000    ; Set the A-register to 0x80000000.
    cpuid                  ; CPU identification.
    cmp eax, 0x80000001    ; Compare the A-register with 0x80000001.
    jb .NoLongMode         ; It is less, there is no long mode.

Now that we know that the extended function is available, we can use it to detect long mode:

    mov eax, 0x80000001    ; Set the A-register to 0x80000001.
    cpuid                  ; CPU identification.
    test edx, 1 << 29      ; Test if the LM-bit, which is bit 29, is set in the D-register.
    jz .NoLongMode         ; It isn't set, there is no long mode.
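
The two checks above can also be written as a small decision function. This is only a sketch in C, assuming the CPUID results have already been captured into variables; the helper name `long_mode_supported` and the example register values are made up for illustration and are not part of any existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_FEATURES 0x80000001u  /* extended leaf whose EDX carries the LM bit */
#define CPUID_EDX_LM       (1u << 29)   /* long mode available */

/* Hypothetical helper: max_ext_leaf is EAX after CPUID 0x80000000,
   ext_edx is EDX after CPUID 0x80000001 (only meaningful if that leaf exists). */
bool long_mode_supported(uint32_t max_ext_leaf, uint32_t ext_edx)
{
    if (max_ext_leaf < CPUID_EXT_FEATURES)
        return false;                       /* mirrors "jb .NoLongMode" */
    return (ext_edx & CPUID_EDX_LM) != 0;   /* mirrors "test edx, 1 << 29" */
}

/* Usage example with made-up register values. */
void long_mode_example(void)
{
    assert(long_mode_supported(0x80000008u, CPUID_EDX_LM));   /* leaf exists, LM set */
    assert(!long_mode_supported(0x80000000u, CPUID_EDX_LM));  /* leaf 0x80000001 missing */
    assert(!long_mode_supported(0x80000008u, 0));             /* leaf exists, LM clear */
}
```

The order matters: EDX is only inspected once the first check has shown that function 0x80000001 exists.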

Now that we know that long mode is supported by the processor, we can actually use it.

Entering Long Mode

Entering long mode can be done from both real mode and protected mode; however, only the path from protected mode is covered in the Intel and AMD64 manuals. Early AMD documentation explains that this process works from real mode as well.

Before anything else, it is recommended that you enable the A20 Line; otherwise address bit 20 is forced to zero and only every other MiB of memory (0-1 MiB, 2-3 MiB and so on) can be accessed.

Setting up the Paging

Before we actually cover the new paging used in x86-64, we should disable the old paging first, as this is required (you can skip this if you never set up paging in protected mode).

    mov eax, cr0                                   ; Set the A-register to control register 0.
    and eax, 01111111111111111111111111111111b     ; Clear the PG-bit, which is bit 31.
    mov cr0, eax                                   ; Set control register 0 to the A-register.

Now that paging is disabled, we can take a look at how paging is set up in x86-64 (it's recommended to read Chapter 5.3 of the AMD64 Architecture Programmer's Manual, Volume 2). First of all, long mode uses PAE paging and therefore you have the page-directory pointer table (PDPT), the page-directory table (PDT) and the page table (PT). There's also another table which now forms the root (instead of the PDPT or the PDT), and that is the page-map level-4 table (PML4T). In protected mode a page table entry was only four bytes long, so you had 1024 entries per table. In long mode, however, you only have 512 entries per table as each entry is eight bytes long. This means that one entry in a PT can address 4kB, one entry in a PDT can address 2MB, one entry in a PDPT can address 1GB and one entry in a PML4T can address 512GB. With 512 entries in the PML4T, this means that only 512 × 512GB = 256TB can be addressed.

The way these tables work is that each entry in the PML4T points to a PDPT, each entry in a PDPT to a PDT and each entry in a PDT to a PT. Each entry in a PT then points to the physical address of a page, that is, if it is marked as present.

So how does the MMU/processor know which physical address should be used with which virtual address? Each table has 512 entries ranging from 0 to 511, and each entry refers to a specific range of virtual memory. Index 0 of the PML4T refers to the first 512GB of virtual memory, index 1 refers to the next 512GB and so on. The same applies to the PDPT, PDT and the PT (except with smaller sizes: 1GB, 2MB and 4kB as seen above). The last gigabyte of virtual memory would be described by the table referred to by index 511 of the PDPT, which is in turn referred to by index 511 of the PML4T, or in pseudo-C:

pagedir_t* PDT = PML4[511]->PDPT[511];
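
To make the indexing concrete, here is a minimal C sketch that splits a virtual address into the four table indices and the page offset, and computes how much memory one entry covers at each level. The names `va_parts_t`, `split_virtual_address` and `bytes_per_entry` are made up for this illustration; the bit positions assume the 4-level, 4kB-page layout described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the pieces of a 48-bit virtual address. */
typedef struct {
    unsigned pml4;    /* bits 47..39: index into the PML4T */
    unsigned pdpt;    /* bits 38..30: index into the PDPT  */
    unsigned pdt;     /* bits 29..21: index into the PDT   */
    unsigned pt;      /* bits 20..12: index into the PT    */
    unsigned offset;  /* bits 11..0:  offset inside the 4kB page */
} va_parts_t;

va_parts_t split_virtual_address(uint64_t va)
{
    va_parts_t p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);  /* 9 bits select one of 512 entries */
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pdt    = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}

/* Bytes covered by one entry; level 1 = PT, 2 = PDT, 3 = PDPT, 4 = PML4T. */
uint64_t bytes_per_entry(int level)
{
    assert(level >= 1 && level <= 4);
    return (uint64_t)4096 << (9 * (level - 1));  /* each level multiplies by 512 */
}

/* Usage example: the start of the last gigabyte of virtual memory. */
void split_example(void)
{
    va_parts_t p = split_virtual_address(0xFFFFFFFFC0000000ULL);
    assert(p.pml4 == 511 && p.pdpt == 511 && p.pdt == 0 && p.pt == 0);
    assert(bytes_per_entry(4) * 512 == (1ULL << 48));  /* 512 x 512GB = 256TB */
}
```

Each level consumes nine address bits because a table holds 512 = 2^9 entries, which is where the 4kB, 2MB, 1GB and 512GB figures come from.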

What pages you want to set up and how you want them to be set up is up to you. I'd identity map the first megabyte and then map some high memory to the memory after the first megabyte; however, this may be pretty difficult to set up at first. So let's identity map the first two megabytes instead. We'll set up four tables starting at 0x1000 (assuming that this memory is free to use): a PML4T, a PDPT, a PDT and a PT. We want to identity map the first two megabytes, so:

  • PML4T[0] -> PDPT.
  • PDPT[0] -> PDT.
  • PDT[0] -> PT.
  • PT -> 0x00000000 - 0x00200000.

First we will clear the tables:

    mov edi, 0x1000    ; Set the destination index to 0x1000, where the PML4T will live.
    mov cr3, edi       ; Point control register 3 at the PML4T (this also saves the address).
    xor eax, eax       ; Nullify the A-register.
    mov ecx, 4096      ; 4096 dwords = 16 KiB, i.e. four 4 KiB tables.
    rep stosd          ; Clear the memory.
    mov edi, cr3       ; Restore the destination index from control register 3.

Now that the tables are clear, we're going to set them up. The page tables are going to be located at these addresses:

  • PML4T - 0x1000.
  • PDPT - 0x2000.
  • PDT - 0x3000.
  • PT - 0x4000.

So let's make PML4T[0] point to the PDPT and so on:

    mov DWORD [edi], 0x2003      ; Set the uint32_t at the destination index to 0x2003.
    add edi, 0x1000              ; Add 0x1000 to the destination index.
    mov DWORD [edi], 0x3003      ; Set the uint32_t at the destination index to 0x3003.
    add edi, 0x1000              ; Add 0x1000 to the destination index.
    mov DWORD [edi], 0x4003      ; Set the uint32_t at the destination index to 0x4003.
    add edi, 0x1000              ; Add 0x1000 to the destination index.

If you haven't noticed already, I added three to each address. This simply sets the two lowest bits (bit 0 and bit 1) of the entry. These bits indicate that the page is present and that it is readable as well as writable. Now all that's left to do is identity map the first two megabytes:

    mov ebx, 0x00000003          ; Set the B-register to 0x00000003.
    mov ecx, 512                 ; Set the C-register to 512.
    
.SetEntry:
    mov DWORD [edi], ebx         ; Set the uint32_t at the destination index to the B-register.
    add ebx, 0x1000              ; Add 0x1000 to the B-register.
    add edi, 8                   ; Add eight to the destination index.
    loop .SetEntry               ; Set the next entry.
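
As a cross-check of the values written above, the same layout can be modelled in C. This is only a sketch under stated assumptions: the `tables` array is a hypothetical stand-in for the physical memory at 0x1000 through 0x4FFF, and `build_identity_map` is a name invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRIES      512
#define PAGE_PRESENT 0x1ULL   /* bit 0: entry is present   */
#define PAGE_WRITE   0x2ULL   /* bit 1: page is writable   */

/* Hypothetical model of the four tables: tables[0] stands for the
 * PML4T at 0x1000, tables[1] for the PDPT at 0x2000, tables[2] for
 * the PDT at 0x3000 and tables[3] for the PT at 0x4000. */
static uint64_t tables[4][ENTRIES];

static void build_identity_map(void)
{
    const uint64_t flags = PAGE_PRESENT | PAGE_WRITE;  /* the "3" */

    memset(tables, 0, sizeof tables);        /* what rep stosd does   */
    tables[0][0] = 0x2000 | flags;           /* PML4T[0] -> PDPT      */
    tables[1][0] = 0x3000 | flags;           /* PDPT[0]  -> PDT       */
    tables[2][0] = 0x4000 | flags;           /* PDT[0]   -> PT        */

    for (uint64_t i = 0; i < ENTRIES; i++)   /* the .SetEntry loop    */
        tables[3][i] = (i * 0x1000) | flags;

    /* The last PT entry maps the page just below the 2 MiB mark. */
    assert((tables[3][ENTRIES - 1] & ~0xFFFULL) == 0x1FF000);
}
```

Four tables of 512 eight-byte entries occupy 16384 bytes, which is why the clearing loop stores 4096 doublewords.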

Now we should enable PAE-paging by setting the PAE-bit in the fourth control register:

    mov eax, cr4                 ; Set the A-register to control register 4.
    or eax, 1 << 5               ; Set the PAE-bit, which is the 6th bit (bit 5).
    mov cr4, eax                 ; Set control register 4 to the A-register.

Now paging is set up, but it isn't enabled yet.

Future of x86-64 - the PML5

In November 2016, Intel released a white paper [1] about 5-level paging, and started supporting it with Ice Lake processors in 2019. These processors support a 128 PiB address space, and furthermore up to 4 PiB of physical memory (far above the 4-level 256 TiB/64 TiB limits). Support for this is detected as follows:

    mov eax, 0x7                 ; You might want to check that leaf 7 is supported first!
    xor ecx, ecx                 ; Sub-leaf 0.
    cpuid
    test ecx, (1<<16)            ; ECX bit 16 reports 57-bit linear addresses (LA57).
    jnz .five_level_paging

The paging structures are identical to the 4-level versions; there is just an added layer of indirection. Your recursive address (if you use recursive mappings) will change. 5-level paging is active when CR4.LA57=1 and EFER.LMA=1. The bit is set like this:

BITS 32
mov eax, cr4
or eax, (1<<12) ;CR4.LA57
mov cr4, eax

Note that attempting to set CR4.LA57 while EFER.LMA=1 causes a general protection fault (#GP). You therefore need to drop into protected mode or set up 5-level paging before entering long mode in the first place.
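
The address-space figures quoted above follow directly from the number of address bits. The C sketch below reproduces that arithmetic; `space_size` and `pml5_index` are hypothetical names chosen for this example, and the index helper assumes the extra level simply takes the nine bits above the 48 used by 4-level paging.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size in bytes of an address space that is
 * `bits` bits wide. */
static uint64_t space_size(unsigned bits)
{
    assert(bits < 64);
    return (uint64_t)1 << bits;
}

/* Hypothetical helper: index into the top-level (PML5) table, taken
 * from virtual address bits 48 to 56. */
static unsigned pml5_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 48) & 0x1FF);
}
```

With 57 virtual address bits the space is 512 times the 4-level size (128 PiB instead of 256 TiB), and 52 physical address bits give 4 PiB.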

The Switch from Real Mode

There's not much left to do. We set the long mode bit in the EFER MSR and then enable paging and protected mode at the same time. After that we are in compatibility mode (which is part of long mode).

So we first set the LM-bit:

    mov ecx, 0xC0000080          ; Set the C-register to 0xC0000080, which is the EFER MSR.
    rdmsr                        ; Read from the model-specific register.
    or eax, 1 << 8               ; Set the LM-bit which is the 9th bit (bit 8).
    wrmsr                        ; Write to the model-specific register.

Enabling paging and protected mode:

    mov eax, cr0                 ; Set the A-register to control register 0.
    or eax, (1 << 31) | (1 << 0) ; Set the PG-bit, which is the 32nd bit (bit 31), and the PE-bit, which is the 1st bit (bit 0).
    mov cr0, eax                 ; Set control register 0 to the A-register.

Now we're in compatibility mode.

The Switch from Protected Mode

There's not much left to do. We set the long mode bit in the EFER MSR and then enable paging. After that we are in compatibility mode (which is part of long mode).

So we first set the LM-bit:

    mov ecx, 0xC0000080          ; Set the C-register to 0xC0000080, which is the EFER MSR.
    rdmsr                        ; Read from the model-specific register.
    or eax, 1 << 8               ; Set the LM-bit which is the 9th bit (bit 8).
    wrmsr                        ; Write to the model-specific register.

Enabling paging:

    mov eax, cr0                 ; Set the A-register to control register 0.
    or eax, 1 << 31              ; Set the PG-bit, which is the 32nd bit (bit 31).
    mov cr0, eax                 ; Set control register 0 to the A-register.

Now we're in compatibility mode.
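
The conditions behind both switch paths can be summarised in a small C model. It is a sketch only: `long_mode_active` is a hypothetical function that checks register values passed in as plain integers, and it does not touch any real control register or MSR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE   (1u << 0)    /* protection enable          */
#define CR0_PG   (1u << 31)   /* paging                     */
#define CR4_PAE  (1u << 5)    /* physical address extension */
#define EFER_LME (1u << 8)    /* long mode enable           */

/* The real-mode path sets both CR0 bits with a single write. */
static_assert((CR0_PG | CR0_PE) == 0x80000001u, "PG and PE mask");

/* Hypothetical model: long mode is active once paging is switched on
 * while PAE and EFER.LME are already set. */
static bool long_mode_active(uint32_t cr0, uint32_t cr4, uint32_t efer)
{
    bool paging = (cr0 & CR0_PE) && (cr0 & CR0_PG);
    return paging && (cr4 & CR4_PAE) && (efer & EFER_LME);
}
```

The order matters in the model as it does in the assembly: PAE and the LM-bit are prepared first, and the write to CR0 is the step that actually activates long mode.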

Entering the 64-bit Submode

Now that we're in long mode, there's one issue left: we are in the 32-bit compatibility submode and we actually wanted to enter 64-bit long mode. This isn't a hard thing to do. We just have to load a GDT whose code descriptor has the 64-bit (long mode) flag set, and then jump through it.

Our GDT (see chapters 4.8.1 and 4.8.2 of the AMD64 Architecture Programmer's Manual, Volume 2) should look like this:

; Access bits
PRESENT        equ 1 << 7
NOT_SYS        equ 1 << 4
EXEC           equ 1 << 3
DC             equ 1 << 2
RW             equ 1 << 1
ACCESSED       equ 1 << 0

; Flags bits
GRAN_4K       equ 1 << 7
SZ_32         equ 1 << 6
LONG_MODE     equ 1 << 5

GDT64:
    .Null: equ $ - GDT64
        dq 0
    .Code: equ $ - GDT64
        dd 0xFFFF                                   ; Limit (low, bits 0-15) & Base (low, bits 0-15)
        db 0                                        ; Base (mid, bits 16-23)
        db PRESENT | NOT_SYS | EXEC | RW            ; Access
        db GRAN_4K | LONG_MODE | 0xF                ; Flags & Limit (high, bits 16-19)
        db 0                                        ; Base (high, bits 24-31)
    .Data: equ $ - GDT64
        dd 0xFFFF                                   ; Limit (low, bits 0-15) & Base (low, bits 0-15)
        db 0                                        ; Base (mid, bits 16-23)
        db PRESENT | NOT_SYS | RW                   ; Access
        db GRAN_4K | SZ_32 | 0xF                    ; Flags & Limit (high, bits 16-19)
        db 0                                        ; Base (high, bits 24-31)
    .TSS: equ $ - GDT64
        dd 0x00000068
        dd 0x00CF8900
    .Pointer:
        dw $ - GDT64 - 1                            ; Size of the GDT minus one
        dq GDT64                                    ; Address of the GDT

Notice that we set a 4 GiB limit for code. This is needed because the processor makes one last limit check before the jump, and a limit of 0 will cause a #GP (tested in Bochs). After that, the limit is ignored. Now the only thing left to do is load the GDT and make the jump to 64-bit:

    lgdt [GDT64.Pointer]         ; Load the 64-bit global descriptor table.
    jmp GDT64.Code:Realm64       ; Set the code segment and enter 64-bit long mode.

Sample

Now that we're in 64-bit mode, we want to do something that is actually 64-bit. In this sample I'm just going to clear the screen:

; Use 64-bit.
[BITS 64]

Realm64:
    cli                           ; Clear the interrupt flag.
    mov ax, GDT64.Data            ; Set the A-register to the data descriptor.
    mov ds, ax                    ; Set the data segment to the A-register.
    mov es, ax                    ; Set the extra segment to the A-register.
    mov fs, ax                    ; Set the F-segment to the A-register.
    mov gs, ax                    ; Set the G-segment to the A-register.
    mov ss, ax                    ; Set the stack segment to the A-register.
    mov edi, 0xB8000              ; Set the destination index to 0xB8000.
    mov rax, 0x1F201F201F201F20   ; Set the A-register to 0x1F201F201F201F20.
    mov ecx, 500                  ; Set the C-register to 500.
    rep stosq                     ; Clear the screen.
    hlt                           ; Halt the processor.

It is very important that you don't enable the interrupts (unless you have set up a 64-bit IDT of course).
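
The constants in the sample are easier to follow when modelled in C. In this sketch the `screen` array is a hypothetical stand-in for the 80x25 text buffer at 0xB8000, and `clear_screen` mimics what `rep stosq` does with the pattern in RAX.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define COLS 80
#define ROWS 25

/* Hypothetical stand-in for the VGA text buffer: one 16-bit cell per
 * character (low byte = character, high byte = attribute). */
static uint16_t screen[COLS * ROWS];

static void clear_screen(void)
{
    /* Four cells at once: character 0x20 (space) with attribute 0x1F
     * (white on blue), repeated four times in one quadword. */
    const uint64_t pattern = 0x1F201F201F201F20ULL;

    /* 80 * 25 cells * 2 bytes = 4000 bytes = 500 quadwords. */
    static_assert(sizeof screen == 500 * sizeof pattern, "500 stores");

    for (size_t i = 0; i < sizeof screen / sizeof pattern; i++)
        memcpy((uint8_t *)screen + i * sizeof pattern,
               &pattern, sizeof pattern);
}
```

This is why the sample loads 500 into ECX: each quadword store fills four character cells, so 500 stores cover all 2000 cells.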
