Setting Up Long Mode: Difference between revisions
[unchecked revision] | [unchecked revision] |
m (→Articles) |
m (Bot: Replace deprecated source tag with syntaxhighlight) |
||
(41 intermediate revisions by 18 users not shown) | |||
Line 17: | Line 17: | ||
=== Detection of CPUID === |
=== Detection of CPUID === |
||
Basically, detecting whether CPUID is supported by your processor is covered [[CPUID|here]], but we will show how to do it here. CPUID is supported when the ID-bit in the FLAGS-register can be flipped. So |
Basically, detecting whether CPUID is supported by your processor is covered [[CPUID|here]], but we will show how to do it here. CPUID is supported when the ID-bit in the FLAGS-register can be flipped. So let's try that, then: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Check if CPUID is supported by attempting to flip the ID bit (bit 21) in |
|||
; Store the FLAGS-register. |
|||
; the FLAGS register. If we can flip it, CPUID is available. |
|||
; Copy FLAGS in to EAX via stack |
|||
pushfd |
pushfd |
||
; Restore the A-register. |
|||
pop eax |
pop eax |
||
; Copy to ECX as well for comparing later on |
|||
; Set the C-register to the A-register. |
|||
mov ecx, eax |
mov ecx, eax |
||
; Flip the ID |
; Flip the ID bit |
||
xor eax, |
xor eax, 1 << 21 |
||
; |
; Copy EAX to FLAGS via the stack |
||
push eax |
push eax |
||
; Restore the FLAGS-register. |
|||
popfd |
popfd |
||
; Copy FLAGS back to EAX (with the flipped bit if CPUID is supported) |
|||
; Store the FLAGS-register. |
|||
pushfd |
pushfd |
||
; Restore the A-register. |
|||
pop eax |
pop eax |
||
; Restore FLAGS from the old version stored in ECX (i.e. flipping the ID bit |
|||
; Store the C-register. |
|||
; back if it was ever flipped). |
|||
push ecx |
push ecx |
||
; Restore the FLAGS-register. |
|||
popfd |
popfd |
||
; Compare EAX and ECX. If they are equal then that means the bit wasn't |
|||
; Do a XOR-operation on the A-register and the C-register. |
|||
; flipped, and CPUID isn't supported. |
|||
xor eax, ecx |
xor eax, ecx |
||
; The zero flag is set, no CPUID. |
|||
jz .NoCPUID |
jz .NoCPUID |
||
ret |
|||
</syntaxhighlight> |
|||
; CPUID is available for use. |
|||
</source> |
|||
=== x86 or x86-64 === |
=== x86 or x86-64 === |
||
Line 63: | Line 57: | ||
Now that CPUID is available we have to check whether long mode can be used or not. Long mode can only be detected using the extended functions of CPUID (> 0x80000000), so we have to check if the function that determines whether long mode is available or not is actually available: |
Now that CPUID is available we have to check whether long mode can be used or not. Long mode can only be detected using the extended functions of CPUID (> 0x80000000), so we have to check if the function that determines whether long mode is available or not is actually available: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the A-register to 0x80000000. |
mov eax, 0x80000000 ; Set the A-register to 0x80000000. |
||
cpuid ; CPU identification. |
|||
mov eax, 0x80000000 |
|||
cmp eax, 0x80000001 ; Compare the A-register with 0x80000001. |
|||
jb .NoLongMode ; It is less, there is no long mode. |
|||
; CPU identification. |
|||
</syntaxhighlight> |
|||
cpuid |
|||
; Compare the A-register with 0x80000001. |
|||
cmp eax, 0x80000001 |
|||
; It is less, there is no long mode. |
|||
jb .NoLongMode |
|||
</source> |
|||
Now that we know that extended function is available we can use it to detect long mode: |
Now that we know that extended function is available we can use it to detect long mode: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the A-register to 0x80000001. |
mov eax, 0x80000001 ; Set the A-register to 0x80000001. |
||
cpuid ; CPU identification. |
|||
mov eax, 0x80000001 |
|||
test edx, 1 << 29 ; Test if the LM-bit, which is bit 29, is set in the D-register. |
|||
jz .NoLongMode ; They aren't, there is no long mode. |
|||
; CPU identification. |
|||
</syntaxhighlight> |
|||
cpuid |
|||
; Test if the LM-bit, which is the 29th bit, is set in the D-register. |
|||
test edx, 00100000000000000000000000000000b |
|||
; They aren't, there is no long mode. |
|||
jz .NoLongMode |
|||
</source> |
|||
Now that we know if long mode is actually supported by the processor, we can actually use it. |
Now that we know if long mode is actually supported by the processor, we can actually use it. |
||
Line 97: | Line 77: | ||
== Entering Long Mode == |
== Entering Long Mode == |
||
Entering long mode can be |
Entering long mode can be done from both real mode and protected mode, however only protected mode is covered in the Intel and AMD64 manuals. Early AMD documentation explains this process works from real mode as well. |
||
Before anything, it is recommended that you enable the [[A20 Line]]; otherwise only odd MiBs can be accessed. |
|||
=== From Real Mode === |
|||
Basically I'm going to assume that the [[A20]] is already enabled by you (or the BIOS) and that paging is still disabled as well as protected mode. |
|||
=== From Protected Mode === |
|||
I'm going to assume that the [[A20]] is already enabled by your (or the BIOS) and that paging is either enabled or disabled and that protected mode is set up with a 32-bit [[GDT]]. |
|||
=== Setting up the Paging === |
=== Setting up the Paging === |
||
Line 111: | Line 85: | ||
Before we actually cover up the new paging used in x86-64 we should disable the old paging first (you can skip this if you never set up paging in protected mode) as this is required. |
Before we actually cover up the new paging used in x86-64 we should disable the old paging first (you can skip this if you never set up paging in protected mode) as this is required. |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the A-register to control register 0. |
mov eax, cr0 ; Set the A-register to control register 0. |
||
and eax, 01111111111111111111111111111111b ; Clear the PG-bit, which is bit 31. |
|||
mov eax, cr0 |
|||
mov cr0, eax ; Set control register 0 to the A-register. |
|||
</syntaxhighlight> |
|||
; Clear the PG-bit, which is the 32nd bit. |
|||
and eax, 01111111111111111111111111111111b |
|||
; Set control register 0 to the A-register. |
|||
mov cr0, eax |
|||
</source> |
|||
Now that paging is disabled, we can actually take a look at how paging is set up in x86-64 (It's recommended to read Chapter 5.3 of the AMD64 Architecture Programmer's Manual, Volume 2). First of all, long mode uses PAE paging and therefore you have the page-directory pointer table (PDPT), the page-directory table (PDT) and the page table (PT). There's also another table which now forms the root (instead of the PDPT or the PDT) and that is page-map level-4 table (PML4T). In protected mode a page table entry was only four bytes long, so you had 1024 entries per table. In long mode, however, you only have 512 entries per table as each entry is eight bytes long. This means that one entry in a PT can address 4kB, one entry in a PDT can address 2MB, one entry in a PDPT can address 1GB and one entry in a PML4T can address 512GB. This means that only 256TB can be addressed. The way these tables work is that each entry in the PML4T point to a PDPT, each entry in a PDPT to a PDT and each entry in a PDT to a PT. Each entry in a PT then points to the physical address, that is, if it is marked as present. So how does the MMU/processor know which physical address should be used with which virtual address? Basically each table has 512 entries ranging from 0 to 511. These entries each refer to a specific domain of memory. Like index 0 of the PML4T refers to the first 512GB in virtual memory, index 1 refers to the next 512GB and so on. The same applies to the PDPT, PDT and the PT (except with smaller sizes; 1GB, 2MB and 4kB as seen above). The last gigabyte of virtual memory would be described in the table referred to by 511th index of the PDPT which is referred to by the 511th index of the PML4T or in psuedo-C: |
Now that paging is disabled, we can actually take a look at how paging is set up in x86-64 (It's recommended to read Chapter 5.3 of the AMD64 Architecture Programmer's Manual, Volume 2). First of all, long mode uses PAE paging and therefore you have the page-directory pointer table (PDPT), the page-directory table (PDT) and the page table (PT). There's also another table which now forms the root (instead of the PDPT or the PDT) and that is page-map level-4 table (PML4T). In protected mode a page table entry was only four bytes long, so you had 1024 entries per table. In long mode, however, you only have 512 entries per table as each entry is eight bytes long. This means that one entry in a PT can address 4kB, one entry in a PDT can address 2MB, one entry in a PDPT can address 1GB and one entry in a PML4T can address 512GB. This means that only 256TB can be addressed. The way these tables work is that each entry in the PML4T point to a PDPT, each entry in a PDPT to a PDT and each entry in a PDT to a PT. Each entry in a PT then points to the physical address, that is, if it is marked as present. So how does the MMU/processor know which physical address should be used with which virtual address? Basically each table has 512 entries ranging from 0 to 511. These entries each refer to a specific domain of memory. Like index 0 of the PML4T refers to the first 512GB in virtual memory, index 1 refers to the next 512GB and so on. The same applies to the PDPT, PDT and the PT (except with smaller sizes; 1GB, 2MB and 4kB as seen above). The last gigabyte of virtual memory would be described in the table referred to by 511th index of the PDPT which is referred to by the 511th index of the PML4T or in psuedo-C: |
||
< |
<syntaxhighlight lang="c"> |
||
pagedir_t* PDT = PML4[511]->PDPT[511]; |
pagedir_t* PDT = PML4[511]->PDPT[511]; |
||
</syntaxhighlight> |
|||
</source> |
|||
Basically, what pages you want to set up and how you want them to be set up is up to you, but I'd identity map the first megabyte and then map some high memory to the memory after the first megabyte |
Basically, what pages you want to set up and how you want them to be set up is up to you, but I'd identity map the first megabyte and then map some high memory to the memory after the first megabyte, however, this may be pretty difficult to set up first. So let's identity map the first two megabytes. We'll set up four tables at 0x1000 (assuming that this is free to use): a PML4T, a PDPT, a PDT and a PT. Basically we want to identity map the first two megabytes so: |
||
*PML4T[0] -> PDPT. |
*PML4T[0] -> PDPT. |
||
Line 136: | Line 105: | ||
First we will clear the tables: |
First we will clear the tables: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the destination index to 0x1000. |
mov edi, 0x1000 ; Set the destination index to 0x1000. |
||
mov cr3, edi ; Set control register 3 to the destination index. |
|||
mov di, 0x1000 |
|||
xor eax, eax ; Nullify the A-register. |
|||
; |
mov ecx, 4096 ; Set the C-register to 4096. |
||
rep stosd ; Clear the memory. |
|||
xor ax, ax |
|||
mov edi, cr3 ; Set the destination index to control register 3. |
|||
</syntaxhighlight> |
|||
; Set the C-register to 16384. |
|||
mov cx, 16384 |
|||
; Clear the memory. |
|||
rep stosb |
|||
</source> |
|||
Now that the page are clear we're going to set up the tables, the page tables are going to be located at these addresses: |
Now that the page are clear we're going to set up the tables, the page tables are going to be located at these addresses: |
||
Line 158: | Line 122: | ||
So lets make PML4T[0] point to the PDPT and so on: |
So lets make PML4T[0] point to the PDPT and so on: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the |
mov DWORD [edi], 0x2003 ; Set the uint32_t at the destination index to 0x2003. |
||
add edi, 0x1000 ; Add 0x1000 to the destination index. |
|||
mov WORD [di], 0x2003 |
|||
mov DWORD [edi], 0x3003 ; Set the uint32_t at the destination index to 0x3003. |
|||
; Add 0x1000 to the destination index. |
add edi, 0x1000 ; Add 0x1000 to the destination index. |
||
mov DWORD [edi], 0x4003 ; Set the uint32_t at the destination index to 0x4003. |
|||
add di, 0x1000 |
|||
add edi, 0x1000 ; Add 0x1000 to the destination index. |
|||
</syntaxhighlight> |
|||
; Set the word at the destination index to 0x3003. |
|||
mov WORD [di], 0x3003 |
|||
; Add 0x1000 to the destination index. |
|||
add di, 0x1000 |
|||
; Set the word at the destination index to 0x4003. |
|||
mov WORD [di], 0x4003 |
|||
; Add 0x1000 to the destination index. |
|||
add di, 0x1000 |
|||
</source> |
|||
If you haven't noticed already, I used a three. This simply means that the first two bits should be set. These bits indicate that the page is present and that it is readable as well as writable. Now all that's left to do is identity map the first two megabytes: |
If you haven't noticed already, I used a three. This simply means that the first two bits should be set. These bits indicate that the page is present and that it is readable as well as writable. Now all that's left to do is identity map the first two megabytes: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the |
mov ebx, 0x00000003 ; Set the B-register to 0x00000003. |
||
mov ecx, 512 ; Set the C-register to 512. |
|||
add di, 0x4000 |
|||
; Set the B-register to 0x00000003. |
|||
mov ebx, 0x00000003 |
|||
; Set the C-register to 512. |
|||
mov cx, 512 |
|||
.SetEntry: |
.SetEntry: |
||
; Set the |
mov DWORD [edi], ebx ; Set the uint32_t at the destination index to the B-register. |
||
add ebx, 0x1000 ; Add 0x1000 to the B-register. |
|||
mov DWORD [di], ebx |
|||
add edi, 8 ; Add eight to the destination index. |
|||
loop .SetEntry ; Set the next entry. |
|||
</syntaxhighlight> |
|||
add ebx, 0x1000 |
|||
; Add eight to the destination index. |
|||
add di, 8 |
|||
; Set the next entry. |
|||
loop .SetEntry |
|||
</source> |
|||
Now we should enable PAE-paging by setting the PAE-bit in the fourth control register: |
Now we should enable PAE-paging by setting the PAE-bit in the fourth control register: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the A-register to control register 4. |
mov eax, cr4 ; Set the A-register to control register 4. |
||
or eax, 1 << 5 ; Set the PAE-bit, which is the 6th bit (bit 5). |
|||
mov eax, cr4 |
|||
mov cr4, eax ; Set control register 4 to the A-register. |
|||
</syntaxhighlight> |
|||
; Set the PAE-bit, which is the 6th bit. |
|||
or eax, 00000000000000000000000000100000b |
|||
; Set control register 4 to the A-register. |
|||
mov cr4, eax |
|||
</source> |
|||
And we should set the third control register to the PML4T: |
|||
<source lang="asm"> |
|||
; Set the A-register to 0x00004000. |
|||
or eax, 0x00004000 |
|||
; Set control register 3 to the A-register. |
|||
mov cr3, eax |
|||
</source> |
|||
Now paging is set up, but it isn't enabled yet. |
Now paging is set up, but it isn't enabled yet. |
||
=== Future of x86-64 - the PML5 === |
|||
In November 2016, Intel released a white paper [https://cdrdv2-public.intel.com/671442/5-level-paging-white-paper.pdf] about 5-level paging, and started supporting it with Ice Lake processors in 2019. These processors support a 128 PiB address space, and furthermore up to 4 PiB of physical memory (far above the 4-level 256 TiB/64 TiB limits). Support for this is detected as follows: |
|||
<syntaxhighlight lang="asm"> |
|||
mov eax, 0x7 ; You might want to check for page 7 first! |
|||
xor ecx, ecx |
|||
cpuid |
|||
test ecx, (1<<16) |
|||
jnz .5_level_paging |
|||
</syntaxhighlight> |
|||
The paging structures are identical to the 4 level versions, there's just an added layer of indirection. Your recursive address (if you use recursive mappings) will change. |
|||
5 level paging is enabled if CR4.LA57=1, and EFER.LMA=1. |
|||
<syntaxhighlight lang="asm"> |
|||
BITS 32 |
|||
mov eax, cr4 |
|||
or eax, (1<<12) ;CR4.LA57 |
|||
mov cr4, eax |
|||
</syntaxhighlight> |
|||
Note that attempting to set CR4.LA57 while EFER.LMA=1 causes a #GP general protection fault. You therefore need to drop into protected mode or set up 5 level paging before entering long mode in the first place. |
|||
=== The Switch from Real Mode === |
=== The Switch from Real Mode === |
||
Line 232: | Line 177: | ||
So we first set the LM-bit: |
So we first set the LM-bit: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the C-register to 0xC0000080, which is the EFER MSR. |
mov ecx, 0xC0000080 ; Set the C-register to 0xC0000080, which is the EFER MSR. |
||
rdmsr ; Read from the model-specific register. |
|||
mov ecx, 0xC0000080 |
|||
or eax, 1 << 8 ; Set the LM-bit which is the 9th bit (bit 8). |
|||
; |
wrmsr ; Write to the model-specific register. |
||
</syntaxhighlight> |
|||
rdmsr |
|||
; Set the LM-bit which is the 9th bit. |
|||
or eax, 00000000000000000000000100000000b |
|||
; Write to the model-specific register. |
|||
wrmsr |
|||
</source> |
|||
Enabling paging and protected mode: |
Enabling paging and protected mode: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the A-register to control register 0. |
mov eax, cr0 ; Set the A-register to control register 0. |
||
or eax, 1 << 31 | 1 << 0 ; Set the PG-bit, which is bit 31, and the PE-bit, which is bit 0. |
|||
mov eax, cr0 |
|||
mov cr0, eax ; Set control register 0 to the A-register. |
|||
</syntaxhighlight> |
|||
; Set the PG-bit, which is the 32nd bit, and the PM-bit, which is the 1st bit. |
|||
or eax, 10000000000000000000000000000001b |
|||
; Set control register 0 to the A-register. |
|||
mov cr0, eax |
|||
</source> |
|||
Now we're in compatibility mode. |
Now we're in compatibility mode. |
||
Line 265: | Line 198: | ||
So we first set the LM-bit: |
So we first set the LM-bit: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the C-register to 0xC0000080, which is the EFER MSR. |
mov ecx, 0xC0000080 ; Set the C-register to 0xC0000080, which is the EFER MSR. |
||
rdmsr ; Read from the model-specific register. |
|||
mov ecx, 0xC0000080 |
|||
or eax, 1 << 8 ; Set the LM-bit which is the 9th bit (bit 8). |
|||
; |
wrmsr ; Write to the model-specific register. |
||
</syntaxhighlight> |
|||
rdmsr |
|||
; Set the LM-bit which is the 9th bit. |
|||
or eax, 00000000000000000000000100000000b |
|||
; Write to the model-specific register. |
|||
wrmsr |
|||
</source> |
|||
Enabling paging: |
Enabling paging: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Set the A-register to control register 0. |
mov eax, cr0 ; Set the A-register to control register 0. |
||
or eax, 1 << 31 ; Set the PG-bit, which is the 32nd bit (bit 31). |
|||
mov eax, cr0 |
|||
mov cr0, eax ; Set control register 0 to the A-register. |
|||
</syntaxhighlight> |
|||
; Set the PG-bit, which is the 32nd bit. |
|||
or eax, 10000000000000000000000000000000b |
|||
; Set control register 0 to the A-register. |
|||
mov cr0, eax |
|||
</source> |
|||
Now we're in compatibility mode. |
Now we're in compatibility mode. |
||
Line 295: | Line 216: | ||
== Entering the 64-bit Submode == |
== Entering the 64-bit Submode == |
||
Now that we're in long mode, there's one issue left: we are in the |
Now that we're in long mode, there's one issue left: we are in the 32-bit compatibility submode and we actually wanted to enter 64-bit long mode. This isn't a hard thing to do. We just have to load a GDT with the 64-bit flag set in the code descriptor. |
||
Our GDT (see chapter 4.8.1 and 4.8.2 of the AMD64 Architecture Programmer's Manual Volume 2) should look like this: |
Our GDT (see chapter 4.8.1 and 4.8.2 of the AMD64 Architecture Programmer's Manual Volume 2) should look like this: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Access bits |
|||
; Global Descriptor Table (64-bit). |
|||
PRESENT equ 1 << 7 |
|||
GDT64: |
|||
NOT_SYS equ 1 << 4 |
|||
EXEC equ 1 << 3 |
|||
DC equ 1 << 2 |
|||
RW equ 1 << 1 |
|||
; Limit (low). |
|||
ACCESSED equ 1 << 0 |
|||
; Flags bits |
|||
; Base (low). |
|||
GRAN_4K equ 1 << 7 |
|||
dw 0 |
|||
SZ_32 equ 1 << 6 |
|||
LONG_MODE equ 1 << 5 |
|||
db 0 |
|||
GDT: |
|||
.Null: equ $ - GDT |
|||
; Access. |
|||
dq 0 |
|||
.Code: equ $ - GDT |
|||
dd 0xFFFF ; Limit & Base (low, bits 0-15) |
|||
; Granularity. |
|||
db 0 ; Base (mid, bits 16-23) |
|||
db 0 |
|||
db PRESENT | NOT_SYS | EXEC | RW ; Access |
|||
db GRAN_4K | LONG_MODE | 0xF ; Flags & Limit (high, bits 16-19) |
|||
; Base (high). |
|||
db 0 ; Base (high, bits 24-31) |
|||
db 0 |
|||
.Data: equ $ - GDT |
|||
dd 0xFFFF ; Limit & Base (low, bits 0-15) |
|||
; The code descriptor. |
|||
db 0 ; Base (mid, bits 16-23) |
|||
.Code equ $ - GDT64 |
|||
db PRESENT | NOT_SYS | RW ; Access |
|||
db GRAN_4K | SZ_32 | 0xF ; Flags & Limit (high, bits 16-19) |
|||
; Limit (low). |
|||
db 0 ; Base (high, bits 24-31) |
|||
dw 0 |
|||
.TSS: equ $ - GDT |
|||
dd 0x00000068 |
|||
dd 0x00CF8900 |
|||
; Base (middle). |
|||
db 0 |
|||
; Access. |
|||
db 10011000b |
|||
; Granularity. |
|||
db 00100000b |
|||
; Base (high). |
|||
db 0 |
|||
; The data descriptor. |
|||
.Data equ $ - GDT64 |
|||
; Limit (low). |
|||
dw 0 |
|||
; Base (low). |
|||
dw 0 |
|||
; Base (middle). |
|||
db 0 |
|||
; Access. |
|||
db 10010000b |
|||
; Granularity. |
|||
db 00000000b |
|||
; Base (high). |
|||
db 0 |
|||
; The GDT-pointer. |
|||
.Pointer: |
.Pointer: |
||
dw $ - GDT - 1 |
|||
dq GDT |
|||
</syntaxhighlight> |
|||
dw $ - GDT64 - 1 |
|||
; Base. |
|||
dq GDT64 |
|||
</source> |
|||
Now the only thing left to do is load it and make the jump to 64-bit: |
Notice that we set a 4 GiB limit for code. This is needed because the processor will make a last limit check before the jump, and having a limit of 0 will cause a #GP (tested in Bochs). After that, the limit will be ignored. Now the only thing left to do is load it and make the jump to 64-bit: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Load the 64-bit global descriptor table. |
lgdt [GDT64.Pointer] ; Load the 64-bit global descriptor table. |
||
jmp GDT64.Code:Realm64 ; Set the code segment and enter 64-bit long mode. |
|||
lgdt [GDT64.Pointer] |
|||
</syntaxhighlight> |
|||
; Set the code segment and enter 64-bit long mode. |
|||
jmp GDT64.Code:Realm64 |
|||
</source> |
|||
=== Sample === |
=== Sample === |
||
Line 387: | Line 266: | ||
Now that we're in 64-bit, we want to do something that is actually 64-bit. In this sample I'm just going with an ordinary clear the screen: |
Now that we're in 64-bit, we want to do something that is actually 64-bit. In this sample I'm just going with an ordinary clear the screen: |
||
< |
<syntaxhighlight lang="asm"> |
||
; Use 64-bit. |
; Use 64-bit. |
||
[BITS 64] |
[BITS 64] |
||
Realm64: |
Realm64: |
||
; Clear the interrupt flag. |
cli ; Clear the interrupt flag. |
||
mov ax, GDT64.Data ; Set the A-register to the data descriptor. |
|||
cli |
|||
mov ds, ax ; Set the data segment to the A-register. |
|||
; Set the |
mov es, ax ; Set the extra segment to the A-register. |
||
mov fs, ax ; Set the F-segment to the A-register. |
|||
mov ax, GDT64.Data |
|||
mov gs, ax ; Set the G-segment to the A-register. |
|||
; Set the |
mov ss, ax ; Set the stack segment to the A-register. |
||
mov edi, 0xB8000 ; Set the destination index to 0xB8000. |
|||
mov ds, ax |
|||
mov rax, 0x1F201F201F201F20 ; Set the A-register to 0x1F201F201F201F20. |
|||
mov ecx, 500 ; Set the C-register to 500. |
|||
rep stosq ; Clear the screen. |
|||
mov es, ax |
|||
hlt ; Halt the processor. |
|||
</syntaxhighlight> |
|||
; Set the F-segment to the A-register. |
|||
mov fs, ax |
|||
; Set the G-segment to the A-register. |
|||
mov gs, ax |
|||
; Set the destination index 0x00000000000B8000. |
|||
mov rdi, 0x00000000000B8000 |
|||
; Set the A-register to 0x1F201F201F201F20. |
|||
mov rax, 0x1F201F201F201F20 |
|||
; Set the C-register to 500. |
|||
mov rcx, 500 |
|||
; Clear the screen. |
|||
rep movsq |
|||
; Halt the processor. |
|||
hlt |
|||
</source> |
|||
It is very important |
It is very important that you don't enable the interrupts (unless you have set up a 64-bit IDT of course). |
||
== |
== See also == |
||
=== Articles === |
|||
* [[EM64T|Intel EM64T]] |
* [[EM64T|Intel EM64T]] |
||
* [[X86-64]] |
* [[X86-64]] |
||
* [[Creating a 64-bit kernel]] |
* [[Creating a 64-bit kernel]] |
||
* [[Entering Long Mode Directly]] |
|||
=== Threads === |
=== Threads === |
||
Line 438: | Line 299: | ||
* [http://forum.osdev.org/viewtopic.php?f=1&t=21772 Setting up the stack after the switch to long mode] on the stack segment note. |
* [http://forum.osdev.org/viewtopic.php?f=1&t=21772 Setting up the stack after the switch to long mode] on the stack segment note. |
||
== Wikipedia == |
=== Wikipedia === |
||
* [[Wikipedia:AMD64|AMD64]] |
* [[Wikipedia:AMD64|AMD64]] |
||
* [[Wikipedia:64-bit|64-bit]] |
* [[Wikipedia:64-bit|64-bit]] |
||
[[Category:X86-64]] |
|||
[[Category:Operating Modes]] |
Latest revision as of 05:46, 9 June 2024
Overview
- Covering long mode.
- How to detect long mode (Recommended read: CPUID).
- How to set up paging for long mode (Recommended reads: Setting up Paging and Setting up Paging using PAE).
- How to enter long mode.
- How to set up the GDT for long mode (Recommended read: GDT).
Introduction
What is long mode and why set it up? The x86-64 processors (AMD64, Intel 64 (a.k.a. EM64T), VIA Nano) introduced a new mode called long mode. Long mode consists of two submodes: the actual 64-bit mode and compatibility mode (32-bit). Intel's manuals refer to long mode as IA-32e mode. What we are interested in is the 64-bit mode, as this mode provides a lot of new features: registers extended to 64 bits (rax, rcx, rdx, rbx, rsp, rbp, rip, etc.), eight new general-purpose registers (r8 - r15), and eight new multimedia registers (xmm8 - xmm15). 64-bit mode is basically a new world, as it is almost completely void of the segmentation that was used on the 8086 processors, and the GDT, the IDT, paging, etc. also differ somewhat from the old 32-bit mode (a.k.a. protected mode).
Detecting the Presence of Long Mode
So far only three processor vendors have made processors capable of entering and using long mode: AMD, Intel and VIA. Intel first tried to bring 64-bit processors to market with its separate Itanium (IA-64) architecture, but that did not catch on, and Intel now implements AMD's x86-64 architecture instead (originally under the name EM64T). This means that using 64-bit on an Intel processor is (almost) identical to using 64-bit on an AMD processor (and VIA should be identical as well). We can detect the presence of long mode by using the CPUID instruction.
Detection of CPUID
Detecting whether CPUID is supported by your processor is covered in the CPUID article, but we will show how to do it here as well. CPUID is supported when the ID-bit in the FLAGS-register can be flipped. So let's try that:
; Check if CPUID is supported by attempting to flip the ID bit (bit 21) in
; the FLAGS register. If we can flip it, CPUID is available.
; Copy FLAGS in to EAX via stack
pushfd
pop eax
; Copy to ECX as well for comparing later on
mov ecx, eax
; Flip the ID bit
xor eax, 1 << 21
; Copy EAX to FLAGS via the stack
push eax
popfd
; Copy FLAGS back to EAX (with the flipped bit if CPUID is supported)
pushfd
pop eax
; Restore FLAGS from the old version stored in ECX (i.e. flipping the ID bit
; back if it was ever flipped).
push ecx
popfd
; Compare EAX and ECX. If they are equal then that means the bit wasn't
; flipped, and CPUID isn't supported.
xor eax, ecx
jz .NoCPUID
ret
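The same check can be modelled in C to see why it works. This is only a sketch: `struct cpu_model`, `read_eflags` and `write_eflags` are hypothetical stand-ins for the real register and the `pushfd`/`popfd` pairs, and `writable_mask` is an assumption used to mimic a processor that ignores writes to the ID bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_ID (1u << 21)   /* ID bit, bit 21 of EFLAGS */

/* Hypothetical model of EFLAGS: bits outside writable_mask keep their
 * value no matter what is written to the register. */
struct cpu_model {
    uint32_t eflags;
    uint32_t writable_mask;
};

static uint32_t read_eflags(const struct cpu_model *cpu)
{
    return cpu->eflags;                        /* pushfd / pop eax */
}

static void write_eflags(struct cpu_model *cpu, uint32_t value)
{
    /* push eax / popfd: only the writable bits take the new value. */
    cpu->eflags = (cpu->eflags & ~cpu->writable_mask)
                | (value & cpu->writable_mask);
}

static bool cpuid_supported(struct cpu_model *cpu)
{
    uint32_t original = read_eflags(cpu);      /* copy kept in ECX */
    write_eflags(cpu, original ^ EFLAGS_ID);   /* try to flip the ID bit */
    uint32_t readback = read_eflags(cpu);      /* did the flip stick? */
    write_eflags(cpu, original);               /* restore the old FLAGS */
    return readback != original;               /* xor eax, ecx / jz */
}
```

A model whose `writable_mask` contains `EFLAGS_ID` reports CPUID as available, while one with the bit hard-wired does not, and in both cases the flags end up unchanged.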
x86 or x86-64
Now that CPUID is available we have to check whether long mode can be used. Long mode can only be detected using the extended functions of CPUID (0x80000000 and above), so we first have to check whether the function that reports long mode support is itself available:
mov eax, 0x80000000 ; Set the A-register to 0x80000000.
cpuid ; CPU identification.
cmp eax, 0x80000001 ; Compare the A-register with 0x80000001.
jb .NoLongMode ; It is less, there is no long mode.
Now that we know that extended function is available we can use it to detect long mode:
mov eax, 0x80000001 ; Set the A-register to 0x80000001.
cpuid ; CPU identification.
test edx, 1 << 29 ; Test if the LM-bit, which is bit 29, is set in the D-register.
jz .NoLongMode ; It isn't, there is no long mode.
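The two checks above boil down to a small decision that can be written as a pure C function. This is a sketch only: `long_mode_supported` and the macro names are hypothetical, and the function assumes the caller has already executed CPUID and passes in the returned register values.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_FEATURES 0x80000001u   /* extended feature leaf */
#define CPUID_EDX_LM       (1u << 29)    /* LM bit in EDX of that leaf */

/* max_ext_leaf: EAX returned by CPUID function 0x80000000.
 * ext_edx:      EDX returned by CPUID function 0x80000001
 *               (ignored when that function is not available). */
static bool long_mode_supported(uint32_t max_ext_leaf, uint32_t ext_edx)
{
    if (max_ext_leaf < CPUID_EXT_FEATURES)   /* cmp eax, 0x80000001 / jb */
        return false;
    return (ext_edx & CPUID_EDX_LM) != 0;    /* test edx, 1 << 29 / jz */
}
```

Keeping the decision separate from the CPUID instruction itself makes it easy to test on a development machine before it runs on bare metal.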
Now that we know long mode is supported by the processor, we can actually use it.
Entering Long Mode
Entering long mode can be done from both real mode and protected mode; however, only the protected mode path is covered in the Intel and AMD64 manuals. Early AMD documentation explains that this process works from real mode as well.
Before anything, it is recommended that you enable the A20 Line; otherwise address bit 20 is forced to zero and every second MiB (1 MiB - 2 MiB, 3 MiB - 4 MiB, and so on) is inaccessible because it aliases the MiB below it.
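To see what a disabled A20 line does to addresses, here is a minimal C sketch. It assumes the classic behaviour where the gate simply forces address bit 20 to zero; `a20_masked` is a hypothetical helper name.

```c
#include <stdint.h>

#define A20_BIT (1u << 20)   /* address line 20 */

/* Physical address that actually reaches memory while A20 is disabled. */
static uint32_t a20_masked(uint32_t addr)
{
    return addr & ~A20_BIT;
}
```

With this model `a20_masked(0x100000)` yields 0 and `a20_masked(0x300000)` yields 0x200000, so anything placed between 1 MiB and 2 MiB would silently land in the first megabyte.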
Setting up the Paging
Before we actually cover the new paging used in x86-64 we should disable the old paging first, as this is required (you can skip this if you never set up paging in protected mode).
mov eax, cr0 ; Set the A-register to control register 0.
and eax, 01111111111111111111111111111111b ; Clear the PG-bit, which is bit 31.
mov cr0, eax ; Set control register 0 to the A-register.
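A 32-digit binary literal is easy to miscount, so it can help to express the mask through the bit number instead. The following C sketch is only an illustration of the arithmetic; `CR0_PG` and `cr0_without_paging` are hypothetical names, and the function works on a plain value rather than on the real control register.

```c
#include <stdint.h>

#define CR0_PG (1u << 31)   /* paging enable, bit 31 of CR0 */

/* Value to write back to CR0 in order to turn paging off. */
static uint32_t cr0_without_paging(uint32_t cr0)
{
    return cr0 & ~CR0_PG;   /* same mask as 0x7FFFFFFF */
}
```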
Now that paging is disabled, we can take a look at how paging is set up in x86-64 (it is recommended to read Chapter 5.3 of the AMD64 Architecture Programmer's Manual, Volume 2). First of all, long mode uses PAE paging, so you have the page-directory pointer table (PDPT), the page-directory table (PDT) and the page table (PT). A fourth table now forms the root (instead of the PDPT or the PDT): the page-map level-4 table (PML4T). In protected mode a page table entry was only four bytes long, so you had 1024 entries per table. In long mode each entry is eight bytes long, so you only have 512 entries per table. This means that one entry in a PT can address 4 KiB, one entry in a PDT 2 MiB, one entry in a PDPT 1 GiB and one entry in a PML4T 512 GiB, so a full PML4T covers 512 × 512 GiB = 256 TiB of virtual address space.
The way these tables work is that each entry in the PML4T points to a PDPT, each entry in a PDPT to a PDT and each entry in a PDT to a PT. Each entry in a PT then points to a physical page, provided it is marked as present.
So how does the MMU/processor know which physical address should be used with which virtual address? Each table has 512 entries, indexed 0 to 511, and each entry covers a specific range of virtual memory. Index 0 of the PML4T covers the first 512 GiB of virtual memory, index 1 the next 512 GiB and so on. The same applies to the PDPT, PDT and PT (with smaller ranges: 1 GiB, 2 MiB and 4 KiB, as seen above). The last gigabyte of virtual memory is therefore described by the table referenced by index 511 of the PDPT, which is itself referenced by index 511 of the PML4T, or in pseudo-C:
pagedir_t* PDT = PML4[511]->PDPT[511];
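The index arithmetic described above can be made concrete with a few shifts. This is a sketch under the 4-level, 4 KiB page layout described here; `struct va_parts` and `split_va` are hypothetical names.

```c
#include <stdint.h>

/* Table indices and page offset encoded in a 48-bit virtual address. */
struct va_parts {
    unsigned pml4;    /* bits 39-47: one entry covers 512 GiB */
    unsigned pdpt;    /* bits 30-38: one entry covers 1 GiB   */
    unsigned pd;      /* bits 21-29: one entry covers 2 MiB   */
    unsigned pt;      /* bits 12-20: one entry covers 4 KiB   */
    unsigned offset;  /* bits 0-11:  byte within the page     */
};

static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

For the start of the last gigabyte, 0xFFFFFFFFC0000000, this gives PML4 index 511 and PDPT index 511, matching the pseudo-C line above.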
What pages you set up and how you set them up is up to you. I would identity map the first megabyte and then map some high memory to the memory after the first megabyte, but that can be difficult to set up at first. So let's simply identity map the first two megabytes. We'll set up four consecutive tables starting at 0x1000 (assuming that this memory is free to use): a PML4T, a PDPT, a PDT and a PT. To identity map the first two megabytes we want:
- PML4T[0] -> PDPT.
- PDPT[0] -> PDT.
- PDT[0] -> PT.
- PT -> 0x00000000 - 0x00200000.
First we will clear the tables:
mov edi, 0x1000 ; Set the destination index to 0x1000.
mov cr3, edi ; Set control register 3 to the destination index.
xor eax, eax ; Nullify the A-register.
mov ecx, 4096 ; Set the C-register to 4096.
rep stosd ; Clear the memory.
mov edi, cr3 ; Set the destination index to control register 3.
Now that the pages are clear we're going to set up the tables. The page tables are going to be located at these addresses:
- PML4T - 0x1000.
- PDPT - 0x2000.
- PDT - 0x3000.
- PT - 0x4000.
So let's make PML4T[0] point to the PDPT and so on:
mov DWORD [edi], 0x2003 ; PML4T[0] -> PDPT at 0x2000, present and writable.
add edi, 0x1000 ; Advance the destination index to the PDPT.
mov DWORD [edi], 0x3003 ; PDPT[0] -> PDT at 0x3000, present and writable.
add edi, 0x1000 ; Advance the destination index to the PDT.
mov DWORD [edi], 0x4003 ; PDT[0] -> PT at 0x4000, present and writable.
add edi, 0x1000 ; Advance the destination index to the PT.
If you haven't noticed already, each value ends in a three. This sets the two lowest bits of the entry, which indicate that the page is present (bit 0) and that it is writable as well as readable (bit 1). Now all that's left to do is identity map the first two megabytes:
mov ebx, 0x00000003 ; First PT entry: physical address 0x0, present and writable.
mov ecx, 512 ; 512 entries of 4kB each cover two megabytes.
.SetEntry:
mov DWORD [edi], ebx ; Write the low dword of the current entry (the high dword is already zero).
add ebx, 0x1000 ; Move on to the next 4kB physical page.
add edi, 8 ; Move on to the next eight-byte entry.
loop .SetEntry ; Repeat until ECX reaches zero.
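To check the layout without booting anything, the table setup can be modelled in C. This is only a sketch under stated assumptions: the array `phys` stands in for physical memory from 0x0000 to 0x4FFF, and the names `table_at`, `build_identity_map`, `translate` and `UNMAPPED` are hypothetical helpers made up for this example. The walk only follows 4kB pages and ignores every entry bit other than "present".

```c
#include <stdint.h>
#include <string.h>

#define PAGE_PRESENT  0x1ULL               /* bit 0 */
#define PAGE_WRITABLE 0x2ULL               /* bit 1 */
#define ADDR_MASK     0x000FFFFFFFFFF000ULL /* physical address bits of an entry */
#define UNMAPPED      UINT64_MAX

/* Simulated physical memory covering 0x0000-0x4FFF (assumption of this model). */
static uint64_t phys[0x5000 / 8];

/* Returns the table that starts at the given simulated physical address. */
uint64_t *table_at(uint64_t phys_addr)
{
    return &phys[phys_addr / 8];
}

/* Mirrors the assembly: clear the tables, link them, then fill the PT. */
void build_identity_map(void)
{
    memset(phys, 0, sizeof phys);
    table_at(0x1000)[0] = 0x2000 | PAGE_PRESENT | PAGE_WRITABLE; /* PML4T[0] -> PDPT */
    table_at(0x2000)[0] = 0x3000 | PAGE_PRESENT | PAGE_WRITABLE; /* PDPT[0]  -> PDT  */
    table_at(0x3000)[0] = 0x4000 | PAGE_PRESENT | PAGE_WRITABLE; /* PDT[0]   -> PT   */
    for (uint64_t i = 0; i < 512; i++)
        table_at(0x4000)[i] = (i * 0x1000) | PAGE_PRESENT | PAGE_WRITABLE;
}

/* Walks PML4T -> PDPT -> PDT -> PT, starting from the table at 0x1000. */
uint64_t translate(uint64_t va)
{
    uint64_t table = 0x1000; /* the value the assembly loads into CR3 */
    for (int shift = 39; shift >= 12; shift -= 9) {
        uint64_t entry = table_at(table)[(va >> shift) & 0x1FF];
        if (!(entry & PAGE_PRESENT))
            return UNMAPPED;
        table = entry & ADDR_MASK;
    }
    return table | (va & 0xFFF); /* after the PT level, table holds the page address */
}
```

After `build_identity_map()`, every address below 0x200000 translates to itself, while 0x200000 and above come back as `UNMAPPED` because PDT[1] was never filled in.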
Now we should enable PAE-paging by setting the PAE-bit in the fourth control register:
mov eax, cr4 ; Set the A-register to control register 4.
or eax, 1 << 5 ; Set the PAE-bit, which is the 6th bit (bit 5).
mov cr4, eax ; Set control register 4 to the A-register.
Now paging is set up, but it isn't enabled yet.
Future of x86-64 - the PML5
In November 2016, Intel released a white paper [1] about 5-level paging, and started supporting it with Ice Lake processors in 2019. These processors support a 128 PiB virtual address space and up to 4 PiB of physical memory (far above the 256 TiB virtual and 64 TiB physical limits of 4-level paging). Support for this is detected as follows:
mov eax, 0x7 ; CPUID leaf 7. You might want to check that leaf 7 is supported first!
xor ecx, ecx ; Subleaf 0.
cpuid
test ecx, (1<<16) ; ECX bit 16 indicates support for 5-level paging.
jnz .5_level_paging
The paging structures are identical to the 4-level versions; there's just an added layer of indirection. Your recursive address (if you use recursive mappings) will change. 5-level paging is enabled if CR4.LA57=1 and EFER.LMA=1.
BITS 32
mov eax, cr4
or eax, (1<<12) ;CR4.LA57
mov cr4, eax
Note that attempting to set CR4.LA57 while EFER.LMA=1 causes a general protection fault (#GP). You therefore need to drop into protected mode or set up 5-level paging before entering long mode in the first place.
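The address-space sizes quoted in this section follow from simple arithmetic: each paging level contributes nine index bits on top of the 12-bit page offset. The C sketch below reproduces the numbers; `virtual_address_bits` and `address_space_tib` are hypothetical helper names, and the physical widths of 52 and 46 bits are simply what the quoted 4 PiB and 64 TiB figures imply.

```c
#include <stdint.h>

/* Virtual address width: 12 offset bits plus 9 index bits per paging level. */
unsigned virtual_address_bits(unsigned levels)
{
    return 12 + 9 * levels;
}

/* Size in TiB (2^40 bytes) of an address space that is `bits` wide.
 * Assumes 40 <= bits < 104 so that the shift stays in range. */
uint64_t address_space_tib(unsigned bits)
{
    return 1ULL << (bits - 40);
}
```

With four levels this gives 48 bits and 256 TiB; with five levels it gives 57 bits and 131072 TiB, which is 128 PiB.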
The Switch from Real Mode
There's not much left to do. We set the long mode bit in the EFER MSR, then enable paging and protected mode at the same time, and then we are in compatibility mode (which is part of long mode).
So we first set the LM-bit:
mov ecx, 0xC0000080 ; Set the C-register to 0xC0000080, which is the EFER MSR.
rdmsr ; Read from the model-specific register.
or eax, 1 << 8 ; Set the LM-bit which is the 9th bit (bit 8).
wrmsr ; Write to the model-specific register.
Enabling paging and protected mode:
mov eax, cr0 ; Set the A-register to control register 0.
or eax, 1 << 31 | 1 << 0 ; Set the PG-bit, which is the 32nd bit (bit 31), and the PE-bit, which is the 1st bit (bit 0).
mov cr0, eax ; Set control register 0 to the A-register.
Now we're in compatibility mode.
The Switch from Protected Mode
There's not much left to do. We set the long mode bit in the EFER MSR, then enable paging, and then we are in compatibility mode (which is part of long mode).
So we first set the LM-bit:
mov ecx, 0xC0000080 ; Set the C-register to 0xC0000080, which is the EFER MSR.
rdmsr ; Read from the model-specific register.
or eax, 1 << 8 ; Set the LM-bit which is the 9th bit (bit 8).
wrmsr ; Write to the model-specific register.
Enabling paging:
mov eax, cr0 ; Set the A-register to control register 0.
or eax, 1 << 31 ; Set the PG-bit, which is the 32nd bit (bit 31).
mov cr0, eax ; Set control register 0 to the A-register.
Now we're in compatibility mode.
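The bits touched in the two switch sequences can be summarised in a small C sketch. The macro names follow the usual register.bit naming (CR0.PE is the protected mode enable bit), and `long_mode_would_activate` is a hypothetical helper for this example; it only models the conditions this article sets up, not the processor's full consistency checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE   (1u << 0)   /* protected mode enable      */
#define CR0_PG   (1u << 31)  /* paging enable              */
#define CR4_PAE  (1u << 5)   /* physical address extension */
#define EFER_LME (1u << 8)   /* long mode enable (low dword of MSR 0xC0000080) */

/* True when every bit the article sets before the jump is in place:
 * PAE paging selected, long mode requested, and paging switched on
 * together with protected mode. */
bool long_mode_would_activate(uint32_t cr0, uint32_t cr4, uint32_t efer_low)
{
    return (cr4 & CR4_PAE)
        && (efer_low & EFER_LME)
        && (cr0 & CR0_PE)
        && (cr0 & CR0_PG);
}
```

Coming from real mode, both CR0.PE and CR0.PG are set in the same write; coming from protected mode, CR0.PE is already set and only CR0.PG is added.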
Entering the 64-bit Submode
Now that we're in long mode, there's one issue left: we are in the 32-bit compatibility submode and we actually wanted to enter 64-bit long mode. This isn't hard to do. We just load a GDT whose code descriptor has the 64-bit flag set.
Our GDT (see chapters 4.8.1 and 4.8.2 of the AMD64 Architecture Programmer's Manual, Volume 2) should look like this:
; Access bits
PRESENT equ 1 << 7
NOT_SYS equ 1 << 4
EXEC equ 1 << 3
DC equ 1 << 2
RW equ 1 << 1
ACCESSED equ 1 << 0
; Flags bits
GRAN_4K equ 1 << 7
SZ_32 equ 1 << 6
LONG_MODE equ 1 << 5
GDT64:
.Null: equ $ - GDT64
dq 0
.Code: equ $ - GDT64
dd 0xFFFF ; Limit & Base (low, bits 0-15)
db 0 ; Base (mid, bits 16-23)
db PRESENT | NOT_SYS | EXEC | RW ; Access
db GRAN_4K | LONG_MODE | 0xF ; Flags & Limit (high, bits 16-19)
db 0 ; Base (high, bits 24-31)
.Data: equ $ - GDT64
dd 0xFFFF ; Limit & Base (low, bits 0-15)
db 0 ; Base (mid, bits 16-23)
db PRESENT | NOT_SYS | RW ; Access
db GRAN_4K | SZ_32 | 0xF ; Flags & Limit (high, bits 16-19)
db 0 ; Base (high, bits 24-31)
.TSS: equ $ - GDT64
dd 0x00000068 ; Placeholder only: a long mode TSS descriptor is 16 bytes long
dd 0x00CF8900 ; and must be filled in properly before it is used with ltr.
.Pointer:
dw $ - GDT64 - 1 ; Size of the GDT minus one
dq GDT64 ; Address of the GDT
Notice that we set a 4GB limit for the code segment. This is needed because the processor makes a last limit check before the jump, and a limit of 0 will cause a #GP (tested in Bochs). After that, the limit is ignored. Now the only thing left to do is load the GDT and make the jump to 64-bit:
lgdt [GDT64.Pointer] ; Load the 64-bit global descriptor table.
jmp GDT64.Code:Realm64 ; Set the code segment and enter 64-bit long mode.
Sample
Now that we're in 64-bit mode, we want to do something that is actually 64-bit. In this sample I'm just going to clear the screen:
; Use 64-bit.
[BITS 64]
Realm64:
cli ; Clear the interrupt flag.
mov ax, GDT64.Data ; Set the A-register to the data descriptor.
mov ds, ax ; Set the data segment to the A-register.
mov es, ax ; Set the extra segment to the A-register.
mov fs, ax ; Set the F-segment to the A-register.
mov gs, ax ; Set the G-segment to the A-register.
mov ss, ax ; Set the stack segment to the A-register.
mov edi, 0xB8000 ; Set the destination index to the VGA text buffer at 0xB8000.
mov rax, 0x1F201F201F201F20 ; Four cells per qword: a space (0x20) with attribute 0x1F (white on blue).
mov ecx, 500 ; 500 qwords = 80 * 25 cells * 2 bytes.
rep stosq ; Clear the screen.
hlt ; Halt the processor.
It is very important that you don't enable the interrupts (unless you have set up a 64-bit IDT of course).
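The two magic numbers in the sample, 0x1F201F201F201F20 and 500, can be derived rather than memorised. The C sketch below assumes the standard 80x25 VGA text mode with two bytes per cell (character in the low byte, attribute in the high byte); `vga_fill_pattern` and `vga_qword_count` are hypothetical helper names for this example.

```c
#include <stdint.h>

enum { VGA_COLUMNS = 80, VGA_ROWS = 25 };

/* Repeats one 16-bit text cell four times to fill a 64-bit store. */
uint64_t vga_fill_pattern(uint8_t attribute, uint8_t character)
{
    uint64_t cell = ((uint64_t)attribute << 8) | character;
    return cell | (cell << 16) | (cell << 32) | (cell << 48);
}

/* Number of 8-byte stores needed to cover the whole text screen. */
unsigned vga_qword_count(void)
{
    return VGA_COLUMNS * VGA_ROWS * 2 / 8;
}
```

Calling `vga_fill_pattern(0x1F, ' ')` reproduces the value loaded into RAX, and `vga_qword_count()` reproduces the count loaded into ECX.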
See also
Articles
Threads
- Wrote a tutorial covering long mode about this article.
- Loading a higher-half kernel on setting up long mode for a higher-half kernel.
- Setting up the stack after the switch to long mode on the stack segment note.