Welcome

Hi, and welcome to this small tutorial. It is assembled from the code that I have written for my own hobby OS. As a prerequisite, I recommend that you make yourself familiar with the AT&T x86 assembler syntax used by the GNU assembler (which I will be using here). I will try to explain everything I do in great detail, but still, it's not trivial :)

QEMU

As a side note, I want to use this to advertise QEMU as a development utility (no, I'm not affiliated with 'em ;)). I use it a lot, and it is really, really useful. For example, you can inspect the current page tables of your kernel right after loading CR3, which proves very useful when writing this code. Also, having a symbolic debugger (although "symbolic" is not all that useful in assembler ;p) is the bare minimum you should be equipped with when going down the kernel development route...

Additionally, you don't have to worry about boot loaders and such, as long as your kernel is multiboot compliant.

Kernel Structure explained

I will give a rough outline of the kernel (if you can call my 3-filer a kernel ;p) as I will explain it here:

boot.S: This is the main meat of the startup code. It will contain code to:
- set up an initial stack.
- set up the initial paging structures (the kernel PD, and two PTs)
- identity map the low 1MB, and all of the kernel.
- map all of the kernel to the higher half
- enable paging
- call the C kernel
- write "PANIC!" on the screen in nice white-on-red letters
  (this is why you read this, right?)
link.ld: This will contain the instructions for the linker.
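
To make the paging steps listed for boot.S a bit more concrete, here is a rough sketch in hosted C (not the assembler code of this tutorial). It fills a simulated PD and two PTs and then walks them the way the MMU would. The table "physical addresses" (PD_PHYS, PT_PHYS, LOW_PT_PHYS), the flag names and the helper functions are all made up for illustration, and for simplicity it maps the full 4MB that each PT covers instead of just the low 1MB and the kernel.

```c
#include <stdint.h>

#define PAGE_SIZE    0x1000u
#define ENTRIES      1024u
#define PG_PRESENT   0x1u
#define PG_WRITABLE  0x2u

#define KERNEL_PHYS  0x00100000u  /* where the kernel is loaded */
#define KERNEL_VIRT  0xC0000000u  /* where the kernel wants to run */

/* Hypothetical physical addresses of the three tables (assumption). */
#define PD_PHYS      0x00101000u
#define PT_PHYS      0x00102000u
#define LOW_PT_PHYS  0x00103000u

static uint32_t kernel_pd[ENTRIES];
static uint32_t kernel_pt[ENTRIES];
static uint32_t kernel_low_pt[ENTRIES];

static void setup_paging(void)
{
    uint32_t i;

    /* Low PT: identity map virtual 0..4MB onto physical 0..4MB. */
    for (i = 0; i < ENTRIES; i++)
        kernel_low_pt[i] = (i * PAGE_SIZE) | PG_PRESENT | PG_WRITABLE;

    /* Kernel PT: virtual 0xC0000000 + i*4K maps to physical 0x100000 + i*4K. */
    for (i = 0; i < ENTRIES; i++)
        kernel_pt[i] = (KERNEL_PHYS + i * PAGE_SIZE) | PG_PRESENT | PG_WRITABLE;

    /* Hook both PTs into the PD: slot 0 and slot 768 (0xC0000000 >> 22). */
    kernel_pd[0] = LOW_PT_PHYS | PG_PRESENT | PG_WRITABLE;
    kernel_pd[KERNEL_VIRT >> 22] = PT_PHYS | PG_PRESENT | PG_WRITABLE;
}

/* Find the simulated table that lives at a given "physical" address. */
static const uint32_t *table_at(uint32_t phys)
{
    if (phys == PT_PHYS)     return kernel_pt;
    if (phys == LOW_PT_PHYS) return kernel_low_pt;
    return 0;
}

/* Walk PD and PT like the MMU does; returns 1 and stores the result on success. */
static int translate(uint32_t virt, uint32_t *phys)
{
    uint32_t pde = kernel_pd[virt >> 22];
    const uint32_t *pt;
    uint32_t pte;

    if (!(pde & PG_PRESENT))
        return 0;
    pt = table_at(pde & ~0xFFFu);
    if (!pt)
        return 0;
    pte = pt[(virt >> 12) & 0x3FFu];
    if (!(pte & PG_PRESENT))
        return 0;
    *phys = (pte & ~0xFFFu) | (virt & 0xFFFu);
    return 1;
}
```

After setup_paging(), translate() resolves both 0x00100000 and 0xC0000000 to physical 0x00100000, which is exactly the situation we want right after enabling paging: the low addresses still work, and the kernel is also visible in the higher half.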

Also, I plan to provide the original files I use in my kernel, so you can have a closer look at them. When reading through them, you will notice that I have (basic) C++ support in them. You can either ignore it, or use it to extend your own kernel in this direction.

Thinking about it

First, I want you to think a little about what we will need, and what we want to accomplish.

We want to have a kernel that runs (not loads!) at a high address. The address is so high that the machine may not have enough physical memory for it to exist physically. So we need virtual memory (or a segmentation trick, described in Higher Half With GDT). We will use paging to map our kernel to a high address after it has been loaded to a low address (which should always be available physically).

For this to work correctly, the code in our kernel needs to know about this. Most of the kernel will have to be linked to a high address, so we don't need to worry about this anymore, once we're over the bootstrap phase. The initial bootstrap code, in contrast, needs to be linked to a low address, or be completely position independent.
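
The difference between "linked high" and "loaded low" boils down to a fixed offset. As a small sketch in C, assuming the kernel is loaded at physical 0x100000 and runs at 0xC0000000 (the addresses used in boot.S), the conversion looks roughly like this. The names KERNEL_PHYS_BASE, KERNEL_VIRT_BASE, virt_to_phys and phys_to_virt are made up for this example.

```c
#include <stdint.h>

#define KERNEL_PHYS_BASE 0x00100000u  /* load address (assumed) */
#define KERNEL_VIRT_BASE 0xC0000000u  /* link/run address (assumed) */

/* Where does a higher-half kernel address really live in physical memory? */
static uint32_t virt_to_phys(uint32_t virt)
{
    return virt - KERNEL_VIRT_BASE + KERNEL_PHYS_BASE;
}

/* And the other way round. */
static uint32_t phys_to_virt(uint32_t phys)
{
    return phys - KERNEL_PHYS_BASE + KERNEL_VIRT_BASE;
}
```

Bootstrap code that runs before paging is enabled, but touches symbols linked to the high address, has to apply this kind of correction by hand.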

For the paging stuff, we will need to have at least a PD (Page Directory) for the kernel, and two PTs (Page Tables). Why two, you might ask? The reason is that we need one PT for the low addresses (the lower 1MB, and the kernel in the lower half), and one for the addresses in the higher half. If your kernel grows larger and larger, it might even require more PTs, and you will for sure need to deal with PDs and PTs a lot when it comes to processes, where probably each process will have its own PD and associated PTs.
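
To see why one PT is not enough, it helps to look at how 32-bit paging with 4K pages splits a virtual address: the top 10 bits select the PD entry, the next 10 bits select the PT entry, and the low 12 bits are the offset into the page. A minimal sketch in C (the struct and function names are made up for this example):

```c
#include <stdint.h>

struct va_parts {
    uint32_t pd_index;  /* bits 31..22: which PD entry (i.e. which PT) */
    uint32_t pt_index;  /* bits 21..12: which entry inside that PT */
    uint32_t offset;    /* bits 11..0:  byte offset inside the 4K page */
};

static struct va_parts split_va(uint32_t va)
{
    struct va_parts p;
    p.pd_index = va >> 22;
    p.pt_index = (va >> 12) & 0x3FFu;
    p.offset   = va & 0xFFFu;
    return p;
}
```

The low address 0x00100000 ends up in PD entry 0, while 0xC0000000 ends up in PD entry 768. Each PD entry points to its own PT, and each PT only covers 4MB, so the two address ranges cannot share a table.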

boot.S

Now for the real code. I will go through it and explain every section as thoroughly as possible.

Global symbols

We will have only one single global symbol for this file, and that's the entry point of the kernel. So the file starts with the following line:

.global bootstrap_ia32

The required .data stuff

We will need some room in our kernel to do the things we want to. However, we don't (yet) have the means to dynamically allocate things, and we don't want to worry too much, so we will simply reserve all the required space in the executable itself. This way, the linker will take care of the kernel's bounds, and the bootloader (or QEMU in our case) will take care of loading, checking for enough room, etc.

Use the .data section

First, we'll tell the assembler that we want to put the following stuff in the .data section of the output file:

.section .data

Make room for the PD and PTs

Then, we need room for the PD and PTs. I'll reserve some space for them like this:

# the kernel page directory, the kernel page table (used to map the
# kernel itself from physical 0x100000 to 0xC0000000) and the lowest page
# table (for the low 1MB identity mapping in kernel space).

.align 0x1000
_kernel_pd:
   .space 0x1000, 0x00
_kernel_pt:
   .space 0x1000, 0x00
_kernel_low_pt:
   .space 0x1000, 0x00

The above tells the assembler to pad the current position in the output file until it is aligned to 4K (0x1000 -> 4096). Then we reserve three times 4K of space for the PD and PTs. These symbols will end up in the .data section of the kernel, and thus will be there as soon as the bootloader has loaded the kernel. No need to worry about allocating those from "somewhere".
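
Why exactly 4K, and why aligned? A PD or PT holds 1024 entries of 4 bytes each, which is 4096 bytes, and the CPU uses the low 12 bits of every PD/PT entry (and of CR3) for flags, so a table's address must have those bits clear. A small C sketch of the arithmetic behind the padding (align_up and is_page_aligned are names I made up for this example, they are not part of the kernel):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* 1024 entries of 4 bytes fill exactly one page. */
static_assert(1024 * sizeof(uint32_t) == PAGE_SIZE, "a table is one page");

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* A table address is usable only if its low 12 bits are zero. */
static int is_page_aligned(uint32_t addr)
{
    return (addr & (PAGE_SIZE - 1)) == 0;
}
```

Since the first table is aligned and each table is exactly one page long, the two tables following it are automatically aligned as well.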

Make room for the Stack

Another thing we will need from the very start is the stack. We won't have time to defer its creation until we can allocate memory, so we need to put it in here too. Of course you can make it way smaller if you like; 32-bit x86 Linux traditionally used 8K, or 4K if you told it to do so.

# a whopping 64K of initial stack space.
.set INITSTACKSIZE, 0x10000
initstack:
   .space INITSTACKSIZE, 0x00
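
One thing to keep in mind: the x86 stack grows downwards, so the initial stack pointer normally has to point to the end of the reserved area, not to its start. A tiny C sketch of that idea, where the array stands in for the initstack block above and initial_stack_pointer is a made-up helper name:

```c
#include <stdint.h>

#define INITSTACKSIZE 0x10000u  /* 64K, same value as in boot.S */

/* Stand-in for the space reserved at the initstack label. */
static uint8_t initstack[INITSTACKSIZE];

/* The first push decrements the pointer, so we start one past the end. */
static uint8_t *initial_stack_pointer(void)
{
    return initstack + INITSTACKSIZE;
}
```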

Strings

The last things in the .data section are the strings we will be using from within the assembler code. There are not many of them, just the "PANIC!" I promised above :)

_msg_panic:
   .asciz "PANIC!"
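
For the curious, this is roughly how such a string ends up as white-on-red letters. In VGA text mode, every character cell is two bytes: the character code, followed by an attribute byte with the background color in the high nibble and the foreground color in the low nibble. White (15) on red (4) therefore gives the attribute 0x4F. The sketch below is plain C writing to an ordinary array; in a real kernel the buffer would be the VGA text memory at physical 0xB8000. The names vga_cell and vga_puts are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_RED   0x04u
#define VGA_WHITE 0x0Fu

/* Build one 16-bit cell: low byte = character, high byte = attribute. */
static uint16_t vga_cell(char c, uint8_t fg, uint8_t bg)
{
    uint16_t attr = (uint16_t)(((bg & 0x0Fu) << 4) | (fg & 0x0Fu));
    return (uint16_t)((attr << 8) | (unsigned char)c);
}

/* Write a NUL-terminated string into the buffer, return the cell count. */
static size_t vga_puts(uint16_t *buf, const char *s, uint8_t fg, uint8_t bg)
{
    size_t n = 0;
    while (s[n] != '\0') {
        buf[n] = vga_cell(s[n], fg, bg);
        n++;
    }
    return n;
}
```

Calling vga_puts(buf, "PANIC!", VGA_WHITE, VGA_RED) fills six cells, each with the attribute byte 0x4F.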

User:Mduft/HigherHalf Kernel with 32-bit Paging
