MBR (x86)

From OSDev.wiki
Revision as of 10:27, 6 August 2006 by osdev>SpooK (→‎Memory Segmentation: fixed ambiguous "10" (presumed hex) to "16" (decimal))

"So we have complete control of the computer... what now???"

That question is usually the main topic in operating system design. Most people I know try to approach the design and programming stage like reading a book: cover to cover, in a linear fashion. I have watched many promising operating system projects turn out to be nothing more than a fancy boot program because of this design policy.


My suggestion is that you treat operating system design and programming like writing a book, not reading one. When an author writes a book, they most likely worry about the contents first and the cover last. A good author lays out the base of the book chapter by chapter, making a point of finishing each chapter first, and then goes back to perfect the contents of each chapter so they flow. When the author is done, a cover is tacked on and the book is complete.


Now let's take that book writing process and apply it to operating system design. The designer starts with a base, most likely the minimal code needed to bring the computer to a stable running state, and then adds features to make the operating system more stable and user friendly. Once the designer is satisfied with how the parts of the operating system interact in general, they go back to do some code optimization and clean-up (quite an eternal task). I think the idea behind this method is best stated by the quote, "First make it run, then make it run better." The very last step would be for the designer to program their own MBR program (boot loader). In fact, most people use an existing boot loader such as GRUB to handle the boot-up task. If you wish to take the GRUB approach, that has been well documented and is beyond the scope of this guide. I will concentrate on the minimal design approach to the MBR.


How the MBR is Loaded

The first thing to understand is where the MBR program is loaded in memory. Much like the special "0x55AA" pattern, the load location is simply a convention that became a standard. On x86 based PCs, the BIOS loads the 512-byte MBR program to memory at 0000:7C00. That address format (0000:7C00) is the old 16-bit "segment:offset" format, from the time when memory could only be addressed 64 Kilobytes at a time. The same memory location can also be written as 07C0:0000, which states that a 64 Kilobyte region starts at 0000:7C00 and stops after 64 K (1000:7C00). The trick was for a program to change the contents of the segment registers so they would essentially state "I'm accessing this 64 Kilobyte segment of the total memory". With the 32-bit processor series (Intel 80386 and above), this addressing limitation was raised to a total of 4 Gigabytes when using 32-bit Protected Mode, so we do not have to worry about changing segment registers to access different parts of memory under that mode.

Memory Address Basics

If you already know the basics of the memory address format and the hexadecimal number system, skip this next paragraph. If you do not, let's quickly break down how a hexadecimal address is read. Think of it like the decimal number system: each digit is a place holder. The only difference is that each place holder is a hexadecimal digit, so as we move from right to left each place represents a value 16 times larger than the one before. For example, written as plain hexadecimal numbers, 00000001 is at one byte, 00000010 is at 16 bytes, 00000100 is at 256 bytes, 00001000 is at 4 Kilobytes, 00010000 is at 64 Kilobytes, 00100000 is at one Megabyte, 01000000 is at 16 Megabytes and 10000000 is at 256 Megabytes. The maximum 32-bit value is FFFFFFFF, which is the last byte below the 4 Gigabyte limit. Note that a real mode segment:offset pair is not read this way: the segment and offset overlap rather than sit side by side, as the next section explains. If you noticed, the place holders from right to left grow by a factor of 16, and hexadecimal is a base-16 number system... yes... number systems are that easy.
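
As a quick illustration of those place values, here is a minimal Python sketch. The function name place_values is made up for this example; it only restates the factor-of-16 rule for plain hexadecimal numbers and says nothing about segment:offset pairs.

```python
def place_values(digits):
    """Return the value of a 1 in each of the lowest `digits` hex positions.

    Index 0 is the rightmost digit; every step to the left is 16x larger.
    (Hypothetical helper, named for this example only.)
    """
    return [16 ** position for position in range(digits)]


# Print each place value both as a hex number and as a byte count.
for value in place_values(8):
    print(f"0x{value:08X} = {value:,} bytes")
```

The fifth entry (0x00010000) is 65,536 bytes, i.e. 64 Kilobytes, and the sixth (0x00100000) is one Megabyte.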

Memory Segmentation

In real mode, you get one megabyte of address space, so the highest address is FFFFF (hexadecimal). The designers looked at this and asked, "How are we going to access all this memory?" Since the registers were only 16 bits wide (a maximum of FFFF), any memory location had to be addressed with two registers: a segment and an offset.

   12F3:4B27
    ^     ^
Segment  Offset

How does it work? The exact memory location is found with the following simple equation:

MemoryPlace = Segment * 16 + Offset

Thus, 12F3:4B27 refers to the exact memory location 17A57 (12F30 + 4B27, all in hexadecimal). Because the segment is only multiplied by 16, the same exact memory location can be reached with different segment and offset pairs. For example, exact memory location 00210 can be written as 0020:0010, 0000:0210, or 0021:0000.
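
The equation above can be sketched in a few lines of Python. The names linear_address and segment_window are assumptions made for this example; the arithmetic is just Segment * 16 + Offset.

```python
def linear_address(segment, offset):
    """Real mode address calculation: segment * 16 + offset."""
    return (segment << 4) + offset  # shifting left by 4 bits multiplies by 16


def segment_window(segment):
    """First and last exact addresses reachable through one segment value."""
    return linear_address(segment, 0x0000), linear_address(segment, 0xFFFF)


print(hex(linear_address(0x12F3, 0x4B27)))  # 0x17a57

# Three different segment:offset pairs that name the same location, 0x00210.
for segment, offset in [(0x0020, 0x0010), (0x0000, 0x0210), (0x0021, 0x0000)]:
    print(f"{segment:04X}:{offset:04X} -> {linear_address(segment, offset):05X}")
```

The same function shows that 0000:7C00 and 07C0:0000 both name the location where the MBR is loaded.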

Assumptions

Now for the part that really matters, the MBR structure. Since we will be using the computer from an "uninitialized" state, we have to make some assumptions.


The first assumption for x86 based processors is that we are in 16-bit Real Mode (details discussed later).


Stack

For the next assumption we will introduce the stack. Think of the stack as a region of memory used for quick storage and retrieval of data. The stack segment (SS) and stack pointer (SP) are 16-bit registers that combine into the address SS:SP, which is the "top" of the stack. When data is pushed onto the stack, its size is first subtracted from SP and then the data is stored at SS:SP. In other words, the stack grows downwards in memory, not up. Since you are required to set up SS and SP yourself, a good choice is SS = 0x0000 and SP = 0x7C00, so that the stack grows down from just below the MBR. The very bottom of memory is not free (the interrupt vector table and BIOS data area occupy roughly the first 0x500 bytes), but the region from there up to 0000:7C00 is conventionally unused at this point (we will expand on this later) and is close to 30 Kilobytes, which should be big enough for our needs.
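
To make the "grows downwards" behavior concrete, here is a toy Python model of a 16-bit push and pop. It is a sketch for illustration only, not an emulator: the class name RealModeStack and the 64 Kilobyte bytearray standing in for memory are assumptions of this example.

```python
class RealModeStack:
    """Toy model of the real mode stack addressed through SS:SP."""

    def __init__(self, ss=0x0000, sp=0x7C00, memory_size=0x10000):
        self.ss = ss
        self.sp = sp
        self.memory = bytearray(memory_size)  # stand-in for low RAM

    def _top(self):
        # Exact memory location of the top of the stack: SS * 16 + SP.
        return (self.ss << 4) + self.sp

    def push(self, word):
        # SP is decremented first, then the 16-bit value is stored at SS:SP.
        self.sp = (self.sp - 2) & 0xFFFF
        top = self._top()
        self.memory[top:top + 2] = word.to_bytes(2, "little")

    def pop(self):
        # The value is read from SS:SP, then SP is incremented.
        top = self._top()
        word = int.from_bytes(self.memory[top:top + 2], "little")
        self.sp = (self.sp + 2) & 0xFFFF
        return word


stack = RealModeStack()
stack.push(0x1234)
stack.push(0xBEEF)
print(hex(stack.sp))     # 0x7bfc
print(hex(stack.pop()))  # 0xbeef
```

Each push moves SP two bytes lower, so with SP starting at 0x7C00 the stack never touches the MBR code that begins at that address.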


BIOS

Our final assumption is that we need to load a program from a disk drive. Here we introduce the Real Mode BIOS interrupt 0x13, which provides the disk drive access routines. BIOS interrupt routines rely on values in the registers to specify the action to be taken. For a read operation, AH = 2 selects the "read sectors" function. AL is the number of 512-byte sectors (the physical design standard) to read. ES and BX form the memory address ES:BX, which is the destination for the sectors read. The next three registers combine to give a physical location on the floppy disk. CH is the track to start reading from (the equivalent of a cylinder on a hard disk). CL is the sector to start reading from; sectors are numbered from 1, so sector 1 is the MBR itself and sector 2 is the one right after it. DH is the head to start reading from. The last register is DL, which selects the drive. In this case DL is set to 0, which is the first floppy disk drive (A:). Finally we call interrupt 0x13, which executes everything we just set up. The BIOS sets the carry flag on return if the read failed; the minimal example in this guide does not check it. Now that our program is loaded into memory, we transfer control to it by jumping to its base address, the ES:BX specified earlier.
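
The register setup for the read can be worked through in Python. This is a sketch of the packing only; it does not call the BIOS, and the function name int13_read_registers is made up for the example. One detail it includes is standard for this BIOS function: the cylinder number is 10 bits wide, with the low eight bits in CH and the top two bits in bits 6-7 of CL, leaving bits 0-5 of CL for the sector.

```python
def int13_read_registers(cylinder, head, sector, drive, count,
                         dest_segment, dest_offset):
    """Compute register values for INT 0x13, AH = 0x02 (read sectors)."""
    if not 0 <= cylinder <= 1023:
        raise ValueError("cylinder must fit in 10 bits")
    if not 1 <= sector <= 63:
        raise ValueError("sectors are numbered 1..63")
    ch = cylinder & 0xFF                     # low 8 bits of the cylinder
    cl = ((cylinder >> 2) & 0xC0) | sector   # top 2 cylinder bits + sector
    return {
        "AX": (0x02 << 8) | count,   # AH = function 2, AL = sector count
        "CX": (ch << 8) | cl,        # CH = cylinder/track, CL = sector
        "DX": (head << 8) | drive,   # DH = head, DL = drive number
        "ES": dest_segment,          # ES:BX = destination in memory
        "BX": dest_offset,
    }


# One sector from cylinder 0, head 0, sector 2 of the first floppy to 0000:7E00.
regs = int13_read_registers(cylinder=0, head=0, sector=2, drive=0x00,
                            count=1, dest_segment=0x0000, dest_offset=0x7E00)
for name, value in regs.items():
    print(f"{name} = 0x{value:04X}")
```

For these inputs CX comes out as 0x0002 and DX as 0x0000.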


Finishing Touches

We are almost finished with the basic MBR, with only one more thing to do. We need to make sure the MBR program is exactly 512 bytes long, with the last 2 bytes being the special boot pattern (0x55 followed by 0xAA). How this is done depends on your programming method. Here is an example in NASM syntax to help paint a picture.


TIMES 510-($-$$) DB 0x00     ;Pad the rest of the boot sector up to byte 510 with zeros
SIGNATURE DW 0xAA55          ;Create the special pattern


Here we can see NASM's "TIMES" prefix and its "$" and "$$" tokens. TIMES assembles the instruction or data definition that follows it as many times as specified (i.e. TIMES NUMBER_OF_TIMES INSTRUCTION_TO_REPEAT). The "$" token evaluates to the current address in the program, and "$$" evaluates to the address of the beginning of the current section, which here is the beginning of the program. So $-$$ is the number of bytes emitted so far, and 510-($-$$) is the number of zero bytes needed to reach byte 510. On the next line, "SIGNATURE" is just a label, a name that marks the data as special for the reader. The line could just as easily have been "DW 0xAA55" and the end result would be the same. You may have noticed that the special pattern looks backwards: shouldn't it be 0x55AA? The pattern is defined as bytes on disk: byte 510 must be 0x55 and byte 511 must be 0xAA. The Intel x86 series stores multi-byte values with the least significant byte first, which is called Little Endian order. DW 0xAA55 therefore emits 0x55 followed by 0xAA, exactly the byte sequence the BIOS expects. If the BIOS checks the pattern by loading those two bytes into a 16-bit register, the register reads 0xAA55, and everything is gravy.
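
The padding and byte order can be checked outside the assembler with a short Python sketch. The function name build_boot_sector and the two-byte stand-in program (EB FE, which is "jmp $", an endless loop) are assumptions made for the example.

```python
import struct

SECTOR_SIZE = 512


def build_boot_sector(code):
    """Pad `code` with zeros to byte 510 and append the boot signature."""
    if len(code) > SECTOR_SIZE - 2:
        raise ValueError("code does not fit in 510 bytes")
    padding = bytes(SECTOR_SIZE - 2 - len(code))  # same job as TIMES ... DB 0x00
    signature = struct.pack("<H", 0xAA55)         # "<H" = little endian 16-bit word
    return code + padding + signature


sector = build_boot_sector(b"\xEB\xFE")
print(len(sector))                                     # 512
print(sector[510:512].hex())                           # 55aa
print(hex(int.from_bytes(sector[510:512], "little")))  # 0xaa55
```

The word 0xAA55 lands in the file as the bytes 55 AA, and reading those two bytes back as a little endian word gives 0xAA55 again.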

Code Examples

NASM

[BITS 16]        ;Tell NASM that we wish to use 16-bit Addressing and Indexing for Real Mode
ORG 0x00007C00   ;Tell NASM the base address location of this program, as it is in RAM

;Setup the initial Stack
  cli            ;Disable Interrupts to avoid corrupting the stack
  xor bx,bx      ;BX = 0
  mov ss,bx      ;Stack Segment = 0
  mov sp,0x7C00  ;Stack Pointer = 0x7C00
  sti            ;Enable Interrupts

;Load Kernel from Floppy Disk to RAM
;-Interrupt 0x13 (BIOS Common Disk Access), Function 0x02 (Read Sectors into Memory)
;-AH = BIOS Interrupt Function
;-AL = Number of 512-byte Sectors to load
;-CX = Cylinder & Sector Offsets
;-DX = Head Offset & Drive Number
;-ES:BX = Segment:Offset of Address Destination in RAM

  mov ah,0x02       ;BIOS Interrupt Function 0x02 (Read Sectors into Memory)
  mov al,0x01       ;Load 1 sector from floppy
  mov es,bx         ;Zeroed-out above, sets the ES Segment Register
  mov bx,0x7E00     ;Set BX offset to our desired load location 
  mov cx,0x0002     ;Set Cylinder Offset to 0, Set Sector offset to 2
  xor dx,dx         ;Set Head Offset to 0, Set Drive Number to 0 (First Floppy Disk)
  int 0x13          ;Execute BIOS Interrupt

;Jump to newly loaded Kernel
  jmp 0x0000:0x7E00

;Fill the rest of the MBR with Zeroes, and write the "Magic" Boot Signature
  TIMES 510-($-$$) DB 0x00
  SIGNATURE DW 0xAA55

Conclusion

That's it, our simple MBR is complete. Now we just have to write it to the floppy disk, which requires special I/O tools. If you are using a POSIX compliant operating system then you should have access to "dd". If you are using Windows, you will have to obtain a program such as "RawWrite". Once the MBR is written to the floppy disk, Windows considers the disk unformatted, because the first sector normally also holds the information describing the disk's layout and our program has replaced it, so don't try to access the disk with anything but RawWrite. If you wrote an MBR program verbatim from this chapter, you can go ahead and boot from the disk, but don't expect much more than a freeze-up, since we haven't put any program in the sector that the simple MBR program loads into memory.
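
If you would rather test in an emulator than on a physical floppy, the same raw write can be sketched in Python against a disk image file. This is an assumption-laden illustration, not a replacement for dd or RawWrite: the function name write_floppy_image, the file name floppy.img and the placeholder boot sector are all made up here, and the image size is that of a standard 1.44 Megabyte floppy (2880 sectors of 512 bytes).

```python
import os
import tempfile

FLOPPY_SIZE = 2880 * 512  # 1,474,560 bytes


def write_floppy_image(image_path, boot_sector, image_size=FLOPPY_SIZE):
    """Create a blank image file with `boot_sector` as its first sector."""
    if len(boot_sector) != 512:
        raise ValueError("boot sector must be exactly 512 bytes")
    with open(image_path, "wb") as image:
        image.write(boot_sector)                           # sector 1: the MBR
        image.write(bytes(image_size - len(boot_sector)))  # rest: zeros


# Placeholder sector: 510 zero bytes plus the boot pattern (no real code).
placeholder = bytes(510) + b"\x55\xAA"
path = os.path.join(tempfile.mkdtemp(), "floppy.img")
write_floppy_image(path, placeholder)
print(os.path.getsize(path))  # 1474560
```

In practice the boot sector bytes would come from the assembled output of the NASM listing above rather than from a placeholder.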