MBR (x86)

From OSDev.wiki
Revision as of 19:05, 31 October 2006 by Chase (talk | contribs) (X86:MBR moved to MBR (x86): updating orginizational standards)

"So we have complete control of the computer... what now???"

That question is usually the main topic in operating system design altogether. Most people that I know try to approach the design and programming stage like reading a book: cover to cover, in a linear fashion. I have witnessed multitudes of promising operating system projects turn out to be nothing more than a fancy boot program because of this type of design policy.


It is my suggestion that you treat operating system design and programming like writing a book, not reading one. When an author writes a book, they most likely worry about the contents first and the cover last. A good author will lay out the base of their book chapter by chapter, making a point of finishing each chapter first, and then go back and perfect the contents of each chapter so they flow. When the author is done, a cover is tacked on and the book is complete.


Now let's take that book writing process and apply it to operating system design. The designer starts with a base, which most likely will be the minimal code needed to bring the computer to a stable running state, then adds more features to make the operating system more stable and user friendly. Once the designer is satisfied with how each part of the operating system interacts in general, they go back to do some code optimization and clean-ups (quite an eternal task). I think the idea behind this method was best stated with this quote: "First make it run, then make it run better." The very last step would be for the designer to write their own MBR program (boot loader). Most people in fact use an existing boot loader such as GRUB to handle the boot-up task. The GRUB approach has been well documented elsewhere and is beyond the scope of this guide; I will concentrate on the minimal design approach to the MBR.


How the MBR is Loaded

The first thing to understand is where the MBR program is loaded in memory. Much like the special "0x55AA" pattern, the location in memory for the MBR is just a convention that everyone agrees on. On x86 PCs, the BIOS loads the 512-byte MBR program to memory at 0000:7C00. That address format (0000:7C00) is nothing more than the old 16-bit "segment:offset" format, from back when memory could only be addressed 64 Kilobytes at a time. The same location can also be written as 07C0:0000, which states that a 64 Kilobyte segment starts at 0000:7C00 and ends 64 K later (just below 1000:7C00, or 0x17C00). The trick was for a program to change the contents of the segment registers so they would essentially state "I'm accessing this 64 Kilobyte segment of the total memory". With the 32-bit processor series (Intel 80386 and above), this addressing limitation was expanded to a total of 4 Gigabytes when using 32-bit Protected Mode, so once we are in that mode with a flat memory layout we do not have to worry about changing segment registers to access different parts of memory.

Memory Address Basics

If you already know the basics of the memory address format and the hexadecimal number system, skip this next paragraph. If you do not, let's quickly break down the memory address format so it can be understood better. Think of the format like you do the decimal number system: each digit is a place holder. The only difference is that each place holder is in hexadecimal format, which means that as we move from right to left each place represents a much larger value. For example, 0x00000001 is at one byte, 0x00000010 is at 16 bytes, 0x00000100 is at 256 bytes, 0x00001000 is at 4 Kilobytes, 0x00010000 is at 64 Kilobytes, 0x00100000 is at one Megabyte, 0x01000000 is at 16 Megabytes and 0x10000000 is at 256 Megabytes. The maximum address value is 0xFFFFFFFF, which is the last byte below the 4 Gigabyte limit. If you noticed, the place holders from right to left grow by a factor of 16, and hexadecimal is a base-16 number system... yes... number systems are that easy.

Memory Segmentation

In real mode, you get one megabyte of memory space, so the highest address is FFFFF. People looked at this and thought, "How are we going to be able to access all this memory?" Since their registers were only 16 bits wide (a maximum of FFFF), they found that they needed to address any memory location with two registers: a segment and an offset.

   12F3:4B27
    ^     ^
Segment  Offset

How does it work? The exact memory location is found with the following simple equation:

MemoryPlace = Segment * 16 + Offset

Thus, 12F3:4B27 would be the exact memory location of 17A57. People good at math would recognize that an exact memory location can be found with different Segments and Offsets. This is true. For example, exact memory location 00210 can be 0020:0010, 0000:0210, and 0021:0000.
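The equation above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name linear_address is made up for this example and is not part of any assembler or BIOS interface.

```python
def linear_address(segment: int, offset: int) -> int:
    """Return the real-mode memory location for segment:offset."""
    # MemoryPlace = Segment * 16 + Offset
    return segment * 16 + offset

# The example from the text: 12F3:4B27
print(hex(linear_address(0x12F3, 0x4B27)))  # 0x17a57

# Three different segment:offset pairs that name the same byte (0x00210)
aliases = [(0x0020, 0x0010), (0x0000, 0x0210), (0x0021, 0x0000)]
print({hex(linear_address(s, o)) for s, o in aliases})  # {'0x210'}

# The two common ways of writing the MBR load address
print(linear_address(0x07C0, 0x0000) == linear_address(0x0000, 0x7C00))  # True
```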

Assumptions

Now for the part that really matters, the MBR structure. Since we will be using the computer from an "uninitialized" state, we have to make some assumptions.


16-bit Real Mode

The first assumption for x86 based processors is that we are in standard starting mode, or the initial processor state, called 16-bit Real Mode (details will be discussed later).


Stack

For the next assumption we will introduce you to the stack. Think of the stack as one giant variable in memory used for the quick storage and loading of data. The stack segment (SS) and stack pointer (SP) are 16-bit registers that combine to specify a 20-bit memory address in real mode, which is the "top" of the stack. When data is stored on the stack, SP is first decreased by the size of the data, and the data is then written to memory at the location given by SS and SP (SS:SP). The stack therefore grows downwards in memory, not up. Since you are required to set up the SS and SP registers, it is probably a good idea to set SS to 0x0000 and SP to 0x7C00. The memory just below 0000:7C00 is unused at this point since we are in an uninitialized state (we will expand on this later); only the very bottom of memory, up to about 0x0500, holds the interrupt vector table and BIOS data, so the stack has nearly 30 Kilobytes of room, which should be big enough for our needs.
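To make the "subtract first, then store" behaviour concrete, here is a small Python model of a 16-bit push. It is only a sketch: the memory array and the push16 helper are invented for this illustration, and SS and SP are set to the values suggested above.

```python
memory = bytearray(1 << 20)   # model of the 1 MB real-mode address space
ss, sp = 0x0000, 0x7C00       # stack set up as suggested in the text

def push16(value: int) -> None:
    """Model a 16-bit PUSH: decrease SP by 2, then store the word at SS:SP."""
    global sp
    sp = (sp - 2) & 0xFFFF                 # SP is a 16-bit register
    address = ss * 16 + sp                 # Segment * 16 + Offset
    memory[address:address + 2] = value.to_bytes(2, "little")

push16(0x1234)
push16(0xBEEF)

print(hex(sp))                        # 0x7bfc
print(memory[0x7BFC:0x7C00].hex())    # efbe3412
```

Note that the second value ends up at a lower address than the first, and that nothing at or above 0x7C00 (where the MBR itself lives) was touched.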


BIOS

Our final assumption will be that we need to load a program from a disk drive. Here I will introduce you to the Real Mode BIOS interrupt 0x13, which provides the disk drive access routines. BIOS interrupt routines rely on data in the registers to specify the action to be taken. For an interrupt 0x13 read operation, AH is set to 2 to select this operation type. AL holds the number of 512-byte sectors (a physical design standard) we want to read. ES and BX form the memory address ES:BX, which is the destination of the sectors to be read. The next three registers combine to give a physical position on the floppy disk. CH is the track on which to start reading (the equivalent of a cylinder on a hard disk). CL is the sector on which to start reading; sectors are numbered from 1, so sector 2 is the one immediately after the MBR. DH is the head on which to start reading. The last register is DL, which selects the drive to be used. In this case DL is set to 0, which represents the first floppy disk drive (A:). Finally we call interrupt 0x13, which executes everything we just set up. On return, the BIOS reports failure by setting the carry flag (with an error code in AH), so a robust loader should test it before continuing. Once our program is loaded into memory, we transfer execution control to it by simply jumping to its base address, specified earlier by ES:BX.
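To see how the register values described above fit together, the following Python sketch packs a cylinder/head/sector position into AX, CX and DX the way the read function expects. The function name int13_read_registers is invented for this illustration. It also places the two highest bits of a 10-bit cylinder number into the top two bits of CL, which the BIOS interface uses for large hard disks but which never matters on a floppy.

```python
def int13_read_registers(cylinder, head, sector, count, drive=0x00):
    """Return (AX, CX, DX) for an INT 0x13, AH=0x02 read request."""
    ax = (0x02 << 8) | count                 # AH = 0x02 (read), AL = sector count
    ch = cylinder & 0xFF                     # CH = low 8 bits of the cylinder
    cl = ((cylinder >> 2) & 0xC0) | (sector & 0x3F)  # CL = high cylinder bits + sector
    cx = (ch << 8) | cl
    dx = (head << 8) | drive                 # DH = head, DL = drive number
    return ax, cx, dx

# One sector from cylinder 0, head 0, sector 2 of the first floppy drive
regs = int13_read_registers(cylinder=0, head=0, sector=2, count=1)
print([hex(r) for r in regs])  # ['0x201', '0x2', '0x0']
```

These are the same values that the NASM example later in this article loads with "mov ah,0x02", "mov al,0x01", "mov cx,0x0002" and "xor dx,dx".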

Finishing Touches

We are almost finished with the basic MBR, only one more thing to do. We need to make sure the MBR program is 512 bytes long with the last 2 bytes being the special boot pattern (0xAA55). How this is done depends on your programming method. Here is an example in NASM syntax to help paint a picture.


TIMES 510-($-$$) DB 0x00     ;Pad the rest of the boot sector up to byte 510 with zeros
SIGNATURE DW 0xAA55          ;Create the special pattern


Here we can observe NASM's "TIMES" prefix and its "$" and "$$" tokens. TIMES simply assembles the instruction that follows as many times as specified (i.e. TIMES NUMBER_OF_TIMES INSTRUCTION_TO_REPEAT). The "$" token stands for the current address in the program, and "$$" stands for the address of the beginning of the current section, which here is the beginning of the program, so 510-($-$$) is the number of bytes still needed to reach byte 510. On the next line, "SIGNATURE" is just a label, which in itself is nothing more than a way for us to mark a data structure as particularly special. The line could just as easily have been "DW 0xAA55" and the end result would be the same.

One thing you may notice when looking at the MBR in a hex editor or debugger is that the special pattern looks backwards. It should be 0xAA55, right? Well yes, when loaded into a register that is precisely what it is, but stored in the program file it appears as the bytes 0x55 0xAA, because Intel x86 register/memory operations load and store the least significant byte first. This byte ordering is called Little Endian. NASM data directives (DB/DW/DD/etc.) store multi-byte values to the program file in the same order as a store from a register would, because that is the most common use for such data in Assembly Language. When the BIOS looks for the special pattern, it reads the WORD at that location in memory and checks it for the value 0xAA55.

Please note that the above paragraph is not something you need to be immediately concerned with, but it is good information to keep in mind when using hex editors and disassemblers. This also goes for debuggers, as whatever code/data/information is stored in the program file is loaded into memory in exactly the same byte order (no byte-swapping, unlike register/memory operations).
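The byte order is easy to confirm outside of an assembler. The short Python sketch below uses the standard struct module, where "<H" means a 16-bit unsigned value in Little Endian order; it is only meant to illustrate what "DW 0xAA55" leaves in the file.

```python
import struct

# What a 16-bit store of 0xAA55 leaves in the file or in memory
signature = struct.pack("<H", 0xAA55)
print(signature.hex())                  # 55aa

# Reading the same two bytes back as a Little Endian WORD
(value,) = struct.unpack("<H", signature)
print(hex(value))                       # 0xaa55
```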

Code Examples

NASM

[BITS 16]        ;Tell NASM that we wish to use 16-bit Addressing and Indexing for Real Mode
ORG 0x00007C00   ;Tell NASM the base address location of this program, as it is in RAM

;Setup the initial Stack and all the Segment Registers
  cli            ;Disable Interrupts to avoid corrupting the stack
  xor ax,ax      ;AX = 0
  mov es,ax      ;We're making sure all the segment registers are cleared to 0.
  mov fs,ax      ; ES, FS, GS, and DS are being cleared, as CS will be
  mov gs,ax      ; ...
  mov ds,ax      ; ...
  mov ss,ax      ;Stack Segment = 0
  mov sp,0x7C00  ;Stack Pointer = 0x7C00
  sti            ;Enable Interrupts

  jmp 0:flush_cs  ;Here we are jumping to flush_cs in memory segment 0,
                  ; to make sure CS is set to 0. The reason this needs
                  ; to be done is that some computers load the MBR
                  ; into memory at 07C0:0000, which is another valid way
                  ; of saying 0000:7C00 in the segment:offset address
                  ; format. This could create problems with certain
                  ; instructions that assume CS to be 0 instead of
                  ; 0x07C0. By using "ORG" above and this instruction,
                  ; we are ensuring that the program is given the address
                  ; space format it is expecting... preventing crashes.
flush_cs:

;Load Kernel from Floppy Disk to RAM
;-Interrupt 0x13 (BIOS Common Disk Access), Function 0x02 (Read Sectors into Memory)
;-AH = BIOS Interrupt Function
;-AL = Number of 512-byte Sectors to load
;-CX = Cylinder & Sector Offsets
;-DX = Head Offset & Drive Number
;-ES:BX = Segment:Offset of Address Destination in RAM

  mov ah,0x02       ;BIOS Interrupt Function 0x02 (Read Sectors into Memory)
  mov al,0x01       ;Load 1 sector from floppy
  mov bx,0x7E00     ;Set BX offset to our desired load location, ES already set from above
  mov cx,0x0002     ;Set Cylinder Offset to 0, Set Sector offset to 2
  xor dx,dx         ;Set Head Offset to 0, Set Drive Number to 0 (First Floppy Disk)
  int 0x13          ;Execute BIOS Interrupt
  jc disk_error     ;Carry Flag set means the read failed

;Jump to newly loaded Kernel
  jmp 0x0000:0x7E00 ;Sets CS to 0x0000 and the IP to 0x7E00, execution continues at that address

;Very simple error handling: stop here if the sector could not be read
disk_error:
  hlt
  jmp disk_error

;Fill the rest of the MBR with Zeroes, and write the "Magic" Boot Signature
  TIMES 510-($-$$) DB 0x00
  SIGNATURE DW 0xAA55

Conclusion

That's it, our simple MBR is complete. Now we just have to write it to the floppy disk. To write the MBR we must use special I/O tools. If you are using a POSIX compliant operating system then you should have access to "dd". If you are using Windows, you will have to obtain the Windows equivalent of "dd" or a program such as "RawWrite". Once the MBR is written to the floppy disk, the disk is considered unformatted by Windows, since the first sector normally also holds the disk layout information that Windows expects to find there, so don't try to access it with anything but RawWrite. If you wrote an MBR program verbatim from this chapter, you can go ahead and boot from the disk, but don't expect much more than a freeze-up, since we haven't put any program in the sector that the simple MBR program loads into memory.
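Before writing anything to a disk, it can be worth checking that the assembled file really has the shape described in this chapter. The Python sketch below is one possible sanity check, not a required step; the function name looks_like_boot_sector is invented for this example, and the sample data is built in memory so the snippet runs on its own.

```python
def looks_like_boot_sector(data: bytes) -> bool:
    """Check the two properties described above: 512 bytes, ending in 55 AA."""
    return len(data) == 512 and data[510:512] == b"\x55\xAA"

# 510 zero bytes followed by the signature, as TIMES/DW would produce
sample = bytes(510) + b"\x55\xAA"
print(looks_like_boot_sector(sample))        # True

# A full sector of zeros has the right size but no signature
print(looks_like_boot_sector(bytes(512)))    # False
```

In practice the data would come from the assembler's output instead, for example by opening the file in binary mode and reading its contents; the file name depends on how you invoked your assembler.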