Diskless Booting

From OSDev.wiki
Revision as of 17:10, 6 September 2010

"Diskless Booting" is a synonym for booting across a network. The kernel and its modules are downloaded from a computer on the network. This can be very useful for large projects where Bochs is too slow or one has to use a floppy disk, and is used in some corporate environments to enable centralized OS updates.

In order to boot up your kernel by network, you need a DHCP server, a TFTP server, and a program acting as client on the other computer.

The GRUB Way

First, you have to create a floppy with GRUB configured for net support. You can either get a floppy image from http://i30www.ira.uka.de/%7Eud3/mkgrubdisk/ or download a current source release of GRUB and ./configure it with support for your NIC.

Although this is the simplest way, GRUB doesn't seem to support all network cards.

The PXELINUX Way

Compile syslinux; a pxelinux.0 file will be created. It is a PXE binary of a simple bootloader-over-tftp, which can be booted by the client computer (not the one with the TFTP server). After setting up DHCP and TFTP accordingly so the file boots, you can use pxelinux to load "memdisk", which comes with syslinux as well.

Memdisk is loaded with a line such as memdisk initrd=grub.ima, which causes pxelinux to fetch both memdisk and grub.ima through TFTP. Memdisk then hooks interrupt 0x13 and boots the disk image that way. (However, not all GRUB disk images seem to access floppies through the BIOS. If you've got such an image you're stuck.)

You should get a pxelinux.0 file, which can be loaded by, for example, etherboot. Many modern computers allow booting from NICs so you only need the TFTP and DHCP server up.

At this point, you can make changes to the grub.ima disk image, and put a GRUB config file and your kernel's binaries there.

Try mount /tftpboot/grub.ima /mnt/fpy -o loop under Linux, for example.

The gPXE+GRUB2 Way

GRUB-legacy does not support newer network cards, but if you can use GRUB2 then you can piggy-back on gPXE's network support. The gPXE project is a currently-maintained, open source, free network bootloader. It is easy to get gPXE ISO, disk, or USB disk images from their website, but there is a workaround you need to apply in order to get GRUB2 to successfully load.

gPXE supports multiboot, but if it detects a multiboot image then it will not provide PXE services. Unfortunately, it detects your generated GRUB2 image as multiboot, and the only way I found to get around this was to recompile gPXE without multiboot support. Recompiling gPXE is easy: unpack it, cd src; make. Before you do that, you will want to edit src/config/defaults/pcbios.h and comment out the line that defines IMAGE_MULTIBOOT. After compiling you should be left with bin/gpxe.{dsk,iso,usb} which you can write to disk or CD.

To create a GRUB2 PXE bootable image, you can follow the advice in the GRUB manual's Network chapter. Keep in mind that GRUB2 is in flux, and several of the options to grub-mkimage did not exist in past versions. I recommend getting the latest version and compiling it from source. It's fairly straightforward, and it leaves all the *.{lst,mod,img} files you need in the source directory. If you compile your own GRUB2 you do not need to install it; just run commands like this:

./grub-mkimage -d . --format=i386-pc --output=core.img --prefix="(pxe)/boot/grub" pxe pxecmd
cat pxeboot.img core.img > grub2pxe

The final thing you need to do is set up a DHCP/BOOTP/TFTP server. I used dnsmasq, which came preinstalled on my workstations and seems to be widely available in distributions. It can easily be configured on the command line or in /etc/dnsmasq.conf, which uses the same syntax as the long-form command-line options but without the leading dashes. You will need the following options:

interface=...                # be careful what interface the dhcp server runs on!
bind-interfaces              # *really* only bind that interface
dhcp-range=a.b.c.d,e.f.g.h   # whatever your private network uses
dhcp-boot=boot/grub/grub2pxe # tells machine to boot grub
dhcp-no-override             # some kind of workaround that gpxe needs
enable-tftp
tftp-root=/tftp              # or wherever

and there are other options to explore as well. Now make sure that you take grub2pxe,*.lst,*.mod from the grub2 source and put them in /tftp/boot/grub or equivalent. Also put your grub.cfg file there. The format of that is fairly simple. Here's the essence of what I use:

set timeout=0
set default=0
menuentry "MY OS" {
  set root=(pxe)
  multiboot /kernel
  module    /shell
  module    /test
}

Make sure your kernel and modules appear in the tftp root, and you should be set to boot using your gPXE media, over a private network connection hooked up between workstations.

The Direct Way

All of the options above involve using someone else's code to do the dirty work, which may be undesirable in some situations - licensing conflicts, technical problems (e.g. for "memdisk" the interrupt 0x13 hook won't work in protected mode) and possibly personal pride. Fortunately, writing your own PXE boot code isn't as difficult as it sounds.

The first step is to download the PXE specification from the internet and take a look at it. At first glance this specification can be rather daunting, but don't let that stop you - most of it relates to BIOS and network cards and can be safely ignored. The important part is in chapter 3, the PXE API.

Basically, "somehow" your PXE boot code gets into the client computer and it doesn't really matter how (networking code in the client machine's ROM arranges this with help from the DHCP server and TFTP server). Your code is loaded at 0x00007C00 and started in real mode, just like a normal floppy boot sector. The difference is that you don't need the "0xAA55" magic value and the size of your code can be up to 32 KB (no more trouble squeezing everything into 512 bytes).

When your code is started, the BIOS gives it a valid stack (with at least 1.5 KB of free stack space) and passes some parameters to your code on this stack and in registers. See section 4.4.5 "Client State at Bootstrap Execution Time (Remote.0)" in the specification for details. There are only two things that are important here: the address of the older "PXENV+ structure" (passed in ES:BX) and the address of the newer "!PXE structure" (a far pointer stored on the stack at SS:[SP+4]). These structures are used to find the PXE API entry points and a few other things.

The !PXE structure was introduced in version 2.1 of the specification, while the PXENV+ structure is becoming obsolete. This means that only one of these structures may be present, depending on how old your ROM's code is. The specification states that if the !PXE structure is present then your code should use the !PXE entry point (not the entry point from the PXENV+ structure, which may be present for backwards compatibility). Both of these structures contain a segment:offset for their own PXE API entry point. The calling conventions you need to use to access the PXE API depend on which structure is present.
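The choice between the two structures can be sketched in C as follows. This is only an illustration under stated assumptions: the helper names and the enum are hypothetical, and the byte offsets (length byte at offset 4 of "!PXE" and at offset 8 of "PXENV+", with all bytes of a valid structure summing to zero) are my reading of the specification and should be checked against it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
enum pxe_api_type { PXE_API_NONE = 0, PXE_API_PXENV = 1, PXE_API_BANGPXE = 2 };

/* Returns 1 if the len bytes at p sum to zero modulo 256 (and len > 0). */
int pxe_checksum_ok(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    assert(p != NULL);
    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/*
 * Decide which entry point to use. Either pointer may be NULL if the
 * structure is absent. A valid "!PXE" structure is preferred, as the
 * specification requires.
 * Assumed layout: "!PXE" length byte at offset 4, "PXENV+" length byte
 * at offset 8.
 */
enum pxe_api_type pxe_pick_api(const uint8_t *bangpxe, const uint8_t *pxenv)
{
    if (bangpxe != NULL && memcmp(bangpxe, "!PXE", 4) == 0 &&
        pxe_checksum_ok(bangpxe, bangpxe[4]))
        return PXE_API_BANGPXE;
    if (pxenv != NULL && memcmp(pxenv, "PXENV+", 6) == 0 &&
        pxe_checksum_ok(pxenv, pxenv[8]))
        return PXE_API_PXENV;
    return PXE_API_NONE;
}
```

Checking the signature and checksum before trusting either structure means a stale or corrupted pointer falls back to the other entry point instead of being called.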

For the PXE entry point you use Pascal calling conventions (push the parameters on the stack, far call the entry point, then clean up the stack), while the older PXENV+ entry point uses registers instead. Apart from this the PXE API is the same, and fortunately all PXE API functions have the same parameters - the "opcode" (or API function number) and the segment:offset for a structure containing data for the function.

For the PXE entry point you'd use something like this:

    push structureSegment
    push structureOffset
    push opcode
    call far [PXEAPIentry]
    add sp, 6

For the older PXENV+ entry point you'd use something like this instead:

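    mov ax, structureSegment    ; segment registers can't be loaded with an immediate
    mov es, ax                  ; PXENV+ takes the structure address in ES:DI
    mov di, structureOffset
    mov bx, opcode
    call far [PXEAPIentry]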

Therefore, one of the first things your code will need to do is to detect which entry point to use and set up a generic interface so that the rest of your code doesn't need to care which calling conventions the PXE API needs. For example:

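callPXEAPI:
    ; In: ES:DI = address of parameter structure, BX = opcode
    cmp byte [PXEAPItype], 0    ; 0 = older PXENV+ entry point
    je .olderAPI
    push es                     ; !PXE: parameters go on the stack
    push di
    push bx
    call far [PXEAPIentry]
    add sp, 6                   ; remove the three parameter words
    ret

.olderAPI:
    call far [PXEAPIentry]      ; PXENV+: parameters stay in ES:DI and BX
    ret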

In any case, the PXE API will leave a status flag in AX (0x0000 = OK, 0x0001 = Failed). If the function failed the PXE API will set more specific flags in the "status" field of the structure you passed to the function.

Now you should be able to call the PXE API. The next step is to get some information about the network. Specifically, you will need the TFTP server's IP address, but you may also want the DHCP server's IP address, the client machine's IP address and the client's MAC address. To get this information (and more) you need to call the PXE API's "Get Cached Info" function. Set up the "t_PXENV_GET_CACHED_INFO" structure as described in the specification and call the PXE API with the address of this structure and 0x0071 as the opcode parameter. For the packet type field in this structure, use "type = 2" and leave everything else set to zero (if you just want the address where the information already is, instead of having the information copied to your own buffer).
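As a rough sketch, the parameter block for "Get Cached Info" could be declared and filled in like this in C. The struct and field names are hypothetical renderings of my reading of the specification (status, packet type, buffer size, buffer far pointer, buffer limit), so check the field order against the specification before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PXENV_GET_CACHED_INFO      0x0071  /* opcode given in the text */
#define PXENV_PACKET_TYPE_DHCP_ACK 2       /* "type = 2" in the text */

/* Real-mode far pointer; the offset word comes first in memory. */
struct segoff16 {
    uint16_t offset;
    uint16_t segment;
};

/* Parameter block (assumed layout, 12 bytes, no padding). */
struct pxenv_get_cached_info {
    uint16_t        status;       /* filled in by PXE on return */
    uint16_t        packet_type;  /* which cached packet to return */
    uint16_t        buffer_size;  /* 0 = just hand back a pointer */
    struct segoff16 buffer;       /* 0:0 on input, PXE's buffer on return */
    uint16_t        buffer_limit; /* filled in by PXE on return */
};

static_assert(sizeof(struct pxenv_get_cached_info) == 12,
              "parameter block must not contain padding");

/* Zero the block and request the cached DHCP ACK packet. */
void pxe_prepare_cached_info(struct pxenv_get_cached_info *req)
{
    assert(req != NULL);
    memset(req, 0, sizeof *req);
    req->packet_type = PXENV_PACKET_TYPE_DHCP_ACK;
}
```

After the call returns successfully, the buffer field would hold the segment:offset of the cached packet inside PXE's own memory.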

If this worked correctly you should have the segment and offset for a "cached" DHCP packet that was sent to the BIOS's code by the DHCP server. This data contains everything you need, and you should be able to extract the TFTP server's IP address from it (the "sip" field in the "bootph" structure).
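Pulling the server address out of the cached packet might look like the following C sketch. It assumes the "bootph" structure follows the standard BOOTP layout, where the server IP field starts at byte offset 20; the function names and the offset constant are hypothetical, not taken from the specification's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed byte offset of "sip" (server IP) in the bootph structure. */
#define BOOTPH_SIP_OFFSET 20u

/* Convert a real-mode segment:offset pair to a linear address. */
uint32_t segoff_to_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/*
 * Copy the 4-byte server IP out of a cached packet.
 * The bytes are left in network byte order (first octet first).
 */
void bootph_server_ip(const uint8_t *packet, size_t packet_len,
                      uint8_t ip_out[4])
{
    assert(packet != NULL && ip_out != NULL);
    assert(packet_len >= BOOTPH_SIP_OFFSET + 4);
    for (size_t i = 0; i < 4; i++)
        ip_out[i] = packet[BOOTPH_SIP_OFFSET + i];
}
```

Keeping the address as four raw bytes avoids any byte-swapping, since the TFTP functions also expect IP addresses in network byte order.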

Now that you know the IP address for the TFTP server you can use the PXE API to access the TFTP server. The functions for this are "TFTP Open", "TFTP Read" and "TFTP Close" (similar to common file I/O operations), or alternatively you could use the "TFTP Read File" function.
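A parameter block for "TFTP Open" could be prepared along these lines. Everything here is a sketch from memory of the specification: the opcode value 0x0020, the field order, the 128-byte file name field and the 512-byte packet size are assumptions to verify, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PXENV_TFTP_OPEN 0x0020  /* assumed opcode; verify in the spec */

/* Parameter block (assumed layout, 142 bytes, no padding). */
struct pxenv_tftp_open {
    uint16_t status;          /* filled in by PXE on return */
    uint8_t  server_ip[4];    /* network byte order */
    uint8_t  gateway_ip[4];   /* left as 0.0.0.0 in this sketch */
    uint8_t  file_name[128];  /* NUL-terminated path on the server */
    uint8_t  tftp_port[2];    /* network byte order */
    uint16_t packet_size;     /* requested TFTP block size */
};

static_assert(sizeof(struct pxenv_tftp_open) == 142,
              "parameter block must not contain padding");

/* Fill in the block; returns 0 on success, -1 if the path is too long. */
int pxe_prepare_tftp_open(struct pxenv_tftp_open *req,
                          const uint8_t server_ip[4], const char *path)
{
    size_t n;

    assert(req != NULL && server_ip != NULL && path != NULL);
    n = strlen(path);
    if (n >= sizeof req->file_name)
        return -1;
    memset(req, 0, sizeof *req);
    memcpy(req->server_ip, server_ip, 4);
    memcpy(req->file_name, path, n + 1);
    req->tftp_port[0] = 0;    /* standard TFTP port 69, high byte first */
    req->tftp_port[1] = 69;
    req->packet_size = 512;
    return 0;
}
```

The block would then be passed to the generic PXE API wrapper with the open opcode, followed by repeated "TFTP Read" calls and a final "TFTP Close".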

After you've read any files your OS needs to boot (and you're finished with networking) it's good to shut down PXE and reclaim any memory it was using. Unfortunately the PXE Specification isn't very clear on how this is meant to be done (which functions in which order), so I've assumed the code for PXE booting Linux is correct.

PXE support is unloaded by using the "PXENV_UNDI_SHUTDOWN" function to restore the network adapter to its default state, followed by the "PXENV_UNLOAD_STACK" function to unhook any IRQ handlers. Lastly, PXE needs to prepare itself to be removed from memory (e.g. unhook interrupt 0x1A) using the "PXENV_STOP_UNDI" function (for !PXE) or the "PXENV_UNDI_CLEANUP" function (for PXENV+).

Once PXE is unloaded, memory can be reclaimed by checking the UNDI code segment and UNDI data segment start addresses and sizes (in the PXENV+ data structure), calculating "start address + size" and selecting the highest value. This value is the "new" number of bytes of RAM starting at 0x00000000 (or the amount of conventional memory), and can be used directly and/or the value at 0x40:0x13 (or 0x00000413) can be changed to reflect the new number of 1 KB blocks of conventional memory. Please note that it's probably a good idea to calculate the amount of conventional memory before you unload PXE, because some of the functions above may clear the segment addresses and sizes in the PXENV+ structure.
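The arithmetic can be sketched in C as below. The function names are hypothetical, and the sketch assumes the PXENV+ fields hold real-mode segment values (multiplied by 16 to get a linear address) and sizes in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* End (exclusive) linear address of a real-mode region: segment * 16 + size. */
uint32_t region_end(uint16_t segment, uint16_t size)
{
    return ((uint32_t)segment << 4) + size;
}

/* Highest end address of the UNDI code and data regions, in bytes. */
uint32_t conventional_memory_bytes(uint16_t undi_code_seg, uint16_t undi_code_size,
                                   uint16_t undi_data_seg, uint16_t undi_data_size)
{
    uint32_t code_end = region_end(undi_code_seg, undi_code_size);
    uint32_t data_end = region_end(undi_data_seg, undi_data_size);
    uint32_t top = code_end > data_end ? code_end : data_end;

    assert(top <= 0xA0000u);  /* conventional memory ends at 640 KB */
    return top;
}

/* Value for the word at 0x40:0x13: whole 1 KB blocks, rounded down. */
uint16_t conventional_memory_kb(uint32_t bytes)
{
    return (uint16_t)(bytes / 1024u);
}
```

For example, with the UNDI code segment at 0x9C00 (size 0x2000) and the UNDI data segment at 0x9A00 (size 0x1000), the code region ends higher, at 0x9E000, which corresponds to 632 KB of conventional memory.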

That is all you really need to know for booting from PXE (however the PXE API is capable of doing raw UDP connections and other things).

External Links