PXE

From OSDev.wiki

Preboot eXecution Environment, or PXE, is the standard way to perform diskless booting on PCs, and the standard was developed by Intel. It allows the Network Bootstrap Program (NBP), i.e. the boot loader, to interface with the PXE firmware and load files via TFTP.

State of the Machine

As described in Section 4.4.5 of the PXE specification, "Client State at Bootstrap Execution Time", PXE is responsible for loading the boot code onto the client machine. This works much like booting from other devices, except that the 0xAA55 boot signature isn't required, and the boot file may be up to 32 KiB in size instead of the familiar 512 bytes.

Prior to Version 2.1 of the PXE specification, a PXENV+ Structure was given to the NBP. The PXE Entry Structure, or the !PXE structure, was introduced in Version 2.1, while the PXENV+ Structure has become obsolete. However, the address of the PXENV+ Structure is still given to the NBP during boot, to maintain compatibility with older NBPs.

The PXE specification promises that the boot file is loaded at address 0x0000:0x7C00, though this should not be relied on blindly. In addition, on entry to the NBP:

  • ES:BX points to the PXENV+ structure, to maintain backward compatibility.
  • SS:SP points to a valid stack with at least 1.5 KiB of free space.
  • The far pointer stored at SS:[SP+4] points to the newer !PXE structure, if and only if the PXE version is 2.1 or later.
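Several of the values above (the pointer at SS:[SP+4], and the RMEntry and PXEPtr fields below) are real-mode far pointers, stored in memory as a 32-bit value with the offset in the low word and the segment in the high word. The following minimal C sketch, with hypothetical helper names, shows how such a pointer decodes to a linear address; it only does the arithmetic and does not dereference anything.

```c
#include <stdint.h>

/* Hypothetical helpers: decode a real-mode far pointer as stored in memory,
 * i.e. offset in the low 16 bits and segment in the high 16 bits. */
static inline uint16_t farptr_offset(uint32_t farptr)
{
    return (uint16_t)(farptr & 0xFFFFu);
}

static inline uint16_t farptr_segment(uint32_t farptr)
{
    return (uint16_t)(farptr >> 16);
}

/* Real-mode address translation: linear = segment * 16 + offset. */
static inline uint32_t farptr_to_linear(uint32_t farptr)
{
    return ((uint32_t)farptr_segment(farptr) << 4) + farptr_offset(farptr);
}
```

For example, 0x0000:0x7C00 and 0x07C0:0x0000 are different segment:offset pairs for the same linear address, 0x7C00, so code that depends only on the linear load address does not care which pair was used.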

PXENV+

The PXENV+ structure has the following layout:

    struct PXENVPlus {
        // The signature of the PXENV+ structure: contains "PXENV+" (not null-terminated).
        uint8_t  Signature[6];

        // The MSB contains the major version number, while the LSB contains the minor version number (e.g. 0x0201 for 2.1).
        uint16_t Version;

        // The length of the structure in bytes, and the checksum byte that makes all Length bytes sum to zero.
        uint8_t  Length;
        uint8_t  Checksum;

        // A far pointer to the real mode PXE API entry point.
        uint32_t RMEntry;

        // 32-bit offset to the protected mode API entry point. The !PXE structure is recommended instead.
        uint32_t PMOffset;

        // The remaining fields are not needed for basic use (the UNDI fields are used when reclaiming memory); see the specification.
        uint16_t PMSelector;
        uint16_t StackSeg;
        uint16_t StackSize;
        uint16_t BCCodeSeg;
        uint16_t BCCodeSize;
        uint16_t BCDataSeg;
        uint16_t BCDataSize;

        uint16_t UNDIDataSeg;
        uint16_t UNDIDataSize;
        uint16_t UNDICodeSeg;
        uint16_t UNDICodeSize;

        // A far pointer to the "!PXE" structure; only valid when that structure exists (version 2.1 and later).
        uint32_t PXEPtr;
    } __attribute__((packed));

Recommended Usage

First of all, the integrity of the PXENV+ structure should be checked. This can be done by:

  • Checking the "PXENV+" signature at the start of the structure.
  • Verifying the checksum: all Length bytes of the structure must sum to zero (modulo 256).

If either check fails, the boot should be aborted.
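The two checks above can be sketched in C as follows. This is an illustration only: the function name is hypothetical, it assumes the structure is already reachable through a byte pointer, and it additionally rejects a Length smaller than the 44 (0x2C) bytes covered by the fields listed above, so that a zero length cannot pass the checksum test trivially.

```c
#include <stdint.h>
#include <string.h>

/* Size of the PXENV+ fields listed above (Signature through PXEPtr). */
#define PXENV_PLUS_MIN_LENGTH 0x2C

/* Hypothetical helper: returns 1 if `p` looks like a valid PXENV+ structure.
 * Assumes the whole structure (Length bytes) is readable through `p`. */
static int pxenv_plus_is_valid(const uint8_t *p)
{
    /* 1. The 6-byte signature "PXENV+" (not NUL-terminated). */
    if (memcmp(p, "PXENV+", 6) != 0)
        return 0;

    /* The Length byte sits at offset 8, right after Signature and Version. */
    unsigned length = p[8];
    if (length < PXENV_PLUS_MIN_LENGTH)
        return 0;

    /* 2. All Length bytes, including Checksum, must add up to 0 (mod 256). */
    uint8_t sum = 0;
    for (unsigned i = 0; i < length; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}
```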

If the structure is valid, the Version number should be checked. If it is 0x0201 (version 2.1) or higher, the !PXE structure should be used; otherwise, the PXENV+ structure should be used.

!PXE

The !PXE structure has the following layout:

    struct BangPXE {
        // The signature of the !PXE structure: contains "!PXE".
        uint8_t  Signature[4];

        // The length of the structure in bytes, and the checksum byte that makes all Length bytes sum to zero.
        uint8_t  Length;
        uint8_t  Checksum;

        // Both of these contain zero.
        uint8_t  Revision;
        uint8_t  Reserved;

        // Far pointers to the UNDI ROM ID structure and the BC ROM ID structure.
        uint32_t UNDIROMID;
        uint32_t BCROMID;

        // A far pointer to the real mode PXE API entry point.
        uint32_t RMEntry;

        // A far pointer to the protected mode API entry point.
        uint32_t PMEntry;

        // The remaining fields are not needed in the rest of this article; refer to the specification.
    } __attribute__((packed));

Recommended Usage

The !PXE structure should be relied upon only if the PXENV+ structure reports a version of 0x0201 or higher. In that case, the integrity of the !PXE structure should be checked first. This can be done by:

  • Checking the "!PXE" signature at the start of the structure.
  • Verifying the checksum: all Length bytes of the structure must sum to zero (modulo 256).

If either check fails, the PXENV+ structure should be used instead.
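One possible way to combine the version check, the !PXE checks and the fallback is sketched below in C. The function and helper names are hypothetical; the sketch assumes both structures are already reachable through byte pointers (locating the !PXE structure via SS:[SP+4] or PXEPtr is left to the caller), and it uses the byte offsets of the fields listed above (Version at 6 and RMEntry at 10 in PXENV+; Length at 4 and RMEntry at 16 in !PXE). The minimum !PXE length of 24 bytes is an assumption chosen to cover those fields.

```c
#include <stdint.h>
#include <string.h>

/* Little-endian readers for fields inside a byte buffer. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16);
}

/* Byte checksum: all `length` bytes must sum to 0 modulo 256. */
static int bytes_sum_to_zero(const uint8_t *p, unsigned length)
{
    uint8_t sum = 0;
    for (unsigned i = 0; i < length; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum == 0;
}

/* Assumed minimum !PXE length: enough to cover Signature..PMEntry. */
#define BANG_PXE_MIN_LENGTH 24

/* Hypothetical helper. `pxenv` is an already validated PXENV+ structure and
 * `bang_pxe` the !PXE structure, or NULL if it could not be located.
 * Stores the real-mode entry far pointer in *entry and returns 1 if the
 * !PXE (stack-based) API should be used, or 0 for the PXENV+ (register) API. */
static int pick_rm_entry(const uint8_t *pxenv, const uint8_t *bang_pxe, uint32_t *entry)
{
    uint16_t version = rd16(pxenv + 6);

    if (version >= 0x0201 && bang_pxe != NULL &&
        memcmp(bang_pxe, "!PXE", 4) == 0 &&
        bang_pxe[4] >= BANG_PXE_MIN_LENGTH &&
        bytes_sum_to_zero(bang_pxe, bang_pxe[4])) {
        *entry = rd32(bang_pxe + 16);   /* !PXE RMEntry */
        return 1;
    }

    *entry = rd32(pxenv + 10);          /* fall back to the PXENV+ RMEntry */
    return 0;
}
```

The return value tells the caller which calling convention applies: stack-based for !PXE, registers for PXENV+.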

PXE API

The !PXE entry point uses the __cdecl calling convention: push the parameters on the stack (rightmost pushed first, leftmost pushed last), far-call the entry point, then clean up the stack. The older PXENV+ entry point takes its parameters in registers instead.

To simplify access to the API, a generic wrapper function can be written. First, the far address of the entry point (taken from whichever structure is being used) should be stored in a known location (PXEAPI in the example). Every PXE API function takes the same parameters:

  • Opcode, or the API function number.
  • Segment:Offset address to a structure containing input data.

An example generic call function is as follows:

; Calls the PXE API, abstracting the difference between the old (PXENV+) and new (!PXE) APIs.
; @es:di          The address of the parameter (input/output) buffer.
; @bx             The opcode.
; NOTE: These are the registers the legacy PXENV+ API expects, so the same wrapper works with it.
UsePXEAPI:
    ; Push the parameters for the stack-based !PXE API (segment, offset, then opcode).
    ; The legacy API ignores them and reads ES:DI and BX directly.
    push es
    push di
    push bx

    call far [PXEAPI]

    add sp, 6                         ; Clean up the stack.
    ret

The PXE API returns a status in AX (zero on success) and also writes a detailed status code into the first word (the Status field) of the parameter buffer that was passed in. It is recommended to test both: a non-zero value in either indicates failure.
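The "both must be zero" rule can be expressed as a small C helper. The name is hypothetical; the Status word is read as little-endian, like the other PXE words.

```c
#include <stdint.h>

/* Hypothetical helper: a PXE call succeeded only if AX is zero and the
 * Status word at the start of the parameter buffer (little-endian) is
 * zero as well. */
static int pxe_call_succeeded(uint16_t ax, const uint8_t *param)
{
    uint16_t status = (uint16_t)(param[0] | (param[1] << 8));
    return ax == 0 && status == 0;
}
```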

Files, via TFTP

Getting files via TFTP requires the TFTP server's IP address. Along with that, you might want the client machine's IP address, the DHCP server's IP address, the client's MAC address, and perhaps several other things.

While this might seem daunting at first, PXE provides a neat way to get all of this.

Getting "Cached Information"

The PXE API's Get Cached Information function (PXENV_GET_CACHED_INFO, opcode 0x0071) takes a t_PXENV_GET_CACHED_INFO parameter structure and returns a "bootph" structure. This "bootph" structure contains important information about the client and server machines, which is exactly what is needed.

The t_PXENV_GET_CACHED_INFO structure (labelled t_PXENV_GET_CACHED in the examples) has the following layout:

; Note: for convenience, this structure is given in Intel (NASM) syntax.
t_PXENV_GET_CACHED:
    .Status       dw 0
    .PacketType   dw 2                ; PXENV_PACKET_TYPE_DHCP_ACK
    .BufferSize   dw 0
    .BufferOff    dw 0
    .BufferSeg    dw 0
    .BufferLimit  dw 0
  • .Status. This should be zero on input; it holds the detailed status after the call.
  • .PacketType. The type of packet to return; 2 means "PXENV_PACKET_TYPE_DHCP_ACK", which is what we need. For the other values, see the specification.
  • .BufferSize. On input, the size of the output buffer; on return, the number of bytes copied into it.
  • .BufferOff/Seg. The output buffer into which the bootph structure is copied. If you prefer, leave this as zero, and PXE will instead return the address of its own copy of the bootph structure in the BC data segment.

To make the call, put the address of the above structure in ES:DI and the opcode, 0x0071, in BX, then call the generic wrapper function. An example call is as follows:

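    push ds                           ; The structure is assumed to live in DS,
    pop es                            ; so ES:DI = DS:t_PXENV_GET_CACHED.
    mov di, t_PXENV_GET_CACHED
    mov bx, GET_CACHED_INFO           ; 0x0071.

    call UsePXEAPI                    ; Ask for the cached info.

    or ax, [t_PXENV_GET_CACHED]       ; Combine AX with the .Status word (OR also sets ZF).
    jnz .Error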

The bootph structure is somewhat complex, and the specification should be consulted for the full details. For now, these are the fields you are most likely to need:

  • uint8_t CIP[4]. The client IP address, at offset 12.
  • uint8_t YIP[4]. "Your" IP address (the address assigned to the client), at offset 16.
  • uint8_t SIP[4]. The server IP address, at offset 20.
  • uint8_t GIP[4]. The relay agent IP address, at offset 24.
  • uint8_t MAC[16]. The client hardware (MAC) address, at offset 28. For Ethernet, only the first 6 bytes are used.
  • uint8_t SER[64]. The server host name, an ASCIIZ string, at offset 44.
  • uint8_t BOO[128]. The boot file name, an ASCIIZ string, at offset 108.
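For reference, the fields listed above sit inside the following C structure, which mirrors the start of a BOOTP/DHCP packet. The leading field names (op, htype, hlen, hops, xid, secs, flags) are taken from the BOOTP/DHCP RFCs rather than from this article, and the options area after BOO is omitted. The compile-time checks confirm the offsets quoted above; addresses in the packet are in network (big-endian) byte order, so they can be copied byte for byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Start of the "bootph" (BOOTP/DHCP packet) structure. Multi-byte fields are
 * in network (big-endian) byte order; the options area after BOO is omitted. */
struct bootph {
    uint8_t  op;        /* 1 = request, 2 = reply */
    uint8_t  htype;     /* hardware type */
    uint8_t  hlen;      /* hardware address length (6 for Ethernet) */
    uint8_t  hops;      /* relay hop count */
    uint32_t xid;       /* transaction ID */
    uint16_t secs;      /* seconds since the client started */
    uint16_t flags;
    uint8_t  CIP[4];    /* client IP address */
    uint8_t  YIP[4];    /* "your" IP address */
    uint8_t  SIP[4];    /* server IP address */
    uint8_t  GIP[4];    /* relay agent IP address */
    uint8_t  MAC[16];   /* client hardware address */
    uint8_t  SER[64];   /* server host name, ASCIIZ */
    uint8_t  BOO[128];  /* boot file name, ASCIIZ */
};

/* Compile-time checks of the offsets quoted above (no padding is needed). */
_Static_assert(offsetof(struct bootph, CIP) == 12, "CIP must be at offset 12");
_Static_assert(offsetof(struct bootph, YIP) == 16, "YIP must be at offset 16");
_Static_assert(offsetof(struct bootph, SIP) == 20, "SIP must be at offset 20");
_Static_assert(offsetof(struct bootph, GIP) == 24, "GIP must be at offset 24");
_Static_assert(offsetof(struct bootph, MAC) == 28, "MAC must be at offset 28");
_Static_assert(offsetof(struct bootph, SER) == 44, "SER must be at offset 44");
_Static_assert(offsetof(struct bootph, BOO) == 108, "BOO must be at offset 108");
```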

Opening

PXE offers "TFTP Open", "TFTP Read" and "TFTP Close", which can be used like ordinary file I/O operations. It also offers "TFTP Read File", but the first three are recommended because they are more configurable.

PXENV_TFTP_OPEN (opcode 0x0020) opens a file via TFTP. It takes a t_PXENV_TFTP_OPEN structure as a parameter, which has the following format:

t_PXENV_TFTP_OPEN:
    .Status       dw 0
    .SIP          dd 0
    .GIP          dd 0
    .Filename     times 128 db 0
    .Port         dw 0
    .PacketSize   dw 0
  • .Status. The status word.
  • .SIP. The server IP address, which we just obtained from bootph.
  • .GIP. The gateway IP address (0 means default).
  • .Filename. A null-terminated string containing the name of the file to open.
  • .Port. The TFTP server's UDP port, normally 69.
  • .PacketSize. On input, the requested packet size; on return, the size negotiated with the server. Conservative sizes such as 512 or 1024 bytes are recommended: larger sizes usually load files faster, but are not always supported. See the section on performance below for more information on packet size.

NOTE: The SIP, GIP (0 means default) and Port are all big endian.
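As a sketch (not a definitive implementation), preparing the t_PXENV_TFTP_OPEN structure in C might look like this. The struct and function names are hypothetical; the server IP is copied byte for byte from the SIP field of bootph (already big-endian), GIP is left as 0 (default), and the port is written as the big-endian bytes 0x00 0x45 (69), so the result does not depend on the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of t_PXENV_TFTP_OPEN as shown above; every 16-bit field is at an
 * even offset, so no padding is inserted. */
struct pxenv_tftp_open {
    uint16_t Status;
    uint8_t  SIP[4];         /* big-endian */
    uint8_t  GIP[4];         /* big-endian; 0.0.0.0 = default */
    uint8_t  Filename[128];  /* NUL-terminated */
    uint16_t Port;           /* big-endian */
    uint16_t PacketSize;     /* requested size in, negotiated size out */
};
_Static_assert(sizeof(struct pxenv_tftp_open) == 142, "unexpected padding");

/* Hypothetical helper: fill in an open request. Returns -1 if the file name
 * (plus its NUL terminator) does not fit into Filename. */
static int tftp_open_prepare(struct pxenv_tftp_open *req, const uint8_t server_ip[4],
                             const char *filename, uint16_t packet_size)
{
    size_t len = strlen(filename);
    if (len >= sizeof req->Filename)
        return -1;

    memset(req, 0, sizeof *req);               /* Status = 0, GIP = 0 (default) */
    memcpy(req->SIP, server_ip, 4);            /* copied byte for byte from bootph */
    memcpy(req->Filename, filename, len + 1);  /* includes the terminating NUL */

    const uint8_t port_be[2] = { 0x00, 0x45 }; /* UDP port 69, big-endian */
    memcpy(&req->Port, port_be, sizeof port_be);

    req->PacketSize = packet_size;             /* PXE may lower this when negotiating */
    return 0;
}
```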

Reading

The PXENV_TFTP_READ function, with opcode 0x0022, reads a single packet from the currently opened file. The size of the packet is set in the Open call, while the function requires a t_PXENV_TFTP_READ structure.

The t_PXENV_TFTP_READ has the following structure:

t_PXENV_TFTP_READ:
    .Status       dw 0
    .PacketNumber dw 0
    .BufferSize   dw 0
    .BufferOff    dw 0
    .BufferSeg    dw 0
  • .Status. This contains the status returned after the function call.
  • .PacketNumber. The packet number sent by the TFTP server. It acts as an index into the packets of the file and can be used to check that packets arrive in order.
  • .BufferSize. On return, the number of bytes written into the buffer. If this is less than the negotiated packet size, this was the last packet.
  • .BufferOff/Seg. The address of the buffer that receives the packet data.

To read a whole file, keep calling PXENV_TFTP_READ, advancing .BufferOff/Seg by the number of bytes read after each call, until a packet shorter than the size negotiated by PXENV_TFTP_OPEN is returned. Some implementations signal end of file with a zero-length packet, while others simply return a final packet shorter than the negotiated size; checking for "less than the negotiated size" handles both.

With the above point in mind, the following pseudo code can be used to read a file:

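uint32_t total = 0;                         // Bytes read so far.

for (;;)
{
    // Point the read structure at the next free byte of the destination buffer.
    set_buffer(&read, file_buffer + total);
    call_api(PXENV_TFTP_READ, &read);

    // AX says whether the call failed, while .Status contains the detailed status.
    if (ax != 0 || read.Status != 0)
        panic();

    total += read.BufferSize;

    // A short (possibly zero-length) packet marks the end of the file.
    if (read.BufferSize < packet_size)
        break;
}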
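Advancing .BufferOff/Seg after each read can overflow the 16-bit offset once more than 64 KiB has been read into one segment. One common way around this, shown in the hypothetical C helper below, is to recompute the pair from the linear address after every step so that the offset always stays below 16. It assumes the destination buffer lies below 1 MiB, where every linear address has a valid segment:offset form.

```c
#include <stdint.h>

/* A real-mode address as used by .BufferOff/.BufferSeg. */
struct segoff {
    uint16_t off;
    uint16_t seg;
};

/* Hypothetical helper: advance `p` by `n` bytes and normalise the result so
 * that the offset is always below 16. Assumes the result lies below 1 MiB,
 * so the new segment fits in 16 bits. */
static struct segoff segoff_advance(struct segoff p, uint16_t n)
{
    uint32_t linear = ((uint32_t)p.seg << 4) + p.off + n;
    struct segoff r = { (uint16_t)(linear & 0xF), (uint16_t)(linear >> 4) };
    return r;
}
```

In the read loop, this would be applied after each successful call, e.g. `buf = segoff_advance(buf, read.BufferSize);`.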

Closing

PXENV_TFTP_CLOSE, with opcode 0x0021, is perhaps the simplest function. It requires an input buffer with the following format:

t_PXENV_TFTP_CLOSE:
    .Status       dw 0
  • .Status. The status returned after calling the function. The function will most likely fail only if there is no open TFTP connection.

Performance

The original TFTP protocol is very simple (mostly because it was designed to be easy to implement on tiny embedded systems). As a consequence it is relatively slow: its "stop and wait" design means that the network connection's bandwidth is significantly under-utilised. For larger files this can noticeably affect boot times.
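A rough model makes the cost of "stop and wait" concrete: only one block is in flight per round trip, so throughput cannot exceed the block size divided by the round-trip time, however fast the link is. The C sketch below (hypothetical name) computes that ceiling; it ignores transmission time, processing time and packet loss, so real transfers will be slower.

```c
#include <stdint.h>

/* Hypothetical helper: upper bound on stop-and-wait throughput, in bytes per
 * second, for a block size and a non-zero round-trip time in microseconds.
 * One block is sent per round trip; transmission time and losses are ignored. */
static uint64_t stop_and_wait_max_bytes_per_sec(uint32_t block_bytes, uint32_t rtt_us)
{
    return (uint64_t)block_bytes * 1000000u / rtt_us;
}
```

For example, with a 1 ms round trip, 512-byte blocks cap the transfer at 512,000 bytes per second (500 KiB/s), while 1452-byte blocks raise the ceiling to 1,452,000 bytes per second.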

There are 3 approaches for improving performance.

TFTP Block Size

In 1998 a "TFTP Blocksize option" was introduced. The basic idea is to improve performance by increasing the block/packet size (and reducing the number of data packets and the number of acknowledgement packets). This is supported by the PXE specification.

In theory, when you open a file via "TFTP Open" you're supposed to set the "PacketSize" field to the size you're requesting, and (where possible) PXE is supposed to negotiate an appropriate packet size with the TFTP server and then set the "PacketSize" field to the size that both the PXE code in the client and the TFTP server agreed upon. This means that you should be able to (e.g.) set "PacketSize" to 65535 bytes and PXE will change it to the maximum size that is supported.

In practice this works on most, but not all, network cards; some PXE implementations are broken. For example, the implementation of PXE that comes with various RealTek cards will simply return an "Unknown error" if the requested packet size is too large (even though the PXE specification says otherwise, and even though PXE defines more appropriate status codes).

It is possible to work around this. The recommended work-around is:

  • If the "Version" field of the PXENV+ structure is less than 2.1 (0x0201), be cautious and use a 512-byte packet size.
  • Use the "UNDI GET INFORMATION" function to get information about the network adapter. The returned information includes an MTU (Maximum Transmission Unit) field. If this call fails, be cautious and use a 512-byte packet size.
  • Check whether the MTU field is sane (within a reasonable range). If it is not, be cautious and use a 512-byte packet size.
  • Subtract from the MTU the size of an IPv4 header (20 bytes), the size of the UDP header (8 bytes), the size of the TFTP header (4 bytes), and 16 more bytes just in case. This gives the maximum packet size you can safely use.
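The work-around above can be sketched in C as follows. The function name is hypothetical, and so are the MTU_MIN/MTU_MAX bounds: the text only asks for a "reasonable range", and 576 to 9000 bytes is one possible choice. Obtaining the MTU itself (via UNDI GET INFORMATION) is left to the caller.

```c
#include <stdint.h>

/* Header overheads listed above, in bytes. */
#define IPV4_HEADER    20
#define UDP_HEADER      8
#define TFTP_HEADER     4
#define SAFETY_MARGIN  16
#define SAFE_DEFAULT  512

/* Hypothetical "sane MTU" bounds; the text does not fix them. */
#define MTU_MIN 576
#define MTU_MAX 9000

/* `pxe_version` is the PXENV+ Version field, `undi_ok` says whether
 * UNDI GET INFORMATION succeeded, and `mtu` is the MTU it reported. */
static uint16_t choose_tftp_packet_size(uint16_t pxe_version, int undi_ok, uint16_t mtu)
{
    if (pxe_version < 0x0201)               /* older than 2.1: be cautious */
        return SAFE_DEFAULT;
    if (!undi_ok)                           /* no adapter information */
        return SAFE_DEFAULT;
    if (mtu < MTU_MIN || mtu > MTU_MAX)     /* implausible MTU */
        return SAFE_DEFAULT;
    return (uint16_t)(mtu - IPV4_HEADER - UDP_HEADER - TFTP_HEADER - SAFETY_MARGIN);
}
```

With a typical Ethernet MTU of 1500 bytes, this yields 1500 - 20 - 8 - 4 - 16 = 1452 bytes.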

TFTP "Sliding Window"

In 2015 a "TFTP Window Size" option was introduced. The basic idea here is to use a "sliding window" (like TCP) instead of "stop and wait" to make better use of the available network bandwidth. Unfortunately, at the time of writing, it is doubtful that any network cards support this.

Advanced/Non-standard Approaches

The PXE API exposes lower-level functionality, including the ability to send arbitrary packets and to intercept received packets (via a callback/"ISR"). In theory, a boot loader could use these to implement any protocol it likes (e.g. FTP or HTTP instead of TFTP), as long as the server is set up to support that protocol (TFTP would still be needed to download the boot loader itself).

The most extreme approach would be to design your own protocol and implement your own server. In this case (in theory) you would be able to maximize network bandwidth utilisation, and the client could also offload processing to the server. With a massive amount of over-engineering you could significantly reduce boot times this way.

Cleaning up

After you have finished reading files via PXE, it is best to shut down PXE and reclaim any memory it was using. PXE support is unloaded by using the "PXENV_UNDI_SHUTDOWN" function to restore the network adapter to its default state, followed by the "PXENV_UNLOAD_STACK" function to unhook any IRQ handlers.

Lastly, PXE needs to prepare itself to be removed from memory (unhooking interrupt 0x1A), using the "PXENV_STOP_UNDI" function (for !PXE) or the "PXENV_UNDI_CLEANUP" function (for PXENV+).

Once PXE is unloaded, memory can be reclaimed as follows: take the UNDI code segment and UNDI data segment start addresses (segment multiplied by 16) and sizes from the PXENV+ structure, calculate "start address + size" for each, and select the higher value. This value is the "new" number of bytes of RAM starting at 0x00000000 (i.e. the amount of conventional memory). It can be used directly, and/or the value at 0x40:0x13 (linear 0x00000413) can be updated to reflect the new number of 1 KiB blocks of conventional memory.

Please note that it's probably a good idea to calculate the amount of conventional memory before you unload PXE, because some of the functions above may clear the segment addresses and sizes in the PXENV+ structure.
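The calculation described above can be sketched in C (hypothetical names). The inputs are the UNDICodeSeg/UNDICodeSize and UNDIDataSeg/UNDIDataSize fields, read from the PXENV+ structure before unloading; the KiB value is rounded down, which errs on the side of reporting less free memory.

```c
#include <stdint.h>

/* Hypothetical helper: end of conventional memory once PXE is unloaded,
 * i.e. the higher of (segment * 16 + size) for the UNDI code and data. */
static uint32_t conventional_memory_after_pxe(uint16_t code_seg, uint16_t code_size,
                                              uint16_t data_seg, uint16_t data_size)
{
    uint32_t code_end = ((uint32_t)code_seg << 4) + code_size;
    uint32_t data_end = ((uint32_t)data_seg << 4) + data_size;
    return code_end > data_end ? code_end : data_end;
}

/* Value for the BIOS data area word at 0x40:0x13: whole 1 KiB blocks. */
static uint16_t conventional_memory_kib(uint32_t bytes)
{
    return (uint16_t)(bytes / 1024);
}
```

For example, UNDI code at segment 0x9000 (0x4000 bytes) and UNDI data at segment 0x9400 (0x2000 bytes) end at 0x94000 and 0x96000 respectively, giving 0x96000 bytes, or 600 KiB.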

