Raspberry Pi

From OSDev.wiki

Revision as of 15:12, 18 January 2013

This page is a work in progress.
This page may thus be incomplete. Its content may be changed in the near future.

Intro

This is a tutorial on bare-metal OS development on the Raspberry Pi. It is written specifically for the Raspberry Pi Model B Rev 2 because the author has no other hardware to test on. So far, however, the models are basically identical for the purposes of this tutorial (Rev 1 has 256MB of RAM, Model A has no Ethernet).

This is the author's very first ARM system, and we learn as we write, without any prior knowledge of ARM. Experience with Linux/Unix (very important) and the C/C++ language (incredibly important, including how to use inline asm) is assumed and required. This is not a tutorial about how to build a kernel but a simple introduction to getting started on the RPi.

Materials

You will need:

  • A Raspberry Pi, RPi in short.
  • An SD card to boot from.
  • An SD card reader so you can write to the SD card from your development system.
  • A serial adaptor for the RPi.
  • Power from an external power supply, USB or the serial adaptor.

Serial adaptor

The RPi has 2 serial ports (UARTs). This tutorial only concerns itself with UART0, called simply UART or serial port; UART1 is ignored from now on. The onboard UART uses 3.3V TTL levels and is connected to GPIO 14 (TXD) and GPIO 15 (RXD) on the GPIO header labeled "P1" on the board. x86 PCs and Macs have no such 3.3V TTL serial port, so you need an adaptor that converts the levels; do not connect 5V signals to these pins, as the RPi's GPIO pins are not 5V tolerant. I recommend a USB to TTL serial cable (a "Debug / Console Cable for Raspberry Pi") with separate connectors per lead, like the commercial RPi serial adaptors, which is then connected to the UART pins on P1.

Note: The serial adaptor I use provides both a 0V and 5V lead (black and red) which provide power to the RPi. No extra power supply is needed besides this.

Preparations

Testing your hardware/serial port

First things first: make sure all your hardware works. Connect your serial adaptor to the RPi and boot the official Raspbian image. The boot process outputs to both the serial port and HDMI and starts a getty on the serial port. Set up your serial port on the development machine, however yours works, and open minicom. Make sure flow control is turned off and that you run at 115200 baud, 8N1, which is what the RPi uses.

If you get 'Permission Denied' do NOT become root! This is unnecessary. Instead do:

sudo adduser <user> dialout

This will let your user use serial ports without needing root.

Alternatively, run ls -l /dev/ttyS* (or ls -l /dev/ttyUSB* for a USB adaptor) to find out which group owns the device, then add yourself to that group in /etc/group (on some distributions the group is uucp).

If you started minicom only after the RPi has booted then simply press return in minicom so the getty will output a fresh login prompt. Otherwise wait for the boot messages to appear. If you don't get any output then connect the RPi to a monitor to check that it actually boots, check your connections and minicom settings.

Building a cross compiler

Like me, you are probably using an x86 PC as your main machine and want to edit and compile the source on it. The RPi has an ARM CPU, so you absolutely need a cross compiler. But even if you are developing on an ARM system, it is still a good idea to build a cross compiler to avoid accidentally mixing stuff from your development system with your own kernel. Follow the steps from GCC_Cross-Compiler to build your own cross compiler but use:

export TARGET=arm-none-eabi

Now we are ready to start.

Bare minimum kernel

Let's start with a minimum of 4 files. The kernel is going to use a subset of C++, meaning C++ without exceptions and without runtime type information (RTTI). The main function will be in main.cc. Before the main function can be called, though, some things have to be set up using assembly; this will be placed in boot.S. On top of that we need a linker script and a Makefile to build the kernel, and we need to create an include directory for later use.

main.cc

/* main.cc - the entry point for the kernel */

#include <stdint.h>

extern "C" {
    // kernel_main gets called from boot.S. Declaring it extern "C" avoids
    // having to deal with the C++ name mangling.
    void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags);
}

#define UNUSED(x) (void)(x)

// kernel main function, it all begins here
void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags) {
    UNUSED(r0);
    UNUSED(r1);
    UNUSED(atags);
}

This declares an empty kernel_main function that simply returns. The GPU bootloader passes arguments to the kernel in r0-r2, and boot.S makes sure to preserve those 3 registers; they are the first 3 arguments in a C function call. r0 is 0 and r1 holds the machine type (0xC42, the Linux machine number for the BCM2708), while r2 contains the address of the ATAGs (more about them later). For now the UNUSED() macro makes sure the compiler doesn't complain about unused variables.
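As a first look at the ATAGs that r2 points to, here is a minimal sketch of walking the list. It assumes the layout of the Linux ARM boot convention: every tag starts with a two-word header (the tag's size in 32-bit words, header included, followed by a tag identifier), and the list ends with an ATAG_NONE tag. The function atag_total_memory is a hypothetical helper that adds up the sizes of all ATAG_MEM (memory block) tags.

```cpp
#include <cassert>
#include <cstdint>

// Tag identifiers, assumed from the Linux ARM boot convention.
enum : uint32_t {
    ATAG_NONE    = 0x00000000,  // marks the end of the list
    ATAG_CORE    = 0x54410001,  // first tag in the list
    ATAG_MEM     = 0x54410002,  // payload: block size in bytes, start address
    ATAG_CMDLINE = 0x54410009,  // payload: kernel command line string
};

// Hypothetical helper: walk the ATAG list starting at 'atags' and return the
// total number of bytes described by ATAG_MEM tags.
uint32_t atag_total_memory(const uint32_t *atags) {
    uint32_t total = 0;
    for (const uint32_t *p = atags; ; p += p[0]) {
        uint32_t size = p[0];  // tag size in 32-bit words, header included
        uint32_t tag  = p[1];  // tag identifier
        if (tag == ATAG_NONE || size < 2) {
            break;  // end of list (or a corrupt size that would loop forever)
        }
        if (tag == ATAG_MEM && size >= 4) {
            total += p[2];  // p[2] = block size, p[3] = physical start
        }
    }
    return total;
}
```

Inside kernel_main this might be called as atag_total_memory((const uint32_t *)atags).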

boot.S

/* boot.S - assembly startup code */

// To keep this in the first portion of the binary.
.section ".text.boot"

// Make Start global.
.globl Start

// Entry point for the kernel.
// r15 -> should begin execution at 0x8000.
// r0 -> 0x00000000
// r1 -> 0x00000C42
// r2 -> 0x00000100 - start of ATAGS
// preserve these registers as argument for kernel_main
Start:
	// Setup the stack.
	mov	sp, #0x8000

	// Clear out bss.
	ldr	r4, =_bss_start
	ldr	r9, =_bss_end
	mov	r5, #0
	mov	r6, #0
	mov	r7, #0
	mov	r8, #0
1:
	// store multiple at r4.
	stmia	r4!, {r5-r8}

	// If we're still below bss_end, loop.
	cmp	r4, r9
	blo	1b

	// Call kernel_main
	ldr	r3, =kernel_main
	blx	r3

	// halt
halt:
	wfe
	b	halt

The section ".text.boot" will be used in the linker script to place boot.S as the very first thing in our kernel image. The code initializes a minimal C environment, which means setting up a stack and zeroing the BSS segment, before calling the kernel_main function. The stack grows downwards from 0x8000, so it occupies the memory just below the kernel. Note that the code avoids using r0-r2 so they remain valid for the kernel_main call.
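For readers more at home in C++ than in ARM assembly, the BSS loop corresponds roughly to the following sketch. clear_bss is a hypothetical name, and start/end stand in for the linker symbols _bss_start and _bss_end. It is only an illustration: the real loop has to be written in assembly because it runs before the C environment exists, and a compiler may turn a loop like this into a call to memset, which our freestanding kernel does not have.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ equivalent of the "Clear out bss" loop in boot.S.
// Like "stmia r4!, {r5-r8}", each iteration stores four zero words
// (16 bytes). This assumes the BSS size is a multiple of 16 bytes, which the
// page-aligned linker script guarantees. Unlike the assembly, which stores
// first and compares afterwards, this version checks the bound first.
void clear_bss(uint32_t *start, uint32_t *end) {
    while (start < end) {
        start[0] = 0;
        start[1] = 0;
        start[2] = 0;
        start[3] = 0;
        start += 4;
    }
}
```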

link-arm-eabi.ld

/* link-arm-eabi.ld - linker script for arm eabi */
ENTRY(Start)

SECTIONS
{
    /* Starts at LOADER_ADDR. */
    . = 0x8000;
    _start = .;
    _text_start = .;
    .text : {
        KEEP(*(.text.boot))
        *(.text)
    }
    . = ALIGN(4096); /* align to page size */
    _text_end = .;

    _rodata_start = .;
    .rodata : {
        *(.rodata)
    }
    . = ALIGN(4096); /* align to page size */
    _rodata_end = .;

    _data_start = .;
    .data : {
        *(.data)
    }
    . = ALIGN(4096); /* align to page size */
    _data_end = .;

    _bss_start = .;
    .bss : {
        bss = .;
        *(.bss)
    }
    . = ALIGN(4096); /* align to page size */
    _bss_end = .;

    _end = .;
}

There is a lot of text here but don't despair. The script is rather simple if you look at it bit by bit.

ENTRY(Start) declares the entry point for the kernel image; that symbol is defined in boot.S. Since we actually boot a flat binary image, the entry point is irrelevant for the final kernel.img. But the ELF file we build as an intermediate step needs one, and declaring it keeps the linker happy.

SECTIONS declares, well, sections. It decides where the bits and pieces of our code and data go and also sets a few symbols that help us track the size of each section.

    . = 0x8000;
    _start = .;

The "." denotes the current address so the first line tells the linker to set the current address to 0x8000, where the kernel starts. The current address is automatically incremented when the linker adds data. The second line then creates a symbol "_start" and sets it to the current address.

After that, sections are defined for text (code), read-only data, read-write data and BSS (zero-initialized memory). Apart from their names the sections are identical, so let's look at just one of them:

    _text_start = .;
    .text : {
        KEEP(*(.text.boot))
        *(.text)
    }
    . = ALIGN(4096); /* align to page size */
    _text_end = .;

The first line creates a _text_start symbol for the section. The second line opens a .text section for the output file, which gets closed in the fifth line. Lines 3 and 4 declare which sections from the input files will be placed inside the output .text section. In our case ".text.boot" is placed first, followed by the more general ".text". ".text.boot" is only used in boot.S and ensures that its code ends up at the beginning of the kernel image. ".text" then contains all the remaining code. Any data added by the linker automatically increments the current address ("."). In line 6 we explicitly increment it so that it is aligned to a 4096 byte boundary (which is the page size for the RPi). Finally, line 7 creates a _text_end symbol so we know where the section ends.
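To make the effect of ". = ALIGN(4096);" concrete: it rounds the current address up to the next multiple of 4096 and leaves it unchanged if it is already aligned. The same arithmetic in C++, as a small sketch (align_up is a hypothetical helper, and the alignment must be a power of two):

```cpp
#include <cstdint>

// Round 'addr' up to the next multiple of 'align', which must be a power
// of two. This is what ". = ALIGN(4096);" does to the location counter.
constexpr uint32_t align_up(uint32_t addr, uint32_t align) {
    return (addr + align - 1) & ~(align - 1);
}

// Example: if the code ends at 0x8123, _text_end becomes 0x9000.
static_assert(align_up(0x8123, 4096) == 0x9000, "rounds up");
static_assert(align_up(0x9000, 4096) == 0x9000, "aligned stays put");
```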

What are the _text_start and _text_end for and why use page alignment? The 2 symbols can be used in the kernel source and the linker will then place the correct addresses into the binary. As an example the _bss_start and _bss_end are used in boot.S. But you can also use the symbols from C by declaring them extern first. While not required I made all sections aligned to page size. This later allows mapping them in the page tables with executable, read-only and read-write permissions without having to handle overlaps (2 sections in one page).

    _end = .;

After all sections are declared the _end symbol is created. If you ever want to know how large your kernel is at runtime you can use _start and _end to find out.
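A minimal sketch of using these symbols from C++: they are declared extern "C" because the linker script defines plain, unmangled names, and since only their addresses matter, declaring them as char arrays is a common idiom. The helper names region_size, kernel_size and text_size are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Symbols defined in link-arm-eabi.ld. They have no storage of their own;
// only their addresses are meaningful, hence the incomplete array types.
extern "C" char _start[], _end[];
extern "C" char _text_start[], _text_end[];

// Size in bytes of the memory region [begin, end).
inline uint32_t region_size(const void *begin, const void *end) {
    return static_cast<uint32_t>(reinterpret_cast<uintptr_t>(end) -
                                 reinterpret_cast<uintptr_t>(begin));
}

// Only usable inside the kernel, where the linker script defines the symbols.
static inline uint32_t kernel_size() { return region_size(_start, _end); }
static inline uint32_t text_size() { return region_size(_text_start, _text_end); }
```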

Makefile

# Makefile - build script

# build environment
PREFIX ?= /usr/local/cross
ARMGNU ?= $(PREFIX)/bin/arm-none-eabi

# source files
SOURCES_ASM := $(wildcard *.S)
SOURCES_CC  := $(wildcard *.cc)

# object files
OBJS        := $(patsubst %.S,%.o,$(SOURCES_ASM))
OBJS        += $(patsubst %.cc,%.o,$(SOURCES_CC))

# Build flags
DEPENDFLAGS := -MD -MP
INCLUDES    := -I include
BASEFLAGS   := -O2 -fpic -pedantic -pedantic-errors -nostdlib
BASEFLAGS   += -nostartfiles -ffreestanding -nodefaultlibs
BASEFLAGS   += -fno-builtin -fomit-frame-pointer -mcpu=arm1176jzf-s
WARNFLAGS   := -Wall -Wextra -Wshadow -Wcast-align -Wwrite-strings
WARNFLAGS   += -Wredundant-decls -Winline
WARNFLAGS   += -Wno-attributes -Wno-deprecated-declarations
WARNFLAGS   += -Wno-div-by-zero -Wno-endif-labels -Wfloat-equal
WARNFLAGS   += -Wformat=2 -Wno-format-extra-args -Winit-self
WARNFLAGS   += -Winvalid-pch -Wmissing-format-attribute
WARNFLAGS   += -Wmissing-include-dirs -Wno-multichar
WARNFLAGS   += -Wredundant-decls -Wshadow
WARNFLAGS   += -Wno-sign-compare -Wswitch -Wsystem-headers -Wundef
WARNFLAGS   += -Wno-pragmas -Wno-unused-but-set-parameter
WARNFLAGS   += -Wno-unused-but-set-variable -Wno-unused-result
WARNFLAGS   += -Wwrite-strings -Wdisabled-optimization -Wpointer-arith
WARNFLAGS   += -Werror
ASFLAGS     := $(INCLUDES) $(DEPENDFLAGS) -D__ASSEMBLY__
CXXFLAGS    := $(INCLUDES) $(DEPENDFLAGS) $(BASEFLAGS) $(WARNFLAGS)
CXXFLAGS    += -fno-exceptions -fno-rtti -std=c++0x

# build rules
all: kernel.img

include $(wildcard *.d)

kernel.elf: $(OBJS) link-arm-eabi.ld
	$(ARMGNU)-ld $(OBJS) -Tlink-arm-eabi.ld -o $@

kernel.img: kernel.elf
	$(ARMGNU)-objcopy kernel.elf -O binary kernel.img

clean:
	$(RM) -f $(OBJS) kernel.elf kernel.img

dist-clean: clean
	$(RM) -f *.d

# C++.
%.o: %.cc Makefile
	$(ARMGNU)-g++ $(CXXFLAGS) -c $< -o $@

# AS.
%.o: %.S Makefile
	$(ARMGNU)-g++ $(ASFLAGS) -c $< -o $@

And there you go. Try building it. A minimum kernel that does absolutely nothing.

Hello World kernel

Let's make the kernel do something: let's say hello to the world using the serial port.

main.cc

/* main.cc - the entry point for the kernel */

#include <uart.h>

extern "C" {
    // kernel_main gets called from boot.S. Declaring it extern "C" avoids
    // having to deal with the C++ name mangling.
    void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags);
}

#define UNUSED(x) (void)(x)

const char hello[] = "\r\nHello World\r\n";
const char halting[] = "\r\n*** system halting ***";

// kernel main function, it all begins here
void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags) {
    UNUSED(r0);
    UNUSED(r1);
    UNUSED(atags);

    UART::init();
    
    UART::puts(hello);

    // Wait a bit
    for(volatile int i = 0; i < 10000000; ++i) { }

    UART::puts(halting);
}

include/mmio.h

/* mmio.h - access to MMIO registers */

#ifndef MMIO_H
#define MMIO_H

#include <stdint.h>

namespace MMIO {
    // write to MMIO register
    static inline void write(uint32_t reg, uint32_t data) {
	uint32_t *ptr = (uint32_t*)reg;
	asm volatile("str %[data], [%[reg]]"
		     : : [reg]"r"(ptr), [data]"r"(data));
    }

    // read from MMIO register
    static inline uint32_t read(uint32_t reg) {
	uint32_t *ptr = (uint32_t*)reg;
	uint32_t data;
	asm volatile("ldr %[data], [%[reg]]"
		     : [data]"=r"(data) : [reg]"r"(ptr));
	return data;
    }
}

#endif // #ifndef MMIO_H


include/uart.h

/* uart.h - UART initialization & communication */

#ifndef UART_H
#define UART_H

#include <stdint.h>

namespace UART {
    /*
     * Initialize UART0.
     */
    void init();

    /*
     * Transmit a byte via UART0.
     * uint8_t Byte: byte to send.
     */
    void putc(uint8_t byte);

    /*
     * print a string to the UART one character at a time
     * const char *str: 0-terminated string
     */
    void puts(const char *str);
}

#endif // #ifndef UART_H

uart.cc

/* uart.cc - UART initialization & communication */
/* Reference material:
 * http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
 * Chapter 13: UART
 */

#include <stdint.h>
#include <mmio.h>
#include <uart.h>

namespace UART {
    enum {
	// The GPIO registers base address.
	GPIO_BASE = 0x20200000,

	// The offsets for each register.

	// Controls actuation of pull up/down to ALL GPIO pins.
	GPPUD = (GPIO_BASE + 0x94),

	// Controls actuation of pull up/down for specific GPIO pin.
	GPPUDCLK0 = (GPIO_BASE + 0x98),

	// The base address for UART.
	UART0_BASE = 0x20201000,

	// The offsets for each register of the UART.
	UART0_DR     = (UART0_BASE + 0x00),
	UART0_RSRECR = (UART0_BASE + 0x04),
	UART0_FR     = (UART0_BASE + 0x18),
	UART0_ILPR   = (UART0_BASE + 0x20),
	UART0_IBRD   = (UART0_BASE + 0x24),
	UART0_FBRD   = (UART0_BASE + 0x28),
	UART0_LCRH   = (UART0_BASE + 0x2C),
	UART0_CR     = (UART0_BASE + 0x30),
	UART0_IFLS   = (UART0_BASE + 0x34),
	UART0_IMSC   = (UART0_BASE + 0x38),
	UART0_RIS    = (UART0_BASE + 0x3C),
	UART0_MIS    = (UART0_BASE + 0x40),
	UART0_ICR    = (UART0_BASE + 0x44),
	UART0_DMACR  = (UART0_BASE + 0x48),
	UART0_ITCR   = (UART0_BASE + 0x80),
	UART0_ITIP   = (UART0_BASE + 0x84),
	UART0_ITOP   = (UART0_BASE + 0x88),
	UART0_TDR    = (UART0_BASE + 0x8C),
    };

    /*
     * delay function
     * int32_t count: number of loop iterations to delay
     *
     * This just loops <count> times in a way that the compiler
     * won't optimize away.
     */
    void delay(int32_t count) {
	// count is both read and modified by the loop ("+r"), and subs
	// changes the condition flags ("cc").
	asm volatile("1: subs %[count], %[count], #1; bne 1b"
		     : [count]"+r"(count) : : "cc");
    }
    
    /*
     * Initialize UART0.
     */
    void init() {
	// Disable UART0.
	MMIO::write(UART0_CR, 0x00000000);
	// Set up GPIO pins 14 and 15.
    
	// Disable pull up/down for all GPIO pins & delay for 150 cycles.
	MMIO::write(GPPUD, 0x00000000);
	delay(150);

	// Disable pull up/down for pin 14,15 & delay for 150 cycles.
	MMIO::write(GPPUDCLK0, (1 << 14) | (1 << 15));
	delay(150);

	// Write 0 to GPPUDCLK0 to make it take effect.
	MMIO::write(GPPUDCLK0, 0x00000000);
    
	// Clear pending interrupts.
	MMIO::write(UART0_ICR, 0x7FF);

	// Set integer & fractional part of baud rate.
	// Divider = UART_CLOCK/(16 * Baud)
	// Fraction part register = (Fractional part * 64) + 0.5
	// UART_CLOCK = 3000000; Baud = 115200.

	// Divider = 3000000/(16 * 115200) = 1.627 = ~1.
	// Fractional part register = (.627 * 64) + 0.5 = 40.6 = ~40.
	MMIO::write(UART0_IBRD, 1);
	MMIO::write(UART0_FBRD, 40);

	// Enable FIFO & 8 bit data transmission (1 stop bit, no parity).
	MMIO::write(UART0_LCRH, (1 << 4) | (1 << 5) | (1 << 6));

	// Mask all interrupts.
	MMIO::write(UART0_IMSC, (1 << 1) | (1 << 4) | (1 << 5) |
		    (1 << 6) | (1 << 7) | (1 << 8) |
		    (1 << 9) | (1 << 10));

	// Enable UART0, receive & transfer part of UART.
	MMIO::write(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9));
    }

    /*
     * Transmit a byte via UART0.
     * uint8_t Byte: byte to send.
     */
    void putc(uint8_t byte) {
	// wait for UART to become ready to transmit
	while(true) {
	    if (!(MMIO::read(UART0_FR) & (1 << 5))) {
		break;
	    }
	}
	MMIO::write(UART0_DR, byte);
    }

    /*
     * print a string to the UART one character at a time
     * const char *str: 0-terminated string
     */
    void puts(const char *str) {
	while(*str) {
	    UART::putc(*str++);
	}
    }
}
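The baud-rate comment in init() works the divisors out by hand. As a sketch, the same calculation can be done in integer arithmetic (baud_divisor, divisor_x64 and BaudDivisor are hypothetical names): the divisor multiplied by 64 equals UART_CLOCK * 4 / baud, whose upper bits are the IBRD value and whose low 6 bits are the FBRD value. Rounding the whole value to the nearest integer corresponds to the "+ 0.5" in the comment, and it also carries correctly into IBRD if the fraction rounds up to 64.

```cpp
#include <cstdint>

// Values for UART0_IBRD and UART0_FBRD.
struct BaudDivisor {
    uint32_t ibrd;  // integer part of UART_CLOCK / (16 * baud)
    uint32_t fbrd;  // fractional part in 1/64ths (0..63)
};

// divisor * 64 = UART_CLOCK / (16 * baud) * 64 = UART_CLOCK * 4 / baud.
// Adding baud / 2 before dividing rounds to the nearest integer; the 64-bit
// intermediate keeps 4 * uart_clock from overflowing.
constexpr uint32_t divisor_x64(uint32_t uart_clock, uint32_t baud) {
    return static_cast<uint32_t>((4ull * uart_clock + baud / 2) / baud);
}

constexpr BaudDivisor baud_divisor(uint32_t uart_clock, uint32_t baud) {
    return BaudDivisor{divisor_x64(uart_clock, baud) >> 6,
                       divisor_x64(uart_clock, baud) & 63};
}

// The values used in UART::init(): 3 MHz clock, 115200 baud.
static_assert(baud_divisor(3000000, 115200).ibrd == 1, "IBRD");
static_assert(baud_divisor(3000000, 115200).fbrd == 40, "FBRD");
```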

Booting the kernel

Do you still have the SD card with the original Raspbian image on it from when you were testing the hardware above? Great. Then you already have an SD card with a boot partition and the required files. If not, download one of the official Raspberry Pi boot images and copy it to the SD card.

Now mount the first partition from the SD card and look at it:

bootcode.bin  fixup.dat     kernel.img            start.elf
cmdline.txt   fixup_cd.dat  kernel_cutdown.img    start_cd.elf
config.txt    issue.txt     kernel_emergency.img

Simplified, the boot process works like this: when the RPi powers up, the ARM CPU is halted and the GPU runs. The GPU loads the first-stage bootloader from its ROM and executes it. That finds the SD card and loads bootcode.bin, which in turn runs start.elf. start.elf processes config.txt and cmdline.txt, loads kernel.img and finally starts the ARM CPU running that kernel image.

So now we replace the original kernel.img with our own, sync, unmount the SD card, stick it into the RPi and turn the power on. Your minicom should then show the following:

Hello World

*** system halting ***

Echo kernel

The hello world kernel shows how to output to the UART. Next, let's do some input. To see that the input works, we simply echo the input back as output: a simple echo kernel.

In include/uart.h add the following inside namespace UART:

    /*
     * Receive a byte via UART0.
     *
     * Returns:
     * uint8_t: byte received.
     */
    uint8_t getc();

In uart.cc add the following inside namespace UART:

    /*
     * Receive a byte via UART0.
     *
     * Returns:
     * uint8_t: byte received.
     */
    uint8_t getc() {
	// wait for UART to have received something
	while(true) {
	    if (!(MMIO::read(UART0_FR) & (1 << 4))) {
		break;
	    }
	}
	return MMIO::read(UART0_DR);
    }

Finally, in main.cc replace the hello string and kernel_main with the following:

const char hello[] = "\r\nHello World, feel the echo\r\n";

// kernel main function, it all begins here
void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags) {
    UNUSED(r0);
    UNUSED(r1);
    UNUSED(atags);

    UART::init();

    UART::puts(hello);

    while(true) {
	UART::putc(UART::getc());
    }
}
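putc() and getc() poll bits 5 and 4 of UART0_FR, the flag register. In the BCM2835 peripherals manual these are TXFF (transmit FIFO full) and RXFE (receive FIFO empty). A small sketch that gives the magic numbers names; the FR_* constants and the two helpers are hypothetical, not part of the tutorial's code:

```cpp
#include <cstdint>

// Flag register (UART0_FR) bits polled by putc() and getc().
enum : uint32_t {
    FR_RXFE = 1u << 4,  // receive FIFO empty: nothing to read yet
    FR_TXFF = 1u << 5,  // transmit FIFO full: no room to write yet
};

// putc() may write UART0_DR once the transmit FIFO is not full.
constexpr bool can_transmit(uint32_t fr) { return (fr & FR_TXFF) == 0; }

// getc() may read UART0_DR once the receive FIFO is not empty.
constexpr bool can_receive(uint32_t fr) { return (fr & FR_RXFE) == 0; }
```

With these, the wait loop in putc() could read while(!can_transmit(MMIO::read(UART0_FR))) { }.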

Boot-from-serial kernel

The RPi boots the kernel directly from the SD card and only from the SD card; there is no other option. While developing, this becomes tiresome, since one has to constantly move the SD card from the RPi to an SD card reader and back. Writing the kernel to the SD card over and over also wears out the card. Plus, the SD card slot is somewhat fragile; several people have reported that they broke it accidentally. So what can we do about that?

Above we have seen how to get into C/C++ code at boot and how to read from and write to the serial port. We can use that to download code over the serial port and then execute that. We will call that kernel, or bootloader if you will, Raspbootin (pronounced Rasputin). Before you start editing files make a copy of the echo-kernel you have so far. We will later boot the echo-kernel over the serial console to test Raspbootin.

To make Raspbootin work we need a bootloader on the SD card but also an app on another system that then uploads the kernel over the serial port. Those two need to communicate and for that we have a boot protocol.

Boot protocol

The boot protocol for Raspbootin is rather simple. Raspbootin first sends 3 breaks (\x03, i.e. Ctrl-C) over the serial line to signal that it is ready to receive a kernel. It then expects the size of the kernel as a uint32_t in little endian byte order. After the size it replies with "OK" if the size is acceptable or "SE" if it is too large for it to handle. After "OK" it expects size many bytes representing the kernel. That's it.
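The size handshake is easy to get wrong with respect to byte order, so here is a minimal sketch of both sides that works regardless of the host's endianness. encode_size, decode_size and size_reply are hypothetical helpers: encode_size produces the 4 bytes the host sends, least significant byte first; decode_size mirrors what Raspbootin does with the 4 bytes returned by getc(); and size_reply applies Raspbootin's limit, assuming the kernel is loaded at 0x8000 and Raspbootin itself is linked at 0x2000000.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Host side: the 4 bytes of 'size' in the order they go over the wire
// (little endian: least significant byte first).
std::array<uint8_t, 4> encode_size(uint32_t size) {
    return {{uint8_t(size), uint8_t(size >> 8), uint8_t(size >> 16),
             uint8_t(size >> 24)}};
}

// RPi side: rebuild the size from the bytes in the order they arrive.
uint32_t decode_size(const uint8_t bytes[4]) {
    return uint32_t(bytes[0]) | (uint32_t(bytes[1]) << 8) |
           (uint32_t(bytes[2]) << 16) | (uint32_t(bytes[3]) << 24);
}

// Raspbootin's answer to a size: the kernel is loaded at 0x8000 and must end
// before the loader's own address (assumed 0x2000000, as in its linker script).
const char *size_reply(uint32_t size) {
    const uint32_t kernel_addr = 0x8000;
    const uint32_t loader_addr = 0x2000000;
    return size <= loader_addr - kernel_addr ? "OK" : "SE";
}
```

Comparing size against loader_addr - kernel_addr, rather than computing kernel_addr + size, avoids an overflow when a garbled size close to 4GB arrives.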

Raspbootin

Raspbootin is very similar to what we already have. Nonetheless, some changes need to be made. The problem is that any RPi kernel expects to be loaded at 0x8000 and started there. That also holds for Raspbootin itself, so loading a new kernel to 0x8000 would overwrite Raspbootin, while loading the new kernel somewhere else won't work either. So we first have to move Raspbootin out of the way before loading the new kernel.

boot.S

// To keep this in the first portion of the binary.
.section ".text.boot"

// Make Start global.
.globl Start

// Entry point for the kernel.
// r15 -> should begin execution at 0x8000.
// r0 -> 0x00000000
// r1 -> 0x00000C42
// r2 -> 0x00000100 - start of ATAGS
// preserve these registers as argument for kernel_main
Start:
	// Setup the stack.
	mov	sp, #0x8000

	// we're loaded at 0x8000, relocate to _start.
.relocate:
	// copy from r3 to r4.
	mov	r3, #0x8000
	ldr	r4, =_start
	ldr	r9, =_data_end
1:
	// Load multiple from r3, and store at r4.
	ldmia	r3!, {r5-r8}
	stmia	r4!, {r5-r8}

	// If we're still below _data_end, loop.
	cmp	r4, r9
	blo	1b

	// Clear out bss.
	ldr	r4, =_bss_start
	ldr	r9, =_bss_end
	mov	r5, #0
	mov	r6, #0
	mov	r7, #0
	mov	r8, #0
1:
	// store multiple at r4.
	stmia	r4!, {r5-r8}

	// If we're still below bss_end, loop.
	cmp	r4, r9
	blo	1b

	// Call kernel_main
	ldr	r3, =kernel_main
	blx	r3

	// halt
halt:
	wfe
	b	halt

New in this version is the relocate chunk. The GPU loads kernel.img at address 0x8000, while Raspbootin is linked to run at _start (defined in the linker script). The relocate code is a simple memcpy that puts Raspbootin in the right place. Then the BSS is cleared and kernel_main is called as before. Because kernel_main is called through its absolute address (ldr r3, =kernel_main), execution continues in the relocated copy.
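In C++ terms, the relocate chunk does roughly what this sketch does (relocate is a hypothetical name; src stands for 0x8000, dst for _start and dst_end for _data_end). Only text, rodata and data are copied; the BSS does not need copying because it is cleared right afterwards. Source and destination do not overlap as long as the image is smaller than the distance between 0x8000 and _start.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ equivalent of the .relocate loop: copy the image from
// where the GPU loaded it (src) to the address it was linked for (dst) until
// dst reaches dst_end. Like "ldmia r3!, {r5-r8}" / "stmia r4!, {r5-r8}", each
// iteration moves four words (16 bytes); the page-aligned _start and
// _data_end keep the copied size a multiple of 16.
void relocate(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end) {
    while (dst < dst_end) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        src += 4;
        dst += 4;
    }
}
```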

link-arm-eabi.ld

In the linker script change the start address:

ENTRY(Start)

SECTIONS
{
    /* Starts at LOADER_ADDR. */
    . = 0x2000000;
    _start = .;
...

main.cc

#include <stdint.h>
#include <uart.h>

extern "C" {
    // kernel_main gets called from boot.S. Declaring it extern "C" avoids
    // having to deal with the C++ name mangling.
    void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags);
}


#define LOADER_ADDR 0x2000000

const char hello[] = "\r\nRaspbootin V1.0\r\n";
const char halting[] = "\r\n*** system halting ***";

typedef void (*entry_fn)(uint32_t r0, uint32_t r1, uint32_t atags);

// kernel main function, it all begins here
void kernel_main(uint32_t r0, uint32_t r1, uint32_t atags) {
    UART::init();
again:
    UART::puts(hello);

    // request kernel by sending 3 breaks
    UART::puts("\x03\x03\x03");

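    // get kernel size (little endian); cast before shifting so the
    // << 24 cannot overflow a signed int
    uint32_t size = UART::getc();
    size |= uint32_t(UART::getc()) << 8;
    size |= uint32_t(UART::getc()) << 16;
    size |= uint32_t(UART::getc()) << 24;

    // the kernel is loaded at 0x8000 and must end before Raspbootin itself
    if (size > LOADER_ADDR - 0x8000) {
	UART::puts("SE");
	goto again;
    } else {
	UART::puts("OK");
    }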
    
    // get kernel
    uint8_t *kernel = (uint8_t*)0x8000;
    while(size-- > 0) {
	*kernel++ = UART::getc();
    }

    // Kernel is loaded at 0x8000, call it via function pointer
    UART::puts("booting...");
    entry_fn fn = (entry_fn)0x8000;
    fn(r0, r1, atags);

    // fn() should never return. But it might, so make sure we catch it.
    // Wait a bit
    for(volatile int i = 0; i < 10000000; ++i) { }

    // Say goodbye and return to boot.S to halt.
    UART::puts(halting);
}

boot-server.cc

Don't put this in the kernel source directory. This is a standalone app for your development machine and not part of the kernel. Compile and use it with

g++ -O2 -W -Wall -g -o boot-server boot-server.cc
./boot-server /dev/ttyUSB0 kernel/kernel.img

The boot-server handles uploading the kernel (second argument) to the RPi over the serial device (first argument). The boot-server is rather complex because it handles a few extra perks. You can unplug the USB serial adaptor (which is how I reboot my RPi) and replug it, and it will reopen the device. Also, the kernel is read from disk afresh every time Raspbootin requests it, so you do not need to restart the boot-server every time you compile a new kernel. The boot-server switches between a simple console mode and uploading kernels when it detects the 3 breaks sent by Raspbootin to initiate a kernel upload. So you can just leave it running all the time and use it as your RPi terminal as well.
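The switching between console and upload hinges on spotting three consecutive \x03 bytes in the RPi's output, even when they arrive split across several read() calls. The following is a simplified sketch of that logic; BreakDetector is a hypothetical class, and the real boot-server works on raw buffers and write() calls instead of std::string. Ordinary bytes are passed through for display, an incomplete run of breaks is passed through once a different byte follows, and a complete run of three signals that the kernel should be sent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Feed every byte received from the RPi to feed(). Bytes meant for the
// terminal are appended to 'out'; feed() returns true when the third
// consecutive '\x03' arrives, i.e. when the kernel should be sent.
class BreakDetector {
public:
    bool feed(char c, std::string &out) {
        if (c == '\x03') {
            if (++breaks_ == 3) {
                breaks_ = 0;  // complete run: swallow it and trigger
                return true;
            }
            return false;     // maybe part of a run, hold it back
        }
        out.append(breaks_, '\x03');  // incomplete run: show it after all
        breaks_ = 0;
        out.push_back(c);
        return false;
    }

private:
    std::size_t breaks_ = 0;  // length of the current run of '\x03' bytes
};
```

Keeping the count in the object rather than per buffer is what lets the detection work across read() boundaries, just like the breaks variable in boot-server's main().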

#define _DEFAULT_SOURCE         /* See feature_test_macros(7) */
#define _BSD_SOURCE             /* same, for older glibc versions */

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <endian.h>
#include <stdint.h>
#include <termios.h>

#define BUF_SIZE 65536

struct termios old_tio, new_tio;

void do_exit(int fd, int res) {
    // close FD
    if (fd != -1) close(fd);
    // restore settings for STDIN_FILENO
    if (isatty(STDIN_FILENO)) {
	tcsetattr(STDIN_FILENO,TCSANOW,&old_tio);
    }
    exit(res);
}

// open serial connection
int open_serial(const char *dev) {
    // The termios structure, to be configured for serial interface.
    struct termios termios;

    // Open the device, read/write, not the controlling tty, and non-blocking I/O
    int fd = open(dev, O_RDWR | O_NOCTTY | O_NONBLOCK);
    if (fd == -1) {
	// failed to open
	return -1;
    }
    // must be a tty
    if (!isatty(fd)) {
        fprintf(stderr, "%s is not a tty\n", dev);
	do_exit(fd, EXIT_FAILURE);
    }

    // Get the attributes.
    if(tcgetattr(fd, &termios) == -1)
    {
        perror("Failed to get attributes of device");
	do_exit(fd, EXIT_FAILURE);
    }

    // So, we poll.
    termios.c_cc[VTIME] = 0;
    termios.c_cc[VMIN] = 0;

    // 8N1 mode, no input/output/line processing masks.
    termios.c_iflag = 0;
    termios.c_oflag = 0;
    termios.c_cflag = CS8 | CREAD | CLOCAL;
    termios.c_lflag = 0;

    // Set the baud rate.
    if((cfsetispeed(&termios, B115200) < 0) ||
       (cfsetospeed(&termios, B115200) < 0))
    {
        perror("Failed to set baud-rate");
	do_exit(fd, EXIT_FAILURE);
    }

    // Write the attributes.
    if (tcsetattr(fd, TCSAFLUSH, &termios) == -1) {
	perror("tcsetattr()");
	do_exit(fd, EXIT_FAILURE);
    }
    return fd;
}

// send kernel to rpi
void send_kernel(int fd, const char *file) {
    int file_fd;
    off_t off;
    uint32_t size;
    ssize_t pos;
    char *p;
    bool done = false;
    
    // Set fd blocking
    if (fcntl(fd, F_SETFL, 0) == -1) {
	perror("fcntl()");
	do_exit(fd, EXIT_FAILURE);
    }

    // Open file
    if ((file_fd = open(file, O_RDONLY)) == -1) {
	perror(file);
	do_exit(fd, EXIT_FAILURE);
    }

    // Get kernel size
    off = lseek(file_fd, 0L, SEEK_END);
    if (off > 0x200000) {
	fprintf(stderr, "kernel too big\n");
	do_exit(fd, EXIT_FAILURE);
    }
    size = htole32(off);
    lseek(file_fd, 0L, SEEK_SET);

    fprintf(stderr, "### sending kernel %s [%zu byte]\n", file, (size_t)off);

    // send kernel size to RPi
    p = (char*)&size;
    pos = 0;
    while(pos < 4) {
	ssize_t len = write(fd, &p[pos], 4 - pos);
	if (len == -1) {
	    perror("write()");
	    do_exit(fd, EXIT_FAILURE);
	}
	pos += len;
    }
    // wait for OK
    char ok_buf[2];
    p = ok_buf;
    pos = 0;
    while(pos < 2) {
	ssize_t len = read(fd, &p[pos], 2 - pos);
	if (len == -1) {
	    perror("read()");
	    do_exit(fd, EXIT_FAILURE);
	}
	pos += len;
    }
    if (ok_buf[0] != 'O' || ok_buf[1] != 'K') {
	fprintf(stderr, "error after sending size\n");
	do_exit(fd, EXIT_FAILURE);
    }

    while(!done) {
	char buf[BUF_SIZE];
	ssize_t pos = 0;
	ssize_t len = read(file_fd, buf, BUF_SIZE);
	switch(len) {
	case -1:
	    perror("read()");
	    do_exit(fd, EXIT_FAILURE);
	case 0:
	    done = true;
	}
	while(len > 0) {
	    ssize_t len2 = write(fd, &buf[pos], len);
	    if (len2 == -1) {
		perror("write()");
		do_exit(fd, EXIT_FAILURE);
	    }
	    len -= len2;
	    pos += len2;
	}
    }
    
    // Set fd non-blocking
    if (fcntl(fd, F_SETFL, O_NONBLOCK) == -1) {
	perror("fcntl()");
	do_exit(fd, EXIT_FAILURE);
    }

    // the kernel file is reopened on every request, so close it again
    close(file_fd);

    fprintf(stderr, "### finished sending\n");

    return;
}

int main(int argc, char *argv[]) {
    int fd, max_fd = STDIN_FILENO;
    fd_set rfds, wfds, efds;
    char buf[BUF_SIZE];
    size_t start = 0;
    size_t end = 0;
    bool done = false, leave = false;
    int breaks = 0;
	
    if (argc != 3) {
	printf("USAGE: %s <dev> <file>\n", argv[0]);
	printf("Example: %s /dev/ttyUSB0 kernel/kernel.img\n", argv[0]);
	exit(EXIT_FAILURE);
    }

    // Set STDIN non-blocking and unbuffered
    if (fcntl(STDIN_FILENO, F_SETFL, O_NONBLOCK) == -1) {
	perror("fcntl()");
	exit(EXIT_FAILURE);
    }
    if (isatty(STDIN_FILENO)) {
	// get the terminal settings for stdin
	if (tcgetattr(STDIN_FILENO, &old_tio) == -1) {
	    perror("tcgetattr");
	    exit(EXIT_FAILURE);
	}
	
	// we want to keep the old setting to restore them a the end
	new_tio=old_tio;

	// disable canonical mode (buffered i/o) and local echo
	new_tio.c_lflag &= (~ICANON & ~ECHO);

	// set the new settings immediately
	if (tcsetattr(STDIN_FILENO, TCSANOW, &new_tio) == -1) {
	    perror("tcsetattr()");
	    do_exit(-1, EXIT_FAILURE);
	}
    }
    
    while(!leave) {
	// Open device
	if ((fd = open_serial(argv[1])) == -1) {
	    if (errno == ENOENT || errno == ENODEV) {
		fprintf(stderr, "\r### Waiting for %s...\r", argv[1]);
		sleep(1);
		continue;
	    }
	    perror(argv[1]);
	    do_exit(fd, EXIT_FAILURE);
	}
	fprintf(stderr, "### Listening on %s     \n", argv[1]);

	// select needs the largest FD + 1
	if (fd > STDIN_FILENO) {
	    max_fd = fd + 1;
	} else {
	    max_fd = STDIN_FILENO + 1;
	}

	done = false;
	start = end = 0;
	while(!done || start != end) {	
	    // Watch stdin and dev for input.
	    FD_ZERO(&rfds);
	    if (!done && end < BUF_SIZE) FD_SET(STDIN_FILENO, &rfds);
	    FD_SET(fd, &rfds);
	    
	    // Watch fd for output if needed.
	    FD_ZERO(&wfds);
	    if (start != end) FD_SET(fd, &wfds);

	    // Watch stdin and dev for error.
	    FD_ZERO(&efds);
	    FD_SET(STDIN_FILENO, &efds);
	    FD_SET(fd, &efds);

	    // Wait for something to happen
	    if (select(max_fd, &rfds, &wfds, &efds, NULL) == -1) {
		perror("select()");
		do_exit(fd, EXIT_FAILURE);
	    } else {
		// check for errors
		if (FD_ISSET(STDIN_FILENO, &efds)) {
		    fprintf(stderr, "error on STDIN\n");
		    do_exit(fd, EXIT_FAILURE);
		}
		if (FD_ISSET(fd, &efds)) {
		    fprintf(stderr, "error on device\n");
		    do_exit(fd, EXIT_FAILURE);
		}
		// RPi is ready to receive more data, send more
		if (FD_ISSET(fd, &wfds)) {
		    ssize_t len = write(fd, &buf[start], end - start);
		    if (len == -1) {
			perror("write()");
			do_exit(fd, EXIT_FAILURE);
		    }
		    start += len;
		    if (start == end) start = end = 0;
		    // shift buffer contents
		    if (end == BUF_SIZE) {
			memmove(buf, &buf[start], end - start);
			end -= start;
			start = 0;
		    }
		}
		// input from the user, copy to RPi
		if (FD_ISSET(STDIN_FILENO, &rfds)) {
		    ssize_t len = read(STDIN_FILENO, &buf[end], BUF_SIZE - end);
		    switch(len) {
		    case -1:
			perror("read()");
			do_exit(fd, EXIT_FAILURE);
		    case 0:
			done = true;
			leave = true;
		    }
		    end += len;
		}
		// output from the RPi, copy to STDOUT
		if (FD_ISSET(fd, &rfds)) {
		    char buf2[BUF_SIZE];
		    ssize_t len = read(fd, buf2, BUF_SIZE);
		    switch(len) {
		    case -1:
			perror("read()");
			do_exit(fd, EXIT_FAILURE);
		    case 0:
			done = true;
		    }
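		    // scan output for triple break (^C^C^C)
		    // send kernel on triple break, otherwise output text
		    const char *p = buf2;
		    while(p < &buf2[len]) {
			// buf2 is not 0-terminated, so search it with memchr()
			// instead of index()
			const char *q = (const char *)memchr(p, '\x03', &buf2[len] - p);
			if (q == NULL) q = &buf2[len];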
			if (p == q) {
			    ++breaks;
			    ++p;
			    if (breaks == 3) {
				if (start != end) {
				    fprintf(stderr, "Discarding input after triple break\n");
				    start = end = 0;
				}
				send_kernel(fd, argv[2]);
				breaks = 0;
			    }
			} else {
			    while (breaks > 0) {
				ssize_t len2 = write(STDOUT_FILENO, "\x03\x03\x03", breaks);
				if (len2 == -1) {
				    perror("write()");
				    do_exit(fd, EXIT_FAILURE);
				}
				breaks -= len2;
			    }
			    while(p < q) {
				ssize_t len2 = write(STDOUT_FILENO, p, q - p);
				if (len2 == -1) {
				    perror("write()");
				    do_exit(fd, EXIT_FAILURE);
				}
				p += len2;
			    }
			}
		    }
		}
	    }
	}
	close(fd);
    }
		
    do_exit(-1, EXIT_SUCCESS);
}

Enjoy.

Parsing ATAGs

Framebuffer support

Interrupts

USB

External references

  1. arm_arm.pdf - general ARM Architecture Reference Manual v6
  2. DDI0301H_arm1176jzfs_r0p7_trm.pdf - More specific ARM for the RPi
  3. [1] - basic toolchain + UART stuff
  4. RPi_Hardware - list of datasheets (and one manual about peripherals on the broadcom chip)
  5. [2] - for mailboxes and video stuff
  6. BCM2835-ARM-Peripherals.pdf - Datasheet for RPi peripherals