This page is a work in progress.
This page may thus be incomplete. Its content may be changed in the near future.

Intro

This is a tutorial on bare-metal OS development on the Raspberry Pi. It is written specifically for the Raspberry Pi Model B Rev 2 because the author has no other hardware to test on. So far, though, the models are basically identical for the purpose of this tutorial (Rev 1 has 256 MB of RAM, Model A has no Ethernet).

This is the author's very first ARM system, and we learn as we write, without any prior knowledge of ARM. Experience with Linux/Unix (very important) and the C/C++ languages (incredibly important, including how to use inline asm) is assumed and required. This is not a tutorial on how to build a kernel but a simple introduction to getting started on the RPi.

Materials

You will need:

  • A Raspberry Pi, RPi for short.
  • An SD card to boot from.
  • An SD card reader so you can write to the SD card from your development system.
  • A serial adaptor for the RPi.
  • Power from an external power supply, USB or the serial adaptor.

Serial adaptor

The RPi has two serial ports (UARTs). This tutorial only concerns itself with UART0, simply called the UART or serial port from here on; UART1 is ignored. The onboard UART uses 3.3V TTL levels and is connected to some of the GPIO pins on the header labeled "P1" on the board. PCs and Macs do not provide a 3.3V TTL serial port (a classic PC serial port uses RS-232 voltage levels), so you need an adaptor to convert the signal levels. I recommend a USB to TTL Serial Cable - Debug / Console Cable for Raspberry Pi with separate connectors per lead, like the commercial RPi serial adaptors. It is then connected to the UART pins on the P1 header of the RPi.

Note: The serial adaptor I use provides both a 0V and 5V lead (black and red) which provide power to the RPi. No extra power supply is needed besides this.

Preparations

Testing your hardware/serial port

First things first: you will want to make sure all your hardware works. Connect your serial adaptor to the RPi and boot the official Raspbian image. The boot process outputs to both the serial port and HDMI and starts a getty on the serial port. Set up your serial port, however yours works, and open minicom. Make sure flow control is turned off and that you can run at 115200 baud, 8N1, which is what the RPi uses.

If you get 'Permission Denied' do NOT become root! This is unnecessary. Instead do:

sudo adduser <user> dialout

This will let your user use serial ports without needing root.

Alternatively, run ls -l /dev/ttyS* to find out which group owns the device, then add yourself to that group in /etc/group (on some systems the group is uucp).

If you started minicom only after the RPi has booted then simply press return in minicom so the getty will output a fresh login prompt. Otherwise wait for the boot messages to appear. If you don't get any output then connect the RPi to a monitor to check that it actually boots, check your connections and minicom settings.
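If you prefer to talk to the serial port from your own host-side tool instead of minicom, the settings above (115200 baud, 8N1, no flow control) can be expressed with the POSIX termios interface. The following is only a sketch: the function name configure_8n1_115200 is made up for this example, and it only fills in the settings structure without opening any device.

```cpp
#include <termios.h>

// Fill a termios structure with the settings the RPi console uses:
// 115200 baud, 8 data bits, no parity, 1 stop bit, no flow control.
// Returns false if the baud rate could not be set.
// (configure_8n1_115200 is a hypothetical helper name.)
bool configure_8n1_115200(struct termios &tio) {
    // No software flow control (XON/XOFF).
    tio.c_iflag &= ~(IXON | IXOFF | IXANY);

    // Clear character size, parity, two stop bits and hardware flow control.
    // CRTSCTS is not in POSIX but is available on Linux, the BSDs and macOS.
    tio.c_cflag &= ~(CSIZE | PARENB | CSTOPB | CRTSCTS);

    // 8 data bits, ignore modem control lines, enable the receiver.
    tio.c_cflag |= CS8 | CLOCAL | CREAD;

    // Raw input: no line editing, no local echo, no signal characters.
    tio.c_lflag &= ~(ICANON | ECHO | ISIG);

    return cfsetispeed(&tio, B115200) == 0 &&
           cfsetospeed(&tio, B115200) == 0;
}
```

In a real tool you would read the current settings of the opened device with tcgetattr(), pass them through this function and apply them with tcsetattr(). The device name (for example /dev/ttyUSB0 for a USB adaptor) depends on your system.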

Building a cross compiler

Like me, you are probably using an x86 PC as your main machine and want to edit and compile the source there. Since the RPi has an ARM CPU, you absolutely need a cross compiler. Even if you are developing on an ARM system, it is still a good idea to build a cross compiler to avoid accidentally mixing files from your development system into your own kernel. Follow the steps from GCC_Cross-Compiler to build your own cross compiler, but use:

export TARGET=arm-none-eabi

Now we are ready to start.

Bare minimum kernel

Let's start with a minimum of 4 files. The kernel is going to use a subset of C++, meaning C++ without exceptions and without runtime type information. The main function will be in main.cc. Before the main function can be called, though, some things have to be set up in assembly. This will be placed in boot.S. On top of that, we also need a linker script and a Makefile to build the kernel, and we need to create an include directory for later use.

main.cc

/* main.cc - the entry point for the kernel */

extern "C" {
    void kernel_main(void);
}

void kernel_main(void) {

}

boot.S

/* boot.S - assembly startup code */

// To keep this in the first portion of the binary.
.text

// Make Start global.
.globl Start

// Entry point for the kernel.
// r15 -> should begin execution at 0x8000.
Start:
	// Setup the stack.
	mov	sp, #0x8000

	// Clear out bss.
	ldr	r1, =_bss_start
	ldr	r6, =_bss_end
	mov	r2, #0
	mov	r3, #0
	mov	r4, #0
	mov	r5, #0
1:
	// store multiple at r1.
	stmia	r1!, {r2-r5}

	// If we're still below bss_end, loop.
	cmp	r1, r6
	blo	1b

	// Call kernel_main
	ldr	r0, =kernel_main
	blx	r0

	// halt
halt:
	wfe
	b	halt
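To make the bss-clearing loop in boot.S easier to follow, here is roughly the same logic written in C++. This is only an illustration that can be tried on a host machine; the function name clear_bss is made up, and the real kernel keeps doing this in assembly before any C++ code runs.

```cpp
#include <cstdint>

// C++ rendering of the loop in boot.S: store four zero words (16 bytes)
// per iteration and advance the pointer until it reaches `end`.
// Like the assembly, it always runs at least once and assumes that the
// size of the region is a multiple of 16 bytes.
// (clear_bss is a hypothetical name used only for this sketch.)
void clear_bss(std::uint32_t *start, std::uint32_t *end) {
    std::uint32_t *p = start;
    do {
        // stmia r1!, {r2-r5}: four stores, then post-increment.
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    } while (p < end);  // cmp r1, r6 ; blo 1b
}
```

Because the loop writes 16 bytes at a time, the end of the region has to sit on a 16-byte boundary or the loop overshoots. The linker script takes care of that by aligning the end of .bss to 4096 bytes.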

link-arm-eabi.ld

/* link-arm-eabi.ld - linker script for arm eabi */
ENTRY(Start)

SECTIONS
{
    /* Starts at LOADER_ADDR. */
    . = 0x8000;
    _start = .;
    _text_start = .;
    .text :
    {
        *(.text)
    }
    . = ALIGN(4096); /* align to page size */
    _text_end = .;

    _rodata_start = .;
    .rodata :
    {
        *(.rodata)
    }
    . = ALIGN(4096); /* align to page size */
    _rodata_end = .;

    _data_start = .;
    .data :
    {
        *(.data)
    }
    . = ALIGN(4096); /* align to page size */
    _data_end = .;

    _bss_start = .;
    .bss :
    {
        bss = .;
        *(.bss)
    }
    . = ALIGN(4096); /* align to page size */
    _bss_end = .;

    _end = .;
}

Makefile

# Makefile - build script

# build environment
PREFIX ?= /usr/local/cross
ARMGNU ?= $(PREFIX)/bin/arm-none-eabi

# source files
SOURCES_ASM := $(wildcard *.S)
SOURCES_CC  := $(wildcard *.cc)

# object files
OBJS        := $(patsubst %.S,%.o,$(SOURCES_ASM))
OBJS        += $(patsubst %.cc,%.o,$(SOURCES_CC))

# Build flags
DEPENDFLAGS := -MD -MP
INCLUDES    := -I include
BASEFLAGS   := -O2 -fpic -pedantic -pedantic-errors -nostdlib
BASEFLAGS   += -nostartfiles -ffreestanding -nodefaultlibs
BASEFLAGS   += -fno-builtin -fomit-frame-pointer -mcpu=arm1176jzf-s
WARNFLAGS   := -Wall -Wextra -Wshadow -Wcast-align -Wwrite-strings
WARNFLAGS   += -Wredundant-decls -Winline
WARNFLAGS   += -Wno-attributes -Wno-deprecated-declarations
WARNFLAGS   += -Wno-div-by-zero -Wno-endif-labels -Wfloat-equal
WARNFLAGS   += -Wformat=2 -Wno-format-extra-args -Winit-self
WARNFLAGS   += -Winvalid-pch -Wmissing-format-attribute
WARNFLAGS   += -Wmissing-include-dirs -Wno-multichar
WARNFLAGS   += -Wredundant-decls -Wshadow
WARNFLAGS   += -Wno-sign-compare -Wswitch -Wsystem-headers -Wundef
WARNFLAGS   += -Wno-pragmas -Wno-unused-but-set-parameter
WARNFLAGS   += -Wno-unused-but-set-variable -Wno-unused-result
WARNFLAGS   += -Wwrite-strings -Wdisabled-optimization -Wpointer-arith
WARNFLAGS   += -Werror
ASFLAGS     := $(INCLUDES) $(DEPENDFLAGS) -D__ASSEMBLY__ -mcpu=arm1176jzf-s
CXXFLAGS    := $(INCLUDES) $(DEPENDFLAGS) $(BASEFLAGS) $(WARNFLAGS)
CXXFLAGS    += -fno-exceptions -std=c++0x

# build rules
all: kernel.img

include $(wildcard *.d)

kernel.elf: $(OBJS) link-arm-eabi.ld
	$(ARMGNU)-ld $(OBJS) -Tlink-arm-eabi.ld -o $@

kernel.img: kernel.elf
	$(ARMGNU)-objcopy kernel.elf -O binary kernel.img

clean:
	$(RM) -f $(OBJS) kernel.elf kernel.img

dist-clean: clean
	$(RM) -f *.d

# C++.
%.o: %.cc Makefile
	$(ARMGNU)-g++ $(CXXFLAGS) -c $< -o $@

# AS.
%.o: %.S Makefile
	$(ARMGNU)-g++ $(ASFLAGS) -c $< -o $@

And there you go. Try building it. A minimum kernel that does absolutely nothing.

Hello World kernel

Let's make the kernel do something. Let's say hello to the world using the serial port.

main.cc

/* main.cc - the entry point for the kernel */

#include <uart.h>

extern "C" {
    // kernel_main gets called from boot.S. Declaring it extern "C" avoids
    // having to deal with the C++ name mangling.
    void kernel_main(void);
}

const char hello[] = "\r\nHello World\r\n";
const char halting[] = "\r\n*** system halting ***";

// kernel main function, it all begins here
void kernel_main(void) {
    UART::init();
    
    UART::puts(hello);

    // Wait a bit
    for(volatile int i = 0; i < 10000000; ++i) { }

    UART::puts(halting);
}

include/mmio.h

/* mmio.h - access to MMIO registers */

#ifndef MMIO_H
#define MMIO_H

#include <stdint.h>

namespace MMIO {
    // write to MMIO register
    static inline void write(uint32_t reg, uint32_t data) {
	uint32_t *ptr = (uint32_t*)reg;
	asm volatile("str %[data], [%[reg]]"
		     : : [reg]"r"(ptr), [data]"r"(data));
    }

    // read from MMIO register
    static inline uint32_t read(uint32_t reg) {
	uint32_t *ptr = (uint32_t*)reg;
	uint32_t data;
	asm volatile("ldr %[data], [%[reg]]"
		     : [data]"=r"(data) : [reg]"r"(ptr));
	return data;
    }
}

#endif // #ifndef MMIO_H
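The inline assembly in mmio.h makes sure that each access is a single 32-bit load or store. A commonly used alternative is to go through a volatile pointer and let the compiler pick the instruction. The sketch below shows that variant under the assumption that the compiler emits one 32-bit access for a volatile uint32_t, which GCC does on ARM. The namespace name mmio_sketch is made up, and uintptr_t is used for the address so the code can also be tried on a 64-bit host.

```cpp
#include <cstdint>

namespace mmio_sketch {  // hypothetical namespace, not part of the tutorial
    // Write a 32-bit value to a register address through a volatile
    // pointer, so the compiler may not drop or merge the store.
    inline void write(std::uintptr_t reg, std::uint32_t data) {
        *reinterpret_cast<volatile std::uint32_t *>(reg) = data;
    }

    // Read a 32-bit value from a register address through a volatile
    // pointer, so the compiler may not cache or drop the load.
    inline std::uint32_t read(std::uintptr_t reg) {
        return *reinterpret_cast<volatile std::uint32_t *>(reg);
    }
}
```

On a host machine this can be exercised by passing the address of an ordinary variable instead of a hardware register.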


include/uart.h

/* uart.h - UART initialization & communication */

#ifndef UART_H
#define UART_H

#include <stdint.h>

namespace UART {
    /*
     * Initialize UART0.
     */
    void init(void);

    /*
     * Transmit a byte via UART0.
     * uint8_t Byte: byte to send.
     */
    void putc(uint8_t byte);

    /*
     * print a string to the UART one character at a time
     * const char *str: 0-terminated string
     */
    void puts(const char *str);
}

#endif // #ifndef UART_H

uart.cc

/* uart.cc - UART initialization & communication */
/* Reference material:
 * http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
 * Chapter 13: UART
 */

#include <stdint.h>
#include <mmio.h>
#include <uart.h>

namespace UART {
    enum {
	// The GPIO registers base address.
	GPIO_BASE = 0x20200000,

	// The offsets for each register.

	// Controls actuation of pull up/down to ALL GPIO pins.
	GPPUD = (GPIO_BASE + 0x94),

	// Controls actuation of pull up/down for specific GPIO pin.
	GPPUDCLK0 = (GPIO_BASE + 0x98),

	// The base address for UART.
	UART0_BASE = 0x20201000,

	// The offsets for each register for the UART.
	UART0_DR     = (UART0_BASE + 0x00),
	UART0_RSRECR = (UART0_BASE + 0x04),
	UART0_FR     = (UART0_BASE + 0x18),
	UART0_ILPR   = (UART0_BASE + 0x20),
	UART0_IBRD   = (UART0_BASE + 0x24),
	UART0_FBRD   = (UART0_BASE + 0x28),
	UART0_LCRH   = (UART0_BASE + 0x2C),
	UART0_CR     = (UART0_BASE + 0x30),
	UART0_IFLS   = (UART0_BASE + 0x34),
	UART0_IMSC   = (UART0_BASE + 0x38),
	UART0_RIS    = (UART0_BASE + 0x3C),
	UART0_MIS    = (UART0_BASE + 0x40),
	UART0_ICR    = (UART0_BASE + 0x44),
	UART0_DMACR  = (UART0_BASE + 0x48),
	UART0_ITCR   = (UART0_BASE + 0x80),
	UART0_ITIP   = (UART0_BASE + 0x84),
	UART0_ITOP   = (UART0_BASE + 0x88),
	UART0_TDR    = (UART0_BASE + 0x8C),
    };

    /*
     * delay function
     * int32_t count: number of cycles to delay
     *
     * This just loops <count> times in a way that the compiler
     * won't optimize away. The counter is declared as a read-write
     * operand because the loop modifies it.
     */
    void delay(int32_t count) {
	asm volatile("1: subs %[count], %[count], #1; bne 1b"
		     : [count]"+r"(count) : : "cc");
    }
    
    /*
     * Initialize UART0.
     */
    void init(void) {
	// Disable UART0.
	MMIO::write(UART0_CR, 0x00000000);
	// Setup the GPIO pin 14 && 15.
    
	// Disable pull up/down for all GPIO pins & delay for 150 cycles.
	MMIO::write(GPPUD, 0x00000000);
	delay(150);

	// Disable pull up/down for pin 14,15 & delay for 150 cycles.
	MMIO::write(GPPUDCLK0, (1 << 14) | (1 << 15));
	delay(150);

	// Write 0 to GPPUDCLK0 to make it take effect.
	MMIO::write(GPPUDCLK0, 0x00000000);
    
	// Clear pending interrupts.
	MMIO::write(UART0_ICR, 0x7FF);

	// Set integer & fractional part of baud rate.
	// Divider = UART_CLOCK/(16 * Baud)
	// Fraction part register = (Fractional part * 64) + 0.5
	// UART_CLOCK = 3000000; Baud = 115200.

	// Divider = 3000000/(16 * 115200) = 1.627 = ~1.
	// Fractional part register = (.627 * 64) + 0.5 = 40.6 = ~40.
	MMIO::write(UART0_IBRD, 1);
	MMIO::write(UART0_FBRD, 40);

	// Enable FIFO & 8 bit data transmission (1 stop bit, no parity).
	MMIO::write(UART0_LCRH, (1 << 4) | (1 << 5) | (1 << 6));

	// Mask all interrupts.
	MMIO::write(UART0_IMSC, (1 << 1) | (1 << 4) | (1 << 5) |
		    (1 << 6) | (1 << 7) | (1 << 8) |
		    (1 << 9) | (1 << 10));

	// Enable UART0, receive & transmit part of UART.
	MMIO::write(UART0_CR, (1 << 0) | (1 << 8) | (1 << 9));
    }

    /*
     * Transmit a byte via UART0.
     * uint8_t Byte: byte to send.
     */
    void putc(uint8_t byte) {
	// wait for UART to become ready to transmit
	while(true) {
	    if (!(MMIO::read(UART0_FR) & (1 << 5))) {
		break;
	    }
	}
	MMIO::write(UART0_DR, byte);
    }

    /*
     * print a string to the UART one character at a time
     * const char *str: 0-terminated string
     */
    void puts(const char *str) {
	while(*str) {
	    UART::putc(*str++);
	}
    }
}
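The baud rate arithmetic in the comments of init() can be checked with a small helper that works in integers only. This is a sketch: the names BaudDivisor, divider_x64 and baud_divisor are made up for this example, and the 3 MHz UART clock is the value the tutorial code assumes.

```cpp
#include <cstdint>

struct BaudDivisor {      // hypothetical helper type
    std::uint32_t ibrd;   // integer part, for UART0_IBRD
    std::uint32_t fbrd;   // fractional part in 1/64 steps, for UART0_FBRD
};

// Divider = clock / (16 * baud). Scaled by 64 this becomes
// clock * 4 / baud; adding baud / 2 first rounds to the nearest step.
constexpr std::uint32_t divider_x64(std::uint32_t clock_hz, std::uint32_t baud) {
    return static_cast<std::uint32_t>(
        (static_cast<std::uint64_t>(clock_hz) * 4u + baud / 2u) / baud);
}

// Split the scaled divider into the two register values.
constexpr BaudDivisor baud_divisor(std::uint32_t clock_hz, std::uint32_t baud) {
    return BaudDivisor{divider_x64(clock_hz, baud) >> 6,
                       divider_x64(clock_hz, baud) & 0x3Fu};
}

// The values written in init(): 3 MHz clock, 115200 baud.
static_assert(baud_divisor(3000000, 115200).ibrd == 1, "integer part");
static_assert(baud_divisor(3000000, 115200).fbrd == 40, "fractional part");
```

The static_asserts confirm the hand calculation: 3000000 / (16 * 115200) is about 1.627, which gives an integer part of 1 and a fractional register value of 40.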

Booting the kernel

Do you still have the SD card with the original Raspbian image on it from when you were testing the hardware above? Great. Then you already have an SD card with a boot partition and the required files. If not, download one of the original Raspberry Pi boot images and copy it to the SD card.

Now mount the first partition from the SD card and look at it:

bootcode.bin  fixup.dat     kernel.img            start.elf
cmdline.txt   fixup_cd.dat  kernel_cutdown.img    start_cd.elf
config.txt    issue.txt     kernel_emergency.img

Simplified: when the RPi powers up, the ARM CPU is halted and the GPU runs. The GPU loads the bootloader from ROM and executes it. That bootloader finds the SD card and loads bootcode.bin. The bootcode handles config.txt and cmdline.txt (or possibly start.elf reads them) and then runs start.elf. start.elf loads kernel.img, and finally the ARM CPU is started, running that kernel image.

So now we replace the original kernel.img with our own, sync, unmount, put the SD card into the RPi and turn the power on. Your minicom should then show the following:

Hello World

*** system halting ***

Echo kernel

The hello world kernel shows how to write output to the UART. Next, let's handle some input. To see that the input works, we simply echo it back as output: a simple echo kernel.

In include/uart.h add the following:

    /*
     * Receive a byte via UART0.
     *
     * Returns:
     * uint8_t: byte received.
     */
    uint8_t getc(void);

In uart.cc add the following:

    /*
     * Receive a byte via UART0.
     *
     * Returns:
     * uint8_t: byte received.
     */
    uint8_t getc(void) {
	// wait for UART to have received something
	while(true) {
	    if (!(MMIO::read(UART0_FR) & (1 << 4))) {
		break;
	    }
	}
	return MMIO::read(UART0_DR);
    }
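Both putc and getc poll the UART flag register (UART0_FR) and test a single bit: bit 5 is set while the transmit FIFO is full, and bit 4 is set while the receive FIFO is empty. Giving those bits names makes the loops easier to read. The following is a sketch only; the namespace and helper names are made up for this example.

```cpp
#include <cstdint>

namespace uart_flags {  // hypothetical names for the bits tested in putc/getc
    constexpr std::uint32_t FR_RXFE = 1u << 4;  // receive FIFO empty
    constexpr std::uint32_t FR_TXFF = 1u << 5;  // transmit FIFO full

    // True when there is room in the transmit FIFO, so putc may write.
    constexpr bool can_transmit(std::uint32_t fr) {
        return (fr & FR_TXFF) == 0;
    }

    // True when at least one byte is waiting in the receive FIFO.
    constexpr bool has_received(std::uint32_t fr) {
        return (fr & FR_RXFE) == 0;
    }
}
```

With helpers like these, the wait loop in getc could read as "loop until has_received(MMIO::read(UART0_FR)) is true", which behaves the same as the bit test above.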

And last in main.cc put the following:

const char hello[] = "\r\nHello World, feel the echo\r\n";

// kernel main function, it all begins here
void kernel_main(void) {
    UART::init();

    UART::puts(hello);

    while(true) {
	UART::putc(UART::getc());
    }
}

Boot-from-serial kernel

The RPi boots the kernel directly from the SD card and only from the SD card. There is no other option. During development this becomes tiresome, since one has to constantly swap the SD card between the RPi and an SD card reader. Writing the kernel to the SD card over and over also wears out the card.

But above we have seen how to get into C/C++ code at boot and how to read from and write to the serial port. We can use that to download code over the serial port and then execute that. We will call that Raspbootin (pronounced Rasputin). Before you start editing files make a copy of the echo-kernel you have so far. We will later boot that from the serial console to test the bootloader.
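To give an idea of what downloading code over the serial port involves, here is a minimal sketch of the receiving side. The framing is an assumption made for this example (a 4-byte little-endian length followed by the payload) and is not necessarily the protocol Raspbootin uses. The function name receive_image and its parameters are made up as well. On the RPi, get_byte would be UART::getc; on a host it can be any callable that returns bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Read a 4-byte little-endian length, then that many payload bytes,
// from any byte source. Returns the payload size, or 0 if the announced
// size is zero or does not fit into the destination buffer.
// (Hypothetical framing, chosen only to illustrate the idea.)
template <typename GetByte>
std::size_t receive_image(GetByte get_byte, std::uint8_t *dest,
                          std::size_t capacity) {
    // Assemble the length, least significant byte first.
    std::uint32_t size = 0;
    for (int i = 0; i < 4; ++i) {
        size |= static_cast<std::uint32_t>(get_byte()) << (8 * i);
    }

    // Refuse empty images and images that would overflow the buffer.
    if (size == 0 || size > capacity) {
        return 0;
    }

    // Copy the payload byte by byte.
    for (std::uint32_t i = 0; i < size; ++i) {
        dest[i] = get_byte();
    }
    return size;
}
```

A real bootloader would additionally have to place the received image where it cannot overwrite the running loader, and then jump to it.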

External references

  1. arm_arm.pdf - general ARM Architecture Reference Manual v6
  2. DDI0301H_arm1176jzfs_r0p7_trm.pdf - Technical Reference Manual for the ARM1176JZF-S, the CPU core used in the RPi
  3. [1] - basic toolchain + UART stuff
  4. RPi_Hardware - list of datasheets (and one manual about peripherals on the broadcom chip)
  5. [2] - for mailboxes and video stuff
  6. BCM2835-ARM-Peripherals.pdf - Datasheet for RPi peripherals