OS Specific Toolchain

From OSDev.wiki
Revision as of 03:00, 20 September 2014 by osdev>Sortie (This section is no longer needed as it was moved to 'Hosted GCC Cross-Compiler')
Difficulty level: Advanced

This tutorial walks you through creating a toolchain comprising Binutils, GCC (with cc1 and libgcc), newlib and, optionally, cc1plus and libstdc++ (with libsupc++) that specifically targets your OS. We assume you are using an i586-compatible PC for a hypothetical OS named 'MyOS', so the GNU canonical name we target is 'i586-pc-myos'. It should not be difficult to adapt this tutorial to other systems, e.g. x86-64. The resulting setup offers much the same functionality as those produced by the GCC Cross-Compiler and Porting Newlib tutorials (aside from the name). It is merely intended to demonstrate the principles behind adding a new target to these packages and to provide you with a framework for incorporating your own OS-specific modifications. You will, however, get a newlib that does not require libgloss (linking will work 'out of the box').

Prerequisites

  • The latest binutils, gcc-core and newlib packages, and optionally gcc-g++. This tutorial was developed with binutils-2.18, gcc-4.2.1 and newlib-1.15.0, and has recently been tested on binutils-2.22 and gcc-4.7.0, and again on binutils-2.23 and gcc-4.7.2 using newlib-2.0.0.
  • A build environment that can successfully build a GCC Cross-Compiler. This tutorial was developed with Cygwin and Ubuntu 12.04.
  • autoconf, automake, autoreconf, libtool.
  • The OS you are targeting should be able to handle syscalls, with a well defined interface.
  • Knowledge of the internals of binutils, gcc and newlib, along with a working knowledge of autoconf and automake.
  • Depending on your version of gcc, you may also need gmp, mpfr and mpc.

Modify the source directories to add your target

Now you can start adding your own target to each package. Some of the changes are quite similar between each package.

Binutils

config.sub

This is a file you will modify in the same way for each package. It is a GNU standard file produced by including the line 'AC_CANONICAL_SYSTEM' in a configure.in that is processed by autoconf. It converts a canonical name of the form 'i586-pc-myos' into separate variables for the processor, vendor and OS, and rejects systems it doesn't know about. We simply need to add 'myos' to the list of acceptable operating systems. Find the section that begins with the comment "First accept the basic system types" (the list begins '-gnu*') and add '-myos*' to it. I typically add it after -aos* simply because there is some free room on the line there.

bfd/config.bfd

This file is part of the configuration for libbfd, the back-end to binutils which provides a consistent interface for many object file formats. Generally, each platform-specific version of binutils contains a libbfd which only supports the object files normally in use on that system, as otherwise the library would be massive (libbfd can support a _lot_ of object types). We need to associate our OS with some particular object types. There is a long list starting 'WHEN ADDING ENTRIES TO THIS MATRIX' whose first line is 'case "${targ}" in'. We need to add our full canonical name to this list, by adding a case such as:

 i[3-7]86-*-myos*)
   targ_defvec=bfd_elf32_i386_vec
   targ_selvecs=i386coff_vec
   ;;

Be sure to follow the instructions in the comment block above the list and add your entry beneath the comment "#START OF targmatch.h". If you like, you could support different object formats (look at other entries in the list, and the contents of 'bfd' for hints) and also provide more than one to the 'targ_selvecs' line. The example here lets you deal with coff object files, but defaults to elf for object files and executables (similar to the i686-elf GCC Cross-Compiler).

gas/configure.tgt

This file tells the GNU assembler what type of output to generate for each target. It automatically matches the i586 part of your target and generates the correct output for that. We just need to tell it what type of object file to generate for myos. In the section starting 'Assign object format ... case ${generic_target} in' you need to add a line like:

 i386-*-myos*)    fmt=elf ;;

You should use 'i386' in this line even if you are targeting x86_64. This is the only file where you should do it. It is basically because the variable 'generic_target' is not your canonical target name, but rather a variable generated further up in the configure.tgt file, and it sets the first part to i386 for any i[3-7]86 or x86_64.

ld/configure.tgt

This file tells the GNU linker what 'emulation' to use for each target. An emulation is basically a combination of linker script and executable file format. We are going to define our own emulation called myos_i386. We need to add an entry to the case statement here beginning 'Please try to keep this table in alphabetical order ... case "${targ}" in':

 i[3-7]86-*-myos*)    targ_emul=myos_i386 ;;

You can also add a targ_extra_emuls line to specify other targets ld should support, e.g. elf_i386, although this is untested. See the ld/configure.tgt file for examples.

ld/emulparams/myos_i386.sh

Now we need to actually define our emulation. There is a generic file called ld/genscripts.sh which creates the required linker scripts for our target (you need more than one, depending on shared object usage and the like: I have 13 for a single target). It uses a linker script template (from the ld/scripttempl directory) to do this, and it creates the actual emulation C file from an emulation template (from the ld/emultempl directory). These templates are customised by running a script in the ld/emulparams directory which sets various variables. You are welcome to define your own emulation and linker templates, but I find the ELF ones adequate, given that they can be customised by simply adding a file to the emulparams directory. This is what we are going to do now. The content of the file could be something like:

 SCRIPT_NAME=elf

(defines the scripttempl file to use, in this case the ELF one)

 OUTPUT_FORMAT=elf32-i386

(create our executables in elf32 format)

 TEXT_START_ADDR=0x40000000

(start of the .text section and therefore the entire executable image. I set to this because my kernel occupies 0x0-0x3fffffff)

 MAXPAGESIZE="CONSTANT (MAXPAGESIZE)"
 COMMONPAGESIZE="CONSTANT (COMMONPAGESIZE)"

(Tell ld the page sizes so it can properly align sections to page boundaries. Just use the defaults here)

 ARCH=i386
 MACHINE=

(self explanatory)

 NOP=0x90909090

(what ld pads sections with, basically the i386 NOP instruction 4 times to fill a 32-bit value)

 TEMPLATE_NAME=elf32

(defines the emultempl file to use, again, stick with ELF)

 GENERATE_SHLIB_SCRIPT=yes
 GENERATE_PIE_SCRIPT=yes

(whether to generate linker scripts to produce shared libraries and position-independent executables respectively)

 NO_SMALL_DATA=yes
 SEPARATE_GOTPLT=12

(unsure as to these, just copied from the ELF one, seems to work okay)

ld/Makefile.in

We now just need to tell make how to produce the emulation C file for our specific emulation. Putting the 'targ_emul=myos_i386' line into ld/configure.tgt above implies that your host linker will try to link your target ld executable with an object file called emyos_i386.o. There is a default rule to generate this from emyos_i386.c, so we just need to tell it how to make this emyos_i386.c file. As stated above, we let the genscripts.sh file do the hard work. You need to add a makefile rule (I add it after the one to build eelf_i386.c) similar to:

 emyos_i386.c: $(srcdir)/emulparams/myos_i386.sh $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
          ${GENSCRIPTS} myos_i386 "$(tdir_myos_i386)"

Note that some parts of the rule use parentheses () whereas other parts use curly braces {}. The second line must start with a single tab, not spaces, because this is a Makefile. You can also add 'emyos_i386.o' to the dependencies of 'ALL_EMULATIONS' if you like, but it is not essential.

GCC

config.sub

Make the same modification as for config.sub in binutils.

gcc/config.gcc

This file defines what needs to be built for each particular target and what to include in the final executable. There are two main sections: one which defines generic options for your operating system, and one which defines options specific to your operating system on each individual machine type. For the first part, find the section starting 'Common parts for widely ported systems ... case ${target} in' and add something like:

 *-*-myos*)
   extra_parts="crtbegin.o crtend.o"
   gas=yes
   gnu_ld=yes
   default_use_cxa_atexit=yes
   ;;

This basically says: build the static versions of crtbegin and crtend (important for various parts of libgcc); our operating system uses the GNU assembler and linker by default; and we will provide __cxa_atexit (newlib actually provides it). The second section we need to add to is the architecture-specific one. Find the section starting 'case ${target} in ... Support site-specific machine types' and add:

 i[3-7]86-*-myos*)
   tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h i386/i386elf.h myos.h"
   tmake_file="i386/t-i386elf t-svr4"
   use_fixproto=yes
   ;;

This defines some extra include files and makefile fragments to use. In GCC 4.7.0, the text 'Support site-specific machine types' does not exist in this file, but you still need to edit the appropriate table: it is the one immediately following the table to which you added the first part ('Common parts for widely ported systems'). We will create myos.h in the next step.

gcc/config/myos.h

Now we create a header file which sets some OS-specific stuff in GCC. This currently just sets some variables which state that the C preprocessor should #define some macros, make some asserts and set the default system name when we are compiling an application for our OS. Something like the following should be fine.

/* Useful if you wish to make target-specific gcc changes. */
#undef TARGET_MYOS
#define TARGET_MYOS 1

/* Don't automatically add extern "C" { } around header files.  */
#undef  NO_IMPLICIT_EXTERN_C
#define NO_IMPLICIT_EXTERN_C 1

/* Additional predefined macros. */
#undef TARGET_OS_CPP_BUILTINS
#define TARGET_OS_CPP_BUILTINS()      \
  do {                                \
    builtin_define ("__myos__");      \
    builtin_define ("__unix__");      \
    builtin_assert ("system=myos");   \
    builtin_assert ("system=unix");   \
    builtin_assert ("system=posix");  \
  } while (0)
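Once the compiler has been built with this header, code compiled with i586-pc-myos-gcc can test for the predefined macros. The following is only a small sketch of that idea; the helper name target_os_name is made up for illustration and is not provided by GCC or newlib.

```c
#include <stddef.h>

/*
 * Report which OS this translation unit was compiled for, using the
 * macros that TARGET_OS_CPP_BUILTINS() predefines. target_os_name is
 * an illustrative helper, not something GCC or newlib provides.
 */
static const char *target_os_name(void)
{
#if defined(__myos__)
    return "myos";      /* built with the i586-pc-myos compiler */
#elif defined(__unix__)
    return "unix";      /* some other Unix-like target */
#else
    return "unknown";   /* neither macro is predefined */
#endif
}
```

The same #if defined(__myos__) test can guard any OS-specific code path in a program that must also build for other systems.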

libstdc++-v3/crossconfig.m4

Only necessary if you are compiling the GNU C++ compiler AND you wish to cross-compile the standard C++ library for your OS. You can experiment with adding more options (à la the Linux case) if you wish. Add a case similar to:

 *-myos*)
   AC_CHECK_HEADERS([sys/types.h locale.h float.h])
   GLIBCXX_CHECK_BUILTIN_MATH_SUPPORT
   GLIBCXX_CHECK_COMPLEX_MATH_SUPPORT
   GLIBCXX_CHECK_STDLIB_SUPPORT
   ;;

You need to run 'autoconf' in the libstdc++-v3 directory if you change this file.

libgcc/config.host

Older GCC versions

This file defines special configuration for libgcc for different hosts. Find the section starting "# Support site-specific machine types.". Add something like this

 i[3-7]86-*-myos*)
     ;;

Note that before the ";;" is a tab. You need this change or else when libgcc is configured you'll get an error about the configuration being unsupported.

Newer GCC versions (4.7.0 and newer?)

For GCC 4.7.0 and newer, the header 'Support site-specific machine types' has been removed, but you still need to add a definition for your OS, otherwise the build process will fail. The section that needs editing is near the end of the file, around line 1140. There is a default entry which causes an error; add an entry for your OS to this case statement, just above the default "*)" entry. For these GCC versions, it appears the crtbegin/crtend generation has been moved to libgcc. We therefore need to add lines like these, rather than just the empty statement used for previous versions:

 i[3-7]86-*-myos*)
     extra_parts="crtbegin.o crtend.o"
     tmake_file="$tmake_file i386/t-crtstuff"
     ;;

Without extra_parts, crtbegin.o and crtend.o are not built; without the tmake_file addition, GCC can give the error "error in /.../crtend.o(.eh_frame); no .eh_frame_hdr table will be created.", as discussed in this forum thread.

Selecting a C Library

At this point, you have to decide which C Library to use. You can either pick an existing C Library (such as newlib) or create your own libc.

Newlib

config.sub

Same as for binutils.

newlib/configure.host

Tell newlib which system-specific directory to use for our particular target. In the section starting 'Get the source directories to use for the host ... case "${host}" in', add a section

 i[3-7]86-*-myos*)
   sys_dir=myos
   ;;

newlib/libc/sys/configure.in

Tell the newlib build system that it also needs to configure our myos-specific host directory. In the 'case ${sys_dir} in' list, simply add

 myos) AC_CONFIG_SUBDIRS(myos) ;;

After this, you need to run 'autoconf' in the newlib/libc/sys directory.

newlib/libc/sys/myos

This is a directory that we need to create, where we put our OS-specific extensions to newlib. We need to create a minimum of 4 files. You can easily add more files to this directory to define your own OS-specific library functions, if you want them to be included in libc.a (and so linked into every application by default).

crt0.S

This file creates crt0.o, which is included in every application. It should define the symbol _start, and then call the main() function, possibly after setting up process-space segment selectors and pushing argc and argv onto the stack. A simple implementation is:

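 .global _start
 .extern main
 .extern exit
 _start:
 call main          /* run the program; its return value is in %eax */
 pushl %eax         /* pass that value to exit as its argument */
 call exit
 .wait: hlt         /* exit should never return; halt if it does */
 jmp .wait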

It is also worth mentioning that crt0 can be written in C instead of Assembly. There are a couple of reasons why you may want to do so, including that you would be able to properly find the entry point to programs written in C++, without having to worry about name mangling or using C linkage. It is also easier (marginally) to handle the argc and argv parameters in C.
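As a rough illustration of the C approach, the sketch below recovers argc, argv and the environment from the initial stack. It assumes a hypothetical start-up convention in which the kernel leaves argc, the argv pointers, a NULL, the environment pointers and a final NULL at the initial stack pointer; MyOS is free to choose something else. The names struct start_args, parse_initial_stack and _start_c are invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Start-up arguments recovered from the initial stack. */
struct start_args {
    int    argc;
    char **argv;
    char **envp;
};

/*
 * Assumed (hypothetical) layout, lowest address first:
 *   argc, argv[0] .. argv[argc-1], NULL, envp[0] .., NULL
 * Every slot is one pointer-sized word.
 */
static struct start_args parse_initial_stack(char **sp)
{
    struct start_args args;

    args.argc = (int)(intptr_t)sp[0];      /* first word holds argc */
    args.argv = &sp[1];                    /* argv follows directly */
    args.envp = &sp[1 + args.argc + 1];    /* skip argv and its NULL */
    return args;
}

#ifdef __myos__
extern int main(int argc, char **argv, char **envp);
extern char **environ;

/* Hypothetical C entry point, given the initial stack pointer. */
void _start_c(char **initial_sp)
{
    struct start_args args = parse_initial_stack(initial_sp);

    environ = args.envp;
    exit(main(args.argc, args.argv, args.envp));
}
#endif
```

Under this assumed convention something still has to hand the initial stack pointer to _start_c, for example a very short assembly stub that serves as the real _start symbol.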

syscalls.c

This file should contain implementations for each of the system calls that newlib depends on. There is a list on the newlib website but I believe it to be slightly out of date as my version had some extra ones not documented there. Generally, each of these system calls should trigger an interrupt or use sysenter/syscall to run a kernel-space system call. As such, they are heavily OS-specific. A non-exhaustive list is:

  /* note these headers are all provided by newlib - you don't need to provide them */
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <sys/fcntl.h>
  #include <sys/times.h>
  #include <sys/errno.h>
  #include <sys/time.h>
  #include <stdio.h>

  void _exit();
  int close(int file);
  char **environ; /* pointer to array of char * strings that define the current environment variables */
  int execve(char *name, char **argv, char **env);
  int fork();
  int fstat(int file, struct stat *st);
  int getpid();
  int isatty(int file);
  int kill(int pid, int sig);
  int link(char *old, char *new);
  int lseek(int file, int ptr, int dir);
  int open(const char *name, int flags, ...);
  int read(int file, char *ptr, int len);
  caddr_t sbrk(int incr);
  int stat(const char *file, struct stat *st);
  clock_t times(struct tms *buf);
  int unlink(char *name);
  int wait(int *status);
  int write(int file, char *ptr, int len);
  int gettimeofday(struct timeval *p, struct timezone *z);
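To give an idea of what one of these functions might look like, here is a minimal sketch of a write stub. Everything about the kernel interface is an assumption made for illustration: the 'int $0x80' gate, the register assignment, the system call number MYOS_SYS_WRITE and the convention that the kernel returns a negative errno value on failure. The function is named myos_write so that the sketch also compiles on a host system; in syscalls.c it would simply be called write.

```c
#include <errno.h>

/* Hypothetical system call number; the real one is up to your kernel. */
#define MYOS_SYS_WRITE 4

/*
 * Enter the kernel with up to three arguments. The interrupt vector and
 * register usage are assumptions. When not building for MyOS on x86
 * there is no kernel to call, so the stub reports ENOSYS instead.
 */
static long myos_syscall3(long num, long a, long b, long c)
{
#if defined(__myos__) && defined(__i386__)
    long ret;

    __asm__ volatile ("int $0x80"
                      : "=a" (ret)
                      : "a" (num), "b" (a), "c" (b), "d" (c)
                      : "memory");
    return ret;
#else
    (void)num; (void)a; (void)b; (void)c;
    return -ENOSYS;
#endif
}

/* Turn a kernel result (negative errno on failure) into the C convention. */
static int syscall_result(long ret)
{
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return (int)ret;
}

/* Would be named write in syscalls.c. */
int myos_write(int file, char *ptr, int len)
{
    return syscall_result(myos_syscall3(MYOS_SYS_WRITE, file, (long)ptr, len));
}
```

Keeping the trap in one helper means the other stubs (read, close, lseek and so on) reduce to one-line wrappers that differ only in the call number and arguments.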

configure.in

Configure script for our system directory.

 AC_PREREQ(2.59)
 AC_INIT([newlib], [NEWLIB_VERSION])
 AC_CONFIG_SRCDIR([crt0.S])
 AC_CONFIG_AUX_DIR(../../../..)
 NEWLIB_CONFIGURE(../../..)
 AC_CONFIG_FILES([Makefile])
 AC_OUTPUT

Makefile.am

A Makefile template for this directory

  AUTOMAKE_OPTIONS = cygnus
  INCLUDES = $(NEWLIB_CFLAGS) $(CROSS_CFLAGS) $(TARGET_CFLAGS)
  AM_CCASFLAGS = $(INCLUDES)

  noinst_LIBRARIES = lib.a

  if MAY_SUPPLY_SYSCALLS
  extra_objs = $(lpfx)syscalls.o
  else
  extra_objs =
  endif

  lib_a_SOURCES =
  lib_a_LIBADD = $(extra_objs)
  EXTRA_lib_a_SOURCES = syscalls.c crt0.S
  lib_a_DEPENDENCIES = $(extra_objs)
  lib_a_CCASFLAGS = $(AM_CCASFLAGS)
  lib_a_CFLAGS = $(AM_CFLAGS)

  if MAY_SUPPLY_SYSCALLS
  all: crt0.o
  endif

  ACLOCAL_AMFLAGS = -I ../../..
  CONFIG_STATUS_DEPENDENCIES = $(newlib_basedir)/configure.host

After this, you need to run 'autoconf' in the newlib/libc/sys directory and 'autoreconf' in the newlib/libc/sys/myos directory. Note: for the newlib source pulled from the git repository on July 31 2013, 'autoconf' and 'autoreconf' only run with automake version 1.12 or older and autoconf version 2.64 exactly.

Signal handling

Newlib has two different mechanisms for dealing with UNIX signals (see the man pages for signal()/raise()). In the first, it provides its own emulation, where it maintains a table of signal handlers in a per-process manner. If you use this method, then you will only be able to respond to signals sent from within the current process. In order to support it, all you need to do is make sure your crt0 calls '_init_signal' before it calls main, which sets up the signal handler table.
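The idea behind the per-process emulation can be sketched as a plain table of function pointers. This is an illustration of the concept only, not newlib's actual code; all names (emu_signal, emu_raise, EMU_NSIG, EMU_SIG_ERR, remember_signal) and the return values are invented for this sketch.

```c
#include <stddef.h>

#define EMU_NSIG    32                      /* arbitrary table size */
#define EMU_SIG_ERR ((emu_handler_t)-1)     /* returned for a bad signal number */

typedef void (*emu_handler_t)(int);

/* One slot per signal, private to this process. NULL means "default". */
static emu_handler_t emu_handlers[EMU_NSIG];

/* Register a handler and return the one it replaces. */
emu_handler_t emu_signal(int sig, emu_handler_t handler)
{
    emu_handler_t old;

    if (sig < 0 || sig >= EMU_NSIG)
        return EMU_SIG_ERR;
    old = emu_handlers[sig];
    emu_handlers[sig] = handler;
    return old;
}

/*
 * Deliver a signal. Only this process's own table is consulted, which
 * is why signals from other processes cannot be handled this way.
 * Returns 0 if a handler ran, 1 if none is installed, -1 on error.
 */
int emu_raise(int sig)
{
    emu_handler_t handler;

    if (sig < 0 || sig >= EMU_NSIG)
        return -1;
    handler = emu_handlers[sig];
    if (handler == NULL)
        return 1;
    handler(sig);
    return 0;
}

/* Example handler: remembers the last signal number it was given. */
static int last_signal_seen;
static void remember_signal(int sig) { last_signal_seen = sig; }
```

Because the table lives entirely in user space, no kernel support is needed, which is the main attraction of this mode.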

Alternatively, you can provide your own implementation. To do this, you need to define your own version of signal() in syscalls.c. A typical implementation would register the handler somewhere in kernel space, so that issuing a signal from another process causes the corresponding function to be called in the receiving process (this will also require some nifty stack-playing in the receiving process, as you are basically interrupting the program flow in the middle). You then need to provide a kill() function in syscalls.c which actually sends signals to another process. Newlib will still define a raise() function for you, but it is just a stub which calls kill() with the current process id. To switch newlib to this mode, you need to #define the SIGNAL_PROVIDED macro when compiling. A simple way to do this is to add the line:

 newlib_cflags="${newlib_cflags} -DSIGNAL_PROVIDED"

to your host's entry in newlib/configure.host. It would probably also make sense to provide sigaction(), and to provide signal() as a wrapper for it. Note that OpenGroup.org's definition of sigaction states that 1) sigaction supersedes signal, and 2) an application should not use both to manipulate the same signal.
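A signal() built on top of sigaction() might look roughly like the sketch below. It assumes your port exposes a POSIX-style sigaction() and struct sigaction; on MyOS that function would be your own system call wrapper. The wrapper is named signal_via_sigaction here so that it does not clash with an existing signal() when tried out on a host, and count_usr1 is just an example handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

typedef void (*sig_handler_t)(int);

/*
 * Install handler for sig through sigaction() and return the previous
 * handler, or SIG_ERR on failure. In syscalls.c this would be signal().
 */
sig_handler_t signal_via_sigaction(int sig, sig_handler_t handler)
{
    struct sigaction act, old;

    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);   /* block no extra signals in the handler */
    act.sa_flags = 0;
    if (sigaction(sig, &act, &old) != 0)
        return SIG_ERR;
    return old.sa_handler;
}

/* Example handler: counts how often SIGUSR1 was delivered. */
static volatile sig_atomic_t usr1_count;
static void count_usr1(int sig) { (void)sig; usr1_count++; }
```

With this arrangement all of the bookkeeping lives in sigaction(), so the two interfaces cannot disagree about which handler is installed.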

Building

Main article: Hosted GCC Cross-Compiler

Your OS specific toolchain is built differently from the introductory i686-elf toolchain as it has a user-space and standard library. In particular, you need to ensure your libc meets the minimum requirements for libgcc. You need to install the standard library headers into your System Root before building the cross-compiler. You need to tell the cross-binutils and cross-gcc where the system root is via the configure option --with-sysroot=/path/to/sysroot. You can then build your libc with your cross-compiler and then finally libstdc++ if desired.

Common errors

Whitespace

Some files need tabs, some files need spaces and some happily accept any mixture. Use an editor that can display special characters such as tabs and spaces, to be sure you use the right form. Whitespace errors may result in 'make' reporting missing separators. Some editors replace a tab with four spaces, which also causes missing separator errors (Code::Blocks 8.02 does this; you need to switch it off in Settings > Editor).

Autoconf

Several steps conclude with running 'autoconf' or 'autoreconf'; be sure you did not miss any of them. The order of the autoconf/autoreconf calls within a package is important. Such errors may result in missing subdirectories of the build-* directory and/or 'make' reporting missing targets.

Caveats

Using the cross toolchain to compile your kernel

Generally you don't want libc and crt0.o in your kernel. Link your kernel with the -nostdlib option to i586-pc-myos-gcc. Yes, gcc is also a front-end to ld, and is probably the preferred way to interface to it. You should specify a linker script to link your kernel, otherwise the kernel's .text section will be at the same location as a process' .text section, which is a bad thing. Use the -Wl,-Tlinkerscript.ld option.

Linking a kernel with libsupc++

You can use your libsupc++ to get exception handling and RTTI in a C++ kernel (no more passing -fno-exceptions -fno-rtti to g++!) so you can use things like throw and dynamic_cast<>. Libsupc++ depends upon libgcc for stack unwinding support. Passing the -nostdlib option to gcc when linking causes libgcc.a and libsupc++.a not to be included, so you need to specify -lgcc -lsupc++ on the command line (no need to specify the directories; gcc knows where it installed them to). In addition, you need to include a .eh_frame section in your linker script and terminate it with 32 bits of zeros (LONG(0) is a useful linker script command for this; QUAD(0) would emit 64 bits). The symbol start_eh_frame should point to the start of the eh_frame section, and it should be aligned to 4 bytes. You also need to include your constructors and destructors in the link (see C++ for details), and to provide __register_frame() (or call the function provided by libgcc with the start of your .eh_frame section), void *__dso_handle;, __cxa_atexit() and __cxa_finalize() (again see C++). Something along the lines of

 #include <reent.h>
 static struct _reent global_reent;
 struct _reent *_impure_ptr = &global_reent;

somewhere in your kernel will keep libgcc happy, because it expects these bits to be provided by newlib (which you aren't linking into your kernel). Libgcc expects a number of (simple) C library functions to be provided (again normally by newlib) by your kernel, including abort, malloc, free, memcpy, memset and strlen. Libsupc++ also requires write, fputs, fputc, fwrite, strcpy and strcat for debugging output.
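For the __cxa_atexit() and __cxa_finalize() functions mentioned above, a kernel could get by with something like the following sketch. It uses a fixed-size table (the limit of 64 is arbitrary) and does no locking. The functions are named kernel_cxa_atexit and kernel_cxa_finalize so the sketch can be compiled on a host without clashing with the host C library; in the kernel they would carry the names __cxa_atexit and __cxa_finalize. The log_destroy helper exists only to make the calling order visible.

```c
#include <stddef.h>

#define KERNEL_MAX_ATEXIT 64    /* arbitrary limit for this sketch */

struct atexit_entry {
    void (*destructor)(void *); /* NULL once the entry has been run */
    void  *arg;                 /* object passed to the destructor */
    void  *dso;                 /* owner handle, e.g. &__dso_handle */
};

static struct atexit_entry atexit_table[KERNEL_MAX_ATEXIT];
static size_t atexit_count;

/* Register a destructor. Returns 0 on success, -1 if the table is full. */
int kernel_cxa_atexit(void (*destructor)(void *), void *arg, void *dso)
{
    if (atexit_count >= KERNEL_MAX_ATEXIT)
        return -1;
    atexit_table[atexit_count].destructor = destructor;
    atexit_table[atexit_count].arg = arg;
    atexit_table[atexit_count].dso = dso;
    atexit_count++;
    return 0;
}

/*
 * Run registered destructors, newest first. A NULL dso runs every
 * remaining entry; otherwise only entries registered with that dso.
 */
void kernel_cxa_finalize(void *dso)
{
    size_t i = atexit_count;

    while (i-- > 0) {
        struct atexit_entry *e = &atexit_table[i];

        if (e->destructor != NULL && (dso == NULL || e->dso == dso)) {
            void (*fn)(void *) = e->destructor;

            e->destructor = NULL;   /* ensure it runs only once */
            fn(e->arg);
        }
    }
}

/* Example destructor: appends the character at arg to a small log. */
static char destroy_log[8];
static size_t destroy_log_len;
static void log_destroy(void *arg)
{
    if (destroy_log_len < sizeof destroy_log - 1)
        destroy_log[destroy_log_len++] = *(const char *)arg;
}
```

A kernel that never shuts down cleanly may never call the finalize function at all, in which case the registration function only needs to accept the calls the compiler generates for static objects.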

Disclaimer

This tutorial is provided for people who are already happy compiling a GCC Cross-Compiler and are generally sure of what they are doing. It is not, in any way, intended to replace the already excellent articles on GCC Cross-Compiler and Porting Newlib but is instead intended as a further resource for those who wish to take another step towards having their OS become self-hosting. The other tutorials mentioned are extensively tested and known to work, whereas the steps described here (given their increased complexity) are expected to be less well tested and could well include a number of bugs.

External Links

  • Inspired by this article, Boomstick is a script to build a complete GCC toolchain, including newlib, for your OS. Just fill in the stubs! (or don't :)