OS Specific Toolchain: Difference between revisions

From OSDev.wiki
m (Use --disable-werror instead of CFLAGS hacking)
(change categorization)
 
(47 intermediate revisions by 21 users not shown)
Line 1: Line 1:
{{Rating|3}}


This tutorial will guide you through creating a toolchain comprising Binutils and GCC that specifically targets your operating system. The instructions below teach Binutils and GCC how to create programs for a hypothetical OS named 'MyOS'.
This tutorial will walk you through creating a toolchain comprising Binutils, GCC (with cc1 and [[libgcc]]), newlib and optionally cc1plus and libstdc++ (with libsupc++) that specifically targets your OS. In this tutorial we will assume you are using an i586 compatible PC for a hypothetical OS named 'MyOS' and so the GNU canonical name we will be targeting is 'i586-pc-myos'. It should not, however, be difficult to adapt this tutorial for other systems, e.g. x86-64. This tutorial will produce a setup with not much different functionality from those produced by the [[GCC Cross-Compiler]] and [[Porting Newlib]] tutorials (aside from the name). It is merely intended to demonstrate the principles behind adding a new target to these packages and provide you with a framework for incorporating your own OS-specific modifications. You will, however, get a newlib that does not require libgloss (linking will work 'out of the box').


Until now you have been using a [[GCC Cross-Compiler|cross-compiler]] configured to use an existing generic bare target. This is very convenient when starting out as you get a reliable target and the compiler doesn't make any bad assumptions because it thinks it is targeting an existing operating system. However, as you proceed it becomes useful for the compiler to know that it is targeting your operating system and what its customs are. For instance, you can make the compiler define a <tt>__myos__</tt> preprocessor macro, know which directories to search for include files, what special <tt>crt*.o</tt> files are used when linking against libc, and so on. It also becomes much easier to cross-compile software to your OS when you simply have to invoke <tt>x86_64-myos-gcc hello.c -o hello</tt> to cross-compile a program. Additionally, part of the instructions here can be applied to other software packages that also use the GNU build system, which will help you [[Cross-Porting Software|port existing software]].
= Prerequisites =
* The latest binutils, gcc-core and newlib packages, and optionally gcc-g++. This tutorial was developed with binutils-2.18, gcc-4.2.1 and newlib-1.15.0, and has recently been tested on binutils-2.22 and gcc-4.7.0, and again on binutils-2.23 and gcc-4.7.2 using newlib-2.0.0.
* A build environment that can successfully build a [[GCC Cross-Compiler]]. This tutorial was developed with Cygwin and Ubuntu 12.04.
* autoconf, automake, autoreconf, libtool.
* The OS you are targeting should be able to handle syscalls, with a well defined interface.
* Knowledge of the internals of binutils, gcc and newlib, along with a working knowledge of autoconf and automake.
* Depending on your version of gcc, you may also need gmp, mpfr and mpc.


This tutorial teaches you how to set up a cross-compiler that specifically targets your OS. This is actually the first step of ''porting Binutils and GCC to your operating system'': Any information you give GCC about your OS will help it run on your OS. Once your OS Specific Toolchain has been set up and you have built your OS with it, you can continue by using the cross-compiler to cross-compile the compiler itself to your OS, assuming your libc and kernel are powerful enough. For more information see [[Porting GCC to your OS]].
= Preparation =

Extract the source tarballs into your /usr/src directory.

Set up the TARGET and PREFIX environment variables:
 export TARGET=i586-pc-myos
 export PREFIX=/usr/local/cross
Create the build directories:
 mkdir build-binutils build-gcc build-newlib

== Introduction ==

You need the following before you get started:

* A build environment that can successfully build a [[GCC Cross-Compiler]].
* autoconf (exactly version 2.69)
* automake (exactly version 1.15.1)
* libtool
* The latest Binutils source code (2.39 at time of writing).
* The latest GCC source code (12.2.0 at time of writing).
* Knowledge of the internals of Binutils and GCC.
* Knowledge of autoconf and automake.
* The dependencies of Binutils and GCC as detailed in [[GCC Cross-Compiler]].

If you're compiling a later version of GCC/binutils, find versions of autoconf and automake that were released just prior to your version of GCC/binutils. See the [[Cross-Compiler Successful Builds]] page for known compatible versions of GCC and binutils.
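
As a rough sketch of how you might guard against a mismatched toolset, the following shell helper compares the first line of a tool's <tt>--version</tt> output with the version you expect. The helper name <tt>require_version</tt> is made up for this example and is not part of any GNU package.

<syntaxhighlight lang="bash">
# Hypothetical helper: succeeds only if "<tool> --version" reports exactly
# the expected version on its first line.
require_version() {
    tool=$1
    expected=$2
    first_line=$("$tool" --version 2>/dev/null | head -n 1)
    # Pad with spaces so that 1.15 does not match 1.15.1.
    case " $first_line " in
        *" $expected "*)
            return 0 ;;
        *)
            echo "$tool: wanted $expected, found: ${first_line:-not installed}" >&2
            return 1 ;;
    esac
}
</syntaxhighlight>

For the versions listed above you could run <tt>require_version autoconf 2.69 && require_version automake 1.15.1</tt> before regenerating any build files.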

Additionally you will need a [[C Library]] as described in a later section. As detailed in [[Hosted GCC Cross-Compiler]], it doesn't need to support much and the functionality can be stubbed, but libgcc will need to believe you have a libc.

You should decide exactly what targets you'll add to Binutils and GCC. If you have been using a generic <tt>i686-elf</tt> or <tt>x86_64-elf</tt> or such target, you'll simply want to swap <tt>-elf</tt> with <tt>-myos</tt> and get <tt>i686-myos</tt> and <tt>x86_64-myos</tt>. Naturally, don't actually write myos, but rather use the name of your OS converted to lower case. See [[Target Triplet]].
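
The renaming described above can be sketched in a few lines of shell. The function name <tt>to_os_triplet</tt> is an assumption for illustration only, and <tt>myos</tt> stands in for your own lower-case OS name.

<syntaxhighlight lang="bash">
# Hypothetical helper: swap the generic "-elf" suffix for your OS name.
to_os_triplet() {
    os_name=$1    # lower-case OS name; "myos" is the tutorial's placeholder
    bare=$2       # bare target such as i686-elf or x86_64-elf
    printf '%s-%s\n' "${bare%-elf}" "$os_name"
}

to_os_triplet myos i686-elf      # prints i686-myos
to_os_triplet myos x86_64-elf    # prints x86_64-myos
</syntaxhighlight>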

This tutorial currently only has instructions for adding new x86 and x86_64 targets for myos, but it serves as a good enough example that it should be straightforward to add more processors by basing them on these instructions and on what other operating systems have done.


== Making your changes reproducible ==
Since at some point you might have a contributor who wants to participate in your project, you likely want to make your toolchain setup reproducible. Instead of downloading the tarballs, it is therefore a good idea to clone the repositories of binutils (https://sourceware.org/git/binutils-gdb.git) and gcc (https://gcc.gnu.org/git/gcc.git) locally.

All release versions are tagged within these repositories, so you can use <tt>"git checkout binutils-2_39"</tt> and <tt>"git checkout releases/gcc-12.2.0"</tt> to switch to the correct versions respectively. After making the described changes, you can easily create a patch using <tt>"git diff"</tt> which you can reuse later.
Another option would be to fork one of the mirror repositories on GitHub.
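
A minimal sketch of that workflow follows. The repository URLs and tag names are the ones given above; the function names, the default <tt>$HOME/src</tt> location and the patch file names are assumptions made for this example. It also assumes you build outside the source trees, since <tt>git add --intent-to-add</tt> is used so that newly created files show up in <tt>git diff</tt> and would otherwise pick up build output too.

<syntaxhighlight lang="bash">
# Sketch only: the destination directory and patch file names are assumptions.
fetch_toolchain_sources() (
    dest=${1:-"$HOME/src"}
    mkdir -p "$dest" && cd "$dest" || exit 1
    git clone https://sourceware.org/git/binutils-gdb.git &&
    git -C binutils-gdb checkout binutils-2_39 &&
    git clone https://gcc.gnu.org/git/gcc.git &&
    git -C gcc checkout releases/gcc-12.2.0
)

save_toolchain_patches() (
    dest=${1:-"$HOME/src"}
    for repo in binutils-gdb gcc; do
        # Mark new, untracked files as "intent to add" so that
        # "git diff" includes them alongside edits to tracked files.
        git -C "$dest/$repo" add --intent-to-add . &&
        git -C "$dest/$repo" diff > "$dest/myos-$repo.patch" || exit 1
    done
)
</syntaxhighlight>

Running <tt>fetch_toolchain_sources</tt> once and <tt>save_toolchain_patches</tt> after each round of edits leaves you with two patch files that can be applied to a fresh checkout with <tt>git apply</tt>.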


== Modifying Binutils ==


= Modify the source directories to add your target =
Now you can start adding your own target to each package. Some of the changes are quite similar between each package.
== Binutils ==
=== config.sub ===

This is a file you will modify in the same way for each package. It is a gnu standard file produced by including the line 'AC_CANONICAL_SYSTEM' in a configure.in that is processed by autoconf, and is designed to convert a canonical name of the form 'i586-pc-myos' into separate variables for the processor, vendor and OS, and also rejects systems it doesn't know about. We simply need to add 'myos' to the list of acceptable operating systems. Find the section that begins with the comment "First accept the basic system types" (it begins '-gnu*') and add '-myos*' to the list. I typically add it after -aos* simply because there is some free room on the line there.
This is a file you will modify in the same way for each package. It is a GNU standard file produced by including the line 'AC_CANONICAL_SYSTEM' in a configure.ac that is processed by autoconf, and is designed to convert a canonical name of the form <tt>i686-pc-myos</tt> into separate variables for the processor, vendor and OS, and also rejects systems it doesn't know about. We simply need to add 'myos' to the list of acceptable operating systems. Find the section that begins with the comment "<tt>Now accept the basic system types</tt>" (it begins '<tt>-gnu*</tt>') and add '<tt>-myos*</tt>' to the list. Find a line with some free room and add your entry there.
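
To illustrate why a single extra pattern is enough, here is a deliberately simplified model of that check. It is not the real <tt>config.sub</tt> source: the function name <tt>check_os</tt> is invented and the three-entry pattern list stands in for the much longer one in the real file. The exact spelling of the list also differs between <tt>config.sub</tt> versions, so follow whatever style the neighbouring entries use.

<syntaxhighlight lang="bash">
# Simplified model of the OS check; this is NOT the real config.sub source.
check_os() {
    os=$1
    case $os in
        -gnu* | -aos* | -myos*)
            # The trailing * also accepts versioned names such as -myos1.2
            echo "accepted: $os" ;;
        *)
            echo "rejected: $os" >&2
            return 1 ;;
    esac
}

check_os -myos
check_os -myos1.2
</syntaxhighlight>

Any name that matches none of the patterns falls through to the final branch, which is the behaviour you see as a configure error before the entry is added.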


=== bfd/config.bfd ===

This file is part of the configuration for libbfd, the back-end to binutils which provides a consistent interface for many object file formats. Generally, each platform-specific version of binutils contains a libbfd which only supports the object files normally in use on that system, as otherwise the library would be massive (libbfd can support a _lot_ of object types). We need to associate our os with some particular object types. There is a long list starting 'WHEN ADDING ENTRIES TO THIS MATRIX' with the first line as 'case "${targ}" in'. We need to add our full canonical name to this list, by adding some cases such as:
 i[3-7]86-*-myos*)
     targ_defvec=bfd_elf32_i386_vec
     targ_selvecs=i386coff_vec
     ;;
Be sure to follow the instructions in the comment block above the list and add your entry beneath the comment "#START OF targmatch.h". If you like, you could support different object formats (look at other entries in the list, and the contents of 'bfd' for hints) and also provide more than one to the 'targ_selvecs' line. The example here lets you deal with coff object files, but defaults to elf for object files and executables (similar to the i686-elf [[GCC Cross-Compiler]]).

This file is part of the configuration for libbfd, the back-end to Binutils which provides a consistent interface for many object file formats. Generally, each platform-specific version of Binutils contains a libbfd which only supports the object files normally in use on that system, as otherwise the library would be massive (libbfd can support a _lot_ of object types). We need to associate our os with some particular object types. There is a long list starting 'WHEN ADDING ENTRIES TO THIS MATRIX' with the first line as 'case "${targ}" in'. We need to add our full canonical name to this list, by adding some cases such as:

<syntaxhighlight lang="bash">
  i[3-7]86-*-myos*)
    targ_defvec=i386_elf32_vec
    targ_selvecs=
    targ64_selvecs=x86_64_elf64_vec
    ;;
#ifdef BFD64
  x86_64-*-myos*)
    targ_defvec=x86_64_elf64_vec
    targ_selvecs=i386_elf32_vec
    want64=true
    ;;
#endif
</syntaxhighlight>

'''Note:''' If using binutils-2.24 or older, change <i>i386_elf32_vec</i> to <i>bfd_elf32_i386_vec</i> and <i>x86_64_elf64_vec</i> to <i>bfd_elf64_x86_64_vec</i>.

Be sure to follow the instructions in the comment block above the list and add your entry beneath the comment "<tt>#START OF targmatch.h</tt>". If you like, you could support different object formats (look at other entries in the list, and the contents of 'bfd' for hints) and also provide more than one to the <tt>targ_selvecs</tt> line. For instance, you can support coff object files if you add <tt>i386coff_vec</tt> to the <tt>targ_selvecs</tt> list. All the <tt>x86_64</tt> entries in the file are wrapped in <tt>#ifdef BFD64</tt>, as when targmatch.sed processes this file to turn it into targmatch.h, it will add <tt>#ifdef BFD64</tt> so that the relevant code is only compiled when targeting a 64-bit platform.


=== gas/configure.tgt ===

This file tells the gnu assembler what type of output to generate for each target. It automatically matches the i586 part of your target and generates the correct output for that. We just need to tell it what type of object file to generate for myos. In the section starting 'Assign object format ... case ${generic_target} in' you need to add a line like
This file tells the gnu assembler what type of output to generate for each target. It automatically matches the i686 part of your target and generates the correct output for that. We just need to tell it what type of object file to generate for myos. In the section starting '<tt>Assign object format ... case ${generic_target} in</tt>' you need to add a line like

<syntaxhighlight lang="bash">
i386-*-myos*) fmt=elf ;;
</syntaxhighlight>
You should use 'i386' in this line even if you are targeting x86_64. This is the only file where you should do it. It is basically because the variable 'generic_target' is not your canonical target name, but rather a variable generated further up in the configure.tgt file, and it sets the first part to i386 for any i[3-7]86 or x86_64.

You should use '<tt>i386</tt>' in this line even if you are targeting x86_64. This is the only file where you should do it. It is basically because the variable 'generic_target' is not your canonical target name, but rather a variable generated further up in the configure.tgt file, and it sets the first part to <tt>i386</tt> for any <tt>i[3-7]86</tt> or <tt>x86_64</tt>.

Note: this will use the 'generic' emulation. One side-effect is that gas will interpret slash ('/') as a comment, not as a division operator. This will break any code like "<tt>movl $(ADDRESS/PAGE_SIZE), %eax</tt>". Using "<tt>fmt=elf em=gnu ;;</tt>" or "<tt>fmt=elf em=linux ;;</tt>" will disable slash as a comment character.


=== ld/configure.tgt ===
This file tells the gnu linker what 'emulation' to use for each target. An emulation is basically a combination of linker script and executable file format. We are going to define our own emulation called myos_i386. We need to add an entry to the case statement here beginning 'Please try to keep this table in alphabetical order ... case "${targ}" in'
 i[3-7]86-*-myos*) targ_emul=myos_i386 ;;
You can also add a targ_extra_emuls line to specify other targets ld should support, e.g. elf_i386, although this is untested. See the ld/configure.tgt file for examples.

This file tells the gnu linker what 'emulation' to use for each target. An emulation is basically a combination of linker script and executable file format. We are going to define our own emulation called <tt>elf_i386_myos</tt>. We need to add an entry to the case statement here after '<tt>Please try to keep this table more or less in alphabetic order ... case "${targ}" in</tt>':

<syntaxhighlight lang="bash">
i[3-7]86-*-myos*)
    targ_emul=elf_i386_myos
    targ_extra_emuls=elf_i386
    targ64_extra_emuls="elf_x86_64_myos elf_x86_64"
    ;;
x86_64-*-myos*)
    targ_emul=elf_x86_64_myos
    targ_extra_emuls="elf_i386_myos elf_x86_64 elf_i386"
    ;;
</syntaxhighlight>

* '''elf_i386_myos''' is a 32-bit target for your OS.
* '''elf_x86_64_myos''' is a 64-bit target for your OS.
* '''elf_i386''' is a bare 32-bit target as you had with <tt>i686-elf</tt>.
* '''elf_x86_64''' is a bare 64-bit target as you had with <tt>x86_64-elf</tt>.

This setup provides you with a 32-bit toolchain that can also produce 64-bit executables, and a 64-bit toolchain that can also produce 32-bit executables. This comes in handy if you use objcopy, for instance. You can also add <tt>targ_extra_emuls</tt> entries to specify other targets ld should support. See the <tt>ld/configure.tgt</tt> file for examples.

=== ld/emulparams/myos_i386.sh ===

Now we need to actually define our emulation. There is a generic file called ld/genscripts.sh which creates the required linker scripts for our target (you need more than one, depending on shared object usage and the like: I have 13 for a single target). It uses a linker script template (from the ld/scripttempl directory) to do this, and it creates the actual emulation C file from an emulation template (from the ld/emultempl directory). These templates are customised by running a script in the ld/emulparams directory which sets various variables. You are welcome to define your own emulation and linker templates, but I find the ELF ones adequate, given that they can be customised by simply adding a file to the emulparams directory. This is what we are going to do now. The content of the file could be something like:
 SCRIPT_NAME=elf
(defines the scripttempl file to use, in this case the ELF one)
 OUTPUT_FORMAT=elf32-i386
(create our executables in elf32 format)
 TEXT_START_ADDR=0x40000000
(start of the .text section and therefore the entire executable image. I set to this because my kernel occupies 0x0-0x3fffffff)
 MAXPAGESIZE="CONSTANT (MAXPAGESIZE)"
 COMMONPAGESIZE="CONSTANT (COMMONPAGESIZE)"
(Tell ld the page sizes so it can properly align sections to page boundaries. Just use the defaults here)
 ARCH=i386
 MACHINE=
(self explanatory)
 NOP=0x90909090
(what ld pads sections with, basically the i386 NOP instruction 4 times to fill a 32-bit value)
 TEMPLATE_NAME=elf32
(defines the emultempl file to use, again, stick with ELF)
 GENERATE_SHLIB_SCRIPT=yes
 GENERATE_PIE_SCRIPT=yes
(should we generate linker scripts to produce shared libraries and position-independent executables respectively)
 NO_SMALL_DATA=yes
 SEPARATE_GOTPLT=12
(unsure as to these, just copied from the ELF one, seems to work okay)

=== ld/emulparams/elf_i386_myos.sh ===

Now we need to actually define our emulation. There is a generic file called <tt>ld/genscripts.sh</tt> which creates the required linker scripts for our target (you need more than one, depending on shared object usage and the like: I have 13 for a single target). It uses a linker script template (from the ld/scripttempl directory) to do this, and it creates the actual emulation C file from an emulation template (from the ld/emultempl directory). These templates are customised by running a script in the ld/emulparams directory which sets various variables. You are welcome to define your own emulation and linker templates, but I find the ELF ones adequate, given that they can be customised by simply adding a file to the emulparams directory. This is what we are going to do now. The content of the file could be something like:

<syntaxhighlight lang="bash">
source_sh ${srcdir}/emulparams/elf_i386.sh
TEXT_START_ADDR=0x08000000
</syntaxhighlight>

This script is included by <tt>ld/genscripts.sh</tt> to customize its behavior through shell variables. We include the base <tt>elf_i386.sh</tt> script as it sets reasonable defaults. Finally, we override the variables whose defaults we disagree with.

There are a large number of variables that can be set here to customize your toolchain. Read the documentation and look at existing emulations for further information. These are some of the variables that can be set:

* '''GENERATE_SHLIB_SCRIPT'''=''yes|no'' Whether to generate a linker script for shared libraries. We enable this as you might want it later.
* '''GENERATE_PIE_SCRIPT'''=''yes|no'' Whether to generate a linker script for position independent executables. We enable this as you might want it later.
* '''SCRIPT_NAME'''=''name'' Controls which ld/scripttempl/''name''.sc script generates our linker scripts.
* '''TEMPLATE_NAME'''=''name'' Controls which ld/emultempl/''name''.em script generates our bfd emulation C implementation.
* '''OUTPUT_FORMAT'''=''name'' The name of the BFD output target we use.
* '''TEXT_START_ADDR'''=0x''value'' Controls where the executable begins in memory.

You can read the base <tt>elf_i386.sh</tt> script for the defaults of these variables and then decide for yourself whether to override them for your operating system.
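
Once Binutils has been built, one way to see which start address ended up in the linker's built-in script is to print that script with <tt>--verbose</tt> and look for the <tt>__executable_start</tt> symbol, which the ELF script template derives from <tt>TEXT_START_ADDR</tt>. This is only a sketch: the helper name <tt>show_text_start</tt> is invented, and the linker name <tt>i686-myos-ld</tt> assumes the triplet used in this tutorial.

<syntaxhighlight lang="bash">
# Hypothetical helper: print the lines of ld's built-in linker script that
# mention __executable_start.
show_text_start() {
    ld=$1    # e.g. i686-myos-ld once Binutils is built (assumed name)
    "$ld" --verbose | grep '__executable_start'
}
</syntaxhighlight>

For example, <tt>show_text_start i686-myos-ld</tt> should show the address you set, and <tt>i686-myos-ld -V</tt> lists the emulations the linker was built with.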

=== ld/emulparams/elf_x86_64_myos.sh ===

This file is just like the above <tt>ld/emulparams/elf_i386_myos.sh</tt> but for x86_64.

<syntaxhighlight lang="bash">
source_sh ${srcdir}/emulparams/elf_x86_64.sh
</syntaxhighlight>

=== ld/Makefile.am ===

We now just need to tell make how to produce the emulation C file for our specific emulation. Putting the '<tt>targ_emul=elf_i386_myos</tt>' line into <tt>ld/configure.tgt</tt> above implies that your host linker will try to link your target ld executable with an object file called <tt>eelf_i386_myos.o</tt>. There is a default rule to generate this from <tt>eelf_i386_myos.c</tt>, so we just need to tell it how to make this <tt>eelf_i386_myos.c</tt> file. As stated above, we let the genscripts.sh file do the hard work. You need to add <tt>eelf_i386_myos.c</tt> to the <tt>ALL_EMULATION_SOURCES</tt> list; you also need to add <tt>eelf_x86_64_myos.c</tt> to the <tt>ALL_64_EMULATION_SOURCES</tt> list if applicable.


'''Note''': You ''must'' run <tt>automake</tt> in the <tt>ld</tt> directory after you modify <tt>Makefile.am</tt> to regenerate <tt>Makefile.in</tt>.
=== ld/Makefile.in ===
We now just need to tell make how to produce the emulation C file for our specific emulation. Putting the 'targ_emul=myos_i386' line into ld/configure.tgt above implies that your host linker will try to link your target ld executable with an object file called emyos_i386.o. There is a default rule to generate this from emyos_i386.c, so we just need to tell it how to make this emyos_i386.c file. As stated above, we let the genscripts.sh file do the hard work. You need to add a makefile rule (I add it after the one to build eelf_i386.c) similar to:
 emyos_i386.c: $(srcdir)/emulparams/myos_i386.sh $(ELF_DEPS) $(srcdir)/scripttempl/elf.sc ${GEN_DEPENDS}
 	${GENSCRIPTS} myos_i386 "$(tdir_myos_i386)"
Note that some parts of the line use normal brackets () whereas other parts use curly braces {}. Also, the second line is indented with a tab, not spaces.
You can also add 'emyos_i386.o' to the dependencies of 'ALL_EMULATIONS' if you like, but it is not essential.

'''Attention:''' the second line must start with a ''tab'', not spaces. Otherwise 'make' might complain about missing separators.

== Modifying GCC ==


== GCC ==
=== config.sub ===

Similar modification to config.sub in binutils.
Similar modification to config.sub in Binutils.


=== gcc/config.gcc ===

This file defines what needs to be built for each particular target and what to include in the final executable. There are two main sections: one which defines generic options for your operating system, and those which define options specific to your operating system on each individual machine type. For the first part, find the section starting 'Common parts for widely ported systems ... case ${target} in' and add something like:
 *-*-myos*)
     extra_parts="crtbegin.o crtend.o"
     gas=yes
     gnu_ld=yes
     default_use_cxa_atexit=yes
     ;;
This basically says: build the static versions of crtbegin and crtend (important for various parts of [[libgcc]]), our operating system by default uses the GNU linker and assembler and that we will provide __cxa_atexit (newlib will actually provide it).
The second section we need to add to is the architecture-specific one. Find the section starting 'case ${target} in ... Support site-specific machine types' and add
 i[3-7]86-*-myos*)
     tm_file="${tm_file} i386/unix.h i386/att.h dbxelf.h elfos.h i386/i386elf.h myos.h"
     tmake_file="i386/t-i386elf t-svr4"
     use_fixproto=yes
     ;;
Which defines some extra include files and makefile fragments to use. In GCC 4.7.0, the text 'Support site-specific machine types' does not exist in this file, but you still need to edit the appropriate table - it is the one immediately following the table to which you added the first part ('Common parts for widely ported systems'). We will create the myos.h in the next step.

This file defines what needs to be built for each particular target and what to include in the final executable. There are two main sections: one which defines generic options for your operating system, and those which define options specific to your operating system on each individual machine type.

For the first part, find the '<tt>case ${target} in</tt>' line just after '<tt># Common parts for widely ported systems</tt>' (around line 688) and add something like:

<syntaxhighlight lang="bash">
*-*-myos*)
    gas=yes
    gnu_ld=yes
    default_use_cxa_atexit=yes
    use_gcc_stdint=provide
    ;;
</syntaxhighlight>

* '''gas=yes''' Our operating system by default uses the GNU assembler.
* '''gnu_ld=yes''' Our operating system by default uses the GNU linker.
* '''default_use_cxa_atexit=yes''' We will provide ''__cxa_atexit'' (you will need to provide this in your standard library).
* '''use_gcc_stdint=provide''' This instructs GCC to provide you with a ''stdint.h'' appropriate for your target. Change <tt>provide</tt> to <tt>wrap</tt> if you have your own ''stdint.h'', to make GCC wrap yours.

The second section we need to add to is the architecture-specific one. Find the '<tt>case ${target} in</tt>' line just before '<tt>tm_file="${tm_file} elfos.h newlib-stdint.h"</tt>' (around line 1094) and add something like:

<syntaxhighlight lang="bash">
i[34567]86-*-myos*)
    tm_file="${tm_file} i386/unix.h i386/att.h elfos.h glibc-stdint.h i386/i386elf.h myos.h"
    ;;
x86_64-*-myos*)
    tm_file="${tm_file} i386/unix.h i386/att.h elfos.h glibc-stdint.h i386/i386elf.h i386/x86-64.h myos.h"
    ;;
</syntaxhighlight>

This defines which target configuration header files get used. You can make <tt>i386/myos32.h</tt> and <tt>i386/myos64.h</tt> files if desired.


=== gcc/config/myos.h ===
Now we create a header file which sets some OS-specific stuff in GCC. This currently just sets some variables which state that the C preprocessor should #define some macros, make some asserts and set the default system name when we are compiling an application for our OS. Something like the following should be fine.
<pre>
#undef TARGET_OS_CPP_BUILTINS
#define TARGET_OS_CPP_BUILTINS()      \
  do {                                \
    builtin_define_std ("myos");      \
    builtin_define_std ("unix");      \
    builtin_assert ("system=myos");   \
    builtin_assert ("system=unix");   \
  } while(0);

#undef TARGET_VERSION // note that adding these two lines cause an error in gcc-4.7.0
#define TARGET_VERSION fprintf(stderr, " (i386 myos)"); // the build process works fine without them until someone can work out an alternative
</pre>

This header allows you to customize your toolchain using preprocessor macros. The relevant parts of GCC will include this header (as controlled by <tt>gcc/config.gcc</tt>) and modify the behavior according to your customizations.

You can explore <tt>gcc/defaults.h</tt> for a full list of things you can modify, and more importantly, the assumptions GCC will make about various aspects of your target. For instance, if <tt>PID_TYPE</tt> is not defined in <tt>myos.h</tt>, then GCC will default it to <tt>int</tt>, which can be problematic if that's not what your <tt>pid_t</tt> is defined as.

<syntaxhighlight lang="C">
/* Useful if you wish to make target-specific GCC changes. */
#undef TARGET_MYOS
#define TARGET_MYOS 1

/* Default arguments you want when running your
i686-myos-gcc/x86_64-myos-gcc toolchain */
#undef LIB_SPEC
#define LIB_SPEC "-lc" /* link against C standard library */

/* Files that are linked before user code.
The %s tells GCC to look for these files in the library directory. */
#undef STARTFILE_SPEC
#define STARTFILE_SPEC "crt0.o%s crti.o%s crtbegin.o%s"

/* Files that are linked after user code. */
#undef ENDFILE_SPEC
#define ENDFILE_SPEC "crtend.o%s crtn.o%s"

/* Additional predefined macros. */
#undef TARGET_OS_CPP_BUILTINS
#define TARGET_OS_CPP_BUILTINS()      \
  do {                                \
    builtin_define ("__myos__");      \
    builtin_define ("__unix__");      \
    builtin_assert ("system=myos");   \
    builtin_assert ("system=unix");   \
    builtin_assert ("system=posix");  \
  } while (0)
</syntaxhighlight>
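
Once the cross-compiler has been built, one way to confirm that <tt>TARGET_OS_CPP_BUILTINS</tt> took effect is to dump the compiler's predefined macros and search for the ones you added. The sketch below assumes a compiler named <tt>i686-myos-gcc</tt>; the helper name <tt>has_predefined_macro</tt> is invented for this example.

<syntaxhighlight lang="bash">
# Hypothetical helper: succeeds if the given compiler predefines the macro.
has_predefined_macro() {
    cc=$1       # e.g. i686-myos-gcc (assumed name)
    macro=$2    # e.g. __myos__
    # -dM -E prints every predefined macro as a "#define NAME VALUE" line.
    "$cc" -dM -E -x c /dev/null | grep -q "^#define $macro "
}
</syntaxhighlight>

For instance, <tt>has_predefined_macro i686-myos-gcc __myos__ && echo ok</tt> should print <tt>ok</tt> if the header was picked up.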


=== libstdc++-v3/crossconfig.m4 ===

Only necessary if you are compiling the GNU C++ compiler AND you wish to cross-compile the standard C++ library for your os. You can experiment with adding more options (à la the Linux case) if you wish. Add a case similar to
 *-myos*)
     AC_CHECK_HEADERS([sys/types.h locale.h float.h])
     GLIBCXX_CHECK_BUILTIN_MATH_SUPPORT
     GLIBCXX_CHECK_COMPLEX_MATH_SUPPORT
     GLIBCXX_CHECK_STDLIB_SUPPORT
     ;;

This file describes how the libstdc++ configure file will examine your operating system and adjust the provided features of libstdc++ accordingly. Add a case similar to

<syntaxhighlight lang="bash">
*-myos*)
    GLIBCXX_CHECK_COMPILER_FEATURES
    GLIBCXX_CHECK_LINKER_FEATURES
    GLIBCXX_CHECK_MATH_SUPPORT
    GLIBCXX_CHECK_STDLIB_SUPPORT
    ;;
</syntaxhighlight>


'''You need to run 'autoconf' in the libstdc++-v3 directory if you change this file.'''
'''Note''': You need to run <tt>autoconf</tt> in the <tt>libstdc++-v3</tt> directory.


=== libgcc/config.host ===
==== Older GCC versions ====
This file defines special configuration for [[libgcc]] for different hosts. Find the section starting "# Support site-specific machine types.". Add something like this
 i[3-7]86-*-myos*)
 	;;
Note that before the ";;" is a ''tab''. You need this change or else when [[libgcc]] is configured you'll get an error about the configuration being unsupported.

==== Newer GCC versions (4.7.0 and newer?) ====
For GCC 4.7.0 and newer, the header 'Support site-specific machine types' is removed but you still need to add a definition for your os otherwise the build process will cause an error.
The section that needs editing is near the end of the file; around line 1140. There is a default entry which causes an error - you need to add an entry for your OS to this case statement; you can add it just above the default "*)" entry.
For these GCC versions, it appears the crtbegin/crtend generation has been moved to [[libgcc]]. We therefore need to add lines like these, rather than just the empty statement for previous versions:
 i[3-7]86-*-myos*)
     extra_parts="crtbegin.o crtend.o"
     tmake_file="$tmake_file i386/t-crtstuff"
     ;;
Without extra_parts, crtbegin.o and crtend.o are not built; without the tmake_file addition, GCC can give the error "error in /.../crtend.o(.eh_frame); no .eh_frame_hdr table will be created.", as discussed in [http://forum.osdev.org/viewtopic.php?t=25489 this forum thread].

Find the '<tt>case ${host} in</tt>' just prior to '<tt>extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"</tt>' (around line 368) and add the cases:

<syntaxhighlight lang="bash">
i[34567]86-*-myos*)
    extra_parts="$extra_parts crti.o crtbegin.o crtend.o crtn.o"
    tmake_file="$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic"
    ;;
x86_64-*-myos*)
    extra_parts="$extra_parts crti.o crtbegin.o crtend.o crtn.o"
    tmake_file="$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic"
    ;;
</syntaxhighlight>
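
After libgcc has been built and installed, a quick way to check that the start files were actually produced is to ask the compiler where it would find them. GCC's <tt>-print-file-name=</tt> option prints a full path when the file is found in its search directories and echoes the bare name back when it is not. The helper name <tt>check_start_files</tt> and the compiler name <tt>i686-myos-gcc</tt> are assumptions for this sketch.

<syntaxhighlight lang="bash">
# Hypothetical helper: report any start file the compiler cannot locate.
# Returns 0 only if every listed file was found.
check_start_files() {
    cc=$1
    shift
    status=0
    for f in "$@"; do
        path=$("$cc" -print-file-name="$f")
        if [ "$path" = "$f" ]; then
            echo "not found: $f" >&2
            status=1
        fi
    done
    return $status
}
</syntaxhighlight>

A typical call would be <tt>check_start_files i686-myos-gcc crti.o crtbegin.o crtend.o crtn.o</tt>.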


=== fixincludes/mkfixinc.sh ===

You should disable fixincludes for your operating system. Find the case statement and add a pattern for your operating system. For instance:

<syntaxhighlight lang="bash">
# Check for special fix rules for particular targets
case $machine in
    *-myos* | \
    *-*-myos* | \
    i?86-*-cygwin* | \
    # (... snip ...)
    powerpcle-*-eabi* )
        # IF there is no include fixing,
        # THEN create a no-op fixer and exit
        (echo "#! /bin/sh" ; echo "exit 0" ) > ${target}
        ;;
</syntaxhighlight>

A number of operating systems (especially older and obscure ones) provide troublesome system headers that fail to strictly comply with various standards. The GCC developers consider it their job to fix these headers. GCC will look into your system root, apply a bunch of patterns to detect headers it doesn't like, then copy each such header into a private GCC system directory (that overrides your standard system directory) and attempt to fix it. Sometimes fixincludes even breaks working headers (some people refer to it as breakincludes).

This is rather inconvenient as your libc will likely happen to trigger these patterns (and false positives often happen). Any time you change your system headers, you have to rebuild your compiler so the fixed versions get updated. The first time you encounter this, it will show up as a system header that does nothing different even though you edit it.

This addition to the <tt>mkfixinc.sh</tt> file forcefully disables fixincludes for your operating system. It's your job to provide working system headers, not the compiler developers'.

== Further Customization ==

'''TODO''': Document more tips and tricks for further customization of OS-specific toolchains.

== Selecting a C Library ==
At this point, you have to decide which [[C Library]] to use. You can either pick an existing [[C Library]] (such as newlib) or [[Creating_a_C_Library|create your own libc]].

== Newlib ==
=== config.sub ===
Same as for binutils

=== newlib/configure.host ===
Tell newlib which system-specific directory to use for our particular target. In the section starting 'Get the source directories to use for the host ... case "${host}" in', add a section
 i[3-7]86-*-myos*)
     sys_dir=myos
     ;;

=== newlib/libc/sys/configure.in ===
Tell the newlib build system that it also needs to configure our myos-specific host directory. In the 'case ${sys_dir} in' list, simply add
 myos) AC_CONFIG_SUBDIRS(myos) ;;

'''After this, you need to run 'autoconf' in the newlib/libc/sys directory.'''

=== newlib/libc/sys/myos ===
This is a directory that we need to create where we put our OS-specific extensions to newlib. We need to create a minimum of 4 files. You can easily add more files to this directory to define your own os-specific library functions, if you want them to be included in libc.a (and so linked in to every application by default).

==== crt0.S ====
This file creates crt0.o, which is included in every application. It should define the symbol _start, and then call the main() function, possibly after setting up process-space segment selectors and pushing argc and argv onto the stack. A simple implementation is:
 .global _start
 .extern main
 .extern exit
 _start:
     call main
     call exit
 .wait: hlt
     jmp .wait

It is also worth mentioning that crt0 can be written in C instead of Assembly. There are a couple of reasons why you may want to do so, including that you would be able to properly find the entry point to programs written in C++, without having to worry about name mangling or using C linkage. It is also easier (marginally) to handle the argc and argv parameters in C.


=== Changing the Default Include Directory ===
==== syscalls.c ====
This file should contain implementations for each of the system calls that newlib depends on. There is a list on the newlib website but I believe it to be slightly out of date as my version had some extra ones not documented there. Generally, each of these system calls should trigger an interrupt or use sysenter/syscall to run a kernel-space system call. As such, they are heavily OS-specific. A non-exhaustive list is:
<pre>
/* note these headers are all provided by newlib - you don't need to provide them */
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/fcntl.h>
#include <sys/times.h>
#include <sys/errno.h>
#include <sys/time.h>
#include <stdio.h>


If you wish to change the default include directory from <tt>/usr/include</tt>, you can override the <tt>native_system_header_dir</tt> variable in <tt>gcc/config.gcc</tt> in the case for your OS.
void _exit();
int close(int file);
char **environ; /* pointer to array of char * strings that define the current environment variables */
int execve(char *name, char **argv, char **env);
int fork();
int fstat(int file, struct stat *st);
int getpid();
int isatty(int file);
int kill(int pid, int sig);
int link(char *old, char *new);
int lseek(int file, int ptr, int dir);
int open(const char *name, int flags, ...);
int read(int file, char *ptr, int len);
caddr_t sbrk(int incr);
int stat(const char *file, struct stat *st);
clock_t times(struct tms *buf);
int unlink(char *name);
int wait(int *status);
int write(int file, char *ptr, int len);
int gettimeofday(struct timeval *p, struct timezone *z);
</pre>


=== Changing the Default Library Directory ===
==== configure.in ====
Configure script for our system directory.
AC_PREREQ(2.59)
AC_INIT([newlib], [NEWLIB_VERSION])
AC_CONFIG_SRCDIR([crt0.S])
AC_CONFIG_AUX_DIR(../../../..)
NEWLIB_CONFIGURE(../../..)
AC_CONFIG_FILES([Makefile])
AC_OUTPUT


If you wish to change the default library directory from <tt>/usr/lib</tt>, you can change it to <tt>/lib</tt> by adding the following block of code to the case just below the declaration of <tt>NATIVE_LIB_DIRS</tt> in <tt>binutils/ld/configure.tgt</tt> (around line 1056).
'''Attention:''' add a new-line to the end of the file, otherwise autoreconf in the next step may fail with a ''"INTERNAL ERROR: recursive push_string!"''.


<syntaxhighlight lang="bash">
==== Makefile.am ====
*-*-myos*)
A Makefile template for this directory
NATIVE_LIB_DIRS='/lib /local/lib'
<pre>
;;
AUTOMAKE_OPTIONS = cygnus
</syntaxhighlight>
INCLUDES = $(NEWLIB_CFLAGS) $(CROSS_CFLAGS) $(TARGET_CFLAGS)
AM_CCASFLAGS = $(INCLUDES)


=== Start Files Directory ===
noinst_LIBRARIES = lib.a


You can modify which directory GCC looks for the crt0.o, crti.o and crtn.o in. The path to that directory is stored in <tt>STANDARD_STARTFILE_PREFIX</tt>. For instance, if you change the library directory to <tt>/lib</tt> in Binutils and want GCC to match, you can add the following to <tt>gcc/config/myos.h</tt>:
if MAY_SUPPLY_SYSCALLS
extra_objs = $(lpfx)syscalls.o
else
extra_objs =
endif


<syntaxhighlight lang="C">
lib_a_SOURCES =
#undef STANDARD_STARTFILE_PREFIX
lib_a_LIBADD = $(extra_objs)
#define STANDARD_STARTFILE_PREFIX "/lib/"
EXTRA_lib_a_SOURCES = syscalls.c crt0.S
</syntaxhighlight>
lib_a_DEPENDENCIES = $(extra_objs)
lib_a_CCASFLAGS = $(AM_CCASFLAGS)
lib_a_CFLAGS = $(AM_CFLAGS)


Note that the trailing slash is important as the raw crt*.o names are appended without first adding a slash.
if MAY_SUPPLY_SYSCALLS
all: crt0.o
endif


ACLOCAL_AMFLAGS = -I ../../..
CONFIG_STATUS_DEPENDENCIES = $(newlib_basedir)/configure.host
</pre>


=== Dynamic linking ===
'''After this, you need to run 'autoconf' in the newlib/libc/sys/ directory, and 'autoreconf' in the newlib/libc/sys/myos directory.'''
If you want to support dynamic linking with your toolchain, you need to...
''Note: 'autoconf' and 'autoreconf' will only run with automake version <= 1.12 and autoconf version 2.64 (exactly) (applies to newlib source pulled from git repository July 31 2013)''
* Pass the <tt>"--enable-shared"</tt> flag to the <tt>./configure</tt> scripts of both binutils and gcc
* Set <tt>LINK_SPEC</tt> in your "gcc/config/myos.h" so the gcc driver will pass the correct flags to the linker:


<syntaxhighlight lang="C">
=== Signal handling ===
#undef LINK_SPEC
Newlib has two different mechanisms for dealing with UNIX signals (see the man pages for signal()/raise()). In the first, it provides its own emulation, where it maintains a table of signal handlers in a per-process manner. If you use this method, then you will only be able to respond to signals sent from within the current process. In order to support it, all you need to do is make sure your crt0 calls '_init_signal' before it calls main, which sets up the signal handler table.
#define LINK_SPEC "%{shared:-shared} %{static:-static} %{!shared: %{!static: %{rdynamic:-export-dynamic}}}"
</syntaxhighlight>


=== Linker Options ===
Alternatively, you can provide your own implementation. To do this, you need to define your own version of signal() in syscalls.c. A typical implementation would register the handler somewhere in kernel space, so that issuing a signal from another process causes the corresponding function to be called in the receiving process (this will also require some nifty stack-playing in the receiving process, as you are basically interrupting the program flow in the middle). You then need to provide a kill() function in syscalls.c which actually sends signals to another process. Newlib will still define a raise() function for you, but it is just a stub which calls kill() with the current process id. To switch newlib to this mode, you need to #define the SIGNAL_PROVIDED macro when compiling. A simple way to do this is to add the line:

newlib_cflags="${newlib_cflags} -DSIGNAL_PROVIDED"
You can modify the arguments passed to the linker using <tt>LINK_SPEC</tt>. This can be used to force 4KB alignment of sections on 64 bit systems, as ld defaults to 2MB alignment.
to your host's entry in newlib/configure.host. It would probably also make sense to provide sigaction(), and provide signal() as a wrapper for it. Note that [http://www.opengroup.org/onlinepubs/007908799/xsh/sigaction.html OpenGroup.org's] definition of sigaction states that 1) sigaction supersedes signal, and 2) an application designed shouldn't use both to manipulate the same signal.

<syntaxhighlight lang="C">
/* Tell ld to force 4KB pages*/
#undef LINK_SPEC
#define LINK_SPEC "-z max-page-size=4096"
</syntaxhighlight>

== Selecting a C Library ==


{{Main|C Library}}
= Building =
Configuring and building should be done in the build-xxx directory specific to the package you are building. Do not attempt to configure or build within the source directory. It is a method not supported by either myself or GNU. The configure option --disable-nls is optional, although I haven't tested without it.


At this point, you have to decide which [[C Library]] to use. You have options:
Note that on multi-processor build machines, you may be able to speed the build process by adding "-j X" (where X is the number of jobs to run in parallel) to the make commands below.


* [[Creating a C_Library|Create your own C library]].
Also note that if your build environment requires you to make the install target as a different user (such as root), you'll need to add $PREFIX/bin to the $PATH environment variable for that user as well.
* Pick an existing [[C Library]] such as [[Porting Newlib|Newlib]].


== Building ==
=== Note for Mac OS X users ===
The [[GCC_Cross-Compiler#MacOS_users.2C_beware|warning]] listed in the GCC Cross-Compiler article applies here as well. Make sure to read it and set up $CC/$CXX/$CPP/$LD to point to a "real" GCC, as opposed to LLVM-GCC, or your build will most likely fail!


{{Main|Hosted GCC Cross-Compiler}}
== Binutils ==
From /usr/src/build-binutils, run
../binutils-2.18/configure --target=$TARGET --prefix=$PREFIX --disable-werror
make
make install
export PATH=$PATH:$PREFIX/bin


Your OS specific toolchain is built differently from the introductory <tt>i686-elf</tt> toolchain as it has a user-space and standard library. In particular, you need to ensure your libc meets the minimum requirements for libgcc. You need to install the standard library headers into your [[Meaty_Skeleton#System_Root|System Root]] before building the cross-compiler. You need to tell the cross-binutils and cross-gcc where the system root is via the configure option <tt>--with-sysroot=/path/to/sysroot</tt>. You can then build your libc with your cross-compiler and then finally libstdc++ if desired.
== GCC ==
From /usr/src/build-gcc, run (only use --enable-languages=c if you haven't downloaded and unpacked gcc-g++!)
../gcc-4.2.1/configure --target=$TARGET --prefix=$PREFIX --disable-nls --enable-languages=c,c++
make all-gcc
make install-gcc


== Conclusion ==
You will also want libgcc. Run
make all-target-libgcc
make install-target-libgcc


You now have a <tt>i686-myos</tt> toolchain that can be used instead of your old <tt>i686-elf</tt> toolchain. Your new toolchain is effectively just a renamed <tt>i686-elf</tt> with customizations. You should switch all your operating system build scripts to use this new compiler, even the kernel and libk, as your new compiler is capable of providing a freestanding environment.


You will certainly wish to package up your custom toolchain (and be able to create a diff between the upstream version and your custom version for others to audit). Contributors should be able to download tarballs of your myos-binutils and myos-gcc packages, so they can build themselves your custom toolchain.
== Newlib ==
From /usr/src/build-newlib, run
../newlib-1.15.0/configure --target=$TARGET --prefix=$PREFIX
make
make install
and optionally
make install-info


== Libstdc++ ==
== Common errors ==
From /usr/src/build-gcc, run
make
make install
Note that libstdc++ can only be built after installing newlib, as it depends on libc.


=== Whitespaces ===
= Common errors =


Some files need tabs, some files need spaces and some files accept happily any mixture. Use an editor that can display special chars such as tabs and spaces, to be sure you use the right form. Whitespace errors may result in 'make' reporting missing separators. Some editors will replace a tab with four spaces, which will also cause invalid separator issues.
== Whitespaces ==
Some files need tabs, some files need spaces and some files accept happily any mixture. Use an editor that can display special chars such as tabs and spaces, to be sure you use the right form.
Whitespace errors may result in 'make' reporting missing separators. Some editors will replace a tab with four spaces, which will also cause invalid separator issues (Code::Blocks 8.02 does this - you need to switch it off in Settings > Editor)


== Autoconf ==
=== Autoconf ===
There are several steps that conclude in running 'autoconf' or 'autoreconf', be sure you did not miss them. The order of autoconf/-reconf calls in a packet is important.
These errors may result in missing subdirectories of the build-* directory and/or 'make' reporting missing targets.


There are several steps that conclude in running '<tt>autoconf</tt>' or '<tt>automake</tt>', '<tt>autoreconf</tt>', be sure you did not miss them. The order of autoconf/-reconf calls in a package is important. These errors may result in missing subdirectories of the build-* directory and/or 'make' reporting missing targets.
= Caveats =
== Using the cross toolchain to compile your kernel ==
Generally you don't want libc and crt0.o in your kernel. Link your kernel with the -nostdlib option to i586-pc-myos-gcc. Yes, gcc is also a front-end to ld, and is probably the preferred way to interface to it. You should specify a linker script to link your kernel, otherwise the kernel's .text section will be at the same location as a process' .text section, which is a bad thing. Use the -Wl,-Tlinkerscript.ld option.


== See Also ==
== Linking a kernel with libsupc++ ==
You can use your libsupc++ to get exception handling and RTTI in a C++ kernel (no more passing -fno-exceptions -fno-rtti to g++!) so you can use things like throw and dynamic_cast<>. Libsupc++ depends upon [[libgcc]] for stack unwinding support. Passing the -nostdlib option to gcc when linking caused libgcc.a and libsupc++.a to not be included, so you need to specify -lgcc -lsupc++ on the command line (no need to specify the directories; gcc knows where it installed them to). In addition, you need to include a .eh_frame section in your linker script and terminate it with 32 bits of zeros (QUAD(0) is a useful linker script command). The symbol start_eh_frame should point to the start of the eh_frame section, and it should be aligned by 4. In addition you need to include your constructors and destructors in the link (see [[C++]] for details). You also need to provide __register_frame() (or call the function provided by libgcc with the start of your .eh_frame section), void *__dso_handle;, __cxa_atexit() and __cxa_finalize (again see [[C++]]). Something along the lines of
#include <reent.h>
static struct _reent global_reent;
struct _reent *_impure_ptr = &global_reent;
somewhere in your kernel will keep libgcc happy, because it expects these bits to be provided by newlib (which you aren't linking into your kernel). Libgcc expects a number of (simple) C library functions to be provided (again normally by newlib) by your kernel, including abort, malloc, free, memcpy, memset and strlen. Libsupc++ also requires write, fputs, fputc, fwrite, strcpy and strcat for debugging output.


= Disclaimer =
=== Articles ===
This tutorial is provided for people who are already happy compiling a [[GCC Cross-Compiler]] and are generally sure of what they are doing. It is not, in any way, intended to replace the already excellent articles on [[GCC Cross-Compiler]] and [[Porting Newlib]] but is instead intended as a further resource for those who wish to take another step towards having their OS become self-hosting. The other tutorials mentioned are extensively tested and known to work, whereas the steps described here (given their increased complexity) are expected to be less well tested and could well include a number of bugs.


* [[GCC]]
=See Also=
===External Links===
* Inspired by this article, [[Boomstick]] is a script to build a complete GCC toolchain, including newlib, for your OS. Just fill in the stubs! (or don't :)


[[Category:Tools]]
[[Category:Toolchains]]
[[Category:Tutorials]]
[[Category:Tutorials]]

Latest revision as of 18:38, 16 June 2024

Difficulty level

Advanced

This tutorial will guide you through creating a toolchain comprising Binutils and GCC that specifically targets your operating system. The instructions below teach Binutils and GCC how to create programs for a hypothetical OS named 'MyOS'.

Until now you have been using a cross-compiler configured to use an existing generic bare target. This is very convenient when starting out as you get a reliable target and the compiler doesn't make any bad assumptions because it thinks it is targeting an existing operating system. However, when you proceed it becomes useful if the compiler knows it is targeting your operating system and what its customs are. For instance, you can make the compiler define a __myos__ preprocessor macro, know which directories to search for include files in, what special crt*.o files are used when linking against libc, and so on. It also becomes much easier to cross-compile software to your OS when you simply have to invoke x86_64-myos-gcc hello.c -o hello to cross-compile a program. Additionally part of the instructions here can be applied to other software packages that also use the GNU build system, which will help you port existing software.

This tutorial teaches you how to set up a cross-compiler that specifically targets your OS. This is actually the first step of porting Binutils and GCC to your operating system: any information you give GCC about your OS will help it run on your OS. Once your OS Specific Toolchain has been set up and you have built your OS with it, you can continue by using the cross-compiler to cross-compile the compiler itself to your OS, assuming your libc and kernel are powerful enough. For more information see Porting GCC to your OS.

Introduction

You need the following before you get started:

  • A build environment that can successfully build a GCC Cross-Compiler.
  • autoconf (exactly version 2.69)
  • automake (exactly version 1.15.1)
  • libtool
  • The latest Binutils source code (2.39 at time of writing).
  • The latest GCC source code (12.2.0 at time of writing).
  • Knowledge of the internals of Binutils and GCC.
  • Knowledge of autoconf and automake.
  • The dependencies of Binutils and GCC as detailed in GCC Cross-Compiler.

If you're compiling a later version of GCC/binutils, find versions of autoconf and automake that were released just prior to your version of GCC/binutils. See the Cross-Compiler Successful Builds page for known compatible versions of GCC and binutils.

Additionally you will need a C Library as described in a later section. As detailed in Hosted GCC Cross-Compiler, it doesn't need to support much and the functionality can be stubbed, but libgcc will need to believe you have a libc.

You should decide exactly what targets you'll add to Binutils and GCC. If you have been using a generic i686-elf or x86_64-elf or such target, you'll simply want to swap -elf with -myos and get i686-myos and x86_64-myos. Naturally, don't actually write myos, but rather use the name of your OS converted to lower case. See Target Triplet.
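As a small illustration of the renaming, the following shell sketch derives the new triplet from an existing bare-metal one. The helper name to_myos_triplet is made up for this example, and myos is the same placeholder used throughout this tutorial.

```shell
# Hypothetical helper: swap the trailing "-elf" of a bare-metal triplet
# for the name of your OS (here the placeholder "myos").
to_myos_triplet() {
    case "$1" in
        *-elf) printf '%s-myos\n' "${1%-elf}" ;;
        *)     printf '%s\n' "$1" ;;   # leave anything else untouched
    esac
}

TARGET=$(to_myos_triplet i686-elf)
echo "$TARGET"
```

The resulting value is what you would later pass as the target name when configuring Binutils and GCC.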

This tutorial currently only has instructions for adding new x86 and x86_64 targets for myos, but it serves as a good enough example that it should be straightforward to add more processors by basing them on these instructions and on what other operating systems have done.


Making your changes reproducible

Since at some point you might have a contributor who wants to participate in your project, you likely want to make your toolchain setup reproducible. Instead of downloading the tarballs, it is therefore a good idea to clone the repositories of binutils (https://sourceware.org/git/binutils-gdb.git) and gcc (https://gcc.gnu.org/git/gcc.git) locally.

All release versions are tagged within these repositories, so you can use "git checkout binutils-2_39" and "git checkout releases/gcc-12.2.0" to switch to the correct versions respectively. After making the described changes, you can easily create a patch using "git diff" which you can reuse later. Another option would be to fork one of the mirror repositories on GitHub.
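One way to capture the changes as a patch is sketched below. The function name make_toolchain_patch and the paths in the usage comment are assumptions for illustration. The sketch stages everything before diffing, because a plain "git diff" does not include files that are new and untracked (such as the emulparams scripts or myos.h added later in this tutorial).

```shell
# Sketch: write every change made on top of a release tag into one patch.
# $1 = path to the clone, $2 = release tag, $3 = output patch file
# (keep the output file outside the clone so it is not staged itself).
make_toolchain_patch() {
    repo=$1 tag=$2 out=$3
    # Stage everything first so that brand-new files are part of the diff.
    git -C "$repo" add -A &&
    git -C "$repo" diff --cached "$tag" > "$out"
}

# Example use (paths are placeholders):
#   make_toolchain_patch ~/src/binutils-gdb binutils-2_39 myos-binutils.patch
#   make_toolchain_patch ~/src/gcc releases/gcc-12.2.0 myos-gcc.patch
```

A contributor can then check out the same tag and apply the patch with "git apply".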


Modifying Binutils

config.sub

This is a file you will modify in the same way for each package. It is a GNU standard file produced by including the line 'AC_CANONICAL_SYSTEM' in a configure.ac that is processed by autoconf. It converts a canonical name of the form i686-pc-myos into separate variables for the processor, vendor and OS, and rejects systems it doesn't know about. We simply need to add 'myos' to the list of acceptable operating systems. Find the section that begins with the comment "Now accept the basic system types" (it begins '-gnu*') and add '-myos*' to the list. Find a line with some free room and add your entry there.

bfd/config.bfd

This file is part of the configuration for libbfd, the back-end to Binutils which provides a consistent interface for many object file formats. Generally, each platform-specific version of Binutils contains a libbfd which only supports the object files normally in use on that system, as otherwise the library would be massive (libbfd can support a _lot_ of object types). We need to associate our os with some particular object types. There is a long list starting 'WHEN ADDING ENTRIES TO THIS MATRIX' with the first line as 'case "${targ}" in'. We need to add our full canonical name to this list, by adding some cases such as:

  i[3-7]86-*-myos*)
    targ_defvec=i386_elf32_vec
    targ_selvecs=
    targ64_selvecs=x86_64_elf64_vec
    ;;
#ifdef BFD64
  x86_64-*-myos*)
    targ_defvec=x86_64_elf64_vec
    targ_selvecs=i386_elf32_vec
    want64=true
    ;;
#endif

Note: If using binutils-2.24 or older, change i386_elf32_vec to bfd_elf32_i386_vec and x86_64_elf64_vec to bfd_elf64_x86_64_vec.

Be sure to follow the instructions in the comment block above the list and add your entry beneath the comment "#START OF targmatch.h". If you like, you could support different object formats (look at other entries in the list, and the contents of 'bfd' for hints) and also provide more than one to the targ_selvecs line. For instance, you can support coff object files if you add i386coff_vec to the targ_selvecs list. All the x86_64 entries in the file are wrapped in #ifdef BFD64: when targmatch.sed processes this file to turn it into targmatch.h, it carries the #ifdef BFD64 over so that the relevant code is only compiled when targeting a 64-bit platform.

gas/configure.tgt

This file tells the gnu assembler what type of output to generate for each target. It automatically matches the i686 part of your target and generates the correct output for that. We just need to tell it what type of object file to generate for myos. In the section starting 'Assign object format ... case ${generic_target} in' you need to add a line like

  i386-*-myos*)    fmt=elf ;;

You should use 'i386' in this line even if you are targeting x86_64. This is the only file where you should do it. It is basically because the variable 'generic_target' is not your canonical target name, but rather a variable generated further up in the configure.tgt file, and it sets the first part to i386 for any i[3-7]86 or x86_64.
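The effect can be pictured with the following shell sketch. It is a simplified stand-in written for this article, not a copy of the code in gas/configure.tgt, and it assumes the input is already a canonical cpu-vendor-os name; the function name generic_target_for is made up.

```shell
# Simplified illustration of how generic_target is derived; the real
# gas/configure.tgt handles many more CPUs and sets further variables.
generic_target_for() {
    cpu=${1%%-*}      # text before the first dash, e.g. "x86_64"
    rest=${1#*-}      # vendor and OS, e.g. "pc-myos"
    case $cpu in
        i[3-7]86 | x86_64) cpu_type=i386 ;;
        *)                 cpu_type=$cpu ;;
    esac
    printf '%s-%s\n' "$cpu_type" "$rest"
}

generic_target_for i686-pc-myos
generic_target_for x86_64-pc-myos
```

Both calls print i386-pc-myos, which is why a single i386-*-myos* pattern covers the 32-bit and the 64-bit target.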

Note: this will use the 'generic' emulation. One side-effect is that gas will interpret slash ('/') as a comment, not as a division operator. This will break any code like "movl $(ADDRESS/PAGE_SIZE), %eax". Using "fmt=elf em=gnu ;;" or "fmt=elf em=linux ;;" will disable slash as a comment character.

ld/configure.tgt

This file tells the gnu linker what 'emulation' to use for each target. An emulation is basically a combination of linker script and executable file format. We are going to define our own emulation called elf_i386_myos. We need to add an entry to the case statement here after 'Please try to keep this table more or less in alphabetic order ... case "${targ}" in':

i[3-7]86-*-myos*)
			targ_emul=elf_i386_myos
			targ_extra_emuls=elf_i386
			targ64_extra_emuls="elf_x86_64_myos elf_x86_64"
			;;
x86_64-*-myos*)
			targ_emul=elf_x86_64_myos
			targ_extra_emuls="elf_i386_myos elf_x86_64 elf_i386"
			;;
  • elf_i386_myos is a 32-bit target for your OS.
  • elf_x86_64_myos is a 64-bit target for your OS.
  • elf_i386 is a bare 32-bit target as you had with i686-elf.
  • elf_x86_64 is a bare 64-bit target as you had with x86_64-elf.

This setup provides you with a 32-bit toolchain that can also produce 64-bit executables, and a 64-bit toolchain that can also produce 32-bit executables. This comes in handy if you use objcopy, for instance. You can also add targ_extra_emuls entries to specify other targets ld should support. See the ld/configure.tgt file for examples.

ld/emulparams/elf_i386_myos.sh

Now we need to actually define our emulation. There is a generic file called ld/genscripts.sh which creates the required linker scripts for our target (you need more than one, depending on shared object usage and the like: I have 13 for a single target). It uses a linker script template (from the ld/scripttempl directory) to do this, and it creates the actual emulation C file from an emulation template (from the ld/emultempl directory). These templates are customised by running a script in the ld/emulparams directory which sets various variables. You are welcome to define your own emulation and linker templates, but I find the ELF ones adequate, given that they can be customised by simply adding a file to the emulparams directory. This is what we are going to do now. The content of the file could be something like:

source_sh ${srcdir}/emulparams/elf_i386.sh
TEXT_START_ADDR=0x08000000

This script is included by ld/genscripts.sh to customize its behavior through shell variables. We include the base elf_i386.sh script as it sets reasonable defaults. Finally, we override the variables whose defaults we disagree with.

There are a large number of variables that can be set here to customize your toolchain. Read the documentation and look at existing emulations for further information. These are some of the variables that can be set:

  • GENERATE_SHLIB_SCRIPT=yes|no Whether to generate a linker script for shared libraries. We enable this as you might want it later.
  • GENERATE_PIE_SCRIPT=yes|no Whether to generate a linker script for position independent executables. We enable this as you might want it later.
  • SCRIPT_NAME=name Controls which ld/scripttempl/name.sc script generates our linker scripts.
  • TEMPLATE_NAME=name Controls which ld/emultempl/name.em script generates our bfd emulation C implementation.
  • OUTPUT_FORMAT=name The name of the BFD output target we use.
  • TEXT_START_ADDR=0xvalue Controls where the executable begins in memory.

You can read the base elf_i386.sh script for the defaults of these variables and then decide for yourself whether you wish to override them for your operating system.

ld/emulparams/elf_x86_64_myos.sh

This file is just like the above ld/emulparams/elf_i386_myos.sh but for x86_64.

source_sh ${srcdir}/emulparams/elf_x86_64.sh

ld/Makefile.am

We now just need to tell make how to produce the emulation C file for our specific emulation. Putting the 'targ_emul=elf_i386_myos' line into ld/configure.tgt above implies that your host linker will try to link your target ld executable with an object file called eelf_i386_myos.o. There is a default rule to generate this from eelf_i386_myos.c, so we just need to tell it how to make this eelf_i386_myos.c file. As stated above, we let the genscripts.sh file do the hard work. You need to add eelf_i386_myos.c to the ALL_EMULATION_SOURCES list; you also need to add eelf_x86_64_myos.c to the ALL_64_EMULATION_SOURCES list if applicable.

Note: You must run automake in the ld directory after you modify Makefile.am to regenerate Makefile.in.
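The naming convention is easy to get wrong, so here is a tiny shell sketch that prints the file names to add for a given set of emulation names. The function name emulation_sources is made up for this example.

```shell
# Print the emulation C file that corresponds to each emulation name.
# The leading "e" is the naming convention described above.
emulation_sources() {
    for emul in "$@"; do
        printf 'e%s.c\n' "$emul"
    done
}

emulation_sources elf_i386_myos elf_x86_64_myos
```

This prints eelf_i386_myos.c and eelf_x86_64_myos.c, the two names that go into ALL_EMULATION_SOURCES and ALL_64_EMULATION_SOURCES respectively.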

Modifying GCC

config.sub

Similar modification to config.sub in Binutils.

gcc/config.gcc

This file defines what needs to be built for each particular target and what to include in the final executable. There are two main sections: one which defines generic options for your operating system, and one which defines options specific to your operating system on each individual machine type.

For the first part, find the 'case ${target} in' line just after '# Common parts for widely ported systems' (around line 688) and add something like:

*-*-myos*)
  gas=yes
  gnu_ld=yes
  default_use_cxa_atexit=yes
  use_gcc_stdint=provide
  ;;
  • gas=yes Our operating system uses the GNU assembler by default.
  • gnu_ld=yes Our operating system uses the GNU linker by default.
  • default_use_cxa_atexit=yes We will provide __cxa_atexit (you will need to provide this in your standard library).
  • use_gcc_stdint=provide This instructs GCC to provide you with a stdint.h appropriate for your target. Change provide to wrap if you have your own stdint.h, to make GCC wrap yours.

The second section we need to add to is the architecture-specific one. Find the 'case ${target} in' line just before 'tm_file="${tm_file} elfos.h newlib-stdint.h"' (around line 1094) and add something like:

i[34567]86-*-myos*)
    tm_file="${tm_file} i386/unix.h i386/att.h elfos.h glibc-stdint.h i386/i386elf.h myos.h"
    ;;
x86_64-*-myos*)
    tm_file="${tm_file} i386/unix.h i386/att.h elfos.h glibc-stdint.h i386/i386elf.h i386/x86-64.h myos.h"
    ;;

This defines which target configuration header files get used. You can make i386/myos32.h and i386/myos64.h files if desired.

gcc/config/myos.h

This header allows you to customize your toolchain using preprocessor macros. The relevant parts of GCC will include this header (as controlled by gcc/config.gcc) and modify the behavior according to your customizations.

You can explore gcc/defaults.h for a full list of things you can modify, and more importantly, the assumptions GCC will make about various aspects of your target. For instance, if PID_TYPE is not defined in myos.h, then GCC will default it to int, which can be problematic if that's not what your pid_t is defined as.

/* Useful if you wish to make target-specific GCC changes. */
#undef TARGET_MYOS
#define TARGET_MYOS 1

/* Default arguments you want when running your
   i686-myos-gcc/x86_64-myos-gcc toolchain */
#undef LIB_SPEC
#define LIB_SPEC "-lc" /* link against C standard library */

/* Files that are linked before user code.
   The %s tells GCC to look for these files in the library directory. */
#undef STARTFILE_SPEC
#define STARTFILE_SPEC "crt0.o%s crti.o%s crtbegin.o%s"

/* Files that are linked after user code. */
#undef ENDFILE_SPEC
#define ENDFILE_SPEC "crtend.o%s crtn.o%s"

/* Additional predefined macros. */
#undef TARGET_OS_CPP_BUILTINS
#define TARGET_OS_CPP_BUILTINS()      \
  do {                                \
    builtin_define ("__myos__");      \
    builtin_define ("__unix__");      \
    builtin_assert ("system=myos");   \
    builtin_assert ("system=unix");   \
    builtin_assert ("system=posix");   \
  } while (0)
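Once the toolchain has been built and installed, you can check that the extra predefined macros took effect. The sketch below assumes the compiler is installed as i686-myos-gcc and is on your PATH; the function name has_predefined_macro is made up for this example.

```shell
# Return success if compiler $1 predefines macro $2.
# -dM -E makes GCC print its predefined macros instead of compiling.
has_predefined_macro() {
    "$1" -dM -E - </dev/null | grep -q "^#define $2 "
}

# Only meaningful once the toolchain is installed and on PATH:
if command -v i686-myos-gcc >/dev/null 2>&1; then
    has_predefined_macro i686-myos-gcc __myos__ && echo "__myos__ is defined"
fi
```

The same check works for __unix__ or any other macro you add to TARGET_OS_CPP_BUILTINS.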

libstdc++-v3/crossconfig.m4

This file describes how the libstdc++ configure file will examine your operating system and adjust the provided features of libstdc++ accordingly. Add a case similar to

  *-myos*)
    GLIBCXX_CHECK_COMPILER_FEATURES
    GLIBCXX_CHECK_LINKER_FEATURES
    GLIBCXX_CHECK_MATH_SUPPORT
    GLIBCXX_CHECK_STDLIB_SUPPORT
    ;;

Note: You need to run autoconf in the libstdc++-v3 directory.

libgcc/config.host

Find the 'case ${host} in' just prior to 'extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"' (around line 368) and add the cases:

i[34567]86-*-myos*)
	extra_parts="$extra_parts crti.o crtbegin.o crtend.o crtn.o"
	tmake_file="$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic"
	;;
x86_64-*-myos*)
	extra_parts="$extra_parts crti.o crtbegin.o crtend.o crtn.o"
	tmake_file="$tmake_file i386/t-crtstuff t-crtstuff-pic t-libgcc-pic"
	;;
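
These patterns are matched against the canonical target name, which carries a vendor field. The sketch below replays the two patterns in plain shell to show which names they accept. It assumes that config.sub expands a short name such as i686-myos to i686-pc-myos, as it does for other i?86 targets; the function name is made up for illustration.

```shell
#!/bin/sh
# matches_libgcc_case TRIPLET: succeed if TRIPLET would hit one of the
# two MyOS cases above. The function name is hypothetical.
matches_libgcc_case() {
    case $1 in
        i[34567]86-*-myos*) return 0 ;;
        x86_64-*-myos*)     return 0 ;;
        *)                  return 1 ;;
    esac
}

for t in i686-pc-myos x86_64-pc-myos i686-myos i686-elf; do
    if matches_libgcc_case "$t"; then
        echo "$t: matched"
    else
        echo "$t: not matched"
    fi
done
```

The two canonical names match. The short form i686-myos does not, because the pattern expects a vendor field between the CPU and the OS, and i686-elf does not because it names a different OS.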

fixincludes/mkfixinc.sh

You should disable fixincludes for your operating system. Find the case statement and add a pattern for your operating system. For instance:

# Check for special fix rules for particular targets
case $machine in
    *-myos* | \
    *-*-myos* | \
    i?86-*-cygwin* | \
	# (... snip ...)
    powerpcle-*-eabi* )
	#  IF there is no include fixing,
	#  THEN create a no-op fixer and exit
	(echo "#! /bin/sh" ; echo "exit 0" ) > ${target}
        ;;

A number of operating systems (especially older and obscure ones) provide troublesome system headers that fail to strictly comply with various standards. The GCC developers consider it their job to fix these headers. GCC will look into your system root, apply a bunch of patterns to detect headers it doesn't like, then copy each such header into a private GCC system directory (that overrides your standard system directory) and attempt to fix it. Sometimes fixincludes even breaks working headers (some people refer to it as breakincludes).

This is rather inconvenient as your libc will likely happen to trigger these patterns (and false positives often happen). Any time you change your system headers, you have to rebuild your compiler so the fixed versions get updated. The first time you encounter this, it will show up as a system header that behaves no differently even though you edited it.

This addition to the mkfixinc.sh file forcefully disables fixincludes for your operating system. It's your job to provide working system headers, not the compiler developers'.
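
To check whether an installed compiler still carries "fixed" copies of your headers, you can list its private include-fixed directory. This is a sketch under assumptions: the helper name is made up, the driver is assumed to be installed as i686-myos-gcc, and the three file names it skips are the ones GCC normally places in that directory itself, which is worth confirming against your GCC version.

```shell
#!/bin/sh
# list_fixed_headers DIR: print files under DIR other than the ones GCC
# itself is assumed to install there (README, limits.h, syslimits.h).
# The helper name is hypothetical.
list_fixed_headers() {
    find "$1" -type f ! -name README ! -name limits.h ! -name syslimits.h
}

# Usage, assuming the cross-compiler is installed as i686-myos-gcc:
#   list_fixed_headers "$(i686-myos-gcc -print-file-name=include-fixed)"
```

Any header printed here is a candidate for the "edit has no effect" symptom described above.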

Further Customization

TODO: Document more various tips and tricks for further customization of OS specific toolchains.

Changing the Default Include Directory

If you wish to change the default include directory from /usr/include, you can override the native_system_header_dir variable in gcc/config.gcc in the case for your OS.

Changing the Default Library Directory

If you wish to change the default library directory from /usr/lib, you can change it to /lib by adding the following block of code to the case just below the declaration of NATIVE_LIB_DIRS in binutils/ld/configure.tgt (around line 1056).

*-*-myos*)
  NATIVE_LIB_DIRS='/lib /local/lib'
  ;;

Start Files Directory

You can change the directory in which GCC looks for crt0.o, crti.o and crtn.o. The path to that directory is stored in STANDARD_STARTFILE_PREFIX. For instance, if you change the library directory to /lib in Binutils and want GCC to match, you can add the following to gcc/config/myos.h:

#undef STANDARD_STARTFILE_PREFIX
#define STANDARD_STARTFILE_PREFIX "/lib/"

Note that the trailing slash is important as the raw crt*.o names are appended without first adding a slash.
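
The effect of the missing slash is easy to see with plain string concatenation. The sketch below only models how the prefix and the file name are joined; the helper name startfile_path is made up for illustration.

```shell
#!/bin/sh
# startfile_path PREFIX NAME: join PREFIX and NAME with no separator,
# modelling how the crt*.o name is appended to the prefix.
# The helper name is hypothetical.
startfile_path() {
    printf '%s%s\n' "$1" "$2"
}

startfile_path "/lib/" crt0.o   # prints: /lib/crt0.o
startfile_path "/lib" crt0.o    # prints: /libcrt0.o
```

On a built toolchain, i686-myos-gcc -print-file-name=crt0.o (assuming that driver name) reports where the driver actually found the file, which is a quick way to confirm the prefix is right.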


Dynamic linking

If you want to support dynamic linking with your toolchain, you need to...

  • Pass the "--enable-shared" flag to the ./configure scripts of both binutils and gcc
  • Set LINK_SPEC in your "gcc/config/myos.h" so the gcc driver will pass the correct flags to the linker:
#undef LINK_SPEC
#define LINK_SPEC "%{shared:-shared} %{static:-static} %{!shared: %{!static: %{rdynamic:-export-dynamic}}}"
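
The spec string above is compact, so it may help to spell out what it does. The following is a shell model of the spec, not GCC code: it takes driver arguments and prints the flags the spec would hand to ld. The function name is made up for illustration.

```shell
#!/bin/sh
# link_flags ARGS...: a shell model (not GCC code) of what the LINK_SPEC
# above passes to ld for a given set of driver arguments.
link_flags() {
    shared=0; static=0; rdynamic=0
    for arg in "$@"; do
        case $arg in
            -shared)   shared=1 ;;
            -static)   static=1 ;;
            -rdynamic) rdynamic=1 ;;
        esac
    done
    out=""
    # %{shared:-shared} and %{static:-static}
    if [ "$shared" = 1 ]; then out="$out -shared"; fi
    if [ "$static" = 1 ]; then out="$out -static"; fi
    # %{!shared: %{!static: %{rdynamic:-export-dynamic}}}
    if [ "$shared" = 0 ] && [ "$static" = 0 ] && [ "$rdynamic" = 1 ]; then
        out="$out -export-dynamic"
    fi
    printf '%s\n' "${out# }"
}

link_flags -shared             # prints: -shared
link_flags -rdynamic           # prints: -export-dynamic
link_flags -static -rdynamic   # prints: -static
```

To see the real linker command line, run your driver with -### added to your usual arguments; it prints the commands it would execute without running them.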

Linker Options

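You can modify the arguments passed to the linker using LINK_SPEC. This can be used to force 4KB alignment of sections on 64-bit systems, as ld defaults to 2MB alignment. LINK_SPEC can only be defined once, so if you also set it for dynamic linking, merge both sets of options into a single definition.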

/* Tell ld to force 4KB pages. */
#undef LINK_SPEC
#define LINK_SPEC "-z max-page-size=4096"
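
To check that the page size setting reached the linker, you can look at the alignment of the LOAD segments in a linked program. This is a sketch: the helper name is made up, and the x86_64-myos-readelf tool name assumes binutils was installed with that target prefix.

```shell
#!/bin/sh
# load_alignments: read `readelf -lW` output on stdin and print the Align
# column (the last field) of every LOAD segment.
# The helper name is hypothetical.
load_alignments() {
    awk '$1 == "LOAD" { print $NF }'
}

# Usage, assuming binutils was installed with the x86_64-myos- prefix:
#   x86_64-myos-readelf -lW hello | load_alignments
```

With the option in effect, each line should read 0x1000 rather than 0x200000.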

Selecting a C Library

Main article: C Library

At this point, you have to decide which C Library to use. You have options:

Building

Main article: Hosted GCC Cross-Compiler

Your OS-specific toolchain is built differently from the introductory i686-elf toolchain because it has a user-space and a standard library. In particular, your libc must meet the minimum requirements of libgcc. Install the standard library headers into your System Root before building the cross-compiler, and tell both the cross-binutils and the cross-gcc where the system root is with the configure option --with-sysroot=/path/to/sysroot. You can then build your libc with the cross-compiler and, finally, libstdc++ if desired.
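
The build order described above can be sketched as a script. Every path below is an assumption, including the x.y.z version placeholders, and the names run_in and build_toolchain are made up for illustration. Setting DRY_RUN=1 prints the commands instead of running them. See the main article for the authoritative steps.

```shell
#!/bin/sh
# Sketch of the build order. All paths are assumptions; adjust to your tree.
TARGET=${TARGET:-i686-myos}
PREFIX=${PREFIX:-$HOME/opt/cross}
SYSROOT=${SYSROOT:-$HOME/myos/sysroot}
BINUTILS_SRC=${BINUTILS_SRC:-$HOME/src/binutils-x.y.z}
GCC_SRC=${GCC_SRC:-$HOME/src/gcc-x.y.z}

# run_in DIR CMD...: run CMD inside DIR, or just print it when DRY_RUN=1.
run_in() {
    dir=$1
    shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "(cd $dir && $*)"
    else
        mkdir -p "$dir" && (cd "$dir" && "$@")
    fi
}

build_toolchain() {
    # The libc headers must already be installed in the sysroot here.
    run_in build-binutils "$BINUTILS_SRC/configure" --target="$TARGET" \
        --prefix="$PREFIX" --with-sysroot="$SYSROOT" --disable-werror &&
    run_in build-binutils make &&
    run_in build-binutils make install &&
    run_in build-gcc "$GCC_SRC/configure" --target="$TARGET" \
        --prefix="$PREFIX" --with-sysroot="$SYSROOT" \
        --enable-languages=c,c++ &&
    run_in build-gcc make all-gcc all-target-libgcc &&
    run_in build-gcc make install-gcc install-target-libgcc
    # Next: build and install your libc with the new $TARGET-gcc, then
    # optionally `make all-target-libstdc++-v3 install-target-libstdc++-v3`
    # in build-gcc.
}

# Usage: DRY_RUN=1 build_toolchain   (prints the six commands)
```

Building only all-gcc and all-target-libgcc first matters because libstdc++ cannot be built until a libc exists in the sysroot.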

Conclusion

You now have an i686-myos toolchain that can be used instead of your old i686-elf toolchain. Your new toolchain is effectively just a renamed i686-elf with customizations. You should switch all your operating system build scripts to use this new compiler, even for the kernel and libk, as your new compiler is capable of providing a freestanding environment.

You will certainly wish to package up your custom toolchain (and be able to create a diff between the upstream version and your custom version for others to audit). Contributors should be able to download tarballs of your myos-binutils and myos-gcc packages so they can build your custom toolchain themselves.
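
A diff against upstream can be produced with a recursive unified diff of the pristine and modified source trees. This is a minimal sketch: the function name and the directory names in the usage line are placeholders.

```shell
#!/bin/sh
# make_patch ORIG_DIR MODIFIED_DIR: print a unified diff between two source
# trees, treating files that exist on only one side as empty on the other.
# The function name is hypothetical.
make_patch() {
    diff -ruN "$1" "$2"
    status=$?
    # diff exits with 1 when the trees differ, which is the expected case;
    # only 2 and above indicate trouble.
    [ "$status" -le 1 ]
}

# Usage, from the directory that contains both trees:
#   make_patch gcc-x.y.z.orig gcc-x.y.z > myos-gcc.patch
```

Running it with relative directory names keeps the paths in the patch short, so it can be applied inside a fresh upstream tree with patch -p1.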

Common errors

Whitespaces

Some files need tabs, some need spaces, and some happily accept any mixture. Use an editor that can display special characters such as tabs and spaces to be sure you use the right form. Whitespace errors may result in 'make' reporting missing separators. Some editors replace a tab with four spaces, which also causes missing separator errors.
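
If your editor cannot show whitespace, a quick command-line check can find suspicious lines in a makefile fragment. This is a sketch with a made-up helper name. It flags every line that begins with a space, so expect some harmless hits, such as indented variable assignments.

```shell
#!/bin/sh
# space_indented FILE: print, with line numbers, the lines of FILE that
# begin with a space. In a make recipe these usually need a tab instead.
# The helper name is hypothetical.
space_indented() {
    grep -n '^ ' "$1"
}

# Usage: space_indented gcc/config/t-myos
# To see tabs explicitly, `sed -n l FILE` prints each tab as \t.
```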

Autoconf

Several steps end with running 'autoconf', 'automake' or 'autoreconf'; make sure you did not miss any of them. The order of the autoconf/autoreconf calls within a package matters. Mistakes here may result in missing subdirectories of the build-* directory and/or 'make' reporting missing targets.

See Also

Articles