How kernel, compiler, and C library work together: Difference between revisions

m
Use {{Main}}
[unchecked revision][unchecked revision]
(Nuke useless section that doesn't fit in this article)
m (Use {{Main}})
 
(3 intermediate revisions by 3 users not shown)
Line 1:
==Kernel==
 
The kernel is the core of an operating system. In a traditional design, it is responsible for memory management, I/O, interrupt handling, and various other things. And even while some modern designs like [[Microkernel]]s or [[Exokernel]]s move several of these services into user space, this matters little in the scope of this document.
 
The kernel makes its services available through a set of system calls; how they are called and what they do exactly differs from kernel to kernel.
 
==C Library==
:''{{Main Articles: See [[|C Library]], [[|Creating a C Library]]''}}
 
:''Main Articles: See [[C Library]], [[Creating a C Library]]''
 
One thing up front: When you begin working on your kernel, you do not have a C library available. You have to provide everything yourself, except a few pieces provided by the compiler itself. You will also have to port an existing C library or write one yourself.
 
Line 18 ⟶ 14:
 
A more elaborate example is available in [[Library Calls]] or, you can use an existing [[C Library]] or [[Creating a C Library|create your own C Library]].
 
==Compiler / Assembler==
 
An Assembler takes (plaintext) source code and turns it into (binary) machine code; more precisely, it turns the source into ''object'' code, which contains additional information like symbol names, relocation information etc.
 
Line 27 ⟶ 21:
The resulting object code does ''not'' yet contain any code for standard functions called. If you <tt>include</tt>d e.g. <tt><stdio.h></tt> and used <tt>printf()</tt>, the object code will merely contain a ''reference'' stating that a function named <tt>printf()</tt> (and taking a <tt>const char *</tt> and a number of unnamed arguments as parameters) must be linked to the object code in order to receive a complete executable.
 
Some compilers use standard library functions ''internally'', which might result in object files referencing e.g. <tt>memset()</tt> or <tt>memcpy()</tt> even though you did not include the header or used a function of this name. You will have to provide an implementation of these functions to the linker, or the linking will fail. The gccGCC frestandingfreestanding environment expects only the functions <tt>memset()</tt>, <tt>memcpy()</tt>, <tt>memcmp()</tt>, and <tt>memmove()</tt>, as well as the [[libgcc]] library. Some advanced operations (e.g. 64-bits divisions on a 32-bits system) might involve ''compiler-internal'' functions. For [[GCC]], those functions are residing in libgcc. The content of this library is agnostic of what OS you use, and it won't taint your compiled kernel with licensing issues of whatever sort.
 
==Linker==
 
A linker takes the object code generated by the compiler / assembler, and ''links'' it against the C library (and / or libgcc.a or whatever link library you provide). This can be done in two ways: static, and dynamic.
 
===Static Linking===
 
When linking statically, the linker is invoked during the build process, just after the compiler / assembler run. It takes the object code, checks it for unresolved references, and checks if it can resolve these references from the available libraries. It then adds the binary code from these libraries to the executable; after this process, the executable is ''complete'', i.e. when running it does not require anything but the kernel to be present.
 
On the downside, the executable can become quite large, and code from the libraries is duplicated over and over, both on disk and in memory.
 
===Dynamic Linking===
 
When linking dynamically, the linker is invoked during the ''loading'' of an executable. The unresolved references in the object code are resolved against the libraries currently present in the system. This makes the on-disk executable much smaller, and allows for in-memory space-saving strategies such as ''shared libraries'' (see below).
 
On the downside, the executable becomes dependent on the presence of the libraries it references; if a system does not have those libraries, the executable cannot run.
 
===Shared Libraries===
 
A popular strategy is to ''share'' dynamically linked libraries across multiple executables. This means that, instead of attaching the binary of the library to the executable image, the references in the executable are tweaked, so that all executables refer to the same in-memory representation of the required library.
 
Line 52 ⟶ 38:
 
Second, in a virtual memory environment, it is usually impossible to provide a library to all executables in the system at the same virtual memory address. To access library code at an arbitrary virtual address requires the library code to be ''position independent'' (which can be achieved e.g. by setting the -PIC command line option for the [[GCC]] compiler). This requires support of the feature by the binary format (relocation tables), and can result in slightly less efficient code on some architectures.
 
===ABI - Application Binary Interface===
 
The ABI of a system defines how library function calls and kernel system calls are actually done. This includes whether parameters are passed on the stack or in registers, how function entry points are located in libraries, and other such concerns.
 
When using static linkage, the resulting executable depends on the kernel using the same ABI as the one the executable was built for; when using dynamic linkage, the executable depends on the libraries' ABI staying the same.
 
===Unresolved Symbols===
 
The linker is the stage where you will find out about stuff that has been added without your knowledge, and which is not provided by your environment. This can include references to <tt>alloca()</tt>, <tt>memcpy()</tt>, or several others. This is usually a sign that either your toolchain or your command line options are not correctly set up for compiling your own OS kernel - or that you are using functionality that is not yet implemented in your C library / runtime environment! You will most certainly run into trouble if you are not using a [[GCC Cross-Compiler|cross-compiler]] and the libgcc library and have implementations of memcpy, memmove, memset and memcmp.
 
Other symbols, such as _udiv* or __builtin_saveregs, are available in [[libgcc]]. If you get errors about missing such symbols, remember that you need to link with libgcc.
[[Category:Kernel]]
[[Category:Compilers]]
[[Category:Linkers]]
[[Category:ABI]]