How kernel, compiler, and C library work together

Latest revision as of 01:15, 10 June 2024

Kernel

The kernel is the core of an operating system. In a traditional design, it is responsible for memory management, I/O, interrupt handling, and various other things. Although some modern designs, such as Microkernels or Exokernels, move several of these services into user space, that distinction matters little within the scope of this document.

The kernel makes its services available through a set of system calls; how they are called and what they do exactly differs from kernel to kernel.
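As a rough illustration of what "a set of system calls" can look like on the kernel side, the following is a minimal sketch of a dispatch table, written as ordinary C so it can be tried outside a kernel. The call numbers, the three-argument signature, and all function names are assumptions made up for this example; your kernel is free to do this differently.

```c
#include <stddef.h>

/* Hypothetical system call numbers; every kernel defines its own set. */
enum { SYS_GETPID = 0, SYS_ADD = 1, SYS_COUNT };

/* In this sketch every handler takes three integer arguments. */
typedef long (*syscall_fn)(long a, long b, long c);

/* Example handler: pretend there is only one process, with ID 1. */
static long sys_getpid(long a, long b, long c)
{
    (void)a; (void)b; (void)c;
    return 1;
}

/* Example handler: add the first two arguments. */
static long sys_add(long a, long b, long c)
{
    (void)c;
    return a + b;
}

/* The table maps a call number to the function implementing it. */
static const syscall_fn syscall_table[SYS_COUNT] = {
    [SYS_GETPID] = sys_getpid,
    [SYS_ADD]    = sys_add,
};

/* Called by the (not shown) trap handler once it has extracted the
   call number and arguments. Returns -1 for an unknown call number. */
long syscall_dispatch(long number, long a, long b, long c)
{
    if (number < 0 || number >= SYS_COUNT || syscall_table[number] == NULL)
        return -1;
    return syscall_table[number](a, b, c);
}
```

How the number and the arguments actually travel from user space into the kernel (software interrupt, dedicated instruction, registers or stack) is exactly the part that differs from kernel to kernel.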

C Library

Main articles: C Library and Creating a C Library

One thing up front: When you begin working on your kernel, you do not have a C library available. You have to provide everything yourself, except a few pieces provided by the compiler itself. You will also have to port an existing C library or write one yourself.

The C library implements the standard C functions (i.e., the things declared in <stdlib.h>, <math.h>, <stdio.h> etc.) and provides them in binary form suitable for linking with user-space applications.

In addition to standard C functions (as defined in the ISO standard), a C library might (and usually does) implement further functionality, which might or might not be defined by some standard. The standard C library says nothing about networking, for example. For Unix-like systems, the POSIX standard defines what is expected from a C library; other systems might differ fundamentally.

To implement its functionality, the C library must call kernel functions. For your own OS, you can of course take a ready-made C library and recompile it for your OS, but that requires you to tell the library how to call your kernel functions, and it requires your kernel to actually provide those functions.
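The layering can be sketched as follows: a library-level function that does no I/O itself and relies entirely on one system call stub. Everything here is hypothetical. In particular, sys_write() stands in for whatever mechanism your kernel offers, and it is simulated with a plain buffer so the example runs without a kernel.

```c
#include <stddef.h>

/* Stand-in for the kernel side: a buffer that collects everything that
   is "written". In a real system, sys_write() would trap into the kernel. */
static char console[128];
static size_t console_len;

/* Hypothetical system call stub. Returns the number of bytes accepted. */
static long sys_write(const char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len && console_len < sizeof console - 1; i++)
        console[console_len++] = buf[i];
    console[console_len] = '\0';
    return (long)i;
}

/* The library cannot rely on another C library, so it brings its own helper. */
static size_t my_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Library-level function in the spirit of puts(): it only formats the
   request and hands the real work to the kernel via sys_write(). */
int my_puts(const char *s)
{
    if (sys_write(s, my_strlen(s)) < 0)
        return -1;
    if (sys_write("\n", 1) < 0)
        return -1;
    return 0;
}
```

Porting an existing C library largely means supplying stubs like sys_write() for each kernel service the library expects.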

A more elaborate example is available in Library Calls. Alternatively, you can use an existing C Library or create your own C Library.

Compiler / Assembler

An assembler takes (plaintext) source code and turns it into (binary) machine code; more precisely, it turns the source into object code, which contains additional information such as symbol names and relocation information.

A compiler takes higher-level language source code, and either directly turns it into object code, or (as is the case with GCC) turns it into assembly source code and invokes an assembler for the final step.

The resulting object code does not yet contain any code for the standard functions it calls. If you included e.g. <stdio.h> and used printf(), the object code will merely contain a reference to a function named printf() (declared as taking a const char * and a variable number of further arguments), which must be linked to the object code in order to obtain a complete executable.
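A small source file makes this visible. The file name greet.c and the function greet() are made up for this example. If you compile it to an object file only (for example with gcc -c) and list its symbols with nm, you would typically see greet as a defined symbol and printf marked U, for undefined.

```c
#include <stdio.h>

/* greet.c: calls printf() but does not define it. The header only
   supplies the declaration; the code for printf() lives in the C library. */
int greet(void)
{
    /* The return value is used here on purpose: when it is ignored,
       compilers may substitute a simpler function such as puts(). */
    return printf("hello\n");
}
```

The object file stays incomplete until a linker resolves that reference against a library that provides printf().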

Some compilers use standard library functions internally, which might result in object files referencing e.g. memset() or memcpy() even though you did not include the header or use a function of this name. You will have to provide an implementation of these functions to the linker, or the linking will fail. The GCC freestanding environment expects only the functions memset(), memcpy(), memcmp(), and memmove(), as well as the libgcc library.

Some advanced operations (e.g. 64-bit division on a 32-bit system) might involve compiler-internal functions. For GCC, those functions reside in libgcc. The content of this library is agnostic of what OS you use, and it won't taint your compiled kernel with licensing issues of any sort.
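A minimal sketch of the four functions could look like this. They are deliberately simple byte-by-byte loops with no optimization. The k_ prefix is an assumption of this example, chosen so that the code can be compiled and tried on a hosted system without clashing with the host C library; in your kernel you would provide them under their standard names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill n bytes at dest with the byte value. */
void *k_memset(void *dest, int value, size_t n)
{
    unsigned char *d = dest;
    while (n--)
        *d++ = (unsigned char)value;
    return dest;
}

/* Copy n bytes; the two regions must not overlap. */
void *k_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Copy n bytes; the regions may overlap, so pick a safe direction. */
void *k_memmove(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)
            *d++ = *s++;          /* copy forwards */
    } else {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;          /* copy backwards */
    }
    return dest;
}

/* Compare n bytes; returns -1, 0 or 1 in this sketch. */
int k_memcmp(const void *a, const void *b, size_t n)
{
    const unsigned char *x = a;
    const unsigned char *y = b;
    for (size_t i = 0; i < n; i++) {
        if (x[i] != y[i])
            return x[i] < y[i] ? -1 : 1;
    }
    return 0;
}
```

Only <stddef.h> and <stdint.h> are used, and both are among the headers the compiler itself supplies in a freestanding environment.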

Linker

A linker takes the object code generated by the compiler / assembler and links it against the C library (and / or libgcc.a, or whatever link library you provide). This can be done in two ways: statically or dynamically.

Static Linking

When linking statically, the linker is invoked during the build process, just after the compiler / assembler run. It takes the object code, checks it for unresolved references, and checks if it can resolve these references from the available libraries. It then adds the binary code from these libraries to the executable; after this process, the executable is complete, i.e. when running it does not require anything but the kernel to be present.

On the downside, the executable can become quite large, and code from the libraries is duplicated over and over, both on disk and in memory.

Dynamic Linking

When linking dynamically, the linker is invoked during the loading of an executable. The unresolved references in the object code are resolved against the libraries currently present in the system. This makes the on-disk executable much smaller, and allows for in-memory space-saving strategies such as shared libraries (see below).

On the downside, the executable becomes dependent on the presence of the libraries it references; if a system does not have those libraries, the executable cannot run.
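The core of load-time resolution can be sketched as a name lookup: each unresolved reference in the executable is matched against the symbols exported by the libraries that are present. This is a simplified model, not how any particular loader is implemented. The structures, resolve_all(), and lib_answer() are all made up for this example, and strcmp() from the host C library is used only to keep it short.

```c
#include <stddef.h>
#include <string.h>

typedef int (*func_ptr)(void);

/* One symbol exported by a (hypothetical) library. */
struct symbol {
    const char *name;
    func_ptr address;
};

/* One unresolved reference in a (hypothetical) executable: the name it
   needs and the slot that must be filled with the function's address. */
struct reference {
    const char *name;
    func_ptr *slot;
};

/* Example library function. */
int lib_answer(void)
{
    return 42;
}

/* Fill every slot whose name is found among the exported symbols.
   Returns the number of references that could not be resolved. */
size_t resolve_all(struct reference *refs, size_t nrefs,
                   const struct symbol *syms, size_t nsyms)
{
    size_t unresolved = 0;
    for (size_t i = 0; i < nrefs; i++) {
        *refs[i].slot = NULL;
        for (size_t j = 0; j < nsyms; j++) {
            if (strcmp(refs[i].name, syms[j].name) == 0) {
                *refs[i].slot = syms[j].address;
                break;
            }
        }
        if (*refs[i].slot == NULL)
            unresolved++;
    }
    return unresolved;
}
```

A non-zero result corresponds to the failure case described above: the library that should provide the symbol is missing, so the executable cannot run.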

Shared Libraries

A popular strategy is to share dynamically linked libraries across multiple executables. This means that, instead of attaching the binary of the library to the executable image, the references in the executable are tweaked, so that all executables refer to the same in-memory representation of the required library.

This requires some trickery. For one, the library must either not have any state (static or global data) at all, or it must provide a separate state for each executable. This gets even trickier with multi-threaded systems, where one executable might have more than one simultaneous control flow.

Second, in a virtual memory environment, it is usually impossible to provide a library to all executables in the system at the same virtual memory address. To access library code at an arbitrary virtual address requires the library code to be position independent (which can be achieved e.g. by setting the -fPIC command line option for the GCC compiler). This requires support of the feature by the binary format (relocation tables), and can result in slightly less efficient code on some architectures.
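The first point, separate state for each user of the library, can be illustrated at the source level. One way, sketched here with made-up names, is for the library to keep no global data at all and to operate only on a context supplied by the caller. Operating systems commonly get a similar effect without changing the library's interface, by sharing the library's code while giving each process its own copy of the library's data.

```c
/* All state this hypothetical library needs, kept per client
   instead of in a global variable. */
struct counter_ctx {
    unsigned long next_id;
};

/* Each client initializes its own context once. */
void counter_init(struct counter_ctx *ctx)
{
    ctx->next_id = 1;
}

/* Hands out increasing IDs; touches only the caller's context,
   so two clients can never disturb each other. */
unsigned long counter_next(struct counter_ctx *ctx)
{
    return ctx->next_id++;
}
```

The same code can then serve any number of executables, or threads, as long as each one passes in its own context.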

ABI - Application Binary Interface

The ABI of a system defines how library function calls and kernel system calls are actually done. This includes whether parameters are passed on the stack or in registers, how function entry points are located in libraries, and other such concerns.

When using static linkage, the resulting executable depends on the kernel using the same ABI as the one the executable was built for; when using dynamic linkage, the executable depends on the libraries' ABI staying the same.
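Data layout is one of those concerns: if user space and the kernel exchange a structure, both sides must agree on its exact size and field offsets. The following sketch pins the layout down at compile time. The structure and its fields are hypothetical, and _Static_assert requires a C11 compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure passed between user space and the kernel.
   Fixed-width types are used so that the layout does not change
   with the compiler's idea of int or long. */
struct sys_time {
    uint64_t seconds;
    uint32_t nanoseconds;
    uint32_t reserved;      /* explicit padding, kept for future use */
};

/* If a change to the structure alters the binary layout, the build
   fails instead of silently breaking existing executables. */
_Static_assert(sizeof(struct sys_time) == 16,
               "struct sys_time must stay 16 bytes");
_Static_assert(offsetof(struct sys_time, nanoseconds) == 8,
               "nanoseconds must stay at offset 8");
_Static_assert(offsetof(struct sys_time, reserved) == 12,
               "reserved must stay at offset 12");
```

Checks like these cover only one part of an ABI; calling conventions and symbol lookup are fixed by the toolchain and the binary format rather than by your source code.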

Unresolved Symbols

The linker is the stage where you will find out about code that has been added without your knowledge, and which is not provided by your environment. This can include references to alloca(), memcpy(), or several others. This is usually a sign that either your toolchain or your command line options are not correctly set up for compiling your own OS kernel, or that you are using functionality that is not yet implemented in your C library / runtime environment. You will almost certainly run into trouble unless you use a cross-compiler, link against the libgcc library, and provide implementations of memcpy, memmove, memset and memcmp.

Other symbols, such as _udiv* or __builtin_saveregs, are available in libgcc. If you get errors about missing such symbols, remember that you need to link with libgcc.
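A function as innocent as the following is enough to produce such a reference. The function name is made up for this example. When GCC compiles it for a 32-bit x86 target, the 64-bit division is typically turned into a call to a helper such as __udivdi3, which lives in libgcc; on a 64-bit target the same code usually needs no helper at all.

```c
#include <stdint.h>

/* Plain 64-bit unsigned division. On targets without a native 64-bit
   divide instruction, the compiler may emit a call into libgcc here. */
uint64_t ticks_to_seconds(uint64_t ticks, uint64_t ticks_per_second)
{
    return ticks / ticks_per_second;
}
```

If the linker then complains about the helper, the fix is to link with libgcc, not to write the helper yourself.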