Creating a C Library

From OSDev.wiki
Revision as of 13:07, 19 December 2012 by osdev>Sortie

This tutorial discusses implementing your own Standard C Library (libc). While implementing a minimal subset for kernel use is easy, it is considerably more work to implement enough functionality to port third party programs. You may wish to save yourself the effort and port an existing C library. Before going ahead, you may wish to create an OS Specific Toolchain, which will give you more control over the compiler.

Program Initialization

The first and most important thing to implement in a C library is the _start function, to which control is passed from your program loader. Its task is to initialize and run the process. Normally this is done by initializing the C library (if needed), then calling the global constructors, and finally calling exit(main(argc, argv)).

You can change the name of the default program entry point by adding ENTRY=_my_start_name in your OS-specific binutils emulparams script (binutils/ld/emulparams). You can change which start files are used by modifying gcc/gcc/config/myos.h in your OS-specific GCC. The macros STARTFILE_SPEC and ENDFILE_SPEC define the object files to use in the GCC spec language. See gcc/gcc/config/gnu-user.h for examples of how to use them. If you decide to use the conventional (GNU-like) names and semantics for these initialization files, the following information applies:

crt0.o

The crt0.o object file will contain the _start function that initializes the process and calls exit(main(argc, argv)). This file is normally written in assembly, though depending on how your program loader invokes _start, you may be able to write it in C. Note that the SysV ABIs have some opinions on how _start is invoked and on the stack contents. Adhering to this standard may help when porting third party software to your OS.

Below is a simple implementation of crt0.s for x86_64. It assumes that the program loader has put *argv and *envp on the stack, and that %rdi contains argc, %rsi contains argv, %rdx contains envc, and %rcx contains envp.

.section .text

.global _start
_start:
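	# Set up end of the stack frame linked list.
	# Two zeros form a terminating frame (fake return address and fake saved
	# %rbp). They also keep %rsp 16-byte aligned for the calls below, assuming
	# the loader entered _start with a 16-byte aligned stack.
	movq $0, %rbp
	pushq %rbp # rip=0
	pushq %rbp # rbp=0
	movq %rsp, %rbp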

	# We need those in a moment when we call main.
	pushq %rsi
	pushq %rdi

	# Prepare signals, memory allocation, stdio and such.
	call initialize_standard_library

	# Run the global constructors.
	call _init

	# Restore argc and argv.
	popq %rdi
	popq %rsi

	# Run main
	call main

	# Terminate the process with the exit code.
	movl %eax, %edi
	call exit

This implementation is careful to set up the end of the stack frame linked list. If you compile your files without optimization or you use -fno-omit-frame-pointer, then each function adds itself to this linked list. This is very useful if you wish to add call tracing support. In that case, you'll need to know when you have reached the end, which is why we add an explicit zero in the above code.
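A call tracer built on this list could be sketched as follows. This is only an illustration, not part of any existing library: the names sketch_frame and sketch_walk_frames are hypothetical, and the record layout (saved %rbp followed by the return address) assumes x86_64 code compiled with frame pointers.

```c
#include <stddef.h>

/* Assumed layout of one frame record on x86_64 when frame pointers are kept:
   %rbp points at the caller's saved %rbp, and the return address sits
   directly above it. */
struct sketch_frame
{
	struct sketch_frame *next;   /* saved %rbp of the caller */
	void *return_address;        /* where this function returns to */
};

/* Walk the frame list starting at 'frame' and store up to 'max' return
   addresses in 'out'. The walk stops at the record whose saved %rbp is
   zero, which is the end marker that _start set up. Returns the number
   of addresses stored. */
size_t sketch_walk_frames(const struct sketch_frame *frame,
                          void **out, size_t max)
{
	size_t count = 0;
	while (frame != NULL && frame->next != NULL && count < max)
	{
		out[count++] = frame->return_address;
		frame = frame->next;
	}
	return count;
}
```

On GCC, the starting record for the current function can be obtained with __builtin_frame_address(0). The walk is only trustworthy if every function on the stack was compiled with frame pointers.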

Next the _start function calls initialize_standard_library. Note how the program loader register usage nicely fits the x86_64 SysV calling convention, so that the initialize_standard_library(int argc, char* argv[], int envc, char* envp[]) function accepts the same arguments as _start. We save the argc and argv variables, because we'll need them in a moment when we call main. How initialize_standard_library is implemented is up to you, but normally it sets environ to envp, sets up the heap and stdin, stdout, and stderr, and performs whatever other setup your library needs.
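As a rough sketch, such an initialization function might look like the following. Every name prefixed with sketch_ is hypothetical: a real libc would use the names environ and initialize_standard_library, and the heap and stdio hooks are placeholders for your own subsystems. The prefixes only keep the example from clashing with a hosted C library.

```c
#include <stddef.h>

/* Would be named 'environ' in a real libc. */
char **sketch_environ;

/* Placeholder subsystems; a real libc would set up its allocator and the
   stdin, stdout, and stderr streams here. */
static int sketch_heap_ready;
static int sketch_stdio_ready;

static void sketch_heap_init(void)
{
	sketch_heap_ready = 1;
}

static void sketch_stdio_init(void)
{
	sketch_stdio_ready = 1;
}

/* Takes the same arguments that the program loader passed to _start. */
void sketch_initialize_standard_library(int argc, char *argv[],
                                        int envc, char *envp[])
{
	/* Unused in this sketch. */
	(void) argc;
	(void) argv;
	(void) envc;

	/* Make the environment visible to getenv and friends. */
	sketch_environ = envp;

	/* Heap before stdio, in case stream buffers are heap-allocated. */
	sketch_heap_init();
	sketch_stdio_init();
}
```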

Programming languages such as C++ offer a feature called global constructors. This allows code to be executed before the main function is called, but after the core standard library has been initialized. Note that GCC allows C programs to use the same feature through the function attribute __attribute__((constructor)). Some third party software, such as binutils, silently relies on global constructor functions being called and malfunctions otherwise, even if the program seemingly compiled correctly. To call these global constructors, we call the traditionally-named _init function, which is a piece of magic we'll set up in a moment.
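For illustration, a C program can register a global constructor like this when built with GCC. The function and variable names are made up for the example. If your _start never runs the constructors, the flag stays zero by the time main is reached.

```c
/* Set by the constructor; remains 0 if constructors are never run. */
static int constructor_ran;

/* GCC records this function in its table of global constructors, so the
   startup code runs it before main. */
__attribute__((constructor))
static void sketch_early_setup(void)
{
	constructor_ran = 1;
}
```

Checking such a flag at the top of main is a cheap way to test that your crt0.o, crti.o, and crtn.o are wired up correctly.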

Finally we restore the %rdi and %rsi registers and call main, and use its return value as the argument to exit. The exit function must then call the _fini function, which calls the global destructor functions (as opposed to _init, which calls the constructors).
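One possible shape for the cleanup that exit performs before it terminates the process is sketched below. The sketch_ names and the demo handlers are hypothetical, and the destructor hook is passed in as a function pointer only so the example can be compiled on a hosted system. A real exit would call _fini directly, flush stdio, and then terminate the process with a system call.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ATEXIT_MAX 32

static void (*sketch_handlers[SKETCH_ATEXIT_MAX])(void);
static size_t sketch_handler_count;

/* Register a handler to run at exit. Returns 0 on success, -1 if full. */
int sketch_atexit(void (*handler)(void))
{
	if (sketch_handler_count == SKETCH_ATEXIT_MAX)
		return -1;
	sketch_handlers[sketch_handler_count++] = handler;
	return 0;
}

/* Everything this sketch of exit does before terminating the process. */
void sketch_exit_cleanup(void (*fini)(void))
{
	/* Registered handlers run in reverse order of registration. */
	while (sketch_handler_count > 0)
		sketch_handlers[--sketch_handler_count]();

	/* Then the global destructors; a real libc would call _fini() here. */
	if (fini != NULL)
		fini();
}

/* Demo only: record the order in which the functions are called. */
static char sketch_trace[8];

static void sketch_note(char c)
{
	size_t length = strlen(sketch_trace);
	if (length + 1 < sizeof(sketch_trace))
		sketch_trace[length] = c;
}

static void demo_first(void) { sketch_note('1'); }
static void demo_second(void) { sketch_note('2'); }
static void demo_fini(void) { sketch_note('F'); }
```

Registering demo_first and then demo_second, and calling sketch_exit_cleanup(demo_fini), runs demo_second, demo_first, and finally demo_fini.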

crtbegin.o, crtend.o, crti.o, and crtn.o

The crtbegin.o and crtend.o object files are provided by your OS-specific compiler, while crti.o and crtn.o are yours to write. The compiler is told to build its two files in gcc/gcc/config.gcc by extra_parts="crtbegin.o crtend.o". GCC internally maintains tables of global constructor/destructor functions. However, these tables are set up in such a manner that you cannot directly link to them yourself. Rather, the files crtbegin.o and crtend.o contain the instructions that walk these tables and call the functions in them. The bad news, however, is that crtbegin.o and crtend.o offer no symbols we can call. Instead, GCC expects us to provide these symbols ourselves using some linker tricks. The STARTFILE_SPEC and ENDFILE_SPEC macros tell the compiler which files must be linked in before anything else, and which files must be linked in after everything else. The idea is that files are linked in this order: crt0.o, crti.o, crtbegin.o, your-program-foo.o, your-program-bar.o, crtend.o, crtn.o.
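A minimal sketch of how that link order might be expressed in your OS-specific gcc/gcc/config/myos.h, assuming you use the conventional file names. The %s suffix is part of the GCC spec language and asks the driver to search its startup file directories for the named file.

```c
/* Linked in before the program's own object files. */
#undef STARTFILE_SPEC
#define STARTFILE_SPEC "crt0.o%s crti.o%s crtbegin.o%s"

/* Linked in after the program's own object files. */
#undef ENDFILE_SPEC
#define ENDFILE_SPEC "crtend.o%s crtn.o%s"
```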

The crtbegin.o and crtend.o files contain the necessary instructions that call global constructors in the .init section, and the instructions that call the global destructors in the .fini section. GCC expects us to put the header of the _init function in crti.o's .init section and the footer of the _init function in crtn.o's .init section. The same applies to the _fini function and the .fini section. Using that, we can now implement the _init and _fini functions in crti.o and crtn.o.

Hence a crti.s implementation will simply be (x86_64):

.section .init
.global _init
_init:
   push %rbp
   movq %rsp, %rbp
   /* gcc will nicely put the contents of crtbegin.o's .init section here. */

.section .fini
.global _fini
_fini:
   push %rbp
   movq %rsp, %rbp
   /* gcc will nicely put the contents of crtbegin.o's .fini section here. */

and a simple implementation of crtn.s will be (x86_64):

.section .init
   /* gcc will nicely put the contents of crtend.o's .init section here. */
   popq %rbp
   ret

.section .fini
   /* gcc will nicely put the contents of crtend.o's .fini section here. */
   popq %rbp
   ret

Finally, you simply need to assemble your crt0.o, crti.o, and crtn.o files and install them in your system library directory. Your _start function is now able to set up the standard library, call the global constructors, and call exit(main(argc, argv)). Don't forget to call your _fini function in your exit function, or the global destructors won't be run, leading to subtle bugs. You can modify which directory GCC looks in for these files by adding the following to gcc/gcc/config/i386/myos64.h:

#undef STANDARD_STARTFILE_PREFIX
#define STANDARD_STARTFILE_PREFIX "/x86_64-myos/lib/"

C++ Compatibility

Traditionally C++ programs can use the C headers, even though such headers are written in C and have C linkage. Unless you specify otherwise, GCC will assume that header files found in your system include directory are written in C and have C linkage. This is done by GCC automatically inserting extern "C" { ... } around all included system headers. However, this may lead to strange linking failures if you try to use C++ headers from the system include directory. The solution is to make your C headers explicitly compatible with C++ and tell GCC that your headers understand C++.

Telling GCC that your headers understand C++ is as simple as adding the following to your OS-specific gcc/gcc/config/myos.h:

/* Don't assume anything about the header files.  */
#undef  NO_IMPLICIT_EXTERN_C
#define NO_IMPLICIT_EXTERN_C	1

Adding support for C++ in your C header files (such as stdio.h) is very simple. Previously GCC automatically added extern "C" around all headers if compiling C++, but now we'll need to do it ourselves. The key benefit is that we don't need to do this in C++-only headers, meaning that C++ headers will now actually work, instead of GCC assuming they have C linkage. For instance, you can change your stdio.h to be of this form:

#ifndef _STDIO_H
#define _STDIO_H 1

#if defined(__cplusplus)
extern "C" {
#endif

int printf(const char* format, ...);

#if defined(__cplusplus)
} /* extern "C" */
#endif

#endif

Alternatively, you can mimic GNU's libc and use a pair of macros declared in your <features.h>, which makes it simpler to write C headers and actually may help port third-party software that erroneously uses these macros. In that case, your features.h will contain:

#ifdef	__cplusplus
	#define __BEGIN_DECLS   extern "C" {
	#define __END_DECLS     }
#else
	#define __BEGIN_DECLS
	#define __END_DECLS
#endif

You can then simply use these macros in your header files. The above stdio.h will then look as such:

#ifndef _STDIO_H
#define _STDIO_H 1

#include <features.h>

__BEGIN_DECLS

int printf(const char* format, ...);

__END_DECLS

#endif