Porting Newlib

From OSDev.wiki
Revision as of 10:38, 8 December 2010 by Creature (talk | contribs) (Made the Jeff Johnston quote a bit easier to read by removing the excessive newlines.)
Difficulty level

Medium

Everyone wants to do it eventually. Porting newlib is one of the easiest ways to get a simple C library into your operating system without an excessive amount of effort. As an added bonus, once complete you can port the toolchain (GCC/binutils) to your OS - and who wouldn't want to do that?

Introduction

After an incredibly difficult week of trying to get newlib ported to my own OS, I decided to write a tutorial that outlines the requirements for porting newlib and how to actually do it. I'm assuming you can already load binaries from somewhere and that these binaries are compiled C code. I also assume you already have a syscall interface set up. Why wait? Let's get cracking!


Getting Ready

First of all, you need to support a set of system calls that act as 'glue' between newlib and your OS. These are the typical "_exit", "open", "read", "write", "execve" and so on; the list below has 18 functions plus the environ variable. The text below is taken straight from the Red Hat newlib C library documentation:

_exit
    Exit a program without cleaning up files.
    If your system doesn't provide this, it is best to avoid linking with subroutines that
    require it (exit, system).

close
    Close a file.

    Minimal implementation:

    int close(int file){
        return -1;
    }

environ
    A pointer to a list of environment variables and their values.
    For a minimal environment, this empty list is adequate:

    char *__env[1] = { 0 };
    char **environ = __env;

execve
    Transfer control to a new process.

    Minimal implementation (for a system without processes):

    #include <errno.h>
    #undef errno
    extern int errno;
    int execve(char *name, char **argv, char **env){
      errno=ENOMEM;
      return -1;
    }

fork
    Create a new process.

    Minimal implementation (for a system without processes):

    #include <errno.h>
    #undef errno
    extern int errno;
    int fork() {
      errno=EAGAIN;
      return -1;
    }

fstat
    Status of an open file.

    For consistency with other minimal implementations in these examples, all files are regarded as 
    character special devices.

    The `sys/stat.h' header file is distributed in the `include' subdirectory for this C library.

    #include <sys/stat.h>
    int fstat(int file, struct stat *st) {
      st->st_mode = S_IFCHR;
      return 0;
    }

getpid
    Process-ID;

    This is sometimes used to generate strings unlikely to conflict with other processes.       
    
    Minimal implementation, for a system without processes:

    int getpid() {
      return 1;
    }

isatty
    Query whether output stream is a terminal.

    For consistency with the other minimal implementations, which only support output to stdout,
    this minimal implementation is suggested:

    int isatty(int file){
       return 1;
    }

kill
    Send a signal.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int kill(int pid, int sig){
      errno=EINVAL;
      return(-1);
    }

link
    Establish a new name for an existing file.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int link(char *old, char *new){
      errno=EMLINK;
      return -1;
    }

lseek
    Set position in a file.

    Minimal implementation:

    int lseek(int file, int ptr, int dir){
        return 0;
    }

open
    Open a file. Minimal implementation:

    int open(const char *name, int flags, int mode){
        return -1;
    }

read
    Read from a file. Minimal implementation:

    int read(int file, char *ptr, int len){
        return 0;
    }

sbrk
    Increase program data space.
    As malloc and related functions depend on this, it is useful to have a working implementation.
    
    The following suffices for a standalone system;
    it exploits the symbol end automatically defined by the GNU linker. 	

    caddr_t sbrk(int incr){
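      extern char end;		/* Defined by the linker */
      extern char *stack_ptr;	/* Not declared in the original example; assumed
                                   here to be supplied by your port and to hold
                                   the current stack pointer */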
      static char *heap_end;
      char *prev_heap_end;
     
      if (heap_end == 0) {
        heap_end = &end;
      }
      prev_heap_end = heap_end;
      if (heap_end + incr > stack_ptr)
        {
          write (1, "Heap and stack collision\n", 25);
          abort ();
        }

      heap_end += incr;
      return (caddr_t) prev_heap_end;
    }

stat
    Status of a file (by name).
    
    Minimal implementation:

    #include <sys/stat.h>
    int stat(const char *file, struct stat *st) {
      st->st_mode = S_IFCHR;
      return 0;
    }

times
    Timing information for current process.

    Minimal implementation:
     	
    #include <sys/times.h>
    clock_t times(struct tms *buf){
      return -1;
    }

unlink
    Remove a file's directory entry.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int unlink(char *name){
      errno=ENOENT;
      return -1; 
    }

wait
    Wait for a child process.

    Minimal implementation:

    #include <errno.h>
    #undef errno
    extern int errno;
    int wait(int *status) {
      errno=ECHILD;
      return -1;
    }

write
    Write a character to a file.

    `libc' subroutines will use this system routine for output to all files, 
    including stdout---so if you need to generate any output, for example to a serial port for
    debugging, you should make your minimal write capable of doing this. 

    The following minimal implementation is an incomplete example; it relies on a writechar
    subroutine (not shown; typically, you must write this in assembler from examples provided
    by your hardware manufacturer) to actually perform the output.

    int write(int file, char *ptr, int len){
        int todo;
      
        for (todo = 0; todo < len; todo++) {
            writechar(*ptr++);
        }
        return len;
    }

According to the documentation, you should also redefine errno as an 'extern int':

#include <errno.h>
#undef errno
extern int errno;

Re-entrant versions of these are a bit harder and are outlined in the documentation.
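
The sbrk example quoted above compares the heap against a stack pointer and aborts on collision. If your OS gives each process a fixed heap region instead, a bounds-checked variant may be easier to reason about. The following is only a minimal sketch under that assumption: demo_heap, DEMO_HEAP_SIZE and the name demo_sbrk are hypothetical, and the static array stands in for whatever region your linker script or loader actually reserves. It fails with ENOMEM rather than aborting.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical heap region: a static array stands in for the memory
 * your linker script or loader would reserve for the process heap. */
#define DEMO_HEAP_SIZE 4096
static char demo_heap[DEMO_HEAP_SIZE];

static char *heap_end = demo_heap;  /* current program break */

/* Named demo_sbrk (an assumption) so it can be tried on a development
 * host without clashing with the host C library's own sbrk. */
void *demo_sbrk(ptrdiff_t incr)
{
    char *prev = heap_end;
    ptrdiff_t used = heap_end - demo_heap;

    /* Refuse to move the break outside the reserved region. */
    if (incr > (ptrdiff_t)DEMO_HEAP_SIZE - used || incr < -used) {
        errno = ENOMEM;
        return (void *)-1;
    }
    heap_end += incr;
    return prev;  /* old break, i.e. the start of the new block */
}
```

In a real port the function would carry the name and return type newlib expects, and the bounds would come from your own memory layout.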

Using the Kernel's System Calls

My kernel exposes all the system calls on interrupt 0x80 (128d) so I just had to put a bit of inline assembly into each stub to do what I needed it to do. It's up to you how to implement them in relation to your kernel.
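
As an illustration of what such a stub might look like, here is a rough sketch for write using GCC inline assembly. Everything about the calling convention is an assumption: the syscall number goes in EAX (consistent with the CRT0 later in this article), the arguments in EBX, ECX and EDX, and the result comes back in EAX. SYS_WRITE, MYOS_TARGET, syscall3 and glue_write are hypothetical names. When MYOS_TARGET is not defined, a stand-in records the request instead of trapping, so the stub logic can be compiled on a development host.

```c
/* Hypothetical syscall number; use whatever your kernel assigns. */
#define SYS_WRITE 4

#if defined(MYOS_TARGET) && defined(__i386__)
/* Assumed convention: number in EAX, arguments in EBX, ECX and EDX,
 * result in EAX. Adjust the constraints to match your kernel. */
static inline long syscall3(long num, long a, long b, long c)
{
    long ret;
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "a"(num), "b"(a), "c"(b), "d"(c)
                      : "memory");
    return ret;
}
#else
/* Host stand-in: records the request instead of trapping, so the
 * stub logic can be compiled and checked without the kernel. */
static long last_call[4];

static long syscall3(long num, long a, long b, long c)
{
    last_call[0] = num;
    last_call[1] = a;
    last_call[2] = b;
    last_call[3] = c;
    return c;  /* pretend the kernel accepted every byte */
}
#endif

/* In the real glue layer this function would be named write. */
int glue_write(int file, char *ptr, int len)
{
    return (int)syscall3(SYS_WRITE, file, (long)ptr, len);
}
```

The other stubs would follow the same pattern, each with its own number and argument list.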

Compiling

Guess what... you're almost there!

Download newlib source (I'm using 1.15.0) from this ftp server. I'm using Cygwin with an ELF cross-compiler (--target=i586-elf), so I put the source into the C:\cygwin\usr\src folder.

Once the source was downloaded, I loaded up Cygwin. All that needs to be done here is to build newlib:

cd /usr/src
mkdir build-newlib
cd build-newlib
../newlib-<version>/configure --prefix=<location to put library> --target=i586-elf
make all install

Simple! However, if you try linking a previously written program with newlib you'll get undefined references everywhere. Why? You haven't put your 'glue' into newlib yet.

According to Jeff Johnston on the newlib mailing list:

So, you get the majority of the C library from newlib and the rest (syscalls) is usually in libgloss. Using an ld script makes life easy for the end-user as all they have to do is specify -Txxxx.ld. Inside the ld script you can specify all the libraries needed, where the entry point is, etc.... The libgloss library is a separate library and you name it whatever you want. The ld script handles all of this internally and the user doesn't need to know just what libraries there are out there.

Basically, in the libgloss directory you will find 17 files, all of which are the syscalls we wrote earlier. Put your code into these files, configure, build and then you'll have another library. Typically you would rename this library to something like "youros.a" (in my case "mattise.a") and tell all programmers to link with the linker script you write. An added bonus of this is that every executable for your OS uses a linker script you've created.

CRT0

Finally you must write something called the CRT0, basically the startup code for the executable. My CRT0 is really simple (GAS syntax):

.global _start
.extern main
_start:

## here you might want to get the argc/argv pairs somehow and then push
## them onto the stack...

# call the user's function
call main

# call the 'kill me' syscall to end
movl $0,%eax
int $0x80

# loop in case we haven't yet rescheduled
lp:
hlt
jmp lp

All programs must link this as the first object (linker script to the rescue).

Testing

I suggest writing a simple test program; the following will suffice:

int main()
{
    *((volatile unsigned short *) 0xB8000) = 0x7020; // put a gray block in the top left corner
    return 0;
}

Put a little bit of code into your system call interface to print the function number of each call, then run the test program and watch which calls newlib actually makes.
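
One way to add that tracing is at the point where the kernel dispatches a system call. The sketch below is only an assumption about how such a dispatcher could be laid out: syscall_table, syscall_dispatch, MAX_SYSCALLS and TRACE_SLOTS are hypothetical names, and the function numbers are stored in a small ring buffer where a real kernel would more likely print them with its own console routine.

```c
#include <stddef.h>

#define MAX_SYSCALLS 32   /* hypothetical table size */
#define TRACE_SLOTS  16   /* how many recent calls to remember */

typedef long (*syscall_fn)(long a, long b, long c);

static syscall_fn syscall_table[MAX_SYSCALLS];  /* filled in by the kernel */
static int trace_log[TRACE_SLOTS];              /* most recent function numbers */
static unsigned trace_count;                    /* total calls seen so far */

/* Called from the interrupt 0x80 handler with the function number
 * the program passed in (assumed here to arrive in EAX). */
long syscall_dispatch(int num, long a, long b, long c)
{
    /* Record the number first so unimplemented calls also show up. */
    trace_log[trace_count % TRACE_SLOTS] = num;
    trace_count++;

    if (num < 0 || num >= MAX_SYSCALLS || syscall_table[num] == NULL)
        return -1;  /* unknown or unimplemented call */
    return syscall_table[num](a, b, c);
}
```

Logging before the table lookup means a call you have not implemented yet still leaves a trace, which is exactly the case you are hunting for during a port.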

One note about the above... I've already said this but I assume that you can load in an executable binary. Without being able to load an external binary a port of newlib becomes useless - unless you decide to link it into your kernel and use its features (but you still need to write the glue layer, and with different function names from the glue functions).

Conclusion

Well, you've done it. You've ported newlib to your OS! This is a really simple approach to the port, but it suits those who just want to get it done and are happy to put together any special cases later (see newlib/libc/sys/linux for an example of a special case). There are heaps of resources out there that can help you out.

There is one obvious advantage to porting newlib: you can now port the toolchain and run binutils and GCC on your own OS. Almost self-hosting, how do you feel?

Good luck!
