Formatted Printing

This article covers the theory behind formatted printing (i.e. printf()).

How printf() Works

Your first step towards a reliable printf() function is understanding how it works. Have you ever tried to imitate va_list and friends via a void** or similar techniques? It surely won't work! You must use the va_list provided by the freestanding version of <stdarg.h> (or an equivalent implementation) in order to write variadic functions. Assuming the use of <stdarg.h>, this is the prototype for printf():

int printf(const char *format, ...);

Have you seen the other members of the printf() family? They are:

  • printf(): Write formatted data to stdout
  • fprintf(): Write formatted data to stream
  • sprintf(): Write formatted data to buffer
  • snprintf(): Write formatted data to buffer, limits maximum characters written
  • vprintf(): Write formatted data to stdout, uses a va_list instead of ...
  • vfprintf(): Write formatted data to stream, uses a va_list instead of ...
  • vsprintf(): Write formatted data to buffer, uses a va_list instead of ...
  • vsnprintf(): Write formatted data to buffer, limits maximum characters written, uses a va_list instead of ...

And this is how they are typically layered on top of each other:

  • printf(): Wraps over vprintf()
  • fprintf(): Wraps over vfprintf()
  • sprintf(): Wraps over vsprintf()
  • snprintf(): Wraps over vsnprintf()
  • vprintf(): Wraps over vfprintf()
  • vfprintf(): Does the work!
  • vsprintf(): Wraps over vsnprintf() (with a magical number as limit).
  • vsnprintf(): Wraps over vfprintf() with fmemopen()
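The wrapping pattern in the list above can be sketched as follows. This is only an illustration: my_snprintf() is a hypothetical name, and the hosted vsnprintf() from <stdio.h> stands in for whatever worker function your kernel provides.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>   /* hosted vsnprintf(), used here only as the stand-in worker */

/* Hypothetical wrapper: turns its "..." arguments into a va_list and
 * forwards them to the va_list-taking variant, which does the real work. */
int my_snprintf(char *buf, size_t size, const char *format, ...)
{
    va_list args;
    int written;

    va_start(args, format);
    written = vsnprintf(buf, size, format, args);
    va_end(args);

    return written;
}
```

Every non-v function in the family can be written this way, so only one function ever needs to parse the format string.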

But, hey! This is a user-space C library example! Yes, it is. You'll more than likely only provide a printf() that does what vfprintf() is intended to do, but in one shot. Basically, the worker function prints each character to its destination until it finds a % character. If the % is followed by another %, a single % is printed and processing continues. If it's followed by a valid specifier character, parsing of the specifier begins. Otherwise, the character is printed and processing continues. We'll explain in detail the implementation of each item later.
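A minimal sketch of such a one-shot worker is shown below, under some loud assumptions: the names mini_printf(), mini_vprintf(), out_char() and out_buf are made up for this example, the output goes to a fixed buffer standing in for a console driver, and only %%, %c, %s and %d are understood.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical output sink: a fixed buffer standing in for a console driver. */
char out_buf[128];
size_t out_len;

static void out_char(char c)
{
    if (out_len + 1 < sizeof out_buf) {
        out_buf[out_len++] = c;
        out_buf[out_len] = '\0';
    }
}

static void out_int(int value)
{
    char digits[12];
    size_t n = 0;
    /* Negate in unsigned arithmetic so that INT_MIN does not overflow. */
    unsigned int magnitude = value < 0 ? 0u - (unsigned int)value
                                       : (unsigned int)value;

    if (value < 0)
        out_char('-');
    do {
        digits[n++] = (char)('0' + magnitude % 10);
        magnitude /= 10;
    } while (magnitude != 0);
    while (n > 0)
        out_char(digits[--n]);   /* digits were produced in reverse */
}

void mini_vprintf(const char *format, va_list args)
{
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%') {         /* ordinary character: print it */
            out_char(*p);
            continue;
        }
        p++;                     /* look at the character after '%' */
        switch (*p) {
        case '%':
            out_char('%');
            break;
        case 'c':                /* char is promoted to int in "..." */
            out_char((char)va_arg(args, int));
            break;
        case 's': {
            const char *s = va_arg(args, const char *);
            while (*s != '\0')
                out_char(*s++);
            break;
        }
        case 'd':
            out_int(va_arg(args, int));
            break;
        case '\0':               /* lone '%' at the end of the string */
            return;
        default:                 /* not a specifier: print the character */
            out_char(*p);
            break;
        }
    }
}

void mini_printf(const char *format, ...)
{
    va_list args;

    va_start(args, format);
    mini_vprintf(format, args);
    va_end(args);
}
```

With this sketch, mini_printf("%d%% of %s%c", -42, "RAM", '!') leaves "-42% of RAM!" in out_buf. Flags, widths and length modifiers are deliberately left out.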

Decide on your format

Before you begin writing any useful code that processes anything, decide on the format of your format string, and whether you want to store the parsed information or print immediately as you go. Simple implementations should work with a format that requires no look-ahead and is simple to use. You might opt for something seen in more modern languages, with opening and closing curly brackets, as that can be easier to process than a bunch of logic dictating when, and under which circumstances, you should resume printing as normal (as is needed with the more standard percent-sign-based approach seen in libc).

It might also be beneficial to decide where your output is going to go. Do you want the caller to supply a buffer? Should the function return a pointer to an allocated buffer? Should the caller provide a struct with a pointer to a buffer, the size of said buffer, and a function pointer that accepts that buffer? The last option is the most flexible and really isn't that hard to implement. You could implement such a stream like the below:

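#include <stddef.h>  /* size_t */

typedef int (*pfnStreamWriteBuf)(char*);

struct Stream {
    size_t buf_len;                   /* capacity of buf, in bytes */
    size_t buf_i;                     /* index of the next free byte in buf */
    char *buf;                        /* caller-supplied buffer */
    pfnStreamWriteBuf pfn_write_all;  /* called with buf when it must be written out */
};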
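As a hedged illustration of how such a struct might be driven, the sketch below buffers characters and hands the buffer to the callback when it fills up. The helpers stream_putc(), stream_flush() and the capture_write() sink are hypothetical, and the sketch assumes that the callback receives a NUL-terminated string, returns 0 on success, and that buf_len is at least 2.

```c
#include <stddef.h>

typedef int (*pfnStreamWriteBuf)(char*);

struct Stream {
    size_t buf_len;
    size_t buf_i;
    char *buf;
    pfnStreamWriteBuf pfn_write_all;
};

/* Hypothetical helper: terminate the buffered text, pass it to the
 * callback, and start filling the buffer from the beginning again. */
int stream_flush(struct Stream *s)
{
    int rc;

    s->buf[s->buf_i] = '\0';
    rc = s->pfn_write_all(s->buf);
    s->buf_i = 0;
    return rc;
}

/* Hypothetical helper: buffer one character, flushing first if only the
 * byte reserved for the terminator is left. */
int stream_putc(struct Stream *s, char c)
{
    if (s->buf_i + 1 >= s->buf_len) {
        int rc = stream_flush(s);
        if (rc != 0)
            return rc;
    }
    s->buf[s->buf_i++] = c;
    return 0;
}

/* Hypothetical sink for demonstration: appends each flushed chunk to a
 * capture buffer instead of talking to real hardware. */
char captured[64];

int capture_write(char *text)
{
    size_t n = 0;

    while (captured[n] != '\0')
        n++;
    while (*text != '\0' && n + 1 < sizeof captured)
        captured[n++] = *text++;
    captured[n] = '\0';
    return 0;
}
```

A formatter built on top of this only ever calls stream_putc(), followed by one final stream_flush(); swapping the callback is enough to redirect output to a serial port, a framebuffer console, or a log.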

Implementation

This is not a tutorial; it aims only to guide you through creating your own format or implementing an existing one.

There are two ways to go about this, generally. The first is iterating through the characters, checking each one as you go, and setting a state (a la a state machine). This is probably the easiest option, but it makes it hard to implement looking ahead for more information in the format string without tacking on some code that may or may not have cohesion with the rest of your code.

The other option is focusing on looking ahead once you encounter a specific character that starts your format specifier. This is a bit more complex, and is left as an exercise to the reader. Note that this option isn't necessarily better. A simpler format generally results in simpler and shorter code, reducing the chance for bugs.

For the first option, you should initially loop through the format string, looking for the character that starts your format specifier. Once you encounter one, record that you are currently in "format specifier" mode. If your format has a character that ends a format specifier, also add the code that switches back to "normal" mode.

char current = format_string[i];

switch (current) {
case '{':
    state = FORMAT_SPECIFIER;
    break;
case '}':
    state = NORMAL;
    break;
}

You might want to be able to escape the character which enters you into "format specifier" mode:

case '{':
    if (state == NORMAL) {
        state = FORMAT_SPECIFIER;
    } else if (state == FORMAT_SPECIFIER) {
        // A second '{' escapes the first one: emit a literal '{' here.
        state = NORMAL;
    }

    break;

If your format has a way to specify which type of data should be printed there, you should add this now:

case 'd':
    // Print a decimal integer.
    break;
case 's':
    // Print a string.
    break;

If using C, you should probably accept your arguments through a variadic function, and use the va_arg macro to fetch the data to be printed. With that data, perform any casting or conversion that needs to happen inside the corresponding case of the switch statement, and finally print.
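Putting the pieces together, a state-machine formatter for a curly-bracket format might look like the sketch below. It is a sketch under stated assumptions, not a reference implementation: brace_format() and put() are hypothetical names, output goes to a caller-supplied buffer, only {s} and {d} are understood, "{{" is the escape for a literal '{', and a '}' outside a specifier is printed as-is.

```c
#include <stdarg.h>
#include <stddef.h>

enum State { NORMAL, FORMAT_SPECIFIER };

/* Hypothetical helper: append one character, always keeping one byte
 * free for the terminating NUL. */
static void put(char *out, size_t size, size_t *len, char c)
{
    if (*len + 1 < size)
        out[(*len)++] = c;
}

void brace_format(char *out, size_t size, const char *format_string, ...)
{
    enum State state = NORMAL;
    size_t len = 0;
    va_list args;

    if (size == 0)
        return;

    va_start(args, format_string);
    for (size_t i = 0; format_string[i] != '\0'; i++) {
        char current = format_string[i];

        if (state == NORMAL) {
            if (current == '{')
                state = FORMAT_SPECIFIER;
            else
                put(out, size, &len, current);
            continue;
        }

        /* state == FORMAT_SPECIFIER from here on */
        switch (current) {
        case '{':                /* "{{" is an escaped brace */
            put(out, size, &len, '{');
            state = NORMAL;
            break;
        case '}':                /* end of the specifier */
            state = NORMAL;
            break;
        case 's': {
            const char *s = va_arg(args, const char *);
            while (*s != '\0')
                put(out, size, &len, *s++);
            break;
        }
        case 'd': {
            int value = va_arg(args, int);
            unsigned int magnitude = value < 0 ? 0u - (unsigned int)value
                                               : (unsigned int)value;
            char digits[10];     /* enough for a 32-bit magnitude */
            size_t n = 0;

            if (value < 0)
                put(out, size, &len, '-');
            do {
                digits[n++] = (char)('0' + magnitude % 10);
                magnitude /= 10;
            } while (magnitude != 0);
            while (n > 0)
                put(out, size, &len, digits[--n]);
            break;
        }
        default:                 /* unknown specifier character: ignore */
            break;
        }
    }
    va_end(args);
    out[len] = '\0';
}
```

For example, brace_format(buf, sizeof buf, "{{{s}} has {d} items", "list", 3) produces "{list} has 3 items". Note that no look-ahead is needed: every decision depends only on the current character and the current state.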

From here, it should be fairly simple to add the ability to specify the base in which numbers are printed, to print other types of data, and more! You could even add a format specifier which accepts a struct that contains a pointer to another structure and a function pointer which takes that structure and returns a stringified version of it. This could be handy for printing complex data structures.
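Printing in an arbitrary base mostly comes down to repeated division. The sketch below shows one way to do it; format_unsigned() is a hypothetical helper, and the choice of lowercase digits and a base range of 2 to 16 is an assumption made for this example.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: writes value in the given base (2..16) into out as
 * a NUL-terminated string. Returns the number of digits written, or 0 if
 * the base is unsupported or the buffer is too small. */
size_t format_unsigned(unsigned long value, unsigned int base,
                       char *out, size_t size)
{
    static const char digit_chars[] = "0123456789abcdef";
    char reversed[sizeof(unsigned long) * CHAR_BIT];  /* worst case: base 2 */
    size_t n = 0;

    if (base < 2 || base > 16)
        return 0;

    /* Repeated division yields the least significant digit first. */
    do {
        reversed[n++] = digit_chars[value % base];
        value /= base;
    } while (value != 0);

    if (n + 1 > size)
        return 0;

    /* Copy the digits out in the correct order. */
    for (size_t i = 0; i < n; i++)
        out[i] = reversed[n - 1 - i];
    out[n] = '\0';
    return n;
}
```

A specifier such as 'x' or 'b' can then call the same helper with base 16 or base 2, and signed values can reuse it by printing a '-' and passing the magnitude.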

That is a rough overview. See the example implementations listed below for complete code.

Example Implementation

A more libc conformant implementation that uses kernel/OS provided capability (not bare metal capable). User:A22347/Printf

A custom format for simple code that can run on bare metal using buffered I/O that flushes to a stream. User:Isaccbarker/String_Formatting