Null Character

From OSDev.wiki

The null character, also known as the null terminator, is a character with the value zero. Besides acting as a no-operation (NOP) character, nowadays it is best known as the control character that marks the end of a string in C-like data formats. In essence, the null terminator encodes the end of a string within the string's own contents. As Wikipedia says, "This allows the string to be any length with only the overhead of one byte; the alternative of storing a count requires either a string length limit of 255 or an overhead of more than one byte".

How the string "osdev" would get encoded in a C-like data format

In ASCII, Unicode, and practically every other character set and programming language, it is encoded as n zero bits (in other words, the value zero represented with n bits). From this point forward, "null character" or "null terminator" refers to the 8-bit ASCII NUL character, written 0x00 in hexadecimal or, in C, as the escape sequence '\0' (not to be confused with "\0", which in C is a string literal holding two null bytes: the one written explicitly plus the terminator the compiler appends).
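
As a rough illustration of the encoding described above, the following C sketch looks at how the string "osdev" is stored. The names word, word_storage_size, word_length, null_literal_size and print_word_bytes are hypothetical helpers made up for this example, and the hexadecimal values assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The literal "osdev" is stored as six bytes: five letters plus '\0'. */
static const char word[] = "osdev";

/* Bytes occupied by the array, terminator included. */
size_t word_storage_size(void)
{
    return sizeof word;
}

/* Characters before the terminator, as strlen() counts them. */
size_t word_length(void)
{
    assert(word[sizeof word - 1] == '\0');   /* the last byte is the terminator */
    return strlen(word);
}

/* "\0" holds the explicit null byte plus the implicit terminator. */
size_t null_literal_size(void)
{
    return sizeof "\0";
}

/* Prints every stored byte in hexadecimal, terminator included. */
void print_word_bytes(void)
{
    for (size_t i = 0; i < sizeof word; i++) {
        printf("%02x ", (unsigned char)word[i]);
    }
    printf("\n");
}
```

Here word_storage_size() returns 6 while word_length() returns 5, and null_literal_size() returns 2. On an ASCII system, print_word_bytes() shows the bytes 6f 73 64 65 76 00.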

It is worth mentioning that using a null value to mark an end is not limited to strings: an array of n strings can also hold a NULL pointer as its n-th element (counting from zero) to mark its end, as argv[] does (the standard way of passing arguments to main() in C).
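
A minimal sketch of that idea, assuming a hypothetical helper named count_strings: it walks a NULL-terminated array of strings the same way strlen() walks a null-terminated string, only comparing pointers instead of characters.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the entries of a NULL-terminated array of strings, the same
   layout argv uses (argv[argc] is a NULL pointer). */
size_t count_strings(char *const *list)
{
    size_t n = 0;

    assert(list != NULL);          /* the array itself must exist */

    while (list[n] != NULL) {      /* stop at the terminating NULL pointer */
        n++;
    }

    return n;
}
```

For example, given char *args[] = { "kernel", "--verbose", NULL }, count_strings(args) returns 2. Called on argv inside main(), it would return the same value as argc.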

How to use and detect a null terminator

In C, a null terminator is appended automatically to any string written in double quotation marks ("Hello, OSDev!" is an array of 14 characters because a '\0', or null byte, follows the '!'). The standard library relies on this to find the length of a string and operate on it accordingly. Behind the header files, the code that detects the null terminator is simple, but it is important to understand.

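#include <stddef.h>     /* size_t */

size_t strlen(const char *str)
{
    size_t n = 0;

    /* Advance until the terminating null byte is found. */
    while (str[n] != '\0') {
        n++;
    }

    return n;           /* number of characters before the terminator */
}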

This C code is a quick-and-easy implementation of the standard strlen() function. All it does is increment n while the n-th character of the string str is not a null terminator, and then return n. If the string contains no null byte, the function reads past the end of the buffer, which is undefined behavior. It will probably find a zero byte somewhere in the memory that follows and return a wrong length, but if not, it will eventually read from an invalid address and trigger an exception. Keep in mind that detecting a null terminator is only useful in the context of strings, as memory in general can contain arbitrary zeros that do not mark the end of its contents.
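
To make the last point concrete, here is a small sketch using a made-up binary buffer called packet (its contents and the helper names are assumptions for illustration only). The zeros inside it are ordinary data, so a terminator search reports a length that has nothing to do with the real size of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Eight bytes of arbitrary binary data: the zeros are data, not terminators. */
static const unsigned char packet[8] = {
    0x12, 0x34, 0x00, 0x56, 0x00, 0x00, 0x78, 0x9a
};

/* The real size of the buffer, known only because it is tracked separately. */
size_t packet_size(void)
{
    return sizeof packet;
}

/* What a terminator search reports: it stops at the first zero byte. */
size_t packet_strlen(void)
{
    return strlen((const char *)packet);
}
```

packet_size() returns 8, but packet_strlen() returns 2 because the third byte happens to be zero. Raw memory therefore needs its size stored or passed explicitly.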

In assembly

We have covered the null terminator in C, but not in assembly. In most assembly dialects, string variables (or, more accurately, labels) can be defined as null-terminated when they are created. Some assemblers accept a directive such as ".asciiz" (".asciz" in GNU as) followed by a string in double quotation marks to create (usually in the data section) a string that you can then treat as a normal C-like string. With NASM, the terminating zero is written explicitly after the string. Example in 64-bit Intel assembly, with NASM:

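; One possible way to build on Linux (the file name is only an example):
;   nasm -f elf64 hello.asm && gcc -no-pie hello.o

bits 64
default rel                         ; use RIP-relative addressing

global main
extern printf

section .data

msg:    db  "Hello, OSDev!"         ; NON-NULL-TERMINATED STRING
msgnul: db  ", and goodbye!", 0     ; NULL-TERMINATED STRING

section .text

main:
    sub     rsp, 8                  ; keep the stack 16-byte aligned for the call
    lea     rdi, [msg]              ; first argument: the string to print
    xor     eax, eax                ; AL = 0: no vector registers used (variadic call)
    call    printf
    add     rsp, 8
    xor     eax, eax                ; return 0 from main
    ret

The labels are named msg and msgnul because "str" is also the name of an x86 instruction. As printf reads the characters to print starting at address msg and stops only at a null byte, it runs straight through the first string into the second, and the output of the program will be:

Hello, OSDev!, and goodbye!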

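The same effect can be reproduced in C, where it is easier to experiment with. The sketch below is only an illustration: build_adjacent and its 28-byte buffer are hypothetical, and the two strings are copied into one buffer by hand to mimic the layout of the data section above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fills a 28-byte buffer with two strings placed back to back.
   Only the second one carries a null terminator. */
void build_adjacent(char out[28])
{
    assert(out != NULL);

    memcpy(out, "Hello, OSDev!", 13);         /* 13 characters, terminator left out */
    memcpy(out + 13, ", and goodbye!", 15);   /* 14 characters plus '\0' */
}

/* Length seen by any code that searches for the terminator. */
size_t adjacent_length(void)
{
    char memory[28];

    build_adjacent(memory);
    return strlen(memory);                    /* runs through both strings */
}
```

adjacent_length() returns 27, the combined length of both strings, because nothing marks the end of the first one.
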
OS design considerations

String-related functions are crucial, and you should implement them in your standard library, so keep in mind that null terminator detection is indispensable. It is usually not a good idea to track the length of a string in a separate numeric variable, but it is not bad practice to specify a maximum length in some contexts (for example, when you cannot dynamically allocate memory for a string and simply declare something like "char str[MAX_STR]").
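
One way to combine both ideas is sketched below, under the assumption of a fixed MAX_STR of 16 bytes. The value and the helpers bounded_strlen and copy_string are hypothetical, not part of any standard library. Terminator detection is kept, but no more than a known number of bytes is ever read or written.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STR 16   /* hypothetical limit, terminator included */

/* Like strlen(), but never examines more than max bytes. */
size_t bounded_strlen(const char *str, size_t max)
{
    size_t n = 0;

    while (n < max && str[n] != '\0') {
        n++;
    }

    return n;
}

/* Copies src into a MAX_STR-byte buffer, truncating if necessary.
   The result is always null-terminated. Returns the characters copied. */
size_t copy_string(char dst[MAX_STR], const char *src)
{
    size_t n;

    assert(dst != NULL && src != NULL);

    n = bounded_strlen(src, MAX_STR - 1);   /* leave room for '\0' */
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i];
    }
    dst[n] = '\0';

    return n;
}
```

With char str[MAX_STR], copy_string(str, "osdev") stores the five characters plus the terminator and returns 5, while a longer source string is cut to 15 characters so the terminator always fits.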


--Virtualexception 14:58, 11 March 2024 (CDT)