C preprocessor

From OSDev.wiki
Latest revision as of 05:13, 9 June 2024

The C preprocessor is the first step in translating C/C++ source code into a binary: the usual sequence is preprocessing, compiling, and finally linking. In simple projects, the preprocessor is used only to #include header files and to provide "header guards" against multiple inclusion. However, the preprocessor can do much more, and can be very useful, not only for C/C++ sources but for your Assembly sources as well. Use it with care, since it can also obfuscate your source code and introduce bugs that are very difficult to track down.

General

The preprocessor handles preprocessor directives, which are lines that begin with '#'. Really old compiler versions demanded that the '#' be placed in column 1. Modern versions of C and C++ allow preprocessor directives to begin in any column, as long as the first non-whitespace character of the line is '#'.

Lines with preprocessor directives can be "continued" by placing a backslash ('\') as the last character of the line.
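As a small illustration of line continuation, the sketch below spreads one macro definition over several lines. The names SWAP_INT and swapped_first are hypothetical and exist only for this example.

```c
/* A macro body spread over several lines. Every line except the last
 * ends with a backslash, so the preprocessor joins them into a single
 * logical line before it processes the #define. */
#define SWAP_INT(a, b)    \
    do {                  \
        int tmp_ = (a);   \
        (a) = (b);        \
        (b) = tmp_;       \
    } while (0)

/* Hypothetical helper: swaps its two arguments and returns the value
 * that ends up in the first one. */
static int swapped_first(int x, int y)
{
    SWAP_INT(x, y);
    return x;
}
```

Nothing may follow the backslash on the same line, not even a space; otherwise the continuation is not recognised by every compiler.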

Includes

The most familiar use of the preprocessor is to include header files (containing function declarations, definition of constants etc.):

#include <stdio.h>
#include "myheader.h"

The effect is that the contents of the given header file are pasted into the source file. The technical difference between <> and "" is that the compiler is allowed to satisfy <> includes internally, i.e. without actually accessing any on-disk files of that name. None of the prominent compilers do this, to the knowledge of the author. In practice, GCC and most other compilers look for a "" header in the directory of the including file first and then fall back to the directories searched for <>, so it has become common practice to use <> for system headers and "" for your own header files.

Header files are searched for in a list of preconfigured directories (the include path). This list of include directories can be prepended to by the user (e.g. by using the "-I <directory>" option of GCC).

The #include directive can be used in other contexts, too, for example as a replacement for assembler-specific include directives in Assembly sources that are run through the preprocessor (GCC does this for files ending in .S).

Another possible use is "templating" a piece of code that keeps recurring in more than one source file but cannot be put into a separate function. This way, you can still reduce redundancy by keeping the shared code in a single file and merely including it with #include where needed. This, however, is a pretty ugly construct and should be avoided if possible.

Preprocessor Macros, pt. 1

The preprocessor can define tokens. It is customary to write these tokens in ALL CAPS. (See pt. 2 for the reason.)

#define MYTOKEN

Most compilers also allow the definition of preprocessor tokens on the command line, e.g. the "-D MYTOKEN" option for GCC.

Conditional Compilation

The preprocessor can conditionally select which parts of source code to compile, depending on whether a given token is defined or not (see above).

#define MYTOKEN

#ifdef MYTOKEN
/* This source will be compiled */
#endif

#ifndef MYTOKEN
/* This source will be removed */
#else
/* This source will be compiled */
#endif

Note that such #if / #ifdef / #ifndef - #endif sections can be nested.
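A minimal sketch of nested conditional sections follows. FEATURE_LOGGING and FEATURE_TIMESTAMPS are hypothetical tokens, and logging_mode is a hypothetical function used only to make the selected branch observable.

```c
/* FEATURE_LOGGING is defined, FEATURE_TIMESTAMPS is deliberately not. */
#define FEATURE_LOGGING

/* Returns a description of the configuration chosen at compile time. */
static const char *logging_mode(void)
{
#ifdef FEATURE_LOGGING
    /* The inner section is only examined because the outer test passed. */
  #ifdef FEATURE_TIMESTAMPS
    return "logging with timestamps";
  #else
    return "logging without timestamps";
  #endif
#else
    return "logging disabled";
#endif
}
```

Only one of the three return statements survives preprocessing; the other two never reach the compiler.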

Header Guards

Non-trivial projects face the problem that a header file includes other header files in turn. Let's say abc.h and def.h both include xyz.h. Should you #include both abc.h and def.h in your source, you will likely end up with warnings and errors about redefinitions etc.

The solution is header guards, a combination of conditional compilation and token definition:

/* abc.h */

#ifndef ABC_H_
#define ABC_H_

/* declarations here */

#endif
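To see the guard at work without creating several files, the sketch below pastes the body of a hypothetical header point.h twice into one source file, which is what the compiler would see after two indirect inclusions. The names POINT_H_, struct point and point_sum are assumptions made for this example.

```c
/* First inclusion of the hypothetical point.h: POINT_H_ is not yet
 * defined, so the declarations are kept. */
#ifndef POINT_H_
#define POINT_H_
struct point { int x; int y; };
#endif

/* Second inclusion: POINT_H_ is already defined, so everything up to
 * the matching #endif is skipped and struct point is not redefined. */
#ifndef POINT_H_
#define POINT_H_
struct point { int x; int y; };
#endif

static int point_sum(struct point p)
{
    return p.x + p.y;
}
```

Removing the guards from both copies should make the compiler reject the second definition of struct point.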

Preprocessor Macros, pt. 2

Preprocessor tokens can also be assigned a value.

The preprocessor will replace any occurrence of a defined token in the source code with the value the token has been defined to. This is also true for tokens that have been defined to nothing (as in pt. 1 above). This is the reason why preprocessor tokens are customarily written in ALL CAPS: to avoid accidental clashes with identifiers used in the source code itself.

The #if directive can be used to base conditional compilation on token values. Note that the preprocessor can only evaluate integer constant expressions. Compiler-evaluated code like `sizeof()` cannot be used in preprocessor directives. Non-numerical values cannot be compared either: any identifier that is still undefined after macro expansion is silently treated as 0 in an #if expression, so symbolic values must themselves be defined as distinct integers.

#define FOO 1
#define BAR 2
#define MYTOKEN FOO
#define OTHERTOKEN 42

#if MYTOKEN == FOO
/* This code will be compiled */
#elif MYTOKEN == BAR
/* This code won't */
#endif

#if OTHERTOKEN > 40
/* Will be compiled. */
#endif

#if OTHERTOKEN != 42
/* Won't be compiled. */
#endif
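One pitfall of #if comparisons is worth a short sketch: identifiers that are not defined as macros evaluate to 0 inside an #if expression, so comparing two undefined names always succeeds. All token and function names below are hypothetical.

```c
/* Neither "fast" nor "slow" is defined as a macro. */
#define MODE fast

#if MODE == slow          /* becomes fast == slow, evaluated as 0 == 0 */
#define MODE_LOOKS_SLOW 1
#else
#define MODE_LOOKS_SLOW 0
#endif

/* A common remedy: give every selector value its own integer. */
#define SPEED_FAST 1
#define SPEED_SLOW 2
#define SPEED SPEED_FAST

#if SPEED == SPEED_SLOW   /* becomes 1 == 2, which is false */
#define SPEED_IS_SLOW 1
#else
#define SPEED_IS_SLOW 0
#endif

static int mode_looks_slow(void) { return MODE_LOOKS_SLOW; }
static int speed_is_slow(void)   { return SPEED_IS_SLOW; }
```

GCC can report the first kind of comparison when invoked with -Wundef.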

The #if directive also allows for a simple construct to disable a region of code without having to worry about nested /* ... */ style comments:

#if 0
/* disabled code */
#endif

Such code can easily be re-enabled temporarily with no more effort than replacing the "0" with a "1". Source comments as to why you disabled code this way are in order.

#undef

Using the #undef directive, a preprocessor token can be undefined. This is useful for trickier setups where you might want to redefine a token to a different value: redefining a token to a different value generates a diagnostic, whereas undefining a token that is not defined does not.

This should not be construed as advice to always use #undef before a #define. Those warnings might actually be pointing to a real problem in your logic. Use #undef with care.
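A minimal sketch of redefining a token by way of #undef, assuming a hypothetical token BUFFER_SIZE and two hypothetical functions that capture its value at different points in the file:

```c
#define BUFFER_SIZE 64

static int first_size(void) { return BUFFER_SIZE; }    /* uses 64 */

/* Writing "#define BUFFER_SIZE 128" directly would draw a diagnostic,
 * so the old definition is removed first. */
#undef BUFFER_SIZE
#define BUFFER_SIZE 128

static int second_size(void) { return BUFFER_SIZE; }   /* uses 128 */

/* Undefining a token that was never defined is silently accepted. */
#undef NEVER_DEFINED_TOKEN
```

Each function keeps the value that was in effect where its body was written, because the replacement happens before compilation.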

Predefined Tokens

The preprocessor provides a couple of tokens which are automatically defined to the appropriate values - something very useful when constructing error messages or tracing messages. Note that some obsolete compilers might balk at __func__ and not all tokens may be supported or implemented by all compilers.

Preprocessor Token    Explanation
__FILE__              The name of the source file currently being compiled (a string literal).
__LINE__              The current line number in that file (an integer constant).
__DATE__              The date on which compilation began (a string of the form "Mmm dd yyyy").
__TIME__              The time at which compilation began (a string of the form "hh:mm:ss").
__cplusplus           Defined only when compiling as C++. A standard-conforming compiler sets it to 199711L or higher.
__STDC__              Defined as 1 when the compiler conforms to the ISO (ANSI) C standard.
__func__              The name of the enclosing function (as a string). Strictly speaking this is a predefined identifier introduced in C99 rather than a preprocessor macro, so it cannot be tested with #ifdef.
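The tokens above are typically combined into a tracing helper. The sketch below assumes a hosted environment with snprintf available; WHERE and describe_here are hypothetical names.

```c
#include <stdio.h>

/* Formats "file:line (function)" into buf. __FILE__ and __LINE__ are
 * expanded at the place where WHERE is used, so the text describes the
 * caller's location rather than the location of this definition. */
#define WHERE(buf, size) \
    snprintf((buf), (size), "%s:%d (%s)", __FILE__, __LINE__, __func__)

/* Returns the number of characters snprintf wanted to write. */
static int describe_here(char *buf, size_t size)
{
    return WHERE(buf, size);   /* __func__ is "describe_here" here */
}
```

In a kernel without a C library, the same three tokens could be passed to whatever output routine the kernel provides instead of snprintf.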

Different compilers may define extra preprocessor tokens. Visual C++, for example, may define _MSC_VER and __cplusplus_cli. See the link section below for more information.
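Compiler-specific tokens are often used to pick a code path per compiler. The sketch below relies on __clang__, __GNUC__ and _MSC_VER, which the respective compilers define, and on the standard defined() operator; compiler_name is a hypothetical function.

```c
/* Reports which compiler built this translation unit. Clang is tested
 * first because it also defines __GNUC__ for compatibility. */
static const char *compiler_name(void)
{
#if defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GCC";
#elif defined(_MSC_VER)
    return "Visual C++";
#else
    return "unknown";
#endif
}
```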

assert()

Assertions are used to catch situations which should never happen, even under error circumstances. If the condition given in the parentheses does not evaluate to "true", a diagnostic message is printed which contains the source file name, the line number, and (since C99) the name of the current function; the program then calls abort().

#include <assert.h>

assert( sizeof(struct free_memory_block) == 8 );
assert( 1 != 2 );
assert( gdt_ptr != NULL );

For production code, assertions may be turned off by defining NDEBUG:

gcc -DNDEBUG ...

Note that <assert.h> does not have (or need) a header guard, i.e. can be included multiple times in a source file, and that whether NDEBUG is defined or not is evaluated anew at every inclusion of <assert.h>. You can thus enable / disable assertions at a very fine-grained level if necessary:

#include <assert.h>

    /* assert() at this point only fails-on-false if NDEBUG is not defined */
    assert( isChecksumCorrect() );

#ifdef NDEBUG
/* Hard-enabling of assert() even if NDEBUG is defined */
#define NDEBUG_WAS_SET
#undef NDEBUG
#include <assert.h>
#endif

    /* assert() in this block of code should fail-on-false even in production */
    assert( isChecksumCorrect() );

#ifdef NDEBUG_WAS_SET
/* Restoring NDEBUG if it was enabled originally */
#define NDEBUG
#include <assert.h>
#endif
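A freestanding kernel may not have <assert.h> at its disposal. A similar facility can be approximated with the predefined tokens, as in the sketch below. KASSERT, kassert_fail and the kassert_* variables are hypothetical; a real kernel would print the location and halt instead of merely recording it.

```c
/* Hypothetical failure hook: records where the last assertion failed. */
static const char *kassert_file = 0;
static int kassert_line = 0;
static int kassert_failures = 0;

static void kassert_fail(const char *file, int line)
{
    kassert_file = file;
    kassert_line = line;
    kassert_failures++;
}

/* Like assert(), the check disappears when NDEBUG is defined. */
#ifdef NDEBUG
#define KASSERT(cond) ((void)0)
#else
#define KASSERT(cond) \
    ((cond) ? (void)0 : kassert_fail(__FILE__, __LINE__))
#endif

/* Runs one passing and one failing check for positive input and
 * returns the total number of failures recorded so far. */
static int check_twice(int value)
{
    KASSERT(value > 0);
    KASSERT(value < 0);
    return kassert_failures;
}
```

Writing the macro as a single expression lets KASSERT(...) be used anywhere a function call could appear, including inside an unbraced if/else.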

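See also

Articles

* C
* Why function implementations shouldn't be put In header files

External Links

* Macro Pitfalls (http://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html#Macro-Pitfalls) - A number of counter-intuitive consequences of macros and macro expanding design.
* The GNU C preprocessor manual (http://gcc.gnu.org/onlinedocs/cpp/)
* VC++ preprocessor information (http://msdn.microsoft.com/en-us/library/b0084kay(VS.80).aspx)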