The C preprocessor is the first step in the process of translating C/C++ source code into a binary. The usual sequence is preprocessing, compiling and finally linking. In trivial environments, the preprocessor is used only for '''#include'''ing header files and providing "header guards" to avoid multiple inclusions. However, the preprocessor can do much more and can be very useful, not only for C/C++ sources but for your Assembly sources as well. Use it with care, since it can also obfuscate your source code and introduce bugs that are very difficult to debug.
{{stub}}
== C preprocessor ==
The C preprocessor is a powerful tool and, used properly, can be very useful.
The examples below have been checked to work with GCC.
 
=== General ===
The preprocessor handles ''preprocessor directives'', which are lines that begin with '' '#' ''. Very old compilers demanded that the '' '#' '' be placed in column 1; modern versions of C and C++ allow preprocessor directives to begin in any column, as long as the first non-whitespace character of the line is '' '#' ''.

Lines with preprocessor directives can be "continued" by placing a backslash ('\') as the last character of the line.

=== Uses ===
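A macro can also take parameters, which makes it look like a function:
<syntaxhighlight lang="c">
#define MAX(a,b) ((a) > (b) ? (a) : (b))
</syntaxhighlight>
The arguments are pasted into the expansion as written, so one of ''a'' and ''b'' is evaluated twice, and any side effects of that argument (e.g. in ''MAX(i++, j)'') happen twice.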
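The double evaluation is easy to observe. In this sketch (the function name ''max_with_increment'' is made up for illustration), ''i'' starts at 5 and ''j'' at 3; the expansion increments ''i'' once for the comparison and once more for the selected result:

<syntaxhighlight lang="c">
/* The macro from the text: the selected argument is evaluated a second time. */
#define MAX(a,b) ((a) > (b) ? (a) : (b))

/* Evaluates MAX(i++, j) with i = 5, j = 3; stores the final i in *i_after. */
int max_with_increment(int *i_after) {
    int i = 5, j = 3;
    int m = MAX(i++, j);  /* expands to ((i++) > (j) ? (i++) : (j)) */
    *i_after = i;
    return m;
}
</syntaxhighlight>

A real function called with the same arguments would return 5 and leave ''i'' at 6; the macro returns 6 and leaves ''i'' at 7.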
 
Using GCC statement expressions and ''typeof'' (both GNU extensions), each argument is evaluated exactly once, which avoids this problem:
<syntaxhighlight lang="c">
#define MAX(a,b) ({ typeof(a) _a = (a); \
                    typeof(b) _b = (b); \
                    _a > _b ? _a : _b; })
</syntaxhighlight>

== Includes ==
The most familiar use of the preprocessor is to include header files (containing function declarations, definitions of constants etc.):
 
<syntaxhighlight lang="c">
#include <stdio.h>
#include "myheader.h"
</syntaxhighlight>
 
The effect is that the contents of the given header file are pasted into the source file in place of the directive. The ''technical'' difference between <> and "" is that the compiler is allowed to satisfy <> includes internally, i.e. without actually accessing any on-disk file of that name. To the author's knowledge, none of the prominent compilers do this, but it has become common practice to use <> for system headers and "" for your own header files. In GCC, a "" include is first looked up in the directory of the current file before falling back to the directories used for <> includes.
 
Header files are searched for in a list of preconfigured directories (the ''include path''). The user can add directories to this list that are searched before the preconfigured ones (e.g. by using the "-I <directory>" option of [[GCC]]).
 
The ''#include'' directive can be used in other contexts too, for example as a replacement for assembler-specific include directives in Assembly sources that are run through the preprocessor.
 
Another possible use is "templating" a piece of code that recurs in more than one source file but cannot be put into a separate function. You can still reduce redundancy by keeping the shared code in a single file and merely ''#include''ing it where needed. This, however, is a pretty ugly construct and should be avoided if possible.
 
== Preprocessor Macros, pt. 1 ==
The preprocessor can ''define'' tokens. It is customary to write these tokens in ALL CAPS (see pt. 2 for why).
 
<syntaxhighlight lang="c">
#define MYTOKEN
</syntaxhighlight>
 
Most compilers also allow the definition of preprocessor tokens on the command line, e.g. the "-D MYTOKEN" option for [[GCC]].
 
== Conditional Compilation ==
The preprocessor can ''conditionally'' select which parts of source code to compile, depending on whether a given token is defined or not (see above).
 
<syntaxhighlight lang="c">
#define MYTOKEN
 
#ifdef MYTOKEN
/* This source will be compiled */
#endif
 
#ifndef MYTOKEN
/* This source will be removed */
#else
/* This source will be compiled */
#endif
</syntaxhighlight>
 
Note that such ''#if'' / ''#ifdef'' / ''#ifndef'' - ''#endif'' sections can be nested.
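A sketch of nesting; ''FEATURE_X'' and ''FEATURE_X_VERBOSE'' are hypothetical tokens of the kind you might pass with "-D". Whitespace is also allowed between the '' '#' '' and the directive name, which keeps nested levels readable:

<syntaxhighlight lang="c">
/* Hypothetical configuration: FEATURE_X is on, FEATURE_X_VERBOSE is not. */
#define FEATURE_X

/* Returns a number telling which nested branch survived preprocessing. */
int feature_level(void) {
#ifdef FEATURE_X
#  ifdef FEATURE_X_VERBOSE
    return 2;   /* both tokens defined */
#  else
    return 1;   /* only FEATURE_X defined */
#  endif
#else
    return 0;   /* FEATURE_X not defined */
#endif
}
</syntaxhighlight>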
 
== Header Guards ==
Non-trivial projects face the problem that header files include other header files in turn. Let's say both ''abc.h'' and ''def.h'' include ''xyz.h''. If you ''#include'' both ''abc.h'' and ''def.h'' in your source, the contents of ''xyz.h'' are pasted in twice, and you will likely end up with warnings and errors about redefinitions etc.
 
The solution is ''header guards'', a combination of conditional compilation and token definition:
 
<syntaxhighlight lang="c">
/* abc.h */
 
#ifndef ABC_H_
#define ABC_H_
 
/* declarations here */
 
#endif
</syntaxhighlight>
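The effect of the guard can be seen without separate files. In this sketch the contents of a guarded ''abc.h'' (here just a hypothetical ''struct point'') are pasted twice, as two ''#include''s would do. Without the guard the second copy would be a redefinition error; with it, the second copy is skipped:

<syntaxhighlight lang="c">
/* First copy of abc.h: ABC_H_ is not yet defined, so the body is kept. */
#ifndef ABC_H_
#define ABC_H_
struct point { int x; int y; };   /* hypothetical header contents */
#endif

/* Second copy (e.g. pulled in again via def.h): ABC_H_ is now defined,
   so the body is skipped and struct point is not redefined. */
#ifndef ABC_H_
#define ABC_H_
struct point { int x; int y; };
#endif

/* Uses the type to show it was declared exactly once. */
int point_sum(struct point p) { return p.x + p.y; }
</syntaxhighlight>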
 
== Preprocessor Macros, pt. 2 ==
Preprocessor tokens can also be assigned a ''value''.
 
The preprocessor will replace any occurrence of a defined token in the source code with the value the token has been defined to. ''This is also true for tokens that have been defined to nothing'' (as in pt. 1 above). This is the reason why preprocessor tokens are customarily written in ALL CAPS: to avoid accidental clashes with identifiers used in the source code itself.
 
The ''#if'' directive can be used to base conditional compilation on token values. Note that the preprocessor can only work with integer constant expressions: anything evaluated by the compiler, such as ''sizeof'', cannot be used in preprocessor directives. The preprocessor also cannot compare non-numerical values, because after macro expansion every remaining identifier in an ''#if'' expression is replaced by 0. To select between named alternatives, define each name as a distinct number:

<syntaxhighlight lang="c">
#define ARCH_X86 1
#define ARCH_ARM 2
#define MYTOKEN ARCH_X86
#define OTHERTOKEN 42

#if MYTOKEN == ARCH_X86
/* This code will be compiled */
#elif MYTOKEN == ARCH_ARM
/* This code won't */
#endif

#if OTHERTOKEN > 40
/* Will be compiled. */
#endif

#if OTHERTOKEN != 42
/* Won't be compiled. */
#endif
</syntaxhighlight>
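Comparing plain names with ''#if'' is a trap: after macro expansion, any identifier that is not a macro evaluates to 0, so two unrelated names compare equal. A minimal sketch (''foo'' and ''bar'' are deliberately left undefined; the function name is made up):

<syntaxhighlight lang="c">
#define MYTOKEN foo   /* foo and bar are NOT macros */

/* Returns 1 if the preprocessor treated "MYTOKEN == bar" as true. */
int mytoken_looks_like_bar(void) {
#if MYTOKEN == bar    /* becomes foo == bar, then 0 == 0 */
    return 1;
#else
    return 0;
#endif
}
</syntaxhighlight>

GCC's "-Wundef" option warns about such undefined identifiers in ''#if''.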
 
The ''#if'' directive also allows for a simple construct to disable a region of code without having to worry about nested ''/* ... */'' style comments:
 
<syntaxhighlight lang="c">
#if 0
/* disabled code */
#endif
</syntaxhighlight>
 
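Such code can easily be re-enabled temporarily with no more effort than replacing the "0" with a "1". A source comment explaining why the code was disabled is in order. Many editors, like [[VIm]], highlight such ''#if 0'' blocks as comments by default. Note that the disabled text must still be valid preprocessor input: nested ''#if''/''#endif'' directives must balance, single-quote characters must be paired etc., so use comments to disable non-code text.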
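A small sketch of the technique. The disabled region contains its own comments and a call to ''compute_answer_slowly()'', a hypothetical function that is never declared, yet the file compiles because the region never reaches the compiler:

<syntaxhighlight lang="c">
/* Returns the current answer; the old implementation is disabled below. */
int answer(void) {
#if 0
    /* old, slow implementation, kept for reference */
    return compute_answer_slowly(); /* hypothetical, never declared */
#endif
    return 42;
}
</syntaxhighlight>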
 
== #undef ==
Using the ''#undef'' directive, a preprocessor token can be undefined. This is useful for trickier setups where you might want to redefine a token to a different value: redefining a token to a different value generates a warning, while undefining a token (even one that was never defined) doesn't.
 
This should not be construed as advice to ''always'' use ''#undef'' before a ''#define'': those warnings might be pointing to a real problem in your logic. Use ''#undef'' with care.
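A minimal sketch of a deliberate redefinition (''BUFFER_SIZE'' and the buffer and function names are made up for illustration). Each array is sized with whatever value the token has at that point in the file:

<syntaxhighlight lang="c">
#include <stddef.h>

#define BUFFER_SIZE 64                  /* hypothetical token */
static char small_buffer[BUFFER_SIZE];  /* sized with 64 */

#undef BUFFER_SIZE                      /* avoids a redefinition warning below */
#define BUFFER_SIZE 256
static char large_buffer[BUFFER_SIZE];  /* sized with 256 */

/* Report the sizes each buffer was declared with. */
size_t small_size(void) { return sizeof small_buffer; }
size_t large_size(void) { return sizeof large_buffer; }
</syntaxhighlight>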
 
== Predefined Tokens ==
The preprocessor provides a few tokens that are automatically defined to the appropriate values, which is very useful when constructing error or tracing messages. Note that some obsolete compilers might balk at ''__func__'', and not all tokens are supported by all compilers.
 
{| {{wikitable}}
! Preprocessor Token
! Explanation
|-
| __FILE__ || Holds the name of the current source file being compiled (as a string).
|-
| __LINE__ || Holds the current line being compiled (as an integer).
|-
| __DATE__ || Holds the date when the compilation process began (a string with the format "Mmm dd yyyy").
|-
| __TIME__ || Same as the previous, but the time (a string with the format "hh:mm:ss").
|-
| __cplusplus || When defined, the value indicates that C++ compilation is active. When the compiler is (fully) compliant with the standard, the value should be >= 199711L.
|-
| __STDC__ || When defined, the value indicates that the compiler is (fully) compliant with the ANSI C standard.
|-
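| __func__ || Holds the name of the function it is used within (as a string). Strictly speaking, this is an identifier predefined by the compiler (C99, C++11), not a preprocessor macro, so it cannot be used in ''#if'' or concatenated with string literals.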
|}
 
Different compilers may define extra preprocessor tokens. Visual C++, for example, may define ''_MSC_VER'' and ''__cplusplus_cli''. See the links section below for more information.
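A sketch of how these tokens can be captured for a tracing message; ''struct location'', ''HERE()'' and the two functions are hypothetical names. Because the macro is expanded where it is used, ''__FILE__'', ''__LINE__'' and ''__func__'' describe that place, not the macro definition:

<syntaxhighlight lang="c">
#include <stdio.h>

/* Where a trace message was produced (hypothetical type for this sketch). */
struct location {
    const char *file;   /* from __FILE__, a string literal */
    int line;           /* from __LINE__, an integer constant */
    const char *func;   /* from __func__, provided by the compiler, not the preprocessor */
};

/* Expanded at the point of use, so all three values describe the caller. */
#define HERE() ((struct location){ __FILE__, __LINE__, __func__ })

struct location where_am_i(void) {
    return HERE();
}

/* Prints the location as "file:line: in function()". */
void print_location(struct location loc) {
    printf("%s:%d: in %s()\n", loc.file, loc.line, loc.func);
}
</syntaxhighlight>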
 
== assert() ==
Assertions are used to catch situations which should never happen, even under error circumstances. If the condition given in the parentheses evaluates to false (zero), a diagnostic is printed which contains the text of the condition, the source file name, the line number and (since C99) the name of the current function; the program then calls abort().

<syntaxhighlight lang="c">
#include <assert.h>

assert( sizeof(struct free_memory_block) == 8 );
assert( 1 != 2 );
assert( gdt_ptr != NULL );
</syntaxhighlight>

For production code, assertions may be turned off by defining NDEBUG:

<syntaxhighlight lang="bash">
gcc -DNDEBUG ...
</syntaxhighlight>

Note that <assert.h> does not have (or need) a header guard, i.e. it can be included multiple times in a source file, and whether NDEBUG is defined or not is evaluated anew ''at every inclusion of <assert.h>''. You can thus enable / disable assertions at a very fine-grained level if necessary:

<syntaxhighlight lang="c">
#include <assert.h>

/* assert() at this point only fails-on-false if NDEBUG is not defined */
assert( isChecksumCorrect() );

#ifdef NDEBUG
/* Hard-enabling of assert() even if NDEBUG is defined */
#define NDEBUG_WAS_SET
#undef NDEBUG
#include <assert.h>
#endif

/* assert() in this block of code should fail-on-false even in production */
assert( isChecksumCorrect() );

#ifdef NDEBUG_WAS_SET
/* Restoring NDEBUG if it was enabled originally */
#undef NDEBUG_WAS_SET
#define NDEBUG
#include <assert.h>
#endif
</syntaxhighlight>

=== Custom debugging macros ===
A freestanding environment such as a kernel is not required to provide <assert.h>, so you may want to define your own assertion macros. The following examples rely on a kernel function ''panic()'' and on the ''complain()'' macro defined further below.

<syntaxhighlight lang="c">
#define assert(condition) do { if (!(condition)) { complain("assertion fail: " #condition); panic(); } } while(0)
</syntaxhighlight>

Such assertions may be turned off in production code (the condition is then not evaluated at all):

<syntaxhighlight lang="c">
#define assert(x) do {} while(0)
</syntaxhighlight>

Some rare, unrecoverable errors should be tested for in production code as well. These tests should not be disabled, so that the problem is recognised instead of showing up as random crashes:

<syntaxhighlight lang="c">
#define testif(condition) do { if (!(condition)) { complain("testif fail: " #condition); panic(); } } while(0)
testif( isChecksumCorrect( kernel_heap.first_free_list ) );
testif( timersAreOn );
</syntaxhighlight>

To capture debugging information, such as the values of variables at different moments of execution, ''complain()'' prints the current file and line followed by a message or value. The overloaded ''alert()'' declarations require C++; in plain C, give each output function a distinct name.

<syntaxhighlight lang="c">
void alert(const char *msg);   /* print a string */
void alert(uint32 u);          /* print a value in hexadecimal */
void alert_dec(uint32 u);      /* print a value in decimal */

#define complain(msg) do { \
    alert(__FILE__); \
    alert(": "); \
    alert_dec(__LINE__); \
    alert(": "); \
    alert(msg); \
    alert("\n"); \
} while(0)

void * malloc(size_t s) {
    complain((uint32)kernel_heap.first_free->addr);
    do_something();
    complain((uint32)kernel_heap.first_free->addr);
    if (do_something2()) y = malloc(sizeof(struct book_keeping_struct));
    complain((uint32)kernel_heap.first_free->addr);
    do_something3();
}
</syntaxhighlight>

with an output of

<pre>
src/memory/malloc.c: 271: 0xd0010000 //entering malloc
src/memory/malloc.c: 273: 0xd0010000 //having done_something
src/memory/malloc.c: 271: 0xd0010000 //do_something2 makes malloc call itself recursively
src/memory/malloc.c: 273: 0xd0010000 //done_something
src/memory/malloc.c: 275: 0x0e1bc30a //aha! an error after do_something2() (which returns 0) in the nested malloc call
...
</pre>

To find the point where a program dies, or to follow its flow, a macro that prints the current function and line number helps. Suppose the kernel reports only:

<pre>
SYSFAIL: page fault, %eip= 0xc0001d330, %cr2=0x00000000
</pre>

<syntaxhighlight lang="c">
#define lnDbg do { alert("<<"); alert(__func__); alert(": "); alert_dec(__LINE__); alert(">>\n"); } while(0)
</syntaxhighlight>

<syntaxhighlight lang="c" line start="13">
void a(int i) {
    lnDbg;
    if (fun(i)) c(i); else a(i-1);
    lnDbg;
    a(i);
}
void c(int i) {
    lnDbg;
    if (!is(i)) return;
    lnDbg;
    c(i+1);
    lnDbg;
}
</syntaxhighlight>

<pre>
<<a: 14>>
<<c: 20>> //line 16 wasn't run, fun(i) has returned true
<<a: 16>> //line 22 wasn't run, !is(i) has returned true
<<a: 14>>
SYSFAIL: .... //after line 14 neither line 20 nor line 16 has been reached, so the call fun(i) caused the page fault.
</pre>

Such macros may be stored in a shared.h file included by other compilation units.
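The same macros can be tried out in a hosted program. In this sketch ''fprintf()'' to ''stderr'' stands in for the kernel's ''alert()'' output, and instead of calling ''panic()'' each failed check increments a counter (''testif_failures'' and ''check_values()'' are made-up names). The ''#'' operator turns the checked expression into a string, so a failure prints e.g. ''testif fail: timers_on'':

<syntaxhighlight lang="c">
#include <stdio.h>

/* Hosted stand-in for the kernel complain() macro: prints "file: line: msg". */
#define complain(msg) \
    do { fprintf(stderr, "%s: %d: %s\n", __FILE__, __LINE__, (msg)); } while (0)

/* Number of failed testif() checks; a kernel would call panic() instead. */
static int testif_failures = 0;

/* Like the kernel testif(), but counts failures instead of panicking. */
#define testif(condition) \
    do { \
        if (!(condition)) { \
            complain("testif fail: " #condition); \
            ++testif_failures; \
        } \
    } while (0)

/* Runs two checks and returns the total number of failures so far. */
int check_values(int timers_on, int checksum_ok) {
    testif(timers_on);
    testif(checksum_ok);
    return testif_failures;
}
</syntaxhighlight>

The ''do { ... } while (0)'' wrapper makes each macro behave like a single statement, so it can safely be used as the body of an unbraced ''if''/''else''.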
 
== See also ==
=== Articles ===
* [[C]]
* [[Why function implementations shouldn't be put In header files]]
 
=== External Links ===
* [http://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html#Macro-Pitfalls Macro Pitfalls] - A number of counter-intuitive consequences of macros and macro expanding design.
* [http://gcc.gnu.org/onlinedocs/cpp/ The GNU C preprocessor manual]
* [http://msdn.microsoft.com/en-us/library/b0084kay(VS.80).aspx VC++ preprocessor information]
 
[[Category:C]]
[[Category:Tutorials]]