C preprocessor: Difference between revisions

The C preprocessor is the first step in the process of translating C/C++ source code into a binary. Generally, the process walked through is preprocessing, compiling and finally linking. In trivial environments, the preprocessor is used only for '''#include'''ing header files, and providing "header guards" to avoid multiple inclusions. However, the preprocessor can do much more, and can be very useful - not only for C/C++ sources, but for your Assembly sources as well. Use it with care, since it can also obfuscate your source code and introduce bugs that may be very difficult to debug.
{{stub}}
The preprocessor is a powerful tool and, properly used, can be very useful. The examples below have been checked to work with [[GCC]].
 
== General ==
The preprocessor handles ''preprocessor directives'', which are lines that begin with '' '#' ''. Very old compilers demanded that the '' '#' '' be placed in column 1. Modern versions of C and C++ allow preprocessor directives to begin in any column, as long as the first non-whitespace character of the line is '' '#' ''.

Lines with preprocessor directives can be "continued" by placing a backslash ('\') as the last character of the line.

<syntaxhighlight lang="c">
#ifdef SOME_MACRO
#define SOME_OTHER_MACRO SOME_VALUE
#undef SOME_MACRO
#endif

#ifndef NDEBUG
/* Older compilers may not have __func__ */
fprintf(stderr, "%s:%d: in function %s\n", __FILE__, __LINE__, __func__);
#endif
</syntaxhighlight>

== Includes ==
The most familiar use of the preprocessor is to include header files (containing function declarations, definitions of constants etc.):

<syntaxhighlight lang="c">
#include <stdio.h>
#include "myheader.h"
</syntaxhighlight>

The effect is that the contents of the given header file are pasted into the source file. The ''technical'' difference between <> and "" is that the compiler is allowed to satisfy <> includes internally, i.e. without actually accessing any on-disk files of that name. None of the prominent compilers do this, to the knowledge of the author, but it has become common practice to use <> for system headers and "" for your own header files.

Header files are searched for in a list of preconfigured directories (the ''include path''). The user can prepend directories to this list (e.g. by using the "-I <directory>" option of [[GCC]]).

The ''#include'' directive can be used in other contexts, too: as a replacement for assembler-specific include directives, for example.

Another possible use is "templating" a piece of code that keeps recurring in more than one source file but could not be put into a separate function. This way, you could still reduce redundancy by keeping the shared code in a single file and merely ''#include''ing it where needed. This, however, is a pretty ugly construct and should be avoided if possible.

== Preprocessor Macros, pt. 1 ==
The preprocessor can ''define'' tokens. It is customary to write these tokens in ALL CAPS. (See pt. 2 as for why.)

<syntaxhighlight lang="c">
#define MYTOKEN
</syntaxhighlight>

Most compilers also allow the definition of preprocessor tokens on the command line, e.g. the "-D MYTOKEN" option for [[GCC]].

== Conditional Compilation ==
The preprocessor can ''conditionally'' select which parts of source code to compile, depending on whether a given token is defined or not (see above).

<syntaxhighlight lang="c">
#define MYTOKEN

#ifdef MYTOKEN
/* This source will be compiled */
#endif

#ifndef MYTOKEN
/* This source will be removed */
#else
/* This source will be compiled */
#endif
</syntaxhighlight>

Macros are defined with the ''#define'' directive. A previously defined macro can be undefined with ''#undef''. An ''#ifdef'' block conditionally compiles a block of code if a certain macro is defined. This is useful for code that is platform-dependent or that is only used for debugging.
 
Note that such ''#if'' / ''#ifdef'' / ''#ifndef'' - ''#endif'' sections can be nested.
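
A minimal sketch of nested conditional sections. The tokens DEBUG and VERBOSE and the function log_level() are hypothetical names chosen for this illustration only:

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdio.h>

/* DEBUG and VERBOSE are hypothetical tokens chosen for this sketch. */
#define DEBUG
/* VERBOSE is deliberately left undefined. */

/* Returns 2 if both tokens are defined, 1 if only DEBUG is, 0 otherwise. */
static int log_level(void)
{
#ifdef DEBUG
#ifdef VERBOSE
    return 2;
#else
    return 1;
#endif
#else
    return 0;
#endif
}

int main(void)
{
    assert(log_level() == 1);
    printf("log level: %d\n", log_level());
    return 0;
}
</syntaxhighlight>

Each ''#endif'' closes the innermost section that is still open, just like a closing brace.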

When the preprocessor reads a file, it replaces any occurrence of a defined macro with the value it was defined to. Since the preprocessor is language agnostic, an error introduced through its use will not be reported until the file is compiled. This also means that the preprocessor can be used with languages other than C. The standard way to run only the preprocessor with GCC is "gcc -E".

== Macros with arguments ==
Macros can take arguments. A classic example returns the larger of two values:

<syntaxhighlight lang="c">
#define MAX(a,b) ((a) > (b) ? (a) : (b))
</syntaxhighlight>

This definition evaluates one of a, b twice, so any side effects of that argument happen twice. Using the GCC extensions ''statement expressions'' and ''typeof'', this problem can be avoided:

<syntaxhighlight lang="c">
#define MAX(a,b) ({typeof(a) _a = (a); \
typeof(b) _b = (b); \
_a > _b ? _a : _b;})
</syntaxhighlight>
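
The double evaluation of the simple MAX macro can be observed with a small sketch. The helper next_value() and the counter calls are hypothetical names introduced only for this demonstration:

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdio.h>

#define MAX(a,b) ((a) > (b) ? (a) : (b))

/* Counts how often next_value() has been called. */
static int calls = 0;

/* A function with a side effect: it increments the counter. */
static int next_value(void)
{
    ++calls;
    return 10;
}

/* Returns how many times next_value() ran for a single use of MAX(). */
static int count_calls(void)
{
    calls = 0;
    int m = MAX(next_value(), 5);
    (void)m;
    return calls;
}

int main(void)
{
    /* Once for the comparison, once more to produce the result. */
    assert(count_calls() == 2);
    printf("next_value() was called %d times\n", count_calls());
    return 0;
}
</syntaxhighlight>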
 
== Header Guards ==
Non-trivial projects face the problem that a header file includes other header files in turn. Let's say ''abc.h'' and ''def.h'' both include ''xyz.h''. Should you ''#include'' both ''abc.h'' and ''def.h'' in your source, you will likely end up with warnings and errors about redefinitions etc.

The solution is ''header guards'', a combination of conditional compilation and token definition:
 
<syntaxhighlight lang="c">
/* abc.h */
 
#ifndef ABC_H_
#define ABC_H_
 
/* declarations here */
 
#endif
</syntaxhighlight>
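
The effect can be sketched within a single file by writing the guarded contents twice, which is what the compiler sees after a double inclusion. The header name point.h, the struct point and the function sum_point() are hypothetical and only serve this illustration:

<syntaxhighlight lang="c">
#include <assert.h>

/* First "inclusion" of the hypothetical header point.h */
#ifndef POINT_H_
#define POINT_H_
struct point { int x; int y; };
#endif

/* Second "inclusion": POINT_H_ is already defined, so this copy is skipped.
 * Without the guard, the struct would be defined twice, which is an error. */
#ifndef POINT_H_
#define POINT_H_
struct point { int x; int y; };
#endif

/* Uses the struct to show that exactly one definition is visible. */
static int sum_point(int x, int y)
{
    struct point p = { x, y };
    return p.x + p.y;
}

int main(void)
{
    assert(sum_point(3, 4) == 7);
    return 0;
}
</syntaxhighlight>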
 
== Preprocessor Macros, pt. 2 ==
Preprocessor tokens can also be assigned a ''value''.
 
The preprocessor will replace any occurrence of a defined token in the source code with the value the token has been defined to. ''This is also true for tokens that have been defined to nothing'' (as in pt. 1 above). This is the reason why preprocessor tokens are customarily written in ALL CAPS: to avoid accidental clashes with identifiers used in the source code itself.
 
The ''#if'' directive can be used to base conditional compilation on token values. Note that the preprocessor can only work with integer constant expressions. Compiler-evaluated code like <code>sizeof()</code> cannot be used in preprocessor directives. Nor can the preprocessor compare non-numerical values: any identifier that is still not a macro after expansion is replaced with 0, so symbolic values must themselves be defined to distinct integers.

<syntaxhighlight lang="c">
#define FOO 1
#define BAR 2
#define MYTOKEN FOO
#define OTHERTOKEN 42

#if MYTOKEN == FOO
/* This code will be compiled */
#elif MYTOKEN == BAR
/* This code won't */
#endif
 
#if OTHERTOKEN > 40
/* Will be compiled. */
#endif
 
#if OTHERTOKEN != 42
/* Won't be compiled. */
#endif
</syntaxhighlight>
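
A sketch of how ''#if'' treats names that are not defined as macros: inside an ''#if'' expression they are replaced with 0, so a comparison between two such names always succeeds. The names FRUIT, apple and orange are hypothetical:

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdio.h>

/* FRUIT expands to apple, but neither apple nor orange is a macro. */
#define FRUIT apple

/* Returns 1 if the preprocessor considers FRUIT equal to orange. */
static int fruit_equals_orange(void)
{
#if FRUIT == orange   /* evaluated as 0 == 0 */
    return 1;
#else
    return 0;
#endif
}

int main(void)
{
    /* The comparison is true although the names differ. */
    assert(fruit_equals_orange() == 1);
    printf("FRUIT == orange evaluates to %d\n", fruit_equals_orange());
    return 0;
}
</syntaxhighlight>

GCC can warn about this situation when invoked with the "-Wundef" option.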
 
The ''#if'' directive also allows for a simple construct to disable a region of code without having to worry about nested ''/* ... */'' style comments:
 
<syntaxhighlight lang="c">
#if 0
/* disabled code */
#endif
</syntaxhighlight>
 
Such code can easily be re-enabled temporarily with no more effort than replacing the "0" with a "1". Source comments as to why you disabled code this way are in order.
 
== #undef ==
Using the ''#undef'' directive, a preprocessor token can be undefined. This is useful for trickier setups where you might want to redefine a token to a different value: redefining a token to a different value generates a warning message, while undefining a token that is not defined does not.

This should not be construed as advice to ''always'' use ''#undef'' before a ''#define''. Those warnings might actually be pointing to a real problem in your logic. Use ''#undef'' with care.
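
A minimal sketch of redefining a token by undefining it first. BUFFER_SIZE, NEVER_DEFINED and the two helper functions are hypothetical names used only here:

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 64
/* Sees the first definition. */
static int first_size(void) { return BUFFER_SIZE; }

/* Undefine first, then define again with a different value. */
#undef BUFFER_SIZE
#define BUFFER_SIZE 128
/* Sees the second definition. */
static int second_size(void) { return BUFFER_SIZE; }

/* Undefining a token that was never defined is silently accepted. */
#undef NEVER_DEFINED

int main(void)
{
    assert(first_size() == 64);
    assert(second_size() == 128);
    printf("%d then %d\n", first_size(), second_size());
    return 0;
}
</syntaxhighlight>

Each function keeps the value that was in effect at the point where the token was expanded.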
 
== Predefined Tokens ==
The preprocessor provides a couple of tokens which are automatically defined to the appropriate values - something very useful when constructing error messages or tracing messages. Note that some obsolete compilers might balk at ''__func__'' and not all tokens may be supported or implemented by all compilers.
 
{| {{wikitable}}
! Preprocessor Token
! Explanation
|-
| __FILE__ || Holds the name of the current source file being compiled (as a string).
|-
| __LINE__ || Holds the current line being compiled (as an integer).
|-
| __DATE__ || Holds the date when the compilation process began (a string with the format "Mmm dd yyyy").
|-
| __TIME__ || Same as the previous, but the time (a string with the format "hh:mm:ss").
|-
| __cplusplus || When defined, the value indicates that C++ compilation is active. When the compiler is (fully) compliant to the standards, the value should be >= 199711L.
|-
| __STDC__ || When defined, the value indicates that the compiler is (fully) compliant with the ANSI C standard.
|-
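| __func__ || Holds the name of the function it is used within (as a string). Strictly speaking this is a predefined identifier provided by the compiler rather than a macro, so it cannot be concatenated with adjacent string literals.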
|}
 
Different compilers may define extra preprocessor tokens. Visual C++, for example, may define _MSC_VER and __cplusplus_cli. See the link section below for more information.
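
A short sketch of the predefined tokens in a tracing message. The functions trace_line() and current_function() are hypothetical helpers written for this example:

<syntaxhighlight lang="c">
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prints a trace message and returns the line number it was printed from. */
static int trace_line(void)
{
    /* __FILE__ is a string literal, __LINE__ an integer, __func__ a char array. */
    fprintf(stderr, "%s:%d: in function %s\n", __FILE__, __LINE__, __func__);
    return __LINE__;
}

/* Returns the name of this very function. */
static const char *current_function(void)
{
    return __func__;
}

int main(void)
{
    assert(strcmp(current_function(), "current_function") == 0);
    assert(trace_line() > 0);
    /* "hh:mm:ss" and "Mmm dd yyyy" have fixed lengths. */
    assert(strlen(__TIME__) == 8);
    assert(strlen(__DATE__) == 11);
    return 0;
}
</syntaxhighlight>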
 
== assert() ==
Assertions are used to catch situations which should never happen, even under error circumstances. If the condition given in the parentheses does not evaluate to "true", a diagnostic is printed which contains the source file name, line number, and (since C99) the name of the current function; the program then calls abort().

<syntaxhighlight lang="c">
#include <assert.h>

assert( sizeof(struct free_memory_block) == 8 );
assert( 1 != 2 );
assert( gdt_ptr != NULL );
</syntaxhighlight>

For production code, assertions may be turned off by defining NDEBUG:

<syntaxhighlight lang="bash">
gcc -DNDEBUG ...
</syntaxhighlight>

Note that <assert.h> does not have (or need) a header guard, i.e. it can be included multiple times in a source file, and that whether NDEBUG is defined or not is evaluated anew ''at every inclusion of <assert.h>''. You can thus enable / disable assertions at a very fine-grained level if necessary:

<syntaxhighlight lang="c">
#include <assert.h>

/* assert() at this point only fails-on-false if NDEBUG is not defined */
assert( isChecksumCorrect() );

#ifdef NDEBUG
/* Hard-enabling of assert() even if NDEBUG is defined */
#define NDEBUG_WAS_SET
#undef NDEBUG
#include <assert.h>
#endif

/* assert() in this block of code should fail-on-false even in production */
assert( isChecksumCorrect() );

#ifdef NDEBUG_WAS_SET
/* Restoring NDEBUG if it was enabled originally */
#define NDEBUG
#include <assert.h>
#endif
</syntaxhighlight>

== Uses for debugging ==
An assertion macro can also be defined by hand, here in terms of the helpers complain() and panic():

<syntaxhighlight lang="c">
#define assert(condition) do { if (!(condition)) { complain("assertion fail: " #condition); panic(); } } while(0)
</syntaxhighlight>

Such assertions may be turned off in production code:

<syntaxhighlight lang="c">
#define assert(x) do {} while(0)
</syntaxhighlight>

Some rare unrecoverable errors should also be tested for in production code. These tests should not be disabled, so that the problem is recognised instead of causing random crashes:

<syntaxhighlight lang="c">
#define testif(condition) do { if (!(condition)) { complain("testif fail: " #condition); panic(); } } while(0)
testif( isChecksumCorrect( kernel_heap.first_free_list ) );
testif( timersAreOn );
</syntaxhighlight>

Debugging information, such as the values of variables at different moments of execution, can be captured together with the file name and line number:

<syntaxhighlight lang="cpp">
/* The two alert() overloads rely on C++ function overloading */
void alert(const char *msg);
void alert(uint32 u);
void alert_decimal(uint32 u);

#define complain(msg) do {\
alert(__FILE__); \
alert(": "); \
alert_decimal(__LINE__); \
alert(": "); \
alert(msg); \
alert("\n"); \
} while(0)

void * malloc(size_t s) {
complain((uint32)kernel_heap.first_free->addr);
do_something();
complain((uint32)kernel_heap.first_free->addr);
if (do_something2()) y = malloc(sizeof(struct book_keeping_struct));
complain((uint32)kernel_heap.first_free->addr);
do_something3();
}
</syntaxhighlight>

with an output of

 src/memory/malloc.c: 271: 0xd0010000 //entering malloc
 src/memory/malloc.c: 273: 0xd0010000 //having done do_something
 src/memory/malloc.c: 271: 0xd0010000 //do_something2 calls malloc recursively
 src/memory/malloc.c: 273: 0xd0010000 //done do_something
 src/memory/malloc.c: 275: 0x0e1bc30a //aha! an error after do_something2() (which returns 0) in nested malloc call
 ...

A similar macro helps with finding the point of death or tracing program flow, for example after a crash such as

 SYSFAIL: page fault, %eip= 0xc0001d330, %cr2=0x00000000

<syntaxhighlight lang="c">
#define lnDbg do { alert("<<"); alert(__func__); alert(" : "); alert_decimal(__LINE__); alert(">>\n"); } while(0)
</syntaxhighlight>

<syntaxhighlight lang="c" line start="13">
void a(int i) {
lnDbg;
if (fun(i)) c(i); else a(i-1);
lnDbg;
a(i);
}
void c(int i) {
lnDbg;
if (!is(i)) return ;
lnDbg;
c(i+1);
lnDbg;
}
</syntaxhighlight>

 <<a: 14>>
 <<c: 20>> //line 16 wasn't run, fun(i) has returned true
 <<a: 16>> //line 22 wasn't run, !is(i) has returned true
 <<a: 14>>
 SYSFAIL: .... //after line 14 neither line 20 nor 16 has been reached, so the call fun(i) caused the page fault.

Such macros may be stored in a shared.h file included by other compilation units.

== Deleted Code ==
A code block may be commented out to delete it from the program. However, nesting deleted fragments may reduce legibility with C++ style comments, and C comments do not nest at all.

A better solution is to wrap the code in an #if 0-#endif block, where the conditional 0 means false:

<syntaxhighlight lang="c">
#if 0
print("memory state: ");
print(mem->state);
print("\nallocated blocks: ");
print(mem->allocs);
#endif
</syntaxhighlight>

Many editors, like [http://www.vim.org/ Vim], have default syntax highlighting rules that treat such #if 0-#endif blocks as comments.
The #if-#endif directives must be balanced, single-quote characters must balance etc., so for deleting non-code text use comments instead.
 
== Hazards of the C preprocessor ==
Macros and macro expansion have a number of counter-intuitive consequences. See [http://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html#Macro-Pitfalls Macro Pitfalls] in the GNU C preprocessor manual.

== See also ==
=== Articles ===
* [[C]]
* [[Why function implementations shouldn't be put In header files]]
 
=== External Links ===
* [http://gcc.gnu.org/onlinedocs/cpp/ The GNU C preprocessor manual]
* [http://gcc.gnu.org/onlinedocs/cpp/Macro-Pitfalls.html#Macro-Pitfalls Macro Pitfalls] - A number of counter-intuitive consequences of macros and macro expansion.
* [http://msdn.microsoft.com/en-us/library/b0084kay(VS.80).aspx VC++ preprocessor information]
 
[[Category:C]]
[[Category:Tutorials]]