{{Tone}}
== Providing a basic debugging environment ==
=== Exception handlers ===
The first thing to do is to implement a reliable 'exception handler' that tells you what went wrong. Under an emulator like [[Bochs]], the absence of such a handler leads to a '3rd Exception without resolution' panic message (a.k.a. [[Triple Fault]]) if the emulator is configured to report it. On bare hardware, the computer will simply reset, perhaps with a laconic 'bip'.
Whenever the CPU is unable to call an exception handler, it tries to execute the [[Double Fault]] exception handler instead. If that fails as well, a [[Triple Fault]] occurs. Keep in mind that exceptions cannot be masked (<tt>cli</tt> only disables maskable interrupts), so either your code is perfect or you need exception handlers. Also keep in mind that once you run applications, '''''their''''' code would have to be perfect too in the absence of exception handlers, so it is a good idea to write handlers early.
<!-- The following code example should be more generalized (e.g. exc_0d_handler, gpfExcHandler renamed to more meaningful names).-->
<syntaxhighlight lang="asm">
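exc_0d_handler:
    ; note: #GP (exception 0x0D) pushes an error code, which must be
    ; removed from the stack (e.g. add esp, 4) before the iret
    push gs
    ; ... (handler body not shown)
    pop gs
    iret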
</syntaxhighlight>
Once you have implemented such a technique, it may be wise to test it, deliberately issuing 'faulty' instructions to see if the correct code is displayed. Having the 'double fault' exception (08) displayed somewhere else on the screen may also be a smart move.
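A minimal C sketch of the display step, assuming an 80x25 colour text mode whose buffer (normally at 0xB8000) is passed in as <tt>vga</tt>. The function name <tt>show_exception</tt> and the "EXC xx" format are made up for illustration; passing the buffer as a parameter keeps the routine testable and lets the double-fault handler reuse it on a different row.

<syntaxhighlight lang="c">
#include <stdint.h>

#define VGA_COLS 80  /* assumes the standard 80x25 text mode */

/* Writes one character cell: low byte = character, high byte = attribute. */
static void vga_put(volatile uint16_t *vga, unsigned row, unsigned col,
                    char c, uint8_t attr)
{
    vga[row * VGA_COLS + col] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)c);
}

/* Hypothetical helper: shows "EXC xx", xx being the vector in hex,
 * starting at (row, col). */
void show_exception(volatile uint16_t *vga, unsigned vector,
                    unsigned row, unsigned col, uint8_t attr)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char label[] = "EXC ";
    unsigned i;

    for (i = 0; label[i] != '\0'; i++)
        vga_put(vga, row, col + i, label[i], attr);
    vga_put(vga, row, col + i,     hex[(vector >> 4) & 0xF], attr);
    vga_put(vga, row, col + i + 1, hex[vector & 0xF], attr);
}
</syntaxhighlight>

From the #GP handler you might call <tt>show_exception((volatile uint16_t *)0xB8000, 0x0D, 0, 0, 0x4F)</tt> (bright white on red) and let the double fault handler print on row 1 instead.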
==== What to do if characters cannot be displayed ====
This happens, for instance, when your GDT or paging tables have been badly configured (e.g. 0xb8000 no longer refers to the video memory). Fortunately, video memory is not your only way of communicating with your kernel:
* you may use the [[PS2 Keyboard|keyboard]] LEDs to report some events (for instance turning the 'scroll lock' LED on when you enter a handler and off when you leave it).
* you may use the internal [[PC Speaker]] to make Morse-code-like signals reporting early errors. (This may be unwelcome if you are coding late at night, though.)
* you may use [[VGA Hardware|VGA registers]] to change the background color or the overscan (screen border) color to report the current 'state' of your kernel (e.g. black = normal operation, yellow = processing an interrupt, red = crash condition occurred (just before <tt>cli; hlt</tt>), blue = processing an exception, etc.).
Refer to [[:Category:VGA|VGA Resources]] to see how you can modify colors. The [[resources]] page should have all the documentation for LED flashing and speaker beeping.
* you may output bytes via the [[serial ports|serial port]]. Most emulators allow you to redirect these characters into a file, and unlike the screen, the number of characters is not limited. A driver for the serial port is also very easy to implement.
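To illustrate how small such a driver can be, here is a sketch for COM1 (I/O base 0x3F8) at 38400 baud, 8 data bits, no parity, 1 stop bit, following the usual 16550 UART initialisation sequence. The <tt>KERNEL_BUILD</tt> switch and the recording stand-ins for <tt>outb</tt>/<tt>inb</tt> are assumptions made so the logic can also be exercised outside a kernel. In your kernel, use your own port I/O helpers.

<syntaxhighlight lang="c">
#include <stdint.h>

#define COM1 0x3F8  /* conventional I/O base of the first serial port */

#ifdef KERNEL_BUILD
static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}
static inline uint8_t inb(uint16_t port)
{
    uint8_t ret;
    __asm__ volatile ("inb %1, %0" : "=a"(ret) : "Nd"(port));
    return ret;
}
#else
/* Hosted stand-in: records port writes and reports the transmitter as
 * always empty, so the driver logic can be tested without hardware. */
static uint16_t io_port[32];
static uint8_t  io_val[32];
static int      io_count;
static void outb(uint16_t port, uint8_t val)
{
    if (io_count < 32) {
        io_port[io_count] = port;
        io_val[io_count] = val;
        io_count++;
    }
}
static uint8_t inb(uint16_t port)
{
    (void)port;
    return 0x20;  /* line status: transmit holding register empty */
}
#endif

void serial_init(void)
{
    outb(COM1 + 1, 0x00);  /* disable UART interrupts */
    outb(COM1 + 3, 0x80);  /* set DLAB to program the baud rate divisor */
    outb(COM1 + 0, 0x03);  /* divisor low byte: 115200 / 3 = 38400 baud */
    outb(COM1 + 1, 0x00);  /* divisor high byte */
    outb(COM1 + 3, 0x03);  /* clear DLAB; 8 data bits, no parity, 1 stop bit */
    outb(COM1 + 2, 0xC7);  /* enable and clear FIFOs, 14-byte threshold */
    outb(COM1 + 4, 0x0B);  /* assert DTR, RTS and OUT2 */
}

void serial_putc(char c)
{
    while ((inb(COM1 + 5) & 0x20) == 0)  /* wait until the transmitter is empty */
        ;
    outb(COM1, (uint8_t)c);
}

void serial_puts(const char *s)
{
    for (; *s != '\0'; s++) {
        if (*s == '\n')
            serial_putc('\r');  /* most terminals expect CR LF */
        serial_putc(*s);
    }
}
</syntaxhighlight>

Under QEMU, <tt>-serial file:serial.log</tt> (or <tt>-serial stdio</tt>) collects whatever the kernel sends this way.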
==== Avoiding exception loops ====
So now we know when exceptions occur and which exception occurred. That's better, but still not especially useful. Your exception handler is likely to become complex as your kernel evolves, and you will discover that exceptions mainly occur ... in exception handlers.
To keep recursive exceptions from occurring endlessly, you can maintain a 'nested exceptions counter' that is incremented every time you enter an exception handler and decremented just before you leave that handler. If the counter goes above a threshold of a few units (3 should give interesting enough results), the kernel gives up trying to resolve the exception and enters a 'panic' mode (red background, flashing …).
<syntaxhighlight lang="c">
int nestexc = 0;
// called by the assembly stub
void gpfExcHandler(void) {
    if (nestexc > MAX_NESTED_EXCEPTIONS) panic();
    nestexc++;
    /* ... (handling code not shown; it ends with nestexc--) */
    return;
}
</syntaxhighlight>
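A sketch of the counter with both halves (entry check and decrement), assuming a <tt>panic()</tt> that in a real kernel would never return; here it only sets a flag so the logic can be tested. The names <tt>exception_enter</tt> and <tt>exception_leave</tt> are hypothetical.

<syntaxhighlight lang="c">
#include <stdbool.h>

#define MAX_NESTED_EXCEPTIONS 3

static volatile int nestexc = 0;
static bool panicked = false;

/* Stand-in for the kernel's panic(): a real one would paint the screen
 * red and halt forever with cli; hlt. */
static void panic(void)
{
    panicked = true;
}

/* Call first thing in every handler. Returns false when the nesting
 * limit is reached and the handler must give up. */
bool exception_enter(void)
{
    if (nestexc >= MAX_NESTED_EXCEPTIONS) {
        panic();
        return false;
    }
    nestexc++;
    return true;
}

/* Call on every path that leaves the handler. */
void exception_leave(void)
{
    nestexc--;
}
</syntaxhighlight>

Forgetting <tt>exception_leave()</tt> on a single early <tt>return</tt> makes the counter creep up until a spurious panic, so keep one exit path per handler if you can.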
You need to know, of course, that some exceptions are not 'resumable'. If your kernel performs a division by zero, returning to the <tt>div</tt> instruction will only trigger the exception one more time (yeah! all together, now :). Such loops cannot be caught by the <tt>nestexc</tt> counter, because each handler invocation returns (and decrements the counter) before the next fault occurs.
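One way to catch that case (an addition, not something the counter above does) is to remember the address of the last faulting instruction and count how many times in a row the same one faults. This sketch assumes the handler passes in the saved EIP; the threshold and names are arbitrary.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdbool.h>

#define MAX_SAME_EIP 3  /* arbitrary: tolerated back-to-back faults at one address */

static uint32_t last_fault_eip;
static unsigned same_eip_count;

/* Call at the start of an exception handler with the saved EIP.
 * Returns true once the same instruction has faulted too often in a row. */
bool fault_loop_detected(uint32_t eip)
{
    if (eip == last_fault_eip) {
        same_eip_count++;
    } else {
        last_fault_eip = eip;
        same_eip_count = 1;
    }
    return same_eip_count > MAX_SAME_EIP;
}
</syntaxhighlight>

This will not notice a loop alternating between two addresses, so use it alongside the nesting counter rather than instead of it.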
==== Showing the stack content ====
Much of your program's state (function arguments, return address, local variables) is stored on the [[stack]], especially when using C/C++ code. A complete debugger (like GDB) will inspect the debugging info to give names to the stack content, provide a list of calls, etc. This is a bit complex to do ourselves, but if your kernel can simply ''show'' the content of the stack and if you know ''where'' in the code the process halted, you can already fix quite a lot of bugs by doing the job of the debugger yourself, guessing which stack location holds which variable, where the return addresses are, etc.
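When an exception occurs, the stack content is still in memory. Unless your exception stub has changed it, the [[EBP]] register still holds the frame pointer of the faulting function: it points to the start of that function's stack frame, where the caller's saved EBP is stored, with the return address just above it. Everything from this address upward (the stack grows downward) is the part of the stack used by the current chain of calls, so you can use the value in EBP as the start of the dump. Just use the following call: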
<syntaxhighlight lang="asm">
stack_dump:
    push ebp
    ; ... (body not shown)
    pop ebp
    ret ; note that this is not going to work as-is, but it is kept here for completeness.
</syntaxhighlight>
and feed the value to a hex-dump routine such as <tt>void dump_hex(char *stack)</tt>.
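A possible shape for that hex-dump routine, sketched in C: it formats each 32-bit stack word next to its address, in the same layout as the stack listings further down this page. The name <tt>dump_words</tt> and writing into a caller-supplied buffer (rather than printing directly) are assumptions of this sketch.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stddef.h>

/* Formats one 32-bit value as 8 uppercase hex digits (no terminator). */
static void put_hex32(char *out, uint32_t v)
{
    static const char hex[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; i--) {
        out[i] = hex[v & 0xF];
        v >>= 4;
    }
}

/* Writes "AAAAAAAA VVVVVVVV\n" for each of the n words, where AAAAAAAA is
 * base + 4 * index. out must hold at least 18 * n + 1 characters.
 * Returns the number of characters written, excluding the terminator. */
size_t dump_words(uint32_t base, const uint32_t *words, size_t n, char *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        put_hex32(out + pos, base + 4u * (uint32_t)i);
        out[pos + 8] = ' ';
        put_hex32(out + pos + 9, words[i]);
        out[pos + 17] = '\n';
        pos += 18;
    }
    out[pos] = '\0';
    return pos;
}
</syntaxhighlight>

In a 32-bit kernel you would call something like <tt>dump_words(ebp, (const uint32_t *)ebp, 16, buf)</tt> and print <tt>buf</tt> with whatever output channel still works.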
==== Locating the Faulty instruction ====
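When your exception handler is called, the address of the faulty instruction is on the stack: before entering the handler, the CPU pushes EFLAGS, CS and EIP (and, for some exceptions, an error code). For faults, the saved EIP is the address of the instruction that caused the exception. The first step here is to print out this address.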
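Which stack slot holds that address depends on whether the exception pushes an error code. A sketch, assuming <tt>frame</tt> points at the words the CPU pushed (no privilege change, so no ESP/SS on the stack) and that your stub passes the vector number along; both the function names and that calling convention are hypothetical.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stdbool.h>

/* Vectors for which the CPU pushes an error code:
 * #DF, #TS, #NP, #SS, #GP, #PF, #AC, #CP, #VC, #SX. */
static bool has_error_code(unsigned vector)
{
    switch (vector) {
    case 8: case 10: case 11: case 12: case 13: case 14:
    case 17: case 21: case 29: case 30:
        return true;
    default:
        return false;
    }
}

/* frame points at what the CPU pushed: [error code,] EIP, CS, EFLAGS. */
uint32_t faulting_eip(const uint32_t *frame, unsigned vector)
{
    return frame[has_error_code(vector) ? 1 : 0];
}
</syntaxhighlight>

For traps such as <tt>int3</tt> the saved EIP points to the instruction after the one that trapped, and for a double fault (08) it is undefined, so don't trust it there.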
Once this is done, you (as a human) can inspect the ''linker map'' and find out in which object file the problem was. You can request a map with <tt>ld </tt>''<usual options>''<tt> -Map </tt>''<filename.map>''.
<pre>
...
</pre>
The excerpt above is an example of what a map can look like. If the error address was 0x554f, we can tell from it that the error is somewhere in init.o, most likely in the <tt>kinit()</tt> function (it may still be in some other function if the source file contains ''static'' functions, since those are not listed in the map). All we know is that the error …
Now we can use <tt>objdump -drS bin/init.o</tt> to look at the disassembled output, interleaved with the source code. Note that this step only works properly if those separate <tt>.o</tt> files were compiled with debug information (e.g. <tt>gcc -g</tt>)...
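Once the kernel carries its own symbol table (for example generated from the map at build time), the same lookup can be done automatically: find the symbol with the greatest start address not above the faulting one. A binary-search sketch; the <tt>struct ksym</tt> layout and the function name are assumptions.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stddef.h>

struct ksym {
    uint32_t addr;     /* start address, as listed in the map file */
    const char *name;
};

/* Returns the symbol with the greatest start address <= addr, or NULL if
 * addr lies below every symbol. table must be sorted by ascending addr. */
const struct ksym *ksym_lookup(const struct ksym *table, size_t n, uint32_t addr)
{
    size_t lo = 0, hi = n;  /* search the index range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the first entry starting above addr */
    return lo == 0 ? NULL : &table[lo - 1];
}
</syntaxhighlight>

With a hypothetical entry for <tt>kinit</tt> at 0x5330, looking up 0x554f returns <tt>kinit</tt>, and <tt>addr - sym->addr</tt> gives the offset 0x21f to look for in the disassembly.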
<syntaxhighlight lang="c">
#ifdef __DEBUG__
kprint("kernel in debug-mode(%x) press [SHIFT+SPACE] to bypass anykey()\n",
...
DbMsk);
#endif
</syntaxhighlight>
Of course, since I picked a random address, there is nothing wrong to see at +21f, but I guess you get my point. :)
==== Locating the offending line of source code ====
Once you have found the address of the faulty instruction in the previous step, you can identify the corresponding line of source code (provided the kernel was built with debug information) by running
<pre>
addr2line -e <your_kernel.elf> <address of faulty instruction>
</pre>
=== Enhanced debugging techniques ===
==== Stack tracing ====
By analyzing the default way a stack frame is created, you can unwind one stack frame at a time, obtaining the sequence of calls that led to the fault. For a single bonus point, also extract the arguments and dump them as well. For multiple bonus points, use C++ name mangling and display the arguments in readable form with the correct types.
Each time a function is called, it gets the following prologue and epilogue ([[GCC]] 3.3.2, frame pointer not omitted):
<syntaxhighlight lang="asm">
push ebp
mov ebp, esp
...
leave
ret
</syntaxhighlight>
The rest of the function's code goes in place of the <tt>...</tt>. Now, if you analyze the stack output, it looks something like this:
<pre>
0000FFC0 0000FFD0 -> this is the result of a push (the saved EBP, pointing to the caller's frame)
0000FFC4 001023A5 -> this was the old EIP, which can be looked up in the function table (map file)
0000FFC8 01234567 -> this is an argument
0000FFCC 89ABCDEF -> this is another argument
0000FFD0 0000FFF0 -> this is again another saved EBP
0000FFD4 00105BC3 -> this is again an old EIP
0000FFD8 001023A5 -> this is an argument (could be a function pointer)
0000FFDC 01234567 -> this is another argument
...
0000FFE8 FFC00000 -> this is again an argument to a different function
0000FFEC 00010000 -> this is again another argument, but again not to this function.
0000FFF0 0000FFFC -> this is the previous saved EBP
0000FFF4 0010002C -> this is an old EIP
0000FFF8 00000001 -> this is an argument
0000FFFC 00000000 -> this is the last saved EBP (0 ends the chain)
</pre>
Now you can follow the path of execution backwards. The content of each saved-EBP slot is the address of the caller's saved-EBP slot, and the word right above it is the corresponding return address; the chain ends at a saved EBP of 0.
If you use C++ name mangling, the argument types are encoded in the function name. If you can decode that, you know how to interpret the values on the stack, so you can present them to the user as a normal function call with legible arguments and everything. This is the …
Since I program my kernel in C, I thought of writing a script that would parse the header files for function declarations, extract the debugging symbols from the compiled kernel image using <tt>objdump</tt>, and write a system map that would provide the types. I forgot about it after falling in love with …
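A sketch of the frame walk in C, assuming every function keeps the <tt>push ebp; mov ebp, esp</tt> prologue shown above (i.e. the kernel is built with <tt>-fno-omit-frame-pointer</tt>) and that the outermost frame has a saved EBP of 0, as in the listing. The struct and function names are hypothetical.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stddef.h>

/* Layout produced by "push ebp; mov ebp, esp": [ebp] holds the caller's
 * saved EBP and [ebp+4] the return address. Pointer-sized fields give
 * exactly this layout on a 32-bit kernel. */
struct stack_frame {
    const struct stack_frame *prev;  /* saved EBP of the caller */
    uintptr_t ret;                   /* return address (old EIP) */
};

/* Follows the EBP chain starting at frame, storing up to max return
 * addresses in out. Stops at a NULL saved EBP or at a link that does not
 * point higher up the stack. Returns the number of addresses stored. */
size_t stack_trace(const struct stack_frame *frame, uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (frame != NULL && n < max) {
        out[n++] = frame->ret;
        if (frame->prev != NULL && frame->prev <= frame)
            break;  /* corrupted chain: caller frames must be at higher addresses */
        frame = frame->prev;
    }
    return n;
}
</syntaxhighlight>

Start it from <tt>__builtin_frame_address(0)</tt> (GCC/Clang) or from the EBP saved by your exception stub, then look each address up in the map. On a 32-bit kernel the arguments of a frame start right after the return address, at <tt>(const uint32_t *)(frame + 1)</tt>.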
=== Debugging techniques ===
If your function <tt>x()</tt> wreaks havoc only after 1000 calls, it may not suffice to put a <tt>panic()</tt> at its beginning; count the calls instead:
<syntaxhighlight lang="c">
void scheduler_choose_task() {
    static uint32_t Z = 0;  // number of calls so far (uint32_t needs <stdint.h>)
    Z++;
    uint32_t N = 1000;
    if (Z > N) panic();     // find the largest N for which it does not crash yet
    if (in_critical_section()) return;
    /* ... rest of the function ... */
}
</syntaxhighlight>
...and then check how far it gets:
<syntaxhighlight lang="c">
Z++;
uint32_t N = 1000;
//we get here,
if (in_critical_section()) return;
</syntaxhighlight>
However, as complexity rises or multithreading gets involved, it becomes less likely that a crash consistently occurs at the same point, after the same number of calls every time. Then it is not possible to find the number of the call to <tt>scheduler_choose_task()</tt> that crashes it (because that number changes). Debugging needs some imagination: what if you knew, by tracing the program flow with <tt>print(__LINE__)</tt>, that <tt>scheduler_choose_task()</tt> crashes only when a call to <tt>fun1()</tt> is in progress? You might use a global variable <tt>uint32_t dbg</tt> or an array (<tt>uint32_t dbg[20]</tt>) of such flags (used only in debugging code, which is removed once you stop debugging) in a manner such as:
<syntaxhighlight lang="c">
void fun1() {
dbg[3] = 1;
...
if (x()) { dbg[3]=0; return; }
...
dbg[3] = 0;
}
</syntaxhighlight>
...and:
<syntaxhighlight lang="c">
void scheduler_choose_task() {
// if (dbg[3]==1) panic(); // placed here, the panic fires: execution gets this far
if (in_critical_section()) return;
if (dbg[3]==1) panic(); // placed here, it crashes before panicking: the bug lies in between
}
</syntaxhighlight>
(Or mix it with a call count, <tt>Z++; if (Z>5 && dbg[3] == 1) panic()</tt>.)
Printing <tt>__LINE__</tt> helps trace the program flow:
<syntaxhighlight lang="c">
print(__LINE__);
</syntaxhighlight>
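When printing itself is unreliable, a variant is to record <tt>__LINE__</tt> values in a small ring buffer and dump the most recent entries from <tt>panic()</tt>. This 'flight recorder' is only a sketch: the <tt>TRACE()</tt> macro, the buffer size and <tt>trace_recent()</tt> are made-up names, and the buffer is not safe for concurrent use from several CPUs.

<syntaxhighlight lang="c">
#include <stdint.h>

#define TRACE_DEPTH 16  /* number of recent trace points kept */

static volatile uint16_t trace_buf[TRACE_DEPTH];
static volatile unsigned trace_pos;  /* total number of TRACE() hits */

/* Records the current source line in the ring buffer. */
#define TRACE() (trace_buf[trace_pos++ % TRACE_DEPTH] = (uint16_t)__LINE__)

/* Returns the i-th most recent traced line (0 = newest), or 0 if that
 * entry does not exist (yet). */
uint16_t trace_recent(unsigned i)
{
    if (i >= TRACE_DEPTH || i >= trace_pos)
        return 0;
    return trace_buf[(trace_pos - 1 - i) % TRACE_DEPTH];
}
</syntaxhighlight>

A <tt>panic()</tt> could then print <tt>trace_recent(0)</tt> to <tt>trace_recent(TRACE_DEPTH - 1)</tt> over the serial line, showing the last lines reached before the crash.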
See [[C preprocessor#Uses for debugging|Uses for debugging]] for more info.
== External assistance ==
So far, we assumed that the kernel contained all the information required for debugging (like symbol names, etc.). In production builds, however, this information, as well as most of the 'debugging prints', is usually stripped out of the final binary.
Still, one could imagine a kernel that would be equipped with serial-line communication code and connected to another computer that would have all the 'removed' information (like the symbols map, or the debugging-info featured intermediate binaries).
In case of a panic, the kernel could for instance send the value of <tt>eip</tt> over the serial line and expect the helper PC to reply with the function name and line number corresponding to that address (or dump it on the helper PC's screen, that's a matter of choice :) ).
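A sketch of the kernel side of such an exchange, assuming a made-up line protocol: the kernel sends <tt>EIP=xxxxxxxx</tt> followed by a newline, and the helper answers with one line of text (which it could obtain from <tt>addr2line -f -e kernel.elf</tt>). Sending and receiving bytes is left to your serial driver; the hypothetical functions below only build the request and assemble the reply from received bytes.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Builds "EIP=xxxxxxxx\n" (uppercase hex); out must hold 14 bytes. */
void make_eip_request(uint32_t eip, char out[14])
{
    static const char hex[] = "0123456789ABCDEF";
    out[0] = 'E'; out[1] = 'I'; out[2] = 'P'; out[3] = '=';
    for (int i = 0; i < 8; i++)
        out[4 + i] = hex[(eip >> (28 - 4 * i)) & 0xF];
    out[12] = '\n';
    out[13] = '\0';
}

struct reply_buf {
    char text[80];
    size_t len;
};

/* Feeds one received byte. Returns true once a complete line is stored
 * in r->text (NUL-terminated); overlong replies are truncated. */
bool reply_feed(struct reply_buf *r, char c)
{
    if (c == '\r')
        return false;  /* tolerate CR LF line endings */
    if (c == '\n') {
        r->text[r->len] = '\0';
        return true;
    }
    if (r->len + 1 < sizeof r->text)
        r->text[r->len++] = c;
    return false;
}
</syntaxhighlight>

Reset <tt>len</tt> to 0 before waiting for the next reply.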
=== Debugging interface ===
''Now we have plenty of information about what was wrong... can we ask for more? What do Mobius' debugging shell and Clicker's information panels tell us...''
:''Yes. Guess. Correct: AmigaOS. ;-) It offered a mode that allowed debugging over serial line even after the system went into Guru Meditation. That was possible because AmigaOS enjoyed a 256 / 512 kbyte ROM image that could not get corrupted.'' - [[User:Solar|MartinBaute]]
''Unix systems traditionally write their state to /dev/core for offline guru meditation''
== See Also ==
=== Articles ===
* [[How Do I Use A Debugger With My OS]]
[[Category:Troubleshooting]]
[[Category:Debugging]]
[[de:Debugging]]