Kernel Debugging: Difference between revisions

Revision as of 17:04, 5 July 2010

Humans make mistakes. Some of these mistakes may end up being part of your OS. Since bugs are more difficult to find than to fix, this page provides a list of common techniques that can be used to isolate bugs in your OS.

Debug statements and log files

The first solution is probably the easiest, and depends on what kind of information you want to get back from your debugger.

The problem with using a debugger such as ddd or gdb is that it needs an OS to run on, which is not much use when the OS itself is what you want to debug.

Debugging is essentially being able to probe the contents of a variable at a specific breakpoint. When your program hits the breakpoint, you can probe the variable.

This can also be achieved without a debugger, by inserting a line of code that writes to the screen or to a log of some kind. This gives you the contents of the variable you are interested in, but it means knowing in advance which variable to check and when, and it implies recompiling the kernel every time you want to check a different set of variables. Still, it is the simplest solution.
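As a rough illustration of such a debug statement, the sketch below prints a named 32-bit value in hex without relying on printf. Every name in it (debug_putc_fn, debug_dump_u32, struct debug_log) is hypothetical and not taken from any particular kernel. The output hook is an assumption: in a real kernel it would write a character to the screen or to a log, while here it appends to an in-memory buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output hook: writes one character somewhere
 * (screen, serial line, log buffer). Supplied by the caller. */
typedef void (*debug_putc_fn)(char c, void *ctx);

static void debug_puts(debug_putc_fn putc_fn, void *ctx, const char *s)
{
    while (*s)
        putc_fn(*s++, ctx);
}

/* Emit "name=0xXXXXXXXX\n" for a 32-bit value, one hex digit at a time. */
static void debug_dump_u32(debug_putc_fn putc_fn, void *ctx,
                           const char *name, uint32_t value)
{
    static const char digits[] = "0123456789abcdef";

    debug_puts(putc_fn, ctx, name);
    debug_puts(putc_fn, ctx, "=0x");
    for (int shift = 28; shift >= 0; shift -= 4)
        putc_fn(digits[(value >> shift) & 0xF], ctx);
    putc_fn('\n', ctx);
}

/* Example sink (an assumption): a fixed-size, NUL-terminated log buffer. */
struct debug_log {
    char   data[128];
    size_t len;
};

static void debug_log_putc(char c, void *ctx)
{
    struct debug_log *log = ctx;

    /* Silently drop characters once the buffer is full. */
    if (log->len + 1 < sizeof log->data) {
        log->data[log->len++] = c;
        log->data[log->len] = '\0';
    }
}
```

A call such as debug_dump_u32(debug_log_putc, &log, "cr3", value) would then be dropped in wherever the variable needs checking, and removed or changed on the next rebuild.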

Pseudo-Breakpoints

In places where a full print or logging function is not feasible (such as when trying to isolate a single erroneous assembly language instruction), you can create a kind of 'pseudo-breakpoint' by inserting a HLT instruction into the code. These can be used to perform a binary search (often referred to as a 'binary chop') through the code. The idea is to place the halt instruction at a point roughly halfway through the part of the code suspected to be at fault. If the CPU halts before the error occurs, then you know that the error is after the breakpoint; otherwise, it must be in the code before the breakpoint. Repeat this procedure until the error is isolated. Unfortunately, this only works if the result of the error can be distinguished from the effect of the halt instruction itself, and it does little for a problem that occurs more than one iteration into a loop, such as an array overrun.
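From C code, a pseudo-breakpoint of this kind might be sketched as below, assuming GCC-style inline assembly on x86. HALT_AT_CHECKPOINT, CHECKPOINT and the helper names are hypothetical. The idea is to scatter numbered checkpoints through the suspect code and rebuild with a different number to move the halt point during the binary chop.

```c
#include <stdbool.h>

/* Hypothetical build-time knob: the checkpoint number that should stop
 * the CPU. 0 means "never halt". Override with -DHALT_AT_CHECKPOINT=n. */
#ifndef HALT_AT_CHECKPOINT
#define HALT_AT_CHECKPOINT 0
#endif

static inline bool checkpoint_should_halt(int id)
{
    return HALT_AT_CHECKPOINT != 0 && id == HALT_AT_CHECKPOINT;
}

static inline void halt_forever(void)
{
    for (;;) {
#if defined(__i386__) || defined(__x86_64__)
        /* Mask interrupts, then halt. The loop re-halts if a
         * non-maskable interrupt wakes the CPU up again. */
        __asm__ volatile ("cli; hlt");
#endif
    }
}

/* Place CHECKPOINT(1), CHECKPOINT(2), ... through the suspect code. */
#define CHECKPOINT(id)                      \
    do {                                    \
        if (checkpoint_should_halt(id))     \
            halt_forever();                 \
    } while (0)
```

If the machine stops cleanly at checkpoint n, the fault lies after it. If the error shows up first, it lies before it.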

Use a virtual machine

A virtual machine is a program that simulates another computer (Java coders should be familiar with the concept).

There are a number of virtual machines that can simulate x86 machines; my favorite is Bochs (http://bochs.sourceforge.net). Bochs is capable of setting breakpoints in any kind of software (even if it is compiled without debugging info!), and provides an additional "debugging out port" that you can easily access from within your kernel code to print debug messages.
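A minimal sketch of writing to that debugging port follows. The port number 0xE9 is an assumption not stated above: it is what Bochs calls the "port E9 hack", which normally has to be enabled in the Bochs configuration file (check the Bochs documentation for the exact option). The names port_out8 and bochs_debug_puts are hypothetical. In a hosted build the port write is replaced by a capture buffer so the sketch can be exercised outside a kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the Bochs debug output port ("port E9 hack"). */
#define BOCHS_DEBUG_PORT 0xE9u

#if __STDC_HOSTED__
/* Hosted stand-in: record the bytes instead of touching hardware. */
static char   port_capture[64];
static size_t port_capture_len;

static void port_out8(uint16_t port, uint8_t value)
{
    if (port == BOCHS_DEBUG_PORT &&
        port_capture_len + 1 < sizeof port_capture) {
        port_capture[port_capture_len++] = (char)value;
        port_capture[port_capture_len] = '\0';
    }
}
#else
/* Freestanding x86 build: a real OUT instruction (GCC inline assembly). */
static inline void port_out8(uint16_t port, uint8_t value)
{
    __asm__ volatile ("outb %0, %1" : : "a"(value), "Nd"(port));
}
#endif

/* Send a NUL-terminated string to the debug port, byte by byte. */
static void bochs_debug_puts(const char *s)
{
    while (*s)
        port_out8(BOCHS_DEBUG_PORT, (uint8_t)*s++);
}
```

Inside the kernel, a call like bochs_debug_puts("reached paging setup\n") would then show up on the emulator's console rather than on the emulated screen.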

The main downside to using a virtual machine like this is that all the code is displayed in assembler (or binary depending on what machine you choose) - instead of the C/C++ source you originally wrote. Also, simulating a virtual machine is slower than an actual machine, and the VM might not even behave exactly like the "real" hardware.

That being said, there are also a lot of advantages to using a VM. For example, you don't have to reboot to test your new OS; you just start the VM.

Another virtual machine called Simics (https://www.simics.net) is capable not only of breakpoints and displaying register information, but also of opening a port for debugging with DDD (the simics command is 'gdb-remote'). Using this combination, it is possible to see your C source code as you step through the OS! However, the Bochs virtual machine is much faster at executing the OS than Simics and thus serves as a better virtual machine to run the OS, while Simics is the better debugger for those hard-to-find problems.

Using a serial null-modem between GDB and your OS

It is possible to add stubs to your OS kernel so that it can be debugged remotely: Your kernel is running on machine A, and you run GDB on machine B connected to A via a null-modem cable.

This is a bit tricky, since it requires additional hardware, and special support coded into your kernel. You might want to read the kernel hacking how-to and (at minimum) chapter 17 of the GDB manual.
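To give a feel for what such a stub has to do, here is a sketch of one small piece: framing a reply in GDB's remote serial protocol, where each packet is sent as '$', the payload, '#', and a two-digit hex checksum (the sum of the payload bytes modulo 256). The function name gdb_frame_packet is hypothetical. Escaping of special characters, acknowledgements and the actual serial I/O are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Build "$<payload>#<checksum>" in out.
 * Returns the frame length (excluding the NUL terminator),
 * or 0 if out is too small. */
static size_t gdb_frame_packet(const char *payload, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";
    size_t  len = 0;
    uint8_t checksum = 0;

    while (payload[len] != '\0')
        len++;

    /* '$' + payload + '#' + two hex digits + NUL terminator */
    if (out_size < len + 5)
        return 0;

    out[0] = '$';
    for (size_t i = 0; i < len; i++) {
        out[1 + i] = payload[i];
        checksum = (uint8_t)(checksum + (uint8_t)payload[i]);
    }
    out[1 + len] = '#';
    out[2 + len] = hex[checksum >> 4];
    out[3 + len] = hex[checksum & 0xF];
    out[4 + len] = '\0';

    return len + 4;
}
```

For the payload "OK" this yields the frame "$OK#9a". A real stub would push those bytes out of the serial port and wait for GDB's '+' acknowledgement.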

Use gdb with Qemu

You can have Qemu listen for a "gdb connection" before it starts executing any code, so that you can debug the guest from its very first instruction.

qemu -s -S <harddrive.img>

...will set up Qemu to listen on port 1234 and wait for a gdb connection. Then, from a remote or local shell:

gdb 
(gdb) target remote localhost:1234 

(Replace localhost with the remote IP address or hostname if necessary.) Then start execution with gdb's continue command.

But that's not all: you can compile your source code under gcc with debugging symbols using "-g". This adds all the debugging symbols to the kernel image itself (thus making it bigger). There is also a way to put all of the debugging information in a separate file using the "objcopy" tool, which is part of the GNU binutils package.

objcopy --only-keep-debug kernel.elf kernel.dbg

This will put the debugging information into a file called kernel.dbg. After that, you can strip the debugging information from your executable with

objcopy --strip-debug kernel.elf

If you are linking the kernel in ELF format and haven't stripped the debug info from it, you can import the symbols in gdb using the command below (if you have stripped it, point symbol-file at kernel.dbg instead)

(gdb) symbol-file kernel.elf             ;kernel.elf is the actual kernel image in this case

From there, you can see the actual C source code as it runs, line by line! (Use the step or next commands in gdb to execute one source line at a time; stepi advances by a single machine instruction instead.)

Example :

$ qemu -s -S c.img
warning: could not open /dev/net/tun: no virtual network emulation
Waiting gdb connection on port 1234 

(gdb) target remote localhost:1234
Remote debugging using localhost:1234
0x0000fff0 in ?? ()
(gdb) symbol-file kernel.b
Reading symbols from kernel.b...done.

(gdb) break kmain                        ; This will add a break point to any function in your kernel code.
Breakpoint 1 at 0x101800: file kernel/kernel.c, line 12.

(gdb) continue

Breakpoint 1, kmain (mdb=0x341e0, magic=0) at kernel/kernel.c:12
12      {

The commands above start code execution, which then stops at kmain as requested by "break kmain". You can view the registers at any time with this command

(gdb) info registers

I won't start explaining all the nice things about gdb, but as you can see, it is a very powerful tool for debugging OSes.

Develop in hosted environment

Another possibility, which is also a great architectural exercise, is to code every software module in a hosted environment like Linux, and then port it to your OS. You can do this for kernel code too, not just usermode programs.

Suppose you want to develop your VFS interface implementation. You have already created the interface for block devices (it doesn't matter whether you have implemented it in your kernel yet). In this case, you can implement your block device interface as a set of wrappers that adapt your interface to POSIX calls. You will then implement your VFS interface (i.e., the code that will manage the filesystem drivers in your kernel) on top of those wrappers. You will then test and debug your implementation entirely in the hosted environment, and when it is mature, you link it into your real kernel instead of into your hosted implementations. Finally, you will test your newly introduced code, now in the freestanding environment, to ensure it works there as well.
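The wrapper idea could look roughly like the sketch below. The block device interface shown here (struct block_device, its callbacks and BLOCK_SIZE) is hypothetical and stands in for whatever interface your kernel defines. The hosted backend simply maps block reads and writes onto POSIX lseek, read and write calls against an ordinary file acting as the disk image.

```c
#include <stddef.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 512u   /* assumed block size for this sketch */

/* Hypothetical kernel-side interface that the VFS code is written against. */
struct block_device {
    int  (*read_block)(struct block_device *dev, uint32_t lba, void *buf);
    int  (*write_block)(struct block_device *dev, uint32_t lba, const void *buf);
    void *ctx;   /* backend-private state */
};

/* Hosted backend: a plain file stands in for the disk. */
struct hosted_disk {
    int fd;
};

static int hosted_seek(struct hosted_disk *disk, uint32_t lba)
{
    off_t offset = (off_t)lba * BLOCK_SIZE;
    return lseek(disk->fd, offset, SEEK_SET) == offset ? 0 : -1;
}

static int hosted_read_block(struct block_device *dev, uint32_t lba, void *buf)
{
    struct hosted_disk *disk = dev->ctx;

    if (hosted_seek(disk, lba) != 0)
        return -1;
    return read(disk->fd, buf, BLOCK_SIZE) == (ssize_t)BLOCK_SIZE ? 0 : -1;
}

static int hosted_write_block(struct block_device *dev, uint32_t lba,
                              const void *buf)
{
    struct hosted_disk *disk = dev->ctx;

    if (hosted_seek(disk, lba) != 0)
        return -1;
    return write(disk->fd, buf, BLOCK_SIZE) == (ssize_t)BLOCK_SIZE ? 0 : -1;
}

/* Open (or create) the image file and wire up the callbacks. */
static int hosted_disk_open(struct block_device *dev, struct hosted_disk *disk,
                            const char *path)
{
    disk->fd = open(path, O_RDWR | O_CREAT, 0600);
    if (disk->fd < 0)
        return -1;

    dev->read_block  = hosted_read_block;
    dev->write_block = hosted_write_block;
    dev->ctx         = disk;
    return 0;
}
```

The VFS code only ever sees struct block_device, so swapping this backend for the real driver later is a link-time change rather than a rewrite.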

Now, the Pros. First of all, you can use your favourite debugger. You can also use unit testing, which, with the right method, is far better than testing software by hand.

There are some Cons to this approach. For example, you are far from your target environment when you code like this. This is further aggravated by the fact that so-called freestanding environments are dramatically more sensitive to undefined behaviour, especially uninitialized variables. You can work around this limitation by asking the compiler to perform aggressive optimization while testing hosted, which makes software more sensitive to undefined behaviour, too. However, as the best debug environment is the final target environment, you will still want to test your code when you introduce it into your real kernel.

Another Con that will probably scare most people is that this approach requires you to consistently plan your interfaces beforehand. Depending on your specific requirements, you may still be able to avoid an overly long planning phase. For example, if you want to throw away the hosted implementations once you get the modules working properly, then you don't have to bother maintaining the same interfaces forever.

Related Threads