Kernel Debugging

Humans make mistakes. Some of these mistakes may end up being part of your OS. Since bugs are more difficult to find than to fix, this page provides a list of common techniques that can be used to isolate bugs in your OS.
 
== Debug statements and log files ==
 
The first solution is probably the easiest, and depends on what kind of information you
want to get back from your debugger.
and implies recompiling the kernel every time you want to check a different set of variables...
but it is the simplest solution.
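
A minimal sketch of such debug statements, assuming a hypothetical <tt>KLOG()</tt> macro and a log sink that simply appends to a memory buffer so the example can run in a hosted build. The names <tt>klog_write</tt>, <tt>klog_printf</tt>, <tt>KDEBUG</tt> and <tt>init_heap</tt> are illustrative and not taken from any particular kernel; a real kernel would write to the screen or a serial port and needs its own formatter in place of the host's <tt>vsnprintf</tt>.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical log sink. A real kernel would write to video memory or a
 * serial port here; this sketch appends to a buffer so it can run hosted. */
char klog_buf[256];
size_t klog_len;

void klog_write(const char *s)
{
    size_t n = strlen(s);
    size_t room = sizeof klog_buf - 1 - klog_len;

    if (n > room)
        n = room;                       /* drop whatever does not fit */
    memcpy(klog_buf + klog_len, s, n);
    klog_len += n;
    klog_buf[klog_len] = '\0';
}

/* printf-style front end. vsnprintf comes from the host libc; a freestanding
 * kernel has to provide its own formatter. */
void klog_printf(const char *fmt, ...)
{
    char line[128];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(line, sizeof line, fmt, ap);
    va_end(ap);
    klog_write(line);
}

/* Build with -DKDEBUG=0 to compile every KLOG() call out of the kernel. */
#ifndef KDEBUG
#define KDEBUG 1
#endif

#if KDEBUG
#define KLOG(...) do {                  \
        klog_printf("[%s] ", __func__); \
        klog_printf(__VA_ARGS__);       \
        klog_write("\n");               \
    } while (0)
#else
#define KLOG(...) do { } while (0)
#endif

/* Example caller (hypothetical function). */
int init_heap(unsigned long base, unsigned long size)
{
    KLOG("base=0x%lx size=%lu", base, size);
    return size != 0;
}
```

Prefixing each entry with the function name keeps the log readable when many statements are active, and the <tt>KDEBUG</tt> switch means the statements cost nothing in a release build. Changing which variables are logged still requires a recompile.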
 
== Pseudo-Breakpoints ==
In places where a full print or logging function is not feasible (such as when trying to isolate a single erroneous assembly language instruction), you can create a kind of 'pseudo-breakpoint' by inserting a "1: jmp 1b" instruction into the code. These can be used to perform a binary space isolation (often referred to as a 'binary chop') through the code. The idea is to place the endless loop at a point roughly halfway through the part of the code suspected to be at fault; if the CPU halts before the error occurs, then you know that the error is after the breakpoint; otherwise, it must be in the code before the breakpoint. Repeat this procedure until the error is isolated. Unfortunately, this only works if the result of the error can be differentiated from the halt itself, and it does little in the case of a problem occurring more than one repetition into a loop, such as an array overrun. However, you can use a virtual machine debugger to single-step from a pseudo-breakpoint (see "Using Debuggers with VMs" below).
 
IMPORTANT NOTE #1: the HLT instruction is a privileged instruction, and as such it will only work in your kernel. The pseudo-breakpoint "1: jmp 1b" is unprivileged, and works from user mode too.
 
IMPORTANT NOTE #2: gcc thinks it is smarter than the programmer, so if you use "while(1);", it will treat everything after that loop as unreachable and REMOVE all that code from the binary. You MUST use inline assembly so that gcc will keep your code as-is.
<syntaxhighlight lang="C">
asm volatile ("1: jmp 1b");
</syntaxhighlight>
 
== Use a virtual machine ==
 
A virtual machine is a program that simulates another computer (Java coders should be familiar with
the concept).
That being said, there are also a lot of other advantages to using a VM.
For example, you don't have to reboot to test your new OS, you just start the VM.
 
== Using the serial port ==
 
=== Writing logfiles with QEMU ===
QEMU allows you to redirect everything that you send to the COM1 port to a file on your host computer. To enable this feature, you have to add the following flag when launching QEMU:
-serial file:serial.log
... where "serial.log" is the path to the output file. Once you have this feature enabled, you can write log entries by simply [[Serial Ports|writing characters to the COM1 port]] (reading from the file over the serial port is not supported).
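
As a rough sketch of what writing characters to COM1 can look like: the code below polls the line status register and writes one byte at a time to the conventional COM1 I/O ports (0x3F8 for data, 0x3FD for line status). UART initialisation (baud rate, line control) is omitted. The <tt>KERNEL_BUILD</tt> macro and the hosted stand-ins for <tt>outb</tt>/<tt>inb</tt> are assumptions made so that the same code can be compiled and tried outside the kernel; they are not part of QEMU or of any library.

```c
#include <stddef.h>
#include <stdint.h>

#define COM1_DATA     0x3F8   /* COM1 transmit register (conventional PC I/O base) */
#define COM1_LSR      0x3FD   /* COM1 line status register (base + 5) */
#define LSR_THR_EMPTY 0x20    /* bit 5: transmit holding register is empty */

#ifdef KERNEL_BUILD           /* hypothetical macro set by the kernel's build */
static inline void outb(uint16_t port, uint8_t val)
{
    __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
}

static inline uint8_t inb(uint16_t port)
{
    uint8_t val;
    __asm__ volatile ("inb %1, %0" : "=a"(val) : "Nd"(port));
    return val;
}
#else
/* Hosted stand-ins: record bytes written to COM1 instead of touching hardware. */
char mock_serial[128];
size_t mock_serial_len;

void outb(uint16_t port, uint8_t val)
{
    if (port == COM1_DATA && mock_serial_len < sizeof mock_serial)
        mock_serial[mock_serial_len++] = (char)val;
}

uint8_t inb(uint16_t port)
{
    /* The mock transmitter is always ready for the next byte. */
    return port == COM1_LSR ? LSR_THR_EMPTY : 0;
}
#endif

void serial_putc(char c)
{
    while ((inb(COM1_LSR) & LSR_THR_EMPTY) == 0)
        ;                       /* busy-wait until the UART can take a byte */
    outb(COM1_DATA, (uint8_t)c);
}

void serial_puts(const char *s)
{
    for (; *s != '\0'; s++)
        serial_putc(*s);
}
```

With the <tt>-serial file:serial.log</tt> flag, every byte passed to <tt>serial_putc</tt> in the kernel build should end up in <tt>serial.log</tt> on the host.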
 
=== On real hardware ===
When your real computer resets due to a programming error, anything you might have put on the screen will instantly vanish. If you're tampering with the video card, you will often find yourself with no visual debugging method at all. If you have a pair of computers connected with a null-modem cable, you can instead send all debug statements over the serial port and record them on your development machine, which is more stable. Using an actual serial terminal works just as well. It requires a bit of additional cabling, but it is fairly simple and can prove to be a very good replacement for a VM log.
 
==== With remote debugger / GDB ====
Since serial works two ways, you can also control your kernel remotely in case of problems. This can be a simple interface, but you can also attach GDB onto the serial port and potentially get a full blown debugger running.
 
This is, however, rather tricky, since it requires additional hardware and special support coded into your kernel. You might want to read the [http://web.archive.org/web/20070415113206/http://www.kernelhacking.org/docs/kernelhacking-HOWTO/indexs09.html kernel hacking how-to] and (at minimum) [https://sourceware.org/gdb/current/onlinedocs/gdb.html/Remote-Debugging.html#Remote-Debugging chapter 20 of the GDB manual]. Chances are that your debugger will introduce even more bugs at first.
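
Part of that special support is speaking GDB's remote serial protocol, in which every packet is framed as <tt>$payload#xx</tt>, with <tt>xx</tt> being the modulo-256 sum of the payload bytes written as two hex digits. The sketch below shows only this framing step; the function name <tt>gdb_frame_packet</tt> is hypothetical, and acknowledgements, escaping of special characters and the actual command handling are left out.

```c
#include <stddef.h>
#include <string.h>

/* Frame `payload` as a GDB remote serial protocol packet: "$<payload>#<xx>",
 * where xx is the modulo-256 sum of the payload bytes as two hex digits.
 * Returns the packet length (without the terminating NUL), or 0 if `out`
 * is too small. Payloads containing '$', '#' or '}' would need escaping,
 * which this sketch does not do. */
size_t gdb_frame_packet(const char *payload, char *out, size_t out_size)
{
    static const char hex[] = "0123456789abcdef";
    size_t len = strlen(payload);
    unsigned char sum = 0;

    if (out_size < len + 5)     /* '$' + payload + '#' + 2 digits + NUL */
        return 0;

    out[0] = '$';
    for (size_t i = 0; i < len; i++) {
        out[1 + i] = payload[i];
        sum = (unsigned char)(sum + (unsigned char)payload[i]);
    }
    out[1 + len] = '#';
    out[2 + len] = hex[sum >> 4];
    out[3 + len] = hex[sum & 0x0F];
    out[4 + len] = '\0';
    return len + 4;
}
```

A stub in the kernel would send the resulting bytes over the serial port and verify the same checksum on every packet it receives from GDB.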
 
==== Using mini debugger ====
Because integrating gdb is quite a task, you could use the [https://gitlab.com/bztsrc/minidbg mini debugger] library instead, which is small and simple, written in ANSI C (and a little Assembly). That is a minimal interactive debugger (dumps registers and memory, disassembles instructions) which works on serial terminals (such as VT100, VT220 or emulators like PuTTY and minicom). Available for [[AArch64]] and [[x86-64]] kernels. You can use that library as a skeleton to implement your own, fully featured kernel debugger if you want.
 
== Using Debuggers with VMs ==
 
=== Use GDB with QEMU ===
 
You can run QEMU to listen for a "GDB connection" before it starts executing any code to debug it.
 
qemu -s -S <harddrive.img>
 
...will set up QEMU to listen on port 1234 and wait for a GDB connection to it. Then, from a remote or local shell:
 
gdb
(gdb) target remote localhost:1234
 
(Replace localhost with remote IP / URL if necessary.) Then start execution:

(gdb) continue
 
But that's not all: you can compile your source code with GCC's "-g" option to include debugging symbols. This adds all the debugging symbols to the kernel image itself (thus making it bigger). There is also a way to put all of the debugging information in a separate file using the "objcopy" tool, which is part of the GNU Binutils package.
 
objcopy --only-keep-debug kernel.elf kernel.sym
 
This will put the debugging information into a file called "kernel.sym". After that, to strip your executable of debugging information, you can do
 
objcopy --strip-debug kernel.elf
 
Or alternatively, if you are using a flat binary as your kernel image, you can do
 
objcopy -O binary kernel.elf kernel.bin
 
to produce a flat binary which can be debugged using the previously extracted debug information.
 
You can import the symbols in GDB by pointing GDB to the file containing debug information (here kernel.elf, the actual unstripped kernel image; the extracted kernel.sym works the same way):

(gdb) symbol-file kernel.elf
 
From there, you can see the actual C source code as it runs, line by line! (Use the step command in GDB to execute one source line at a time; stepi executes a single machine instruction.)
 
Example :
 
$ qemu -s -S c.img
warning: could not open /dev/net/tun: no virtual network emulation
Breakpoint 1, kmain (mdb=0x341e0, magic=0) at kernel/kernel.c:12
12 {
 
The above started code execution, and will stop at kmain specified in the "break kmain" above.
You can view registers at any time with this command:
(gdb) info registers
 
I won't start explaining all the nice things about GDB, but as you can see, it is a very powerful tool for debugging OSes.
 
Alternatively you can force a breakpoint in your code without knowing the name of the function or the address. Place an endless loop pseudo-breakpoint somewhere in your code
<syntaxhighlight lang="C">
asm volatile ("1: jmp 1b");
</syntaxhighlight>
 
Then, on the terminal that's running gdb, press Ctrl+C when your VM hangs to stop execution and drop to the debugger prompt. There,
 
(gdb) set $pc += 2
 
will step over the endless loop (the short jmp is encoded in two bytes), and you can start single stepping, executing one instruction at a time with
 
(gdb) si
 
=== Use LLDB with QEMU ===
 
LLDB supports the GDB server protocol that QEMU uses, so you can do the same as in the previous section, with a few changes because some LLDB commands differ from their GDB counterparts.
 
You can run QEMU to listen for a "GDB connection" before it starts executing any code to debug it.
 
qemu -s -S <harddrive.img>
 
...will set up QEMU to listen on port 1234 and wait for a GDB connection to it. Then, from a remote or local shell:
 
lldb kernel.elf
(lldb) target create "kernel.elf"
0xfff4: addb %al, (%rax)
0xfff6: addb %al, (%rax)
 
(Replace localhost with remote IP / URL if necessary.) Then start execution:
 
(lldb) c
Process 1 resuming
 
To set a breakpoint:
(lldb) breakpoint set --name kmain
Breakpoint 1: where = kernel.elf`kmain, address = 0xffffffff802025d0
 
=== Use bochs debugger ===
 
The easiest way to trigger a breakpoint in bochs is to place "xchg bx, bx" into your code (this "magic breakpoint" has to be enabled with "magic_break: enabled=1" in your bochsrc). For example:
<syntaxhighlight lang="C">
asm volatile ("xchg %bx, %bx");
</syntaxhighlight>
 
Then, when you run the virtual machine, it will stop execution and drop you at the debugger prompt. To single step from there, use
 
bochs:1> s
 
=== Use VirtualBox debugger ===
 
Unfortunately VirtualBox developers have removed the "--start-dbg" command line option, so there's no way to set up breakpoints before your VM starts execution.
But you can do a similar trick as with GDB, which is to place an endless loop pseudo-breakpoint in your code somewhere:
<syntaxhighlight lang="C">
asm volatile ("1: jmp 1b");
</syntaxhighlight>
 
Then, when the execution hangs, access "Command line..." under the "Debug" menu (if you don't have a Debug menu in the Machine window, you'll have to enable the debugger; see "VirtualBox" below). In the debugger command line, the first thing you MUST do is stop the VM from running:
 
VBoxDbg> stop
 
This should dump the registers. But if not, then get the current RIP value with:
 
VBoxDbg> r
 
Once you get the current RIP, add 2 to it, and set a new RIP (I couldn't find any way to reference RIP from command line, you have to use constants), for example:
 
VBoxDbg> r rip = 0xfffffffff1000102
 
Check if the current RIP correctly points to the instruction after the endless loop:
 
VBoxDbg> r
 
And you can start single stepping with
 
VBoxDbg> p
 
== GUI frontends ==
 
While GDB provides a text-based user interface (available via the `-tui` command line option or by entering `wh` at the GDB prompt), you might want to use one of the available GUI frontends to GDB. These include but are not limited to:
 
* [http://www.kdbg.org/ KDbg]
* [http://sources.redhat.com/insight/ Insight]
* [http://www.gnu.org/s/ddd/ DDD]
* [http://visualkernel.com/ VisualKernel]
 
Attaching to a QEMU session works similarly to the command line GDB described above.
 
== Develop in hosted environment ==
Another possibility, which is also a great architectural exercise, is to code every software module in a hosted environment like Linux, and then port it to your OS. You can do this for kernel code too, not just usermode programs.
 
Suppose you want to develop your VFS interface implementation. You have already created the interface for block devices (it doesn't matter whether you have implemented it in your kernel yet). In this case, you can implement your block device interface as a set of wrappers that [https://en.wikipedia.org/wiki/Adapter_pattern adapt] your interface to POSIX calls. You will then implement your VFS interface (i.e., the code that will manage the filesystem drivers in your kernel) on top of those wrappers. You will then test and debug your implementation entirely in the hosted environment, and when it is mature, you link it into your real kernel instead of against your hosted implementations. Finally, you test the newly introduced code in the freestanding environment to ensure it works there as well.
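
A minimal sketch of such a wrapper, under stated assumptions: <tt>struct block_device</tt> is a hypothetical stand-in for your kernel's block device interface, and the hosted side is backed by a temporary file. To keep the example portable it uses C stdio (<tt>tmpfile</tt>, <tt>fseek</tt>, <tt>fread</tt>, <tt>fwrite</tt>) rather than POSIX calls such as <tt>pread</tt>/<tt>pwrite</tt>; the adapter structure would be the same either way. <tt>copy_block</tt> stands in for the code under test (for example, part of a VFS layer).

```c
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE 512

/* Hypothetical kernel-side block device interface. */
struct block_device {
    int (*read_block)(struct block_device *dev, uint32_t lba, void *buf);
    int (*write_block)(struct block_device *dev, uint32_t lba, const void *buf);
    void *priv;                 /* implementation-specific state */
};

/* Hosted adapter: the "disk" is a temporary file on the host. */
static int hosted_seek(FILE *f, uint32_t lba)
{
    return fseek(f, (long)lba * BLOCK_SIZE, SEEK_SET);
}

static int hosted_read(struct block_device *dev, uint32_t lba, void *buf)
{
    FILE *f = dev->priv;

    if (hosted_seek(f, lba) != 0)
        return -1;
    return fread(buf, BLOCK_SIZE, 1, f) == 1 ? 0 : -1;
}

static int hosted_write(struct block_device *dev, uint32_t lba, const void *buf)
{
    FILE *f = dev->priv;

    if (hosted_seek(f, lba) != 0)
        return -1;
    return fwrite(buf, BLOCK_SIZE, 1, f) == 1 ? 0 : -1;
}

int hosted_disk_open(struct block_device *dev)
{
    FILE *f = tmpfile();        /* removed automatically when closed */

    if (f == NULL)
        return -1;
    dev->read_block = hosted_read;
    dev->write_block = hosted_write;
    dev->priv = f;
    return 0;
}

void hosted_disk_close(struct block_device *dev)
{
    fclose(dev->priv);
}

/* Code under test: it only sees struct block_device, exactly as it would
 * inside the kernel, so it can later be linked against the real driver. */
int copy_block(struct block_device *dev, uint32_t from, uint32_t to)
{
    unsigned char buf[BLOCK_SIZE];

    if (dev->read_block(dev, from, buf) != 0)
        return -1;
    return dev->write_block(dev, to, buf);
}
```

Because <tt>copy_block</tt> never touches the host API directly, swapping the hosted adapter for the in-kernel driver is a link-time change only.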
 
Now, the Pros. First of all, you can use your favourite debugger. You can also use unit testing, for example, which is far better than testing software by hand, if you use the right method.
 
Another Con that will probably scare most people is that this approach requires you to consistently plan your interfaces beforehand. Depending on your specific requirements, you may still be able to avoid a too long planning phase. For example, if you want to throw away the hosted implementations once you get the modules working properly, then you don't have to bother maintaining the same interfaces forever.
 
== Using an IDE ==
You can debug Linux kernel modules with Visual Studio if you use the VisualKernel plugin. Here's a tutorial showing a normal debugging session: http://visualkernel.com/tutorials/kgdb/
 
If you have an i686-elf toolchain that includes GDB, you can use [https://visualgdb.com/ VisualGDB] to both compile and debug your kernel. For more information on configuring VisualGDB for kernel development in Visual Studio, see [[Visual Studio]].
 
== VirtualBox ==
Start your virtual machine with the command <tt>VBoxManage startvm --putenv VBOX_GUI_DBG_ENABLED=true <Name></tt> and then a "Debug" menu will appear on the window. You can choose "Command Line" to open a debugging prompt.
 
Useful commands:
* .pgmphystofile "File Path" - dump physical memory to file
* info help/<Name> - View device information
== See Also ==
 
=== Related Articles ===
* [[GDB]]
 
=== Forum Threads ===
*[[Topic:9514|Benchmarking and Debugging]]
*[[Topic:10140|Implementation of kassert()]]
[[Category:Kernel]]
 
[[Category:Debugging]]
[[Category:Troubleshooting]]