Dynamic Linker

Sooner or later you'll reach the point where you want shared libraries. We won't discuss the difference between static and dynamic linking here; you should already be familiar with that.
 
This article is about 64-bit [[ELF]] on the x86_64 architecture, but it can easily be adapted to other systems, as the concepts are similar.
 
== Homework ==
=== Memory Layout ===
I suppose you have created a new, empty address space, and you've already loaded the executable in it. I also assume that you've loaded the libc shared library after the executable's segment. And now you're stuck, because you don't know how to call printf from the executable. Let's see what the GNU toolchain does for us.
 
Since you loaded the executable yourself, you know the virtual address of its ELF magic bytes in memory. Start from there.
<syntaxhighlight lang="C">
Elf64_Ehdr *ehdr = (Elf64_Ehdr *)(ptr);

// ...
    // We have a valid image with sections
}
</syntaxhighlight>
First we check the magic bytes and the format of the ELF64. As an extra, we also check whether it's executable and has a non-empty section table (we're going to need it).
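
The checks described above could look roughly like the sketch below. It assumes the ELF types and constants from the Linux <code>elf.h</code> header; the function name <code>elf64_is_valid_exec</code> and the extra endianness and machine checks are my own additions for illustration, not part of the original listing.

<syntaxhighlight lang="C">
#include <elf.h>
#include <string.h>

/* Hypothetical helper: returns 1 if ptr points at a little-endian ELF64
 * x86_64 executable that has a section header table, 0 otherwise. */
int elf64_is_valid_exec(const void *ptr)
{
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)ptr;

    return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0 /* "\177ELF" magic   */
        && ehdr->e_ident[EI_CLASS] == ELFCLASS64       /* 64-bit format     */
        && ehdr->e_ident[EI_DATA] == ELFDATA2LSB       /* little endian     */
        && ehdr->e_machine == EM_X86_64                /* x86_64 code       */
        && ehdr->e_type == ET_EXEC                     /* executable        */
        && ehdr->e_shnum > 0;                          /* has section table */
}
</syntaxhighlight>

Note that position-independent executables are marked <code>ET_DYN</code> rather than <code>ET_EXEC</code>, so a loader that wants to accept them would have to relax that one test.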
 
Now let's see what the GNU toolchain does for us to find printf.
 
=== Segment Local Calls ===
In order to figure that out, first we should know how a segment local call works. For that we'll use a very minimal source.
<syntaxhighlight lang="C">
void localfunction()
{
// ...
    localfunction();
}
</syntaxhighlight>
That compiles to:
<syntaxhighlight lang="bash">
$ objdump -d test
000000000020016f <localfunction>:
...
  200185: 5d                pop %rbp
  200186: c3                retq
</syntaxhighlight>
That's trivial: RIP-relative addressing is used at 20017f.
 
=== Inter-segment Calls ===
Now let's modify the source a bit to use a libc call:
<syntaxhighlight lang="C">
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello World");
}
</syntaxhighlight>
Compile and see what's generated.
<syntaxhighlight lang="bash">
000000000020016f <main>:
  20016f: 55                push %rbp
...
  200216: 68 00 00 00 00    pushq $0x0
  20021b: e9 e0 ff ff ff    jmpq 200200 <main+0x91>
</syntaxhighlight>
What? Two more local functions? What happened here? The GNU toolchain supports lazy run-time linking, which means an address is not resolved until it is first referenced. To achieve that, it needs helper functions (generated into the .plt section of the text segment).
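
The <code>jmpq</code> at 20021b in the listing above is a handy example of how these relative jumps are encoded: one opcode byte followed by a signed 32-bit displacement, counted from the address of the next instruction. A direct local <code>call</code> uses the same scheme with opcode e8. Here is a small sketch of the arithmetic; the function name <code>rel32_target</code> is made up for illustration, and a little-endian host is assumed.

<syntaxhighlight lang="C">
#include <stdint.h>
#include <string.h>

/* Target of a 5-byte x86_64 "call rel32" (0xe8) or "jmp rel32" (0xe9):
 * address of the next instruction plus the signed 32-bit displacement. */
uint64_t rel32_target(uint64_t insn_addr, const uint8_t insn[5])
{
    int32_t disp;

    memcpy(&disp, insn + 1, sizeof disp);   /* little-endian host assumed */
    return insn_addr + 5 + (int64_t)disp;
}
</syntaxhighlight>

For the bytes <code>e9 e0 ff ff ff</code> at 20021b, the displacement is -0x20 and the next instruction is at 200220, which gives 200200, the same target objdump prints.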
 
All that disassembly teaches us two things about the dynamic linker:
1. it has to locate and write the GOT
2. it has two parts: a load-time linker and a run-time resolver.
 
The first part runs before the thread is started and saves the second part's address and argument into the GOT. The second part, on the other hand, runs when the thread is already running, and saves relocated addresses into the GOT. As you can see, both parts need the address of the GOT. To save resources, one should not locate the GOT twice: this is where the resolver's argument comes in.
 
To proceed we'll have to locate the GOT in memory and figure out what entries it has.
=== Locating the GOT ===
Time to peek at what's in the object file.
<syntaxhighlight lang="bash">
$ readelf -a test
Section Headers:
...
   Num:    Value          Size Type    Bind   Vis      Ndx Name
    15: 0000000000201048     0 OBJECT  LOCAL  DEFAULT   10 _GLOBAL_OFFSET_TABLE_
</syntaxhighlight>
The symbol table shows that the GOT is at 201048. We can see the same value in the section headers at '.got.plt'. That means we don't have to resolve symbols to get the GOT's address, which simplifies the first part. We can also learn that the GOT is 32 (0x20) bytes long in our example.
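
Looking up '.got.plt' in the section headers could be sketched as follows. This assumes the whole ELF file image (including the section header table and its string table) is readable in memory starting at <code>ehdr</code>, which is not guaranteed if you only mapped the loadable segments. The function name <code>elf64_find_section</code> is hypothetical.

<syntaxhighlight lang="C">
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds a section header by name in an ELF64 image
 * that is fully available in memory. Returns NULL if not found. */
const Elf64_Shdr *elf64_find_section(const Elf64_Ehdr *ehdr, const char *name)
{
    const char *base = (const char *)ehdr;
    const Elf64_Shdr *shdr = (const Elf64_Shdr *)(base + ehdr->e_shoff);
    /* Section names live in the section header string table. */
    const char *shstrtab = base + shdr[ehdr->e_shstrndx].sh_offset;

    for (size_t i = 0; i < ehdr->e_shnum; i++) {
        if (strcmp(shstrtab + shdr[i].sh_name, name) == 0)
            return &shdr[i];
    }
    return NULL;
}
</syntaxhighlight>

With the example executable, <code>elf64_find_section(ehdr, ".got.plt")</code> would be expected to return a header whose <code>sh_addr</code> is 201048 and whose <code>sh_size</code> is 0x20.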
 
3. GOT+0x10 is a function reference to the second part
But what about the rest, starting at 201060 in our example? Here we have only one reference, so it's obvious, but what if we have more? How do we know which symbol is associated with which entry?
<syntaxhighlight lang="bash">
$ readelf -a test
Section Headers:
...
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000201060  000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
</syntaxhighlight>
How convenient that another table is also recorded in the section headers. It's called '.rela.plt' and describes exactly that.
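
To make the mapping concrete, here is a hedged sketch that scans '.rela.plt' for the relocation belonging to a given GOT slot and returns the symbol name. It uses the <code>Elf64_Rela</code> and <code>Elf64_Sym</code> types and the <code>ELF64_R_SYM</code> / <code>ELF64_R_TYPE</code> macros from the Linux <code>elf.h</code>; the function name and its parameters are assumptions for illustration, and the caller is expected to have located the tables already.

<syntaxhighlight lang="C">
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: returns the name of the symbol that the GOT slot at
 * slot_addr refers to, by scanning .rela.plt for a jump-slot relocation
 * with that r_offset. Returns NULL if no relocation matches. */
const char *rela_plt_symbol_for_slot(const Elf64_Rela *rela, size_t count,
                                     const Elf64_Sym *dynsym,
                                     const char *dynstr,
                                     Elf64_Addr slot_addr)
{
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_JUMP_SLOT)
            continue;
        if (rela[i].r_offset == slot_addr) {
            /* The high 32 bits of r_info index the dynamic symbol table. */
            const Elf64_Sym *sym = &dynsym[ELF64_R_SYM(rela[i].r_info)];
            return dynstr + sym->st_name;
        }
    }
    return NULL;
}
</syntaxhighlight>

In the readelf output above, the Info value 000100000007 splits into symbol index 1 (high 32 bits) and relocation type 7 (low 32 bits), which is R_X86_64_JUMP_SLOT.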
 
So far we have assumed that the shared libraries are already loaded. That's the case with libc, but how do we know what other shared libraries the executable wants?
 
<syntaxhighlight lang="bash">
$ readelf -a test
Section Headers:
...
  Tag                Type      Name/Value
 0x0000000000000001 (NEEDED)   Shared library: [libc.so]
</syntaxhighlight>
Not surprisingly, the answer lies in the section headers again. There's a table called '.dynamic'. It has several records, but the ones we're really interested in are those marked "NEEDED".
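
Walking that table might look like the sketch below. It assumes the '.dynamic' array is terminated by a <code>DT_NULL</code> entry and that you have already found the dynamic string table ('.dynstr'), since a NEEDED value is a byte offset into it. The function name <code>dynamic_needed</code> is made up for this example.

<syntaxhighlight lang="C">
#include <elf.h>
#include <stddef.h>

/* Hypothetical helper: collects up to max library names from a
 * DT_NULL-terminated .dynamic array. DT_NEEDED values are byte offsets
 * into dynstr. Returns the number of names stored. */
size_t dynamic_needed(const Elf64_Dyn *dyn, const char *dynstr,
                      const char **names, size_t max)
{
    size_t n = 0;

    for (; dyn->d_tag != DT_NULL && n < max; dyn++) {
        if (dyn->d_tag == DT_NEEDED)
            names[n++] = dynstr + dyn->d_un.d_val;
    }
    return n;
}
</syntaxhighlight>

Each returned name is what the loader then has to find and map, for example "libc.so" in our test executable.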
 
=== Symbol Lookup ===
To find out printf's address, we first have to locate its symbol in the shared library.
<syntaxhighlight lang="bash">
$ readelf -a libc.so
Section Headers:
...
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     8: 0000000100000175    93 FUNC    GLOBAL DEFAULT    1 printf
</syntaxhighlight>
Bingo! It is 100000175 in our example.
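
A simple way to do this lookup in code is a linear scan of the library's dynamic symbol table, sketched below. The function name <code>dynsym_lookup</code> and the <code>load_base</code> parameter are assumptions: pass 0 if the symbol values are already absolute, as in the example above, or the address the library was loaded at otherwise. Real linkers normally speed this up with the ELF hash sections, which is not shown here.

<syntaxhighlight lang="C">
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: linear search of a dynamic symbol table for a
 * defined, non-local function. Returns load_base + st_value, or 0. */
Elf64_Addr dynsym_lookup(const Elf64_Sym *dynsym, size_t count,
                         const char *dynstr, Elf64_Addr load_base,
                         const char *name)
{
    for (size_t i = 1; i < count; i++) {        /* entry 0 is reserved */
        const Elf64_Sym *s = &dynsym[i];

        if (s->st_shndx == SHN_UNDEF)           /* an import, not a definition */
            continue;
        if (ELF64_ST_TYPE(s->st_info) != STT_FUNC)
            continue;
        if (ELF64_ST_BIND(s->st_info) == STB_LOCAL)
            continue;
        if (strcmp(dynstr + s->st_name, name) == 0)
            return load_base + s->st_value;
    }
    return 0;
}
</syntaxhighlight>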
 
== Gimme code! ==
I've put all of the above together in a very simple example, see [https://gitlab.com/bztsrc/osz/-/blob/master/tools/elftool.c elftool.c] on GitLab.
 
When I run it on the executable it gives:
<syntaxhighlight lang="bash">
$ gcc elftool.c -o elftool
$ ./elftool -d mytestelf.o
Stringtable 000003f0 (118 bytes), symbols 000002b8 (312 bytes, one entry 24)
 
--- IMPORT ---
Dynamic 00001e20 (464 bytes, one entry 16):
0. /lib/libc.so.6
 
GOT 00002000 (104 bytes), Rela 000004d0 (240 bytes, one entry 24):
0. 00602018 +0 puts
1. 00602020 +0 fread
2. 00602028 +0 fclose
3. 00602030 +0 printf
4. 00602038 +0 strcmp
5. 00602040 +0 ftell
6. 00602048 +0 malloc
7. 00602050 +0 fseek
8. 00602058 +0 fopen
9. 00602060 +0 exit
 
--- EXPORT ---
</syntaxhighlight>
As you can see, here we have an import section, but nothing to export. Now let's see a shared library!
<syntaxhighlight lang="bash">
$ ./elftool -d /lib/libc.so.6
Stringtable 00011038 (23041 bytes), symbols 00003d90 (53928 bytes, one entry 24)
 
--- IMPORT ---
Dynamic 00197b60 (496 bytes, one entry 16):
0. /lib/ld-linux-x86-64.so.2
 
GOT 00198000 (88 bytes), Rela 0001f760 (192 bytes, one entry 24):
0. 00398050 +844d0
1. 00398048 +a8560
2. 00398040 +7ff80
3. 00398038 +867c0
4. 00398030 +823f0
5. 00398028 +82780
6. 00398020 +a8640
7. 00398018 +83b50
 
--- EXPORT ---
8. 0008e850 __strspn_c1
9. 0006aad0 putwchar
10. 000f8640 __gethostname_chk
11. 0008e870 __strspn_c2
12. 0010f210 setrpcent
13. 0009eda0 __wcstod_l
14. 0008e8a0 __strspn_c3
15. 000e8d10 epoll_create
16. 000d1b50 sched_get_priority_min
17. 000f8660 __getdomainname_chk
18. 000e8f20 klogctl
19. 0002c380 __tolower_l
20. 0004f440 dprintf
21. 000b8e00 setuid
22. 000a3d20 __wcscoll_l
... lot more lines to come ...
</syntaxhighlight>
This time it has a hell of a lot of functions to export, and it also imports the Linux dynamic linker, with addend offsets in the GOT.
 
== Summary ==
To summarize, a dynamic linker should look like this:
 
1. load-time linker
1.1. locates the GOT via the '.got.plt' section, and saves that address in GOT+0x8
1.2. stores the second part's address at GOT+0x10
1.3. reads the '.dynamic' section to load shared libraries
2. run-time reference resolver
2.1. it's called by a helper
2.2. that helper places index-0x18 and the address of the GOT as arguments on the stack
2.3. it has to locate '.rela.plt' to get the symbol for the reference
2.4. once the symbol is known, it should be looked up in the shared library's '.dynsym' section to get the relocated address
2.5. that relocated address has to be saved in the GOT (index and base are on the stack)
2.6. clean up the stack, restore registers and jump to the relocated address
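
The GOT writes in steps 1.1, 1.2 and 2.5 can be sketched in a few lines of C. The function names are made up, and the sketch assumes that the index handed to the resolver counts jump slots from zero, so that index 0 refers to GOT+0x18 (as the <code>pushq $0x0</code> for printf's slot at 201060 suggests). Steps 2.1 and 2.6 need a small assembly stub and are not shown.

<syntaxhighlight lang="C">
#include <stdint.h>

/* Load-time part (steps 1.1 and 1.2): fill the reserved GOT entries. */
void got_prepare(uint64_t *got, uint64_t resolver_addr)
{
    got[1] = (uint64_t)(uintptr_t)got;  /* GOT+0x8:  argument for the resolver */
    got[2] = resolver_addr;             /* GOT+0x10: run-time resolver entry   */
}

/* Run-time part (step 2.5): store the resolved address in its slot.
 * Assumes index 0 is the first slot after the three reserved entries. */
uint64_t got_bind(uint64_t *got, uint64_t index, uint64_t target)
{
    got[3 + index] = target;
    return target;                      /* the stub jumps here afterwards */
}
</syntaxhighlight>

Once <code>got_bind</code> has run for a slot, later calls through the helper jump straight to the library function without entering the resolver again.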
 
That's all, hope it helps somebody! Good luck with implementing your own dynamic linker!
 
[[Category:Linkers]]
[[Category:Loaders]]