Dynamic Linker: Difference between revisions

 
Since you loaded the executable yourself, you already know the virtual address of the ELF magic bytes in memory. Use that as the starting point.
<syntaxhighlight lang="C">
Elf64_Ehdr *ehdr = (Elf64_Ehdr *)(ptr);

...
// We have a valid image with sections
}
</syntaxhighlight>
First we check the magic bytes and the format of the ELF64. As an extra, we also check whether it's executable and has a non-empty section table (we're going to need it).
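
The checks described above could be sketched as a standalone function. This is only an illustration, not the original code of this article: the name `elf64_image_ok` is made up, the constants come from `<elf.h>` as shipped on Linux, and the little-endian and `ET_EXEC` tests are assumptions about what "the format" and "executable" mean here.

```c
#include <elf.h>
#include <string.h>

/* Returns 1 if ptr points at a 64-bit ELF executable image that has a
 * section table, 0 otherwise. Hypothetical helper for illustration. */
int elf64_image_ok(const void *ptr)
{
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)ptr;

    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)  /* "\177ELF" magic */
        return 0;
    if (ehdr->e_ident[EI_CLASS] != ELFCLASS64)        /* 64-bit format */
        return 0;
    if (ehdr->e_ident[EI_DATA] != ELFDATA2LSB)        /* assumption: little endian */
        return 0;
    if (ehdr->e_type != ET_EXEC)                      /* assumption: fixed-address executable */
        return 0;
    if (ehdr->e_shoff == 0 || ehdr->e_shnum == 0)     /* non-empty section table */
        return 0;
    return 1;
}
```

A loader would call it as `elf64_image_ok(ptr)` before touching any other field of the image.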
 
=== Segment Local Calls ===
In order to figure that out, first we should know how a segment local call works. For that we'll use a very minimal source.
<syntaxhighlight lang="C">
void localfunction()
{
...
localfunction();
}
</syntaxhighlight>
That compiles to:
<syntaxhighlight lang="bash">
$ objdump -d test
000000000020016f <localfunction>:
...
200185: 5d pop %rbp
200186: c3 retq
</syntaxhighlight>
That's trivial: RIP-relative addressing is used at 20017f.
 
=== Inter-segment Calls ===
Now let's modify the source a bit to use a libc call:
<syntaxhighlight lang="C">
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello World");
    return 0;
}
</syntaxhighlight>
Compile and see what's generated.
<syntaxhighlight lang="bash">
000000000020016f <main>:
20016f: 55 push %rbp
...
200216: 68 00 00 00 00 pushq $0x0
20021b: e9 e0 ff ff ff jmpq 200200 <main+0x91>
</syntaxhighlight>
What? Two more local functions? What happened here? The GNU toolchain has a concept of lazy run-time linking. That means an address is not resolved until it's referenced. To achieve that, it needs helper functions (generated into the .plt section in the text segment).
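
The helper code uses the same relative addressing as a local call. As a small sketch (the function name `rel32_target` is made up for illustration), this is how the target of a 5-byte `jmp` or `call` with a 32-bit displacement can be computed from its bytes, using the `jmpq` at 20021b from the listing as input:

```c
#include <stdint.h>

/* Target of a 5-byte x86-64 near jmp/call (opcode e9/e8 followed by rel32).
 * The displacement is relative to the address of the NEXT instruction. */
uint64_t rel32_target(uint64_t insn_addr, const uint8_t insn[5])
{
    /* rel32 is stored little endian in bytes 1..4 */
    uint32_t raw = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                   (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    int32_t rel = (int32_t)raw;             /* sign-extend the displacement */
    return insn_addr + 5 + (int64_t)rel;
}
```

For the bytes `e9 e0 ff ff ff` at 20021b the displacement is -0x20, counted from 200220, which gives 200200 as shown by objdump.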
 
2. it has two parts: a load-time linker and a run-time resolver.

The first part runs before the thread is started and saves the second part's address and argument into the GOT. The second part, on the other hand, runs when the thread is already running, and saves relocated addresses into the GOT. As you can see, both parts require the address of the GOT. To save resources, one should not locate the GOT twice: this is where the resolver's argument comes in.
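
To make the division of labour between the two parts more tangible, here is a toy simulation in plain C. It is not how a real resolver is entered (that needs the PLT stubs and some assembly); every name in it is hypothetical, and a single function pointer stands in for one GOT entry.

```c
typedef long (*fn_t)(long);

/* One simulated GOT entry, plus a counter showing how often we resolved. */
fn_t got_slot;
int resolve_count = 0;

/* Stands in for the real function living in a shared library. */
long real_twice(long x) { return 2 * x; }

/* Second part (run-time resolver): reached on the first call only,
 * stores the real address in the slot, then forwards the call. */
long lazy_stub(long x)
{
    resolve_count++;
    got_slot = real_twice;
    return got_slot(x);
}

/* First part (load-time linker): point the slot at the resolver path. */
void load_time_setup(void) { got_slot = lazy_stub; }

/* What the program does: always call through the slot. */
long call_through_got(long x) { return got_slot(x); }
```

After `load_time_setup()`, the first `call_through_got()` goes through the resolver and every later call jumps straight to the real function.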
 
To proceed we'll have to locate the GOT in memory and figure out what entries it has.
=== Locating the GOT ===
Time to peek at what's in the object file.
<syntaxhighlight lang="bash">
$ readelf -a test
Section Headers:
...
Num: Value Size Type Bind Vis Ndx Name
15: 0000000000201048 0 OBJECT LOCAL DEFAULT 10 _GLOBAL_OFFSET_TABLE_
</syntaxhighlight>
The symbol table shows that the GOT is at 201048. We can also see the same value in the section headers at '.got.plt'. That means we don't have to resolve symbols in order to get the GOT's address, which simplifies the first part. We can also see that the GOT is 32 (0x20) bytes long in our example.
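
Finding '.got.plt' through the section headers could look roughly like the sketch below. It assumes the whole ELF file, including the section header table and the section name strings, is present in memory at `image`, and that `e_shentsize` equals `sizeof(Elf64_Shdr)`; `find_section` is a made-up helper name.

```c
#include <elf.h>
#include <string.h>

/* Look up a section header by name in an ELF64 image that is fully
 * present in memory starting at `image`. Returns NULL if not found. */
const Elf64_Shdr *find_section(const unsigned char *image, const char *name)
{
    const Elf64_Ehdr *ehdr = (const Elf64_Ehdr *)image;
    const Elf64_Shdr *shdr = (const Elf64_Shdr *)(image + ehdr->e_shoff);
    /* e_shstrndx selects the section that holds the section names */
    const char *names = (const char *)image + shdr[ehdr->e_shstrndx].sh_offset;

    for (unsigned i = 0; i < ehdr->e_shnum; i++) {
        if (strcmp(names + shdr[i].sh_name, name) == 0)
            return &shdr[i];
    }
    return NULL;
}
```

With the example executable, `find_section(image, ".got.plt")` would be expected to yield a header whose `sh_addr` is 0x201048 and whose `sh_size` is 0x20.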
 
3. GOT+0x10 is a function reference to the second part
But what about the rest, starting at 201060 in our example? Here we have only one reference so it's obvious, but what if we have more references? How should we know which symbol is associated with which entry?
<syntaxhighlight lang="bash">
$ readelf -a test
Section Headers:
...
Offset Info Type Sym. Value Sym. Name + Addend
000000201060 000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
</syntaxhighlight>
How convenient that another table is also recorded in the section headers. It's called '.rela.plt' and describes exactly that.
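
Each '.rela.plt' record carries the GOT slot's address in `r_offset` and packs the symbol index and relocation type into `r_info`; for the record above that is symbol 1 and type 7 (`R_X86_64_JUMP_SLOT`). A minimal sketch of binding all slots eagerly might look like this. The names `bind_jump_slots`, `lookup_fn` and `demo_lookup` are hypothetical, and the lookup callback is a placeholder for a real symbol search.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: run-time address of symbol number `sym`, or 0. */
typedef uint64_t (*lookup_fn)(uint32_t sym);

/* Placeholder lookup for demonstration: symbol #1 gets an example address. */
uint64_t demo_lookup(uint32_t sym) { return sym == 1 ? 0x100000175ULL : 0; }

/* Write resolved addresses into the GOT slots named by .rela.plt.
 * `load_bias` is 0 for an executable linked at fixed addresses.
 * Returns the number of slots that were filled in. */
size_t bind_jump_slots(const Elf64_Rela *rela, size_t count,
                       uint64_t load_bias, lookup_fn lookup)
{
    size_t bound = 0;
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_TYPE(rela[i].r_info) != R_X86_64_JUMP_SLOT)
            continue;                        /* only PLT slots are handled here */
        uint64_t addr = lookup((uint32_t)ELF64_R_SYM(rela[i].r_info));
        if (addr == 0)
            continue;                        /* unresolved, leave the slot alone */
        uint64_t *slot = (uint64_t *)(uintptr_t)(load_bias + rela[i].r_offset);
        *slot = addr + (uint64_t)rela[i].r_addend;
        bound++;
    }
    return bound;
}
```

A lazy implementation would do the same work for a single record inside the run-time resolver instead of looping over all of them up front.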
 
So far we assumed that shared libraries are already loaded. It's the case with libc, but how do we know what other shared libraries the executable wants?
 
<syntaxhighlight lang="bash">
$ readelf -a test
Section Headers:
...
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so]
</syntaxhighlight>
Not surprisingly, the answer lies in the section headers again. There's a table called '.dynamic'. That table has several records, but the ones we are really interested in are those marked "NEEDED".
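
Collecting the "NEEDED" records could be sketched as follows. The function name `list_needed` is made up; it assumes you already have pointers to the '.dynamic' table and to the dynamic string table it refers to, since a `DT_NEEDED` value is an offset into that string table rather than a pointer.

```c
#include <elf.h>
#include <stddef.h>

/* Collect the names of all DT_NEEDED entries. `dyn` is the start of the
 * .dynamic table, `dynstr` the string table that DT_STRTAB refers to.
 * Stores at most `max` names in `out` and returns how many were stored. */
size_t list_needed(const Elf64_Dyn *dyn, const char *dynstr,
                   const char **out, size_t max)
{
    size_t n = 0;
    /* the table is terminated by a DT_NULL record */
    for (; dyn->d_tag != DT_NULL && n < max; dyn++) {
        if (dyn->d_tag == DT_NEEDED)
            out[n++] = dynstr + dyn->d_un.d_val;  /* value is a string offset */
    }
    return n;
}
```

Each returned name is what the loader has to find and map before symbol resolution can start.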
 
=== Symbol look up ===
To find out printf's address, we should first locate its symbol in the shared library.
<syntaxhighlight lang="bash">
$ readelf -a libc.so
Section Headers:
...
Num: Value Size Type Bind Vis Ndx Name
8: 0000000100000175 93 FUNC GLOBAL DEFAULT 1 printf
</syntaxhighlight>
Bingo! It is 100000175 in our example.
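
A plain linear search over the library's symbol table is enough to reproduce this lookup. The sketch below uses a made-up function name, `lookup_symbol`, and skips hash tables and symbol versioning entirely. The value it returns is the one recorded in the file, so a library loaded at a different base would still need its load offset added.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Search a symbol table for a defined global or weak symbol.
 * Returns its st_value as recorded in the file, or 0 if not found. */
uint64_t lookup_symbol(const Elf64_Sym *syms, size_t count,
                       const char *strtab, const char *name)
{
    for (size_t i = 1; i < count; i++) {        /* entry 0 is reserved */
        unsigned bind = ELF64_ST_BIND(syms[i].st_info);
        if (syms[i].st_shndx == SHN_UNDEF)      /* an import, not a definition */
            continue;
        if (bind != STB_GLOBAL && bind != STB_WEAK)
            continue;
        if (strcmp(strtab + syms[i].st_name, name) == 0)
            return syms[i].st_value;
    }
    return 0;
}
```

Real linkers use the '.hash' or '.gnu.hash' section to avoid the linear scan, but the comparison at the core is the same.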
 
== Gimme code! ==
I've put all the above together in a very simple example, see [https://gitlab.com/bztsrc/osz/-/blob/master/tools/elftool.c elftool.c] on GitLab.
 
When I run it on the executable it gives:
<syntaxhighlight lang="bash">
$ gcc elftool.c -o elftool
$ ./elftool -d mytestelf.o
...

--- EXPORT ---
</syntaxhighlight>
As you can see, here we have an import section, but nothing to be exported. Now let's see a shared library!
<syntaxhighlight lang="bash">
$ ./elftool -d /lib/libc.so.6
Stringtable 00011038 (23041 bytes), symbols 00003d90 (53928 bytes, one entry 24)
...
22. 000a3d20 __wcscoll_l
... lot more lines to come ...
</syntaxhighlight>
This time it has a hell of a lot of functions to export, and it also imports the dynamic linker of Linux with addend offsets in the GOT.
 
 
That's all, hope it helps somebody! Good luck with implementing your own dynamic linker!
 
[[Category:Linkers]]
[[Category:Loaders]]