ELF: Difference between revisions

{{FirstPerson}}
{{You}}
{{File formats}}
 
ELF (Executable and Linkable Format) was designed by Unix System Laboratories while working with Sun Microsystems on SVR4 (UNIX System V Release 4.0). Consequently, ELF first appeared in Solaris 2.0 (aka SunOS 5.0), which is based on SVR4. The format is specified in the [[System V ABI]].
 
A very versatile file format, it was later picked up by many other operating systems for use as both executable files and shared library files. It distinguishes between TEXT, DATA and BSS.
 
Today, ELF is considered the standard format on Unix-like systems. While it has some drawbacks (e.g., using up one of the scarce general purpose registers of the IA-32 when using position-independent code), it is well supported and documented.
 
==File Structure==
ELF is a format for storing programs or fragments of programs (see the ELF Header table for the file types) on disk, created as a result of compiling and linking. An ELF file may contain sections, segments, or both. For an executable program, an ELF header and at least one segment are the bare minimum; sections are optional, though an executable commonly has a ".text" section for the code, a ".data" section for initialized data such as global variables, and a ".rodata" section that usually contains constant strings. Relocatable object files, which are used for linking, have only sections and no segments. Sections and segments are described by their respective headers, which contain information about their sizes, required alignment, and how they should be stored in memory.
 
Note that depending on whether your file is a linkable or an executable file, the headers in the ELF file won't be the same:
process.o, result of gcc -c process.c $SOME_FLAGS
 
<pre>
C32/kernel/bin/.process.o
...
CONTENTS, READONLY
</pre>
The 'flags' will tell you what's actually available in the ELF file. Here, we have [[Symbol_Table|symbol tables]] and relocation: all that we need to link the file against another, but virtually no information about how to load the file in memory (even if that could be guessed). We don't have the program entry point, for instance, and we have a sections table rather than a program header.
 
{| {{wikitable}}
|-
| .text
| where code lives, as said above. objdump -drS .process.o will show you that
|-
| .data
| where global tables, variables, etc. live. objdump -s -j .data .process.o will hexdump it.
|-
| .bss
| ...
|-
| ...
| debugging symbols & similar information.
|}
 
/bin/bash, a real executable file
 
<pre>
/bin/bash: file format elf32-i386
...
filesz 0x0000002c memsz 0x0000002c flags r--
</pre>
That's for Exception Handler information, in case we should link against some C++ binaries at execution (needs citing).
<pre>
/bin/bash, loaded (as in /proc/xxxx/maps)
...
40014000-40015000 rw-p 00013000 03:06 27304 /lib/ld-2.3.2.so
</pre>
 
We can recognize our 'code bits' and 'data bits'. By stating that the second one should be loaded at 0x080bd*120* and that it starts in the file at 0x00074*120*, we preserve the page-to-disk-block mapping (e.g. if page 0x80bd000 is missing, just fetch file blocks from 0x74000). That means, however, that a part of the code is mapped twice, but with different permissions. I suggest you give them different physical pages too if you don't want to end up with modifiable code.
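The page-to-disk relationship described above can be checked with a little arithmetic. The following is a minimal sketch, assuming 4 KiB pages; the names <tt>PAGE_SIZE</tt>, <tt>segment_is_page_congruent</tt> and <tt>backing_file_offset</tt> are made up for this illustration and are not part of any ELF API.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u /* assumption: 4 KiB pages, as on IA-32 */

/* True when the virtual address and the file offset of a segment share the
   same offset within a page, so whole file blocks can back whole pages. */
bool segment_is_page_congruent(uint32_t vaddr, uint32_t file_offset)
{
    return (vaddr % PAGE_SIZE) == (file_offset % PAGE_SIZE);
}

/* File offset of the block backing the page that contains fault_addr, for a
   segment loaded at seg_vaddr from file offset seg_offset.
   Assumes the segment is page congruent and fault_addr lies inside it. */
uint32_t backing_file_offset(uint32_t fault_addr,
                             uint32_t seg_vaddr, uint32_t seg_offset)
{
    uint32_t page = fault_addr & ~(PAGE_SIZE - 1u);
    return page - (seg_vaddr - seg_offset);
}
```

With the values quoted above, a fault anywhere in the page at 0x080bd000 would be served from file offset 0x74000.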
 
==Loading ELF Binaries==
[[File:Elfdiagram.png|center|Executable image and ELF binary can be mapped onto each other]]
The ELF header contains all of the relevant information required to load an ELF executable. The format of this header is described in the [http://www.skyfree.org/linux/references/ELF_Format.pdf ELF Specification]. The most relevant sections for this purpose are 1.1 to 1.4 and 2.1 to 2.7. Instructions on loading an executable are contained within section 2.7.

The following is a rough outline of the steps that an ELF executable loader must perform:
* Verify that the file starts with the ELF magic number (4 bytes) as described in figure 1-4 (and subsequent table) on page 11 in the ELF specification.
* Read the ELF Header. The ELF header is always located at the very beginning of an ELF file. The ELF header contains information about how the rest of the file is laid out. An executable loader is only concerned with the program headers.
* Read the ELF executable's program headers. These specify where in the file the program segments are located, and where they need to be loaded into memory.
* Parse the program headers to determine the number of program segments that must be loaded. Each program header has an associated type, as described in Figure 2-2 of the ELF specification. Only headers with a type of <tt>PT_LOAD</tt> describe a loadable segment.
* Load each of the loadable segments. This is performed as follows:
** Allocate virtual memory for each segment, at the address specified by the <tt>p_vaddr</tt> member in the program header. The size of the segment in memory is specified by the <tt>p_memsz</tt> member.
** Copy the segment data from the file offset specified by the <tt>p_offset</tt> member to the virtual memory address specified by the <tt>p_vaddr</tt> member. The size of the segment in the file is contained in the <tt>p_filesz</tt> member. This can be zero.
** The <tt>p_memsz</tt> member specifies the size the segment occupies in memory. This can be zero. If the <tt>p_filesz</tt> and <tt>p_memsz</tt> members differ, this indicates that the segment is padded with zeros. All bytes in memory between the end of the file data and the end of the segment's virtual memory size are to be cleared with zeros.
* Read the executable's entry point from the ELF header.
* Jump to the executable's entry point in the newly loaded memory.

When working with GeekOS, there are a few simplifying assumptions you can make about the types and location of program headers. In the files you will be working with, there will always be one text header and one data header. The text header will be the first program header and the data header will be the second program header. This is not generally true of ELF files, but it will be true of the programs you will be responsible for.

The file geekos/include/geekos/elf.h provides data types for structures which match the format of the ELF and program headers.

This is a rough guideline for what Parse_ELF_Executable() has to do:

# Check that exeFileData is non-null and exeFileLength is large enough to accommodate the ELF headers and phnum program headers.
# Check that the file starts with the ELF magic number (4 bytes) as described in figure 1-4 (and subsequent table) on page 11 in the ELF specification.
# Check that the ELF file has no more than EXE_MAX_SEGMENTS program headers (phnum field of the elfHeader).
# Fill in numSegments and entryAddr fields of the exeFormat output variable.
# For each program header k in turn, fill in the corresponding segmentList[k] array element of exeFormat with offsetInFile, lengthInFile, startAddress, sizeInMemory, protFlags with information from that program header k. See figure 2-1 on page 33 in the ELF specification.
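
The loading steps described in this section can be sketched in C roughly as follows. This is only an illustration under several assumptions: the file is a 32-bit little-endian ELF, the host is little-endian too, and the "address space" is a flat buffer <tt>mem</tt> that stands for virtual addresses starting at <tt>mem_base</tt>. The struct and function names (<tt>Elf32Header</tt>, <tt>Elf32ProgramHeader</tt>, <tt>load_elf32</tt>) are made up for this example; a real kernel would allocate pages instead of copying into a buffer, and would jump to the returned entry point afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u /* program header type of a loadable segment */

typedef struct { /* 52 bytes, 32-bit layout */
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Elf32Header;

typedef struct { /* 32 bytes, 32-bit layout */
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Elf32ProgramHeader;

/* Copies every PT_LOAD segment into mem and stores the entry point in
   *entry_out. Returns 0 on success, -1 if the file is malformed or a
   segment does not fit into [mem_base, mem_base + mem_len). */
int load_elf32(const uint8_t *file, size_t file_len,
               uint8_t *mem, size_t mem_len, uint32_t mem_base,
               uint32_t *entry_out)
{
    Elf32Header eh;
    if (file == NULL || file_len < sizeof eh) return -1;
    memcpy(&eh, file, sizeof eh);

    if (memcmp(eh.e_ident, "\177ELF", 4) != 0) return -1;    /* magic number */
    if (eh.e_ident[4] != 1 || eh.e_ident[5] != 1) return -1; /* 32 bit, little-endian */
    if (eh.e_phentsize != sizeof(Elf32ProgramHeader)) return -1;
    if (eh.e_phoff > file_len ||
        (size_t)eh.e_phnum * sizeof(Elf32ProgramHeader) > file_len - eh.e_phoff)
        return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        Elf32ProgramHeader ph;
        memcpy(&ph, file + eh.e_phoff + (size_t)i * sizeof ph, sizeof ph);
        if (ph.p_type != PT_LOAD) continue; /* only loadable segments matter */

        if (ph.p_filesz > ph.p_memsz) return -1;
        if (ph.p_offset > file_len || ph.p_filesz > file_len - ph.p_offset) return -1;
        if (ph.p_vaddr < mem_base) return -1;
        size_t dst = ph.p_vaddr - mem_base;
        if (dst > mem_len || ph.p_memsz > mem_len - dst) return -1;

        memset(mem + dst, 0, ph.p_memsz);                   /* zero the whole segment */
        memcpy(mem + dst, file + ph.p_offset, ph.p_filesz); /* then copy the file data */
    }

    *entry_out = eh.e_entry;
    return 0;
}
```

Zeroing <tt>p_memsz</tt> bytes first and then copying <tt>p_filesz</tt> bytes takes care of the zero padding (typically the BSS) without a separate step.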
 
==Relocation==
 
Relocation becomes handy when you need to load, for example, modules or drivers. It's possible to use the "-r" option to ld to permit you to have multiple object files linked into one big one, which means easier coding and faster testing.
 
# Go through all sections resolving external references against the kernel symbol table
# If all succeeded, you can use the "e_entry" field of the header as the offset from the load address to call the entry point (if one was specified), or do a symbol lookup, or just return a success code.
 
Once you can relocate ELF objects you'll be able to have drivers loaded when needed instead of at startup - which is always a Good Thing (tm).
 
== Tables ==
=== ELF Header ===
The ELF header is always found at the start of the ELF file.
{| {{wikitable}}
|-
| 6
| 6
| ELF header version
|-
| 7
| 7
| ...
|-
| 16-17
| 16-17
| Type (1 = relocatable, 2 = executable, 3 = shared, 4 = core)
|-
| 18-19
| ...
| ...
|-
| 20-23
| 20-23
| ELF Version (currently 1)
|-
| 24-27
| 24-31
| Program entry position
|-
| 28-31
| 32-39
| Program header table offset
|-
| 32-35
| 40-47
| Section header table offset
|-
| 36-39
| ...
| ...
|-
| 40-41
| 52-53
| ELF Header size
|-
| 42-43
| ...
| ...
|-
| 50-51
| 62-63
| Index in the section header table of the section that holds the section names
|}
 
The flags entry can probably be ignored for x86 ELFs, as no flags are actually defined.
 
{| {{wikitable}}
|-
| No Specific
| 0x00
|-
| [[:Category:Sparc | Sparc]]
| 0x02
|-
| '''[[x86]]'''
| 0x03
|-
| [[:Category:MIPS | MIPS]]
| 0x08
|-
| [[PowerPC]]
| ...
|-
| '''[[:Category:ARM | AArch64]]'''
| 0xB7
|-
| [[RISC-V]]
| 0xF3
|}
The most common architectures are in bold.
 
An example C struct declaration for the file header (from Levine 1999) would look like this:
 struct ELF_fh {
     char magic[4];   // magic number, "\177ELF"
     char class;      // address size, 1 = 32 bit, 2 = 64 bit
     char byteorder;  // 1 = little-endian, 2 = big-endian
     char hversion;   // header version, always 1
     char pad[9];     // padding up to byte 16
     short filetype;  // file type: 1 = relocatable, 2 = executable,
                      // 3 = shared object, 4 = core image
     short archtype;  // 2 = SPARC, 3 = x86, 4 = 68K, etc.
     int fversion;    // file version, always 1
     int entry;       // entry point if executable
     int phdrpos;     // file position of program header or 0
     int shdrpos;     // file position of section header or 0
     int flags;       // architecture specific flags, usually 0
     short hdrsize;   // size of this ELF header
     short phdrent;   // size of an entry in program header
     short phdrcnt;   // number of entries in program header or 0
     short shdrent;   // size of an entry in section header
     short shdrcnt;   // number of entries in section header or 0
     short strsec;    // section number that contains section name strings
 };
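
Casting a file buffer directly to a struct like this only works when the host's byte order and word size match the file. As an alternative, the first fields can be read byte by byte using the positions from the ELF Header table. The sketch below does that for the fields whose positions are the same in 32 bit and 64 bit files; the names <tt>ElfIdent</tt>, <tt>read_u16</tt> and <tt>elf_identify</tt> are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int      bits;       /* 32 or 64 */
    int      big_endian; /* 0 = little-endian, 1 = big-endian */
    uint16_t type;       /* 1 = relocatable, 2 = executable, 3 = shared, 4 = core */
    uint16_t machine;    /* instruction set, e.g. 3 = x86 */
} ElfIdent;

/* Reads a 16-bit value in the byte order announced by the file. */
uint16_t read_u16(const uint8_t *p, int big_endian)
{
    return big_endian ? (uint16_t)((p[0] << 8) | p[1])
                      : (uint16_t)((p[1] << 8) | p[0]);
}

/* Fills *out from the first 20 bytes of an ELF file.
   Returns 0 on success, -1 if the buffer does not look like an ELF header. */
int elf_identify(const uint8_t *buf, size_t len, ElfIdent *out)
{
    if (buf == NULL || len < 20) return -1;
    if (memcmp(buf, "\177ELF", 4) != 0) return -1; /* bytes 0-3: magic number */
    if (buf[4] != 1 && buf[4] != 2) return -1;     /* byte 4: 1 = 32 bit, 2 = 64 bit */
    if (buf[5] != 1 && buf[5] != 2) return -1;     /* byte 5: 1 = little, 2 = big endian */

    out->bits = (buf[4] == 1) ? 32 : 64;
    out->big_endian = (buf[5] == 2);
    out->type = read_u16(buf + 16, out->big_endian);    /* bytes 16-17 */
    out->machine = read_u16(buf + 18, out->big_endian); /* bytes 18-19 */
    return 0;
}
```

A loader can call something like this first and then pick the 32 bit or 64 bit layout for the remaining fields.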
 
=== Program header ===
This is an array of N (given in the main header) entries in the following format. Make sure to use the correct version depending on whether the file is 32 bit or 64 bit as the tables are quite different.
32 bit version:
{| {{wikitable}}
|-
| 12-15
| Reserved for segment's physical address (p_paddr); undefined for the System V ABI
|-
| 16-19
| ...
|-
| 20-23
| Size of the segment in memory (p_memsz, at least as big as p_filesz)
|-
| 24-27
| ...
|-
| 28-31
| The required alignment for this segment (usually a power of 2)
|}
 
64 bit version:
{| {{wikitable}}
|-
| 24-31
| Reserved for segment's physical address (p_paddr); undefined for the System V ABI
|-
| 32-39
| ...
|-
| 40-47
| Size of the segment in memory (p_memsz, at least as big as p_filesz)
|-
| 48-55
| The required alignment for this segment (usually a power of 2)
|}
Segment types: 0 = null - ignore the entry; 1 = load - clear p_memsz bytes at p_vaddr to 0, then copy p_filesz bytes from p_offset to p_vaddr; 2 = dynamic - requires dynamic linking; 3 = interp - contains a file path to an executable to use as an interpreter for the following segment; 4 = note section. There are more values, but mostly contain architecture/environment specific information, which is probably not required for the majority of ELF files.
 
Flags: 1 = executable, 2 = writable, 4 = readable.
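
These flag bits can be turned into the familiar "rwx" notation that objdump prints for each segment. A small sketch follows; the macro names mirror the usual <tt>PF_X</tt>, <tt>PF_W</tt> and <tt>PF_R</tt> constants, while the function name <tt>segment_flags_to_string</tt> is made up for this example.

```c
#include <stdint.h>

#define PF_X 1u /* executable */
#define PF_W 2u /* writable */
#define PF_R 4u /* readable */

/* Writes a NUL-terminated string such as "r-x" into out (4 bytes). */
void segment_flags_to_string(uint32_t p_flags, char out[4])
{
    out[0] = (p_flags & PF_R) ? 'r' : '-';
    out[1] = (p_flags & PF_W) ? 'w' : '-';
    out[2] = (p_flags & PF_X) ? 'x' : '-';
    out[3] = '\0';
}
```

A kernel would typically map the same bits onto its page protection flags when it allocates memory for a segment.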
 
=== Using the PIC with programs ===
 
The logic that will allow an ELF program to run (which is quite simple once you have a scheduler) is this:
'''''*IRQ fires*->Scheduler->ELF Program on Queue->Run ELF Program until an exit() is called (usually in crt0)->Take process off the Queue'''''
 
=== Dynamic Linking ===
{{Main|Dynamic Linker}}
 
Dynamic linking is when the OS gives a program shared libraries if it needs them. That is, the libraries are found in the system and then bound to the program that needs them while the program is running, as opposed to static linking, which links the libraries '''before''' the program is run. The main advantages are that programs take up less memory and are smaller in file size. The main disadvantage, however, is that the program becomes less portable because it depends on many different shared libraries.
 
In order to implement this, you need to have proper scheduling in place, a library, and a program to use that library.
You can create a library with GCC:
 
<pre>
myos-gcc -c -fPIC -o oneobject.o oneobject.c
...
myos-gcc -shared -fPIC -Wl,-soname,nameofmylib oneobject.o anotherobject.o -o mylib.so
</pre>
 
This library should be treated as a file, which is loaded when the OS detects its attempted usage. You will need to implement this "[[Dynamic Linker]]" in a suitable part of your kernel, such as the memory management or task management code. When the ELF program is run, the system should load the shared object data into a malloc()-allocated region of memory, to which the program's calls into the library are redirected. Once the program has finished, the region can be given back to the OS with a call to free().
 
That should be a good starting point to writing a dynamic linker.
 
== See Also ==
=== Articles ===
* [[System V ABI]]
* [[Symbol Table]]
* [[Modular Kernel]]
* [[DWARF]]
 
=== External Links ===
* [http://www.skyfree.org/linux/references/ELF_Format.pdf The ELF file format] in detail
* [http://www.sco.com/developers/gabi/latest/contents.html System V ABI] about ELF
* [http://www.linuxfoundation.org/en/Specifications LSB specifications]<br />See (generic or platform-specific) 'Core' specifications for additional ELF information.
* [https://en.wikipedia.org/wiki/Executable_and_Linkable_Format Executable and Linkable Format on Wikipedia], which lists further ELF references
* [http://downloads.openwatcom.org/ftp/devel/docs/elf-64-gen.pdf The ELF file format(64-bit)] ELF 64-Bit, General extension to ELF32.
* [http://www.x86-64.org/documentation/abi.pdf x86-64 ABI] Documented x86-64 specific extensions with ELF64.
* [http://www.robinhoksbergen.com/papers/howto_elf.html Manually Creating an ELF Executable] (dead, [https://web.archive.org/web/20140130143820/http://www.robinhoksbergen.com/papers/howto_elf.html link from archive.org]) Detailed guide on how to create ELF binaries from scratch.
* [https://www.youtube.com/playlist?list=PLZCIHSjpQ12woLj0sjsnqDH8yVuXwTy3p Handmade Linux x86 executables] Youtube playlist about Linux x86 executables, explains ELF binary structure
[[Category:ABI]]
[[Category:Executable Formats]]
[[Category:Object Files]]
[[Category:Standards]]
[[de:Executable and Linking Format]]