ELF Tutorial: Difference between revisions

m
Bot: Replace deprecated source tag with syntaxhighlight
m (Added article to 'executable formats' category.)
m (Bot: Replace deprecated source tag with syntaxhighlight)
Line 21:
The ELF file format has only one header with fixed placement: the ELF header, present at the beginning of every file. The format itself is extremely flexible, as the position, size, and purpose of every other header are described by another header in the file.
 
<syntaxhighlight lang="cpp">
# define ELF_NIDENT 16

Line 40:
    Elf32_Half e_shstrndx;
} Elf32_Ehdr;
</syntaxhighlight>
 
The ELF header is the first header in an ELF file. It provides important information about the file (such as the machine type, architecture and byte order) as well as a means of identifying the file and checking whether it is valid. The ELF header also provides information about other sections in the file, since they can appear in any order, vary in size, or be absent from the file altogether. Universal to all ELF files are the first 4 bytes (the magic number), which are used to identify the file. When working with the file through the '''Elf32_Ehdr''' type defined above, these 4 bytes are accessible at indexes 0 to 3 of the field '''e_ident'''.
 
<syntaxhighlight lang="cpp">
enum Elf_Ident {
    EI_MAG0 = 0, // 0x7F
Line 65:
# define ELFDATA2LSB (1) // Little Endian
# define ELFCLASS32 (1) // 32-bit Architecture
</syntaxhighlight>
 
The first field in the header consists of 16 bytes, many of which provide important information about the ELF file such as the intended architecture, byte order, and ABI information. Since this tutorial focuses on implementing an x86 compatible loader, only relevant value definitions have been included.
 
<syntaxhighlight lang="cpp">
enum Elf_Type {
    ET_NONE = 0, // Unknown Type
Line 78:
# define EM_386 (3) // x86 Machine Type
# define EV_CURRENT (1) // ELF Current Version
</syntaxhighlight>
 
The file header also provides information about the machine type and file type. Once again, only the relevant definitions have been included above.
Line 86:
Before an ELF file can be loaded, linked, relocated or otherwise processed, it's important to ensure that the machine attempting these operations is able to perform them. This requires that the file is a valid ELF file targeting the local machine's architecture, byte order and CPU type, and that any operating system specific semantics are satisfied.
 
<syntaxhighlight lang="cpp">
bool elf_check_file(Elf32_Ehdr *hdr) {
    if(!hdr) return false;
Line 107:
    return true;
}
</syntaxhighlight>
 
Assuming that an ELF file has already been loaded into memory (either by the bootloader or otherwise), the first step to loading an ELF file is checking the ELF header for the magic number that should be present at the beginning of the file. A minimal implementation of this could simply treat the image of the file in memory as a string and compare it against a predefined string. In the example above, the comparison is done byte by byte through the ELF header type, and the method provides detailed feedback when it encounters an error.
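
As a rough illustration of the string-comparison approach mentioned above, the sketch below checks the magic number with '''memcmp'''(). The helper name '''elf_magic_ok'''() and its '''size''' parameter are assumptions made for this example and are not part of the tutorial's code.

<syntaxhighlight lang="cpp">
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper (not from the tutorial): compares the first four
// bytes of an in-memory image against the ELF magic number 0x7F 'E' 'L' 'F'.
static bool elf_magic_ok(const void *image, size_t size) {
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    // Reject missing images and images too short to hold the magic number.
    if (image == NULL || size < sizeof(magic)) return false;
    return memcmp(image, magic, sizeof(magic)) == 0;
}
</syntaxhighlight>

Unlike the byte-by-byte version, this variant cannot report which byte mismatched, so it trades diagnostic detail for brevity.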
 
<syntaxhighlight lang="cpp">
bool elf_check_supported(Elf32_Ehdr *hdr) {
    if(!elf_check_file(hdr)) {
Line 139:
    return true;
}
</syntaxhighlight>
 
The next step to loading an ELF object is to check that the file in question is intended to run on the machine that has loaded it. Again, the ELF header provides the necessary information about the file's intended target. The code above assumes that you have implemented a function called '''elf_check_file'''() (or used the one provided above), and that the local machine is i386, little-endian and 32-bit. It also only allows for executable and relocatable files to be loaded, although this can be changed as necessary.
Line 145:
====Loading the ELF File====
 
<syntaxhighlight lang="cpp">
static inline void *elf_load_rel(Elf32_Ehdr *hdr) {
    int result;
Line 177:
    return NULL;
}
</syntaxhighlight>
 
==The ELF Section Header==
Line 183:
The ELF format defines many different types of sections and their relevant headers, not all of which are present in every file, and there is no guarantee about the order in which they appear. Thus, in order to parse and process these sections, the format also defines section headers, which contain information such as section names, sizes, locations and other relevant details. The list of all the section headers in an ELF image is referred to as the section header table.
 
<syntaxhighlight lang="cpp">
typedef struct {
    Elf32_Word sh_name;
Line 196:
    Elf32_Word sh_entsize;
} Elf32_Shdr;
</syntaxhighlight>
 
The section header table contains a number of important fields, some of which have different meanings for different sections. Another point of interest is that the '''sh_name''' field does not point directly to a string; instead, it gives the offset of a string in the section name string table (the index of the table itself is defined in the ELF header by the field '''e_shstrndx'''). Each header also defines the position of the actual section in the file image in the field '''sh_offset''', as an offset from the beginning of the file.
 
<syntaxhighlight lang="cpp">
# define SHN_UNDEF (0x00) // Undefined/Not present

Line 217:
    SHF_ALLOC = 0x02 // Exists in memory
};
</syntaxhighlight>
 
Above are a number of constants that are relevant to the tutorial (a good deal more exist). The enumeration '''ShT_Types''' defines a number of different types of sections, which correspond to values stored in the field '''sh_type''' in the section header. Similarly, '''ShT_Attributes''' corresponds to the field '''sh_flags''', but its values are bit flags rather than stand-alone values.
Line 225:
Getting access to the section header table itself isn't very difficult: its position in the file image is defined by '''e_shoff''' in the ELF header, and the number of section headers is in turn defined by '''e_shnum'''. Notably, the first entry in the section header table is a NULL entry; that is to say, all fields in that header are 0. The section headers are contiguous, so given a pointer to the first entry, subsequent entries can be accessed with simple pointer arithmetic or array operations.
 
<syntaxhighlight lang="cpp">
static inline Elf32_Shdr *elf_sheader(Elf32_Ehdr *hdr) {
    return (Elf32_Shdr *)((int)hdr + hdr->e_shoff);
Line 233:
    return &elf_sheader(hdr)[idx];
}
</syntaxhighlight>
 
The two methods above provide convenient access to section headers on a by-index basis using the principles noted above, and they will be used frequently in the example code that follows.
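
One common use of by-index access is scanning the table for a section of a particular type. The following is a minimal sketch under stated assumptions: the type names '''Ehdr32''' and '''Shdr32''' and the function '''find_section_by_type'''() are hypothetical stand-ins defined here so that the example is self-contained, and byte-pointer arithmetic is used in place of the '''(int)''' cast so that the sketch also behaves on 64-bit hosts.

<syntaxhighlight lang="cpp">
#include <stddef.h>
#include <stdint.h>

// Stand-in layouts (assumed names) mirroring the 32-bit ELF header and
// section header field order.
typedef struct {
    uint8_t  e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum, e_shstrndx;
} Ehdr32;

typedef struct {
    uint32_t sh_name, sh_type, sh_flags, sh_addr, sh_offset,
             sh_size, sh_link, sh_info, sh_addralign, sh_entsize;
} Shdr32;

// Returns the first section header whose sh_type matches, or NULL.
// Index 0 is skipped because it is always the NULL entry.
static const Shdr32 *find_section_by_type(const Ehdr32 *hdr, uint32_t type) {
    const Shdr32 *table =
        (const Shdr32 *)((const uint8_t *)hdr + hdr->e_shoff);
    for (uint16_t i = 1; i < hdr->e_shnum; i++) {
        if (table[i].sh_type == type) return &table[i];
    }
    return NULL;
}
</syntaxhighlight>

The sketch trusts '''e_shoff''' and '''e_shnum'''; a loader handling untrusted files would also want to check them against the size of the image.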
Line 248:
An example of the process is shown in the two convenience methods below.
 
<syntaxhighlight lang="cpp">
static inline char *elf_str_table(Elf32_Ehdr *hdr) {
    if(hdr->e_shstrndx == SHN_UNDEF) return NULL;
Line 259:
    return strtab + offset;
}
</syntaxhighlight>
 
Note that before you attempt to access the name of a section, you should first check that the section has a name (the offset given by sh_name is not equal to '''SHN_UNDEF''').
Line 271:
The symbol table is a section (or a number of sections) that exists within the ELF file and defines the location, type, visibility and other traits of various symbols declared in the original source, created during compilation or linking, or otherwise present in the file. Since an ELF object can have multiple symbol tables, it is necessary either to iterate over the file's section headers or to follow a reference from another section in order to access one.
 
<syntaxhighlight lang="cpp">
typedef struct {
    Elf32_Word st_name;
Line 280:
    Elf32_Half st_shndx;
} Elf32_Sym;
</syntaxhighlight>
 
Each symbol table entry contains a number of notable bits of information such as the symbol name ('''st_name''', may be '''STN_UNDEF'''), the symbol's value ('''st_value''', which may be an absolute or a relative address), and the field '''st_info''', which contains both the symbol type and binding. As an aside, the first entry in each symbol table is a NULL entry, so all of its fields are 0.
 
<syntaxhighlight lang="cpp">
# define ELF32_ST_BIND(INFO) ((INFO) >> 4)
# define ELF32_ST_TYPE(INFO) ((INFO) & 0x0F)
Line 299:
    STT_FUNC = 2 // Methods or functions
};
</syntaxhighlight>
 
As mentioned above, '''st_info''' contains both the symbol type and binding, so the 2 macros above provide access to the individual values. The enumeration '''StT_Types''' provides a number of possible symbol types, and '''StB_Bindings''' provides possible symbol bindings.
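
To make the packing concrete, here is a small worked sketch. The two extraction macros are repeated so that the block stands alone; '''ELF32_ST_INFO'''() is the inverse macro given in the ELF specification, and the function '''st_info_roundtrip'''() is a hypothetical helper used only for illustration.

<syntaxhighlight lang="cpp">
#include <stdint.h>

#define ELF32_ST_BIND(INFO) ((INFO) >> 4)      // upper 4 bits: binding
#define ELF32_ST_TYPE(INFO) ((INFO) & 0x0F)    // lower 4 bits: type
// Inverse operation from the ELF specification: pack binding and type.
#define ELF32_ST_INFO(BIND, TYPE) (((BIND) << 4) + ((TYPE) & 0x0F))

// Hypothetical helper: unpacks st_info and packs it again. The result
// should always equal the input, since the two halves do not overlap.
static uint8_t st_info_roundtrip(uint8_t info) {
    uint8_t bind = ELF32_ST_BIND(info);
    uint8_t type = ELF32_ST_TYPE(info);
    return (uint8_t)ELF32_ST_INFO(bind, type);
}
</syntaxhighlight>

For instance, an '''st_info''' value of 0x12 splits into a binding of 1 (a global symbol in the ELF specification) and a type of 2 ('''STT_FUNC''').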
Line 307:
Some operations, such as linking and relocation, require the value of a symbol (or rather, the address thereof). Although the symbol table entries do define a field '''st_value''', it may only contain a relative address. Below is an example of how to compute the absolute address of the value of the symbol. The code has been broken up into multiple smaller sections so that it is easier to understand.
 
<syntaxhighlight lang="cpp">
static int elf_get_symval(Elf32_Ehdr *hdr, int table, uint idx) {
    if(table == SHN_UNDEF || idx == SHN_UNDEF) return 0;
Line 320:
    int symaddr = (int)hdr + symtab->sh_offset;
    Elf32_Sym *symbol = &((Elf32_Sym *)symaddr)[idx];
</syntaxhighlight>
 
The above performs a check against both the symbol table index and the symbol index; if either is undefined, 0 is returned. Otherwise the section header entry for the symbol table at the given index is accessed. It then checks that the symbol index is not outside the bounds of the symbol table. If the check fails, an error message is displayed and an error code is returned; otherwise the symbol table entry at the given index is retrieved.
 
<syntaxhighlight lang="cpp">
    if(symbol->st_shndx == SHN_UNDEF) {
        // External symbol, lookup value
Line 345:
            return (int)target;
        }
</syntaxhighlight>
 
If the section to which the symbol is relative (given by '''st_shndx''') is equal to '''SHN_UNDEF''', the symbol is external and must be linked to its definition. The string table is retrieved for the current symbol table (the string table for a given symbol table is available in the table's section header in '''sh_link'''), and the symbol's name is found in the string table. Next the function '''elf_lookup_symbol'''() is used to find a symbol definition by name (this function is not provided; a minimal implementation always returns NULL). If the symbol definition is found, it is returned. If the symbol has the '''STB_WEAK''' flag (is a weak symbol), 0 is returned; otherwise an error message is displayed and an error code is returned.
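
Since '''elf_lookup_symbol'''() is left to the reader, the following is one possible sketch of it, assuming the loader keeps a small static table of exported names. The '''ExportedSymbol''' type, the '''exported''' table, the '''demo_value''' variable and the function's exact signature are all assumptions made for this example; a real kernel would populate such a table from its own symbol information.

<syntaxhighlight lang="cpp">
#include <stddef.h>
#include <string.h>

// Hypothetical record describing one symbol the loader is willing to export.
typedef struct {
    const char *name;
    void *addr;
} ExportedSymbol;

// Placeholder object standing in for something a kernel might export.
static int demo_value;

static ExportedSymbol exported[] = {
    { "demo_value", &demo_value },
};

// Linear search by name; returns NULL when no definition is known, which
// matches the behaviour of the minimal implementation described above.
void *elf_lookup_symbol(const char *name) {
    if (name == NULL) return NULL;
    for (size_t i = 0; i < sizeof(exported) / sizeof(exported[0]); i++) {
        if (strcmp(exported[i].name, name) == 0) return exported[i].addr;
    }
    return NULL;
}
</syntaxhighlight>

A linear search is adequate for a handful of symbols; a larger export list would likely call for a sorted table or a hash.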
 
<syntaxhighlight lang="cpp">
    } else if(symbol->st_shndx == SHN_ABS) {
        // Absolute symbol
Line 359:
    }
}
</syntaxhighlight>
 
 
Line 413:
Relocation starts with a table of relocation entries, which can be located using the relevant section header. There are actually two different kinds of relocation structures: one with an explicit addend (section type '''SHT_RELA''') and one without (section type '''SHT_REL'''). Relocation entries in the table are contiguous, and the number of entries in a given table can be found by dividing the size of the table (given by '''sh_size''' in the section header) by the size of each entry (given by '''sh_entsize'''). Each relocation table is specific to a single section, so a single file may have multiple relocation tables (but all entries within a given table will be of the same relocation structure type).
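
The entry-count calculation described above can be sketched as follows. The function name '''elf_entry_count'''() is an assumption for this example, and the guard against a zero entry size is an added precaution rather than something the text requires.

<syntaxhighlight lang="cpp">
#include <stdint.h>

// Hypothetical helper: number of fixed-size entries in a table section,
// computed as sh_size / sh_entsize. Returns 0 if sh_entsize is 0 to avoid
// dividing by zero on a malformed header.
static uint32_t elf_entry_count(uint32_t sh_size, uint32_t sh_entsize) {
    if (sh_entsize == 0) return 0;
    return sh_size / sh_entsize;
}
</syntaxhighlight>

For example, a 32-bit '''SHT_REL''' table with 8-byte entries and an '''sh_size''' of 40 holds 5 entries, while an '''SHT_RELA''' table of 12-byte entries and an '''sh_size''' of 36 holds 3.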
 
<syntaxhighlight lang="cpp">
typedef struct {
    Elf32_Addr r_offset;
Line 424:
    Elf32_Sword r_addend;
} Elf32_Rela;
</syntaxhighlight>
 
The above are the definitions for the different structure types for relocations. Of note is the value stored in '''r_info''': the upper 24 bits designate the entry in the symbol table to which the relocation applies, whereas the lowest byte stores the type of relocation that should be applied. Note that an ELF file may have multiple symbol tables, so the index of the section header that refers to the symbol table to which these relocations apply can be found in the '''sh_link''' field of this relocation table's section header. The value in '''r_offset''' gives the relative position of the symbol that is being relocated, within its section.
 
<syntaxhighlight lang="cpp">
# define ELF32_R_SYM(INFO) ((INFO) >> 8)
# define ELF32_R_TYPE(INFO) ((uint8_t)(INFO))
Line 437:
    R_386_PC32 = 2 // Symbol + Offset - Section Offset
};
</syntaxhighlight>
 
As previously mentioned, the '''r_info''' field in '''Elf32_Rel'''('''a''') refers to 2 separate values, so the set of macro functions above can be used to obtain the individual values: '''ELF32_R_SYM'''() provides access to the symbol index and '''ELF32_R_TYPE'''() provides access to the relocation type. The enumeration '''RtT_Types''' defines the relocation types this tutorial will encompass.
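
A short worked sketch of the split follows. The two macros are repeated so that the block is self-contained, '''ELF32_R_INFO'''() is the packing macro from the ELF specification, and '''r_info_roundtrip'''() is a hypothetical helper that exists only for this illustration.

<syntaxhighlight lang="cpp">
#include <stdint.h>

#define ELF32_R_SYM(INFO)  ((INFO) >> 8)          // upper 24 bits: symbol index
#define ELF32_R_TYPE(INFO) ((uint8_t)(INFO))      // lowest byte: relocation type
// Packing macro from the ELF specification.
#define ELF32_R_INFO(SYM, TYPE) (((SYM) << 8) + (uint8_t)(TYPE))

// Hypothetical helper: splits r_info into its two parts and rebuilds it.
static uint32_t r_info_roundtrip(uint32_t info) {
    uint32_t sym = ELF32_R_SYM(info);
    uint8_t type = ELF32_R_TYPE(info);
    return ELF32_R_INFO(sym, type);
}
</syntaxhighlight>

An '''r_info''' value of 0x00000502, for example, refers to symbol index 5 with relocation type 2 ('''R_386_PC32''').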
Line 480:
As the following function is fairly complex, it's been broken up into smaller manageable chunks and explained in detail. Note that the code shown below assumes that the file being relocated is a relocatable ELF file (ELF executables and shared objects may also contain relocation entries, but are processed somewhat differently). Also note that '''sh_info''' for section headers of type '''SHT_REL''' and '''SHT_RELA''' stores the index of the section header to which the relocation applies.
 
<syntaxhighlight lang="cpp">
# define DO_386_32(S, A) ((S) + (A))
# define DO_386_PC32(S, A, P) ((S) + (A) - (P))
Line 489:
    int addr = (int)hdr + target->sh_offset;
    int *ref = (int *)(addr + rel->r_offset);
</syntaxhighlight>
 
The above code defines the macro functions that are used to complete relocation calculations. It also retrieves the section header for the section wherein the symbol exists and computes a reference to the symbol. The variable '''addr''' denotes the start of the symbol's section, and '''ref''' is created by adding the offset to the symbol from the relocation entry.
 
<syntaxhighlight lang="cpp">
    // Symbol value
    int symval = 0;
Line 500:
        if(symval == ELF_RELOC_ERR) return ELF_RELOC_ERR;
    }
</syntaxhighlight>
 
Next the value of the symbol being relocated is accessed. If the symbol table index stored in '''r_info''' is undefined, then the value defaults to 0. The code also references a function called '''elf_get_symval'''(), which was implemented previously. If the value returned by the function is equal to '''ELF_RELOC_ERR''', relocation is stopped and said error code is returned.
 
<syntaxhighlight lang="cpp">
    // Relocate based on type
    switch(ELF32_R_TYPE(rel->r_info)) {
Line 525:
    return symval;
}
</syntaxhighlight>
 
Finally, this segment of code details the actual relocation process, performing the necessary calculation on the relocated symbol and returning its value on success. If the relocation type is unsupported, an error message is displayed, relocation is stopped and the function returns an error code. Assuming no errors have occurred, relocation is now complete.
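
The arithmetic behind the two supported relocation types can be tried in isolation. The sketch below is an assumption-laden illustration, not the tutorial's function: '''reloc_386_32'''() and '''reloc_386_pc32'''() are hypothetical names, S stands for the symbol value, A for the addend and P for the address of the location being patched. Unsigned 32-bit arithmetic is used so that wrap-around is well defined.

<syntaxhighlight lang="cpp">
#include <stdint.h>

// Hypothetical helper for an absolute relocation: S + A.
static uint32_t reloc_386_32(uint32_t S, uint32_t A) {
    return S + A;
}

// Hypothetical helper for a PC-relative relocation: S + A - P.
static uint32_t reloc_386_pc32(uint32_t S, uint32_t A, uint32_t P) {
    return S + A - P;
}
</syntaxhighlight>

As a worked example, with a symbol at 0x1000, an addend of -4 (0xFFFFFFFC) and a patch location of 0x2010, the PC-relative result is 0x1000 - 4 - 0x2010 = -0x1014, which is stored as 0xFFFFEFEC. The absolute form with the same symbol and an addend of 8 simply yields 0x1008.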
Line 533:
The program header is a structure that defines information about how the ELF program behaves once it's been loaded, as well as runtime linking information. ELF program headers (much like section headers) are all grouped together to make up the program header table.
 
<syntaxhighlight lang="cpp">
typedef struct {
    Elf32_Word p_type;
Line 544:
    Elf32_Word p_align;
} Elf32_Phdr;
</syntaxhighlight>
 
The program header table contains a contiguous set of program headers (thus they can be accessed as if they were an array). The table itself can be accessed using the '''e_phoff''' field defined in the ELF header, assuming that it is present. Each header defines a number of useful fields, such as '''p_type''', which distinguishes between kinds of headers, '''p_offset''', which stores the offset to the segment the header refers to, and '''p_vaddr''', which defines the address at which position-dependent code should exist.
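
Because the headers form an array, a loader can walk them with an ordinary loop. The following is a minimal sketch, assuming a stand-in type '''Phdr32''' that mirrors the 32-bit program header field order and a hypothetical function '''elf_image_end'''(); the value 1 for '''PT_LOAD''' comes from the ELF specification. It computes the highest virtual address covered by any loadable segment, which is one way to estimate how much address space an image needs.

<syntaxhighlight lang="cpp">
#include <stdint.h>

#define PT_LOAD 1   // loadable segment, per the ELF specification

// Stand-in layout (assumed name) for a 32-bit program header.
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr,
             p_filesz, p_memsz, p_flags, p_align;
} Phdr32;

// Hypothetical helper: returns the end address (p_vaddr + p_memsz) of the
// highest PT_LOAD segment, or 0 if the table has no loadable segments.
static uint32_t elf_image_end(const Phdr32 *table, uint16_t count) {
    uint32_t end = 0;
    for (uint16_t i = 0; i < count; i++) {
        if (table[i].p_type != PT_LOAD) continue;   // skip other header types
        uint32_t seg_end = table[i].p_vaddr + table[i].p_memsz;
        if (seg_end > end) end = seg_end;
    }
    return end;
}
</syntaxhighlight>

The caller is expected to obtain '''table''' from '''e_phoff''' and '''count''' from the ELF header's program header count; neither is validated here.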