Object Files: Difference between revisions

(25 intermediate revisions by 8 users not shown)
Line 1:
{{You}}
 
Object files basically consist of compiled and assembled code, data, and all the additional information necessary to make their content usable. In the process of building an operating system, you will use a lot of object files. For common development tasks you do not need to know their exact details, but when you want to create or use one with various specifics, the details can be very important.
 
'''Note:''' The term 'object file' has no relation to the high-level concept of 'object-oriented programming'. Object files predate the earliest forms of OOP (the 'Actors model', circa 1966) by over ten years, and the term was in use at IBM by 1958 or earlier.
 
== Core Concepts ==
An object file is one of three types of files which contain ''object code'', a modified form of machine code in which additional information that allows for linking and relocation of the final loaded executable is included.
 
For most purposes, a compiler or assembler will produce object code as its final result, rather than a true executable binary. While most assemblers, and some compilers, have an option to produce a raw binary image, this is usually only applied to boot loaders, Read-Only Memory chips, and other special-purpose executables. As a practical matter, virtually all systems today use both object files and relocatable executables. Even the simplest file format in regular use today, the [https://en.wikipedia.org/wiki/COM_file MS-DOS .COM format], is not a pure binary executable image; the MS-DOS loader uses the first 0x100 bytes of the segment for the [https://en.wikipedia.org/wiki/Program_Segment_Prefix Program Segment Prefix], so that portion of the segment image is excluded from the file.
 
The majority of systems today have significantly more complex object formats, in which the address information is replaced with a stub or symbol of some kind, and which contain information regarding the relative location of externally visible functions, variables, and so forth. This facilitates the processes of
* linking, in which two or more object and/or library files are combined to form an executable file, and
* loading, where the loader replaces the address stubs with the actual memory locations at which the code will reside in the process's memory.
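
The two steps above can be illustrated with a toy model. The following is a minimal sketch, not a real object format: the dictionary layout (<tt>code</tt>, <tt>exports</tt>, <tt>relocs</tt>), the function name <tt>link</tt> and the load address are all assumptions made for illustration, and every stub is assumed to be a 4-byte little-endian absolute address.

```python
import struct

# Toy model only: real formats (ELF, COFF, ...) store far more detail.
# Each "object file" has raw code bytes, exported symbols (name -> offset
# within the module) and relocations (offset of a 4-byte stub, symbol name).
def link(modules, base_address):
    """Concatenate modules and patch every stub with an absolute address."""
    image = bytearray()
    symbols = {}   # global symbol table: name -> absolute address
    pending = []   # (offset in image, symbol name) still to be patched

    for module in modules:
        start = len(image)
        image += module["code"]
        for name, offset in module["exports"].items():
            symbols[name] = base_address + start + offset
        for offset, name in module["relocs"]:
            pending.append((start + offset, name))

    for offset, name in pending:
        if name not in symbols:
            raise KeyError(f"undefined symbol: {name}")
        # Overwrite the 4-byte little-endian placeholder with the address.
        struct.pack_into("<I", image, offset, symbols[name])

    return bytes(image), symbols


# main.o holds "mov eax, <address of helper>; ret" (x86: B8 imm32, C3).
# The immediate at offset 1 is a zeroed stub until link time.
main_o = {"code": b"\xb8\x00\x00\x00\x00\xc3", "exports": {"main": 0},
          "relocs": [(1, "helper")]}
# helper.o holds "nop; ret" and needs no patching.
helper_o = {"code": b"\x90\xc3", "exports": {"helper": 0}, "relocs": []}

image, symbols = link([main_o, helper_o], base_address=0x100000)
```

Here <tt>helper</tt> ends up directly behind the six bytes of <tt>main.o</tt>, and the stub in <tt>main.o</tt> is overwritten with that address. A real linker additionally deals with sections, alignment and several relocation types.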
 
== Object files, executable files and library files ==
Whereas Wikipedia considers executables to be a subset of object files, on the basis that both contain object code rather than a binary image, there are significant differences. In some systems, they are a completely different format (COFF vs PE), or they have different fields (ELF program/section headers). The key difference is that in executables, the full object code of the program is present (save for whatever may be in shared libraries, as explained below), while object files are only the object code for the specific module that they were generated from. This means that non-executable files do not contain loadable code.
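
For ELF, where both kinds of file share one format, the distinction is recorded in the <tt>e_type</tt> field of the file header. The following is a small sketch of how a tool might tell them apart; the function name <tt>describe_elf</tt> and the label strings are made up for this example, while the field offset and the type values come from the ELF specification.

```python
import struct

# e_type values defined by the ELF specification.
ELF_TYPES = {
    1: "relocatable object (ET_REL)",
    2: "executable (ET_EXEC)",
    3: "shared object (ET_DYN)",
    4: "core file (ET_CORE)",
}

def describe_elf(header):
    """Classify an ELF file from (at least) its first 18 bytes."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field directly after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ELF_TYPES.get(e_type, f"other ({e_type})")
```

In practice the header would be read from a file opened in binary mode. Note that position-independent executables are also marked <tt>ET_DYN</tt>, so this field alone does not distinguish them from shared libraries.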
 
As stated earlier, this does not necessarily mean that the 'executable' file is the actual binary image that is executed; in most modern systems, that image is produced in the loading step. In many cases, executable files still contain object code, not pure machine code, and the address locations may not be resolved until loading, but they ''do'' include all of the statically-linked code for the working program. Some linkers have the option of resolving the addresses at link time, or even default to it, as [[LD|ld]] (the Unix/Linux linker, which is invoked implicitly by GCC when generating an executable) does. Even in this case, the executable files generated usually contain additional information to facilitate the loading process, e.g., a separate read-only data section, a definition of the writable data area (sometimes called the ''[https://en.wikipedia.org/wiki/.bss .bss section]''), or a section defining the stack area, and may have linkage information for using shared libraries.
 
A third type of object code file is a ''library file'', a file that contains elements used by several programs, and made available for general use. Most functions, variables, and other elements used by the majority of programs are held in libraries. Libraries differ from regular object files mainly in that (on most systems) they are arranged so that independent elements of the library can be extracted from the file by the linker, so that only the elements used by the program are included in the executable generated from them.
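
The selective extraction described above can be sketched as follows. This is a toy model under stated assumptions: the library is represented as a plain dictionary rather than a real archive, and the member and symbol names are invented. Real linkers work from the archive's symbol index and have their own rules about scan order.

```python
def select_members(undefined, library):
    """Return the library members a link would pull in, in order.

    `undefined` lists the symbols the program still needs.
    `library` maps member name -> (defined symbols, required symbols).
    """
    needed = set(undefined)
    resolved = set()
    chosen = []
    progress = True
    while progress:            # repeat until no further member is added
        progress = False
        for member, (defines, requires) in library.items():
            if member in chosen:
                continue
            if needed & set(defines):
                chosen.append(member)
                resolved |= set(defines)
                # A member may itself depend on other members.
                needed |= set(requires)
                needed -= resolved
                progress = True
    return chosen


# Hypothetical library with three independent members.
libc = {
    "printf.o": (["printf"], ["write"]),
    "write.o": (["write"], []),
    "qsort.o": (["qsort"], []),
}
members = select_members(["printf"], libc)
```

A program that only calls <tt>printf</tt> pulls in <tt>printf.o</tt> and, through its dependency, <tt>write.o</tt>, while <tt>qsort.o</tt> stays out of the executable.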
 
On most systems today, libraries come in two types: ''static libraries'', which are directly linked into the executable file at link time, and ''shared libraries'' (also called ''dynamic-link libraries'' or DLLs in the Windows world), which are loaded and linked at run time to the programs which use them. The main difference is that shared libraries, as the name implies, can be shared by several programs at once, lowering memory usage. However, this comes at the cost of having to load the shared library in addition to the executable file when the first use of an element in it is made, and then linking it at run time to the program(s) using it. Shared libraries are generally cached to reduce the loading overhead, and usually are not loaded until the elements in them are actually used, meaning that if the part of the program using the shared library is not invoked, the library need not be loaded at all. Still, the trade-offs are such that code which isn't likely to be shared by several programs at once is generally linked as a static library instead, with only very common elements (e.g., the standard C and C++ libraries) being dynamically linked.
 
== Relocations ==
Line 22 ⟶ 41:
 
=== Relocating code ===
When an executable is created, it will be set to use a specific address by default. This can be a problem when you need several object files in the same address space and they may overlap. In that case, or when you want to perform address space randomization, relocating the executable may be an option.
 
Since relocations are only needed to build an executable, but not when you run it, they normally aren't present in a linked file. Instead you need to specifically tell the linker to emit relocations when necessary. For the [[GCC Cross-Compiler]], this can be done with the <tt>-q</tt> switch. Note that the <tt>-i</tt> and <tt>-r</tt> switches have a similar description, but cause the linker to yield an object file rather than an executable.
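
Once the relocations are available, moving an image comes down to adding the difference between the new and the original base address to every recorded location. The following is a minimal sketch that assumes every relocation is a 32-bit little-endian absolute address; real formats define many relocation types, and the function name <tt>rebase</tt> and the addresses are chosen for illustration only.

```python
import struct

def rebase(image, reloc_offsets, linked_base, load_base):
    """Return a copy of `image` adjusted to run at `load_base`.

    Assumes each offset in `reloc_offsets` points at a 32-bit
    little-endian absolute address (a deliberate simplification).
    """
    delta = load_base - linked_base
    patched = bytearray(image)
    for offset in reloc_offsets:
        (value,) = struct.unpack_from("<I", patched, offset)
        # Wrap to 32 bits so the result always fits the field.
        struct.pack_into("<I", patched, offset, (value + delta) & 0xFFFFFFFF)
    return bytes(patched)


# An image linked for 0x100000 holding one pointer to 0x100010 at offset 4.
image = b"\x90\x90\x90\x90" + struct.pack("<I", 0x100010)
moved = rebase(image, [4], linked_base=0x100000, load_base=0x400000)
```

After the call, the pointer at offset 4 refers to 0x400010 while all other bytes are unchanged.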
Line 38 ⟶ 57:
 
== Common errors ==
* '''Passing -i or -r to ld.''' There's actually a tutorial that does this. It does not work except for some limited cases, as it generates a file where relocations have not been applied at all.
* '''Assuming code and data are contiguous.''' A pitfall when trying to make a PE file multiboot-compatible. In memory, a section is generally page-aligned (4 KiB), but within a PE file it is only sector-aligned (512 bytes). So if a section is not a multiple of 4 KiB in size, relative addresses into the data section will be off by a multiple of 512 bytes, because the gap has been removed from the binary. Worse, it is perfectly valid to have metadata sections between the various loadable sections, which can throw addresses off further.
* '''Loading as a flat binary.''' All executables that aren't flat binaries have a header up front. Blindly loading such a file and jumping to its first byte will execute the header instead of your code. Again, there is a tutorial that tries to get away with this.
* '''Assuming the entry point is at the start.''' The linker has a certain amount of freedom in what order it places the object files, and so does the compiler with the code it emits. That means that <tt>main</tt> doesn't need to be at the start of the code section.
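
The PE alignment pitfall in the list above can be made concrete with a little arithmetic. The sketch below maps a relative virtual address (RVA) to its position in the file. The tuple layout mirrors the <tt>VirtualAddress</tt>, <tt>PointerToRawData</tt> and <tt>SizeOfRawData</tt> fields of a PE section header, but the section values themselves are a hypothetical layout, not taken from any real file.

```python
def rva_to_file_offset(sections, rva):
    """Map a relative virtual address to its offset in the file on disk.

    `sections` is a list of (virtual_address, raw_offset, raw_size)
    tuples, one per section header.
    """
    for virtual_address, raw_offset, raw_size in sections:
        if virtual_address <= rva < virtual_address + raw_size:
            return raw_offset + (rva - virtual_address)
    return None  # not backed by file data (e.g. a gap between sections)


# Hypothetical layout: 4 KiB section alignment, 512-byte file alignment.
sections = [
    (0x1000, 0x400, 0x600),  # .text: 0x600 bytes of raw data
    (0x2000, 0xA00, 0x200),  # .data: follows .text directly in the file
]

text_offset = rva_to_file_offset(sections, 0x1010)
data_offset = rva_to_file_offset(sections, 0x2000)
gap_offset = rva_to_file_offset(sections, 0x1800)
```

In this layout the two sections are 0x1000 bytes apart in memory but only 0x600 bytes apart in the file, a discrepancy of 0xA00 bytes (five sectors). Code that treats the file contents as if they were already laid out in memory would read its data from the wrong place by exactly that amount.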
 
[[Category:Object Files]] [[Category:Theory]]