Linker Scripts
GNU ld supports configuration of the linking process through linker scripts, which are written in a specialized scripting language specific to GNU ld. While a linker script file is not strictly required by ld, it provides a more maintainable alternative to specifying complex configuration options as arguments on the command line. A linker script's principal use is specifying the format and layout of the final executable binary. This is of particular relevance to operating-system development, where executable binaries often require specific file layouts in order to be recognized by certain bootloaders. GNU GRUB is one such bootloader.
Linker scripts are typically passed to the linker with the -T script.ld command line argument when calling ld.
Keywords
Listed below is a selection of significant keywords used within linker scripts.
ENTRY
    ENTRY(main)
    ENTRY(MultibootEntry)
The ENTRY keyword defines the entry point of an application, that is, the first instruction to be executed when the output file is run. This keyword accepts the symbol name for the entry point of the linked program/kernel as a single argument. In formats that carry a header, such as ELF and PE, ENTRY only records the address of that symbol as the entry point; it does not by itself move the code to the first byte of the .text section. Where the code ends up is determined by the SECTIONS command and the order of the input files.
OUTPUT_FORMAT
    OUTPUT_FORMAT(elf64-x86-64)
    OUTPUT_FORMAT("pe-i386")
The OUTPUT_FORMAT directive takes a single argument: the output format of the executable. To find out which output formats are supported by your system's binutils and GCC, the objdump -i command may be used.
Some commonly used formats are detailed below:
Format | Description |
---|---|
binary | A flat binary with no formatting at all. |
elf32-i386 | 32-bit ELF format for the i386 architecture. |
elf64-x86-64 | 64-bit ELF format for the x86-64 architecture. |
pe-i386 | 32-bit PE format for the i386 architecture. |
STARTUP
    STARTUP(Boot.o)
    STARTUP(crt0.o)
STARTUP takes one argument: the file to be linked at the very beginning of the executable. For userland programs, this is usually crt0.o or crtbegin.o. For kernels, it is usually the file containing the assembly boilerplate that sets up the stack (and, in some cases, the GDT and similar structures) and then calls your kmain().
SEARCH_DIR
SEARCH_DIR(Directory)
This adds a path to the list of directories that ld searches for libraries. Note that the -nostdlib flag causes any library found only in this path to be effectively ignored: with that flag, ld searches only the library directories given explicitly on the command line and treats search directories specified in linker scripts like the standard directories it has been told to skip.
INPUT
    INPUT(File1.o File2.o File3.o ...)
    INPUT ( File1.o File2.o File3.o ... )
INPUT is an 'in-linker script' replacement for adding object files to the command line. Where you would usually specify something like ld File1.o File2.o, the INPUT command can be used to list the same files inside the linker script instead.
OUTPUT
OUTPUT(Kernel.bin)
The OUTPUT command specifies the file to be generated as the output of the linking process. This is the name of the final binary created. The effect of this command is identical to the effect of the -o filename command line flag, which overrides it.
MEMORY
    MEMORY
    {
        ROM (rx) : ORIGIN = 0, LENGTH = 256k
        RAM (wx) : org = 0x00100000, len = 1M
    }
MEMORY declares one or more memory regions with attributes specifying whether the region can be written to, read from or executed. This is mostly used in embedded systems, where different regions of the address space may have different access permissions.
The example script above tells the linker that there are two memory regions:
a) "ROM" starts at address 0x00000000, is 256kB in length, can be read and executed.
b) "RAM" starts at address 0x00100000, is 1MB in length, can be written, read and executed.
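Once regions are declared, ld reports an error if the sections assigned to a region do not fit inside it. The bounds arithmetic behind such a check can be sketched in C as below. This is only an illustrative model, not ld's implementation: the struct mem_region type and the region_fits function are hypothetical names, and the two constants simply mirror the ORIGIN and LENGTH values from the example script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MEMORY entry (ORIGIN and LENGTH only). */
struct mem_region {
    const char *name;
    uint64_t origin;  /* first address in the region */
    uint64_t length;  /* size of the region in bytes */
};

/* The two regions from the example script above. */
const struct mem_region ROM = { "ROM", 0x00000000u, 256u * 1024u };
const struct mem_region RAM = { "RAM", 0x00100000u, 1024u * 1024u };

/* True if the byte range [addr, addr + size) lies entirely inside r. */
bool region_fits(const struct mem_region *r, uint64_t addr, uint64_t size)
{
    assert(r != NULL);
    if (addr < r->origin) {
        return false;
    }
    uint64_t offset = addr - r->origin;
    /* Compare against the remaining space to avoid overflow in addr + size. */
    return offset <= r->length && size <= r->length - offset;
}
```

For example, region_fits(&ROM, 0, 256 * 1024) holds, while a range one byte longer does not fit.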
SECTIONS
    SECTIONS
    {
        .text.start (_KERNEL_BASE_) : {
            startup.o( .text )
        }

        .text : ALIGN(CONSTANT(MAXPAGESIZE)) {
            _TEXT_START_ = .;
            *(.text)
            _TEXT_END_ = .;
        }

        .data : ALIGN(CONSTANT(MAXPAGESIZE)) {
            _DATA_START_ = .;
            *(.data)
            _DATA_END_ = .;
        }

        .bss : ALIGN(CONSTANT(MAXPAGESIZE)) {
            _BSS_START_ = .;
            *(.bss)
            _BSS_END_ = .;
        }
    }
This script tells the linker to place code from the ".text" section in startup.o at the beginning, starting at logical address _KERNEL_BASE_.
This is then followed by page-aligned sections for all ".text", ".data" and ".bss" sections of all the other input files. If your sections are not properly page aligned and separated, GNU ld will generate a "LOAD segment with RWX permissions" warning.
Note that different architectures have different page sizes. You can change the value of MAXPAGESIZE by passing the -z max-page-size=0x1000 switch to the linker. This behavior only exists in GNU ld.
Linker symbols are defined holding the start and end address of each section. These symbols have external linkage within the application and are accessible as pointers within the code. Additional information on linker file symbols can be found below.
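A common use of these start and end symbols is clearing the .bss section during early startup. The sketch below shows the idea in C under some assumptions: zero_region and region_size are hypothetical helper names, and the functions take ordinary pointers so that the logic can be tried on any buffer. In a kernel linked with the script above, the pointers would come from the linker symbols, as shown in the closing comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of bytes in the half-open range [start, end). */
size_t region_size(const uint8_t *start, const uint8_t *end)
{
    assert(start <= end);
    return (size_t)(end - start);
}

/* Set every byte in the half-open range [start, end) to zero. */
void zero_region(uint8_t *start, uint8_t *end)
{
    assert(start <= end);
    for (uint8_t *p = start; p < end; ++p) {
        *p = 0;
    }
}

/*
 * Assumed usage in a kernel linked with the script above. Only the
 * addresses of the linker symbols are meaningful, hence the array form:
 *
 *     extern uint8_t _BSS_START_[], _BSS_END_[];
 *     zero_region(_BSS_START_, _BSS_END_);
 */
```

Declaring the symbols as arrays lets them be used directly as addresses, which avoids accidentally reading a value stored at the symbol's location.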
KEEP
The KEEP statement instructs the linker to keep the specified section, even if no symbols inside it are referenced. It is used within the SECTIONS command of the linker script. This becomes relevant when garbage collection is performed at link time, enabled by passing the --gc-sections switch to the linker. KEEP makes the linker treat the specified section as a root node when it builds the dependency graph used to find unused sections, essentially forcing the section to be marked as used.
This statement is commonly seen in linker scripts targeting the ARM architecture for placing the interrupt vector table at offset 0x00000000. Without this directive the table, which might not be referenced explicitly in code, would be pruned out.
    SECTIONS
    {
        .text : {
            KEEP(*(.text.ivt))
            *(.text.boot)
            *(.text*)
        } > ROM

        /** ... **/
    }
Symbols
It is possible to define arbitrary symbols within a linker script. These symbols are added into the program's symbol table. Each symbol in the table has a name and an associated address. Symbols within a linker script that have been assigned a value will be given external linkage. The assigned value becomes the symbol's address, so the program's code accesses it by declaring the symbol and using its address as a pointer.
An example script showing three different places that symbol assignment may be used can be seen below:
    floating_point = 0;
    SECTIONS
    {
        .text : {
            *(.text)
            _etext = .;
        }
        _bdata = (. + 3) & ~ 3;
        .data : { *(.data) }
    }
In the above example, the symbol floating_point has been defined as zero. The symbol _etext has been defined as the address following the last .text input section. The symbol _bdata has been defined as the address following the .text output section, aligned upward to a 4-byte boundary.
An example usage of these symbols in C code can be seen below:
    #include <stdint.h>

    /** Externally linked symbol. Only its address is meaningful. */
    extern char _etext[];
    // ...
    /** Pointer to the binary data at the address assigned to the symbol. */
    uint32_t* item = (uint32_t*)_etext;
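The expression used for _bdata in the script above, (. + 3) & ~ 3, rounds the location counter up to the next multiple of 4. The same arithmetic can be written as an ordinary C function, sketched below. The name align_up is hypothetical, and the sketch assumes the alignment is a power of two, which is what makes the mask trick valid.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* With align == 4 this is exactly (addr + 3) & ~3. */
    return (addr + (align - 1)) & ~(align - 1);
}
```

With an alignment of 4, the address 0x1001 is rounded up to 0x1004, while 0x1000 is already aligned and is returned unchanged.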