Linker Scripts

Latest revision as of 01:13, 10 June 2024

GNU ld supports configuration of the linking process via linker scripts. Linker scripts are written in a specialized scripting language specific to GNU ld. While ld does not strictly require a linker script file, a script is a more maintainable alternative to specifying complex configuration options as arguments on the command line. A linker script's principal use is specifying the format and layout of the final executable binary. This is of particular relevance to operating-system development, where executable binaries often require specific file layouts in order to be recognized by certain bootloaders. GNU GRUB is one such bootloader.
A linker script is typically passed to ld with the -T script.ld command line argument.

Keywords

Listed below are a selection of significant keywords used within linker scripts.

ENTRY

ENTRY(main)
ENTRY(MultibootEntry)

The ENTRY keyword defines the entry point of an application, that is, the first instruction to be executed in the output file. It accepts the symbol name for the entry point of the linked program/kernel as a single argument. ENTRY does not move any code: in formats that have a header, such as ELF and PE, the address of the symbol is recorded in the header as the entry point, and the symbol does not have to be the first byte of the .text section. Where code is placed is controlled by the SECTIONS command described further down.

OUTPUT_FORMAT

OUTPUT_FORMAT(elf64-x86-64)
OUTPUT_FORMAT("pe-i386")

The OUTPUT_FORMAT directive, in its simplest form, takes a single argument that specifies the output format of the executable. To find out which output formats your binutils build supports, run objdump -i.

Some commonly used formats are detailed below:

Format          Description
binary          A flat binary with no formatting at all.
elf32-i386      32-bit ELF format for the i386 architecture.
elf64-x86-64    64-bit ELF format for the x86-64 architecture.
pe-i386         32-bit PE format for the i386 architecture.

STARTUP

STARTUP(Boot.o)
STARTUP(crt0.o)

STARTUP takes one argument: the file to be linked at the very beginning of the executable. For userland programs, this is usually crt0.o or crtbegin.o. For kernels, it is usually the object file containing the assembly boilerplate that sets up the stack (and, in some cases, the GDT and similar structures) and then calls your kmain().

SEARCH_DIR

SEARCH_DIR(Directory)

SEARCH_DIR adds a path to the list of directories that ld searches for libraries, much like the -L command line option. Be aware that the -nostdlib flag causes any library found only through this path to be effectively ignored: with -nostdlib, ld searches only the library directories given explicitly on the command line, and directories specified in linker scripts are skipped.

INPUT

INPUT(File1.o File2.o File3.o ...)
INPUT
(
	File1.o
	File2.o
	File3.o
	...
)

INPUT is an in-script replacement for listing object files on the command line. Where you would usually specify something like ld File1.o File2.o, the INPUT command can be used to do this inside the linker script instead.

OUTPUT

OUTPUT(Kernel.bin)

The OUTPUT command specifies the file to be generated as the output of the linking process. This is the name of the final binary created. The effect of this command is identical to the effect of the -o filename command line flag, which overrides it.

MEMORY

MEMORY
{
    ROM (rx) : ORIGIN = 0, LENGTH = 256k
    RAM (wx) : org = 0x00100000, len = 1M
}

MEMORY declares one or more memory regions with attributes specifying whether the region can be written to, read from or executed. This is mostly used in embedded systems where different regions of address space may contain different access permissions.

The example script above tells the linker that there are two memory regions:
a) "ROM" starts at address 0x00000000, is 256kB in length, can be read and executed.
b) "RAM" starts at address 0x00100000, is 1MB in length, can be written, read and executed.
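
As a rough illustration of what these ORIGIN and LENGTH values mean, the C sketch below mirrors the two regions and checks whether an address range fits inside one of them. The struct mem_region type and the region_contains helper are hypothetical names introduced here for illustration only; ld performs this kind of check itself and reports an error when a section overflows its region.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of one MEMORY region: ORIGIN and LENGTH in bytes. */
struct mem_region {
    uint32_t origin;
    uint32_t length;
};

/* Values copied from the MEMORY example: 256k = 256 * 1024, 1M = 1024 * 1024. */
static const struct mem_region ROM = { 0x00000000u, 256u * 1024u };
static const struct mem_region RAM = { 0x00100000u, 1024u * 1024u };

/* True if the byte range [addr, addr + size) lies entirely inside the region. */
static bool region_contains(const struct mem_region *r, uint32_t addr, uint32_t size)
{
    if (addr < r->origin)
        return false;
    uint32_t offset = addr - r->origin;
    return offset <= r->length && size <= r->length - offset;
}
```

For example, region_contains(&ROM, 0x3FFFF, 2) is false, because the second byte would fall just past the 256 kB of ROM.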

SECTIONS

SECTIONS
{
  .text.start (_KERNEL_BASE_) : {
    startup.o( .text )
  }

  .text : ALIGN(CONSTANT(MAXPAGESIZE)) {
    _TEXT_START_ = .;
    *(.text)
    _TEXT_END_ = .;
  }

  .data : ALIGN(CONSTANT(MAXPAGESIZE)) {
    _DATA_START_ = .;
    *(.data)
    _DATA_END_ = .;
  }

  .bss : ALIGN(CONSTANT(MAXPAGESIZE)) {
    _BSS_START_ = .;
    *(.bss)
    _BSS_END_ = .;
  }
}

This script tells the linker to place code from the ".text" section in startup.o at the beginning, starting at logical address _KERNEL_BASE_. This is then followed by page-aligned sections for all ".text", ".data" and ".bss" sections of all the other input files. If your sections are not properly page aligned and separated, GNU ld will generate a "LOAD segment with RWX permissions" warning.

Note that different architectures have different page sizes. You can change the value of MAXPAGESIZE by passing the -z max-page-size=0x1000 switch to the linker (0x1000 is just an example value). This behavior only exists on GNU ld.

Linker symbols are defined holding the start and end address of each section. These symbols have external linkage within the application and are accessible as pointers within the code. Additional information on linker file symbols can be found below.
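
A common use of such start and end symbols is clearing .bss during early startup. The sketch below is a minimal, hosted-friendly version of that idea: zero_region and region_size are hypothetical helper names, and they take plain pointers so they can be tried outside a kernel. In a kernel linked with the script above, one would presumably declare the boundaries as extern uint8_t _BSS_START_[], _BSS_END_[]; and pass those two names as the arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero every byte in [start, end). In a kernel, start and end would be the
 * addresses of the linker symbols _BSS_START_ and _BSS_END_ (an assumption
 * about how the script's symbols are used, not something ld does itself). */
static void zero_region(uint8_t *start, uint8_t *end)
{
    /* volatile keeps the compiler from replacing the loop with a memset
     * call, which may not be available this early in a freestanding kernel. */
    for (volatile uint8_t *p = start; p < end; ++p)
        *p = 0;
}

/* Size in bytes of a section delimited by two boundary addresses. */
static size_t region_size(const uint8_t *start, const uint8_t *end)
{
    return (size_t)(end - start);
}
```

Only the addresses of the linker symbols are meaningful here; the bytes "stored in" a symbol such as _BSS_START_ are simply the first bytes of the section.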

KEEP

The KEEP statement within a linker script instructs the linker to keep the specified section, even if no symbols inside it are referenced. This statement is used within the SECTIONS section of the linker script. It becomes relevant when garbage collection is performed at link time, which is enabled by passing the --gc-sections switch to the linker. The KEEP statement instructs the linker to use the specified section as a root node when it builds the dependency graph used to find unused sections, essentially forcing the section to be marked as used.

This statement is commonly seen in linker scripts targeting the ARM architecture for placing the interrupt vector table at offset 0x00000000. Without this directive the table, which might not be referenced explicitly in code, would be pruned out.

SECTIONS
{
	.text :
	{
		KEEP(*(.text.ivt))
		*(.text.boot)
		*(.text*)
	} > ROM

	/** ... **/
}

Symbols

It is possible to define arbitrary symbols within a linker script. These symbols are added to the program's symbol table. Each symbol in the table has a name and an associated address. Symbols within a linker script that have been assigned a value are given external linkage and are accessible from the program's code. Because the value assigned in the script becomes the symbol's address, C code must take the address of the symbol (or declare it as an array) rather than read it like an ordinary variable.

An example script showing three different places that symbol assignment may be used can be seen below:

floating_point = 0;
SECTIONS
{
  .text :
    {
      *(.text)
      _etext = .;
    }
  _bdata = (. + 3) & ~ 3;
  .data : { *(.data) }
}

In the above example, the symbol floating_point has been defined as zero. The symbol _etext has been defined as the address following the last .text input section. The symbol _bdata has been defined as the address following the .text output section, aligned upward to a 4-byte boundary.
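
The expression (. + 3) & ~ 3 is the usual power-of-two round-up trick. The following C sketch reproduces the same arithmetic so it can be checked by hand; align_up is a hypothetical helper name, and it assumes the alignment is a power of two.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align, which must be a power of two.
 * With align = 4 this is the same computation as (addr + 3) & ~3 in the
 * linker script example. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    return (addr + (align - 1)) & ~(align - 1);
}
```

For instance, an address of 0x1001 rounds up to 0x1004, while 0x1004 is already aligned and stays unchanged. The same arithmetic with align = 0x1000 gives page alignment.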


An example usage of these symbols in C code can be seen below:

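#include <stdint.h>

/** Externally linked symbol. Only its address is meaningful, so it is declared as an array. */
extern char _etext[];
// ...
/** Pointer to the binary data located at the address the symbol was assigned in the script. */
uint32_t *item = (uint32_t *)_etext;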
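
The same pattern can be tried in an ordinary hosted program. This sketch assumes a typical Linux/ELF toolchain, where the default linker script provides the symbols etext, edata and end (documented in the end(3) manual page); these names come from that default script, not from the script in this article, and bytes_after_text is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Symbols provided by the default linker script on typical Linux/ELF
 * toolchains (an assumption about the build environment). They have
 * addresses but no meaningful contents, so only their addresses are used. */
extern char etext[], edata[], end[];

/* Number of bytes between the end of the text segment and the end of .bss. */
static size_t bytes_after_text(void)
{
    return (size_t)((uintptr_t)end - (uintptr_t)etext);
}
```

Declaring the symbols as arrays means the bare name already evaluates to the address, which avoids accidentally reading whatever bytes happen to live there.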


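See Also

External Links

GNU ld manual on linker scripts: https://sourceware.org/binutils/docs/ld/Scripts.html#Scripts
GNU ld manual (full): https://sourceware.org/binutils/docs/ld/
GNU ld script example: http://www.bravegnu.org/gnu-eprog/lds.html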