Calling conventions

A calling convention is the set of contracts that compiler-generated machine code respects and expects external functions to respect. Among other things, the calling convention specifies:

  • how parameters are passed to functions;

  • how the stack is handled and cleaned up (for example, whether it needs to be aligned at function entry);

  • how structures are going to be laid out in memory;

  • which registers must be preserved by the caller and which by the callee.

1. x86-64 calling conventions

The Windows (including UEFI) world and the UNIX (Linux, macOS, BSDs) world have adopted two different conventions. Note that MSVC can only generate code using the Windows calling convention.
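As a rough illustration of mixing both conventions in one program, GCC and Clang accept the ms_abi and sysv_abi function attributes on x86-64 targets. The sketch below assumes one of those compilers; the function names add_ms, add_sysv and add_both and the ABI_MS/ABI_SYSV macros are made up for this example.

```c
#include <assert.h>

/* ms_abi / sysv_abi are GCC/Clang extensions for x86 targets.
 * On other compilers or architectures the macros expand to nothing. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define ABI_MS   __attribute__((ms_abi))
#define ABI_SYSV __attribute__((sysv_abi))
#else
#define ABI_MS
#define ABI_SYSV
#endif

/* With the attribute active, a arrives in ECX and b in EDX (Microsoft x64). */
ABI_MS int add_ms(int a, int b) { return a + b; }

/* With the attribute active, a arrives in EDI and b in ESI (System V). */
ABI_SYSV int add_sysv(int a, int b) { return a + b; }

/* The compiler emits the matching call sequence for each callee. */
int add_both(int a, int b) {
    int r = add_ms(a, b);
    assert(r == add_sysv(a, b));
    return r;
}
```

This is mostly useful at boundaries such as calling UEFI services from code that is otherwise built for System V.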

1.1. Microsoft x64

The x64 Application Binary Interface[1] (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers.

There is a strict one-to-one correspondence between a function call’s arguments and the registers used for those arguments. Any argument that does not fit in 8 bytes, or is not 1, 2, 4, or 8 bytes, must be passed by reference. A single argument is never spread across multiple registers.

The x87 register stack is unused. It may be used by the callee, but is considered volatile across function calls. All floating point operations are done using the 16 XMM registers.

Integer arguments are passed in registers RCX, RDX, R8, and R9; floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16-byte arguments are passed by reference. Parameter passing is described in detail in the parameter section. These registers, and RAX, R10, R11, XMM4, and XMM5, are considered volatile.

1.1.1. Register allocation

Caller saved

RAX, RCX, RDX, R8, R9, R10, R11

Callee saved

RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15

1.1.2. Stack

The stack must be 16-byte aligned immediately before the CALL instruction. Since CALL pushes the 8-byte return address, RSP is 8 bytes past a 16-byte boundary at function entry, and the callee has to adjust it by an odd multiple of 8 bytes before making calls of its own.

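Every caller must reserve 32 bytes of home space (also known as spill space or shadow space) for the callee, even if the callee takes fewer than four parameters. The home space sits directly above the return address, below any parameters passed on the stack. This allows for an easier implementation of variadic functions: va_start can just store the four register parameters in the home space, where they are contiguous with the stack parameters, so that va_arg simply becomes a walk through memory starting at the first home slot.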

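For example, a caller that passes 4 or 5 arguments and uses no other stack space needs to allocate 40 bytes: 32 bytes of home space plus 8 bytes that either hold the fifth argument or act as padding. Together with the 8-byte return address pushed by CALL, RSP at the callee's entry is then 48 bytes below RSP at the caller's entry.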
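The arithmetic can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: every argument occupies one 8-byte slot and the caller keeps nothing else on its stack. The function name ms_x64_call_area is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes a caller must reserve before calling a function that takes n_args
 * register-sized arguments under Microsoft x64.
 * Assumption: the caller has no locals or saved registers of its own. */
size_t ms_x64_call_area(size_t n_args) {
    /* Home space always covers four slots, even for fewer arguments. */
    size_t slots = n_args < 4 ? 4 : n_args;
    size_t bytes = slots * 8;
    /* At the caller's entry RSP is 8 bytes past a 16-byte boundary (its own
     * return address), so bytes + 8 must be a multiple of 16. */
    if ((bytes + 8) % 16 != 0)
        bytes += 8;
    assert((bytes + 8) % 16 == 0);
    return bytes;
}
```

Under these assumptions the function yields 40 for both 4 and 5 arguments, and 56 for 6 arguments.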

1.1.3. Parameters

RCX/XMM0, RDX/XMM1, R8/XMM2, R9/XMM3, then the stack ([RSP+0x20], [RSP+0x28] and so on, relative to the caller's RSP immediately before the CALL; the 0x20 offset is due to the home space).

For example, a function with the prototype

 void do_something(int a, float b, int c, int d, int e, float f);

would pass parameters in RCX, XMM1, R8, R9, [RSP+0x20] and [RSP+0x28] respectively. A function calling this needs to allocate at least 56 bytes of stack: 32 bytes of home space, 16 bytes for e and f, and 8 bytes of padding so that the stack is 16-byte aligned at the CALL.
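The positional rule can be written down as a small lookup. The sketch below is illustrative only: it handles plain integer and floating-point scalars, and the arg_kind_t type and ms_x64_arg_location function are names invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ARG_INT, ARG_FLOAT } arg_kind_t;   /* hypothetical helper type */

/* Writes where the argument at zero-based position `index` is passed under
 * Microsoft x64, as seen by the caller just before the CALL. */
void ms_x64_arg_location(size_t index, arg_kind_t kind, char *out, size_t out_len) {
    static const char *const int_regs[4] = { "RCX", "RDX", "R8", "R9" };
    static const char *const sse_regs[4] = { "XMM0", "XMM1", "XMM2", "XMM3" };
    assert(out != NULL && out_len > 0);
    if (index < 4) {
        /* The slot is fixed by position; the kind only selects the register file. */
        snprintf(out, out_len, "%s",
                 kind == ARG_FLOAT ? sse_regs[index] : int_regs[index]);
    } else {
        /* Stack slots start right after the 32-byte home space. */
        snprintf(out, out_len, "[RSP+0x%zx]", 0x20 + (index - 4) * 8);
    }
}
```

Because the register is chosen by position alone, a float in the second position uses XMM1 and leaves XMM0 unused.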

1.1.4. Return value

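If the return value is an integer, or a struct or union whose size is 1, 2, 4, or 8 bytes, it is returned in RAX; floating-point values are returned in XMM0. Otherwise, the struct is allocated by the caller and a pointer to it is passed as the first parameter, shifting the remaining parameters by one position. The callee returns that same pointer in RAX.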

---
 // large_struct_t is larger than 64 bits, so this:
 large_struct_t do_something(int a);
 // effectively becomes this (the callee returns ret in RAX):
 large_struct_t *do_something(large_struct_t *ret, int a);
---
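To make the transformation concrete, here is a hand-written version of both forms. It is a sketch rather than actual compiler output: the layout of large_struct_t and the name do_something_lowered are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout: 32 bytes, too large to be returned in RAX. */
typedef struct { long long v[4]; } large_struct_t;

/* What the programmer writes. */
large_struct_t do_something(int a) {
    large_struct_t s = { { a, a + 1, a + 2, a + 3 } };
    return s;
}

/* Roughly what the compiler arranges: the caller provides the storage,
 * the callee fills it in and hands the same pointer back. */
large_struct_t *do_something_lowered(large_struct_t *ret, int a) {
    assert(ret != NULL);
    ret->v[0] = a;
    ret->v[1] = a + 1;
    ret->v[2] = a + 2;
    ret->v[3] = a + 3;
    return ret;
}
```

Both forms produce the same bytes in the caller's storage; the second simply makes the hidden parameter visible.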

1.2. System V

The standard calling sequence requirements[2] apply only to global functions. Local functions that are not reachable from other compilation units may use different conventions. Nevertheless, it is recommended that all functions use the standard calling sequence when possible.

The CPU shall be in x87 mode upon entry to a function. Therefore, every function that uses the MMX registers is required to issue an EMMS or FEMMS instruction after using MMX registers, before returning or calling another function.

The direction flag DF in the RFLAGS register must be clear (set to "forward" direction) on function entry and return. Other user flags have no specified role in the standard calling sequence and are not preserved across calls.

Registers RBP, RBX and R12 through R15 "belong" to the calling function and the called function is required to preserve their values. In other words, a called function must preserve these registers’ values for its caller. Remaining registers "belong" to the called function.

1.2.1. Register allocation

Caller saved

RAX, RCX, RDX, RDI, RSI, R8, R9, R10, R11

Callee saved

RBX, RBP, RSP, R12, R13, R14, R15
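The two register lists differ only in RDI and RSI, which a short lookup makes explicit. This is an illustrative sketch covering general-purpose registers only; the abi_t type and is_callee_saved function are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ABI_MICROSOFT_X64, ABI_SYSTEM_V } abi_t;   /* hypothetical */

/* Returns true if the callee must preserve the named general-purpose register. */
bool is_callee_saved(abi_t abi, const char *reg) {
    /* Preserved by the callee under both conventions. */
    static const char *const common[] = {
        "RBX", "RBP", "RSP", "R12", "R13", "R14", "R15"
    };
    assert(reg != NULL);
    for (size_t i = 0; i < sizeof common / sizeof common[0]; i++) {
        if (strcmp(reg, common[i]) == 0)
            return true;
    }
    /* RDI and RSI are callee saved on Microsoft x64 only; System V uses
     * them for the first two integer parameters. */
    if (abi == ABI_MICROSOFT_X64)
        return strcmp(reg, "RDI") == 0 || strcmp(reg, "RSI") == 0;
    return false;
}
```

Code that bridges the two conventions therefore has to save RDI and RSI itself before calling a System V function on behalf of a Microsoft x64 caller.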

1.2.2. Stack

The stack must be 16-byte aligned at the point of the function call, that is, RSP is a multiple of 16 immediately before the CALL. CALL then pushes the 8-byte return address, leaving RSP 8 bytes off a 16-byte boundary at function entry. A callee that sets up a frame pointer pushes RBP next, which re-aligns the stack shortly after function entry.
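The alignment bookkeeping can be checked with plain integer arithmetic. The helpers below only model RSP as a number; their names are invented for this sketch, and the conventional PUSH RBP prologue is an assumption, since compilers may omit the frame pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

bool is_16_aligned(uint64_t rsp) { return (rsp & 0xF) == 0; }

/* CALL pushes the 8-byte return address. */
uint64_t rsp_after_call(uint64_t rsp) {
    assert(is_16_aligned(rsp));   /* the alignment required at the call site */
    return rsp - 8;
}

/* PUSH RBP in a conventional prologue moves RSP down another 8 bytes. */
uint64_t rsp_after_push_rbp(uint64_t rsp) { return rsp - 8; }
```

Starting from an aligned RSP, the value after the CALL is congruent to 8 modulo 16, and the value after PUSH RBP is a multiple of 16 again.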

1.2.3. Parameters

RDI, RSI, RDX, RCX, R8, R9, stack ([RSP+0x00], [RSP+0x08] and so on).

If the parameter is of type float or double, it will be stored in the next available SSE register (XMM0-XMM7).

For example, a function with the prototype void do_something(int a, float b, int c, int d, int e, float f) would pass parameters in RDI, XMM0, RSI, RDX, RCX and XMM1 respectively: integer and SSE registers are assigned in order, each class independently of the other. All six arguments fit in registers, so the caller needs no stack space for parameters; it only has to keep the stack 16-byte aligned at the CALL.
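The independent counters are easy to get wrong, so here is a small model of the assignment. It is a sketch for scalar integer and float/double arguments only (no structs, no variadic calls); arg_kind_t and sysv_arg_locations are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { ARG_INT, ARG_FLOAT } arg_kind_t;   /* hypothetical helper type */

/* Fills out[i] with the location of argument i under System V x86-64,
 * as seen by the caller just before the CALL. */
void sysv_arg_locations(const arg_kind_t *kinds, size_t n, char out[][16]) {
    static const char *const int_regs[6] = {
        "RDI", "RSI", "RDX", "RCX", "R8", "R9"
    };
    size_t next_int = 0, next_sse = 0, stack_off = 0;
    assert(kinds != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == ARG_INT && next_int < 6) {
            /* Integer registers are consumed in order... */
            snprintf(out[i], 16, "%s", int_regs[next_int++]);
        } else if (kinds[i] == ARG_FLOAT && next_sse < 8) {
            /* ...and so are XMM0-XMM7, with a separate counter. */
            snprintf(out[i], 16, "XMM%zu", next_sse++);
        } else {
            /* Whatever does not fit goes to the stack, 8 bytes per slot. */
            snprintf(out[i], 16, "[RSP+0x%02zx]", stack_off);
            stack_off += 8;
        }
    }
}
```

Contrast this with the Microsoft convention, where the position of an argument fixes its register regardless of how many integer or floating-point arguments precede it.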

1.2.4. Return value

If the return value is an integer or pointer, it is returned in RAX; a float or double is returned in XMM0. A struct or union of up to 16 bytes is normally returned in registers as well (RAX and RDX for integer fields, XMM0 and XMM1 for floating-point fields). Larger structs are allocated by the caller, and a pointer to the storage is passed as a hidden first parameter in RDI, like the Microsoft x64 ABI. As in the Microsoft ABI, the callee returns that pointer in RAX.

---
 // large_struct_t is larger than 128 bits, so this:
 large_struct_t do_something(int a);
 // effectively becomes this (the callee returns ret in RAX):
 large_struct_t *do_something(large_struct_t *ret, int a);
---