Calling conventions
A calling convention is the set of contracts that compiler-generated machine code respects and expects external functions to respect. Among other things, the calling convention specifies:
- how parameters are passed to functions;
- how the stack is handled and cleaned up (for example, whether it needs to be aligned at function entry);
- how structures are laid out in memory;
- which registers must be preserved by the caller and which by the callee.
1. x86-64 calling conventions
The Windows (including UEFI) world and the UNIX (Linux, macOS, BSDs) world have adopted two different conventions. Note that MSVC can only generate code using the Windows calling convention.
1.1. Microsoft x64
The x64 Application Binary Interface[1] (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers.
There is a strict one-to-one correspondence between a function call’s arguments and the registers used for those arguments. Any argument that does not fit in 8 bytes, or is not 1, 2, 4, or 8 bytes, must be passed by reference. A single argument is never spread across multiple registers.
The x87 register stack is unused. It may be used by the callee, but is considered volatile across function calls. All floating point operations are done using the 16 XMM registers.
Integer arguments are passed in registers RCX, RDX, R8, and R9; floating point arguments are passed in XMM0L, XMM1L, XMM2L, and XMM3L. 16-byte arguments are passed by reference. Parameter passing is described in detail in the parameter section. These registers, and RAX, R10, R11, XMM4, and XMM5, are considered volatile.
1.1.1. Register allocation
Caller saved | RAX, RCX, RDX, R8, R9, R10, R11
Callee saved | RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15
1.1.2. Stack
The stack must be 16-byte aligned immediately before the CALL instruction. Since CALL pushes the 8-byte return address, RSP is 8 bytes off a 16-byte boundary (RSP mod 16 = 8) at function entry, and the prologue must restore 16-byte alignment before the function makes calls of its own.
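The caller must always allocate 32 bytes of home space (also known as spill space or shadow space) for the callee, even if the callee takes fewer than four parameters. The home space sits directly above the return address, immediately below any parameters passed on the stack.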
This makes variadic functions easier to implement: va_start can spill the four register parameters into the home space, so that all arguments become contiguous in memory and va_arg simply walks them in 8-byte steps.
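For example, a caller that calls a function with 4 or 5 arguments needs to allocate 40 bytes of stack (32 bytes of home space plus 8 bytes, used either for the fifth argument or as padding). Assuming it makes no other stack adjustments, RSP at the callee's entry is then 48 bytes (a multiple of 16) below RSP at the caller's entry, counting the 8-byte return address pushed by CALL.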
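The variadic case can be illustrated with a small simulation in C. This is a hypothetical model, not the actual va_list implementation of MSVC or any other compiler: `arg_area_t`, `spill_register_args`, `sim_va_start`, `sim_va_arg` and `sim_sum` are invented names, and every argument is assumed to be an 8-byte integer. Once the four register arguments have been copied into the home space, all arguments sit in consecutive 8-byte slots and the simulated va_list can be a plain pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the 8-byte slots above a Microsoft x64 callee's
   return address: slots 0-3 are the home space, slots 4 and up hold the
   stack-passed arguments (stored by the caller at [RSP+0x20] onward). */
#define MAX_ARG_SLOTS 16

typedef struct {
    uint64_t slot[MAX_ARG_SLOTS];
} arg_area_t;

/* Simulated va_list: nothing more than a pointer to the next slot. */
typedef struct {
    const uint64_t *next;
} sim_va_list;

/* Simulated prologue of a variadic callee: copy the four register
   arguments (RCX, RDX, R8, R9) into the home space. */
void spill_register_args(arg_area_t *area, uint64_t rcx, uint64_t rdx,
                         uint64_t r8, uint64_t r9)
{
    area->slot[0] = rcx;
    area->slot[1] = rdx;
    area->slot[2] = r8;
    area->slot[3] = r9;
}

/* Simulated va_start: point just past the named parameters. */
void sim_va_start(sim_va_list *ap, const arg_area_t *area, int named_params)
{
    assert(named_params >= 0 && named_params < MAX_ARG_SLOTS);
    ap->next = &area->slot[named_params];
}

/* Simulated va_arg: each argument occupies exactly one 8-byte slot, so
   fetching the next one is a load plus a pointer increment. */
uint64_t sim_va_arg(sim_va_list *ap)
{
    return *ap->next++;
}

/* Example variadic callee, sum(count, ...): `count` is the only named
   parameter and is followed by `count` values to add up. */
uint64_t sim_sum(const arg_area_t *area)
{
    uint64_t count = area->slot[0];
    assert(count < MAX_ARG_SLOTS);
    sim_va_list ap;
    sim_va_start(&ap, area, 1);
    uint64_t total = 0;
    for (uint64_t i = 0; i < count; i++)
        total += sim_va_arg(&ap);
    return total;
}
```

For a call like sum(5, 10, 20, 30, 40, 50), the count and the first three values would arrive in RCX, RDX, R8 and R9 and the last two on the stack; after the spill, `sim_sum` reads them back in order and returns 150.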
1.1.3. Parameters
RCX/XMM0, RDX/XMM1, R8/XMM2, R9/XMM3, then the stack ([RSP+0x20], [RSP+0x28] and so on, as seen by the caller just before the CALL; the 0x20 offset is due to the home space).
For example, a function with the prototype

```c
void do_something(int a, float b, int c, int d, int e, float f);
```

would pass parameters in RCX, XMM1, R8, R9, [RSP+0x20] and [RSP+0x28] respectively.
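The positional rule can be expressed as a short C function. This is a sketch that only handles integer/pointer and float/double arguments (no structs, no __m128, and no special handling of variadic calls, where floating-point values are also duplicated in the integer registers); `ms_x64_arg_location`, `arg_loc_t` and the other names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Only scalar arguments are modelled: integers/pointers and float/double. */
typedef enum { ARG_INT, ARG_FP } arg_class_t;
typedef enum { LOC_GPR, LOC_XMM, LOC_STACK } loc_kind_t;

typedef struct {
    loc_kind_t kind;
    int value; /* register number for GPR/XMM, offset from RSP for STACK */
} arg_loc_t;

static const char *const MS_GPR[4] = { "RCX", "RDX", "R8", "R9" };

/* Location of the argument at zero-based `position`. The position alone
   picks the slot: argument i goes to the i-th integer register or to XMMi,
   so a float in position 1 leaves RDX unused. */
arg_loc_t ms_x64_arg_location(arg_class_t cls, int position)
{
    assert(position >= 0);
    arg_loc_t loc;
    if (position < 4) {
        loc.kind = (cls == ARG_FP) ? LOC_XMM : LOC_GPR;
        loc.value = position;
    } else {
        /* Caller's view just before CALL: 0x20 bytes of home space, then
           one 8-byte slot per stack argument. */
        loc.kind = LOC_STACK;
        loc.value = 0x20 + 8 * (position - 4);
    }
    return loc;
}

/* Prints a location in the notation used in the text. */
void print_ms_location(arg_loc_t loc)
{
    switch (loc.kind) {
    case LOC_GPR:   printf("%s\n", MS_GPR[loc.value]); break;
    case LOC_XMM:   printf("XMM%d\n", loc.value); break;
    case LOC_STACK: printf("[RSP+0x%02X]\n", (unsigned)loc.value); break;
    }
}
```

Feeding it the six arguments of do_something yields RCX, XMM1, R8, R9, [RSP+0x20] and [RSP+0x28], the same assignment as above.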
A function calling do_something needs to allocate at least 56 bytes of stack: 32 bytes of home space, 16 bytes for the two stack parameters, and 8 bytes of padding so that the stack is correctly aligned at the CALL.
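The 40- and 56-byte figures follow from a simple rule, sketched below under a simplifying assumption: the caller makes no stack adjustment other than this single reservation (no pushes, no locals). `ms_x64_call_reservation` is a hypothetical helper. Because the caller and the callee are both entered through a CALL that pushes 8 bytes, keeping the same alignment at both entries requires the reservation to be 8 mod 16.

```c
#include <assert.h>

/* Hypothetical helper: bytes a Microsoft x64 caller subtracts from RSP
   before calling a function that takes `nargs` arguments, assuming this is
   its only stack adjustment (no pushes, no locals). */
int ms_x64_call_reservation(int nargs)
{
    assert(nargs >= 0);
    int stack_args = nargs > 4 ? nargs - 4 : 0;
    int bytes = 32 + 8 * stack_args; /* home space + stack arguments */

    /* Between the caller's entry and the callee's entry, RSP drops by
       bytes + 8 (the return address). For both entries to have the same
       alignment, that must be a multiple of 16, i.e. bytes % 16 == 8.
       bytes is always a multiple of 8, so at most 8 bytes of padding. */
    if (bytes % 16 != 8)
        bytes += 8;
    return bytes;
}
```

With this rule, 0 to 5 arguments need 40 bytes, 6 or 7 arguments need 56 bytes, and 8 arguments need 72 bytes.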
1.1.4. Return value
If the return value is an integer, or a struct or union of exactly 1, 2, 4, or 8 bytes, it is returned in RAX (floating-point values are returned in XMM0); otherwise, the caller allocates memory for the return value and passes a pointer to it as a hidden first parameter, shifting the remaining arguments one position to the right.
```c
// large_struct_t is larger than 64 bits, so this:
large_struct_t do_something(int a);
// effectively becomes this:
void do_something(large_struct_t *ret, int a);
```
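The hidden-pointer rewrite can be spelled out by hand in C. In this sketch, `large_struct_t` is given a concrete 24-byte layout and `make_triple`/`make_triple_lowered` are hypothetical functions: the second is ordinary C written in the shape the compiler effectively gives the first (mirroring the simplified void signature above), with the caller owning the storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 24-byte type: too large to be returned in RAX. */
typedef struct {
    int64_t x, y, z;
} large_struct_t;

_Static_assert(sizeof(large_struct_t) == 24, "expected three 8-byte members");

/* What the source code says: return the struct by value. */
large_struct_t make_triple(int a)
{
    large_struct_t s = { a, 2 * (int64_t)a, 3 * (int64_t)a };
    return s;
}

/* Roughly what the compiler passes instead: the caller provides the
   storage and hands over its address as an extra first argument. */
void make_triple_lowered(large_struct_t *ret, int a)
{
    assert(ret != NULL); /* the caller must supply the storage */
    ret->x = a;
    ret->y = 2 * (int64_t)a;
    ret->z = 3 * (int64_t)a;
}
```

Both forms produce the same values; the compiler performs the rewrite itself, so the second form only makes the mechanism visible.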
1.2. System V
The standard calling sequence requirements[2] apply only to global functions. Local functions that are not reachable from other compilation units may use different conventions. Nevertheless, it is recommended that all functions use the standard calling sequence when possible.
The CPU shall be in x87 mode upon entry to a function. Therefore, every function that uses the MMX registers is required to issue an EMMS or FEMMS instruction after using MMX registers, before returning or calling another function.
The direction flag DF in the RFLAGS register must be clear (set to "forward" direction) on function entry and return. Other user flags have no specified role in the standard calling sequence and are not preserved across calls.
Registers RBP, RBX and R12 through R15 "belong" to the calling function and the called function is required to preserve their values. In other words, a called function must preserve these registers’ values for its caller. Remaining registers "belong" to the called function.
1.2.1. Register allocation
Caller saved | RAX, RCX, RDX, RDI, RSI, R8, R9, R10, R11
Callee saved | RBX, RBP, RSP, R12, R13, R14, R15
1.2.2. Stack
The stack must be 16-byte aligned immediately before the CALL instruction; in other words, RSP + 8 is a multiple of 16 at function entry.
In functions that maintain a frame pointer, the return address pushed by CALL is immediately followed by the callee's push of RBP, which restores 16-byte alignment shortly after function entry.
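How much padding a System V caller needs therefore depends on whether its prologue pushed RBP. The helper below is a sketch under stated assumptions: `sysv_call_reservation` is a hypothetical name, each stack argument takes 8 bytes, and the caller has no locals and no pushes other than an optional push of RBP. It starts from the rule that RSP + 8 is a multiple of 16 at function entry.

```c
#include <assert.h>

/* Hypothetical helper: bytes a System V x86-64 caller reserves for
   `stack_args` 8-byte stack arguments so that RSP is 16-byte aligned at
   the CALL. Assumes no locals and no pushes other than an optional
   `push rbp` in the prologue (`pushed_rbp` != 0). */
int sysv_call_reservation(int stack_args, int pushed_rbp)
{
    assert(stack_args >= 0);
    /* At entry RSP + 8 is a multiple of 16, so RSP % 16 == 8; pushing RBP
       brings it back to 0. */
    int rsp_mod_16 = pushed_rbp ? 0 : 8;
    int bytes = 8 * stack_args;
    /* Subtracting bytes must leave RSP % 16 == 0, i.e. bytes % 16 must
       equal rsp_mod_16. bytes is a multiple of 8, so pad by at most 8. */
    if (bytes % 16 != rsp_mod_16)
        bytes += 8;
    return bytes;
}
```

With a frame pointer, a call with one stack argument needs 16 bytes (8 for the argument, 8 of padding); without one, 8 bytes suffice.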
1.2.3. Parameters
RDI, RSI, RDX, RCX, R8, R9, then the stack ([RSP+0x00], [RSP+0x08] and so on, as seen by the caller just before the CALL; there is no home space).
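If the parameter is of type float or double, it is instead passed in the next available SSE register (XMM0-XMM7). The integer and SSE registers are assigned independently: a floating-point argument does not consume an integer register, and vice versa.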
For example, a function with the prototype void do_something(int a, float b, int c, int d, int e, float f) would pass parameters in RDI, XMM0, RSI, RDX, RCX and XMM1 respectively.
Since all six arguments fit in registers, a caller needs no stack space for them; it only has to keep the stack 16-byte aligned at the CALL, for example with 8 bytes of padding or the push of RBP in its prologue.
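The System V assignment, with its separate counters for integer and SSE registers, can be sketched in C. As before, this only covers scalar integer/pointer and float/double arguments (structs are classified by a more involved algorithm in the ABI document, and variadic calls additionally pass the number of vector registers used in AL); `sysv_assign` and the related types are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Only scalar arguments are modelled: integers/pointers and float/double. */
typedef enum { ARG_INT, ARG_FP } arg_class_t;
typedef enum { LOC_GPR, LOC_XMM, LOC_STACK } loc_kind_t;

typedef struct {
    loc_kind_t kind;
    int value; /* register number for GPR/XMM, byte offset for STACK */
} arg_loc_t;

static const char *const SYSV_GPR[6] = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };

/* Assigns a location to each of the `n` arguments described by `sig`.
   Integer and SSE registers are taken from two separate counters, so a
   float argument never uses up an integer register (and vice versa). */
void sysv_assign(const arg_class_t *sig, int n, arg_loc_t *out)
{
    assert(n >= 0);
    int next_gpr = 0, next_xmm = 0, stack_off = 0;
    for (int i = 0; i < n; i++) {
        if (sig[i] == ARG_INT && next_gpr < 6) {
            out[i].kind = LOC_GPR;
            out[i].value = next_gpr++;
        } else if (sig[i] == ARG_FP && next_xmm < 8) {
            out[i].kind = LOC_XMM;
            out[i].value = next_xmm++;
        } else {
            /* Out of registers of this class: next 8-byte stack slot, as
               seen by the caller just before CALL (no home space). */
            out[i].kind = LOC_STACK;
            out[i].value = stack_off;
            stack_off += 8;
        }
    }
}

/* Prints a location in the notation used in the text. */
void print_sysv_location(arg_loc_t loc)
{
    switch (loc.kind) {
    case LOC_GPR:   printf("%s\n", SYSV_GPR[loc.value]); break;
    case LOC_XMM:   printf("XMM%d\n", loc.value); break;
    case LOC_STACK: printf("[RSP+0x%02X]\n", (unsigned)loc.value); break;
    }
}
```

For the six arguments of do_something it produces RDI, XMM0, RSI, RDX, RCX and XMM1; with seven integer arguments, the seventh is the first to land on the stack, at [RSP+0x00].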
1.2.4. Return value
Integers, and structs or unions of up to 64 bits with integer members, are returned in RAX; floating-point values are returned in XMM0. Structs and unions of up to 128 bits are split into two 8-byte halves, each returned in the next of RAX/RDX or XMM0/XMM1 depending on its contents. Anything larger is allocated by the caller, and a pointer to it is passed as a hidden first parameter (in RDI); as in the Microsoft x64 ABI, the callee also returns that pointer in RAX.

```c
// large_struct_t is larger than 128 bits, so this:
large_struct_t do_something(int a);
// effectively becomes this (RAX also holds ret on return):
void do_something(large_struct_t *ret, int a);
```