Stack Smashing Protector: Difference between revisions

From OSDev.wiki
Revision as of 23:29, 22 October 2014

The Stack Smashing Protector (SSP) compiler feature helps detect stack buffer overruns by aborting if a secret value on the stack is changed. This serves a dual purpose: it makes the occurrence of such bugs visible, and it mitigates exploits that overwrite the return address, such as return-oriented programming. SSP merely detects stack buffer overruns; it does not prevent them. The detection can be defeated by preparing the input such that the stack canary is overwritten with its correct value, so SSP does not offer perfect protection. The stack canary is the size of a native word, so if it is chosen randomly, an attacker has to guess the right value among 2^32 or 2^64 combinations (revealing the bug if the guess is wrong) or resort to clever means of determining it.

Description

Compilers implement this feature by selecting appropriate functions, storing the stack canary during the function prologue and checking the value at the epilogue, invoking a failure handler if it was changed. For instance, consider the code:

#include <string.h>

void foo(const char* str)
{
	char buffer[16];
	strcpy(buffer, str);
}

Conceptually, SSP transforms that code into something like this:

#include <stdint.h>
#include <stdnoreturn.h>
#include <string.h>

/* Buffer overruns are undefined behavior, so compilers tend to optimize
   checks like this away if you write them yourself. The check only works
   robustly because the compiler inserts it itself. */
extern uintptr_t __stack_chk_guard;
noreturn void __stack_chk_fail(void);
void foo(const char* str)
{
	uintptr_t canary = __stack_chk_guard;
	char buffer[16];
	strcpy(buffer, str);
	if ( (canary = canary ^ __stack_chk_guard) != 0 )
		__stack_chk_fail();
}

Note how the secret value is stored in a global variable (initialized at program load time) and copied into the stack frame, and how the copy is erased from the stack as part of the check. Since stacks grow downwards on many architectures, the canary sits above the buffer and gets overwritten whenever the input to strcpy is at least 16 characters (assuming no padding between the buffer and the canary). The caller's return pointer, which return-oriented programming attacks exploit, is not used until after the value has been validated, thus defusing such attacks.
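The prologue and epilogue steps can be tried out safely in a simulation. The following is a minimal sketch, not what any compiler emits: the "stack frame" is an ordinary byte array so that the overrun stays within defined behavior, and the names simulated_foo and guard are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Example guard value; truncated to the pointer width on 32-bit targets. */
static const uintptr_t guard = (uintptr_t)0x595e9fbd94fda766ULL;

/* Returns 1 if copying `input` smashed the simulated canary, 0 otherwise. */
static int simulated_foo(const char *input)
{
	/* Simulated frame: 16 bytes of buffer, then the canary right above it. */
	unsigned char frame[BUF_SIZE + sizeof(uintptr_t)];
	size_t len = strlen(input) + 1; /* strcpy also copies the terminator */
	uintptr_t canary;

	/* Prologue: store the canary above the buffer. */
	memcpy(frame + BUF_SIZE, &guard, sizeof guard);

	/* The unchecked copy, clamped so the simulation itself stays in bounds. */
	if (len > sizeof frame)
		len = sizeof frame;
	memcpy(frame, input, len);

	/* Epilogue: compare the stored canary with the global guard. */
	memcpy(&canary, frame + BUF_SIZE, sizeof canary);
	return canary != guard;
}
```

A 15-character input fits together with its terminator and leaves the canary alone, while a 16-character input pushes the terminating zero byte into the first byte of the canary and is detected.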

The detection is perfect if it is impossible to fake the correct value, i.e. if the attacker doesn't have full control over which bytes can be written. The attacker cannot change further stack contents undetected if writing the correct value stops the output. For instance, if the canary in the strcpy example above contains a zero byte, it is impossible to reproduce that byte in the canary without ending the copy there. This forces the attacker to either not attack, be detected, or not change any further stack contents. This doesn't mean the buffer overrun is always unexploitable: the string is now 16 characters long instead of the intended limit of 15, which can cause other unintended behavior as the program continues to execute.
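The zero-byte argument can be illustrated with another simulation. This sketch assumes a 4-byte canary with a zero byte at index 2 and a made-up frame layout (buffer, canary, return address slot); the names forge_attempt and struct outcome are hypothetical. The attacker knows the canary exactly and still cannot reach the return address with string-copy semantics.

```c
#include <stddef.h>
#include <string.h>

enum { BUF_SIZE = 16, CANARY_SIZE = 4, RET_SIZE = 4 };

/* Hypothetical canary that contains a zero byte at index 2. */
static const unsigned char canary[CANARY_SIZE] = { 0xff, 0x0a, 0x00, 0xcd };

struct outcome {
	int canary_intact; /* 1 if the canary still holds its original bytes */
	int ret_intact;    /* 1 if the return address slot is unchanged */
};

static struct outcome forge_attempt(void)
{
	unsigned char frame[BUF_SIZE + CANARY_SIZE + RET_SIZE];
	unsigned char attack[sizeof frame];
	const unsigned char ret[RET_SIZE] = { 0x11, 0x22, 0x33, 0x44 };
	struct outcome out;
	size_t i;

	/* Simulated frame: buffer, then canary, then return address. */
	memset(frame, 0, BUF_SIZE);
	memcpy(frame + BUF_SIZE, canary, CANARY_SIZE);
	memcpy(frame + BUF_SIZE + CANARY_SIZE, ret, RET_SIZE);

	/* Attacker input: filler, a perfect copy of the canary, a new address. */
	memset(attack, 'A', BUF_SIZE);
	memcpy(attack + BUF_SIZE, canary, CANARY_SIZE);
	memset(attack + BUF_SIZE + CANARY_SIZE, 0x41, RET_SIZE);

	/* strcpy semantics: copy up to and including the first zero byte. */
	for (i = 0; i < sizeof attack; i++) {
		frame[i] = attack[i];
		if (attack[i] == 0)
			break;
	}

	out.canary_intact = memcmp(frame + BUF_SIZE, canary, CANARY_SIZE) == 0;
	out.ret_intact = memcmp(frame + BUF_SIZE + CANARY_SIZE, ret, RET_SIZE) == 0;
	return out;
}
```

The copy ends at the zero byte inside the canary, so the canary check passes but the return address slot is never written: the attacker goes undetected only by not changing anything beyond the canary.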

Note how there is only a single protective value; not every variable is protected in this manner. A commonly used heuristic places (going downwards) the canary first, then the buffers (which might overflow into each other), and finally all the small variables, which are then unaffected by overruns. This is based on the idea that it is generally less dangerous if arrays are modified than if variables holding flags, pointers and function pointers are, since the latter may alter execution more seriously.

Some compilers randomize the order of stack variables and randomize the stack frame layout, which further complicates determining the right input with the intended malicious effect.

Usage

Compilers such as GCC enable this feature if requested through compiler options, or if the compiler supplier enabled it by default. It is worth considering enabling it by default if your operating system is security conscious and you provide the run-time support. It is possible to use it in your entire operating system (even the kernel and standard library), perhaps excusing ports with really poor code quality. The feature is enabled with the right -ffoo option and can be disabled with the -fno-foo counterpart. Several options exist that provide different variants of SSP:

-fstack-protector: Check for stack smashing in functions with vulnerable objects. This includes functions with buffers of 8 bytes or more, or calls to alloca.

-fstack-protector-strong: Like -fstack-protector, but also includes functions with local arrays or references to local frame addresses.

-fstack-protector-all: Check for stack smashing in every function.

Some operating systems have extended their compiler with more relevant options:

-fstack-shuffle: (Found in OpenBSD) Randomize the order of stack variables at compile time. This helps find bugs.

When you activate the feature, the compiler will attempt to link in libssp and libssp_nonshared (if statically linked) for run-time support. This does not happen if you pass -nostdlib, as you do when linking a kernel, so in that case you'll need to supply your own implementation. For user-space, you have two options:

  • Supply your own implementation in libc (so libc can take advantage of the feature) and install empty libssp and libssp_nonshared libraries (or change your toolchain to not use them).
  • Use the libssp implementation that comes with GCC.

Implementation

Run-time support needs only two components: A global variable and a check failure handler. For instance, a minimal implementation could be:

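#include <stdint.h>
#if __STDC_HOSTED__
#include <stdlib.h> /* abort() */
#endif

/* Example guard values; the one matching the pointer width is used. */
#if UINT32_MAX == UINTPTR_MAX
#define STACK_CHK_GUARD 0xe2dee396
#else
#define STACK_CHK_GUARD 0x595e9fbd94fda766
#endif

uintptr_t __stack_chk_guard = STACK_CHK_GUARD;

__attribute__((noreturn))
void __stack_chk_fail(void)
{
#if __STDC_HOSTED__
	abort();
#elif __is_myos_kernel
	/* panic() stands for your kernel's own non-returning panic function. */
	panic("Stack smashing detected");
#endif
}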

Note how the secret guard value is hard-coded rather than being decided during program load. You should have the program loader (the bootloader in the case of the kernel) randomize the value. You can do this by putting the guard value in a special segment that the loader knows to randomize. The numbers shown here are not special; they are just examples of randomly generated numbers. You can still take advantage of the bug-discovering properties of SSP even if the guard value is not cryptographically secure (unless you anticipate sufficiently obscure bugs that intelligently circumvent SSP).

Alternatively, you could have an early phase in your code that initializes the guard value, perhaps written in assembly, or in C but built without stack smashing protection. This approach adds code complexity and early phases where language features are not online. You may take such approaches with thread-local storage, errno, paging, the GDT, scheduling, and so on, and suddenly the bootstrap is very complex, with many dependencies between language features. Once a function built with stack smashing protection is running, the guard value cannot be changed, or a spurious failure will occur when that function returns.
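An early initialization phase along these lines might look like the sketch below. Everything in it is an assumption for illustration: make_guard, mix64 and stack_guard_init are made-up names, the seed is expected to come from whatever entropy your platform offers (a timer, a hardware random number instruction, the bootloader), and the mixing step is the splitmix64 finalizer, which is not cryptographically secure. The no_stack_protector function attribute requires a recent compiler (GCC 11 or later, or Clang); with older compilers, build the file with -fno-stack-protector instead.

```c
#include <stdint.h>

/* splitmix64 finalizer: scrambles a seed. NOT cryptographically secure. */
static uint64_t mix64(uint64_t x)
{
	x += 0x9e3779b97f4a7c15ULL;
	x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
	x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
	return x ^ (x >> 31);
}

/* Derive a guard from a seed and clear its least significant byte, so the
   guard always contains a zero byte that string copies cannot get past. */
static uintptr_t make_guard(uint64_t seed)
{
	uintptr_t guard = (uintptr_t)mix64(seed);
	return guard & ~(uintptr_t)0xff;
}

#if !__STDC_HOSTED__
extern uintptr_t __stack_chk_guard;

/* Call once during early boot, before any protected function is running.
   The attribute keeps this function itself free of canary checks. */
__attribute__((no_stack_protector))
void stack_guard_init(uint64_t seed)
{
	__stack_chk_guard = make_guard(seed);
}
#endif
```

The kernel-only part is wrapped in a freestanding check so that the helper functions can also be compiled and tried out in a hosted environment without touching the real guard.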

Secure Handling

Be careful how you implement the stack smash detection handler. This code only runs when the bug was triggered innocently or when it is being exploited maliciously. By that point the attacker is assumed to have corrupted, at the very least, an unknown amount of this thread's stack, so the environment is hostile. The handler's own stack frame is under your control, and none of its new local variables are affected. Note, however, that the detection may have occurred in a signal handler or at another inopportune time, for instance while another thread holds locks on critical standard library state. Also beware that if the standard library currently holds pointers to the caller's stack variables, and the standard library functions you call access that memory, the attacker may even be able to control the handler.

Assuming a handler invocation implies an intelligent exploit is happening, the best course of action is to:

  • Eliminate attacker influence.
  • Alert user or system administrator of a potential breach.
  • Diagnose the details of the buffer overrun so the defect can be fixed.

You should assume the worst if you wish to eliminate the attacker influence. The used exploit may well be combined with other exploited vulnerabilities, and a sufficiently skilled attacker may even influence and exploit the actions of the handler. There are many creative ways an attacker could influence the handler or even take advantage of it:

  • Pointers to earlier stack variables (now to be considered potentially corrupted) could be stored somewhere and accessed by the functions you use.
  • The handler could be run at a very inopportune time where the process is fragile, perhaps from a signal handler, or while the current thread owns non-recursive locks on which you could deadlock.
  • Printing a stack trace (if at all possible) and other diagnostic information to the stderr file descriptor (which might not even exist in this process, with fd 2 being used for another purpose instead) might result in the output being sent to the attacker. This is imaginable for a webserver, which perhaps includes the stderr contents in an error response. The attacker could learn things this way that they aren't supposed to.
  • The process might be multi-threaded, and who knows how the other threads might interact with a thread that is malfunctioning and compromised. They could have pointers to variables on the stack of the compromised thread, and SSP won't help if they access those.

Your approach should be to discard the process as soon as possible. Use only async-signal-safe functions, preferably without state that could influence them. Don't write to any standard streams but open the terminal anew or write to the system log. Ensure none of these operations fail (for instance, if the process is in a chroot or out of file descriptors).

The ideal approach is perhaps to have a special system call that does these tasks, and to invoke it unconditionally and immediately. Kernel code must not trust user-space code or be unsafely influenced by it, so the kernel can be considered safe. It can then stop all threads in the process, investigate where in the process the issue seemed to occur, and alert the user or system administrator appropriately.

libssp

As an alternative to your own implementation, you can use the implementation that comes with GCC. This means you have to build libssp as part of your toolchain.

TODO: I have never built it for osdev purposes before, but I guess that you do make all-target-libssp and make install-target-libssp, like with libstdc++. It probably depends on libc for no good reason at all (as the GCC developers put the fortify source functions in it and it wants to check whether they work).

The libssp approach is to have an initialization function marked with the constructor attribute, which is run among the global constructors during process startup. This means SSP isn't properly online during the early parts of process initialization (but perhaps that's not a problem if all those C stack frames are gone before that point and the default null guard value was used until then). The initialization function attempts to open /dev/urandom, which might fail if you are in a chroot, are out of file descriptors, or your system doesn't have such a file (perhaps by design). If it fails, it falls back on a reasonable but known value. You can read the libssp initialization code at https://gcc.gnu.org/viewcvs/gcc/trunk/libssp/ssp.c?view=markup#l67.
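The constructor-based approach can be sketched as follows. This is not the libssp source: example_guard and example_guard_ready are made-up stand-ins (the real variable is __stack_chk_guard), and the fallback bytes are merely one plausible "known value" that contains a zero byte, a newline and 0xff to get in the way of string-based overruns.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins for __stack_chk_guard and a readiness flag. */
static uintptr_t example_guard;
static int example_guard_ready;

/* Runs among the global constructors, before main(). */
__attribute__((constructor))
static void example_guard_setup(void)
{
	uintptr_t value = 0;
	int ok = 0;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd >= 0) {
		/* Accept the result only if the whole word was read. */
		ok = read(fd, &value, sizeof value) == (ssize_t)sizeof value;
		close(fd);
	}
	if (!ok) {
		/* Known fallback value, used when no randomness is available. */
		unsigned char bytes[sizeof value];

		memset(bytes, 0, sizeof bytes);
		bytes[sizeof bytes - 1] = 0xff;
		bytes[sizeof bytes - 2] = '\n';
		memcpy(&value, bytes, sizeof value);
	}
	example_guard = value;
	example_guard_ready = 1;
}
```

Because the setup runs as a constructor, any code that executes before the global constructors still sees the initial zero value, which is the early-startup gap described above.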

The libssp __stack_chk_fail implementation tries to open the terminal, constructs an error message with alloca, and then uses write to output it; if the terminal isn't accessible, it tries the system log instead. It then attempts to destroy the process by invoking __builtin_trap(), then by writing a 0 to the int at address -1 (which is undefined behavior and an unaligned access, in addition to probably crashing), and finally by calling _exit(). This exit strategy doesn't feel very robust. You can read the libssp handler code at https://gcc.gnu.org/viewcvs/gcc/trunk/libssp/ssp.c?view=markup#l96.

Read the secure handling section above and read the code, then decide whether you want this linked into your programs, or whether it is cleaner to make your own implementation. You can also modify this code as part of your OS Specific Toolchain.

See Also

Threads

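External Links

  • GCC optimization options documentation, in which -fstack-protector is detailed: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
  • GCC extension for protecting applications from stack-smashing attacks: http://www.trl.ibm.com/projects/security/ssp/
  • Buffer overflow protection on Wikipedia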