Tail Recursion and Tail Call Optimization
Recursion
If you’ve done much programming, then you know what recursion is: a function that invokes itself to further its implementation. The canonical example is the Factorial function. (Note that the other canonical example, the Fibonacci sequence, is not conducive to tail recursion.)
Factorial
In mathematics, the factorial function is written using the !
symbol. When placed after a (whole) number, it means “the result after multiplying this number with every number less than it, down to 1”. Thus 4!
is the same as 4x3x2x1
, or 24
.
Non-recursion implementation
In code, this function could be written many different ways. For example, in the C language you could write it as follows
long long Factorial(unsigned start) { long long result = 1; for (unsigned i = 2; i <= start; ++i) { result *= i; } // for return result; } // Factorial(start)
Note that this function uses long long
because the successive multiplications get very big very quickly.
Recursion implementation
Another implementation takes advantage of the fact that factorial is based on a series of numbers, so you can implement it in terms of itself. That is, 4!
is the same as 4x3!
, which is the same as 4x3x2!
etc.
long long Factorial(unsigned start) { if (start > 1) { return Factorial(start - 1) * start; } // if return 1; } // Factorial(start)
A lot simpler code!
Problem with Recursion
To highlight the biggest problem with recursion, though, you need to know what the code is actually making the processor do. Every time a function is called, the processor needs to make room on the stack to hold the intermediate calculations. If you were to look at the processor stack at the moment that it hit the return 1;
in the above code (assuming a call to Factorial(4)
), you could see something like this:
[Parameter: 4] [Return address: Call to Factorial(4)] [Return value: ?] [Parameter: 3] [Return address: Call to Factorial(3)] [Return value: ?] [Parameter: 2] [Return address: Call to Factorial(2)] [Return value: ?]
And, of course, with a larger start
parameter, the deeper the stack would be.
Tail Recursion
To help fix this, the programmer can write the code differently - to hint to the compiler that although the code is written recursively, it doesn’t have to be implemented that way.
Rather than calling Factorial(start - 1)
in the middle of the function like above, the code could instead be written like this:
long long Factorial(unsigned start) { if (start <= 1) { return 1; } // if return start * Factorial(start - 1); } // Factorial(start)
This code is essentially identical to the previous code, except the if
test is reversed, and the recursive call to Factorial(start - 1)
is the last thing in the function - at the tail.
Tail-Call Optimisation
So: how does that help? Frankly, it doesn’t - unless the compiler implements “tail-call optimisation”. If the compiler doesn’t, then the above code would produce exactly the same stack as before. But if it does implement it, then the stack depth would be only 1
, no matter what the start
value was.
Tail-call optimisation is not only useful with recursion. Its hallmark is when there is a call to a new function right at the end of the current function (hence “tail-call”). The compiler can easily realise that there is no point in keeping the current function’s state - it’s no longer going to be needed any more. It can directly modify the current function’s intermediate stack area in preparation for the new function call - and instead of CALL
ing that code, it instead JMP
s to it.
The JMP
will be into somewhere in the middle of the function, bypassing the function’s normal initialisation. That doesn’t need to be performed since it’s already been set up by the tail-call optimisation. And the JMP
instead of CALL
is important since there will not be a corresponding RET
urn from the function - other than the RET
at the end of the final calculation back to the original CALL
ing function.
Summary
The tail-call optimisation is independent from tail recursion - but it is tricky to implement generically. It is most easily implemented when tail recursion is used, since the compiler is busy compiling the very function that can best take advantage of it - it can modify the way the function is implemented to take full advantage of the situation.
But to get the compiler to even consider using the tail-call optimisation, it is incumbent on the programmer to ensure that the recursive call to itself is the last thing that the function does - the “tail recursion”.
The benefit is faster, more efficient code - in both processor time and memory (stack) usage.