Tail Recursion and Tail Call Optimization: Difference between revisions

From OSDev.wiki
Jump to navigation Jump to search
[unchecked revision][unchecked revision]
Content added Content deleted
(Original page)
 
No edit summary
 
(9 intermediate revisions by 4 users not shown)
Line 1: Line 1:
== Recursion ==
== What is a tail call? ==
Informally, a tail call is when a function calls another function, and returns immediately. In assembly, it's when a CALL and a RET instructions are found right next to each other. For example, in the following code:


<code>
If you’ve done much programming, then you know what recursion is: a function that invokes itself to further its implementation. The canonical example is the Factorial function. (Note that the other canonical example, the Fibonacci sequence, is not conducive to tail recursion.)
myFunction:
CALL foo
CALL bar
RET
</code>


<code>CALL bar</code> is a tail call, but <code>CALL foo</code> is not.
=== Factorial ===


Note that, it's important that there is no operation between <code>CALL bar</code> and <code>RET</code>. Consider the following example:
In mathematics, the factorial function is written using the <code>!</code> symbol. When placed after a (whole) number, it means “the result after multiplying this number with every number less than it, down to 1”. Thus <code>4!</code> is the same as <code>4x3x2x1</code>, or <code>24</code>.


<code>
==== Non-recursion implementation ====
int foo(x){
return bar(x) * 2;
}
</code>


Even though this looks like a tail call, the caller has to modify the return value after the <code>CALL</code> operation. As such, it's not actually a tail call. In assembly:
In code, this function could be written many different ways. For example, in the C language you could write it as follows


<code>
long long Factorial(unsigned start) {
CALL bar
long long result = 1;
SHL EAX, 1
for (unsigned i = 2; i <= start; ++i) {
RET
result *= i;
</code>
} // for
return result;
} // Factorial(start)


However the following is a tail call:
Note that this function uses <code>long long</code> because the successive multiplications get very big very quickly.


<code>
==== Recursion implementation ====
int foo(x){
return bar(x * 2);
}
</code>


In assembly:
Another implementation takes advantage of the fact that factorial is based on a series of numbers, so you can implement it ''in terms of itself''. That is, <code>4!</code> is the same as <code>4x3!</code>, which is the same as <code>4x3x2!</code> etc.


<code>
long long Factorial(unsigned start) {
if (start > 1) {
SHL EAX, 1
CALL bar
return Factorial(start - 1) * start;
RET
} // if
</code>
return 1;
} // Factorial(start)


As you can see, there is no operation between <code>CALL</code> and <code>RET</code> this time.
A lot simpler code!


== Problem with Recursion ==
== Optimizing a tail call ==


A tail call can be optimized by simply replacing <code>CALL</code> with <code>JMP</code> and removing <code>RET</code>. So looking at this example, one more time:
To highlight the biggest problem with recursion, though, you need to know what the code is actually making the processor do. Every time a function is called, the processor needs to make room on the stack to hold the intermediate calculations. If you were to look at the processor stack at the moment that it hit the <code>return 1;</code> in the above code (assuming a call to <code>Factorial(4)</code>), you could see something like this:


<code>
[Parameter: 4]
int foo(x){
[Return address: Call to Factorial(4)]
return bar(x * 2);
[Return value: ?]
}
[Parameter: 3]
</code>
[Return address: Call to Factorial(3)]
[Return value: ?]
[Parameter: 2]
[Return address: Call to Factorial(2)]
[Return value: ?]


In assembly, no tail call optimization:
And, of course, with a larger <code>start</code> parameter, the deeper the stack would be.


<code>
=== Tail Recursion ===
SHL EAX, 1
CALL bar
RET
</code>


In assembly, with tail call optimization:
To help fix this, the programmer can write the code differently - to hint to the compiler that although the code is written recursively, it doesn’t have to be implemented that way.


<code>
Rather than calling <code>Factorial(start - 1)</code> in the middle of the function like above, the code could instead be written like this:
SHL EAX, 1
JMP bar
</code>


This has a minor speed up as you no longer have to save and restore the instruction pointer. The callee (in this case, <code>bar</code>) will return all the way back to the caller of this function.
long long Factorial(unsigned start) {
if (start <= 1) {
return 1;
} // if
return start * Factorial(start - 1);
} // Factorial(start)


It doesn't have much of an impact on a function like this, but it can have immense impact on recursive calls. A recursive call that takes 1000 recursions will have to save and restore the instruction pointer 1001 times without tail call optimization, but only once with it.
This code is essentially identical to the previous code, except the <code>if</code> test is reversed, and the recursive call to <code>Factorial(start - 1)</code> is the last thing in the function - at the '''tail'''.


=== Tail-Call Optimisation ===
== Epilogues and Prologues ==
In the above examples, it is assumed that functions don't have a prologue or epilogue which is highly impractical. As such a proper implementation of tail call optimization is a lot trickier than shown here. This is only a rudimentary example.


[[Category:Compiler Development]]
So: how does that help? Frankly, it doesn’t - unless the compiler implements “tail-call optimisation”. If the compiler doesn’t, then the above code would produce exactly the same stack as before. But if it ''does'' implement it, then the stack depth would be only <code>1</code>, no matter what the <code>start</code> value was.

Tail-call optimisation is not only useful with recursion. Its hallmark is when there is a call to a new function right at the end of the current function (hence “tail-call”). The compiler can easily realise that there is no point in keeping the current function’s state - it’s no longer going to be needed any more. It can directly modify the current function’s intermediate stack area in preparation for the new function call - and instead of <code>CALL</code>ing that code, it instead <code>JMP</code>s to it.

The <code>JMP</code> will be into somewhere in the middle of the function, bypassing the function’s normal initialisation. That doesn’t need to be performed since it’s already been set up by the tail-call optimisation. And the <code>JMP</code> instead of <code>CALL</code> is important since there will not be a corresponding <code>RET</code>urn from the function - other than the <code>RET</code> at the end of the final calculation back to the original <code>CALL</code>ing function.

== Summary ==

The tail-call optimisation is independent from tail recursion - but it is tricky to implement generically. It is most easily implemented when tail recursion is used, since the compiler is busy compiling the very function that can best take advantage of it - it can modify the way the function is implemented to take full advantage of the situation.

But to get the compiler to even consider using the tail-call optimisation, it is incumbent on the programmer to ensure that the recursive call to itself is the last thing that the function does - the “tail recursion”.

The benefit is faster, more efficient code - in both processor time and memory (stack) usage.

Latest revision as of 18:55, 16 June 2024

What is a tail call?

Informally, a tail call is when a function calls another function, and returns immediately. In assembly, it's when a CALL and a RET instructions are found right next to each other. For example, in the following code:

 myFunction:
   CALL foo
   CALL bar
   RET

CALL bar is a tail call, but CALL foo is not.

Note that, it's important that there is no operation between CALL bar and RET. Consider the following example:

 int foo(x){
   return bar(x) * 2;
 }

Even though this looks like a tail call, the caller has to modify the return value after the CALL operation. As such, it's not actually a tail call. In assembly:

 CALL bar
 SHL EAX, 1
 RET

However the following is a tail call:

 int foo(x){
   return bar(x * 2);
 }

In assembly:

 SHL EAX, 1
 CALL bar
 RET

As you can see, there is no operation between CALL and RET this time.

Optimizing a tail call

A tail call can be optimized by simply replacing CALL with JMP and removing RET. So looking at this example, one more time:

 int foo(x){
   return bar(x * 2);
 }

In assembly, no tail call optimization:

 SHL EAX, 1
 CALL bar
 RET

In assembly, with tail call optimization:

 SHL EAX, 1
 JMP bar

This has a minor speed up as you no longer have to save and restore the instruction pointer. The callee (in this case, bar) will return all the way back to the caller of this function.

It doesn't have much of an impact on a function like this, but it can have immense impact on recursive calls. A recursive call that takes 1000 recursions will have to save and restore the instruction pointer 1001 times without tail call optimization, but only once with it.

Epilogues and Prologues

In the above examples, it is assumed that functions don't have a prologue or epilogue which is highly impractical. As such a proper implementation of tail call optimization is a lot trickier than shown here. This is only a rudimentary example.