CompilerDev/Implementing Conditional Statements And Loops

From OSDev.wiki
Revision as of 00:51, 27 October 2016 by osdev>Schol-r-lea (Created page with "= General Concepts = For a conventional instruction set architecture such as x86, ARM, 8051, MIPS, or most other CPU types in widespread use at this time (2016), conditional ...")
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

General Concepts

For a conventional instruction set architecture such as x86, ARM, 8051, MIPS, or most other CPU types in widespread use at this time (2016), conditional statements such as if/elsif/else, and loop constructs such as for or while, are generally implemented through a combination of tests, conditional jumps/branches (jz, beq, etc.) and unconditional branches (jmp, bra, j, etc.). While some common ISAs have special-purpose instructions for repetition or looping (e.g., the REP and LOOP instructions on the x86), few compilers use them due to the added work of determining where they can be applied.

For the examples below, MIPS R2000 assembly language (including pseudo-instructions) is used. It is assumed that the compiler produces assembly language; for compilers that generate an Executable File directly, the compiler must handle the housekeeping (e.g., tracking branch targets) that would otherwise be done b the assembler.

General Conditional Statements

If

A naive implementation of a basic if statement will generally consist of a conditional branch to the code for the when the condition is true (the consequent), an unconditional branch past the end of the consequent, and the consequent itself:

    if (a == b)
    {
        /* do something */
    }

The generated code (assuming that $t0 == a and $t1 == b already) might look like:

        beq $t0, $t1, main.if0.true
        j main.if0.end
main.if0.true:
        ;;; consequent
main.if0.end:

For compound conditionals, such as

    if (a == b && b <= c)
        /* do something */
    }

it could generate

        bge $t0, $t1, main.if1.and0
        j main.if1.end
main.if.and0:
        blt $t0, $t2, main.if1.true
        j main.if1.end
main.if1.true:
        ;;; consequent
main.if1.end:


Note that the compiler has to generate a unique label or target table listing for each of the branch targets; the simplest implementation of this is to keep a count of the labels, and assign them a separate label name with the count appended to it. For the sake of readability, the example code uses a local label with the function name (main, in this case), the type of expression, and the count of the expressions of this type.

Thus, for nested ifs, such as

    if (a == b)
    {
        if (b <= c)
        {
            /* do the inner clause */
        }
        /* do the rest of the outer clause 8/
    }


it might produce:

        bge $t0, $t1, main.if2.true
        j main.if2.end
main.if2.true:
        blt $t0, $t2, main.if3.true
        j main.if3.end
main.if3.true:
        ;;; consequent
main.if3.end:
        ;;; remaining code
main.if2.end:


Basic Optimizations

An obvious, and reasonably simple, optimization of this is to negate or reverse the condition, thus eliminating the need for the unconditional branch:

        bne $t0, $t1, main.if4.true
main.if4.true:
        ;;; consequent
main.if4.end:

Similarly, for nested ifs where the inner if is the first or last clause of the outer loop, then it can remove an extraneous labels:

        blt $t0, $t1, main.if5.end
        bge $t0, $t2, main.if6.end
        ;;; inner consequent
main.if6.end:
        ;;; remaining code o outer consequent
main.if2.end:


Definite Loops

The naive implementation of a definite loop is a conditional branch at the start of the loop and an unconditional branch back to that conditional branch at the end of the loop, which is also the naive implementation of an explicit for loop. An example in MIPS assembly code (using pseudo-instructions) might be:

        clear $t0                 
        move $t1, for_loop_count    
for1.start:
        bge $t0, $t1, for1.end        ; if t1 is greater than or equal to t0, jump past the loop     
        ;;  body of the loop here
        addi $t0, $t0, 1
        bra for1.start
for1.end


However, even a naive compiler will usually do an unconditional branch, followed by the loop label, followed by the body, and then put the loop condition as the target of the original unconditional branch with the condition negated or reversed:

   clear $t0
        move $t1, for_loop_count    
        BRA for1.test
for1.start:
        ;;  body of the loop here
        addi $t0, $t0, 1
for1.test:
        BLT $t0, $t1, for1.start        ; if t1 is less than to t0, jump past the loop 
for1.end


This means that after the loop entry, the loop has just the unconditional branch, making it faster. For a hand-coded recursive descent compiler, the parsing process actually makes this almost as easy to do as the naive form described earlier - the only complications are that you need to pass the endpoint label to the function parsing the body in case there is a break or continue, and it makes debugging marginally harder because the test isn't where you expect. It is even simpler the tail call optimization, and unlike TCO, it doesn't smash the stack trace.