Optimizing


Optimization is the practice of taking functioning code and modifying it to make it execute faster. Optimization is already discussed very thoroughly elsewhere for various userland problem domains, so below are a few areas that are important specifically for operating system developers to consider.

Note that to fully understand the ideas on this page, it is important to have a fairly good grasp of assembly (x86 examples will be used), and a thorough knowledge of the x86 architecture is a must. But you've already read the Intel CPU manuals, haven't you?

When and What to Optimize

Please take the following list with some common sense: your kernel might be different.

Things to Optimize

  • Deeply-nested loops. If possible, try to eliminate some of the nesting. An instruction removed from a loop matters far more than an instruction removed from straight-line code: if a loop body runs a million times, one unnecessary line of code is executed a million times, which adds up to a noticeable fraction of a second, especially on slower computers (see the sketch after this list).
  • Code that runs often.
  • Graphics code.
  • API code.
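
As a sketch of the loop point above: hoisting a computation whose result never changes out of the loop body removes it from every iteration. The function names and the strlen() call here are purely illustrative.

#include <stddef.h>
#include <string.h>

/* Slow version: strlen(key) is re-evaluated on every iteration,
 * even though its result never changes inside the loop. */
size_t count_matches_slow(const char *buffer, size_t n, const char *key)
{
    size_t matches = 0;
    for (size_t i = 0; i < n; i++)
        if (buffer[i] == key[i % strlen(key)])
            matches++;
    return matches;
}

/* Hoisting the invariant call out of the loop removes n - 1
 * redundant strlen() evaluations. */
size_t count_matches_fast(const char *buffer, size_t n, const char *key)
{
    size_t key_len = strlen(key);  /* computed once */
    size_t matches = 0;
    for (size_t i = 0; i < n; i++)
        if (buffer[i] == key[i % key_len])
            matches++;
    return matches;
}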

Things not to bother Optimizing

  • Boot code. Code that runs only once and doesn't take long anyway (loading an IDT, for example) is not worth optimizing.

Things not to Optimize

  • Untested or nonworking code. If you try to prematurely optimize, you will almost certainly make any problems you might have worse. 'First make it run, then make it run fast.'
  • Code that needs to retain clarity. It is generally fairly hard to keep optimized code looking as neat and understandable as code that is designed to be easy to understand. Comments can help (and should be used anyway, especially to mark why optimizations were made).
  • Crash recovery code. This will depend on your personal stance. It may be a good idea to err on the side of caution and concentrate on stability of the recovery code, not speed.

Compiler Optimization

If you're using one of the more complex compilers (such as GCC), a lot of optimizations are available to you without your having to write (or remove) a single line of code. One thing to be aware of is that compiler optimizations, when overdone (or even enabled at all), can cause previously functioning code to become buggy or stop working entirely. It is important to remember that this is not so much a problem with the optimizations the compiler does as with underlying errors that were in the code all along: most commonly, not using the volatile keyword correctly.

To use compiler optimizations in GCC, simply add the -O(x) switch to the command line when compiling, where (x) is one of the following:

Switch  Use
s       Optimize for size.
1       Do mild speed optimizations.
2       Do more speed optimizations.
3       Do lots of speed optimizations (not recommended).

Note that only one optimization level is in effect for a given compilation (if several -O switches appear on the command line, the last one wins), and each level of speed optimization does everything the levels below it do, and more. For more information, read the GCC manual.
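
As a concrete example, a kernel translation unit might be compiled with moderate speed optimizations like this (the file names are hypothetical; -ffreestanding is typical for kernel code, which cannot rely on a hosted C library):

gcc -O2 -ffreestanding -c kernel.c -o kernel.o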

The main reason code becomes buggy or stops working after compiler optimizations are enabled is that the optimizer removes variables, or assumes it does not have to reload a variable because no code it can see ever changes it. That assumption is false for most components of a kernel, which are hugely interdependent and touched asynchronously by interrupt handlers and other contexts. To prevent this from happening, use the volatile keyword, as sketched below.
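
A minimal sketch of the classic failure case, assuming a hypothetical flag shared between ordinary code and an interrupt handler:

/* Hypothetical flag set by a timer interrupt handler. Without
 * 'volatile', the optimizer may load the flag into a register once
 * and spin forever on the stale copy. */
static volatile int timer_fired = 0;

void timer_interrupt_handler(void)  /* invoked asynchronously */
{
    timer_fired = 1;
}

void wait_for_timer(void)
{
    /* 'volatile' forces a fresh load from memory on every test. */
    while (!timer_fired)
        ;
}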

Inline Functions

What?

The inline keyword is a way of requesting that the compiler insert a function's body at the call site instead of emitting a call. This means that if you create an inline function and then call it from another function, the compiler will try to replace the call with the function body (similar to a preprocessor define, but without its disadvantages).

Advantages

Inline functions save the clock cycles involved in actually calling a function (pushing arguments, the call and return jumps, and handling the return value). Compared to preprocessor defines, they add type safety and avoid the side-effect surprises that make macros a hazard. Inlining can also be toggled with compiler options, so you can measure the size/speed tradeoff.

Disadvantages

A normal function called from five different locations will only reside in memory once. If you inline that function, it will reside in memory five times. Your code becomes faster, but it also becomes bigger.

Caveats

The inline keyword is merely a request to the compiler, which the compiler can choose to ignore if it deems the size / speed tradeoff to be detrimental.
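
With GCC, the request can be turned into a demand through a function attribute (this is a GCC extension, shown here as a sketch with a hypothetical function):

/* GCC-specific: always_inline makes GCC inline the function even
 * when it would otherwise decline (e.g. at -O0). */
static inline __attribute__((always_inline))
int twice(int x)
{
    return x * 2;
}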

When To Use It?

It is best to use the inline keyword only on small functions, where it is known to work best and to provide the largest speed boost. Typical examples of small functions are accessors and mutators inside a C++ class, or any function that simply gets or sets a value.

How To Use It?

The inline keyword is simply specified before the function type, just like the static keyword:

/* Returns the largest of the three arguments. */
inline int max_of_three(int x, int y, int z)
{
    if ( x > y && x > z )
        return x;
    return ( y > z ) ? y : z;
}

The Cache

The CPU's cache contains a high-bandwidth store of currently used data. On a modern CPU, a read or write usually takes 2 cycles from level 1 (L1) cache (compared with 1 cycle for a register), 5 cycles from level 2 (L2) cache, and perhaps around 30 cycles from main memory. As you can see, reading from memory is very expensive compared to reading from a register or the cache. Unfortunately, this speed comes at a size cost: L1 cache is only around 8 KiB per processor core, with L2 cache sitting around 4 MiB to 12 MiB per Core 2 chip. For more information, see the Wikipedia entry on CPU caches.

Most processors try to keep the most recently used data in the caches (though the exact replacement policy differs; check the manuals). For this reason, it pays to keep in mind what data is likely to be in the cache at any given time. For instance (and this is a very simplistic example), if your multitasking code switches processes too quickly, you will waste a larger percentage of time on cache 'flushes', refilling the cache with each new process's working set. Access patterns matter too, as the sketch below shows.
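
A simplistic sketch of a cache-aware access pattern (the array size and function names are arbitrary): C arrays are stored row-major, so walking the rightmost index in the inner loop touches memory sequentially and uses every byte of each cache line, while walking the leftmost index jumps far enough between accesses to touch a new cache line almost every time.

#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* Cache-friendly: memory is visited sequentially, so each cache
 * line loaded from the array is fully used before moving on. */
long sum_row_major(void)
{
    long sum = 0;
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Cache-hostile: consecutive accesses are COLS * sizeof(int) bytes
 * apart, so nearly every access pulls in a different cache line. */
long sum_col_major(void)
{
    long sum = 0;
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}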

ToDo: More info on how we can keep the cache in mind while programming.

Pipelining

In the days before dual cores, when CPUs were slow, manufacturers needed ways to speed things up. There are of course many ways to achieve this; Intel decided to use pipelining (more precisely, multiple parallel pipelines, i.e. superscalar execution), an idea used in RISC CPUs and other high-speed designs. The idea is to execute more than one instruction per cycle: with 2 pipes (1 pipe completing roughly 1 instruction per cycle), 2 adjacent instructions can execute at the same time. There is one big problem, though: how can that work if the first instruction changes a register and the next one uses it? The solution is unsatisfying because there is no other solution: wait until the first instruction has completed. So the upshot is simple: instruction ordering can multiply your speed, but it can also make the code very slow. Today, Intel says its CPUs can have up to 6 pipes, so at the assembly level careful instruction ordering can in principle make code up to 6 times faster than code written with no thought for pairing. Here is an example for 2 pipes; the first version is not optimized, the second one (shown after it) is:

mov eax,0x0000aa55 ; don't think about the values, only the instruction order
mov esi,eax        ; depends on eax, written by the instruction just before it
mov ebx,0x0000beef
mov ecx,ebx        ; depends on ebx, same problem
mov es,cx          ; depends on ecx
mov edi,ebx
; and so we could continue all day long...

Now, can you see the problem? Each instruction depends on the result of the one immediately before it, so no two adjacent instructions can execute together. Here is the same example, optimized for 2 pipes:

mov eax,0x0000aa55 ; the code does the same work, but in a different order
mov ebx,0x0000beef ; independent of the previous instruction, so the two can pair
mov esi,eax        ; eax was written a cycle ago, so this no longer stalls
mov ecx,ebx
mov edi,ebx
mov es,cx          ; ecx was written two instructions back

With perfect pairing the six instructions complete in three cycles instead of six, so this version runs roughly twice as fast as the unoptimized one. On the Pentium, the two pipes are named the "U" pipe and the "V" pipe. One thing to note is that the V pipe cannot execute every type of instruction, only the most common (simple) ones; everything else must go through the U pipe.

