Revision as of 20:59, 18 November 2008
Optimizing is a common task in programming, and it is especially important when you are programming an operating system. One of the key skills in optimizing is knowing where to optimize, and where not to.
When and What to Optimize
As said above, there are times when you have to optimize your code, and times when it does not matter. The one main place where optimizing is a very good idea is loops: if you run through a loop a million times, even one unnecessary line of code matters a lot.
Loops are not the only places to optimize. Other good candidates include:
- Often run code
- Graphics code
- API code
and so on.
There are also some places where you don't need to optimize, because it would not help much. Generally, things that are not worth optimizing are things that only run once: loading an IDT, enabling paging, and crash recovery (where the emphasis should be on stability, not speed).
Compiler Optimization
If you're using one of the more complex compilers (such as GCC), you have available to you a lot of optimizations that can be done without having to write (or remove) a single line of code. One thing to be aware of when using compiler optimizations is that when overdone (or even done at all) they can cause previously functioning code to become buggy, or not work at all.
To use compiler optimizations in GCC, simply add the -O(x) switch to the command line when compiling, where (x) is one of the following:
| Switch | Use |
|---|---|
| s | Optimize for size. |
| 1 | Do mild speed optimizations. |
| 2 | Do more speed optimizations. |
| 3 | Do lots of speed optimizations (not recommended). |
Note that only one command line switch can be used on a single compilation, and that each level of speed optimizations does everything the level below it does, and more. For more information, read the GCC manual.
The main reason for code to become buggy or stop working after adding compiler optimizations is that the optimizer removes variables, or assumes it doesn't have to reload a variable because its value isn't changed by any other code (which isn't true for most components of a kernel, which are hugely interdependent). To prevent this from happening, use the volatile keyword.
The Cache
The CPU's cache is a very fast store of recently-used data. If an instruction references data that is already in the cache, the access is much faster than a read from general-purpose RAM; depending on the memory and bus involved, a cache hit can be roughly two to four times faster. However, the cache is not very big: on older processors it was measured in kilobytes, while modern processors typically have 1–8 MB. A memory access is slow when it touches an area of memory that is not cached, because the CPU must first fetch that area into the cache (or read it directly from RAM).
Piping
Before dual-core CPUs existed, and when CPUs were slow, manufacturers needed ways to speed things up. Of course there are many ways to achieve this; Intel decided to use the pipe idea (more commonly called pipelining), already used in RISC CPUs and other high-speed designs. The idea behind piping is that the CPU can execute more than one instruction per cycle. With 2 pipes (1 pipe = 1 instruction per cycle), two instructions can execute at the same time. There is one big problem, though: how can that work if the first instruction changes a register that the next instruction uses? The only solution is a bad one: wait until the first instruction has completed. So the consequence is simple: good instruction ordering can increase speed many times, but bad ordering can also make the code slow. Today, Intel says that its CPUs can have up to 6 pipes, so at the assembly level code can in principle run up to 6 times faster than code written with no thought for piping. Here is an example for 2 pipes; the first version is not optimized, the second is:
mov eax,0x0000aa55   ; don't think about the values, only the instruction order
mov esi,eax
mov ebx,0x0000beef
mov ecx,ebx
mov es,cx
mov edi,ebx
; and so could we continue all day long...
Can you see the problem? Each instruction depends on the result of the one before it, so the pipes can almost never work in parallel. Here is the same code optimized for 2 pipes:
mov eax,0x0000aa55   ; the code does the same, but in a different order
mov ebx,0x0000beef
mov esi,eax
mov ecx,ebx
mov edi,ebx
mov es,cx
With 2 pipes, this version can run up to twice as fast as the unoptimized one: adjacent instructions no longer depend on each other, so they can pair. The two pipes are named the "U" pipe and the "V" pipe; one thing to note is that the "V" pipe can NOT execute all types of instructions, only the most common (simple) ones.
Links
Forum Topics
- 01000101 asks about optimizing memcpy() and memset()
- A discussion on the best way to do multi-processor scheduling