Optimizing


Revision as of 20:59, 18 November 2008

Optimizing is a common task in programming, but it is especially important when you are writing an Operating System. One of the most important things about optimizing is that you have to know where to optimize, and where not to.

When and What to Optimize

As said above, there are times when you have to optimize your code, and times when it does not matter. There is one main place where optimizing is a very good idea: loops. Why, you might ask? Because if you run through a loop 1 million times and there is just one unnecessary line of code in it, that line matters a lot.
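As an illustration of that "one unnecessary line", here is a hypothetical C sketch (the function and variable names are invented for this example) where a computation that never changes is hoisted out of the loop:

```c
#include <stddef.h>

/* Naive version: the factor is recomputed on every iteration,
 * even though its value never changes inside the loop. */
void scale_naive(int *buf, size_t n, int base, int step) {
    for (size_t i = 0; i < n; i++) {
        int factor = base * step;   /* same result, computed n times */
        buf[i] *= factor;
    }
}

/* Same loop with the invariant computation hoisted out:
 * one multiplication in total instead of one per iteration. */
void scale_hoisted(int *buf, size_t n, int base, int step) {
    int factor = base * step;
    for (size_t i = 0; i < n; i++)
        buf[i] *= factor;
}
```

Optimizing compilers often do this hoisting for you, but in kernel code it is worth writing the loop that way to begin with.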

Loops are not the only places worth optimizing. Other good candidates include:

  • Often run code
  • Graphics code
  • API code

and so on.

There is also some places where you don't need to optimize, because it would not help much. Generally, things that are not worth optimizing are things that only run once: loading an IDT, enabling paging, and crash recovery (emphasis in that case should be on stability, not speed).

This is the first step, to fast code. What does this mean? Jackscott 10:38, 18 November 2008 (UTC)

Compiler Optimization

If you're using one of the more complex compilers (such as GCC), you have available to you a lot of optimizations that can be done without having to write (or remove) a single line of code. One thing to be aware of when using compiler optimizations is that when overdone (or even done at all) they can cause previously functioning code to become buggy, or not work at all.

To use compiler optimizations in GCC, simply add the -O(x) switch to the command line when compiling, where (x) is one of the following:

Switch Use
s Optimize for size.
1 Do mild speed optimizations.
2 Do more speed optimizations.
3 Do lots of speed optimizations (not recommended).

Note that only one command line switch can be used on a single compilation, and that each level of speed optimizations does everything the level below it does, and more. For more information, read the GCC manual.

The main reason for code to become buggy or not work after adding compiler optimizations is that it removes variables, or assumes that it doesn't have to reload variables because their value isn't changed in any other code (which isn't true for most components of kernels, which are hugely interdependent). To prevent this from happening, use the Volatile keyword.
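A minimal sketch of the kind of situation described above (the flag and its interrupt handler are hypothetical): a variable shared between an interrupt handler and a polling loop must be volatile, or the compiler may cache it in a register and spin forever on the stale value:

```c
#include <stdint.h>

/* Hypothetical flag set to 1 by a keyboard interrupt handler.
 * Declared volatile so the compiler reloads it from memory on
 * every loop iteration instead of assuming it never changes. */
volatile uint8_t keyboard_ready = 0;

void wait_for_key(void) {
    while (!keyboard_ready) {
        /* busy-wait; an IRQ handler elsewhere sets keyboard_ready */
    }
}
```

Without volatile, an optimizing build may read keyboard_ready once, see 0, and loop forever, which is exactly the "previously functioning code becomes buggy" failure mode mentioned earlier.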

The Cache

The CPU's cache contains a very fast store of currently-used data. If an instruction references data that is already in the cache, the access is much faster than if the data has to be read from general-purpose RAM; depending on the memory and bus involved, a cached access can be roughly 2 to 4 times faster. However, the cache is not very big: on older processors it was measured in KB, while today it is typically 1-8 MB. A memory access is slower when it touches an area that is not cached, because the CPU then has to fetch that area from RAM, either into the cache first or directly.
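One common way to see the cache at work (a hypothetical C sketch, not from the article) is the traversal order of a 2-D array. Both functions below compute the same sum, but the row-major walk uses every byte of each fetched cache line before moving on, while the column-major walk may fetch a new line for almost every access:

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

static int grid[ROWS][COLS];

/* Row-major walk: touches memory sequentially, so each cache line
 * brought in from RAM is fully used before the next one is needed. */
long sum_row_major(void) {
    long sum = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            sum += grid[r][c];
    return sum;
}

/* Column-major walk: jumps COLS * sizeof(int) bytes per step, so on
 * a large array it gets little benefit from each fetched line. */
long sum_col_major(void) {
    long sum = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            sum += grid[r][c];
    return sum;
}
```

The results are identical; only the memory access pattern, and therefore the cache behavior, differs.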

Pipelining

In the days before dual cores, when CPUs were slow, manufacturers needed other ways to speed things up. There are of course many ways to achieve this; Intel decided to use the pipe idea, already used in RISC and other high-speed CPUs. The idea behind pipelining is that the CPU can execute more than one instruction per cycle. This means that if you have 2 pipes (1 pipe = 1 instruction / cycle), you can execute 2 instructions at the same time. There is one big problem, though: how can that be possible if you change a register in the first instruction and then use it in the next? The only available solution is a bad one: wait until the first instruction has completed. The upshot is simply this: you can increase the speed many times over, but you can also make the code extremely slow. Today, Intel says that their CPUs can have up to 6 pipes, so at the assembly level we can make code up to 600% faster than code written without pipelining in mind. Here is an example for 2 pipes; this first version is not optimized:

mov eax,0x0000aa55 ; don't think about the values, only the instruction order
mov esi,eax 
mov ebx,0x0000beef 
mov ecx,ebx
mov es,cx
mov edi,ebx
; and so could we continue all the day long...

Now, can you see the problem? It should be very simple to imagine. Here, however, is a version which is optimized for 2 pipes:

mov eax,0x0000aa55 ; the code does the same, but in a different order
mov ebx,0x0000beef 
mov esi,eax 
mov ecx,ebx
mov edi,ebx
mov es,cx

This example can run up to twice as fast as the unoptimized one, because the reordered instructions no longer depend on their immediate neighbors and can be paired. The 2 pipes are named the "U" pipe and the "V" pipe; one thing to notice is that the "V" pipe can NOT execute all types of instructions, only the most common ones.

This page is a stub.
You can help the wiki by accurately adding more contents to it.

Links

Forum Topics

External