Detecting CPU Speed

==What is CPU Speed==
 
"CPU speed" has several definitions:
# How quickly the processor can execute code (e.g. instructions per second)
# How fast the processor's clock is running (e.g. cycles per second)
 
How quickly a CPU can execute code is important for determining the CPU's performance. How fast a CPU's clock is running is only useful for specific cases (e.g. calibrating the CPU's TSC to use for measuring time).
 
There are also several different measurements for these different "CPU speeds":
# Best case
# Nominal case
# Average case
# Current case
# Worst case
 
For example, look at a modern Intel Core i7 CPU (with turbo-boost, power management and hyper-threading). The best case instructions per second would occur when:
# There is no throttling/power saving at all.
# Only one logical CPU is running (turbo-boost activated and hyper-threading not being used).
# Simple instructions with no dependencies in a loop that fits in the CPU's "loop buffer" are being executed.
# There are no branch mispredictions.
# There are no accesses to memory (no data being transferred to/from caches or RAM).
 
The worst case instructions per second would be the exact opposite and may be several orders of magnitude worse (e.g. a best case of 4 billion instructions per second and a worst case of 100 million instructions per second). The nominal instructions per second is an estimate of the "normal" average instructions per second a developer would expect (note: "nominal cycles per second" is quoted more often). All of these things are fixed values - a specific CPU always has the same best case, worst case and nominal case, and these values don't change depending on CPU load, which instructions are/were executed, etc.
 
The current instructions per second is the instructions per second at a specific instant in time and must be somewhere between the best and worst cases. It can't be measured exactly, but can be estimated by finding the average instructions per second for a very short period of time. The average case is something that has to be measured. Both the current instructions per second and the average instructions per second depend heavily on the code that was running. For example, the average instructions per second for a series of NOP instructions may be much higher than the average instructions per second for a series of DIV instructions.
 
 
==General Method==
 
In order to tell what the CPU speed is, we need two things:
# Being able to tell that a given (precise) amount of time has elapsed.
# Being able to know how many 'clock cycles' a portion of code took.
 
Once these two sub-problems are solved, one can easily tell the CPU speed
 
Note that, except for very special cases, using a busy-loop (even a calibrated one) to introduce delays is a bad idea. It should be reserved for the very small delays
(nanoseconds or microseconds) that must be complied with when programming hardware.
 
Also note that PC emulators (like BOCHS, for instance) are rarely realtime, so the clock may appear to run faster
than expected on those emulators.
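
The general method above boils down to a single calculation once both measurements are available. The following is a minimal sketch in C, not code from this article: the name <code>busy_loop_mhz</code> and its parameters are hypothetical, and it assumes that the number of cycles per loop iteration is already known and that the elapsed time came from a timer.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the original article).
 *   iterations      - loop iterations completed before the timer fired
 *   cycles_per_iter - clock cycles one iteration is assumed to take
 *   elapsed_us      - measured duration of the loop in microseconds
 * Returns the estimated clock speed in MHz, or 0 if elapsed_us is 0.
 */
uint32_t busy_loop_mhz(uint64_t iterations,
                       uint32_t cycles_per_iter,
                       uint64_t elapsed_us)
{
    if (elapsed_us == 0)
        return 0;                       /* avoid dividing by zero */

    /* cycles per microsecond is numerically the same as MHz */
    return (uint32_t)((iterations * cycles_per_iter) / elapsed_us);
}
```

For instance, one million iterations of a three-cycle loop completed in 10 ms would give an estimate of 300 MHz.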
 
=== Waiting for a given amount of time ===
 
There are two circuits in a PC that allow you to deal with time: the [[PIT]] (Programmable Interval Timer; the Intel 8253 and 8254 are common PITs) and the RTC (Real Time Clock). The PIT is probably the better of the two for this task.
 
The PIT has two operating modes that can be useful for telling the CPU speed:
# the ''periodic interrupt'' mode (0x36), in which a signal is emitted to the interrupt controller at a fixed frequency. This is especially interesting on PIT channel 0 which is bound to IRQ0 on a PC.
# the ''one shot'' mode (0x34), in which the PIT will decrease a counter at its top speed (1.19318 MHz) until the counter reaches zero.
 
Whether or not an IRQ is fired by channel 0 in 0x34 mode should be checked.

Note that theoretically, ''one shot'' mode could be used with a ''polling'' approach, reading the current count on the channel's data port, but I/O bus cycles have unpredictable latency and it is up to the programmer to make sure the timestamp counter measurement is not affected by this approach.
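
As an illustration of the arithmetic behind the PIT, here is a small sketch in C. The function names are hypothetical, and <code>PIT_HZ</code> is an assumed integer rounding of the 1.19318 MHz input clock mentioned above. It only computes values; writing them to the PIT ports is left out.

```c
#include <stdint.h>

#define PIT_HZ 1193182u   /* assumed rounding of the ~1.19318 MHz PIT clock */

/*
 * Reload value for a periodic interrupt at roughly `hz`, rounded to the
 * nearest count. The counter is 16 bits wide and a reload value of 0 is
 * treated by the PIT as 65536, so 0 is returned for the slowest rate.
 */
uint16_t pit_divisor_for_hz(uint32_t hz)
{
    uint32_t div;

    if (hz == 0)
        return 0;                        /* slowest possible rate */

    div = (PIT_HZ + hz / 2) / hz;        /* round to nearest */
    if (div > 65535u)
        return 0;                        /* too slow: use 65536 */
    if (div < 2)
        div = 2;                         /* keep clear of very small counts */
    return (uint16_t)div;
}

/* Convert a number of PIT ticks into microseconds (truncating). */
uint64_t pit_ticks_to_us(uint64_t ticks)
{
    return ticks * 1000000u / PIT_HZ;
}
```

With these assumptions, a 100 Hz periodic interrupt corresponds to a reload value of 11932, which lasts about 10000 microseconds.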
 
=== Knowing how many cycles your loop takes ===
 
This step depends on the CPU. On 286, 386 and 486, each instruction takes
a well-known and deterministic amount of clock cycles to execute. This allows
the programmer to tell exactly how many cycles a loop iteration takes by looking
up the timing of each instruction and then summing them up.

Since the multi-pipelined architecture of the Pentium, however, such numbers are
no longer communicated (in large part because the same instruction can have
variable timings depending on its surroundings, which makes the timing almost useless).
It is possible to create code which is exceptionally pipeline hostile such as:
 
<syntaxhighlight lang="asm">
xor eax,edx
xor edx,eax
...
xor edx,eax
...
</syntaxhighlight>
 
A simple xor instruction takes one cycle, and it is guaranteed that the processor cannot pipeline this code, as each instruction's operands depend on the result of the previous calculation. One can check that, for a small count (tested from 16 to 64), RDTSC shows a cycle count that is almost exactly (sometimes off by one) the instruction count. Unfortunately, when the chain is made longer, code cache misses will occur, ruining the whole process.

E.g. [http://www.sylvain-ulg.be.tf/resources/speed.c looping on a chain of 1550 XORs] may require a hundred iterations before it stabilizes around 1575 clock cycles on an AMD x86-64, and will take an exceedingly long time to stabilize on a Pentium 3.

Despite this inaccuracy, it gives relatively good results across the whole processor generation given a reasonably accurate timer. If very accurate measurements are needed, the next method should prove more useful.

A Pentium developer has a much better tool to tell timings: the ''Time Stamp Counter'', an internal counter that can be read using the special RDTSC instruction.
 
[http://www.math.uwaterloo.ca/~~jamuir/rdtscpm1.pdf rdtscpm1.pdf] explains how that feature can be used for performance monitoring and should provide the necessary information on how to access the TSC on a Pentium.
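
For readers working in C, the following sketch shows one way to read the counter. It is not taken from the application note: the function names are hypothetical, the inline assembly uses GCC/Clang syntax, and on a non-x86 target the sketch simply returns 0 so that it still compiles.

```c
#include <stdint.h>

/* Combine the two 32-bit halves that RDTSC returns (EDX = high, EAX = low). */
uint64_t tsc_combine(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Read the Time Stamp Counter.
 * Assumption: GCC or Clang on an x86 CPU that supports RDTSC.
 * On any other target this sketch returns 0.
 */
uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi));
    return tsc_combine(lo, hi);
#else
    return 0;
#endif
}
```

The number of cycles a piece of code took is then the difference between a reading taken after it and a reading taken before it.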
 
===RDTSC Instruction Access===
 
The presence of the Time Stamp Counter (and thus the availability of the RDTSC instruction) can be detected through the [CPUID] instruction. Calling CPUID with eax=1 puts the feature flags in edx. TSC is bit 4 of that field.

Note that prior to using the CPUID instruction, you should also make sure the processor supports it by testing the 'ID' bit in eflags. (This is bit 21, mask 0x200000, and it is modifiable only when the CPUID instruction is supported. On systems that don't support CPUID, writing a '1' at that place will have no effect.)

In the case of a processor that does not support CPUID, you'll have to use more eflags-based tests to tell whether a 486, 386, etc. is running and then pick one of the 'calibrated loops' for that architecture (8086 through 80486 may have variable instruction timings).
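
The two bit tests described above can be written down as follows. This is only a sketch of the decoding step in C: the function and macro names are hypothetical, and the register values are assumed to have been obtained elsewhere (by reading and writing eflags, and by executing CPUID with eax=1).

```c
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_ID     0x00200000u   /* 'ID' flag, bit 21 of eflags */
#define CPUID_EDX_TSC (1u << 4)     /* TSC feature flag, bit 4 of edx */

/*
 * CPUID is supported if software was able to change the ID flag.
 *   before - eflags as first read
 *   after  - eflags read back after writing `before` with the ID bit inverted
 */
bool cpuid_available(uint32_t before, uint32_t after)
{
    return ((before ^ after) & EFLAGS_ID) != 0;
}

/* cpuid1_edx is the value left in edx by CPUID with eax=1. */
bool tsc_available(uint32_t cpuid1_edx)
{
    return (cpuid1_edx & CPUID_EDX_TSC) != 0;
}
```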
 
 
=== Working Example Code===
 
A Real Mode Intel-copyrighted example is present in the above-mentioned application note ...
Another code snippet, submitted by user DennisCGC, that gives the total measured frequency of a Pentium processor follows.

Some notes:
* ''irq0_count'' is a variable which increases each time the timer interrupt is called.
* In this code it's assumed that the [PIT] is programmed to 100 Hz (the formula to calculate a desired Hz is provided in the snippet).
* The code assumes that the CPUID instruction is supported.
 
<syntaxhighlight lang="asm">
;__get_speed__:
;first do a cpuid command, with eax=1
; ...
; ax contains measured speed in MHz
mov [mhz], ax
</syntaxhighlight>
 
See the Intel manual (see links) for more information.
:- Bug reports are welcome. IM to [http://www.mega-tokyo.com/forum/index.php?action=viewprofile;user=DennisCGc DennisCGC]
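
The calculation behind this approach can be sketched in C as well. This is not a translation of the assembly listing: <code>tsc_mhz</code> and its parameters are hypothetical, and the sketch assumes the TSC was sampled on two timer interrupts a known number of ticks apart, with the timer running at a known rate (100 Hz in the notes above).

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the original snippet).
 *   tsc_start, tsc_end - TSC values sampled on two timer interrupts
 *   ticks              - number of timer interrupts between the samples
 *   timer_hz           - rate the PIT was programmed to
 * Returns the measured frequency in MHz, rounded to the nearest MHz,
 * or 0 if ticks is 0.
 */
uint32_t tsc_mhz(uint64_t tsc_start, uint64_t tsc_end,
                 uint32_t ticks, uint32_t timer_hz)
{
    uint64_t cycles, hz;

    if (ticks == 0)
        return 0;

    cycles = tsc_end - tsc_start;          /* cycles elapsed */
    hz = cycles * timer_hz / ticks;        /* cycles per second */
    return (uint32_t)((hz + 500000u) / 1000000u);
}
```

Measuring over several ticks rather than one reduces the influence of interrupt latency on the result.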
 
=== Without Interrupts ===
 
{{Tone}}
 
It should be possible to detect the CPU speed without interrupts, though this has not been tested here nor seen elsewhere so far. Here is the trick:
 
<syntaxhighlight lang="C">
disable(); // disable interrupts (if still not done)
outb(0x43,0x34); // set PIT channel 0 to single-shot mode
// ...
byte lo=inb(0x40);
byte hi=inb(0x40);
</syntaxhighlight>
 
Now, we know that
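
The bookkeeping for the polling approach in this section might look like the sketch below. The helper names are hypothetical, <code>PIT_HZ</code> is an assumed integer rounding of the PIT input clock, and the code assumes the counter wrapped at most once between the two reads. Port access and TSC reads are left out so that only the arithmetic is shown.

```c
#include <stdint.h>

#define PIT_HZ 1193182u   /* assumed rounding of the ~1.19318 MHz PIT clock */

/* Assemble the 16-bit count from the two bytes read from port 0x40
 * (low byte first, then high byte, as in the snippet above). */
uint16_t pit_count(uint8_t lo, uint8_t hi)
{
    return (uint16_t)(((uint16_t)hi << 8) | lo);
}

/* PIT ticks elapsed on the down-counter between two reads,
 * assuming at most one wrap-around. */
uint16_t pit_elapsed(uint16_t start, uint16_t now)
{
    return (uint16_t)(start - now);
}

/* CPU clock in Hz from a TSC delta measured across pit_ticks PIT ticks. */
uint64_t cpu_hz_from_pit(uint64_t tsc_delta, uint16_t pit_ticks)
{
    if (pit_ticks == 0)
        return 0;
    return tsc_delta * PIT_HZ / pit_ticks;
}
```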
 
== Asking the SMBios for CPU speed ==
{{Main|SMBIOS}}
 
The [[SMBIOS]] (System Management BIOS) Specification addresses how motherboard and system vendors present management information about their products in a standard format by extending the BIOS interface on Intel architecture systems. The information is intended to allow generic instrumentation to deliver this information to management applications that use DMI, CIM or direct access, eliminating the need for error prone operations like probing system hardware for presence detection.
 
Do note that SMBIOS was never intended for initialization purposes. It was intended to provide information for asset management systems to quickly determine what computers contained what hardware. Unfortunately this means that it might be quite unreliable, especially on cheap/home systems. So ultimately it may not be the best way to determine CPU speed.
 
===SMBios Processor Information===
 
All this is described in the 'Accessing SMBIOS Information' section of the standard (p. 11).
 
''The SMBIOS Entry Point structure, described below, can be located by application software by searching for the anchor-string on paragraph (16-byte) boundaries within the physical memory address range 000F0000h to 000FFFFFh. This entry point encapsulates an intermediate anchor string that is used by some existing DMI browsers.''
 
{| {{wikitable}}
! Offset (hex)
! Field
|-
|00-03
|Anchor string (_SM_ or 5f 53 4d 5f)
|-
|04
|Checksum
|-
|05
|Length
|-
|06
|Major version
|-
|07
|Minor version
|-
|08-09
|Max structure size
|-
|0A
|Entry point revision
|-
|0B-0F
|Formatted area
|-
|10-14
|_DMI_ signature
|-
|15
|Intermediate checksum
|-
|16-17
|Structure table length
|-
|18-1B
|Structure table (physical) address
|-
|1C-1D
|Number of SMBIOS structures
|-
|1E
|SMBIOS revision (BCD)
|}
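
The search described in the quoted paragraph can be sketched in C using the offsets from the table. The function names are hypothetical, and the sketch assumes <code>region</code> is a copy or mapping of physical memory 000F0000h to 000FFFFFh and that all bytes covered by the Length field sum to zero modulo 256 for a valid entry point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum of `len` bytes modulo 256. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    while (len--)
        sum = (uint8_t)(sum + *p++);
    return sum;
}

/*
 * Scan `region` on 16-byte boundaries for the "_SM_" anchor and return the
 * offset of the first entry point that also has a zero checksum and the
 * "_DMI_" signature, or -1 if none is found.
 */
long smbios_find_entry(const uint8_t *region, size_t size)
{
    size_t off;

    for (off = 0; off + 0x1F <= size; off += 16) {
        const uint8_t *ep = region + off;
        uint8_t length = ep[0x05];

        if (memcmp(ep, "_SM_", 4) != 0)
            continue;                       /* no anchor here */
        if (length == 0 || off + length > size)
            continue;                       /* implausible length */
        if (byte_sum(ep, length) != 0)
            continue;                       /* bad checksum */
        if (memcmp(ep + 0x10, "_DMI_", 5) != 0)
            continue;                       /* no intermediate anchor */
        return (long)off;
    }
    return -1;
}

/* Physical address of the structure table (little-endian, offset 18h). */
uint32_t smbios_table_address(const uint8_t *ep)
{
    return (uint32_t)ep[0x18]
         | ((uint32_t)ep[0x19] << 8)
         | ((uint32_t)ep[0x1A] << 16)
         | ((uint32_t)ep[0x1B] << 24);
}
```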
 
The PnP calling convention is not re-explained here, as chances are it will be useless in Protected Mode ...
 
==Links==
*Forum:922
*Forum:8949 featuring info on bogomips, how Linux does it and durand's code.
* https://forum.osdev.org/viewtopic.php?f=1&t=32808
 
 
===Other resources===
* http://www.cs.usfca.edu/~cruse/cs630s04/lesson23.ppt, a crash course on PIT, and how to use it to compute CPU speed.
* http://www.sandpile.org/post/msgs/20004561.htm
* http://www.midnightbeach.com/jon/pubs/rdtsc.htm
 
[[Category:X86 CPU]]
[[Category:Hardware Detection]]