Detecting CPU Topology (80x86): Difference between revisions

From OSDev.wiki
Jump to navigation Jump to search
[unchecked revision][unchecked revision]
Content added Content deleted
m (Reformat)
(Corrected section on CPUID 0x4 - based on experimentation and specs. Tested on dual Core i5.)
Line 79: Line 79:
=== INTEL: CPUID eax=0x00000004 ===
=== INTEL: CPUID eax=0x00000004 ===


For Intel CPUs (and <u>not</u> AMD), if "CPUID eax=0x0000000B" wasn't supported then this is the next CPUID function to try. You're only interested in EAX bits 26 to 31, which tells you the number of APIC IDs reserved for this chip. Use the "count" field you got earlier (from CPUID eax=0x00000001) rounded up to the next power of 2 to determine "logical_CPU_bits", then do "temp = EAX_bits_26_to_31 >> logical_CPU_bits" and use the result rounded up to the next power of 2 to determine "core_bits".
For Intel CPUs (and <u>not</u> AMD), if "CPUID eax=0x0000000B" wasn't supported then this is the next CPUID function to try. You're only interested in EAX bits 26 to 31, which tells you the number of APIC IDs reserved for this chip. The count field from earlier tells you the max number of logical CPUs in a package, whereas (EAX >> 26) & 0x3F from this function tells you the maximum number of cores in a package, -1. Hence, to get "logical_CPU_bits" and "core_bits", write a function that counts the number of left-shifts to get 0, which is the number of bits to represent a number. core_bits is func((EAX >> 26) & 0x3F), logical_CPU_bits is func(count - 1) - core_bits. Note that the initial ecx value does not affect EAX bits 26 to 31.


If "CPUID eax=0x00000004" isn't supported, then it's a single-core CPU with hyper-threading. Use the "count" field you got earlier (from CPUID eax=0x00000001) rounded up to the next power of 2 to determine "logical_CPU_bits", then set "core_bits = 0".
If "CPUID eax=0x00000004" isn't supported, then it's a single-core CPU with hyper-threading. Use the "count" field you got earlier (from CPUID eax=0x00000001) rounded up to the next power of 2 to determine "logical_CPU_bits", then set "core_bits = 0".

Revision as of 04:21, 16 August 2016

For certain kinds of optimizations, sometimes it's useful for applications and OSs to detect CPU topology and adjust (optimize) its behaviour accordingly.

Detecting CPU Topology

For complete CPU topology, there's 4 levels to consider:

  • Which logical CPU within a core (implemented, for example, by Intel's hyperthreading)
  • Which core within a chip
  • Which chip within a NUMA domain
  • Which NUMA domain

For convenience, this can be expressed as a colon separated ID. For example, "CPU 3:2:1:0" would be logical CPU #0 in core #1 of chip/package number 2, in NUMA domain #3.

General Steps

To begin with, the ACPI MADT table or the MP Specification table should be used to determine the APIC IDs for all CPUs. To better organise the information, a table of structures can be created that contains the information that is different for each CPU (e.g. APIC ID, topology information, etc.). This "CPU info" structure could include a pointer to a separate "CPU description" structure that stores information that can be the same for different CPUs (e.g. vendor, family/model/stepping, brand string, feature flags, cache sizes, etc).

To detect NUMA domain information the ACPI 'SRAT' (System Resource Affinity Table) has to be used. First, preset the "NUMA domain" field for each CPU to "unknown" (in case the CPU isn't mentioned in the SRAT table). Then, for each entry in the SRAT table you determine if it's a CPU or memory, and if the entry is a CPU the 'APIC ID' field is used to determine which CPU and the 'proximity domain' field to determine which NUMA domain that CPU is in; and set the NUMA domain for that CPU. Note that there's 2 different structures in the SRAT table - one for 8-bit local APIC IDs and one for 32-bit APIC IDs, and if you support x2APIC you need to use both.

To get nice clean results in the later steps, the list of CPUs should be sorted now - in order of NUMA domain, each NUMA domain in order of APIC ID.

For cores and logical CPUs; for each CPU, CPUID should to determine:

  • Number of bits in the APIC ID that are used to identify which logical CPU
  • Number of bits in the APIC ID that are used to identify which core

The values are used for a set of calculations:

  • logical_CPU_number_within_core = APIC_ID & ( (1 << logical_CPU_bits) -1)
  • core_number_within_chip = (APIC_ID >> logical_CPU_bits) & ( (1 << core_bits) -1)
  • chip_ID = APIC_ID & ~( (1 << (logical_CPU_bits+core_bits) ) -1)

How to use CPUID to determine these values is described below in the "Using CPUID" section. It should be noted that you only need to obtain 2 values: "logical_CPU_bits" and "core_bits".

The "chip ID" is a temporary value that you could discard later.

These calculations should be done separately for each logical CPU, because it's theoretically possible for different physical chips in the same computer to use APIC ID bits differently.

Once you've done these calculations the "chip number within NUMA domain" can be determined. All CPUs that have the same "chip ID" and the same NUMA domain are in the same chip.

For example:

    for(domain = 0; domain <= highest_domain; domain++) {
        chip_number = 0;
        for(firstCPU = 0; firstCPU < total_CPUs; firstCPU++) {
            if( CPU_info_table[firstCPU].domain == domain) {
                if( CPU_info_table[firstCPU].chipNumber == -1) {
                    CPU_info_table[firstCPU].chipNumber = chip_number;
                    chipID = CPU_info_table[firstCPU].chipID;
                    for(i = firstCPU; i < total_CPUs; i++) {
                        if( CPU_info_table[i].domain == domain) {
                            if( CPU_info_table[i].chipID == chipID) {
                                CPU_info_table[i].chipNumber = chip_number;
                            }
                        }
                    }
                }
                chipNumber++;
            }
        }
    }

There's a bit of strangeness which should be explained here. In general it doesn't make too much sense for the same chip to be in different NUMA domains. However, when Intel designed x2APIC they decided that Logical APIC ID (which shouldn't be confused with the "APIC ID") and the "Destination Format Register" would be hardwired; and that "Logical Destination Mode" would only support "cluster model". They decided that the highest bits of the Logical APIC ID would determine which "group of CPUs" and the lowest 4 bits of the Logical APIC ID would determine which CPU within the group. The end result is that, if a chip has more than 16 logical CPUs it will be split into different NUMA domains. For the algorithm above, this case is treated as separate chips in different NUMA domains (even though it is technically one physical chip spread across multiple NUMA domains). This is deliberate, as it avoids complications everywhere else (in the APIC/interrupt handling code, in the scheduler, etc).

Using CPUID

From the "General Steps" section above, we need to obtain 2 values: "logical_CPU_bits" and "core_bits".

There are several different CPUID functions that are useful for detecting CPU topology. They are shown in order below.

CPUID eax=0x00000001

This is the first step. Check if the "HTT" flag is set. If the HTT flag is clear (or if the CPU doesn't support CPUID) then it's an old single-core CPU without hyper-threading. In that case "logical_CPU_bits = 0" and "core_bits = 0" and you're done.

Otherwise (if the "HTT" flag is set), you want to get the "count" field from EBX bits 16 to 23. You (might) need it later.

INTEL: CPUID eax=0x0000000B

For Intel CPUs (and not AMD), this CPUID function tells you "Number of bits to shift right APIC ID to get next level APIC ID", and needs to be used twice. The first time (with ECX=0) it tells you how many bits of the APIC ID are used to identify the logical CPU within each core (logical_CPU_bits). The second time (with ECX=1) it tells you how many bits of the APIC ID are used to identify the core and logical CPU within the chip; and to get "core_bits" from this value you subtract "logical_CPU_bits" from it.

INTEL: CPUID eax=0x00000004

For Intel CPUs (and not AMD), if "CPUID eax=0x0000000B" wasn't supported then this is the next CPUID function to try. You're only interested in EAX bits 26 to 31, which tells you the number of APIC IDs reserved for this chip. The count field from earlier tells you the max number of logical CPUs in a package, whereas (EAX >> 26) & 0x3F from this function tells you the maximum number of cores in a package, -1. Hence, to get "logical_CPU_bits" and "core_bits", write a function that counts the number of left-shifts to get 0, which is the number of bits to represent a number. core_bits is func((EAX >> 26) & 0x3F), logical_CPU_bits is func(count - 1) - core_bits. Note that the initial ecx value does not affect EAX bits 26 to 31.

If "CPUID eax=0x00000004" isn't supported, then it's a single-core CPU with hyper-threading. Use the "count" field you got earlier (from CPUID eax=0x00000001) rounded up to the next power of 2 to determine "logical_CPU_bits", then set "core_bits = 0".

AMD: CPUID eax=0x80000008

For AMD CPUs (and not Intel), use this function to determine "core_bits". First check ECX bits 12 to 15, if it is not zero, then it contains "core_bits". Otherwise, use ECX bits 0 to 7 to determine the number of cores, round it up to the next power of 2 and use it to determine "core_bits".

Now that you've got "core_bits"; use the "count" field you got earlier (from CPUID eax=0x00000001) and shift it right by "core_bits" bits. Then round the result up to the next power of 2 and use it to determine "logical_CPU_bits".

AMD: CPUID eax=0x00000001

For AMD CPUs (and not Intel), if CPUID eax=0x80000008 isn't supported just assume that the "count" field you got earlier (from CPUID eax=0x00000001) is the number of cores; and round it up to the next power of 2 and use it to determine "core_bits". In this case set "logical_CPU_bits" to zero.

See Also

Articles