Multiprocessing: Difference between revisions

Revision as of 14:13, 6 April 2017

Add information about Ryzen SMT

osdev>Zebmccorkle

Revision as of 14:53, 13 May 2010 by osdev>Alfaomega08 (no edit summary); latest revision as of 14:13, 6 April 2017 by osdev>Zebmccorkle (Add information about Ryzen SMT)
(6 intermediate revisions by 4 users not shown)
Multiprocessing involves more than one CPU in a computer. There are a variety of ways that multiprocessing may be implemented in hardware.

== SMP (Symmetric Multiprocessing) ==
In theory [[SMP]] means that all CPUs are identical. In practice there may be minor variations between CPUs (e.g. different revisions of the same family of CPUs), but these differences can usually be ignored by operating system software. SMP is the easiest form of multiprocessing for software to support.

SMP includes systems with CPUs implemented in separate chips, systems with CPUs implemented in the same chip (multi-core) and combinations (e.g. a system with 2 separate quad core chips, with a total of 8 physical CPUs).

== SMT (Simultaneous Multithreading) ==
Normally a CPU executes one stream of instructions, where at any time some parts of the CPU may be busy and other parts of the CPU may be idle (e.g. while doing integer instructions, the parts of the CPU that do floating point calculations may be unused), and where the entire CPU may be idle in some cases (e.g. when the CPU needs to wait until data arrives from RAM before it can continue).

The idea behind [[SMT]] is to execute more than one stream of instructions, which increases overall performance by reducing the chance that parts of the CPU become idle.

From the operating system's perspective, SMT looks a lot like SMP, except for performance characteristics. For SMT a single physical CPU executes multiple "threads" (or logical CPUs), and the performance of one thread/logical CPU is affected by work done by other threads/logical CPUs in the same physical CPU/core (as the physical CPU's resources are shared).

For the 80x86 architecture, Intel was the first manufacturer to implement SMT (called "hyperthreading" by Intel).
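Because logical CPUs in the same core share resources, a scheduler may prefer to place a new thread on a core whose logical CPUs are all idle. The following is a minimal sketch of that idea, not how any particular operating system does it. The <code>logical_cpu</code> structure, its fields and the <code>pick_cpu</code> function are hypothetical names invented for illustration, and the sketch assumes the kernel has already worked out which logical CPUs belong to which core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-logical-CPU record; names are illustrative only. */
struct logical_cpu {
    int  core_id;  /* physical core this logical CPU belongs to */
    bool busy;     /* true if a thread is currently running on it */
};

/* Returns true if any logical CPU on the given core is busy. */
bool core_has_busy_cpu(const struct logical_cpu *cpus, size_t count, int core_id)
{
    for (size_t i = 0; i < count; i++) {
        if (cpus[i].core_id == core_id && cpus[i].busy)
            return true;
    }
    return false;
}

/*
 * Pick a logical CPU for a new thread.
 * First choice: an idle logical CPU on a core with no busy siblings,
 * so the new thread does not compete for that core's shared resources.
 * Fallback: the first idle logical CPU found.
 * Returns the index into cpus[], or -1 if every logical CPU is busy.
 */
int pick_cpu(const struct logical_cpu *cpus, size_t count)
{
    assert(cpus != NULL || count == 0);

    int fallback = -1;
    for (size_t i = 0; i < count; i++) {
        if (cpus[i].busy)
            continue;
        if (!core_has_busy_cpu(cpus, count, cpus[i].core_id))
            return (int)i;          /* whole core is idle */
        if (fallback < 0)
            fallback = (int)i;      /* idle, but a sibling is busy */
    }
    return fallback;
}
```

With two cores of two logical CPUs each, where only the first logical CPU of core 0 is busy, <code>pick_cpu</code> skips the idle sibling on core 0 and returns a logical CPU on core 1. A real scheduler would weigh this against other factors (cache warmth, power management, thread priorities); this sketch covers only the sibling-sharing consideration.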
When Intel first introduced hyperthreading it got some negative publicity due to performance problems. These were partly caused by the way it was implemented, and partly because a lot of software wasn't designed for it (it was single-threaded) and operating system schedulers weren't optimized for it. Intel stopped using SMT/hyperthreading for a while, but since then software has caught up and later CPUs (Atom, Core i7) show significant performance improvements from SMT (up to about 30% on multi-threaded loads). AMD's Ryzen platform, released in 2017, introduced SMT to a non-Intel 80x86 CPU for the first time.

== AMP (Asymmetric Multiprocessing) ==
[[AMP]] means that at least one of the CPUs is different. For example, an embedded system might have an ARM CPU and a special purpose CPU for digital signal processing.

AMP is rare for desktop/server computers; however, this may change in the future as Intel has plans to put stream processors ("Larrabee", mainly intended as a GPGPU to be used for graphics, physics and HPC) in the same system as general purpose 80x86 CPUs (possibly even in the same multi-core chip). At some time in the future it may be necessary for a general purpose 80x86 operating system to have 2 schedulers: one for traditional 80x86 CPUs and another for stream processors.

== NUMA (Non-Uniform Memory Access) ==
For SMP the time it takes to access a resource (e.g. RAM) is the same for all CPUs. For [[NUMA]] this isn't the case, and some CPUs may be able to access a resource faster than other CPUs. The name "NUMA" is misleading as RAM is not the only resource affected: it's entirely possible for some CPUs to have faster access to other devices (e.g. hard disk, video, ethernet, etc) than other CPUs.

Examples of NUMA are the AMD Opteron and Intel Core i7 processors, where you might have 2 quad core CPUs and 2 sets of RAM chips, and each quad core CPU is directly connected to one set of RAM chips.
In this case, for a CPU to access RAM that is not directly connected to it, it needs to ask the other quad core CPU to access the RAM on its behalf (which causes extra latency).

For most NUMA 80x86 systems the NUMA ratio is quite low (around 1.2), and an operating system could treat these systems as UMA without severe performance problems.

== Topology ==
This map may include the contents of each NUMA domain (number of CPUs and their IDs, number of RAM areas and their sizes and locations in the physical address space, number of I/O hubs and their identification, etc), plus performance data (e.g. tables that can be used to determine the relative cost of accessing a specific location in RAM, a specific I/O hub and/or another CPU from any CPU; and/or information about cache sharing between CPUs).

==See Also==
===Threads===

[[Category:OS theory]]
[[Category:Multiprocessing]]