Multitasking Systems
Revision as of 05:49, 7 August 2018
Multitasking Systems are operating systems (or even system extensions) which share available processor time between multiple tasks automatically.
==Types of Multitasking Systems==
There are many ways multitasking can be achieved.
===Cooperative Multitasking===
For cooperative multitasking, a task uses the CPU until it voluntarily gives up the CPU (e.g. yields or exits). Examples of cooperative multitasking systems are pre-OS X Mac OS and Windows 3.x.
In some single-language cooperative multitasking systems, such as Oberon and Ruby, the compiler/interpreter automatically ensures that the code periodically yields control; this allows such programs to run multiple threads even on operating systems such as DOS.
===Preemptive Multitasking===
In a preemptive multitasking system, some task switches are not caused by the currently running task voluntarily giving up the CPU; instead the kernel forcibly takes the CPU away for one or more reasons, e.g. the task has consumed the time slice it was given, or a higher-priority task needs the CPU.
Examples include almost all modern operating systems - e.g. Linux, *BSD, post-3.x Windows, BeOS, AmigaOS.
You can further subdivide these systems into those that can preempt tasks, and those that can preempt ''the kernel itself''. Linux (pre-2.6 kernel) is an example of the former, while e.g. AmigaOS is an example of the latter. This is a major concern for multimedia applications or any "soft" [[Real-Time Systems]], because a non-preemptible kernel introduces latencies that can ruin such "near real-time" performance.
==How does it work==
When a task (process, thread, application, program, ...) is using the CPU, some of the task's state is stored in the CPU - things like the contents of registers, the address of the task's current stack, the current instruction pointer, which virtual address space the task is using, etc.
A task switch involves saving any state that the previous task had in the CPU (that the task may have modified) and then loading any state that the next task expects to be in the CPU. Typically a kernel stores some of this state on the task's (kernel) stack and some of it in data structures in kernel-space (e.g. in a "task control block" structure).
There are two very different ways of implementing multitasking, depending on how the kernel stacks are done.
===Kernel Stack Per Task===
For most kernels, each task has its own kernel stack: a task uses its user stack while running user-space code and its kernel stack while running kernel code. In this case, task switches only ever happen when something (an IRQ, an exception, a system call) has already caused the task to switch to kernel code and the kernel stack; which means that task switches can only ever occur between tasks that are running kernel code.
For example (NASM syntax, 32-bit 80x86, "cdecl" calling convention, single-CPU only/globals used for "per CPU" variables, no support for FPU/MMX/SSE/AVX, untested):
<pre>
;C declaration:
;   void switch_tasks(thread_control_block *next_thread);
;
;WARNING: Caller is expected to disable IRQs before calling, and enable IRQs again after function returns

switch_tasks:

    ;Save previous task's state

    ;Notes:
    ;  For cdecl; EAX, ECX, and EDX are already saved by the caller and don't need to be saved again
    ;  EIP is already saved on the stack by the caller's "CALL" instruction
    ;  The task isn't able to change CR3 so it doesn't need to be saved
    ;  Segment registers are constants (while running kernel code) so they don't need to be saved

    push ebx
    push esi
    push edi
    push ebp

    mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
    mov [edi+TCB.ESP],esp         ;Save ESP for previous task's kernel stack in the thread's TCB

    ;Load next task's state

    mov esi,[esp+(4+1)*4]         ;esi = address of the next task's "thread control block" (parameter passed on stack)
    mov [current_task_TCB],esi    ;Current task's TCB is the next task TCB

    mov esp,[esi+TCB.ESP]         ;Load ESP for next task's kernel stack from the thread's TCB
    mov eax,[esi+TCB.CR3]         ;eax = address of page directory for next task
    mov ebx,[esi+TCB.ESP0]        ;ebx = address for the top of the next task's kernel stack
    mov [TSS.ESP0],ebx            ;Adjust the ESP0 field in the TSS (used by CPU for CPL=3 -> CPL=0 privilege level changes)
    mov ecx,cr3                   ;ecx = previous task's virtual address space

    cmp eax,ecx                   ;Does the virtual address space need to be changed?
    je .doneVAS                   ; no, virtual address space is the same, so don't reload it and cause TLB flushes
    mov cr3,eax                   ; yes, load the next task's virtual address space

.doneVAS:
    pop ebp
    pop edi
    pop esi
    pop ebx

    ret                           ;Load next task's EIP from its kernel stack
</pre>
Note that the low-level task switch code should be written in pure assembly. If it's written in inline assembly in the middle of a C function then the compiler may add its own "function prologue" and "function epilogue" code that changes the layout of the stack. This is important because when a new task is being created the kernel needs to put values on the new task's kernel stack to match the values that the "switch_tasks" expects to pop off of the new task's stack.
Also note that (for kernels primarily written in higher level languages) the low-level task switch code may be the only part of the scheduler that needs to be rewritten when porting the kernel to a different architecture.
This low-level task switch code may be called directly from other places in the kernel (e.g. when a task is being pre-empted and the caller knows exactly which task is pre-empting the current task); but may also be called by higher-level code that selects a task to switch to (e.g. when the currently running task is being terminated and the kernel doesn't know which task to switch to).
===Kernel Stack Per CPU===
For some rare kernels (mostly only a few micro-kernels), there is only one kernel stack per CPU (and each task doesn't have its own kernel stack). In this case task switching becomes radically different because the kernel's stack can't be used to store any task's state. Instead, the user-space state has to be saved in some kind of data structure ("thread control block") immediately after any switch from user-space to kernel-space; and the user-space state has to be loaded from that structure immediately before any switch from kernel-space to user-space.
In this case, the kernel typically has a "thread to return to" variable (one per CPU), and any kernel code can modify this variable when it decides a task switch should happen.
Note that "kernel stack per CPU" is not recommended for beginners; partly because it's only beneficial in rare cases (e.g. it'd be silly for a monolithic kernel), and partly because it becomes a lot more complex as soon as you try to overcome the latency problems caused by having a non-preemptable kernel (e.g. expensive operations, like creating a new process, may need to be broken up into multiple smaller/less expensive pieces that are separated by explicit "preemption points").
==See Also==
===Articles===
*[[Threads]]
===External Links===
*[https://en.wikipedia.org/wiki/Computer_multitasking Computer Multitasking] on Wikipedia.