Multitasking Systems

Multitasking Systems are operating systems (or even system extensions) which share available processor time between multiple tasks automatically.

==Types of Multitasking Systems==

There are many ways multitasking can be achieved.

===Cooperative Multitasking===

For cooperative multitasking, a task uses the CPU until it voluntarily gives up the CPU (e.g. it yields or exits). Examples of cooperative multitasking systems are the classic (pre-OS X) Mac OS and Windows 3.x.

In some single-language cooperative multitasking systems, such as Oberon and Ruby, the compiler or interpreter automatically ensures that the code periodically yields control; this allows such programs to run multiple threads even on operating systems such as DOS.
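
To make the idea concrete, here is a hedged sketch in C (a user-space toy, not taken from any real system; all names are invented): each "task" runs until it voluntarily hands control back, and the "scheduler" is nothing more than a loop that trusts the tasks to yield.

<pre>
/* Toy cooperative scheduler: each task does a small step of work and then
   "yields" by returning. Returning false means the task has exited. */
#include <stdio.h>
#include <stdbool.h>

typedef bool (*task_fn)(void);

static int a_steps = 3, b_steps = 2;
static bool task_a(void) { puts("task A runs"); return --a_steps > 0; }
static bool task_b(void) { puts("task B runs"); return --b_steps > 0; }

int main(void)
{
    task_fn tasks[2] = { task_a, task_b };
    int live = 2;

    /* The scheduler only regains control when a task yields (returns);
       a task stuck in an endless loop would hang everything, which is
       the classic weakness of cooperative multitasking. */
    for (int i = 0; live > 0; i = (i + 1) % 2) {
        if (tasks[i] && !tasks[i]()) {
            tasks[i] = NULL;   /* task exited; stop scheduling it */
            live--;
        }
    }
    return 0;
}
</pre>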

===Preemptive Multitasking===

In a preemptive multitasking system, some task switches are not caused by the currently running task voluntarily giving up the CPU; instead, the OS preempts the task for one or more reasons, e.g. because the task has consumed the time it was given or because a higher priority task needs the CPU.
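
A hedged sketch of the mechanism (in C; timer_tick, schedule and TIME_SLICE_TICKS are invented names, not from any particular kernel): a periodic timer interrupt counts down the running task's time slice and forces a switch when it runs out.

<pre>
#define TIME_SLICE_TICKS 10            /* e.g. 10 ms at a 1000 Hz timer */

static int remaining_ticks = TIME_SLICE_TICKS;   /* current task's slice */

extern void schedule(void);            /* picks another task and switches to it */

/* Called from the timer interrupt handler on every tick. */
void timer_tick(void)
{
    if (--remaining_ticks > 0)
        return;                        /* the task keeps the CPU for now */

    /* Time slice used up: the OS takes the CPU away even though the
       running task never volunteered to give it up. */
    remaining_ticks = TIME_SLICE_TICKS;
    schedule();
}
</pre>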

Examples include almost all modern operating systems - e.g. Linux, *BSD, post-3.x Windows, BeOS, AmigaOS.

You can further subdivide these systems into those which can preempt tasks, and those which can preempt ''the kernel itself''. Linux (pre-2.6 kernel) is an example of the former, while e.g. AmigaOS is an example of the latter. This is a major concern for multimedia applications or any "soft" [[Real-Time Systems]], because a non-preemptible kernel introduces latencies that can ruin such "near real-time" performance.

==How does it work==

When a task (process, thread, application, program, ...) is using the CPU, some of the task's state is stored in the CPU - things like the contents of registers, the address of the task's current stack, the current instruction pointer, which virtual address space the task is using, etc.

A task switch involves saving any state that the previous task had in the CPU (that the task may have modified) and then loading any state that the next task expects to be in the CPU. Typically a kernel stores some of this state on the task's (kernel) stack and some of the state in data structures in kernel-space (e.g. in a "task control block" structure).
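
As a minimal sketch, such a structure might look like this in C (assuming 32-bit x86; only the fields used by the example code below are shown, and the names are chosen to match that example rather than any real kernel):

<pre>
#include <stdint.h>

typedef struct thread_control_block {
    uint32_t esp;     /* kernel-stack ESP, saved when the task was switched out */
    uint32_t cr3;     /* physical address of the task's page directory */
    uint32_t esp0;    /* top of the task's kernel stack, copied into TSS.ESP0 */
    /* ...a real kernel adds state, priority, queue links, and so on... */
} thread_control_block;
</pre>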

There are two very different ways of implementing multi-tasking, depending on how the kernel stacks are done.

===Kernel Stack Per Task===

For most kernels, each task has its own kernel stack; as a task moves between user-space and the kernel, it also moves between its user stack and its kernel stack. For this case, task switches only ever happen when something (IRQ, exception, system call) has already caused the task to switch to kernel code and the kernel stack; which means that task switches can only ever occur between tasks that are running kernel code.

For example (NASM syntax, 32-bit 80x86, "cdecl" calling convention, single-CPU only/globals used for "per CPU" variables, no support for FPU/MMX/SSE/AVX, untested):

<pre>
;C declaration:
;   void switch_tasks(thread_control_block *next_thread);
;
;WARNING: Caller is expected to disable IRQs before calling, and enable IRQs again after function returns

switch_tasks:

    ;Save previous task's state

    ;Notes:
    ;  For cdecl; EAX, ECX, and EDX are already saved by the caller and don't need to be saved again
    ;  EIP is already saved on the stack by the caller's "CALL" instruction
    ;  The task isn't able to change CR3 so it doesn't need to be saved
    ;  Segment registers are constants (while running kernel code) so they don't need to be saved

    push ebx
    push esi
    push edi
    push ebp

    mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
    mov [edi+TCB.ESP],esp         ;Save ESP for previous task's kernel stack in the thread's TCB

    ;Load next task's state

    mov esi,[esp+(4+1)*4]         ;esi = address of the next task's "thread control block" (parameter passed on stack)
    mov [current_task_TCB],esi    ;Current task's TCB is the next task TCB

    mov esp,[esi+TCB.ESP]         ;Load ESP for next task's kernel stack from the thread's TCB
    mov eax,[esi+TCB.CR3]         ;eax = address of page directory for next task
    mov ebx,[esi+TCB.ESP0]        ;ebx = address for the top of the next task's kernel stack
    mov [TSS.ESP0],ebx            ;Adjust the ESP0 field in the TSS (used by CPU for CPL=3 -> CPL=0 privilege level changes)
    mov ecx,cr3                   ;ecx = previous task's virtual address space

    cmp eax,ecx                   ;Does the virtual address space need to be changed?
    je .doneVAS                   ; no, virtual address space is the same, so don't reload it and cause TLB flushes
    mov cr3,eax                   ; yes, load the next task's virtual address space
.doneVAS:

    pop ebp
    pop edi
    pop esi
    pop ebx

    ret                           ;Load next task's EIP from its kernel stack
</pre>

Note that the low-level task switch code should be written in pure assembly. If it's written as inline assembly in the middle of a C function, the compiler may add its own "function prologue" and "function epilogue" code that changes the layout of the stack. This matters because, when a new task is being created, the kernel needs to put values on the new task's kernel stack to match the values that "switch_tasks" expects to pop off the new task's stack.
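
As a hedged illustration of that last point (in C, reusing the thread_control_block sketched earlier; kmalloc, KERNEL_STACK_SIZE and kernel_page_directory_phys are assumptions, not names from this article): creating a new kernel task means forging, on its fresh kernel stack, exactly the values that "switch_tasks" will pop.

<pre>
#include <stddef.h>
#include <stdint.h>

#define KERNEL_STACK_SIZE 4096

extern void *kmalloc(size_t size);           /* assumed kernel heap allocator */
extern uint32_t kernel_page_directory_phys;  /* assumed kernel CR3 value */

thread_control_block *create_kernel_task(void (*entry_point)(void))
{
    thread_control_block *tcb = kmalloc(sizeof *tcb);
    uint32_t *sp = (uint32_t *)((uint8_t *)kmalloc(KERNEL_STACK_SIZE)
                                + KERNEL_STACK_SIZE);

    tcb->esp0 = (uint32_t)sp;            /* top of the new kernel stack */

    /* Forge the frame switch_tasks expects: "ret" consumes the entry point
       as the return EIP, then pop ebp/edi/esi/ebx mirror the four pushes.
       Note: entry_point starts with IRQs disabled (the caller of
       switch_tasks was supposed to re-enable them) and must never plainly
       return; real kernels route new tasks through a small trampoline. */
    *--sp = (uint32_t)entry_point;       /* EIP, loaded by "ret" */
    *--sp = 0;                           /* initial ebx */
    *--sp = 0;                           /* initial esi */
    *--sp = 0;                           /* initial edi */
    *--sp = 0;                           /* initial ebp */

    tcb->esp = (uint32_t)sp;             /* where switch_tasks will point ESP */
    tcb->cr3 = kernel_page_directory_phys;
    return tcb;
}
</pre>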

Also note that (for kernels primarily written in higher level languages) the low-level task switch code may be the only part of the scheduler that needs to be rewritten when porting the kernel to a different architecture.

This low-level task switch code may be called directly from other places in the kernel (e.g. when a task is being pre-empted and the caller knows exactly which task is pre-empting the current task); but may also be called by higher-level code that selects a task to switch to (e.g. when the currently running task is being terminated and the kernel doesn't know which task to switch to).
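
For example, the higher-level code could be a trivial round-robin scheduler like the hedged sketch below (in C; current_task_TCB and switch_tasks match the assembly above, while the ready-queue helpers are assumptions):

<pre>
extern thread_control_block *current_task_TCB;
extern thread_control_block *ready_queue_pop(void);          /* assumed */
extern void ready_queue_push(thread_control_block *thread);  /* assumed */
extern void switch_tasks(thread_control_block *next_thread);

/* Must be called with IRQs disabled, per the WARNING in the listing. */
void schedule(void)
{
    thread_control_block *next = ready_queue_pop();
    if (next == NULL)
        return;                          /* no other ready task; keep running */

    ready_queue_push(current_task_TCB);  /* the old task goes back in the queue */
    switch_tasks(next);
    /* Execution resumes here, much later, when another task switches
       back to this one. */
}
</pre>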


===Kernel Stack Per CPU===

For some rare kernels (mostly only a few micro-kernels), there is only one kernel stack per CPU (and each task doesn't have its own kernel stack). In this case task switching becomes radically different because the kernel's stack can't be used to store any task's state. Instead, the user-space state has to be saved in some kind of data structure ("thread control block") immediately after any switch from user-space to kernel-space; and the user-space state has to be loaded from that structure immediately before any switch from kernel-space to user-space.

In this case, the kernel typically has a "thread to return to" variable (for each CPU); any kernel code can modify this variable when it decides a task switch should happen.
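
A hedged sketch of this approach (in C; every name here is an assumption, since the details vary a lot between kernels): user-space state is copied into the thread's control block on every kernel entry, and copied back out of whichever thread the "thread to return to" variable names on every kernel exit.

<pre>
#include <stdint.h>

struct user_state {
    uint32_t eax, ebx, ecx, edx, esi, edi, ebp;
    uint32_t eip, esp, eflags;       /* user-mode instruction/stack pointers */
};

struct thread {
    struct user_state state;
    uint32_t cr3;                    /* this thread's virtual address space */
};

/* Per-CPU variables (plain globals here, assuming one CPU). Any kernel
   code that decides a task switch should happen just changes
   thread_to_return_to. */
struct thread *current_thread;
struct thread *thread_to_return_to;

/* Called immediately after any user -> kernel transition, with the
   register state the entry stub captured. */
void on_kernel_entry(const struct user_state *captured)
{
    current_thread->state = *captured;
}

/* Called immediately before returning to user-space; whichever thread
   "thread_to_return_to" points at is the one that resumes. */
void on_kernel_exit(struct user_state *to_restore)
{
    current_thread = thread_to_return_to;
    *to_restore = current_thread->state;
    /* ...the exit stub would also load CR3 from current_thread->cr3
       if it differs from the current value... */
}
</pre>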

Note that "kernel stack per CPU" is not recommended for beginners; partly because it's only beneficial in rare cases (e.g. it'd be silly for a monolithic kernel), and partly because it becomes a lot more complex as soon as you try to overcome the latency problems caused by having a non-preemptable kernel (e.g. expensive operations, like creating a new process, may need to be broken up into multiple smaller/less expensive pieces that are separated by explicit "preemption points").


==See Also==

===Articles===

*[[Threads]]
*[[Scheduling Algorithms]]

===External Links===