Brendan's Multi-tasking Tutorial

Note: I use the word "task" as a generic term to mean "the thing that a scheduler schedules". For operating systems that don't support multi-threading, a task might be a process. For operating systems that do support multi-threading, a task would be a thread.
 
==Step 1: Initialisation and "Switch_to_task()"==
 
Start by creating some kind of data structure that will keep track of a task's information. Typically this is called a "process control block" or "thread control block". At a minimum, this structure needs to contain:

* the task's saved kernel stack top (e.g. ESP), updated during task switches
* the task's virtual address space (e.g. CR3)
* any other bookkeeping that later steps will need (a "next" field for linked lists of tasks, a "state", the amount of CPU time used, ...)

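A rough sketch (assuming 32-bit 80x86 and C; the field names here are placeholders that later examples in this tutorial will refer to):

<syntaxhighlight lang="c">
typedef struct thread_control_block {
    uint32_t esp;                       // Task's saved kernel stack top, updated during task switches
    uint32_t cr3;                       // Task's virtual address space (physical address of its page directory)
    uint32_t esp0;                      // Top of the task's kernel stack (for the TSS's ESP0 field)
    struct thread_control_block *next;  // Next task in whichever linked list this task is on
    int state;                          // What the task is doing (running, ready to run, blocked, ...)
    uint64_t time_used;                 // CPU time the task has consumed (see Step 3)
} thread_control_block;
</syntaxhighlight>
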
For example (NASM syntax, 32-bit 80x86, "cdecl" calling convention, single-CPU only/globals used for "per CPU" variables, no support for FPU/MMX/SSE/AVX, untested):
 
<syntaxhighlight lang="asm">
;C declaration:
;   void switch_to_task(thread_control_block *next_thread);
;
;WARNING: Caller is expected to disable IRQs before calling, and enable IRQs again after function returns

switch_to_task:

    ;Save previous task's state

    ;Notes:
    ;  For cdecl: EAX, ECX and EDX are already saved by the caller and don't need to be saved again
    ;  EIP is already saved on the stack by the caller's "CALL" instruction
    ;  The task isn't able to change CR3 so it doesn't need to be saved
    ;  Segment registers are constants (while running kernel code) so they don't need to be saved

    push ebx
    push esi
    push edi
    push ebp

    mov edi,[current_task_TCB]    ;edi = address of the previous task's "thread control block"
    mov [edi+TCB.ESP],esp         ;Save ESP for previous task's kernel stack in the task's TCB

    ;Load next task's state

    mov esi,[esp+(4+1)*4]         ;esi = address of the next task's "thread control block" (parameter passed on stack)
    mov [current_task_TCB],esi    ;Current task's TCB is the next task's TCB

    mov esp,[esi+TCB.ESP]         ;Load ESP for next task's kernel stack from the task's TCB
    mov eax,[esi+TCB.CR3]         ;eax = address of page directory for next task
    mov ebx,[esi+TCB.ESP0]        ;ebx = address for the top of the next task's kernel stack
    mov [TSS.ESP0],ebx            ;Adjust the ESP0 field in the TSS (used by CPU for CPL=3 -> CPL=0 privilege level changes)
    mov ecx,cr3                   ;ecx = previous task's virtual address space

    cmp eax,ecx                   ;Does the virtual address space need to be changed?
    je .doneVAS                   ; no, virtual address space is the same, so don't reload it and flush the TLB
    mov cr3,eax                   ; yes, load the next task's virtual address space
.doneVAS:

    pop ebp
    pop edi
    pop esi
    pop ebx

    ret                           ;Load next task's EIP from its kernel stack
</syntaxhighlight>
 
Next, write a function to create a new kernel task. This is a little bit like the code you already wrote for "initialise_multitasking()" (because it allocates some memory for the structure used to keep track of the task's information and fills it in); except that you will need to allocate a new kernel stack for the new task and put values on the new kernel stack to match the values that your "switch_to_task(task)" function expects to pop off the stack after switching to the task. Don't forget that the values on the new kernel stack will include a "return EIP" (the address that the "switch_to_task(task)" function will return to), which should come from an input parameter of the "create_kernel_task()" function (e.g. like "create_kernel_task(void * startingEIP)", except probably with a function pointer instead of a void pointer if you can remember the correct syntax for that).
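
A minimal sketch (assuming "kmalloc()" exists, "KERNEL_STACK_SIZE" is a hypothetical constant, and a "switch_to_task()" that pops EBP, EDI, ESI and EBX before its RET - adjust the pushed values to match whatever your low level code actually pops):

<syntaxhighlight lang="c">
thread_control_block *create_kernel_task(void (*startingEIP)(void)) {
    thread_control_block *task = kmalloc(sizeof(thread_control_block));
    uint32_t *stack_top = (uint32_t *)((uintptr_t)kmalloc(KERNEL_STACK_SIZE) + KERNEL_STACK_SIZE);
    uint32_t *stack = stack_top;

    *--stack = (uint32_t)startingEIP;   // "Return EIP" that switch_to_task's RET will jump to
    *--stack = 0;                       // Initial EBX
    *--stack = 0;                       // Initial ESI
    *--stack = 0;                       // Initial EDI
    *--stack = 0;                       // Initial EBP

    task->esp = (uint32_t)stack;        // Saved kernel stack top (what switch_to_task loads into ESP)
    task->esp0 = (uint32_t)stack_top;   // Top of the kernel stack (for the TSS's ESP0 field)
    task->cr3 = current_task_TCB->cr3;  // Kernel tasks all share the kernel's virtual address space
    task->next = NULL;
    return task;
}
</syntaxhighlight>

Note that if the new task's function ever returns there's nothing sensible on the stack for it to return to; tasks should call something like "terminate_task()" (see Step 12) instead of returning.
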
Having tasks that explicitly switch to each other is a little inconvenient, and it'd be nice if the kernel could automatically select a task to switch to next, so the next step is to write a function that does this.
 
''Note: If you're confident in your abilities you might be tempted to skip this step and go directly to "Step 4: "Schedule()" version 2" below; but this step is so simple that skipping it won't save much time and it's probably better to get a little experience with "extremely simple" before moving on to "very simple".''
 
The code to select which task to give CPU time to next has a huge effect on the nature and "feel" of the OS; and it's possible for this code to become very complicated for this reason. However, it's better to start simple ("learn to walk before you run") so we're going to start with the simplest (and most awful) code possible. More specifically, we're going to start with a cooperative round robin scheduler; and then (later on) we're going to replace it with something a little more complex (that is a lot less awful).

Essentially, you just need to set the "next_task" field in the data structure for each task's information, in both the code to initialise multi-tasking and the code to create a new task.
 
Once that's done, you can write a "schedule()" function that selects a task to switch to and then switches to the selected task. You only need a single line of code to select the next task, like "selected_task = current_task_TCB->next_task;", and you already have a function to switch to a specific task; and both of these things can be combined. In other words, the "schedule()" function can contain a single line of code like "switch_to_task(current_task_TCB->next_task);".
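
Put together, the first version really is this small (a sketch, assuming the "next_task" field described above):

<syntaxhighlight lang="c">
void schedule(void) {
    switch_to_task(current_task_TCB->next_task);    // Cooperative round robin: just switch to the next task
}
</syntaxhighlight>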
 
As soon as you've finished implementing "schedule()" you can test it using the kernel tasks you were using for testing earlier, just by replacing the calls to "switch_to_task()" with calls to "schedule()". If you previously had different code for each kernel task (so that each task could switch to the right next task), you can also delete the duplicates so that you end up with multiple additional kernel tasks executing the same code.
 
 
==Step 3: Time Accounting (Optional)==

For time accounting, when certain things happen you want to work out how much the counter has changed since the last time it was read (to determine how much time has elapsed); then add the elapsed time to the amount of time that the current task has consumed, and update the value that was read last; a little bit like this:
 
<syntaxhighlight lang="c">
void update_time_used(void) {
    current_count = read_counter();
    elapsed = current_count - last_count;   // How much time has passed since the last update
    last_count = current_count;             // Update the value that was read last
    current_task_TCB->time_used += elapsed;
}
</syntaxhighlight>
 
The first place this needs to be called is during task switches, to update the amount of time that the previously running task consumed before switching to the next task. For extra goodness "update_time_used()" can also be called when someone asks the kernel how much time a task has consumed, so that (e.g.) if a task has been running for 10 ms already then that 10 ms is included in the time a task has consumed.
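
For example (a sketch; "get_time_used()" is a hypothetical query function, and the locking that Step 5 introduces is omitted here):

<syntaxhighlight lang="c">
uint64_t get_time_used(thread_control_block *task) {
    if(task == current_task_TCB) {
        update_time_used();     // Include time consumed since the last task switch
    }
    return task->time_used;
}
</syntaxhighlight>
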
==Step 4: Task States and "Schedule()" Version 2==
 
For a real OS (regardless of which scheduling algorithm/s the scheduler uses) most task switches happen because the currently running task has to wait for something (data received from a different task or from storage devices/file systems or networking, or for time to pass, or for the user to press a key or move the mouse, or for a mutex to become available, etc.); or happen because something that a task was waiting for happened, causing that task to unblock and preempt the currently running task.
 
To prepare for this, we want to add some kind of "state" to each task so that it's easy to see what a task is doing (e.g. if it's waiting and what it's waiting for).
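
For example (a sketch - the names and representation are up to you; these particular states are the ones the later steps of this tutorial end up needing):

<syntaxhighlight lang="c">
#define RUNNING          0      // Currently using the CPU (not on the "ready to run" list)
#define READY            1      // On the scheduler's "ready to run" list
#define PAUSED           2      // Blocked until another task un-pauses it (Step 6)
#define SLEEPING         3      // Blocked until its wake-up time is reached (Step 8)
#define TERMINATED       4      // Waiting for the cleaner task to free its resources (Step 12)
#define WAITING_FOR_LOCK 5      // Blocked waiting to acquire a semaphore or mutex (Step 13)
</syntaxhighlight>
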
Finally, the "schedule()" function needs to be changed so that it removes the task from the "ready to run" list before doing a task switch, making it something like this:
 
<syntaxhighlight lang="c">
void schedule(void) {
    if( first_ready_to_run_task != NULL) {
        thread_control_block *task = first_ready_to_run_task;
        first_ready_to_run_task = task->next;   // Remove the task from the "ready to run" list
        switch_to_task(task);
    }
}
</syntaxhighlight>
 
Note that if there's only one task you don't really want to do any task switches (because there are no other tasks to switch to), and in this case the task will be "running" and the list of "ready to run" tasks will be empty. The previous version of "schedule()" was a little less efficient (it would've done a task switch from one task to the same task).
 
 
==Step 5: Race Conditions and Locking Version 1==
When there's a single CPU, in theory, this can be done just by disabling IRQs in the places that matter. For multiple CPUs this doesn't work (e.g. disabling IRQs on one CPU won't stop a different CPU from modifying the scheduler's list of "ready to run" threads at an unfortunate time) and you need something more robust (e.g. a spinlock).
 
We won't be implementing support for multiple CPUs; but that doesn't mean we can't (and shouldn't) make it a little easier to add support for multiple CPUs much later. For this reason I'm going to pretend that there's a "big scheduler lock" and implement functions to acquire and release this lock; but then I'm not going to actually have a lock, and I'm just going to disable and enable IRQs instead.
 
Unfortunately there's a minor problem - what if IRQs are already disabled for some other reason? For assembly language this isn't a problem - you can PUSHFD to save the original flags and disable IRQs when you enter a critical section, and then POPFD the original flags when you leave the critical section. For higher level languages you can't just leave stuff on the stack like this because the compiler will get horribly confused and think that local variables are in the wrong places, etc.
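
For illustration, the assembly language approach just described might look like this (a sketch):

<syntaxhighlight lang="asm">
pushfd        ;Save the original EFLAGS (including the interrupt flag)
cli           ;Disable IRQs for the critical section

;... critical section ...

popfd         ;Restore the original EFLAGS (IRQs are re-enabled only if they were enabled before)
</syntaxhighlight>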

For a higher level language there are two ways around this: keep the original state of the interrupt flag somewhere (e.g. in a variable) and restore it when leaving the critical section, or keep a count of how many times the lock has been acquired and only enable IRQs when that count returns to zero.

The first option is probably more efficient; but we don't really care much about getting the highest performance possible at this stage, and because I want to make it look a little bit like a lock (to make it easier to support multiple CPUs later) I'm going to use the second option. It might look like this:
 
<syntaxhighlight lang="c">
int IRQ_disable_counter = 0;

void lock_scheduler(void) {
#ifndef SMP
    CLI();
    IRQ_disable_counter++;
#endif
}

void unlock_scheduler(void) {
#ifndef SMP
    IRQ_disable_counter--;
    if(IRQ_disable_counter == 0) {
        STI();
    }
#endif
}
</syntaxhighlight>
 
Once that's done, you need to add a comment to your "schedule()" function and to your "switch_to_task()" function saying that the caller is responsible for making sure the scheduler is locked before calling.

Then you need to comply with the new comment by locking and unlocking the scheduler at the right places. Fortunately you only have a few kernel tasks that call "schedule()" directly, so you only need to add locking/unlocking to them. The end result might look a little like this:
 
<syntaxhighlight lang="c">
void test_kernel_task(void) {
    unlock_scheduler();     // A task begins here holding the scheduler's lock (acquired by whoever called "switch_to_task()")

    for(;;) {
        lock_scheduler();
        schedule();
        unlock_scheduler(); // When this task gets CPU time again, it unlocks the scheduler
    }
}
</syntaxhighlight>
 
This can be a little confusing depending on your perspective. From the perspective of one task, it locks the scheduler, does a task switch, and then when it gets CPU time again it unlocks the scheduler; and this is fine (the scheduler is always unlocked after it's locked). From the perspective of the CPU, one task locks the scheduler, then the scheduler does a task switch, and a completely different task unlocks the scheduler; and this is also fine (the scheduler is still unlocked after it's locked).
 
Finally, you should test to make sure your multi-tasking still works the same.
 
 
==Step 6: Blocking and Unblocking Tasks==
Blocking a task is incredibly easy (mostly because we prepared for it in the last step by making sure a task in the "running" state is not on the scheduler's "ready to run" list). You only need to set the task's state and then find another task to run, which might look a little like this:
 
<syntaxhighlight lang="c">
void block_task(int reason) {
    lock_scheduler();
    current_task_TCB->state = reason;   // Set the task's state (e.g. PAUSED) so we know why it's blocked
    schedule();                         // Find something else for the CPU to do
    unlock_scheduler();
}
</syntaxhighlight>
 
Unblocking a task is a little more involved. The first step is to decide if the task being unblocked should (or shouldn't) preempt the currently running task. For now (because we don't have any fancy task priorities or anything) we could never preempt, or we could only preempt if there was only one running task (because it's likely that if there was only one running task that task has been able to use lots of CPU time anyway). For fun (and because it helps to show how preemption would work) I'm going to go with the latter option.

The code might look something like this:
 
<syntaxhighlight lang="c">
void unblock_task(thread_control_block * task) {
    lock_scheduler();
    task->state = READY;
    task->next = NULL;
    if(first_ready_to_run_task == NULL) {
        // Nothing else was "ready to run", so pre-empt: put the task on the (empty) list and switch to it
        first_ready_to_run_task = task;
        last_ready_to_run_task = task;
        schedule();
    } else {
        // There's at least one task on the "ready to run" list already, so don't pre-empt
        last_ready_to_run_task->next = task;
        last_ready_to_run_task = task;
    }
    unlock_scheduler();
}
</syntaxhighlight>
 
The other thing we'll need is a reason to block a task. For this, we can add the ability for a task to voluntarily pause itself until some other task un-pauses it, which is something that might be useful later. To implement this we only need to define a new task state (PAUSED).
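
For example (a sketch; "pause_self()" and "unpause()" are hypothetical wrappers):

<syntaxhighlight lang="c">
void pause_self(void) {
    block_task(PAUSED);     // Block until some other task un-pauses us
}

void unpause(thread_control_block *task) {
    unblock_task(task);     // Make the paused task "ready to run" again (or pre-empt)
}
</syntaxhighlight>
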
To fix the problem you need a function that uses absolute time instead of relative time (e.g. "sleep until time since boot is >= 123456"). This might look a little like this:
 
<syntaxhighlight lang="c">
delay = 250000000;                      // 250000000 nanoseconds is 250 milliseconds
next_update = get_time_since_boot();

for(;;) {
    do_something();
    next_update += delay;               // Advance the deadline by exactly 250 ms each time
    nano_sleep_until(next_update);
}
</syntaxhighlight>
 
For code like this, the time between calls to "do_something()" still varies, but if it's possible to call "do_something()" often enough you can guarantee that these variations will average out and that "do_something()" will be called every 250 ms on average. Of course if there's a design flaw and "do_something()" takes too long (e.g. longer than 250 ms), or if there are so many other tasks consuming CPU time that this task can't keep up (which is a problem that can be fixed with task priorities that we don't have yet), then it isn't possible to call "do_something()" often enough - if that happens, the "nano_sleep_until(next_update);" will do nothing (no delay at all) and "do_something()" will be called as often as possible, which is still the best behaviour anyone can expect in those conditions.

Fortunately, if you have "nano_sleep_until()" then you can use it to implement "nano_sleep()", like this:
 
<syntaxhighlight lang="c">
void nano_sleep(uint64_t nanoseconds) {
    nano_sleep_until(get_time_since_boot() + nanoseconds);
}
</syntaxhighlight>
 
In a similar way, if you have either "nano_sleep_until()" or "nano_sleep()" you can use them to implement "sleep()" - you only need to multiply seconds by 1000000000 to get nanoseconds.
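
For example (a sketch):

<syntaxhighlight lang="c">
void sleep(uint32_t seconds) {
    nano_sleep_until(get_time_since_boot() + (uint64_t)seconds * 1000000000);
}
</syntaxhighlight>
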
It might look a little like this:
 
<syntaxhighlight lang="c">
int postpone_task_switches_counter = 0;
int task_switches_postponed_flag = 0;

void lock_stuff(void) {
#ifndef SMP
    CLI();
    IRQ_disable_counter++;
    postpone_task_switches_counter++;
#endif
}

void unlock_stuff(void) {
#ifndef SMP
    postpone_task_switches_counter--;
    if(postpone_task_switches_counter == 0) {
        if(task_switches_postponed_flag != 0) {
            task_switches_postponed_flag = 0;
            schedule();                         // Do the task switch that was postponed
        }
    }
    IRQ_disable_counter--;
    if(IRQ_disable_counter == 0) {
        STI();
    }
#endif
}
</syntaxhighlight>
 
The "schedule()" function only needs a few lines to check "postpone_task_switches_counter" and set "task_switches_postponed_flag", and that might end up a looking like this:
 
<syntaxhighlight lang="c">
void schedule(void) {
    if(postpone_task_switches_counter != 0) {
        task_switches_postponed_flag = 1;       // Remember that a task switch was wanted
        return;
    }
    if(first_ready_to_run_task != NULL) {
        thread_control_block *task = first_ready_to_run_task;
        first_ready_to_run_task = task->next;
        switch_to_task(task);
    }
}
</syntaxhighlight>
 
You'd need to make the same little change at the start of the "switch_to_task()" function, but that's "architecture specific assembly". For 32-bit 80x86 it might just be a few instructions inserted at the start of the function (before all other instructions), like:
 
<syntaxhighlight lang="asm">
cmp dword [postpone_task_switches_counter],0
je .continue
mov dword [task_switches_postponed_flag],1  ;Remember that a task switch was wanted
ret                                         ;Don't do the task switch now

.continue:
</syntaxhighlight>
 
Of course if you wanted to you could have a high level language wrapper that does this before calling the low level assembly function - either way is fine (I don't like wrappers much, but if you care about portability a wrapper would mean there's a little less code to change when porting).
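
Such a wrapper might look like this (a sketch; "do_task_switch()" is a hypothetical name for the wrapper around the low level assembly "switch_to_task()"):

<syntaxhighlight lang="c">
void do_task_switch(thread_control_block *task) {
    if(postpone_task_switches_counter != 0) {
        task_switches_postponed_flag = 1;   // Remember that a task switch was wanted
        return;
    }
    switch_to_task(task);                   // Do the real (architecture specific) task switch
}
</syntaxhighlight>
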
For "nano_sleep_until()" you'd store the time the task is supposed to wake up somewhere, put the task on the unsorted list, then block the task. You'll need to add a "wake up time" field to the data structure you're using to keep track of tasks for this, and you'll need to define a new task state ("SLEEPING"). The code might look like this:
 
<syntaxhighlight lang="c">
thread_control_block * sleeping_task_list = NULL;

void nano_sleep_until(uint64_t when) {
    lock_stuff();

    // Make sure "when" hasn't already passed
    if(when < get_time_since_boot()) {
        unlock_stuff();
        return;
    }

    // Set the time when the task should wake up ("sleep_expiry" is the new "wake up time" field)
    current_task_TCB->sleep_expiry = when;

    // Add the task to the start of the unsorted list of sleeping tasks
    current_task_TCB->next = sleeping_task_list;
    sleeping_task_list = current_task_TCB;

    unlock_stuff();

    // Find something else for the CPU to do
    block_task(SLEEPING);
}
</syntaxhighlight>
 
Note that we're reusing the "next" field in the data structure that the kernel uses to keep track of a task's information - the same field that the scheduler uses for its list of "ready to run" tasks. A task can't be ready to run and sleeping at the same time, so this is fine.

The code for the timer IRQ handler depends a little on which timer you're using. If you're using the PIT chip on 80x86 PCs, and the PIT chip has been set to a frequency of about 1000 Hz, then the timer IRQ handler might look like this:
 
<syntaxhighlight lang="c">
uint64_t time_since_boot = 0;
uint64_t time_between_ticks = 1000000; // 1000 Hz = 1 ms between ticks = 1000000 nanoseconds between ticks

void timer_interrupt_handler(void) {
    thread_control_block *next_task;
    thread_control_block *this_task;

    time_since_boot += time_between_ticks;

    lock_stuff();

    // Move everything from the sleeping task list into a temporary variable and make the sleeping task list empty
    next_task = sleeping_task_list;
    sleeping_task_list = NULL;

    // For each task, wake it up or put it back on the sleeping task list
    while(next_task != NULL) {
        this_task = next_task;
        next_task = this_task->next;

        if(this_task->sleep_expiry <= time_since_boot) {
            // Task needs to be woken up
            unblock_task(this_task);
        } else {
            // Task needs to be put back on the sleeping task list
            this_task->next = sleeping_task_list;
            sleeping_task_list = this_task;
        }
    }

    // Done; unlock the scheduler (and do any postponed task switches!)
    unlock_stuff();
}
</syntaxhighlight>
 
Finally, test the code to make sure tasks block and unblock correctly when they call "sleep()" or "nano_sleep()" or "nano_sleep_until()". Please note that currently there's a minor problem with the scheduler - if all tasks are blocked waiting for something (either paused or sleeping) there won't be any tasks that the scheduler can give CPU time to, so the scheduler will give CPU time to a task that is supposed to be blocked (which is bad!). For now, just make sure that at least one task is running.

The code might look like this:
 
<syntaxhighlight lang="c">
void kernel_idle_task(void) {
    for(;;) {
        HLT();      // Wait (doing nothing) until an IRQ occurs
    }
}
</syntaxhighlight>
 
The next problem is that currently we're still using an awful cooperative round robin scheduler, which means that if the scheduler gives the idle task any CPU time it might hog all of the CPU time even after other tasks wake up. What we really need is a scheduler that isn't awful - e.g. a scheduler that understands task priorities, so that we can give the idle task the lowest possible priority and let all the other tasks (which will have higher priorities) preempt the idle task.

For this, you'll need a variable to keep track of which task is the idle task. Then you'd modify the "unblock_task()" code so that the idle task is always pre-empted, like this:
 
<syntaxhighlight lang="c">
void unblock_task(thread_control_block * task) {
    lock_scheduler();

    // Put the unblocked task on the end of the "ready to run" list
    task->state = READY;
    task->next = NULL;
    if(first_ready_to_run_task == NULL) {
        first_ready_to_run_task = task;
    } else {
        last_ready_to_run_task->next = task;
    }
    last_ready_to_run_task = task;

    if(current_task_TCB == idle_task) {
        // The idle task is always pre-empted; put it back on the "ready to run" list first
        idle_task->next = NULL;
        last_ready_to_run_task->next = idle_task;
        last_ready_to_run_task = idle_task;
        schedule();
    } else if(first_ready_to_run_task == task) {
        // Nothing else was "ready to run", so pre-empt the currently running task
        schedule();
    }

    unlock_scheduler();
}
</syntaxhighlight>
 
The other thing you'd want to do is modify "schedule()" so that it tries to avoid giving the idle task CPU time when other tasks are "ready to run":
 
<syntaxhighlight lang="c">
void schedule(void) {
    if(postpone_task_switches_counter != 0) {
        task_switches_postponed_flag = 1;
        return;
    }
    if(first_ready_to_run_task != NULL) {
        thread_control_block *task = first_ready_to_run_task;
        first_ready_to_run_task = task->next;
        if(task == idle_task) {
            // Try to avoid giving the idle task CPU time when another task is "ready to run"
            if(first_ready_to_run_task != NULL) {
                // Put the idle task back on the end of the list and take the next task instead
                task->next = NULL;
                last_ready_to_run_task->next = task;
                last_ready_to_run_task = task;
                task = first_ready_to_run_task;
                first_ready_to_run_task = task->next;
            }
        }
        switch_to_task(task);
    }
}
</syntaxhighlight>
 
Note that this is a relatively crude hack; but that's fine for now - later we're going to replace the scheduler with something less awful anyway.

The (optional) code for time accounting does need to be fixed though, and it'd be nice to keep track of how much time the CPU spent idle too, so that can be changed to something like this:
 
<syntaxhighlight lang="c">
uint64_t CPU_idle_time = 0;

void update_time_used(void) {
    current_count = read_counter();
    elapsed = current_count - last_count;
    last_count = current_count;
    if(current_task_TCB == NULL) {
        // The CPU was idle ("current_task_TCB" is set to NULL while idle - see "schedule()" below)
        CPU_idle_time += elapsed;
    } else {
        // Time was consumed by the currently running task
        current_task_TCB->time_used += elapsed;
    }
}
</syntaxhighlight>
 
That only leaves the "schedule()" function itself, which could end up like this:
 
<syntaxhighlight lang="c">
void schedule(void) {
    thread_control_block * task;

    if(postpone_task_switches_counter != 0) {
        task_switches_postponed_flag = 1;
        return;
    }

    if(first_ready_to_run_task != NULL) {
        // There's a task on the "ready to run" list, so switch to it
        task = first_ready_to_run_task;
        first_ready_to_run_task = task->next;
        switch_to_task(task);
    } else if(current_task_TCB->state == RUNNING) {
        // There are no other tasks, but the currently running task wasn't blocked and can keep running
        // (this assumes your task switch code sets "state" to RUNNING when a task starts running)
    } else {
        // The currently running task blocked and there are no other tasks, so the CPU has nothing to do

        task = current_task_TCB;            // Remember the task that blocked
        current_task_TCB = NULL;            // Pretend that no task is running (for time accounting)

        uint64_t idle_start_time = get_time_since_boot(); // For future power management support

        // Do nothing while waiting for a task to unblock and become "ready to run". The only thing that is going to update
        // first_ready_to_run_task is going to be a timer IRQ (with a single processor anyway), but interrupts are disabled.
        // The timer IRQ must be allowed to fire, but must not be allowed to do any task switches; the
        // task_switches_postponed_flag must remain set to force the timer IRQ to return to this loop.

        postpone_task_switches_counter++;
        do {
            STI();                          // Enable interrupts to allow the timer IRQ to fire
            HLT();                          // Halt and wait for the timer IRQ to fire
            CLI();                          // Disable interrupts again to see if there is something to do
        } while(first_ready_to_run_task == NULL);
        postpone_task_switches_counter--;
        task_switches_postponed_flag = 0;

        // A task unblocked, so stop being idle and switch to it
        current_task_TCB = task;
        task = first_ready_to_run_task;
        first_ready_to_run_task = task->next;
        if(task != current_task_TCB) {
            switch_to_task(task);
        }
    }
}
</syntaxhighlight>
 
 
==Step 11: Time Slices==
For the earlier example (using the PIT on 80x86 at a frequency of 1000 Hz) it might look like this:
 
<syntaxhighlight lang="c">
uint64_t time_slice_remaining = 0;

void timer_interrupt_handler(void) {
    thread_control_block *next_task;
    thread_control_block *this_task;

    time_since_boot += time_between_ticks;

    lock_stuff();

    // Wake up any sleeping tasks whose time has come (same as in Step 8)
    next_task = sleeping_task_list;
    sleeping_task_list = NULL;
    while(next_task != NULL) {
        this_task = next_task;
        next_task = this_task->next;
        if(this_task->sleep_expiry <= time_since_boot) {
            unblock_task(this_task);
        } else {
            this_task->next = sleeping_task_list;
            sleeping_task_list = this_task;
        }
    }

    // Handle the time slice
    if(time_slice_remaining != 0) {
        // There is a time slice length ("0" means "no time slice, the task can run forever")
        if(time_slice_remaining <= time_between_ticks) {
            schedule();     // The time slice ran out, so find another task to run
        } else {
            time_slice_remaining -= time_between_ticks;
        }
    }

    unlock_stuff();
}
</syntaxhighlight>
 
The other change that's needed is that the "time_slice_remaining" variable needs to be set when a task switch happens. This can be done in the low level task switch code (which is "architecture specific assembly").

If you added a wrapper earlier (when updating locking in Step 8), and if you implemented idle time using an idle task, then it might look like this:
 
<syntaxhighlight lang="c">
#define TIME_SLICE_LENGTH 50000000 // 50000000 nanoseconds is 50 ms

void do_task_switch(thread_control_block *task) {
    if(postpone_task_switches_counter != 0) {
        task_switches_postponed_flag = 1;
        return;
    }

    if(task == idle_task) {
        time_slice_remaining = 0;                   // Don't bother preempting the idle task ("0" means "no time slice")
    } else {
        time_slice_remaining = TIME_SLICE_LENGTH;   // Normal tasks get a full time slice
    }

    switch_to_task(task);
}
</syntaxhighlight>
 
If you added a wrapper earlier and implemented idle time without an idle task, then it might look like this:
 
<syntaxhighlight lang="c">
#define TIME_SLICE_LENGTH 50000000 // 50000000 nanoseconds is 50 ms

void do_task_switch(thread_control_block *task) {
    if(postpone_task_switches_counter != 0) {
        task_switches_postponed_flag = 1;
        return;
    }

    if(first_ready_to_run_task == NULL) {
        time_slice_remaining = 0;                   // This is the only task that can run, so no time slice is needed
    } else {
        time_slice_remaining = TIME_SLICE_LENGTH;   // Other tasks are waiting, so this task gets a time slice
    }

    switch_to_task(task);
}
</syntaxhighlight>
 
If you're not using a wrapper, it shouldn't be too hard to convert the changes above into assembly and add them to your low-level task switch code (for both cases; it's just some comparisons, branches and moves).

The task termination might look like this:
 
<syntaxhighlight lang="c">
thread_control_block *terminated_task_list = NULL;

void terminate_task(void) {
    // Note: Can do any harmless cleanup here (close files, free memory, ...)

    lock_stuff();

    // Put this task on the terminated task list so the cleaner task can free its resources
    lock_scheduler();
    current_task_TCB->next = terminated_task_list;
    terminated_task_list = current_task_TCB;
    unlock_scheduler();

    // Make sure the cleaner task isn't paused ("cleaner_task_TCB" remembers which task is the cleaner)
    unblock_task(cleaner_task_TCB);

    // Block this task (the task switch itself will be postponed until "unlock_stuff()")
    block_task(TERMINATED);

    unlock_stuff();
}
</syntaxhighlight>
 
Note that we're reusing the "next" field in the data structure that the kernel uses to keep track of a task's information - the same field that the scheduler uses for its list of "ready to run" tasks (and that is used for sleeping tasks too!). A task can't be ready to run and terminated at the same time, and a task can't be sleeping and terminated at the same time, so this is fine.

Of course this won't work without a "cleaner" task. The cleaner task might look a little bit like this:
 
<syntaxhighlight lang="c">
void cleanup_terminated_task(thread_control_block *task);

void cleaner_task(void) {
    thread_control_block *task;

    for(;;) {
        lock_stuff();
        while(terminated_task_list != NULL) {
            // Take a task off the terminated task list and free its resources
            task = terminated_task_list;
            terminated_task_list = task->next;
            cleanup_terminated_task(task);
        }
        unlock_stuff();

        // Nothing left to clean up; pause until another task terminates
        block_task(PAUSED);
    }
}

void cleanup_terminated_task(thread_control_block *task) {
    kfree(task->kernel_stack);  // "kernel_stack" is assumed to remember the task's kernel stack allocation
    kfree(task);
}
</syntaxhighlight>
 
==Step 13: Mutexes and Semaphores==
The first thing we'll want is the semaphore's structure itself, which mostly just needs a few fields for counts plus a linked list for waiting tasks. For the linked list of waiting tasks I want to make sure tasks are taken from the front and added to the back, to ensure that one task can't be very unlucky and end up waiting forever while other tasks are lucky and keep acquiring the semaphore; this is slightly more tricky with singly linked lists (you need a "last task" field and a "first task" field). The semaphore's structure can be something like this:
 
<syntaxhighlight lang="c">
typedef struct {
    int max_count;
    int current_count;
    thread_control_block *first_waiting_task;
    thread_control_block *last_waiting_task;
} SEMAPHORE;
</syntaxhighlight>
 
 
Next we'll want some functions to create a semaphore or a mutex, like this:
 
<syntaxhighlight lang="c">
SEMAPHORE *create_semaphore(int max_count) {
    SEMAPHORE * semaphore;

    semaphore = kmalloc(sizeof(SEMAPHORE));
    if(semaphore != NULL) {
        semaphore->max_count = max_count;
        semaphore->current_count = 0;
        semaphore->first_waiting_task = NULL;
        semaphore->last_waiting_task = NULL;
    }
    return semaphore;
}

SEMAPHORE *create_mutex(void) {
    return create_semaphore(1);
}
</syntaxhighlight>
 
With that done, we want some functions to acquire a semaphore or a mutex:
 
<syntaxhighlight lang="c">
void acquire_semaphore(SEMAPHORE * semaphore) {
    lock_stuff();
    if(semaphore->current_count < semaphore->max_count) {
        // The semaphore can be acquired immediately
        semaphore->current_count++;
    } else {
        // This task has to wait; add it to the back of the semaphore's list of waiting tasks
        current_task_TCB->next = NULL;
        if(semaphore->first_waiting_task == NULL) {
            semaphore->first_waiting_task = current_task_TCB;
        } else {
            semaphore->last_waiting_task->next = current_task_TCB;
        }
        semaphore->last_waiting_task = current_task_TCB;
        block_task(WAITING_FOR_LOCK);   // The task switch will be postponed until "unlock_stuff()"
    }
    unlock_stuff();
}

void acquire_mutex(SEMAPHORE * semaphore) {
    acquire_semaphore(semaphore);
}
</syntaxhighlight>
 
Lastly, we need to be able to release the semaphore or mutex:
 
<syntaxhighlight lang="c">
void release_semaphore(SEMAPHORE * semaphore) {
    lock_stuff();
    if(semaphore->first_waiting_task != NULL) {
        // Wake the task at the front of the list of waiting tasks (it inherits the count being released)
        thread_control_block *task = semaphore->first_waiting_task;
        semaphore->first_waiting_task = task->next;
        unblock_task(task);
    } else {
        semaphore->current_count--;
    }
    unlock_stuff();
}

void release_mutex(SEMAPHORE * semaphore) {
    release_semaphore(semaphore);
}
</syntaxhighlight>
 
 
==Adding FPU/MMX/SSE/AVX Support (80x86 only)==
 
Originally (for the 80386, single-CPU) Intel designed the FPU interface so that an OS can avoid saving/loading FPU state during task switches. The general idea was to keep track of an "FPU owner" and use a flag ("TS" in CR0) to indicate when the currently running task isn't the FPU owner. If the CPU executes an instruction that uses the FPU while the current task isn't the FPU owner, then the CPU raises an exception (because "TS" is set), and the exception handler saves the FPU state (belonging to a different task) and loads the FPU state for the currently running task. This can (in some cases) improve performance a lot - for example, if you have 100 tasks running where only one uses the FPU, you'd never need to save or load FPU state. Intel continued this original idea when newer extensions (MMX, SSE, AVX) were added (although for AVX the implementation is significantly different).
 
However; when almost all tasks are using FPU/MMX/SSE/AVX state, it makes performance worse (due to the extra cost of an inevitable exception); and it doesn't work well for multi-CPU (where the currently running task's FPU state may still be in a completely different CPU).
 
The simplest solution is to always save and load FPU/MMX/SSE/AVX state during all task switches (and that would be a recommended starting point). There are also multiple more complex (and more efficient) ways of doing it; including always saving the previous task's FPU/MMX/SSE/AVX state during task switches if you know it was used, but postponing the work of loading the next task's state until the task actually needs it; and including keeping track of "never used, sometimes used, always used" so you can pre-load the FPU/MMX/SSE/AVX state when you know a task always uses it (or uses it often enough).
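
For the simple "always save and load" approach, the low level task switch code might gain a couple of instructions like these (a sketch, assuming SSE is present and enabled, and assuming a 16-byte aligned 512-byte "fpu_state" area in each task's data structure; "TCB.fpu_state", ESI and EDI are placeholders that must match your own switch code):

<syntaxhighlight lang="asm">
fxsave  [edi+TCB.fpu_state]   ;Save the outgoing task's FPU/MMX/SSE state
fxrstor [esi+TCB.fpu_state]   ;Load the incoming task's FPU/MMX/SSE state
</syntaxhighlight>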
 
[[Category:Tutorials]]
[[Category:Multitasking]]