Preface

This is the first article of the series of articles that introduce you to a brief history of the Linux Kernel's process scheduler from the very first v0.01 to the newest v5.x.

Since it's impossible to deal with all features having been implemented in the Linux kernel, I just share with you the important algorithm changes.

Terminology

task: a scheduling target of process scheduler, for example, processes and threads in a process.
LCPU: The things that the Linux kernel considers it as CPU, for example, CPU, CPU cores in a CPU, CPU threads in a CPU core.
Current: Currently running task in an LCPU
round-robin process scheduling: The way of the process scheduling. If there are multiple runnable processes, the scheduler gives a short period of time called time slice for each process one by one.
run queue: The list of runnable processes.

Overview

In the beginning, I'll explain what round-robin scheduling is. In addition, I'll explain a brief history of the Linux Kernel's process scheduler.

v0.01~v2.4.x: the very first scheduler
v2.6.0~v2.6.22: O(1) scheduler
v2.6.23~: Completely Fair Scheduler (CFS)

The very first scheduler

v0.01 is the very first Linux kernel. Its process scheduler is just 20 lines of code and is very simple. For reference, the newest Linux kernel consists of ten thousand of lines.

In v0.01, all tasks are expressed by the one array. This array is not only the list of all tasks but also the run queue. The length of this array is 64. It means that the number of tasks in this version is at most 64. In this array, empty entry is expressed as NULL.

The time slice of the scheduler is 150 milliseconds. Whether Current exhausts its time slice is detected by a hardware called interval timer. The interval timer interrupts CPU once per 10 milliseconds and then a handler registered by the scheduler is called. This function decreases Current's time slice and if it becomes zero, the scheduler schedules the next runnable task in run queue.

The value of both time slice and the interval of the timer interrupt have been changed after this version. However, this article doesn't explain it one by one, for simplicity.

The scheduling algorithm

Here is the scheduling algorithm of the Linux v0.01's process scheduler.

Traverse run queue in reverse order and schedule the first runnable process which has the time slice more than zero.
If there is no such process, the scheduler reset the all task's time slices. Here the scheduler gives 150 milliseconds to the runnable processes and adds half of the current time slice to the sleeping tasks. The reason for the latter would be to schedule the woken-up tasks as soon as possible for interactivity.

I'll show you the flow of the above-mentioned algorithm with diagrams.

The initial state is as follows. The unit of the time slices is 10 millisecond.

At first, the scheduler traverse run queue in reverse order. Here t4 is skipped since it's sleeping. In addition, t2 is also skipped since it's this entry is empty.

After traversing the whole task array, it finds that the time slice of t1 is the largest among all runnable tasks.

The scheduler schedules t1 and this task run until exhaust its time slice.

The scheduler traverses run queue again and find that the time slice of t0 is the largest. If there are multiple tasks which have the largest time slice, the scheduler selects the first one found by the traversing.

Finally, all runnable tasks exhaust their time slice.

Then the scheduler resets the all task's time slice. It gives the 150 milliseconds, in other words filling up its time slice, and adds the current time slice divided by 2 to the sleeping t4.

Let's consider the case if t4 is woken up from the sleep state.

Then t4 has the largest time slice among all runnable tasks. However, the scheduler doesn't schedule it here. This task is scheduler only when Current exhausts its time slice, sleeps, or exits.

Nice Value

This version's process scheduler supports nice value. If nice value is smaller than zero, its time is increased and vice versa. We can modify the nice value of tasks with nice() system call. This system call can increase or decrease the caller's time slice.

Surprisingly, there is no validation at the beginning of the system call. So we can set any nice value to a task. In addition, non-root users can set negative nice value. It means non-root users can give arbitrary time slice to their tasks.

Conclusion

This article describes the core logic of the Linux v0.01. I guess you thought "wow, it's a toy OS," and I thought it too :-)

The next article of this series deals with the Linux v1.0's process scheduler.

Blog

A brief history of the Linux Kernel's process scheduler: The very first scheduler, v0.01

Satoru Takeuchi