Languages

From OSDev.wiki

There are many programming languages, some more suited to OS development and kernel writing than others. This page discusses what makes a language suitable and what to expect if you choose something other than C.

History

Early operating systems were written entirely in the Assembly language dialect of their respective CPU, and it remains an option for current developers willing to put in the time and effort to use it. Even when an OS is written primarily in a high-level language, there are still parts that can only be done in assembly. A significant sub-set of OS devs choose to work exclusively in assembly, and at least some work in machine language directly (though this is rare today).

Many high-level languages have been used for OS development in the past, including C, Forth, Lisp, C#, C++, Modula-2, Ada, Bliss, Smalltalk, and PL/I. However, not all languages are suited for OS development, and in many languages other than C, a fair amount of Assembly development is required in order to provide the appropriate runtime environment supporting the language's abstractions. Languages such as C, Modula-2, Ada, Bliss, PL/M, and XPL were all designed specifically for low-level systems programming, either in OS dev or embedded systems, while languages such as Forth incorporate the necessary low-level features even though they weren't intended specifically for this purpose.

Warning

Not all languages are suited for low-level system programming. Some have no suitable low-level development tools available, or require specific runtime support -- problems C does not suffer from. Also, the vast majority of OS-related resources (tutorials and how-to examples, including this wiki) assume C as the primary development language, so an OS developer should at least be able to read C code.

Using a language other than C entails a good deal of extra effort. But there are some developers who are willing to put in that effort in order to suit the development of their OSes to their ways of thinking (for example: programming paradigms).

On the other hand, attempting to write an OS in interpreted or bytecode languages like Perl or Java is prohibitively challenging, and therefore less likely to lead to a success story. There are research projects about it, of course, but nothing that has completely changed the way we write kernels so far. And trying to write it in HTML would just be proof that you have much to learn.

So, you have been warned. Now you should make a choice, hopefully an informed one.

Can I use language XYZ?

If you'd like to know whether your favourite language is suited for OS development, consider the most important principle: there must be a way of doing low-level things in your language.

Before you start, it is also worth answering the following questions:

  • Can you cope with data structures that have a specific arrangement of bits and bytes (mandatory for e.g. MMU structures and similar things)? Or do you have corresponding tools?
  • Can you take control of memory allocation and freeing? Or can you at least subdivide a large chunk of memory into smaller chunks that other functions can use transparently (necessary for any sort of memory management)? The important consequence here: does your language let you plug in your own memory manager?
  • Are you able to build a self-sufficient run-time library to support language features you'll need?
  • Can you easily interface XYZ with some assembly code (yes, you'll have some, at least in the run-time library you'll have to write)?
  • If XYZ fits the other points and is an interpreted language, can you invoke code coming from raw data bytes with XYZ, i.e. jump to a specific address and continue execution there (this will be mandatory for loading and running programs)?

If the answer to any of those questions turns out to be "whoops, no, I cannot do that with language XYZ", then chances are that XYZ will be of no help for OS development. This particular problem has often been circumvented by modifying the language and writing new compilers.
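To make the first two questions concrete, here is a minimal sketch in C of what "yes" looks like. The page-table entry layout is loosely modelled on x86-64 4 KiB page entries (present bit, writable bit, frame address in bits 12..51), and the names pte_make, pte_addr, struct bump_arena, bump_init and bump_alloc are hypothetical, chosen only for this illustration. The allocator is a simple "bump" allocator that carves aligned pieces out of one large chunk and never frees; a real kernel would need something more capable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative page-table entry layout (assumption, loosely x86-64):
 * bit 0 = present, bit 1 = writable, bits 12..51 = physical frame address. */
#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_WRITABLE  (UINT64_C(1) << 1)
#define PTE_ADDR_MASK UINT64_C(0x000FFFFFFFFFF000)

/* Build an entry with explicit masks so the bit layout does not depend
 * on how the compiler orders bit-fields. */
static inline uint64_t pte_make(uint64_t phys_addr, uint64_t flags)
{
    return (phys_addr & PTE_ADDR_MASK) | flags;
}

/* Extract the physical frame address from an entry. */
static inline uint64_t pte_addr(uint64_t pte)
{
    return pte & PTE_ADDR_MASK;
}

/* A bump allocator: subdivides one large chunk into smaller ones. */
struct bump_arena {
    uint8_t *base;  /* start of the chunk handed to us */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes already handed out, including padding */
};

static void bump_init(struct bump_arena *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Returns n bytes aligned to 'align' (a power of two), or NULL when the
 * arena is exhausted. Nothing is ever freed individually. */
static void *bump_alloc(struct bump_arena *a, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t start   = (uintptr_t)a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t)(align - 1);
    size_t    offset  = (size_t)(aligned - (uintptr_t)a->base);

    if (offset > a->size || n > a->size - offset)
        return NULL;

    a->used = offset + n;
    return (void *)aligned;
}
```

Neither piece needs a runtime library or a hosted environment beyond fixed-width integer types, which is the property the questions above are probing for. If your language cannot express something equivalent, you will have to drop to assembly or extend the language for it.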

Can't I write a compiler for XYZ?

A compiler is one of the few things in software that are of a similar complexity to operating systems. As you are already planning to write an OS, deciding to write a compiler as well is like finding new ways to forge metals in order to build a better car. Still, there are some OSDevers who think themselves up to the challenge. The canonical starting point is the "Dragon Book" (Compilers - Principles, Techniques, and Tools, listed on the Books page).

This does not mean you can afford to ignore C and Assembly, however; indeed, Assembly can be much more important to the code generation step of compilation than it is to OS development (unless you are eschewing high-level languages entirely), and for compiler designers a solid grasp of the target language is a must. Also, as has already been stated, most of the information about OS dev is oriented towards C developers, and a solid ability to at least read C is necessary unless you intend to work wholly alone.

Also, consider this: what language will you write your compiler in? Assembly is clunky, so your choices are C, or your favorite language, XYZ. If you use the latter, how will you bootstrap yourself? If you use the former, you'll need to port libc and GCC -- you do want to be self-hosting... right?

But I heard of an OS written in language XYZ, isn't it interpreted?

This is a red herring. There is no such thing as an "interpreted language". Any language can be implemented using either an interpreter or a compiler; and even in an OS project, there are forms of 'interpretation' which can nevertheless be applied to system operations.

You may from time to time hear of operating systems written in languages which are usually interpreted, or which used an interpreter of some sort: JavaOS, Genera (the Symbolics Lisp Machine OS), Pilot-OS (the system for the Xerox Star workstation, written mostly in the Mesa language), UCSD Pascal, the various Forth systems, etc. Most of these fall into one of three categories:

  • The operating system runs in a low-level interpreter, written in Assembly or some systems language like C, which is what actually interacts with the hardware. In effect, the 'operating system' is just an application running on top of another, lower-level OS. Pilot-OS, UCSD Pascal, and some Java OSes work like this, though they also have some modules which are compiled to native code as well (see below).
  • All or part of the code has been compiled to native code. This may involve using a sub-set of the language with reduced runtime requirements (e.g., Pre-Scheme, or Slang - while they have not been used for OS development to date, they do demonstrate this sort of low-level implementation language which can be used this way).
    • Forth-based operating systems are a special case of this. While Forth is usually described as an interpreted language, the threaded-code interpreters many Forth systems use work differently from most other interpreters; in effect, the interpreter walks through the various Forth 'words' that make up the code until it reaches the low-level words that are implemented in assembly or compiled code, which is what actually gets executed. Furthermore, Forth systems incorporate a special sort of assembler, which produces code specifically meant to be used by the interpreter; also, commonly used 'words' can be compiled into native code as needed. Finally, many embedded Forth systems use special-purpose hardware (see below) to support the language.
    • Most Lisp systems freely mix interpreted and compiled code, and in nearly all 'serious' Lisp systems since the late 1970s, the Listener REPL (the 'command line' for the Lisp system) is not an interpreter, but rather a compile-and-go system - each piece of code is at least partially compiled on the fly. While no significant Lisp OS has been made without also using language-specific hardware support (see below), if one were, it surely would not be a purely interpreted system.
  • The system ran on specialized hardware and microcode, which acted as a hardwired 'interpreter' for its primary language, or for the portable bytecode which it normally used. This type of system includes the SOAR (Smalltalk On A RISC), the Rekursiv system, the Lilith Modula-2 system, and the Burroughs 6500 (a mainframe designed for running Algol-60 in the 1960s). The system programming techniques for these cannot work on stock hardware. For example:
    • The MIT CADR Lisp machine architecture had an extensive instruction set with hardware support for certain high-level operations such as type-tag checking and GC. It had a tagged architecture, meaning that a portion of the 36-bit addressing word was designated for type information. Typically these machines had a variety of compilers, including one for the system language, Lisp, which was capable of taking advantage of the additional instruction set.
    • The Rekursiv Single-Board Computer had hardware support for a writable instruction set (that is, you could dynamically add microcode instructions) and associative memory dispatch tables for supporting object-oriented programming.
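The threaded-code idea described for Forth above can be sketched in a few lines of C. This is only an illustration under simplifying assumptions, not how any particular Forth system is implemented: the names struct word, execute, push and pop are hypothetical, and the sketch uses C recursion where a real Forth would use an explicit return stack and a hand-written inner interpreter. What it does show is that a 'word' is either native code or a list of references to other words, and that the interpreter only walks those lists until it reaches native code.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*prim_fn)(void);

/* A word is either a primitive (native code) or a list of other words. */
struct word {
    prim_fn                    prim; /* non-NULL: implemented in native code */
    const struct word *const  *body; /* otherwise: NULL-terminated word list */
};

/* Data stack shared by all words. */
static long stack[32];
static int  sp;

static void push(long v) { assert(sp < 32); stack[sp++] = v; }
static long pop(void)    { assert(sp > 0);  return stack[--sp]; }

/* Low-level words: this is the code that actually gets executed. */
static void prim_dup(void) { long a = pop(); push(a); push(a); }
static void prim_mul(void) { long b = pop(); long a = pop(); push(a * b); }

static const struct word DUP = { prim_dup, NULL };
static const struct word MUL = { prim_mul, NULL };

/* Forth equivalent:  : SQUARE DUP * ;  */
static const struct word *const square_body[] = { &DUP, &MUL, NULL };
static const struct word SQUARE = { NULL, square_body };

/* Forth equivalent:  : FOURTH SQUARE SQUARE ;  */
static const struct word *const fourth_body[] = { &SQUARE, &SQUARE, NULL };
static const struct word FOURTH = { NULL, fourth_body };

/* Inner interpreter: walk the word list until primitives are reached. */
static void execute(const struct word *w)
{
    if (w->prim) {
        w->prim();
        return;
    }
    for (const struct word *const *ip = w->body; *ip != NULL; ip++)
        execute(*ip);
}
```

Pushing 3 and executing FOURTH leaves 81 on the stack. Note that there is no parsing or bytecode decoding at run time; the 'interpretation' is just following pointers to native routines, which is why such systems can sit directly on hardware.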