Languages

From OSDev.wiki
Revision as of 08:45, 10 July 2009 by Cic (talk | contribs) (Killing "PlusPlus")
This page is a work in progress.
This page may thus be incomplete. Its content may be changed in the near future.

There are many programming languages, some better suited to OS development and kernel writing than others. This page discusses what makes a language suitable for that kind of work.


History

Early operating systems were written entirely in the Assembly language of their respective CPU. In modern OSes there are still parts that can only be written in Assembly. Many high-level languages have been used for OS development, including C, Lisp, FORTH, C#, C++, Modula-2, Ada, BLISS, and PL/I. In most languages other than C, a fair amount of Assembly and C development is required to provide the runtime environment that supports the language's abstractions.

Warning

Not all languages are suitable for low-level system programming, or have suitable low-level development tools available for them. Even those which are suitable often require specific runtime support, where C does not. Also, the vast majority of OS-related resources (such as tutorials and how-to examples, including this Wiki except where noted otherwise) assume C as the primary development language, so an OS developer should at least be able to read C code.

Using a language other than C entails a good deal of extra effort. But there are some developers who are willing to put in that effort in order to use their preferred notation.

On the other hand, trying to write an OS in a language that is normally interpreted, like Perl or Java, is unlikely to succeed. There are research projects about it, of course, but nothing that has completely changed the way we write kernels so far. And trying to write it in HTML, PHP or JavaScript would just be proof that you have much to learn...

Can I use language XYZ?

If you'd like to know whether your favourite language is suited to OS development, consider the following questions:
  • Can you cope with data structures that require a specific arrangement of bits and bytes (mandatory for e.g. MMU structures and the like)?
  • Can you take control of memory allocation and freeing? Or can you at least subdivide a large chunk of memory into smaller chunks that other functions can use transparently (necessary for any sort of memory management)?
  • Are you able to build a self-sufficient run-time library to support the language features you'll need?
  • Can you easily interface XYZ with some Assembly code (yes, you'll have some, at least in the run-time library you'll have to write)?
  • If XYZ fits the other points and is an interpreted language, can you invoke code coming from raw data bytes with XYZ, i.e. jump to a specific address and continue execution there (this will be mandatory for loading and running programs)?

If the answer to any of those questions turns out to be "whoops, no, I cannot do that with language XYZ", then chances are that XYZ will be of no help for OS development, or will have to be mixed with other languages and stubs to function.
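As an illustration of the first two questions, here is a minimal sketch of what "yes" looks like in C. The page-table entry layout is hypothetical (loosely modelled on a 32-bit x86 entry, not taken from any particular manual), and the names pte_t, pte_make, arena_t and arena_alloc are made up for this example. It shows a bit-exact structure built with masks, and a simple "bump" allocator that hands out smaller chunks from one large chunk of memory.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit page-table entry (assumed layout for illustration):
 * bit 0 = present, bit 1 = writable, bits 12..31 = physical frame address. */
typedef uint32_t pte_t;

#define PTE_PRESENT    ((pte_t)1u << 0)
#define PTE_WRITABLE   ((pte_t)1u << 1)
#define PTE_FRAME_MASK ((pte_t)0xFFFFF000u)

/* The hardware dictates the size, so check it at compile time. */
static_assert(sizeof(pte_t) == 4, "a page-table entry must be exactly 32 bits");

/* Combine a 4 KiB-aligned frame address with flag bits. */
static inline pte_t pte_make(uint32_t frame_addr, pte_t flags)
{
    return (frame_addr & PTE_FRAME_MASK) | (flags & ~PTE_FRAME_MASK);
}

/* Extract the frame address again. */
static inline uint32_t pte_frame(pte_t entry)
{
    return entry & PTE_FRAME_MASK;
}

/* A bump allocator: subdivides one large chunk into smaller ones.
 * It never frees individual chunks; that is left out of this sketch. */
typedef struct {
    uint8_t *base;  /* start of the large chunk */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} arena_t;

static void arena_init(arena_t *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Returns n bytes whose offset from base is a multiple of align
 * (align must be a power of two; base itself is assumed to be
 * suitably aligned). Returns NULL when the arena is exhausted. */
static void *arena_alloc(arena_t *a, size_t n, size_t align)
{
    size_t start = (a->used + (align - 1)) & ~(align - 1);
    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}
```

Nothing here depends on a hosted C library beyond header-only definitions, which is the point: a language that cannot express something equivalent will need help from one that can.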

Can't I write a compiler for XYZ?

The only thing more complex than writing a compiler is writing an operating system. As you are already planning to do the latter, deciding to do the former as well is like finding new ways to forge metals in order to build a better car. The canonical starting point is the "dragon book" ("Compilers: Principles, Techniques, and Tools", listed on the Books page).

But I heard of an OS written in language XYZ, isn't it interpreted?

You may from time to time hear of operating systems written in languages which are usually interpreted, or which used an interpreter of some sort: JavaOS, Genera (the Symbolics Lisp Machine OS), Smalltalk-80, UCSD Pascal, the various FORTH systems, etc. Most of these fall into one of three categories:

  • The operating system runs in a low-level interpreter, written in Assembly or some systems language like C, which is what actually interacts with the hardware. In effect, the 'operating system' is just an application running on top of another, lower-level OS. Smalltalk-80, UCSD Pascal, and JavaOS work like this, though they also have some modules which are compiled to native code (see below).
  • All or part of the code has been compiled to native code. This may involve using a subset of the language with reduced runtime requirements (e.g., Pre-Scheme or Slang; while they have not been used for OS development to date, they demonstrate the kind of low-level implementation language that can be used this way).
  • The system ran on specialized hardware and microcode, which acted as a hardwired 'interpreter' for its primary language, or for the portable bytecode which it normally used. This type of system includes the Lisp Machines, SOAR (Smalltalk On A RISC), the Rekursiv system, the Lilith Modula-2 system, and the Burroughs 6500 (a mainframe designed for running Algol-60 in the 1960s). The system programming techniques for these cannot work on stock hardware.

The FORTH systems are a special case in and of themselves. While FORTH is technically an interpreted language, the 'stack-threaded' interpreter it uses works differently from most other interpreters; in effect, it walks through the various FORTH 'words' that the code is composed of until it reaches the low-level words that are implemented in Assembly or compiled code, which is what actually gets executed. Furthermore, FORTH systems incorporate a special sort of assembler, which produces code specifically meant to be used by the interpreter; also, commonly used 'words' can be compiled into native code as needed. Finally, many embedded FORTH systems use special-purpose hardware to support the language.
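The "walking through words" idea can be sketched in C roughly as follows. This is a deliberately simplified model, not how any particular FORTH is implemented: a real system keeps an explicit return stack and a tiny NEXT routine, whereas this sketch borrows the C call stack for nesting. All names (word_t, do_colon, run, and the example words) are invented for illustration. Each word carries a pointer to native code; a primitive's code does the work directly, while a colon definition's code simply walks the list of words it is built from.

```c
#include <assert.h>
#include <stddef.h>

typedef struct word word_t;
typedef void (*code_fn)(const word_t *self);

struct word {
    code_fn code;               /* native routine run when the word executes */
    const word_t *const *body;  /* colon definitions only: NULL-terminated word list */
};

/* The data stack shared by all words. */
static long stack[32];
static int sp = 0;

static void push(long v) { assert(sp < 32); stack[sp++] = v; }
static long pop(void)    { assert(sp > 0);  return stack[--sp]; }

/* Primitives: the low-level words that actually get executed. */
static void do_dup(const word_t *self)
{
    (void)self;
    long a = pop();
    push(a);
    push(a);
}

static void do_mul(const word_t *self)
{
    (void)self;
    long b = pop();
    long a = pop();
    push(a * b);
}

/* Colon definitions: walk the body, executing each word in turn.
 * Nested colon words recurse until a primitive is reached. */
static void do_colon(const word_t *self)
{
    for (const word_t *const *ip = self->body; *ip != NULL; ++ip)
        (*ip)->code(*ip);
}

static const word_t DUP = { do_dup, NULL };
static const word_t MUL = { do_mul, NULL };

/* Equivalent of the FORTH definition  : SQUARE DUP * ;  */
static const word_t *const square_body[] = { &DUP, &MUL, NULL };
static const word_t SQUARE = { do_colon, square_body };

/* Equivalent of  : FOURTH SQUARE SQUARE ;  */
static const word_t *const fourth_body[] = { &SQUARE, &SQUARE, NULL };
static const word_t FOURTH = { do_colon, fourth_body };

/* Run one word on a single input and return the top of the stack. */
static long run(const word_t *w, long input)
{
    sp = 0;
    push(input);
    w->code(w);
    return pop();
}
```

Calling run(&FOURTH, 2) executes SQUARE twice, and each SQUARE in turn executes the primitives DUP and MUL; no word-by-word parsing of source text happens at run time, which is why this style of interpretation is cheap enough to sit close to the hardware.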