How to write systems software?

1.2 How do we write systems software?

To write software, we use programming languages. Of the many programming languages in use today, not all are equally well suited to every type of software. A language such as JavaScript might be better suited for writing client-facing software and can thus be considered an application programming language. In contrast, a language like C, which provides access to the underlying hardware, is better suited for writing systems software and can thus be considered a systems programming language. In practice, most modern languages can be used for a multitude of tasks, which is why you will often find the term general-purpose programming language being used.

An important aspect that makes some languages ill-suited for writing systems software under our definition is that they lack the ability to access the computer's hardware resources directly. Examples of hardware resources are:

Memory (working memory and disk memory)
CPU cycles (both on a single logical core and on multiple logical cores)
Network throughput
GPU (Graphics Processing Unit) cycles

Based on these hardware resources, we can classify programming languages by their ability to directly manage access to these resources. This leads us to the often-used terms of low-level and high-level programming languages. Again, there is no clear definition of what constitutes a low-level or high-level programming language, and indeed the usage of these terms has changed over the last decades. Here are two ways of defining these terms:

Definition 1) The level of a programming language describes the level of abstraction over a machine's underlying hardware architecture
Definition 2) A low-level programming language gives the programmer direct access to hardware resources; a high-level programming language hides these details from the programmer

Both definitions are strongly related and deal with hardware and abstractions. In the context of computer science, abstraction refers to the process of hiding information in order to simplify interaction with a system (Colburn, T., Shute, G. Abstraction in Computer Science. Minds & Machines 17, 169–184 (2007). https://doi.org/10.1007/s11023-007-9061-7). Modern computers are extremely sophisticated, complex machines. Working with the actual hardware in full detail would require the programmer to know a massive amount of information about the underlying system, making even simple tasks very time-consuming. All modern languages, even the ones that can be considered fairly low-level, thus use some form of abstraction over the underlying hardware. As abstraction is information hiding, using abstractions can lead to a loss of control. This happens when the abstraction hides information that is necessary to achieve a specific task. Let's look at an example:

The Java programming language can be considered fairly high-level. It provides a unified abstraction of the system's hardware architecture called the Java Virtual Machine (JVM). One part of the JVM is concerned with providing the programmer access to working memory. It uses a high degree of abstraction: memory can be allocated in a general manner by the programmer, and unused memory is automatically detected and cleaned up by a garbage collector. This makes the process of allocating and using working memory quite simple in Java, but takes away the programmer's ability to specify exactly where, when and how memory is allocated and released. The C programming language does not employ a garbage collector and instead requires the programmer to manually release all allocated memory once it is no longer used. Given these features and our two definitions of a programming language's level, we can consider Java a more high-level programming language than C.
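To make "control over when memory is released" more concrete, here is a minimal sketch in Rust, the language this course focuses on. It is an illustration under assumptions, not part of the comparison above: the `Buffer` type, its `released` counter and the function `release_points` are hypothetical names invented for this example. The sketch records how many buffers have been released at three points in the program, which shows that release happens at precisely known places (the end of a scope, or an explicit `drop` call) rather than whenever a garbage collector decides to run.

```rust
use std::cell::Cell;

/// Hypothetical type that owns a block of heap memory and counts releases.
struct Buffer<'a> {
    data: Vec<u8>,
    released: &'a Cell<usize>,
}

impl<'a> Buffer<'a> {
    fn new(size: usize, released: &'a Cell<usize>) -> Self {
        // The heap allocation happens here, at a point chosen by the programmer.
        Buffer { data: vec![0u8; size], released }
    }
}

impl Drop for Buffer<'_> {
    // Runs exactly when the value is released; the Vec's memory is freed right after.
    fn drop(&mut self) {
        self.released.set(self.released.get() + 1);
    }
}

/// Returns the number of released buffers observed at three points in time.
fn release_points() -> Vec<usize> {
    let released = Cell::new(0);
    let mut observed = Vec::new();

    let a = Buffer::new(1024, &released);
    {
        let b = Buffer::new(4096, &released);
        assert_eq!(b.data.len(), 4096);
        observed.push(released.get()); // both buffers are still alive
    } // `b` goes out of scope here and is released immediately
    observed.push(released.get());

    drop(a); // explicit release at a point of our choosing
    observed.push(released.get());

    observed
}

fn main() {
    println!("{:?}", release_points()); // prints [0, 1, 2]
}
```

In a garbage-collected language such as Java, the equivalent objects would become unreachable at the same points, but the moment at which their memory is actually reclaimed would be decided by the runtime.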

Here is one more example to illustrate that this concept applies to other hardware resources as well:

The JavaScript programming language is an event-driven programming language. One of its main areas of application is the development of client-side behaviour of web pages. Here, the JavaScript code is executed by another program (your browser), which controls the execution flow of the JavaScript program through something called an event loop. Without going into detail on how this event loop works, it enforces a sequential execution model for JavaScript code. This means that in pure JavaScript, no two pieces of code can be executed at the same time. Now take a look at the Java programming language. It provides a simple abstraction for running multiple pieces of code in parallel called a Thread. Through threads, the programmer effectively gains access to multiple CPU cores as a resource. In practice, many of the details are still managed by the JVM and the operating system, but given our initial definitions, we can consider JavaScript a more high-level programming language than Java.
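The idea of gaining access to multiple CPU cores through threads can be sketched in a few lines. The example below uses Rust, the language this course focuses on, rather than Java; the function name `parallel_sum` and the chunking strategy are assumptions made for this illustration. It splits a slice of numbers into chunks, sums each chunk on its own thread using the standard library's scoped threads, and then combines the partial results.

```rust
use std::thread;

/// Sums `values` by splitting them into at most `workers` chunks,
/// each of which is processed on its own thread.
fn parallel_sum(values: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Round up so that all elements are covered; a chunk size of 0 is not allowed.
    let chunk_size = ((values.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn one thread per chunk; scoped threads may borrow `values`.
        let handles: Vec<_> = values
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Wait for every thread and add up the partial sums.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let values: Vec<u64> = (1..=1_000).collect();
    // Ask the system how many threads can usefully run in parallel.
    let workers = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    println!("sum = {}", parallel_sum(&values, workers)); // prints "sum = 500500"
}
```

As in Java, the operating system still decides which core each thread runs on, so the programmer controls how the work is divided but not every detail of how it is scheduled.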

Similar to our classification of software on a scale from systems to applications, we can put programming languages on a scale from low-level to high-level:

Programming languages from low-level to high-level

Exercise 1.2 Given the programming languages Python, C++, Haskell, Kotlin and Assembly language, sort them onto the scale from low-level to high-level programming languages. What can you say about the relationship between Kotlin and Java? How about the relationship between C and C++? Where would you put Haskell on this scale, and why?

Now that we have learned about low-level and high-level programming languages, it becomes clear that more low-level programming languages will provide us with better means for writing good systems software that makes efficient use of hardware resources. At the same time, the most low-level programming languages will be missing some abstractions that we would like to have in order to make the process of writing systems software efficient. For a long time, C and C++ were thus the two main programming languages used for writing systems software, which also shows in their popularity. These are powerful, very well established languages which, despite their considerable age (C was developed in the early 1970s, C++ in the mid 1980s), continue to be relevant today. At the same time, new systems programming languages such as Rust and Go have emerged over the last decade. These languages aim to address some shortcomings of their predecessors, such as the lack of memory safety or of simple concurrent programming, while maintaining a level of control over the hardware that makes them well-suited for systems programming.

This course makes the deliberate decision to focus on Rust as a modern systems programming language, in contrast to a well-established language such as C++. While neither of the two languages is clearly superior to the other, Rust does address some shortcomings of C++ in terms of memory safety and safe concurrent programming, which can make writing good systems software easier. Rust has also gained a lot of popularity over the last couple of years, consistently ranking as the most loved programming language in the Stack Overflow Developer Survey. In addition, Rust's excellent tooling makes it very well suited for a lecture series, as getting some Rust code up and running is very simple.

At the same time, this course assumes that students are familiar with C++, as it will make continuous references to C++ features that are important in systems programming and compare them to Rust's approach to systems programming.