Borrows in Rust

Up until now, we haven't really talked a lot about some of the aspects in which Rust is fundamentally different from C++. If you have heard a little bit about Rust, one of the first things that comes up is that Rust is a memory-safe language. What does that mean?

Do not overstay your welcome - The problem with borrowing values

To understand memory safety, we have to revisit the concept of references/borrows. All that follows is strictly valid for pointers as well. However, pointers in C/C++ play a weird double role, so it is easier to understand what is going on if we stick to references.

A reference is something like a visitor to a value. It does not own the value, but it can access the value and do things with it. Now what happens if this value gets destroyed while there are still references to it? What happens with the references? What happens if you try to access the value through the reference?

A simple but contrived example where this situation can happen is shown here:

int& evil() {
    int local = 42;
    return local;
}

int main() {
    int& ref = evil();
    return ref;
}

Here we have a function evil(), which returns a reference. In this case, it returns a reference to a local variable from within this function's scope! Once we exit evil(), the local variable gets cleaned up and we now have a reference to some memory on the stack that is already cleaned up. In C++, this is undefined behaviour, which means that absolutely anything can happen. Our program can run happily, crash now, crash later, give wrong results, who knows? This situation is what we mean when we call a language 'memory unsafe': The ability to manipulate memory that is not owned by your program anymore.

Of course in this situation, the error is trivial to spot and the compiler actually issues a warning for us:

<source>: In function 'int& evil()':
<source>:5:12: warning: reference to local variable 'local' returned [-Wreturn-local-addr]
    5 |     return local;
      |            ^~~~~
<source>:4:9: note: declared here
    4 |     int local = 42;
      |         ^~~~~

In other situations, similar errors might be harder to spot, yielding subtle bugs that can be difficult to track down! Memory unsafety actually includes a whole bunch of program errors related to memory:

  • Reading from or writing to memory that has been freed
  • Reading from or writing to memory that has never been allocated, for example out of bounds accesses on allocated memory blocks
  • Writing to memory while concurrently reading from the memory

The third category of problems is related to parallel programming, which we will discuss in a later chapter. For now, we focus on the first two categories: read/write after free, and read/write out of bounds. Here is a small exercise to familiarize yourself with these problems:

Exercise 3.3: Write two small C++ programs illustrating read/write after free and read/write out of bounds. Which real-world scenarios do you know where these memory problems can occur?

The reason why these memory problems can manifest in our programs lies in the way that programming languages such as C/C++ abstract memory. Both references and pointers are just numbers referring to a memory address; the validity and semantics of this memory address come from the context of the program. Take the following example:

#include <iostream>

struct Composite {
    long a, b, c, d;
};

int main() {
    Composite a;
    a.a = 42;

    Composite* ptr_to_a = &a;

    std::cout << ptr_to_a << std::endl;
    std::cout << ptr_to_a->a << std::endl;

    return 0;
}

Here we create a pointer to an instance of the Composite class. While we are used to thinking of the construct Composite* as 'A pointer to a Composite object', pointers really are just memory addresses. The type of the pointer just tells the compiler how to interpret this memory address, in this case as containing the memory for a Composite object. However that really is all there is to it, the pointer itself stores no information about its type! The type is just a guide for the compiler to generate the correct assembly code. Here is all the information that is not stored within a pointer:

  • How large is the memory region that the pointer points to?
  • Who owns this memory region?

On the surface, references seem to solve all these problems: A reference always points to a single value, or an array of constant size, and a reference never owns the memory it points to. Unfortunately, this is not enough. A reference can't be used to refer to a dynamically allocated memory block whose size is only known at runtime, because the reference can't store the size of the memory block and its address at the same time. Additionally, even if a reference never owns a memory region, it still can't tell us who does own the memory region. This was the main reason for our invalid program that returned a reference to a local variable: The reference pointed to memory which had no owner anymore!

This really is the main issue: The built-in C++ mechanisms are not strong enough to express all the information that we need to guarantee memory safety. This is exactly where Rust comes in! To guarantee memory safety, the Rust abstractions for pointers and references contain all this information. Even better, Rust does this in the majority of cases without any runtime overhead!
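To make the first missing piece of information (the size of the memory region) concrete, here is a minimal sketch in Rust. The helper names `sum` and `element_at` are made up for this illustration. It shows that a borrow of a slice (`&[i64]`) carries the length of the memory region together with its address, and that `get` uses this length to reject an out-of-bounds index instead of reading foreign memory. Note that this particular check does happen at runtime.

```rust
use std::mem::size_of;

// Hypothetical helper for this sketch: sums a slice without ever
// receiving a separate length argument.
fn sum(values: &[i64]) -> i64 {
    // The slice borrow itself knows how many elements it refers to.
    values.iter().sum()
}

// Hypothetical helper: bounds-checked access that reports failure
// with None instead of reading memory outside of the slice.
fn element_at(values: &[i64], index: usize) -> Option<i64> {
    values.get(index).copied()
}

fn main() {
    let numbers = vec![1_i64, 2, 3, 4];
    let slice: &[i64] = &numbers;

    // A borrow of a single value is just an address...
    println!("&i64 takes {} bytes", size_of::<&i64>());
    // ...while a borrow of a slice stores address and length together.
    println!("&[i64] takes {} bytes", size_of::<&[i64]>());

    println!("len = {}, sum = {}", slice.len(), sum(slice));
    println!("index 2:  {:?}", element_at(slice, 2));
    println!("index 10: {:?}", element_at(slice, 10));
}
```

The second missing piece of information, who owns the memory, is what the rest of this chapter is about.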

Enter the Rust borrow-checker

The central part that makes memory safety possible in Rust is a tool built into the compiler called the borrow checker. While the Rust book has many great examples that show you how the borrow checker works, we will try to understand it for ourselves based on the C++ examples that we saw in this chapter. So let's try to translate the evil C++ example to Rust:

fn evil() -> &i32 {
    let val: i32 = 42;
    &val
}

pub fn main() {
    let evil_borrow = evil();
    println!("{}", *evil_borrow);
}

Right off the bat, this does not compile. Again, we get a nice error message:

error[E0106]: missing lifetime specifier
 --> <source>:1:14
  |
1 | fn evil() -> &i32 {
  |              ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but there is no value for it to be borrowed from
help: consider using the `'static` lifetime
  |
1 | fn evil() -> &'static i32 {
  | 

Interesting, now there is something about lifetimes here. As if this whole talk over the last chapter(s) about ownership and lifetime somehow made its way into the Rust programming language ;) So what is a lifetime in Rust? It is a little piece of information that is required for every borrow, telling the compiler how long the value that is borrowed lives! This almost tells us who owns the memory, the one crucial piece of information that C++ references were missing. It turns out that we don't actually have to know who exactly the owner of a piece of memory is (object a or b or what have you). Remember back to the RAII concept: We figured out that through RAII, we can tie the lifetime of resources to the scope of functions, and the scope becomes the de-facto owner of the resource. This is what Rust lifetimes are: The name of the scope that the memory lives in!

To illustrate this, here is a quick example with some annotations:

pub fn main() 
{ // <-- First scope, let's call it 'a
    let a : i32 = 42;
    let illegal_borrow;
    { // <-- Another scope, let's call it 'b
        // We borrow a, which lives in the scope 'a. This information gets encoded 
        // within the TYPE of borrow_a. The full type is:
        // &'a i32
        let borrow_a = &a; 

        // Let's do the same thing again, but this time for a variable that lives
        // in the scope 'b!
        let b = 43;
        // The type of illegal_borrow is:
        // &'b i32
        illegal_borrow = &b;
    }
    // Rust knows AT COMPILE TIME that using illegal_borrow from the scope 'a
    // is not allowed, because the lifetime 'b has expired!
    println!("{}", *illegal_borrow);
}
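For comparison, here is a minimal sketch of a variant that the borrow checker should accept. The function name `scopes` and the variable names are made up for this illustration, and the scope names in the comments are again only annotations. The difference is that the only borrow leaving the inner scope refers to a value from the outer scope, while the value from the inner scope is copied out instead of borrowed.

```rust
// Hypothetical function for this sketch: it plays the role of main() in
// the example above and returns the two values instead of printing them.
fn scopes() -> (i32, i32) {
    // <-- First scope, let's call it 'a
    let a: i32 = 42;
    let legal_borrow: &i32;
    let copied_b: i32;
    {
        // <-- Another scope, let's call it 'b
        let b = 43;
        // `a` lives in 'a, which is still alive after 'b ends, so this
        // borrow may leave the inner scope.
        legal_borrow = &a;
        // Instead of letting a borrow of `b` escape, we copy the value
        // out while `b` is still alive.
        copied_b = b;
    } // `b` is destroyed here, and no borrow of it is left over.

    (*legal_borrow, copied_b)
}

fn main() {
    let (from_a, from_b) = scopes();
    println!("{} {}", from_a, from_b);
}
```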

Encoding lifetime information within the type of borrows is a powerful way of solving the lifetime problem. Since types are purely compile-time constructs, there is zero runtime overhead, and we still get all the benefits of checking for memory safety. Note that, while the previous example assigns some names to the scopes, these names are usually determined by the compiler automatically. In cases where we have to specify a lifetime name ourselves, we do so by prefixing the name with a single quote: 'name. Before we do this, however, one word on a special lifetime that you might have spotted in the error message of the first example in this section: 'static.

Up until now, we only talked about function scopes, with the 'largest' scope being the one of the main function. However there is a scope that is even larger: The global scope. This is where global variables reside, which get initialized before we enter main, and get destroyed after we leave main. This scope has the fixed name 'static in Rust, with a lifetime equal to the lifetime of the program. Here are the two main things that have 'static lifetime:

  • Global variables
  • String literals ("hello")

In a special sense, dynamically allocated memory also has 'static lifetime, because the heap lives as long as the program runs. However due to the way dynamically allocated memory is treated in Rust, you will rarely see this lifetime in this context.
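As a small sketch of these two cases, the following program borrows a global variable and a string literal. The names `ANSWER`, `answer` and `greeting` are made up for this illustration. Both functions may return a borrow with the 'static lifetime, because the borrowed values exist for the whole run of the program.

```rust
// A global variable: it exists for the whole run of the program.
static ANSWER: i32 = 42;

// Returning a borrow of a global is fine, the value never goes away.
fn answer() -> &'static i32 {
    &ANSWER
}

// String literals are stored in the program binary, so a borrow of one
// is valid for the whole run of the program as well.
fn greeting() -> &'static str {
    "hello"
}

fn main() {
    let borrowed_answer: &'static i32 = answer();
    let borrowed_greeting: &'static str = greeting();
    println!("{} {}", borrowed_greeting, borrowed_answer);
}
```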

Now, when do we have to specify a lifetime manually? This question is easy to answer: Whenever the compiler cannot figure out an appropriate lifetime for us. In this case it will complain, as it did for the initial example of this section, where we tried to return a borrow from a function. This is what the missing lifetime specifier error is trying to tell us!

Can we fix this error? The syntax for a lifetime specifier for a borrow goes like this: & 'lifetime TYPE, where TYPE is the type that we borrow, and 'lifetime is the name of our lifetime. So in our example of the evil() function, the lifetime of the borrow should be the name of the scope of the evil() function. However we don't know this name, as local scopes do not have a name that we can reference. Maybe we can try with this 'static lifetime first?

fn evil() -> &'static i32 {
    let val: i32 = 42;
    &val
}

This gives us a new error: error[E0515]: cannot return reference to local variable 'val'. So the Rust compiler prevents returning a reference to a local variable, similar to the warning that we got in the C++ example. It is interesting that this is the error that we get, and not something akin to 'val' does not have a 'static lifetime, but for us this is a good thing, the error message that we get is easily understandable.

Let us try something else that is amazing when learning Rust. If you look at the full error message, you will see this line at the end: For more information about this error, try 'rustc --explain E0515'. If we run the suggested line, we actually get a thorough explanation of what we are doing wrong, complete with code examples. The very first code example is actually exactly what we were trying to do :) Running rustc --explain ... is something you can try when you get stuck on an error that you don't understand in Rust.
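So how could the evil() function be repaired? Here is a minimal sketch of two common options; the function names `not_evil` and `largest` are made up for this illustration. The first option returns the value itself instead of a borrow. The second option hands out a borrow of something that the caller owns. In this case we do not have to write a lifetime specifier, because with a single borrowed parameter the compiler can figure out the lifetime of the returned borrow on its own.

```rust
// Option 1: return the value itself. It is moved (here: copied) to the
// caller, so no borrow of a local variable is involved.
fn not_evil() -> i32 {
    let val: i32 = 42;
    val
}

// Option 2: return a borrow of memory that the caller owns. The compiler
// ties the lifetime of the result to the lifetime of `values` for us.
fn largest(values: &[i32]) -> Option<&i32> {
    values.iter().max()
}

fn main() {
    let owned = not_evil();

    let numbers = vec![3, 42, 7];
    // `numbers` lives in the scope of main, so the borrow stays valid
    // for as long as we use it here.
    let biggest = largest(&numbers);

    println!("{} {:?}", owned, biggest);
}
```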

Borrows inside types

Suppose we want to write a type that stores a reference to some other value. How would we express this in Rust using borrows? Here is a first try:

struct Ref {
    the_ref: &i32,
}

pub fn main() {
    let val : i32 = 42;

    let as_ref = Ref {
        the_ref: &val
    };

    println!("{}", *as_ref.the_ref);
}

With what we know already, this struct definition should make us suspicious: There is again a borrow without a lifetime specifier! Which is exactly the error that we get when trying to compile this code: error[E0106]: missing lifetime specifier. What would be a good lifetime specifier here? For writing types that use borrows, this is an interesting problem: A lifetime specifier refers to a specific lifetime, but this lifetime might not be known in the definition of our type. In our example, we create a Ref instance inside main, borrowing a value that lives in main, so the lifetime specifier would be equal to the scope of main. Disregarding the fact that we can't even name the scope of main, what if we specified our Ref type in another file, or even another library? There is no way to know about this specific main function and its scope in such a scenario.

It turns out that we don't have to know! Who said that our Ref type should only be valid for values borrowed within main? The specific as_ref instance is, but that doesn't mean that we could not create another instance at some other place in the code. What we need instead is a way for our Ref type to communicate on a per-instance basis for which lifetime it is valid. This should ring a bell! We already know a concept that we can use to write a type that works with arbitrary types: Generics. Just as a type like Vec<T> can be used with arbitrary values for T (Vec<i32>, Vec<String> etc.), we can write a type that can work with arbitrary lifetimes:

struct Ref<'a> {
    the_ref: &'a i32,
}

We write this like any other generic type, but using the special Rust syntax for lifetime parameters with the leading single quote: <'a>. Now the lifetime of the borrowed value becomes part of the Ref type, which allows the Rust borrow checker to apply its lifetime checking rules to instances of Ref.
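Here is a minimal sketch of how such a type could be used. The method `get` and the function `wrap` are made up for this illustration. The commented-out lines at the end show a use that the borrow checker rejects, because the Ref instance would outlive the value it borrows.

```rust
struct Ref<'a> {
    the_ref: &'a i32,
}

impl<'a> Ref<'a> {
    // Hypothetical convenience method: reads the borrowed value.
    fn get(&self) -> i32 {
        *self.the_ref
    }
}

// Hypothetical helper: the lifetime 'a of the incoming borrow becomes
// part of the type of the returned Ref instance.
fn wrap<'a>(value: &'a i32) -> Ref<'a> {
    Ref { the_ref: value }
}

fn main() {
    let val: i32 = 42;
    let as_ref = wrap(&val);
    println!("{}", as_ref.get());

    // This does NOT compile: `short_lived` is destroyed at the end of
    // the inner scope, but `dangling` would still borrow it afterwards.
    //
    // let dangling;
    // {
    //     let short_lived = 43;
    //     dangling = wrap(&short_lived);
    // }
    // println!("{}", dangling.get());
}
```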

Compare this to C++, where we can easily store references inside types, but have no clue what these references point to:

#include <iostream>

struct Ref {
    explicit Ref(int& the_ref) : the_ref(the_ref) {}

    int& the_ref;
};

Ref evil() {
    int val = 42;
    Ref as_ref(val);
    return as_ref;
}

int main() {
    Ref as_ref = evil();
    std::cout << as_ref.the_ref << std::endl;
    return 0;
}

And just like that, we tricked the compiler: no warning message for returning a reference to a local variable. Catching bugs like this at compile time was one of the major design goals of Rust.

Almost done! One last thing remaining so that we can conclude this very long chapter!