Rust and move semantics

Let us try to write an example similar to our C++ SillyVector, but in Rust. Don't worry if you don't understand all of the code at this point; we will get there eventually. For now, we focus on what is important for this chapter. Rust keeps data and functionality separated, so let's start with the data:

struct SillyVec<T: Copy> {
    data: Box<[T]>,
    size: usize,
    capacity: usize,
}

A simple structure, storing values of some type T. We have a generic constraint on the type T to make sure that we can copy the values. In C++, this constraint was implicit in the way we wrote our code (by using the assignment operator operator=); in Rust, we have to explicitly state what our generic type has to be capable of. Just as in C++, we store the size and capacity using integer types. We also store the pointer to the dynamic memory region, using the special Rust type Box<[T]>. For now, let's say that this is the Rust equivalent of an owning pointer to a dynamically allocated array.

On to the implementation:

impl<T: Copy> SillyVec<T> {
    pub fn new() -> Self {
        Self {
            data: Box::new([]),
            size: 0,
            capacity: 0,
        }
    }

    pub fn push(&mut self, element: T) {
        if self.size == self.capacity {
            self.grow();
        }

        self.data[self.size] = element;
        self.size += 1;
    }

    pub fn get_at(&self, index: usize) -> T {
        if index >= self.size {
            panic!("Index out of bounds!");
        }

        self.data[index]
    }

    pub fn size(&self) -> usize {
        self.size
    }

    fn grow(&mut self) {
        // nasty details...
    }
}

We define a new function, which is like the default constructor in C++. We then define our push, get_at and size functions with implementations very similar to the C++ example. Lastly, there is a private function grow, whose inner details should not concern us at the moment. Doing in Rust what we did in the C++ grow implementation is quite complicated, and for good reason. Just assume that we can get it to work somehow.

Note: We don't have to call this function new; there is no relation to the new operator in C++. It is just an ordinary function like any other Rust function. Calling it new just makes sense and is an established convention that you will find on many types in Rust! Since this new function does not take any parameters, we could also implement the Default trait, which is the even more established way of default-constructing types in Rust. For the sake of simplicity, we didn't include an implementation of Default here.

We can then use our SillyVec:

fn main() {
    let mut vec: SillyVec<i32> = SillyVec::new();
    vec.push(42);
    vec.push(43);

    for idx in 0..vec.size() {
        println!("{}", vec.get_at(idx));
    }
}
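
The body of grow is left out above. As a rough sketch of how it could be filled in using only safe Rust, the version below doubles the capacity and pads the unused slots with a filler value. Passing the pushed element into grow as that filler, the doubling strategy, and building the empty storage from an empty Vec are assumptions made for this illustration; they are not necessarily how the omitted implementation works.

```rust
struct SillyVec<T: Copy> {
    data: Box<[T]>,
    size: usize,
    capacity: usize,
}

impl<T: Copy> SillyVec<T> {
    pub fn new() -> Self {
        Self {
            // An empty boxed slice, built from an empty Vec.
            data: Vec::new().into_boxed_slice(),
            size: 0,
            capacity: 0,
        }
    }

    pub fn push(&mut self, element: T) {
        if self.size == self.capacity {
            // Assumption: the new element doubles as the filler value.
            self.grow(element);
        }

        self.data[self.size] = element;
        self.size += 1;
    }

    pub fn get_at(&self, index: usize) -> T {
        if index >= self.size {
            panic!("Index out of bounds!");
        }

        self.data[index]
    }

    pub fn size(&self) -> usize {
        self.size
    }

    // Hypothetical stand-in for the omitted `grow`: allocate a larger
    // buffer, copy the existing elements over, pad the rest with `filler`.
    fn grow(&mut self, filler: T) {
        let new_capacity = if self.capacity == 0 { 1 } else { self.capacity * 2 };

        let mut bigger: Vec<T> = Vec::with_capacity(new_capacity);
        bigger.extend_from_slice(&self.data[..self.size]);
        bigger.resize(new_capacity, filler);

        // The old Box is dropped here, which releases the old memory.
        self.data = bigger.into_boxed_slice();
        self.capacity = new_capacity;
    }
}

fn main() {
    let mut vec: SillyVec<i32> = SillyVec::new();
    for value in 0..5 {
        vec.push(value * 10);
    }

    for idx in 0..vec.size() {
        println!("{}", vec.get_at(idx));
    }
}
```

The padding is needed because a Box<[T]> cannot contain uninitialized slots in safe Rust; avoiding it requires unsafe code, which is part of why a faithful port of the C++ grow is more involved.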

Now let's try to pass our SillyVec to a function, like we did in C++:

fn dummification(dummy: SillyVec<i32>) {
    println!("Oh yes, a dummy with {} elements!", dummy.size());
}

fn main() {
    let mut vec: SillyVec<i32> = SillyVec::new();
    vec.push(42);
    vec.push(43);

    dummification(vec);

    for idx in 0..vec.size() {
        println!("{}", vec.get_at(idx));
    }
}

This example does not compile! Interesting! Did Rust just outsmart us and prevent us from doing something stupid at compile time? It did, but not in the way we might think. If we look at the compiler error, we get some information as to what went wrong:

error[E0382]: borrow of moved value: `vec`
  --> <source>:75:19
   |
69 |     let mut vec: SillyVec<i32> = SillyVec::new();
   |         ------- move occurs because `vec` has type `SillyVec<i32>`, which does not implement the `Copy` trait
...
73 |     dummification(vec);
   |                   --- value moved here
74 | 
75 |     for idx in 0..vec.size() {
   |                   ^^^ value borrowed here after move

First of all: Take a minute to appreciate how nice this error message looks! Even if you don't understand half of the words in it, it contains a lot of useful information and shows us exactly where in the code the problem lies. Time to analyze this message!

borrow of moved value: `vec`

This tells us that our variable vec has been moved, for some reason. We then try to borrow this moved value, and somehow this is not allowed. That makes sense: if we move something and then try to use it, it won't be there anymore.

So what is this borrow thing? Borrows are the Rust equivalent of references in C++. They even share almost the same syntax: const int& in C++ is a reference to an int that cannot be modified through the reference, which in Rust would be written as &i32, a borrow of an i32. (The non-const int& corresponds to a mutable borrow, written &mut i32.) References in C++ are often created automatically by declaring a type to be a reference:

int val = 42;
int& ref_to_val = val;

In Rust, we have to explicitly create a borrow using the ampersand operator &:

let val = 42;
let borrow_of_val = &val;

On the surface, borrows and references are very similar. If we write a const member function in C++, this is equivalent to a Rust method taking a &self parameter, a borrow of self, where self refers to the value that the function is called on. This is what happens in line 75 of the error message, when we call the size function of our SillyVec type. Look at how it is defined:

pub fn size(&self) -> usize {
    self.size
}

Calling vec.size() is fully equivalent to the following code:

SillyVec::<i32>::size(&vec)

The method call syntax using the dot operator is just syntactic sugar that Rust provides for us! This explains why the Rust compiler says that we are borrowing our vec variable here, even though we never explicitly wrote &vec.
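
To see this equivalence in isolation, here is a minimal sketch. The Counter type is a hypothetical stand-in invented for this illustration, so that the snippet compiles without the full SillyVec definition:

```rust
// Hypothetical stand-in type, used only for this illustration.
struct Counter {
    count: usize,
}

impl Counter {
    // Takes a borrow of the value the method is called on.
    fn size(&self) -> usize {
        self.count
    }
}

fn main() {
    let counter = Counter { count: 3 };

    // Method call syntax: Rust creates the borrow `&counter` for us.
    let via_dot = counter.size();

    // Explicit function call syntax: we write the borrow ourselves.
    let via_path = Counter::size(&counter);

    assert_eq!(via_dot, via_path);
    println!("both calls return {}", via_dot);
}
```

Both calls only borrow counter, which is why it can be used twice in a row.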

Now comes the most important part: Rust prevents us from obtaining a borrow of any variable whose lifetime has expired. In C++, scope rules do the same thing; however, Rust is more powerful, because the lifetime of a variable can end even if the variable itself is still technically in scope. How is this possible? Due to move semantics.

Move semantics are an inherent property of Rust (though C++ also supports something similar). Semantics is a fancy term for the meaning of language constructs; here, move semantics means that passing values around (into functions or to other variables) moves those values. To move a value means to take it from one place to another, leaving nothing behind at the old place. Move semantics are closely related to copy semantics, as illustrated by the following picture:

[Figure: the difference between copy semantics and move semantics, illustrated using two boxes]

C++ employs copy semantics by default, which is to say that whenever we pass a value somewhere, a copy of the value is created, leaving the original value untouched. Move semantics, on the other hand, mean that whenever we pass a value, the value is moved to the new location, and the old location is empty afterwards. This is why our dummification function, which takes a SillyVector by value, works silently in C++:

void dummification(SillyVector<int> dummy) {
    std::cout << "Our vector has " << dummy.get_size() << " elements\n";
}

int main() {
    SillyVector<int> vec;
    //...
    dummification(vec /*<-- a COPY of 'vec' is created here! 'vec' is untouched!*/);
    std::cout << vec.get_size() << std::endl;

    return 0;
}

Whereas in Rust, our SillyVec is moved into the dummification function:

fn dummification(dummy: SillyVec<i32>) {
    println!("Oh yes, a dummy with {} elements!", dummy.size());
}

fn main() {
    let mut vec: SillyVec<i32> = SillyVec::new();
    //...
    dummification(vec /*<-- 'vec' is MOVED into the function. We cannot use something that has been moved, because it is gone!*/);

    for idx in 0..vec.size() {
        println!("{}", vec.get_at(idx));
    }
}

This point is quite important: C++ is copy-by-default, Rust is move-by-default!
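
If we really want the C++ behaviour in Rust, we have to ask for the copy explicitly. The sketch below shows this with clone. The Shelf type and the count_items function are hypothetical names invented for this illustration, and deriving Clone is an assumption (the SillyVec above does not implement it):

```rust
// Hypothetical type for this illustration; `Clone` lets us copy it on request.
#[derive(Clone)]
struct Shelf {
    items: Vec<i32>,
}

// Takes its argument by value, so the caller's value is moved in.
fn count_items(shelf: Shelf) -> usize {
    shelf.items.len()
}

fn main() {
    let original = Shelf { items: vec![1, 2, 3] };

    // Explicit copy: the clone is moved into the function, `original` stays.
    let counted_copy = count_items(original.clone());
    println!("copy had {} items, original still has {}", counted_copy, original.items.len());

    // Move: after this call, `original` can no longer be used.
    let counted_moved = count_items(original);
    println!("moved value had {} items", counted_moved);

    // Uncommenting the next line produces error E0382 (borrow of moved value):
    // println!("{}", original.items.len());

    // Types that implement `Copy`, such as i32, are copied instead of moved.
    let number = 42;
    let other = number;
    println!("{} {}", number, other);
}
```

The copy is visible in the source code as a call to clone, whereas in C++ it happens silently at the call site.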

Both the C++ and the Rust example are of course easily fixed by realizing that we want to pass a reference/borrow to the dummification function. This takes us back to the concept of ownership: All variables (local, member, global) are owners of their respective values in C++ and Rust. If we do not want to own something, we make the variable hold a reference (C++) or borrow (Rust). They are like visitors to something that is owned by someone else.
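
A minimal sketch of the Rust fix follows. To keep the snippet self-contained, SillyVec is replaced here by a hypothetical non-generic stand-in that wraps a standard Vec; only the signature of dummification and the call site matter:

```rust
// Stand-in for SillyVec<i32> (assumption: a thin wrapper around Vec,
// so that this snippet compiles on its own).
struct SillyVec {
    data: Vec<i32>,
}

impl SillyVec {
    fn size(&self) -> usize {
        self.data.len()
    }

    fn get_at(&self, index: usize) -> i32 {
        self.data[index]
    }
}

// The parameter is now a borrow, so the caller keeps ownership.
fn dummification(dummy: &SillyVec) {
    println!("Oh yes, a dummy with {} elements!", dummy.size());
}

fn main() {
    let vec = SillyVec { data: vec![42, 43] };

    // We pass a borrow explicitly; nothing is moved.
    dummification(&vec);

    // `vec` is still valid here.
    for idx in 0..vec.size() {
        println!("{}", vec.get_at(idx));
    }
}
```

Unlike in C++, the call site has to spell out the borrow with &vec, so it is always visible whether a function receives the value itself or just a borrow of it.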

Digression: On the difference between pointers and references

One question that often arises when learning C++ is: What is the difference between a pointer and a reference? This is a good question to ask, because they both seem to do a similar thing: point to something else so that we can use this something else without owning it. Under the hood, a C++ reference is typically implemented as a pointer, pointing to the memory address of whatever value is referenced. References are more restrictive than pointers, however:

  • A reference must never be null, that is to say a reference always points to a valid memory address
  • References cannot be reseated. Once a reference has been created, the address that it is pointing to can never be changed. Of course, the value at that address can be changed through the reference, just not the address that the reference points to
  • For that reason, references must be initialized when they are created. The following piece of code is thus invalid: int val = 42; int& val_ref; val_ref = val;

Note that these rules are enforced purely by the compiler. There is zero runtime overhead when using references compared to using pointers. These rules are often very convenient and can even allow us to write faster code: a function taking a pointer usually has to check for null values, whereas a function taking a reference can omit this check.