Unsafe Rust

All those loose ends

  • [T]::split_at_mut
  • Manually allocating memory
  • Handling uninitialized memory
  • Vec<T> growing

Memory Safety

  • Rust is a memory-safe language (unlike C and C++)
  • Memory safety means no Undefined Behavior (UB)
    • “Rust is safe because all code does well-defined things”
  • Why would something be undefined behavior?

Undefined Behavior

#include <iostream>

void foo(int* arr) {
    arr[7] = 42; // writes past the end of the 4-element array
}

int main() {
    int numbers[] = {1,2,3,4};
    float PI = 3.14159;
    foo(numbers);
    std::cout << PI << std::endl;
}

What does this program print?

5.88545e-44

Accesses past the end of an array are undefined behavior

Why not simply define what happens on an out-of-bounds memory access?

Reasons for Undefined Behavior

  • Performance & compiler optimizations
  • Defined behavior has to be enforced, so every violation has to be detected (at a runtime cost)

arr[7] = 42 with UB:

ldr     r3, [fp, #-8] @ arr addr.
add     r3, #28       @ operator[]: 7 * sizeof(int)
mov     r2, #42
str     r2, [r3]

arr[7] = 42 without UB:

ldr     r4, [fp, #-12] @ size of arr
cmp     r4, #7         @ size <= 7?
ble     .RaiseSIGSEGV
ldr     r3, [fp, #-8]  @ arr addr.
add     r3, #28        @ operator[]: 7 * sizeof(int)
mov     r2, #42
str     r2, [r3]

Interaction with the language

  • Not every behavior can be determined at compile-time
    • Performance vs. safety tradeoff!
  • operator[] in Rust requires bounds checking
    • C++ does not, so is Rust slower than C++?
  • What if we really don’t want (or need) the bounds check?
    • Enter unsafe Rust
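
Before reaching for unsafe, it is worth seeing what safe Rust already offers. The sketch below uses the hypothetical helper names `checked_get` and `sum_all` to show two safe options: `get` turns an out-of-bounds access into `None`, and iterating does not index at all, so there is typically no per-element bounds check to pay for.

```rust
/// Checked access: an out-of-bounds index yields `None` instead of a panic.
fn checked_get(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

/// Iteration walks the slice without indexing into it.
fn sum_all(values: &[i32]) -> i32 {
    values.iter().sum()
}
```

Calling `checked_get(&[1, 2, 3, 4], 7)` returns `None`, whereas the C++ example above silently writes into unrelated memory.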

unsafe Rust

// On the slice type [T]:
pub unsafe fn get_unchecked(&self, index: usize) -> &T { ... }
  • We can turn off the Rust safety guarantees using the unsafe keyword
let arr = [1,2,3,4];
// SAFETY: Index is within bounds
let val = unsafe {
    arr.get_unchecked(3)
};
println!("{val}");
  • To call an unsafe function, we need an unsafe {} block
  • Always accompany unsafe {} blocks with a // SAFETY: comment!

unsafe is dangerous

  • Each unsafe fn has safety requirements: rules that the caller has to follow to make calling the function safe
  • These rules are not enforced by the compiler! This is the whole point of unsafe
  • The safety requirements are part of the function's documentation:
/// # Safety
///
/// Calling this method with an out-of-bounds index is *[undefined behavior]*
/// even if the resulting reference is not used.
/// [...]
pub unsafe fn get_unchecked(&self, index: usize) -> &T { ... }
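
The same documentation pattern applies to your own unsafe functions. The following is a minimal sketch with the hypothetical names `first_unchecked` and `first_or_zero`: the unsafe function states its requirement in a `# Safety` section, and a safe wrapper checks that requirement before calling it.

```rust
/// Returns the first element without checking that the slice is non-empty.
///
/// # Safety
///
/// `values` must contain at least one element.
unsafe fn first_unchecked(values: &[i32]) -> i32 {
    // SAFETY: The caller guarantees that index 0 is in bounds
    unsafe { *values.get_unchecked(0) }
}

/// Safe wrapper: checks the requirement, then calls the unsafe function.
fn first_or_zero(values: &[i32]) -> i32 {
    if values.is_empty() {
        return 0;
    }
    // SAFETY: We just checked that the slice is not empty
    unsafe { first_unchecked(values) }
}
```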

When is something unsafe? (1/4)

Calling a C function:

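use std::ffi::c_char;

// Note: the Rust 2024 edition requires writing `unsafe extern "C"` here
extern "C" {
    fn strcpy(dest: *mut c_char, src: *const c_char) -> *mut c_char;
}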

When is something unsafe? (2/4)

Accessing a non-existing element in a HashMap:

let map = HashMap::<&str, i32>::default();
map.get("key");
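
For comparison, a missing key is ordinary, well-defined behavior in Rust: `HashMap::get` returns an `Option`, so no unsafe is involved. A minimal sketch, where `lookup` is a hypothetical helper name:

```rust
use std::collections::HashMap;

/// Looks up `key` and copies the value out; a missing key yields `None`.
fn lookup(map: &HashMap<&str, i32>, key: &str) -> Option<i32> {
    map.get(key).copied()
}
```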

When is something unsafe? (3/4)

Allocating a block of memory on the heap:

let layout = Layout::new::<u32>();
let ptr = alloc(layout);
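
`alloc` is an unsafe function, so the snippet above only compiles inside an unsafe block. A fuller sketch of the allocate, use, free cycle could look like this; `heap_roundtrip` is a hypothetical name chosen for illustration.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, Layout};

/// Stores `value` in a fresh heap allocation, reads it back, and frees it.
fn heap_roundtrip(value: u32) -> u32 {
    let layout = Layout::new::<u32>();
    // SAFETY: `u32` has a non-zero size, which `alloc` requires
    let ptr = unsafe { alloc(layout) } as *mut u32;
    if ptr.is_null() {
        // Standard way to report an allocation failure
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` is non-null, aligned for `u32`, and was allocated with
    // `layout`, which is also the layout we pass to `dealloc`
    unsafe {
        ptr.write(value);
        let read_back = ptr.read();
        dealloc(ptr as *mut u8, layout);
        read_back
    }
}
```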

When is something unsafe? (4/4)

Accessing a mutable global variable:

static mut COUNTER: i32 = 0;

COUNTER += 1;
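
Modifying a `static mut` needs an unsafe block because nothing prevents two threads from doing it at the same time. The sketch below shows one way to write the access, next to a safe alternative based on an atomic; the names `increment_unsafe`, `increment_atomic`, and `SAFE_COUNTER` are hypothetical.

```rust
use std::sync::atomic::{AtomicI32, Ordering};

static mut COUNTER: i32 = 0;
static SAFE_COUNTER: AtomicI32 = AtomicI32::new(0);

/// Increments the mutable static and returns the new value.
fn increment_unsafe() -> i32 {
    // SAFETY: Assumes that no other thread accesses COUNTER concurrently
    unsafe {
        let counter = std::ptr::addr_of_mut!(COUNTER);
        *counter += 1;
        *counter
    }
}

/// Same idea without unsafe: the atomic handles concurrent access.
fn increment_atomic() -> i32 {
    SAFE_COUNTER.fetch_add(1, Ordering::Relaxed) + 1
}
```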

What we can do in unsafe code

  • Dereference raw pointers
  • Call unsafe functions (including C functions, compiler intrinsics, and the raw allocator)
  • Implement unsafe traits
  • Access or modify mutable statics
  • Access fields of unions

Pointers

  • Rust has pointers with similar syntax to C/C++:
    • *const T and *mut T
  • Rust pointers have no ownership and lifetime information
  • Documented in std::ptr module
    • Pointer usage is only safe if a pointer is valid for a given access
    • But: The precise rules for validity are not determined yet

Using pointers

  • You can always go from a Rust reference / borrow to a pointer:
let val = 42;
let borrow = &val;
let as_ptr = borrow as *const i32;
  • The opposite direction also works, but is unsafe:
let val = 42;
let borrow = &val;
let as_ptr = borrow as *const i32;
// SAFETY: The pointer comes from a reference
let borrow_from_ptr = unsafe {
    &*as_ptr // * dereferences the pointer, & makes a borrow
};

Pointer to reference conversion

  • Why is a pointer-to-reference conversion unsafe?
  • Because Rust references (borrows) have very strict rules:
    • Aliasing: The Rule Of One (either one mutable reference or any number of shared references, never both)
    • Validity: Not null, initialized, respecting the valid values (e.g. either 0 or 1 for a bool, nothing else)
    • Alignment: The memory address is aligned to the requirement of the type
  • Very easy to get wrong…
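
Some of these rules can be checked at runtime, others cannot. The sketch below (with the hypothetical name `to_ref_checked`) checks for null and alignment, but still has to be an unsafe fn because initialization, aliasing, and lifetime remain the caller's responsibility.

```rust
use std::mem::align_of;

/// Converts a raw pointer into a reference after checking null and alignment.
///
/// # Safety
///
/// If `ptr` is non-null and aligned, it must point to an initialized `T`
/// that is not mutated for as long as the returned reference is used.
unsafe fn to_ref_checked<'a, T>(ptr: *const T) -> Option<&'a T> {
    if ptr.is_null() || (ptr as usize) % align_of::<T>() != 0 {
        return None;
    }
    // SAFETY: Null and alignment were checked above; validity and aliasing
    // are guaranteed by the caller
    Some(unsafe { &*ptr })
}
```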

Pointer reads and writes

  • Three options:
    1. Dereference using *
    2. ptr::read
    3. ptr::write
  • Slightly different:
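let boxed = Box::new(42);
let ptr = &boxed as *const Box<i32>;
// SAFETY: The pointer comes from a live reference
unsafe {
    // let box2 = *ptr; // Does not compile because Box is not Copy
    let box2 = ptr.read(); // Works, but now `boxed` and `box2` both own the allocation
    let box3 = ptr.read_unaligned(); // Unnecessary, ptr is properly aligned
    // Forget the copies so that only `boxed` frees the memory (no double free)
    std::mem::forget(box2);
    std::mem::forget(box3);
}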
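
To see `ptr::read` and `ptr::write` work together, here is a minimal sketch of swapping two values that are not `Copy`. The function name `swap_values` is hypothetical; the standard library already provides `std::mem::swap` for real code.

```rust
use std::ptr;

/// Swaps two values by moving them through raw pointers.
fn swap_values<T>(a: &mut T, b: &mut T) {
    let pa: *mut T = a;
    let pb: *mut T = b;
    // SAFETY: Both pointers come from live `&mut` references, so they are
    // valid, aligned, and point to distinct, initialized values
    unsafe {
        let tmp = ptr::read(pa); // bitwise copy, `*pa` is now logically moved out
        ptr::write(pa, ptr::read(pb)); // overwrite without dropping the old value
        ptr::write(pb, tmp);
    }
}
```

Every value is read once and written once, so ownership stays intact and nothing is dropped twice.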

Who cares?

Let’s implement Vec<T> :)

Vec<T> v1

struct Vec<T> {
    ptr: Box<[T]>,
    length: usize,
    capacity: usize,
}
  • Problems:
    • How do we obtain a Box<[T]>?
    • All elements in a slice must be properly initialized! No room for uninitialized elements…
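
One way to make room for uninitialized elements is `std::mem::MaybeUninit`. The sketch below is a hypothetical fixed-capacity stack (`SmallStack`, limited to four `u32` values) and not how `Vec<T>` is implemented; it only illustrates tracking which slots are initialized.

```rust
use std::mem::MaybeUninit;

/// Fixed-capacity stack; slots at index `len` and above stay uninitialized.
struct SmallStack {
    slots: [MaybeUninit<u32>; 4],
    len: usize,
}

impl SmallStack {
    fn new() -> Self {
        SmallStack { slots: [MaybeUninit::uninit(); 4], len: 0 }
    }

    /// Returns `false` if the stack is already full.
    fn push(&mut self, value: u32) -> bool {
        if self.len == self.slots.len() {
            return false;
        }
        self.slots[self.len].write(value); // initializes the slot
        self.len += 1;
        true
    }

    fn get(&self, index: usize) -> Option<u32> {
        if index >= self.len {
            return None;
        }
        // SAFETY: Every slot below `len` was initialized by `push`
        Some(unsafe { self.slots[index].assume_init() })
    }
}
```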

Vec<T> v2

struct Vec<T> {
    ptr: *mut T,
    length: usize,
    capacity: usize,
}
  • Now everything becomes unsafe:
    • Allocation
    • Insertion
    • Growing
    • Accessing elements
  • But we can wrap everything in safe functions if we are careful!

Pushing an element (code)

fn push_first(&mut self, element: T) {
    const INITIAL_CAPACITY: usize = 4;
    let layout = Layout::array::<T>(INITIAL_CAPACITY)
        .expect("Invalid layout for new dynamic array");
    // SAFETY: The layout has a non-zero size as long as `T` is not zero-sized
    let arr: *mut T = unsafe { std::alloc::alloc(layout) as *mut T };
    // alloc is allowed to return a null pointer if there is not enough memory
    if arr.is_null() {
        panic!("Out of memory");
    }
    // SAFETY: `arr` is non-null, aligned for `T`, and has room for
    // INITIAL_CAPACITY elements, so offset 0 is in bounds
    unsafe {
        let first = arr.add(0); // For clarity
        first.write(element); // Does not drop the (uninitialized) old value
    }
    self.length += 1;
    self.ptr = arr;
    self.capacity = INITIAL_CAPACITY;
}

Accessing an element

fn get(&self, index: usize) -> &T {
    if index >= self.length {
        panic!("Index out of bounds");
    }
    // SAFETY: Index is within bounds and `ptr` is not null
    unsafe {
        &*self.ptr.add(index)
    }
}

Cleanup

  • Implementing Drop for Vec<T> is important
    • Deallocate the memory on the heap
    • Call the destructor of every element in the Vec<T> (how?)

Cleanup (code)

use std::alloc::{dealloc, Layout};

impl<T> Drop for Vec<T> {
    fn drop(&mut self) {
        if self.ptr.is_null() {
            return;
        }

        for idx in 0..self.length {
            // SAFETY: Element is properly initialized and is read only once
            let element = unsafe { self.ptr.add(idx).read() };
            drop(element);
        }

        let layout = Layout::array::<T>(self.capacity)
            .expect("Invalid Layout");
        // SAFETY: ptr is not null and the Layout matches the one used for allocation
        unsafe {
            dealloc(self.ptr as *mut u8, layout);
        }
    }
}
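
Putting the pieces together, and adding the growing step, a minimal vector could be sketched as follows. This is an illustration under simplifying assumptions, not the standard library implementation: the type is called `MyVec` to avoid clashing with `Vec`, zero-sized types are rejected, and the capacity simply doubles using `realloc`.

```rust
use std::alloc::{alloc, dealloc, handle_alloc_error, realloc, Layout};
use std::ptr;

/// Hypothetical minimal vector for illustration.
struct MyVec<T> {
    ptr: *mut T,
    length: usize,
    capacity: usize,
}

impl<T> MyVec<T> {
    fn new() -> Self {
        // Simplification: `alloc` must not be called with a zero-sized layout
        assert!(std::mem::size_of::<T>() != 0, "zero-sized types are not supported");
        MyVec { ptr: ptr::null_mut(), length: 0, capacity: 0 }
    }

    /// Doubles the capacity (or allocates 4 slots the first time).
    fn grow(&mut self) {
        let new_capacity = if self.capacity == 0 { 4 } else { self.capacity * 2 };
        let new_layout = Layout::array::<T>(new_capacity).expect("capacity overflow");
        let raw: *mut u8 = if self.capacity == 0 {
            // SAFETY: The layout has a non-zero size because `T` is not zero-sized
            unsafe { alloc(new_layout) }
        } else {
            let old_layout = Layout::array::<T>(self.capacity).expect("capacity overflow");
            // SAFETY: `ptr` was allocated with `old_layout`, and the new size
            // is non-zero and was validated by `Layout::array`
            unsafe { realloc(self.ptr as *mut u8, old_layout, new_layout.size()) }
        };
        if raw.is_null() {
            handle_alloc_error(new_layout);
        }
        self.ptr = raw as *mut T;
        self.capacity = new_capacity;
    }

    fn push(&mut self, element: T) {
        if self.length == self.capacity {
            self.grow();
        }
        // SAFETY: `length < capacity`, so the slot is inside the allocation;
        // `write` does not drop the uninitialized old contents
        unsafe { self.ptr.add(self.length).write(element) };
        self.length += 1;
    }

    fn get(&self, index: usize) -> Option<&T> {
        if index >= self.length {
            return None;
        }
        // SAFETY: `index < length`, so the element is initialized
        Some(unsafe { &*self.ptr.add(index) })
    }
}

impl<T> Drop for MyVec<T> {
    fn drop(&mut self) {
        if self.capacity == 0 {
            return; // nothing was ever allocated
        }
        // SAFETY: The first `length` elements are initialized and dropped exactly once
        unsafe { ptr::drop_in_place(ptr::slice_from_raw_parts_mut(self.ptr, self.length)) };
        let layout = Layout::array::<T>(self.capacity).expect("capacity overflow");
        // SAFETY: `ptr` was allocated with exactly this layout
        unsafe { dealloc(self.ptr as *mut u8, layout) };
    }
}
```

Here `get` returns an `Option` instead of panicking, and `drop_in_place` runs the destructors of all elements in one call; both are design choices of this sketch rather than requirements.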

Soundness

  • An API built on unsafe code is sound if no possible use from safe code can cause undefined behavior
    • unsafe code can be made sound by preventing invalid usage
    • e.g. bounds checks, checks for OOM
  • Vec<T> in the standard library uses tons of unsafe code, but has a safe and easy-to-use API
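
`[T]::split_at_mut` is a compact example of a sound API on top of unsafe code. The sketch below, named `my_split_at_mut` to make clear it is not the standard library source, hands out two mutable slices into the same array; the bounds check is what keeps any safe caller from causing undefined behavior.

```rust
use std::slice;

/// Splits one mutable slice into two non-overlapping mutable slices.
fn my_split_at_mut<T>(values: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = values.len();
    let ptr = values.as_mut_ptr();
    // This check makes the function sound: invalid input panics instead of causing UB
    assert!(mid <= len, "mid is out of bounds");
    // SAFETY: [0, mid) and [mid, len) are both inside the original slice and
    // do not overlap, so the two mutable slices never alias
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

The borrow checker cannot prove on its own that the two halves are disjoint, which is why the implementation needs unsafe even though the function is safe to call.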