Implementing Rc<T>
in Rust
We saw how to implement a reference counted smart pointer in C++, now we would like to do the same in Rust. Immediately, we stumble upon a problem: Which type do we use for the heap-allocated instance of T
and the reference count? We can't use Box<T>
, because the whole point of our new type is to have multiple owners of the same piece of memory. So we need something like an ownerless type that points to heap-allocated memory. Does Rust have something like this?
In C++, we use raw pointers for this, but the whole point of Rusts memory safety guarantees was to get rid of raw pointers. Still, Rust is a systems programming language, and any good systems programming language should be able to interface directly with code written in C. To do so, it has to support the same types that C does, specifically raw unchecked pointers. Enter unsafe
Rust!
unsafe
Rust
Instead of going into a whole lot of detail on what exactly unsafe
Rust is, at this point it makes sense to simply look into the Rust documentation. It motivates the need for unsafe
code and explains exactly how it works. Please take some time to read the section on unsafe
Rust before continuing here!
An Rc<T>
implementation in Rust using unsafe
code
Part of unsafe
Rust is the support for raw pointers. A raw pointer in Rust has no lifetime checking associated with it, which means it is a perfect candidate for the internal type of our Rc<T>
implementation:
#![allow(unused)] fn main() { struct RefCountAndT<T> { ref_count: usize, obj: T, } struct Rc<T> { ref_count_and_t: *mut RefCountAndT<T>, } impl <T> Rc<T> { pub fn new(obj: T) -> Self { todo!() } pub fn get(&self) -> &T { todo!() } } }
In this rough outline of our Rc<T>
type, we use a combined reference count and instance block RefCountAndT
, just as in the last C++ example. We then store a raw pointer to this structure inside our Rc<T>
type. Raw pointers in unsafe
Rust come in two flavors: Immutable ones (*const T
) and mutable ones (*mut T
). Since we want to mutate the reference count eventually, we use a mutable pointer here! Our simple interface for Rc<T>
currently only supports constructing a new Rc<T>
from a value, and retrieving this value. Here is how we would implement the two functions:
#![allow(unused)] fn main() { pub fn new(obj: T) -> Self { //Move 'obj' onto the heap into a new RefCountAndT instance. We can use Box for this, just temporarily! let ref_count_and_t = Box::new(RefCountAndT { ref_count: 1, obj, }); //Now we *leak* the box to obtain the raw pointer! Box::leak returns a mutable *borrow*... let leaked : &mut RefCountAndT<T> = Box::leak(ref_count_and_t); //...but we want a *pointer*. Luckily, we can cast one into the other. This does NOT require unsafe code, //only *using* the pointer is unsafe (because it can be null)! let leaked_ptr : *mut RefCountAndT<T> = leaked; Self { ref_count_and_t: leaked_ptr, } } }
We make use of the Box<T>
type to perform the heap allocation in a safe way. Box<T>
provides a neat method called leak()
, which gives us access to the underlying memory block without deallocating it. We can convert the result of leak()
into a raw pointer and store this pointer inside the new Rc<T>
instance.
Accessing the instance works like this:
#![allow(unused)] fn main() { pub fn get(&self) -> &T { unsafe { &self.ref_count_and_t.as_ref().unwrap().obj } } }
Dereferencing a raw pointer is unsafe
, so we wrap the body of get()
in an unsafe
block. Raw pointers in Rust have a method as_ref()
for obtaining a borrow to the underlying instance. The pointer might be null, so as_ref()
can either return a valid borrow or nothing. We know that it will never return nothing, so we can bypass any checks using the unwrap()
callas_ref()
returns a special type called Option<T>
, which is a fancy way of handling things that can either be something or nothing. It is null
on steroids. We will learn more about Option<T>
in chapter 4.. This gives us access to our RefCountAndT
instance, from which we access obj
and return a borrow to it.
At this point, our Rc<T>
is still incomplete, because it never releases the allocated memory. We need something like a destructor for our type. In Rust, the equivalent to a destructor is the Drop
trait, which provides a method that gets called whenever an instance of the associated type goes out of scope. With Drop
, we can implement the cleanup of our memory!
#![allow(unused)] fn main() { impl <T> Drop for Rc<T> { fn drop(&mut self) { unsafe { let ref_count_and_t = self.ref_count_and_t.as_mut().unwrap(); ref_count_and_t.ref_count -= 1; if ref_count_and_t.ref_count == 0 { let _as_box = Box::from_raw(self.ref_count_and_t); } //_as_box goes out of scope here, deallocating the memory of the `RefCountAndT` block } } } }
The code should look familiar: We obtain a mutable borrow to the RefCountAndT
block, decrement the reference count, and if it reaches zero, we release the memory. Here, we use a small trick: We obtained our dynamic memory from a Box
, so we can now put it back into a new Box
, which we then let go out of scope, causing the Box
to clean up the memory.
Here is another helpful trait that we can implement to make our Rc<T>
type more usable: Deref
:
#![allow(unused)] fn main() { impl <T> Deref for Rc<T> { type Target = T; fn deref(&self) -> &Self::Target { self.get() } } }
Deref
enables automatic dereferencing from Rc<T>
to &T
, which allows us to call methods of T
directly on an Rc<T>
:
struct Test {} impl Test { pub fn foo(&self) { println!("Test -> foo"); } } pub fn main() { let rc1 = Rc::new(Test{}); rc1.foo(); }
Before moving on, here is an exercise for you:
Exercise 3.4: Implement the Clone
trait for the Rc<T>
type.
The problem with mutability and Rc<T>
Until now, our Rc<T>
type only supported immutable access to the underlying instance of T
. We would really like to mutate our values however! So let's implement this:
#![allow(unused)] fn main() { impl <T> Rc<T> { //... pub fn get_mut(&mut self) -> &mut T { unsafe { &mut self.ref_count_and_t.as_mut().unwrap().obj } } } }
Very simple, basically the same as get()
. Look what happens if we use this though:
pub fn main() { let mut rc1 = Rc::new(Test{}); let mut rc2 = rc1.clone(); let mut1 : &mut Test = rc1.get_mut(); let mut2 : &mut Test = rc2.get_mut(); mut1.foo(); mut2.foo(); }
This example compiles and runs, even though we have two mutable borrows to the same instance! By the Rust borrow checking rules, this should not be allowed, yet here we are. The reason why this works is because we used unsafe
code to bypass all borrow checking rules. Through unsafe
, there is no way for the borrow checker to know that rc1
and rc2
refer to the same instance in memory, and that consequently mut1
and mut2
point to the same instance. While this code works, it is wrong! We bypassed the Rust rules and now we don't have any safety anymore when using this Rc<T>
type.
Let's move to the Rc<T>
type of the Rust standard library to see how we can do better! To distinguish between our Rc<T>
implementation and the one from the Rust standard library, we will write the latter with its full path: std::rc::Rc<T>
.
pub fn main() { let mut rc1 = std::rc::Rc::new(Test{}); let mut rc2 = rc1.clone(); let mut1 = rc1.get_mut(); let mut2 = rc2.get_mut(); mut1.foo(); mut2.foo(); }
If we just replace our Rc<T>
by std::rc::Rc<T>
, our code will stop to compile because std::rc::Rc<T>
does not have a get_mut()
method. Well, it does, just not one that you can call using the regular function call syntax. We can call std::rc::Rc::get_mut(&mut rc1)
instead:
pub fn main() { let mut rc1 = std::rc::Rc::new(Test{}); let mut rc2 = rc1.clone(); let mut1 = std::rc::Rc::get_mut(&mut rc1).unwrap(); let mut2 = std::rc::Rc::get_mut(&mut rc2).unwrap(); mut1.foo(); mut2.foo(); }
std::rc::Rc::get_mut()
returns again one of these Option
things, so we can just unwrap()
them again like last time, right?
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value'
Seems not to be the case? A quick look into the documentation of std::rc::Rc::get_mut()
tells us the following:
Returns a mutable reference into the given Rc, if there are no other Rc or Weak pointers to the same allocation.
Returns None otherwise, because it is not safe to mutate a shared value.
The critical point is the second line: It is not safe to mutate a shared value. That's the issue we had before, and the standard library implementation of Rc<T>
enforces this! The problem with this approach is that it is too strict. It prevents us from ever mutating the value behind an Rc<T>
, as long as there are more than one Rc<T>
instances pointing to this value. However, the other Rc<T>
s might be perfectly harmless and don't do anything with the underlying value. So it might be fine to mutate the value, if we just knew that no one else is currently holding a borrow to this value.
Notice how this is very similar to what Rc<T>
itself does? But instead of tracking how many owners there are for a value, we want to track how many borrows there are to the value. So we need something like reference counting, but for Rust borrows. Luckily, the Rust standard library has something for us: Cell<T>
and RefCell<T>
. To understand these types, we have to understand two concepts first: Inherited mutability and interior mutability.
Inherited vs. interior mutability
The concepts of inherited mutability and interior mutability all deal with how mutability (or immutability) propagates to nested types. Inherited mutability says that the nested types inherit their mutability status from the parent type:
struct Nested { a: i32, b: i32, } pub fn main() { let nested = Nested { a: 42, b: 43 }; nested.a = 23; }
In this example, the variable nested
is immutable by default (because we wrote let
instead of let mut
). This immutability means that we can't assign something else to the nested
variable, but it also means that we can't assign something to the nested fields of the Nested
type! The mutability is inherited in this case.
The opposite of inherited mutability is interior mutability, which we often see in C++ when using pointers:
struct Nested {
int a;
int* b;
};
int main() {
int local_var = 43;
const Nested nested = Nested {
42,
&local_var
};
*nested.b = 23;
return 0;
}
Here, our Nested
type contains a pointer, which is a type with interior mutability. Even though the nested
variable is declared const
, we can assign through the pointer b
to manipulate local_var
. Since interior mutability bypasses the mutability status of the outer type, it can be a bit dangerous since it potentially hides mutation. For the Rc<T>
example however, it is just what we need!
Cell<T>
and RefCell<T>
Both the Cell<T>
and the RefCell<T>
types can be use to introduce interior mutability to a type in Rust. Cell<T>
works by replacing the underlying value using moves, whereas RefCell<T>
provides references/borrows to the underlying value. If you want the nitty-gritty details of how exactly this is achieved in Rust, check out the documentation of UnsafeCell<T>
, which is the underlying type that makes both Cell<T>
and RefCell<T>
possible. For us, all that matters is how we use these types. Let's expand our previous example a bit to illustrate the effects of mutability through an Rc<T>
:
struct Test { text: String, } impl Test { pub fn foo(&self) { println!("Test -> {}", self.text); } } pub fn main() { let mut rc1 = std::rc::Rc::new(Test{ text: "hello".into() }); let mut rc2 = rc1.clone(); { let mut1 = std::rc::Rc::get_mut(&mut rc1).unwrap(); mut1.foo(); } { let mut2 = std::rc::Rc::get_mut(&mut rc2).unwrap(); mut2.foo(); } }
This is mostly what we were left with before, with some small additions. The Test
type actually contains a value that we want to modify, and we slightly changed the order of our Rc::get_mut()
calls. This order is important, because it is perfectly safe! We get the first borrow, do something with it, and then get the second borrow and do something with that. Potentially, this should work, because at no point in time do we have two mutable borrows to the same value, since we did our borrowing inside of two scopes! Still, this will give an error, because Rc::get_mut()
checks for the number of Rc
s that exist, not the number of borrows.
We can now use the RefCell<T>
type, which allows us to mutate the underlying value through a regular borrow (&T
) instead of a mutable borrow:
pub fn main() { let rc1 = std::rc::Rc::new(std::cell::RefCell::new(Test{ text: "hello".into() })); let rc2 = rc1.clone(); { let mut mut1 = rc1.borrow_mut(); mut1.text = "mutation 1".into(); mut1.foo(); } { let mut mut2 = rc2.borrow_mut(); mut2.text = "mutation 2".into(); mut2.foo(); } }
RefCell<T>
gives us a nice borrow_mut()
method that we can use to obtain a mutable borrow to the value T
. We can use this for mutation and everything works just as we would expect, giving the desired output:
Test -> mutation 1
Test -> mutation 2
Note that the type returned from borrow_mut()
is not &mut T
, but instead a special wrapper type called RefMut<T>
that keeps track of the mutable borrow. With this type, RefCell<T>
can guarantee that there will never be more than one mutable borrow of the same value active at the same time (because this is invalid under the borrow checking rules of Rust)! So if we had dropped the local scopes from our example, we would have gotten an error like this: thread 'main' panicked at 'already borrowed: BorrowMutError'
. It is now our responsibility as developers to make sure that we don't accidentally create two mutable borrows to the same value, but even if we do: Rust is there to tell us what went wrong, just not at compile time because we gave up on those guarantees.