3.5. Memory allocators
In this last subchapter of chapter 3, we will take a look at memory allocators. We will learn what they are, how they work, how to write a custom memory allocator, and in which situations that might be a good idea.
What are memory allocators?
Recall that the operating system provides processes access to the heap by moving around a pointer to the end of the heap (the program break). We already saw that this just gives us a large chunk of memory and that we somehow have to manage access to blocks within this chunk. We used malloc() and free() for this, which are functions from the C standard library (which is used by both C++ and Rust). malloc() together with free() is a memory allocator!
The purpose of every memory allocator is to manage access to memory, typically by splitting up a large chunk of memory into smaller chunks and handing those out to whoever is requesting memory. In doing so, it manages both the lifecycle of these memory chunks ('which chunk is free, which chunk is currently in use?') as well as the lookup of suitable chunks when memory is requested from the allocator. In the case of malloc() and free():
- Marking a chunk as in use is done when malloc() is called: the chunk that malloc() returns is now in use
- Marking a chunk as free to use is done when free() is called: the chunk passed to free() is now free to use again
- Looking up a suitable free chunk is done in malloc(), based on the requested memory size
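As a minimal sketch of this lifecycle (written in Rust, whose std::alloc module forwards to malloc() and free() on Linux), an allocate/use/free cycle looks like this:

use std::alloc::{alloc, dealloc, Layout};

fn main() {
    // Describe an allocation of 16 bytes with an alignment of 1 (more on alignment later in this subchapter)
    let layout = Layout::from_size_align(16, 1).unwrap();
    unsafe {
        // The chunk behind `ptr` is now marked as in use (like malloc())
        let ptr = alloc(layout);
        assert!(!ptr.is_null());
        ptr.write(42);
        // The chunk is now free to use again (like free())
        dealloc(ptr, layout);
    }
}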
The main reason why we care about memory allocators (or why we even need them) is that memory is a reusable resource. The same region of memory can be used by one part of the code at one point in time, and by another part of the code at another point in time. We cannot afford to treat memory as a disposable resource because memory is scarce! Think about what would happen if, instead of reusing memory regions, we always used the sbrk() function to grow the heap for every new memory allocation. Even our virtual address space is finite, so eventually we would reach the end of the virtual address space, at which point no more memory could ever be allocated for the current process! Clearly this is not a good idea, hence the need for memory allocators. (Sidenote: There are examples of disposable resources in computer science, for example UUIDs (universally unique identifiers). These are 128-bit numbers used as labels for all sorts of things and they are generated randomly on the fly. No central authority is required to hand out UUIDs. If a UUID is not needed anymore, it is simply discarded. The uniqueness property of UUIDs is guaranteed by the absurdly large number of possible UUIDs: there are 2^128 different possible UUIDs, which is about 3.4 × 10^38. Wikipedia has a nice article explaining the chance that two identical UUIDs will be generated.)
Requirements for memory allocators
There are many different types of allocators with different goals in mind. In both systems and applications programming, you will often see one general purpose allocator being used, with other allocators building on top of this general purpose allocator. malloc is such a general purpose allocator.
Here are the main goals of a general purpose allocator:
- Serve a very wide range of requests, from one byte to multiple gigabytes
- Be as fast as possible in all of these cases
- Prevent memory fragmentation as much as possible
The first requirement is what we might think of first when we hear the term general purpose allocator: It should be usable in all sorts of situations. Calling malloc(1) should be just as valid as calling malloc(1_000_000_000) (given that enough free memory is available).
Performance is another consideration: When a new memory request comes in, an allocator has to locate a suitable chunk of memory that fulfills this request. We want this lookup process to go as fast as possible, so that there is little overhead in doing a dynamic memory allocation.
The last point is about how efficiently the allocator uses the available memory. Suppose we had 32 bytes of available memory, managed by an allocator, and the following three allocation requests:

alloc(8)
alloc(8)
alloc(16)

Here are two strategies for serving these allocation requests, visualized:
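Sketched as memory layouts (an illustration of what the two strategies might look like; the exact placement in the figure is an assumption):

    Strategy 1 (dense placement):     [ A:8 ][ B:8 ][     C:16     ]  -> alloc(16) succeeds
    Strategy 2 (scattered placement): [ A:8 ][ free ][ B:8 ][ free ]  -> 2 x 8 bytes free, but no
                                                                         contiguous 16-byte hole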
Depending on how the allocator works, the third request can either be fulfilled, as shown on the left, or it cannot be fulfilled, as shown on the right. The second case is unfortunate: there are a total of 16 bytes still free, just not in a single contiguous memory region, so a request for 16 contiguous bytes cannot be served by the allocator. This situation is called memory fragmentation, and a good allocator tries to prevent fragmentation as much as possible.
Since a general purpose allocator has to work in any possible situation, it cannot make any assumptions about its usage. This means that sometimes it can be more efficient to use a custom allocator that has a narrower usage area and thus can employ certain optimizations that the general purpose allocator cannot. We will learn about two allocators that can be useful in practice, but before we dive into how they work, let's look at how to use allocators in systems programming languages in general.
Allocators in C++ and Rust
If we use raw memory management, we interact directly with the memory allocator. This is what we did when we called malloc() and free() manually. The whole point of the last two chapters, however, was to get rid of manual memory management and hide it behind useful abstractions such as Box<T> or std::shared_ptr<T>. So how do these types work together with memory allocators? Since the answer to this question is highly language-specific, we will look at C++ first and then at Rust.
Memory allocators in the C++ STL
In C++, there is the concept of an Allocator built into the STL. If you look at a container type such as std::vector<T>, you will see that its type definition includes an allocator:
template<
    class T,
    class Allocator = std::allocator<T>
> class vector;
In its most basic form, the Allocator template type must specify three things:
- A typedef Allocator::value_type, which specifies the type of objects that the allocator can allocate memory for
- A function T* allocate(size_t n), where n is the number of objects of type T that should be allocated
- A function void deallocate(T* ptr, size_t n), where ptr is a pointer obtained from allocate() and n is the number of elements that was passed to allocate()
When we use std::vector<T>, we don't have to write the allocator type manually, because a default template argument is provided: std::allocator<T>. std::allocator<T> is the default allocator used by all STL containers. It uses the builtin operators new and delete for memory management, which internally call malloc() and free() from the C standard library.
If we want to use a custom allocator, we can write a type that satisfies the constraints of the Allocator concept (allocate, deallocate and a value_type typedef) and plug it into the appropriate container class that we want to use:
#include <cstdlib>
#include <iostream>
#include <vector>

// Custom allocator that allocates twice as much memory as needed, because why not?
template<typename T>
struct CustomAllocator {
    using value_type = T;

    static T* allocate(size_t n) {
        return static_cast<T*>(malloc(2 * n * sizeof(T)));
    }

    static void deallocate(T* ptr, size_t n) {
        free(ptr);
    }
};

template<typename T>
using CustomVector = std::vector<T, CustomAllocator<T>>;

int main() {
    CustomVector<int> cvec;
    cvec.push_back(42);
    std::cout << cvec[0] << std::endl;
    return 0;
}
The previous example illustrates a common pattern for custom memory allocators, namely that they build on top of other allocators. Here, the CustomAllocator builds on the malloc allocator and provides some additional (silly) behaviour.
A downside to using allocators this way in C++ is that the allocator becomes part of the type signature of the containers we use. In general, we are used to writing function signatures that accept the container types with the default argument for their allocator, like this:
template<typename T>
void foo(const std::vector<T>& vec) {}
Using a custom allocator, we end up with a different type that is not compatible with the default std::vector<T> type:
template<typename T>
void foo(const std::vector<T>& vec) {}
int main() {
    CustomVector<int> cvec;
    foo(cvec);
    return 0;
}
This example fails to compile with the error message mismatched types 'std::allocator<_Up>' and 'CustomAllocator<int>'. To fix this, we can either add the allocator type to the type signature of the foo() function:
template<typename T, typename Alloc>
void foo(const std::vector<T, Alloc>& vec) {}
Or we can use the std::pmr::polymorphic_allocator<T> type! This is a custom allocator which supports runtime polymorphism, so multiple instances of std::pmr::polymorphic_allocator<T> can exhibit different behaviour while having the same type signature. std::pmr::polymorphic_allocator<T> does this by wrapping an instance of the interface type std::pmr::memory_resource, which custom allocator types can derive from. It exposes methods similar to the Allocator concept (do_allocate() for allocation and do_deallocate() for deallocation), but is not dependent on any single value_type as the Allocator concept is.
Memory allocators in Rust
In Rust, using custom memory allocators works in a similar way to C++: all the standard containers have a second generic argument which specifies the allocator to use with this type. Look at the type signature of the Box<T> type, for example:
pub struct Box<T, A = Global>(_, _)
where
    T: ?Sized,
    A: Allocator;
Since Rust has generic bounds, the second argument has to implement the Allocator trait:
pub unsafe trait Allocator {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError>;
    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout);

    // Some other methods...
}
Disclaimer: At the time of writing, the Allocator trait is still unstable and only available in nightly builds of Rust!
Let's try to understand the functions that the Allocator trait provides. The allocate() function does not take the number of bytes to allocate, but instead an instance of the Layout type as its argument. This is a special type that describes the layout of the memory that we want to allocate, which includes not only its size in bytes, but also something called the alignment of the memory. We will learn about alignment in the next section. For now, it is sufficient to say that we can create an appropriate Layout for any type T by calling Layout::new::<T>(). The allocate() function then returns a Result<T, E> type, which is Rust's way of indicating either a successful result (the first generic argument of Result<T, E>) or an error that has occurred (the second generic argument of Result<T, E>). On success, allocate() returns a pointer to the allocated memory. One of the guarantees of the Allocator trait is that it never returns a null pointer, which is why this function does not return *const T but instead the NonNull<T> type. NonNull<T> is a wrapper around a raw pointer with the guarantee that the pointer is not null. Since allocate() allocates a range of memory, the return type wraps a slice of bytes ([u8]) rather than a single pointer to bytes (*const u8). On failure, allocate() returns an AllocError containing information about the reason for the failure.
deallocate() is a bit simpler. It takes the pointer obtained from allocate() and the corresponding Layout and deallocates the memory. Note that deallocate() does NOT return a potential error. Instead, one of the guarantees that a type implementing Allocator has to enforce is that the pointer passed to deallocate() came from a call to allocate() on the same instance of the allocator. Since the compiler cannot enforce these guarantees statically, the whole trait is marked as unsafe.
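As a small illustration of this API, here is a manual allocate/deallocate round-trip using the Global allocator (a sketch that, like the other examples in this section, only compiles on nightly Rust with the allocator_api feature enabled):

#![feature(allocator_api)]

use std::alloc::{Allocator, Global, Layout};

fn main() {
    // Layout for a single u64: size 8, alignment 8 on typical platforms
    let layout = Layout::new::<u64>();
    // allocate() returns a NonNull<[u8]> on success
    let block = Global.allocate(layout).expect("allocation failed");
    unsafe {
        // deallocate() must receive a pointer from this allocator and the matching Layout
        Global.deallocate(block.cast::<u8>(), layout);
    }
}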
To use a custom allocator, the container classes that support the Allocator API provide functions to default-construct the container with a custom allocator. Usually these functions are called new_in() and accept an instance of the allocator:
// Since the allocator API is not yet stable, we have to enable it as a nightly feature
// This only compiles if you build it with Rust nightly!! You can use `rustup override set nightly`
// in the root of your Rust project to force this
#![feature(allocator_api)]

use std::alloc::{AllocError, Allocator, Layout};
use std::ptr::NonNull;

struct DummyAllocator;

unsafe impl Allocator for DummyAllocator {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        todo!()
    }

    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
        todo!()
    }
}

pub fn main() {
    let mut vec_with_allocator = Vec::new_in(DummyAllocator{});
    vec_with_allocator.push(42);
}
Unfortunately, at the time of writing only Vec and Box have support for the Allocator API.
Before we dive into implementing a bunch of custom allocators, there is one more thing that we have to learn about: Memory alignment.
Memory alignment
In the previous section we saw that Rust allocators require a bit more information to allocate a block of memory than the C++ allocators. This information is encapsulated in the Layout Rust type, which looks like this:
pub struct Layout {
    size_: usize,
    align_: NonZeroUsize,
}
In addition to the size of the memory region, it also stores its alignment. To understand alignment, recall that the smallest unit of memory that is addressable on most modern computers is one byte, which means we can't get a pointer to anything smaller than a single byte. A memory address a is called N-byte aligned if a is a multiple of N, where N is a power of two. The memory address 48 thus is 16-byte aligned (and also 8-byte, 4-byte, 2-byte, and 1-byte aligned), whereas the address 17 is only 1-byte aligned.
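To make the definition concrete, here is a tiny illustrative checker (the function name is made up for this example):

fn is_aligned(addr: usize, n: usize) -> bool {
    // Assumes n is a power of two
    addr % n == 0
}

fn main() {
    assert!(is_aligned(48, 16));
    assert!(is_aligned(48, 8));
    assert!(!is_aligned(17, 2)); // 17 is only 1-byte aligned
}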
Exercise 3.5: Write a Rust function that takes a usize value and returns the maximum alignment of this value.
Why does alignment matter? In contrast to single-byte addressing, the word size of modern CPUs is typically larger than a single byte. For example, on a 64-bit Linux system, the word size is 8 bytes, which means that the CPU processes data in 8-byte chunks. For this reason, it is fastest to read data from an address that is at least aligned to the word size of the machine. The details depend heavily on the CPU instruction set. There are some CPUs that do not support reading from a memory address that is not properly aligned, such as older ARM CPUs. CPUs that do support reads/writes at unaligned memory addresses might still incur a performance penalty for doing so. The x86-64 instruction set belongs to the latter category: general reads/writes will always work even at unaligned addresses, but might be slower than aligned memory accesses, and there are also special registers and instructions that do not work for unaligned memory access, such as the SSE instructions. Even disregarding the raw instructions, memory access at an unaligned address can affect cache access as well, requiring multiple reads/writes because the address is right at the edge of a cache line. (Sidenote: What is a cache line? Whenever a value is requested from a cache, for example from the L1 cache, and it is not present, the next higher cache (or main memory) is queried. Due to the principle of locality, it makes sense to load not just a single byte, but multiple bytes at once in this situation. The chunk of bytes that is loaded at once is called a cache line, which is typically 64 bytes large.)
So memory alignment matters, sometimes implicitly and sometimes explicitly. This is why Rust stores the alignment requirement for a memory allocation inside the Layout type, together with the size. It is the job of the memory allocator to adhere to this alignment requirement. Note that since alignment is always a power of two and an address that is N²-aligned is also N-aligned, an allocator is free to return a memory address with a larger alignment than the one requested.
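As a quick demonstration that Layout carries exactly this size/alignment pair (a sketch using only stable std APIs):

use std::alloc::Layout;

fn main() {
    // On typical 64-bit platforms this prints 'size = 8, align = 8'
    let layout = Layout::new::<u64>();
    println!("size = {}, align = {}", layout.size(), layout.align());

    // A byte array only requires 1-byte alignment
    let layout = Layout::new::<[u8; 3]>();
    println!("size = {}, align = {}", layout.size(), layout.align());
}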
Alignment has fairly interesting effects in systems programming languages such as C++ or Rust. Since unaligned memory accesses are slow, compilers will often add invisible padding bytes to memory structures (structs, classes) so that they can generate more efficient code. This can lead to types having unexpected sizes:
struct Small {
    v1: u32,
    v2: u8,
}

struct NotSoSmall {
    v1: u32,
    v2: [u8; 4],
}

pub fn main() {
    println!("{}", std::mem::size_of::<Small>());
    println!("{}", std::mem::size_of::<NotSoSmall>());
}
On 64-bit Linux, this example prints:
8
8
This indicates that both types have the same size, even though the NotSoSmall type stores 4 u8 values where Small stores just one. This is a situation in which the compiler added padding bytes to the Small structure, in this case so that the size of the type becomes a multiple of its alignment. In Rust, the layout of struct members is called its representation. Interestingly enough, the default representation in Rust has "no guarantees of data layout". If we want some guarantees, we can use the C representation, which guarantees a minimum alignment for our type and defines an algorithm that computes the member layout. If we want all members to be as tightly packed as possible, we can use the #[repr(packed)] modifier:
#[repr(packed(1))]
struct Small {
    v1: u32,
    v2: u8,
}

struct NotSoSmall {
    v1: u32,
    v2: [u8; 4],
}

pub fn main() {
    println!("{}", std::mem::size_of::<Small>());
    println!("{}", std::mem::size_of::<NotSoSmall>());
}
With #[repr(packed(1))], the size of Small is 5, which is just the sum of the sizes of all its members.
The StackAllocator - Our first custom memory allocator
Now we are ready to write our first custom memory allocator: the StackAllocator. It works very similarly to the real stack, but gives us more control over how we use it. At its core, the StackAllocator manages a contiguous chunk of memory and treats it as a stack. It keeps a pointer to the top of the stack and serves each allocation by incrementing this stack pointer by the amount of memory that should be allocated (taking alignment into account). This allocator type is sometimes referred to as a bump allocator, due to the fact that its allocation strategy is to simply increment ('bump') a pointer, or as a region allocator, because all allocations are served from the same region in memory. The following image illustrates how the StackAllocator serves an allocation:
The StackAllocator is very fast, with each allocation only requiring a pointer increment (and some housekeeping). It is great if you want to store a bunch of things with the same lifetime that are too large to fit onto the regular stack. Think of a level in a video game: most of the data for a level gets loaded at the start of the level and lives until the end of the level. Although nowadays open-world games are quite common, many games still feature the classic loading screen at the start of a level, indicating the loading process of all (static) data for the level. If it is known how large this data is in memory, a StackAllocator can be used to accommodate this data.
The strength of the StackAllocator is at the same time its major downside: allocations can't be freed in an arbitrary order. Since it is difficult to get single deallocations right with the StackAllocator, it is instead common to simply free all of the allocations at once by resetting the stack pointer to the bottom of the stack:
Let's implement the StackAllocator in Rust! First, the type definition:
struct StackAllocator {
    // We have to use Cell/RefCell here because the `Allocator` trait takes &self instead of &mut self
    stack: RefCell<Box<[u8]>>,
    top_of_stack: Cell<NonNull<u8>>,
    end_of_stack: NonNull<u8>,
}
For our stack memory block, we use Box<[u8]> for simplicity. We then store the current top of the stack as a NonNull<u8>, as well as the end of the stack as another NonNull<u8>. We have to wrap the non-constant members in Cell/RefCell, because the Allocator trait functions allocate and deallocate take an immutable borrow of self (&self), but we have to mutate self inside these methods. This is a good use case for the concept of interior mutability that we learned about in the previous section. Note that a more flexible implementation would not store Box<[u8]>, because this assumes that the underlying memory block comes from the system allocator instead of a custom allocator.
The StackAllocator is not meant to grow, so its new method has to know the amount of memory that the StackAllocator should manage:
impl StackAllocator {
    pub fn new(stack_size: usize) -> Self {
        // In real code, you would do better error handling than this 'unwrap()' here
        let (stack_memory_layout, size) = Layout::new::<u8>().repeat(stack_size).unwrap();
        unsafe {
            let stack_memory = std::alloc::alloc(stack_memory_layout);
            let stack_memory_slice = std::slice::from_raw_parts_mut(stack_memory, size);
            Self {
                stack: RefCell::new(Box::from_raw(stack_memory_slice.as_mut() as *mut [u8])),
                // alloc CAN return null, so we check here!
                top_of_stack: Cell::new(NonNull::new(stack_memory).unwrap()),
                end_of_stack: NonNull::new(stack_memory.add(stack_size)).unwrap(),
            }
        }
    }
}
Here we see the Layout type in action. Since we want to allocate a block of raw memory, we use the default Layout of the u8 type, repeated for the desired size of the stack. We then allocate our memory using std::alloc::alloc, which on a Linux system calls malloc from the C library internally. We convert the raw pointer that alloc returns into a slice using std::slice::from_raw_parts_mut, which encodes the size of the memory block. This slice can then be put into a Box<[u8]> using Box::from_raw. We start out with the bottom of the stack as the current top of the stack, and memorize the end of the stack so that we can do out-of-bounds checks in the allocate function.
allocate is then written like so:
unsafe impl Allocator for &StackAllocator {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        if layout.size() == 0 {
            return Err(AllocError{});
        }

        // Align the top of the stack to the alignment requirement
        let top_of_stack = self.top_of_stack.get();
        let alignment = top_of_stack.as_ptr().align_offset(layout.align());

        unsafe {
            let alloc_begin = top_of_stack.as_ptr().add(alignment);
            let alloc_end = alloc_begin.add(layout.size());
            if alloc_end > self.end_of_stack.as_ptr() {
                return Err(AllocError{});
            }

            self.top_of_stack.set(NonNull::new_unchecked(alloc_end));
            let memory_block = std::slice::from_raw_parts_mut(alloc_begin, layout.size());
            Ok(NonNull::new_unchecked(memory_block as *mut [u8]))
        }
    }

    //...
}
First, note that we unsafe impl the Allocator trait, because the compiler can't guarantee the invariants of the allocate and deallocate functions. We also implement the trait not for the StackAllocator type itself, but for a borrow instead (&StackAllocator). This is part of the concept behind the Allocator trait: containers are assumed to store types that implement Allocator by value. This requires all types implementing Allocator to be movable, which, for reasons that are beyond the scope of this lecture, cannot be guaranteed in all cases. So instead, Allocator is expected to be implemented on references or smart pointers.
The implementation of allocate is a bit verbose, but not hard to grasp. First, we check that all allocation requests are for at least one byte, as allocating zero bytes does not make sense. Then, we take the pointer to the top of the stack and align it to the alignment requirement in the requested Layout. To this aligned pointer, we add the requested size of the allocation to obtain the pointer to the end of the memory allocation. If this pointer exceeds the end of the stack, we return an error because we are out of memory. Otherwise, we increment the top of the stack and return the allocated memory block.
deallocate is very simple, because the StackAllocator does not support deallocating specific allocations:
unsafe impl Allocator for &StackAllocator {
    //...

    unsafe fn deallocate(&self, ptr: NonNull<u8>, _layout: Layout) {
        // Allocator::deallocate does nothing, because we can't deallocate in random order! We can only
        // verify that 'ptr' came from this allocator
        let stack_borrow = self.stack.borrow();
        if (ptr.as_ptr() as *const u8) < stack_borrow.as_ptr()
            || ptr.as_ptr() >= self.top_of_stack.get().as_ptr() {
            panic!("Pointer is out of bounds!");
        }
    }
}
Because we still need a way to deallocate all memory in the StackAllocator, we can provide a reset function:
impl StackAllocator {
    pub unsafe fn reset(&self) {
        let mut bottom_of_stack = self.stack.borrow_mut();
        self.top_of_stack.set(NonNull::new_unchecked(bottom_of_stack.as_mut_ptr()));
    }
}
Calling this function is highly unsafe! If there are any objects still referencing memory allocated from the StackAllocator, this memory can be handed out to another object after calling reset, leading to two objects referring to the same memory. So we as developers have to make sure that we use the StackAllocator correctly, hence reset is also marked unsafe.
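To close the section, here is a brief usage sketch of the StackAllocator (nightly-only like the code above, and assuming the definitions from this section are in scope):

pub fn main() {
    let allocator = StackAllocator::new(1024);
    // `Allocator` is implemented for &StackAllocator, so we pass a borrow to new_in()
    let mut numbers = Vec::new_in(&allocator);
    numbers.push(42u32);
    numbers.push(43u32);
    println!("{:?}", numbers);
    // `numbers` must be dropped before calling allocator.reset()
}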
That is all there is to say about the StackAllocator custom allocator. Let's look at another allocator type!
The PoolAllocator
In this section, we will develop the PoolAllocator, which is a memory allocator that hands out fixed-size chunks of memory from a larger memory source. It is useful if you have a lot of instances of the same type that you want to manage. This could be entities in a video game or events in a distributed system. The following picture illustrates the mechanism behind the PoolAllocator:
The PoolAllocator is sometimes called a memory pool. Compared to the StackAllocator, the PoolAllocator can grow its internal memory and supports deallocations in random order, with the drawback that it can only serve allocations of a predetermined (maximum) size.
Conceptually, the PoolAllocator can be thought of as a linked list of chunks of memory, called a free list. To be efficient, these chunks initially all sit within a single large block of memory, allocated by the PoolAllocator from its parent allocator (malloc, for example). The size of the chunks is a configurable parameter, as is the size of the large block of memory that these chunks live in. As an example, suppose that you want to implement a PoolAllocator with a chunk size of 32 bytes. The block size should be an integer multiple of the chunk size, and it should be sufficiently large that we can serve a reasonable number of allocations before having to allocate another block. So let's use 4096 bytes as the block size, which gives us 4096/32 = 128 chunks per block:
All the possible chunks that can be allocated from this block of memory start at offsets that are an integer multiple of the chunk size away from the start of the block. The chunks thus are the nodes in our internal linked list. Now we can do something clever: instead of storing the pointers to all chunks inside a separate datastructure, we can store them inside the chunks themselves! This requires that each chunk is at least as big as a single pointer, for example 8 bytes on 64-bit Linux, but this is a reasonable assumption to make (and we can enforce it in the constructor of the PoolAllocator). We are allowed to do this because the chunks are free memory from the point of view of the PoolAllocator - it does not matter what's in this memory. (Sidenote: Even if the clients of our PoolAllocator were to require zeroed memory upon allocation, once we allocate a chunk we have to remove it from the linked list anyways, at which point we can zero the memory!) Our PoolAllocator thus creates an intrusive linked list containing all the free chunks whenever a new block is allocated:
The PoolAllocator itself now always points to the next free chunk. At the start, this is simply the start of the first allocated block. When an allocation request comes in, the PoolAllocator can look at the next free chunk and see if it matches the allocation request. One of the requirements for the PoolAllocator is that no one ever allocates memory that is larger than a single chunk, which is easy to enforce. So the question of whether an allocation can be served or not boils down to the question: is there still a free chunk? If there is one, the PoolAllocator looks for the pointer to the next free chunk stored inside the memory of this chunk and sets this as the pointer to the next free chunk. If not, it has to allocate a new memory block from the parent allocator and initialize this block again, similar to before. In both cases, the current free chunk is then returned to the client that requested the allocation:
What about freeing memory? Allocating memory with the PoolAllocator is equivalent to a pop_front operation on the free list. For freeing, we can either push the chunk that is to be freed to the front of the free list, or to the back. For simplicity and potential cache efficiency, we will push the chunk to the front. To do so, we write the value of the pointer to the next free chunk into the memory of the chunk that is to be freed, and set this chunk as the next free chunk:
This is all there is to know about the functionality of allocating and freeing memory with the PoolAllocator. Since all chunks are the same size, it works well if all allocation requests also have the same size, but in principle the PoolAllocator can serve any allocation with a size less than or equal to the chunk size.
We have to talk about one last thing though: how the PoolAllocator releases the allocated blocks back to its parent allocator. The easiest way to do so is to store a list of all blocks as owning pointers inside the PoolAllocator, for example a Vec<Box<[u8]>>, so that when the PoolAllocator goes out of scope, the blocks are deallocated automatically. With a sufficiently large block size, the memory overhead of this additional vector is small, making the PoolAllocator itself very memory-efficient.
You might be wondering why there were no code examples for the PoolAllocator in this section. Simple: implementing a PoolAllocator is an exercise left for the students in the lab!
Other use-cases for custom allocators
Up until this point, we talked about custom allocators as a means to achieve better performance in our code. There is also another use case: diagnostics. This is perhaps less relevant for Rust, which has good memory safety as long as you don't use unsafe code, but for C/C++, memory diagnostics are an important tool to identify and prevent bugs.
A particularly annoying part of low-level programming is that bugs tend to manifest themselves in unpredictable ways. The dreaded undefined behaviour that is deeply ingrained in the C++ standard is partly to blame: the behaviour of systems software written in C++ in the presence of programming errors is simply undefined most of the time. Memory errors are among the hardest errors to track down, as they often manifest at a place in the code seemingly unrelated to the place that caused the error. Let's look at a simple out-of-bounds memory access:
#include <cstdint>
#include <iostream>

void do_something(uint64_t* ptr) {
    ptr[5] = ptr[1] + ptr[2] + ptr[3];
}

int main() {
    uint64_t numbers[4] = {1,2,3,4};
    uint64_t reference = 42;
    do_something(numbers);
    std::cout << reference << std::endl;
    return 0;
}
In this (admittedly rather contrived) example, we have a simple out-of-bounds memory access, maybe due to a typo or simple oversight. We also have some unrelated number reference, which we initialize to 42. If we run this program, one possible output might be this:
9
This is a prime example of undefined behaviour! We never (explicitly) wrote to reference after its initialization, and yet its value has changed. In this situation, it is easy to explain why: because we wrote past the end of the numbers array, which lives on the stack before the reference variable. So going past the end of numbers means that we eventually reach the memory of reference and write to it. Similar situations can also happen with heap-allocated memory, if two unrelated pieces of memory happen to be close to each other, or an access is so far out of bounds that it accidentally ends up in another piece of memory.
We could catch such an error if we had some mechanism for detecting an out-of-bounds memory access. Luckily, virtual memory provides us with such a mechanism: protected virtual pages! Under Linux (and all the other major operating systems), we can set protection flags on virtual pages, so that the memory inside a virtual page may not be written to or read from. If we were to allocate our array right at the edge of a page that is read- and write-protected, an out-of-bounds memory access beyond the end of the array would trigger a page fault by the operating system. But we would have to allocate our memory just right and set the appropriate protection flags. Under Linux, we can use mmap to allocate memory inside a virtual page with specific page flags, as well as mprotect to set page flags. With this, we can create a rudimentary memory protection system:
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <sys/mman.h>
#include <unistd.h>
#include <signal.h>

void signal_handler(int signal) {
    printf("Got signal %i\n", signal);
    // Returning from a SIGSEGV handler would retry the faulting instruction forever, so exit instead
    _exit(1);
}

void do_something(uint64_t* ptr) {
    ptr[5] = ptr[1] + ptr[2] + ptr[3];
}

int main() {
    // Capture segmentation violation signals to get notified if something went wrong!
    signal(SIGSEGV, signal_handler);
    const auto page_size = getpagesize();
    // Allocate a bit more memory than fits in a single page, so that we get two pages! Make them both
    // readable and writeable initially
    std::byte* block = static_cast<std::byte*>(mmap(nullptr, page_size + 8, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0));
    // Find the start of the second page
    std::byte* start_second_page = block + page_size;
    // Make the second page read and write protected (mprotect rounds the length up to whole pages)
    mprotect(start_second_page, 8, PROT_NONE);
    // Put our numbers array right at the end of the first page, so that the last element in numbers takes up
    // the last bytes of the first page. Any access beyond that will land in the second (protected) page
    uint64_t* numbers = reinterpret_cast<uint64_t*>(block + page_size - (4 * sizeof(uint64_t)));
    numbers[0] = 1;
    numbers[1] = 2;
    numbers[2] = 3;
    numbers[3] = 4;
    uint64_t reference = 42;
    do_something(numbers);
    std::cout << reference << std::endl;
    munmap(block, page_size + 8);
    return 0;
}
This is tedious to write and a bit difficult to get correct (and it only protects against out-of-bounds accesses in one direction), but we could encapsulate such code inside a custom memory allocator, for example a FencingAllocator, to identify memory problems automatically. Note that this strategy is very wasteful: we allocated two pages (8192 bytes on the target system) for just 4 numbers, so this is really only something for debugging. There are also automatic tools which do something similar by swapping out the default malloc allocator, for example Valgrind.
Summary
This was the last subchapter on memory. In it, we learned about memory allocators and strategies to manage memory allocations from a systems programming perspective. We saw the current state of allocator support in both C++ and Rust. We learned about two custom allocators: the StackAllocator for allocating linear memory very fast, and the PoolAllocator for allocating fixed-size chunks reasonably fast. Lastly, we saw that custom allocators can be used to help detect memory problems.