30 Days of Rust - Day 27
johnnylarner
Posted on June 13, 2023
Good afternoon everyone, I'm entering the final blog sprint. I'm finishing my 30 days of learning this week. We'll be getting into some more advanced concepts over the next few days. The final sprint starts with a concept not totally unique to Rust: smart pointers.
Yesterday's questions answered
No questions to answer
Today's open questions
No open questions
What is the point (of reading my blog?)
We already know that pointers provide an efficient way of tracking large data structures during a program's runtime. Rather than copying the data each time we do some operation on it, we can simply pass around a pointer that points to the memory location of the result. References are the most common types of pointer and are represented by the &
symbol.
Smart pointers in Rust are more powerful than a mere reference. Smart pointers can:
- Help us efficiently allocate and track data on the heap
- Count the numbers of references so that we can have multiple ownership (i.e. multithreaded code)
- Enable us to run borrow rules at runtime instead of compile time
Put your data in a Box<T>
Most programs interact with data of an unknown quantity. The Box
API allows us to allocate that data directly to program's heap while maintaining a pointer on the stack. There are certain types of data structures where this is required in Rust.
Any type that is of a recursive nature needs to be in a box. This includes binary trees and linked lists. The reason for this is that a type which declares one of its fields as being of its own type will set off a recursive chain. Consider a binary tree:
a
/ \
b c
/ \ / \
e - f -
We could set a type for this structure if we knew that the root node would always have 2 children, and each of its children would have a left leaf node and no right leaf:
struct RootNode {
left: ChildNode,
right: ChildNode,
}
struct ChildNode {
left: LeafNode,
}
struct LeafNode {}
But as soon as that structure changes, its type would also have to change. When working with recursive structures, we know that the size of each instance is very, very likely to vary (otherwise we'd be using some other data structure). Box
is a way to tell this to the Rust compiler:
struct Node {
left: Option<Box<Node>>,
right: Option<Box<Node>>,
}
Here we are saying to the compiler: we will have a node which may have another node which may have another node and so on to infinity 🚀
Getting inside your Box
As a Box
is a reference to data, you need to dereference a box if you want to access your data:
fn main() {
let x = 5;
let y = Box::new(x);
assert_eq!(5, x);
assert_eq!(5, *y); }
Under the hood, the deference operator is calling deref
method of the Deref
trait of the Box
object and then dereferencing the result. The verbosity of dereferencing is hidden from us as programmers.
Rust is also able to coercively dereference smart pointers. This means you can get references to the underlying value simply by calling the &
operator. When you call this on a smart pointer, it will be implicitly dereferenced to yield the underlying value. As this behaviour is defined before compilation, Rust can determine the value of final result, which would be the last value to not implement the Deref
trait.
Cleaning up your Box
Smart pointers also implement the Drop
trait. This tells Rust what to do with a given resource when it goes out of scope. For Boxes, this will result in the pointer and the data on the heap being deallocated.
As with the Deref
trait, Rust is actually calling the drop
method behind the scenes. The Rust compiler won't let you call this method in your code. This could lead to two attempts being made to clear memory, this is known as a double free error. Instead you can use: std::mem::drop
.
Sharing is caring
Above we talked about a case where want to express to the compiler that we don't know how many instances type may be nested in a structure: we didn't know the number of instances.
Another case for smart pointers is when we don't know how many owners a piece of data will have at compile time. For example, if we're trying to express a graph structure where each node is a person that has edges which represent connections to that node. We want obviously want to hold on to that node until we know each edge has been removed. If a node represents an expensive EC2 instance and an edge an active inbound connection, we only want autoshutdown to kick in when no more connections are active. Enter Rc<T>
...
The reference counter smart pointer (Rc<T>
) is a way of expressing this situation in Rust. Data wrapped in a Rc
will only ever go out of scope once all reference owners themselves go out of scope.
Who cares what the compiler thinks!
Throughout this series, we've seen how the Rust compiler pushes us towards writing safer programs. One example of this is the borrowing rules:
- There can be multiple immutable references in scope at one time if there are no mutable references in scope
- There can be only one mutable reference in scope at any time
But sometimes we may want to break these rules. This is what the RefCell
smart pointer does. Well strictly speaking, it doesn't break the rules, it defers validating the rules until runtime (instead of compile time). RefCell
wraps data that might otherwise be immutable such that you can then mutate it at runtime if desired. A RefCell
has the borrow
and borrow_mut
methods that return references to the data it is wrapping.
You may rightly ask: why not just use a mutable reference if you want to mutate data. Most of the time you probably should. But there are cases, particularly when mocking stuff for unit tests, where you may want to use mutability to validate the behaviour of your library. The Rust Book explains the case really well and I'll defer to that as this blog post is already getting pretty long...
Separating the weak from the strong
Some data structures may want to contain references to each other. For example our binary tree we look at earlier. We may want a root node to have access to all its children. Each parent and grandparent node should also access their children. Thus each node except the root will have an unknown number of owners at compile time. This means each node should be wrapped in a reference counter type.
But that means we aren't able to modify the hierarchy of our tree as reference counters return immutable references. We can wrap the reference counter in a RefCell
if we want to adjust the hierarchy at run time:
use std::cell::RefCell;
use std::rc::Rc;
struct Node {
value: i32,
children: RefCell<Vec<Rc<Node>>>,
}
A child node may also want to have a reference to its parent. And as parents can have multiple children, we'd want an reference counter wrapper type:
struct Node {
value: i32,
parent: Rc<Node>,
children: RefCell<Vec<Rc<Node>>>,
}
But now we have an issue: the child and parent now have circular references to each other. Why is that an issue? Reference counter types are only dropped when they have 0 strong references. If a Rc<Node>
has 2 children with this setup, it will have 2 references. If those children are dropped or go out of scope, the Rc<Node>
will have 0 references, meaning it will be dropped. This is probably not what we want to happen.
The Weak
smart pointer is designed to solve this: when you want to have multiple references that should not influence the ownership of a given value:
struct Node {
value: i32,
parent: Weak<Node>,
children: RefCell<Vec<Rc<Node>>>,
}
Now when a node's children are dropped, the compiler will not forcibly drop the parent node 🙌
Posted on June 13, 2023
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.