Matt Angelosanto
Posted on December 20, 2022
Written by Andre Bogus✏️
Let's face it; with its optional and named arguments, Python has an advantage over many other programming languages, like Rust. However, Rust library authors can work around this shortcoming quite effectively using the builder pattern. The idea behind this pattern is deceptively simple: create an object that can, but doesn't need to hold all values, and let it create our type when all the required fields are present.
In this article, we’ll explore Rust’s builder pattern, covering the following:
- The builder pattern
- Owned vs. mutably referenced builders
- Into and AsRef traits
- Default values
- Keeping track of set fields using type state
- Rust builder pattern crates
The builder pattern
To get familiar with the builder pattern in Rust, let’s first compare how our code might look with and without a builder:
// without a builder
let base_time = DateTime::now();
let flux_capacitor = FluxCapacitor::new();
let mut option_load = None;
let mut mr_fusion = None;
if get_energy_level(&plutonium) > 1.21e9 {
    option_load = Some(plutonium.rods());
} else {
    // need an energy source, can fail
    mr_fusion = obtain_mr_fusion(base_time).ok();
}
TimeMachine::new(
    flux_capacitor,
    base_time,
    option_load,
    mr_fusion,
)
// with a builder
let builder = TimeMachineBuilder::new(base_time)
    .flux_capacitor(FluxCapacitor::new());
// each setter consumes the builder and returns it, so we re-bind the result
let builder = if get_energy_level(&plutonium) > 1.21e9 {
    builder.plutonium_rods(plutonium.rods())
} else {
    builder.mr_fusion()
};
builder.build()
All of the examples in this text are simple by design. In practice, you would use a builder for complex types with more dependencies.
The builder’s main function is keeping the data needed to build our instance together in one place. Just define the TimeMachineBuilder struct, put a bunch of Option<_> fields in, add an impl with a new and a build method, as well as some setters, and you're done. That's it, you now know all about builders. See you next time!
You're still here? Ah, I suspected you wouldn't fall for that trick. Of course, there is a bit more to builders than the obvious collection of data.
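The recipe from the paragraph above (a struct of Option<_> fields, a new, some setters, and a build) can be sketched roughly as follows. The field types and the BuildError enum are assumptions made up for this illustration; the article's TimeMachine is never defined in full, so treat this as one possible shape rather than the real thing.

```rust
// Hypothetical, simplified target type: the real `TimeMachine` is not
// spelled out in the article, so these fields are illustrative only.
#[derive(Debug, PartialEq)]
struct TimeMachine {
    base_time: u64,
    flux_capacitor: String,
    plutonium_rods: Option<u32>,
}

// Hypothetical error type for required fields that were never set.
#[derive(Debug, PartialEq)]
enum BuildError {
    MissingFluxCapacitor,
}

struct TimeMachineBuilder {
    base_time: u64,                 // required up front, passed to `new`
    flux_capacitor: Option<String>, // required, but may be set later
    plutonium_rods: Option<u32>,    // truly optional
}

impl TimeMachineBuilder {
    fn new(base_time: u64) -> Self {
        Self { base_time, flux_capacitor: None, plutonium_rods: None }
    }

    // By-value setters: take `self`, return `Self`, so calls can be chained.
    fn flux_capacitor(mut self, model: &str) -> Self {
        self.flux_capacitor = Some(model.to_string());
        self
    }

    fn plutonium_rods(mut self, rods: u32) -> Self {
        self.plutonium_rods = Some(rods);
        self
    }

    // `build` checks at runtime that every required field is present.
    fn build(self) -> Result<TimeMachine, BuildError> {
        Ok(TimeMachine {
            base_time: self.base_time,
            flux_capacitor: self
                .flux_capacitor
                .ok_or(BuildError::MissingFluxCapacitor)?,
            plutonium_rods: self.plutonium_rods,
        })
    }
}
```

Returning a Result from build is a design choice of this sketch: it reports a missing required field at runtime instead of panicking.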
Owned vs. mutably referenced builders
Unlike in some garbage-collected languages, in Rust, we distinguish owned values from borrowed values. And as a result, there are multiple ways to set up builder methods. One takes &mut self by mutable reference, the other takes self by value. The former has two sub-variants, either returning &mut self for chaining or (). It is slightly more common to allow chaining.
Our example uses chaining and therefore uses a by-value builder. The result of new is directly used to call another method.
Mutably borrowed builders have the benefit of being able to call multiple methods on the same builder while still allowing some chaining. However, this comes at the cost of requiring a binding for the builder setup. For example, the following code would fail with methods that return &mut self:
let builder = ByMutRefBuilder::new()
    .with_favorite_number(42); // this drops the builder while borrowed
However, doing the full chain still works:
ByMutRefBuilder::new()
    .with_favorite_number(42)
    .with_favorite_programming_language("Rust")
    .build()
If we want to reuse the builder, we need to bind the result of the new() call:
let mut builder = ByMutRefBuilder::new();
builder.with_favorite_number(42); // this returns `&mut builder` for further chaining
We can also ignore chaining, calling the same binding multiple times instead:
let mut builder = ByMutRefBuilder::new();
builder.with_favorite_number(42);
builder.with_favorite_programming_language("Rust");
builder.build()
On the other hand, the by-value builders need to re-bind to not drop their state:
let builder = ByValueBuilder::new();
builder.with_favorite_number(42); // this consumes the builder :-(
Therefore, they are usually chained:
ByValueBuilder::new()
    .with_favorite_number(42)
    .with_favorite_programming_language("Rust")
    .build()
So, with by-value builders, we require chaining, or re-binding the result of every call. On the other hand, mutably referenced builders will allow chaining as long as the builder itself is bound to some local variable. In addition, mutably referenced builders can be reused freely because they are not consumed by their methods.
In general, chaining is the expected way to use builders, so this is not a big downside. Additionally, depending on how much data the builder contains, moving the builder around may become visible in the performance profile, but this is rare.
If the builder will be used often in complex code with many branches, or it is likely to be reused from an intermediate state, I'd favor a mutably referenced builder. Otherwise, I'd use a by-value builder.
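To make the trade-off concrete, here is a minimal sketch of both builder styles side by side. The usage snippets above never show the builders' definitions, so the fields, the Favorites type, and the branching_example function are assumptions for illustration only.

```rust
// Hypothetical target type; not defined in the article.
#[derive(Debug, PartialEq)]
struct Favorites {
    number: u32,
    language: String,
}

#[derive(Default)]
struct ByMutRefBuilder {
    number: u32,
    language: String,
}

impl ByMutRefBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Setters borrow the builder mutably and hand the borrow back for chaining.
    fn with_favorite_number(&mut self, number: u32) -> &mut Self {
        self.number = number;
        self
    }

    fn with_favorite_programming_language(&mut self, language: &str) -> &mut Self {
        self.language = language.to_string();
        self
    }

    // `build` only borrows, so the builder stays usable (at the cost of a clone).
    fn build(&self) -> Favorites {
        Favorites { number: self.number, language: self.language.clone() }
    }
}

#[derive(Default)]
struct ByValueBuilder {
    number: u32,
    language: String,
}

impl ByValueBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Setters consume the builder and return it again.
    fn with_favorite_number(mut self, number: u32) -> Self {
        self.number = number;
        self
    }

    fn with_favorite_programming_language(mut self, language: &str) -> Self {
        self.language = language.to_string();
        self
    }

    // `build` consumes the builder, so the fields can be moved out.
    fn build(self) -> Favorites {
        Favorites { number: self.number, language: self.language }
    }
}

// The same branching logic written against both styles.
fn branching_example(likes_rust: bool) -> (Favorites, Favorites) {
    // Mutable reference: no re-binding needed inside the branch.
    let mut by_ref = ByMutRefBuilder::new();
    by_ref.with_favorite_number(42);
    if likes_rust {
        by_ref.with_favorite_programming_language("Rust");
    }

    // By value: every call must be re-bound, or the state is lost.
    let mut by_value = ByValueBuilder::new().with_favorite_number(42);
    if likes_rust {
        by_value = by_value.with_favorite_programming_language("Rust");
    }

    (by_ref.build(), by_value.build())
}
```

Note how the by-value variant moves its String into the result, while the mutably referenced variant has to clone it because the builder lives on.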
Into and AsRef traits
Of course, the builder methods can do some basic transformations. The most popular one uses the Into trait to bind the input.
For example, you could take an index as anything that has an Into<usize> implementation or allow the builder to reduce allocations by having an Into<Cow<'static, str>> argument, which makes the function accept both a &'static str and a String. For arguments that can be given as references, the AsRef trait can allow more freedom in the supplied types.
There are also specialized traits like IntoIterator and ToString that can be useful on occasion. For example, if we have a sequence of values, we could have add and add_all methods that extend an internal Vec:
impl FriendlyBuilder {
    fn add(&mut self, value: impl Into<String>) -> &mut Self {
        self.values.push(value.into());
        self
    }
    fn add_all(
        &mut self,
        values: impl IntoIterator<Item = impl Into<String>>,
    ) -> &mut Self {
        self.values.extend(values.into_iter().map(Into::into));
        self
    }
}
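The Into<usize>, Into<Cow<'static, str>>, and AsRef variants mentioned above are described but not shown, so here is a small sketch of how such setters might look. ConfigBuilder and its fields are hypothetical names invented for this example; only the trait bounds come from the text.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

// Hypothetical builder used only to demonstrate the trait bounds.
struct ConfigBuilder {
    name: Cow<'static, str>,
    index: usize,
    path: PathBuf,
}

impl ConfigBuilder {
    fn new() -> Self {
        Self { name: Cow::Borrowed(""), index: 0, path: PathBuf::new() }
    }

    // Accepts a `&'static str` (kept borrowed, no allocation) or a `String`
    // (kept owned, no copy).
    fn name(&mut self, name: impl Into<Cow<'static, str>>) -> &mut Self {
        self.name = name.into();
        self
    }

    // Accepts any type with a lossless conversion to `usize`, such as
    // `u8` or `u16`.
    fn index(&mut self, index: impl Into<usize>) -> &mut Self {
        self.index = index.into();
        self
    }

    // Accepts `&str`, `String`, `&Path`, `PathBuf`, and so on by reference.
    fn path(&mut self, path: impl AsRef<Path>) -> &mut Self {
        self.path = path.as_ref().to_path_buf();
        self
    }
}
```

One ergonomic caveat: an untyped integer literal such as `3` cannot be passed to index, because the compiler falls back to i32, which has no Into<usize> implementation. Callers would write `3u8` or pass a typed variable instead.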
Default values
Types can often have workable defaults. So, the builder can pre-set those default values and only replace them if requested by the user. In rare cases, getting the default can be costly. The builder can either use an Option, which has its own None default, or perform another trick to keep track of which fields are set, which we’ll explain in the next section.
Of course, we're not beholden to whatever the Default implementation gives us; we can set our own defaults. For example, we could decide that more is better, so the default number would be u32::MAX instead of the zero Default would give us.
For more complex types involving reference counting, we may have a static default value. For the small price of the runtime overhead of the reference counts, we can call Arc::clone(_) on it every time. Or, if we allow for borrowed static instances, we could use Cow<'static, T> as the default, avoiding allocation while still keeping building simple:
use std::borrow::Cow;
use std::sync::{Arc, LazyLock};
// `Arc::new` is not a `const fn`, so the static is initialized lazily on
// first use (`LazyLock` needs Rust 1.80 or newer)
static ARCD_BEHEMOTH: LazyLock<Arc<Behemoth>> =
    LazyLock::new(|| Arc::new(Behemoth::dummy()));
static DEFAULT_NAME: &str = "Fluffy";
impl WithDefaultsBuilder {
    fn new() -> Self {
        Self {
            // we can simply use `Default`
            some_defaulting_value: Default::default(),
            // we can of course set our own defaults
            number: 42,
            // for values not needed for construction
            optional_stuff: None,
            // for `Cow`s, we can borrow a default
            name: Cow::Borrowed(DEFAULT_NAME),
            // we can clone a `static`
            goliath: Arc::clone(&*ARCD_BEHEMOTH),
        }
    }
}
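For the rare case of a costly default mentioned above, one option is to store an Option and only compute the default in build when the user supplied nothing. The following is a sketch under that assumption; Report, ReportBuilder, and expensive_default_rows are hypothetical names, and the "expensive" function is just a stand-in.

```rust
// Hypothetical target type for this illustration.
#[derive(Debug, PartialEq)]
struct Report {
    title: String,
    rows: Vec<u32>,
}

// Stand-in for a default that would be costly to compute.
fn expensive_default_rows() -> Vec<u32> {
    (1..=3).collect()
}

struct ReportBuilder {
    title: String,          // cheap default, set eagerly in `new`
    rows: Option<Vec<u32>>, // costly default, deferred until `build`
}

impl ReportBuilder {
    fn new() -> Self {
        Self { title: String::from("Untitled"), rows: None }
    }

    fn title(mut self, title: impl Into<String>) -> Self {
        self.title = title.into();
        self
    }

    fn rows(mut self, rows: Vec<u32>) -> Self {
        self.rows = Some(rows);
        self
    }

    fn build(self) -> Report {
        Report {
            title: self.title,
            // The expensive default only runs if `rows` was never set.
            rows: self.rows.unwrap_or_else(expensive_default_rows),
        }
    }
}
```

With this layout, a caller who sets rows never pays for the default, which is the point of deferring it.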
Keeping track of set fields using type state
Keeping track of set fields only applies to the owned variant. The idea is to use generics to put the information regarding which fields have been set into the type. That way, we can detect double-sets, or even forbid them outright, and only allow building once all the required fields have been set.
Let's take a simple case with a favorite number, programming language, and color, where only the first is required. Our type would be TypeStateBuilder<N, P, C>, where N would convey whether the number has been set, P whether the programming language was set, and C whether the color was set.
We can then create Unset and Set types to fill in for our generics. Our new function would return TypeStateBuilder<Unset, Unset, Unset>, and only a TypeStateBuilder<Set, _, _> has a .build() method.
In our example, we use default values everywhere because using unsafe code would not help to understand the pattern. However, it is certainly possible to avoid needless initialization using this scheme:
use std::marker::PhantomData;
/// A type denoting a set field
enum Set {}
/// A type denoting an unset field
enum Unset {}
/// Our builder. In this case, I just used the bare types.
struct TypeStateBuilder<N, P, C> {
    number: u32,
    programming_language: String,
    color: Color,
    typestate: PhantomData<(N, P, C)>,
}
/// The `new` function leaves all fields unset
impl TypeStateBuilder<Unset, Unset, Unset> {
    fn new() -> Self {
        Self {
            number: 0,
            programming_language: String::new(),
            color: Color::default(),
            typestate: PhantomData,
        }
    }
}
/// We can only call `.with_favorite_number(_)` once
impl<P, C> TypeStateBuilder<Unset, P, C> {
    fn with_favorite_number(
        self,
        number: u32,
    ) -> TypeStateBuilder<Set, P, C> {
        TypeStateBuilder {
            number,
            programming_language: self.programming_language,
            color: self.color,
            typestate: PhantomData,
        }
    }
}
impl<N, C> TypeStateBuilder<N, Unset, C> {
    fn with_favorite_programming_language(
        self,
        programming_language: &'static str,
    ) -> TypeStateBuilder<N, Set, C> {
        TypeStateBuilder {
            number: self.number,
            // the field is a `String`, so convert the `&'static str`
            programming_language: programming_language.to_string(),
            color: self.color,
            typestate: PhantomData,
        }
    }
}
impl<N, P> TypeStateBuilder<N, P, Unset> {
    fn with_color(self, color: Color) -> TypeStateBuilder<N, P, Set> {
        TypeStateBuilder {
            number: self.number,
            programming_language: self.programming_language,
            color,
            typestate: PhantomData,
        }
    }
}
/// in practice this would be specialized for all variants of
/// `Set`/`Unset` typestate
impl<P, C> TypeStateBuilder<Set, P, C> {
    fn build(self) -> Favorites {
        todo!()
    }
}
The interface works exactly the same as the by-value builder. The difference is that users can set each field only once, unless an impl for setting it again is added. We can even control which functions may be called in which order. For example, we could only allow .with_favorite_programming_language(_) after .with_favorite_number(_) was already called. And because the marker types are zero-sized, the typestate compiles down to nothing.
The downside of this is obviously the complexity; someone needs to write the code, and the compiler has to parse and optimize it away. Therefore, unless the typestate is used to actually control the order of function calls or to allow for optimizing out initialization, it’s likely not a good investment.
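As a sketch of the call-order control described above, the following variation only offers .with_favorite_programming_language(_) and .build() once the number has been set. OrderedBuilder, its single type parameter, and this cut-down Favorites are assumptions for illustration; they are not part of the example above.

```rust
use std::marker::PhantomData;

// Zero-sized marker types, mirroring `Set`/`Unset` from the example above.
struct Unset;
struct Set;

// Hypothetical, reduced target type.
#[derive(Debug, PartialEq)]
struct Favorites {
    number: u32,
    language: Option<String>,
}

// `N` records at the type level whether the number has been set.
struct OrderedBuilder<N> {
    number: u32,
    language: Option<String>,
    typestate: PhantomData<N>,
}

impl OrderedBuilder<Unset> {
    fn new() -> Self {
        Self { number: 0, language: None, typestate: PhantomData }
    }

    // The only method available before the number is set.
    fn with_favorite_number(self, number: u32) -> OrderedBuilder<Set> {
        OrderedBuilder { number, language: self.language, typestate: PhantomData }
    }
}

// Everything else exists only on the `Set` state, which enforces the order.
impl OrderedBuilder<Set> {
    fn with_favorite_programming_language(
        mut self,
        language: impl Into<String>,
    ) -> Self {
        self.language = Some(language.into());
        self
    }

    fn build(self) -> Favorites {
        Favorites { number: self.number, language: self.language }
    }
}

// Neither of these lines would compile, because the methods are not
// defined for `OrderedBuilder<Unset>`:
//     OrderedBuilder::new().build();
//     OrderedBuilder::new().with_favorite_programming_language("Rust");
```

Since PhantomData is zero-sized, both states of the builder have the same size, and the ordering check costs nothing at runtime.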
Rust builder pattern crates
Since builders follow such a simple code pattern, there are a number of crates to autogenerate them on crates.io.
The derive_builder crate builds our standard mutably referenced builder with Into arguments and optional default values from a struct definition. You can also supply validation functions. It’s the most popular proc macro crate to autogenerate builders, and it's a solid choice. The crate is about six years old at the time of writing, which makes it one of the first derive crates written after derives were stabilized.
The typed-builder crate handles the entire by-value typestate implementation as explained above, so you can forget everything you just read. Just type cargo add typed-builder and enjoy type-safe builders in your code. It also features defaults and optional into annotations, and there’s a strip_option annotation that gives an Option field a setter that takes the bare value and sets Some(value).
The safe-builder-derive crate also implements typestate, but, by generating an impl for each combination of set/unset fields, it causes the code to grow exponentially. For small builders with up to three or four fields, this may still be an acceptable choice; otherwise, the compile time cost is probably not worth it. The tidy-builder crate is mostly the same as typed-builder, but it uses ~const bool for typestate. The buildstructor crate was also inspired by typed-builder, but it uses annotated constructor functions instead of structs. The builder-pattern crate also uses the type state pattern and allows you to annotate lazy defaults and validation functions.
Undoubtedly, there will be more in the future. If you want to use autogenerated builders in your code, I think most of them are fine choices. As always, your mileage may vary. For example, requiring annotations for Into arguments may be worse ergonomics for some but reduce complexity for others. Some use cases will require validation, while others will have no use for that.
Conclusion
Builders compensate handily for the lack of named and optional arguments in Rust, even going beyond with automatic conversions and validation both at compile time and runtime. Plus, the pattern is familiar to most developers, so your users will feel right at home.
The downside is, as always, the additional code that needs to be maintained and compiled. Derive crates can eliminate the maintenance burden at the cost of another small bit of compile time.
So, should you use builders for all your types? I’d personally only use them for types with at least five parts or complex interdependencies, but I consider them indispensable in those cases.