How to build a Rust API with the builder pattern

mangelosanto

Matt Angelosanto

Posted on December 20, 2022

Written by Andre Bogus✏️

Let's face it: with its optional and named arguments, Python has an advantage over many other programming languages, like Rust. However, Rust library authors can work around this shortcoming quite effectively using the builder pattern. The idea behind this pattern is deceptively simple: create an object that can, but doesn't need to, hold all values, and let it create our type once all the required fields are present.

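In this article, we’ll explore Rust’s builder pattern, covering owned and mutably referenced builders, the Into and AsRef traits, default values, type state, and crates that generate builders for us.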

The builder pattern

To get familiar with the builder pattern in Rust, let’s first compare how our code might look with and without a builder:

// without a builder
let base_time = DateTime::now();
let flux_capacitor = FluxCapacitor::new();
let mut option_load = None;
let mut mr_fusion = None;

if get_energy_level(&plutonium) > 1.21e9 {
    option_load = Some(plutonium.rods());
} else {
    // need an energy source, can fail
    mr_fusion = obtain_mr_fusion(base_time).ok();
}

TimeMachine::new(
    flux_capacitor,
    base_time,
    option_load,
    mr_fusion,
)

// with a builder
let builder = TimeMachineBuilder::new(base_time)
    .flux_capacitor(FluxCapacitor::new());

// binding the `if` puts it in expression position, so `.build()` can follow it
let time_machine = if get_energy_level(&plutonium) > 1.21e9 {
    builder.plutonium_rods(plutonium.rods())
} else {
    builder.mr_fusion()
}
.build();

All of the examples in this text are simple by design. In practice, you would use a builder for complex types with more dependencies.

The builder’s main function is keeping the data needed to build our instance together in one place. Just define the TimeMachineBuilder struct, put a bunch of Option<_> fields in, add an impl with a new and a build method, as well as some setters, and you're done. That's it, you now know all about builders. See you next time!
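To make that recipe concrete, here is a minimal sketch of such a builder. The field types (a u64 base time, a u32 rod count, a unit-like FluxCapacitor) and the BuildError type are assumptions made up for illustration; the article does not define them.

```rust
// Hypothetical stand-ins for the types used in the time machine example.
#[derive(Debug, PartialEq)]
struct FluxCapacitor;

#[derive(Debug, PartialEq)]
struct TimeMachine {
    base_time: u64,
    flux_capacitor: FluxCapacitor,
    plutonium_rods: Option<u32>,
}

/// Returned by `build` when a required field is missing (an assumption).
#[derive(Debug, PartialEq)]
enum BuildError {
    MissingFluxCapacitor,
}

/// The builder keeps every value that may still be missing as an `Option`.
struct TimeMachineBuilder {
    base_time: u64,
    flux_capacitor: Option<FluxCapacitor>,
    plutonium_rods: Option<u32>,
}

impl TimeMachineBuilder {
    fn new(base_time: u64) -> Self {
        Self {
            base_time,
            flux_capacitor: None,
            plutonium_rods: None,
        }
    }

    // Setters take and return `self`, so calls can be chained.
    fn flux_capacitor(mut self, flux_capacitor: FluxCapacitor) -> Self {
        self.flux_capacitor = Some(flux_capacitor);
        self
    }

    fn plutonium_rods(mut self, rods: u32) -> Self {
        self.plutonium_rods = Some(rods);
        self
    }

    // `build` checks that the required fields are present.
    fn build(self) -> Result<TimeMachine, BuildError> {
        Ok(TimeMachine {
            base_time: self.base_time,
            flux_capacitor: self
                .flux_capacitor
                .ok_or(BuildError::MissingFluxCapacitor)?,
            plutonium_rods: self.plutonium_rods,
        })
    }
}
```

Returning a Result from build is only one possible design; it moves the "is everything there?" check to runtime.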

You're still here? Ah, I suspected you wouldn't fall for that trick. Of course, there is a bit more to builders than the obvious collection of data.

Owned vs. mutably referenced builders

Unlike some garbage-collected languages, Rust distinguishes owned values from borrowed values. As a result, there are multiple ways to set up builder methods: one takes self by mutable reference (&mut self), the other takes self by value. The former has two sub-variants, returning either &mut Self for chaining or (). Allowing chaining is slightly more common.

Our example uses chaining and therefore uses a by-value builder. The result of new is directly used to call another method.

Mutably borrowed builders have the benefit of letting us call multiple methods on the same builder while still allowing some chaining. However, this comes at the cost of requiring a binding for the builder itself. For example, the following code would fail with methods that return &mut Self:

let builder = ByMutRefBuilder::new()
    .with_favorite_number(42); // this drops the builder while borrowed

However, doing the full chain still works:

ByMutRefBuilder::new()
    .with_favorite_number(42)
    .with_favorite_programming_language("Rust")
    .build()

If we want to reuse the builder, we need to bind the result of the new() call:

let mut builder = ByMutRefBuilder::new();
builder.with_favorite_number(42); // this returns `&mut builder` for further chaining

We can also ignore chaining, calling the same binding multiple times instead:

let mut builder = ByMutRefBuilder::new();
builder.with_favorite_number(42);
builder.with_favorite_programming_language("Rust");
builder.build()

On the other hand, by-value builders must re-bind the result of each call so that their state is not dropped:

let builder = ByValueBuilder::new();
builder.with_favorite_number(42); // this consumes the builder :-(

Therefore, they are usually chained:

ByValueBuilder::new()
    .with_favorite_number(42)
    .with_favorite_programming_language("Rust")
    .build()

So, with by-value builders, we require chaining. On the other hand, mutably referenced builders will allow chaining as long as the builder itself is bound to some local variable. In addition, mutably referenced builders can be reused freely because they are not consumed by their methods.

In general, chaining is the expected way to use builders, so this is not a big downside. Additionally, depending on how much data the builder contains, moving the builder around may become visible in the performance profile; however, this is rare.

If the builder will be used often in complex code with many branches, or it is likely to be reused from an intermediate state, I'd favor a mutably referenced builder. Otherwise, I'd use a by-value builder.
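The snippets above never show the builders themselves, so here is one way the two flavors might be defined, followed by the same branchy logic written against each. The Favorites type, the field names, and the two helper functions are assumptions for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Favorites {
    number: u32,
    language: String,
}

/// Setters borrow the builder mutably and hand the borrow back.
#[derive(Default)]
struct ByMutRefBuilder {
    number: u32,
    language: String,
}

impl ByMutRefBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn with_favorite_number(&mut self, number: u32) -> &mut Self {
        self.number = number;
        self
    }

    fn with_favorite_programming_language(&mut self, language: &str) -> &mut Self {
        self.language = language.to_owned();
        self
    }

    // `build` only borrows, so the builder stays usable afterwards.
    fn build(&self) -> Favorites {
        Favorites {
            number: self.number,
            language: self.language.clone(),
        }
    }
}

/// Setters consume the builder and return it by value.
#[derive(Default)]
struct ByValueBuilder {
    number: u32,
    language: String,
}

impl ByValueBuilder {
    fn new() -> Self {
        Self::default()
    }

    fn with_favorite_number(mut self, number: u32) -> Self {
        self.number = number;
        self
    }

    fn with_favorite_programming_language(mut self, language: &str) -> Self {
        self.language = language.to_owned();
        self
    }

    fn build(self) -> Favorites {
        Favorites {
            number: self.number,
            language: self.language,
        }
    }
}

/// Branchy code: the mutably referenced builder needs no re-binding.
fn favorites_by_mut_ref(number: Option<u32>) -> Favorites {
    let mut builder = ByMutRefBuilder::new();
    builder.with_favorite_programming_language("Rust");
    if let Some(number) = number {
        builder.with_favorite_number(number);
    }
    builder.build()
}

/// The same logic with a by-value builder re-binds on every step.
fn favorites_by_value(number: Option<u32>) -> Favorites {
    let mut builder = ByValueBuilder::new().with_favorite_programming_language("Rust");
    if let Some(number) = number {
        builder = builder.with_favorite_number(number);
    }
    builder.build()
}
```

Note that the by-value build can move the String out of the builder, while the by-reference build has to clone it. That is the price of keeping the builder reusable.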

Into and AsRef traits

Of course, the builder methods can do some basic transformations. The most popular one uses the Into trait to bind the input.

For example, you could take an index as anything that has an Into<usize> implementation, or reduce allocations by taking an Into<Cow<'static, str>> argument, which lets the function accept both a &'static str (stored without allocating) and a String. For arguments that can be given as references, the AsRef trait allows more freedom in the supplied types.
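As a rough sketch, the three conversions just mentioned could look like this. ConvertingBuilder and its fields are hypothetical names chosen for the example.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

#[derive(Default)]
struct ConvertingBuilder {
    index: usize,
    name: Cow<'static, str>,
    path: PathBuf,
}

impl ConvertingBuilder {
    // Accepts `usize`, but also `u8`, `u16`, or `bool`,
    // which all convert to `usize` without loss.
    fn index(mut self, index: impl Into<usize>) -> Self {
        self.index = index.into();
        self
    }

    // A `&'static str` is stored borrowed, a `String` is stored owned.
    fn name(mut self, name: impl Into<Cow<'static, str>>) -> Self {
        self.name = name.into();
        self
    }

    // Accepts `&str`, `String`, `&Path`, `PathBuf`, and more.
    fn path(mut self, path: impl AsRef<Path>) -> Self {
        self.path = path.as_ref().to_path_buf();
        self
    }
}
```

One caveat with impl Into<usize>: an unsuffixed literal such as 7 defaults to i32, which has no conversion to usize, so callers would write 7usize or 7u8.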

There are also specialized traits like IntoIterator and ToString that can be useful on occasion. For example, if we have a sequence of values, we could have add and add_all methods that extend an internal Vec:

impl FriendlyBuilder {
    fn add(&mut self, value: impl Into<String>) -> &mut Self {
        self.values.push(value.into());
        self
    }

    fn add_all(
        &mut self,
        values: impl IntoIterator<Item = impl Into<String>>
    ) -> &mut Self {
        self.values.extend(values.into_iter().map(Into::into));
        self
    }
}

Default values

Types can often have workable defaults. So, the builder can pre-set those default values and only replace them if requested by the user. In rare cases, getting the default can be costly. The builder can either use an Option, which has its own None default, or perform another trick to keep track of which fields are set, which we’ll explain in the next section.

Of course, we're not beholden to whatever the Default implementation gives us; we can set our own defaults. For example, we could decide that more is better, so the default number would be u32::MAX instead of the zero Default would give us.
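Here is a small sketch of both ideas: a hand-picked default (u32::MAX) and an Option that postpones a costly default until build actually needs it. The Product type and the expensive_default_table function are made up for the example.

```rust
/// Hypothetical stand-in for a default that is costly to compute.
fn expensive_default_table() -> Vec<u32> {
    (0..1_000).map(|n| n * n).collect()
}

#[derive(Debug, PartialEq)]
struct Product {
    number: u32,
    table: Vec<u32>,
}

struct LazyDefaultsBuilder {
    // our own default instead of what `u32::default()` would give us
    number: u32,
    // `None` means "not set yet"
    table: Option<Vec<u32>>,
}

impl LazyDefaultsBuilder {
    fn new() -> Self {
        Self {
            number: u32::MAX,
            table: None,
        }
    }

    fn number(mut self, number: u32) -> Self {
        self.number = number;
        self
    }

    fn table(mut self, table: Vec<u32>) -> Self {
        self.table = Some(table);
        self
    }

    fn build(self) -> Product {
        Product {
            number: self.number,
            // the costly default is only computed if the user supplied nothing
            table: self.table.unwrap_or_else(expensive_default_table),
        }
    }
}
```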

For more complex types involving reference counting, we can keep a static default value and call Arc::clone(_) on it every time, for the small runtime price of updating the reference count. Or, if we allow for borrowed static instances, we could use Cow<'static, T> as the default, avoiding allocation while still keeping building simple:

use std::borrow::Cow;
use std::sync::{Arc, LazyLock};

// `Arc::new` is not a `const fn`, so the static is initialized lazily
// on first use (`LazyLock` is in the standard library since Rust 1.80)
static ARCD_BEHEMOTH: LazyLock<Arc<Behemoth>> =
    LazyLock::new(|| Arc::new(Behemoth::dummy()));
static DEFAULT_NAME: &str = "Fluffy";

impl WithDefaultsBuilder {
    fn new() -> Self {
        Self {
            // we can simply use `Default`
            some_defaulting_value: Default::default(),
            // we can of course set our own defaults
            number: 42,
            // for values not needed for construction
            optional_stuff: None,
            // for `Cow`s, we can borrow a default
            name: Cow::Borrowed(DEFAULT_NAME),
            // we can clone a `static`
            goliath: Arc::clone(&ARCD_BEHEMOTH),
        }
    }
}

Keeping track of set fields using type state

Keeping track of set fields only applies to the owned variant, because each setter has to return a builder of a different type. The idea is to use generics to put the information about which fields have been set into the type. That way, we can detect double-sets, or even forbid them, and only allow building once all the required fields have been set.

Let's take a simple case with a favorite number, programming language, and color, where only the first is required. Our type would be TypeStateBuilder<N, P, C>, where N would convey whether the number has been set, P whether the programming language was set, and C whether the color was set.

We can then create Unset and Set types to fill in for our generics. Our new function would return TypeStateBuilder<Unset, Unset, Unset>, and only a TypeStateBuilder<Set, _, _> has a .build() method.

In our example, we use default values everywhere because using unsafe code would not help to understand the pattern. But, it is certainly possible to avoid needless initialization using this scheme:

use std::marker::PhantomData;

/// A type denoting a set field
enum Set {}

/// A type denoting an unset field
enum Unset {}

/// Our builder. In this case, I just used the bare types.
struct TypeStateBuilder<N, P, C> {
    number: u32,
    programming_language: String,
    color: Color,
    typestate: PhantomData<(N, P, C)>,
}

/// The `new` function leaves all fields unset
impl TypeStateBuilder<Unset, Unset, Unset> {
    fn new() -> Self {
        Self {
            number: 0,
            programming_language: String::new(),
            color: Color::default(),
            typestate: PhantomData,
        }
    }
}

/// We can only call `.with_favorite_number(_)` once
impl<P, C> TypeStateBuilder<Unset, P, C> {
    fn with_favorite_number(
        self,
        number: u32,
    ) -> TypeStateBuilder<Set, P, C> {
        TypeStateBuilder {
            number,
            programming_language: self.programming_language,
            color: self.color,
            typestate: PhantomData,
        }
    }
}

impl<N, C> TypeStateBuilder<N, Unset, C> {
    fn with_favorite_programming_language(
        self,
        programming_language: &'static str,
    ) -> TypeStateBuilder<N, Set, C> {
        TypeStateBuilder {
            number: self.number,
            programming_language: programming_language.into(),
            color: self.color,
            typestate: PhantomData,
        }
    }
}

impl<N, P> TypeStateBuilder<N, P, Unset> {
    fn with_color(self, color: Color) -> TypeStateBuilder<N, P, Set> {
        TypeStateBuilder {
            number: self.number,
            programming_language: self.programming_language,
            color,
            typestate: PhantomData,
        }
    }
}

/// in practice this would be specialized for all variants of
/// `Set`/`Unset` typestate
impl<P, C> TypeStateBuilder<Set, P, C> {
    fn build(self) -> Favorites {
        todo!()
    }
}

The interface works exactly like the by-value builder, with the difference that users can set each field only once (or multiple times, if an impl for those cases is added). We can even control which functions may be called in what order. For example, we could allow .with_favorite_programming_language(_) only after .with_favorite_number(_) has been called. At runtime, the typestate compiles down to nothing.
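To illustrate the ordering idea, here is a reduced sketch with a single type parameter. OrderedBuilder, its two-field Favorites type, and the favorites function are assumptions for this example, not part of the builder shown earlier.

```rust
use std::marker::PhantomData;

/// Marker types; they are never instantiated.
enum Set {}
enum Unset {}

#[derive(Debug, PartialEq)]
struct Favorites {
    number: u32,
    programming_language: String,
}

/// `N` records whether the favorite number has been set.
struct OrderedBuilder<N> {
    number: u32,
    programming_language: String,
    typestate: PhantomData<N>,
}

impl OrderedBuilder<Unset> {
    fn new() -> Self {
        Self {
            number: 0,
            programming_language: String::new(),
            typestate: PhantomData,
        }
    }

    /// Only available while the number is unset, so it can be called once.
    fn with_favorite_number(self, number: u32) -> OrderedBuilder<Set> {
        OrderedBuilder {
            number,
            programming_language: self.programming_language,
            typestate: PhantomData,
        }
    }
}

impl OrderedBuilder<Set> {
    /// Only exists once the number is set, which enforces the call order.
    fn with_favorite_programming_language(mut self, language: impl Into<String>) -> Self {
        self.programming_language = language.into();
        self
    }

    fn build(self) -> Favorites {
        Favorites {
            number: self.number,
            programming_language: self.programming_language,
        }
    }
}

fn favorites() -> Favorites {
    OrderedBuilder::new()
        .with_favorite_number(42)
        .with_favorite_programming_language("Rust")
        .build()
}
```

With this setup, OrderedBuilder::new().build() and OrderedBuilder::new().with_favorite_programming_language("Rust") are both rejected by the compiler, because neither method exists on OrderedBuilder<Unset>.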

The downside of this is obviously the complexity; someone needs to write the code, and the compiler has to parse and optimize it away. Therefore, unless the typestate is used to actually control the order of function calls or to allow for optimizing out initialization, it’s likely not a good investment.

Rust builder pattern crates

Since builders follow such a simple code pattern, there are a number of crates to autogenerate them on crates.io.

The derive_builder crate builds our standard mutably referenced builder with Into arguments and optional default values from a struct definition. You can also supply validation functions. It’s the most popular proc macro crate to autogenerate builders, and it’s a solid choice. The crate is about six years old at the time of writing, which makes it one of the first derive crates since derives were stabilized.

The typed-builder crate handles the entire by-value typestate implementation as explained above, so you can forget everything you just read. Just type cargo add typed-builder and enjoy type-safe builders in your code. It also features defaults and optional into annotations, and there’s a strip_option annotation that allows you to have a setter method that always takes any value and sets Some(value).
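To show what a setter of that kind amounts to, here is a hand-written sketch. It is not the code the crate generates, and Profile and ProfileBuilder are hypothetical names. The field is an Option, but the setter takes the inner value and wraps it itself:

```rust
#[derive(Debug, Default, PartialEq)]
struct Profile {
    nickname: Option<String>,
}

#[derive(Default)]
struct ProfileBuilder {
    nickname: Option<String>,
}

impl ProfileBuilder {
    // Takes the inner value (converted via `Into`) and wraps it in `Some`,
    // so callers never have to write `Some(...)` themselves.
    fn nickname(mut self, nickname: impl Into<String>) -> Self {
        self.nickname = Some(nickname.into());
        self
    }

    fn build(self) -> Profile {
        Profile {
            nickname: self.nickname,
        }
    }
}
```

Leaving the setter out of the chain simply keeps the field at None.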

The safe-builder-derive crate also implements typestate, but because it generates an impl for each combination of set/unset fields, the code grows exponentially with the number of fields. For small builders with up to three or four fields, this may still be an acceptable choice; otherwise, the compile time cost is probably not worth it. The tidy-builder crate is mostly the same as typed-builder, but it uses ~const bool for typestate. The buildstructor crate was also inspired by typed-builder, but it uses annotated constructor functions instead of structs. The builder-pattern crate also uses the type state pattern and allows you to annotate lazy defaults and validation functions.

Undoubtedly, there will be more in the future. If you want to use autogenerated builders in your code, I think most of them are fine choices. As always, your mileage may vary. For example, requiring annotations for Into arguments may be worse ergonomics for some but reduce complexity for others. Some use cases will require validation, while others will have no use for that.

Conclusion

Builders compensate handily for the lack of named and optional arguments in Rust, even going beyond with automatic conversions and validation both at compile time and runtime. Plus, the pattern is familiar to most developers, so your users will feel right at home.

The downside is, as always, the additional code that needs to be maintained and compiled. Derive crates can eliminate the maintenance burden at the cost of another small bit of compile time.

So, should you use builders for all your types? Personally, I’d only use them for types with at least five parts or complex interdependencies, but I consider them indispensable in those cases.

