Is Rust Bloated?
Sam Pagenkopf
Posted on March 27, 2024
Hi, my name is Sam and I think a lot about language design. I have been using or at least paying attention to Rust since 2015, and have some thoughts to share.
There is an idea going around that Rust is a wonderful language, if only it wasn't so bloated. This implies that it is possible to create a "holy grail" systems language that has the same guarantees as Rust, but is comparable in language complexity to, say, C. Is this true?
In this article, I will argue that Rust is big not mainly because of cruft or excess, but because its scope is extremely wide, which makes it big by design. I will also argue that without new ideas for balancing speed, safety, and complexity, there will never be a better language than Rust for the same purposes.
Rust is a Big Abstraction
C is much simpler than Rust. This is because C is a thin abstraction of a generalized CPU. There is a common set of operations between almost every CPU, such as math, memory access, and subroutines. C turns this into a set of constructs which, without optimization, are trivially convertible into assembly code. Because the core ideas of what a CPU should do are fairly small in scope, C is fairly small.
Compare this to Rust. Though also an abstraction of a CPU, crucially, Rust doesn't just abstract basic CPU operations, but the set of uses of these operations which conform to Rust's safety standard. Before optimization, a Rust program is like a high-level way of describing a safe C program (except using LLVM). Safe as in, free of clearly unsafe operations -- the guarantees chosen by Rust eliminate obvious memory errors and dangerous practices, but are not a formal verification that a program will execute safely. That is, of course, until we enter an "unsafe" block, where these guarantees no longer exist.
This leads to the first reason why Rust becomes bloated: when any given functionality is humanly verifiable to be safe, but the Rust compiler is not smart enough to know that fact, the Rust team would rather write it for you in their own "unsafe" codebase than encourage you to do it. This is a good thing. Rust adds a feature rather than forcing users to rely on unsafety or hand-rolled unpredictability. The "Any" type is better than creating your own void* with type info. Using Rc<RefCell> (reference counting) for shared mutable memory is better than either trying to access stale memory, or failing to deallocate shared memory.
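To make the two built-ins mentioned above concrete, here is a minimal sketch. The function names describe and shared_push are hypothetical and exist only for illustration; Any, Rc, and RefCell are the real standard library types.

```rust
use std::any::Any;
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical helper: inspects a type-erased value with checked downcasts,
/// the safe counterpart of a `void*` carrying its own type tag.
fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {n}")
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String: {s}")
    } else {
        "unknown type".to_string()
    }
}

/// Hypothetical helper: mutates a Vec through one handle and reads it
/// through another handle to the same allocation.
fn shared_push() -> usize {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    let other = Rc::clone(&shared); // second owner of the same Vec
    other.borrow_mut().push(4); // the borrow is checked at run time
    let len = shared.borrow().len();
    len
}

fn main() {
    println!("{}", describe(&42_i32));
    println!("{}", describe(&String::from("hi")));
    println!("{}", shared_push());
}
```

Neither function needs an "unsafe" block of its own; the unsafety lives inside the standard library.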
By doing this, the burden of working safely with the underlying machine is transferred to that of using the built-in Rust concept that can achieve a given task. But is more always better? Since one of the goals of Rust is to eliminate the need for "unsafe" wherever possible, the ever-growing diversity of demands compounds with the infinite diversity of Rust-safe operations on a Turing-complete machine. For example, Rust 1.77.0 stabilizes a method called array::each_mut, which transforms a mutable reference to an array of T into an array of &mut T -- it takes a mutable reference to each element of the array. This is something simple in an unsafe block, but now there's a method for it.
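As a rough illustration of what each_mut buys you, the sketch below holds several mutable references into one array at the same time. It assumes a toolchain of Rust 1.77.0 or newer; double_all and swap_ends are hypothetical names.

```rust
/// Hypothetical helper: doubles every element through the references
/// produced by `each_mut` (requires Rust 1.77.0 or newer).
fn double_all(mut values: [i32; 3]) -> [i32; 3] {
    // `each_mut` borrows the array mutably and yields `[&mut i32; 3]`.
    let refs: [&mut i32; 3] = values.each_mut();
    for r in refs {
        *r *= 2;
    }
    values
}

/// Hypothetical helper: swaps the first and last elements by holding
/// two mutable references into the same array at once.
fn swap_ends(mut values: [i32; 3]) -> [i32; 3] {
    let [first, _, last] = values.each_mut();
    std::mem::swap(first, last);
    values
}

fn main() {
    println!("{:?}", double_all([1, 2, 3]));
    println!("{:?}", swap_ends([1, 2, 3]));
}
```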
Though, why, if a function is in fact safe, must it contain an "unsafe" block? The standard response is that it involves unsafe operations such as pointer math or uninitialized variables. The deeper reason is that it is much harder to add a language feature than to add yet another standard library method. It is far more costly, and in some cases impossible, to make a compiler that can validate more operations as safe.
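A classic case of a safe function that needs "unsafe" inside is splitting a slice into two mutable halves. The borrow checker cannot see that the halves do not overlap, so the proof is left to the programmer. The standard library already offers this as slice::split_at_mut; the split_halves function below is a hypothetical re-creation for illustration only, not the library's actual source.

```rust
/// Hypothetical re-creation of what `slice::split_at_mut` provides.
/// The signature is safe; only the body needs `unsafe`.
fn split_halves(slice: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = slice.len();
    assert!(mid <= len, "mid out of bounds");
    let ptr = slice.as_mut_ptr();
    // SAFETY: `mid <= len` was checked above, so [0, mid) and [mid, len)
    // both lie inside the original slice and never overlap. The compiler
    // cannot prove this on its own, which is why the block is required.
    unsafe {
        (
            std::slice::from_raw_parts_mut(ptr, mid),
            std::slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_halves(&mut data, 2);
    left[0] = 10;
    right[0] = 30;
    println!("{:?}", data);
}
```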
Though, there is also a danger here. Rust programmers are always looking at docs to figure out what to use, or the meaning of what has been used. Many things that would be trivial to write with "unsafe" are still added as features, and it seems there's a widening gap between being a systems programmer and being a Rust programmer. Sure, at times it's easier to reach for "unsafe", but after all of the optimizations and Rusty memory tricks are put into place, do you even know what the system is doing well enough to use it correctly?
The Cost of Speed
Rust has an optimizing compiler, and one of its banner features is competitive runtime speed. When it comes down to it, run speed is second to safety in Rust's design, and everything else comes later.
The best way to allow for code optimization is to reduce the number of guarantees, assuming that said code is already using a fast algorithm, etc. For example, if the layout of a struct can be changed, if a pointer can be reallocated freely, or if a stack variable can be reused, this gives more freedom to the optimizer. Since Rust must optimize in order to be competitive, the number of guarantees is minimal and more explicit descriptions are required, increasing the complexity required to prove safety, and littering Rust codebases with little pieces of description syntax that feel unnecessary, and sometimes are.
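Struct layout is one place where this trade-off is easy to observe. In the sketch below, the struct names are hypothetical. The default representation leaves field order unspecified, so the compiler may reorder fields to reduce padding, while #[repr(C)] is one of those little pieces of description syntax that pins the order down.

```rust
use std::mem::size_of;

/// Default representation: field order is unspecified, so the compiler
/// is free to reorder fields and shrink padding.
#[allow(dead_code)]
struct Reorderable {
    a: u8,
    b: u32,
    c: u8,
}

/// `repr(C)` guarantees declaration order, which forces padding after
/// `a` and after `c` on targets where `u32` is 4-byte aligned.
#[allow(dead_code)]
#[repr(C)]
struct FixedOrder {
    a: u8,
    b: u32,
    c: u8,
}

/// Returns (size of Reorderable, size of FixedOrder) in bytes.
fn sizes() -> (usize, usize) {
    (size_of::<Reorderable>(), size_of::<FixedOrder>())
}

fn main() {
    let (reorderable, fixed) = sizes();
    println!("default repr: {reorderable} bytes, repr(C): {fixed} bytes");
}
```

On common targets the repr(C) struct takes 12 bytes. The default layout is typically smaller with current compilers, but that is an observation, not a guarantee.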
Sadly, inferring usage to eliminate these descriptions would both slow down compile times and cause programmers to use slower patterns without realizing it. One of the causes of slow C code is that implicit struct copying is built into the language. Rust avoids this pitfall by making struct copying explicit. Though modern C compilers are likely to remove any given copy of a large struct, even relying on it in the first place changes programmer behavior and can lead to less optimizable code. So in another way, Rust is fast because in many cases going slower takes more effort, by design, or at the very least a programmer using slow techniques is aware that they are using them.
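Here is a small sketch of what explicit copying looks like in practice. Big and move_vs_clone are hypothetical names. Note one caveat: types that opt into the Copy trait are still duplicated implicitly, so the explicitness applies to types like this one that only implement Clone.

```rust
/// Hypothetical struct that owns a heap buffer.
#[derive(Clone)]
struct Big {
    data: Vec<u8>,
}

/// Returns (move kept the same heap buffer, clone reused the same buffer).
fn move_vs_clone() -> (bool, bool) {
    let original = Big { data: vec![0u8; 1024] };
    let original_ptr = original.data.as_ptr();

    // Deep copy: has to be spelled out with `.clone()`.
    let copy = original.clone();
    let clone_ptr = copy.data.as_ptr();

    // Move: ownership is transferred and the heap buffer is not duplicated.
    let moved = original;
    let moved_ptr = moved.data.as_ptr();

    (moved_ptr == original_ptr, clone_ptr == original_ptr)
}

fn main() {
    let (move_same, clone_same) = move_vs_clone();
    println!("move kept buffer: {move_same}, clone shared buffer: {clone_same}");
}
```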
Early Rust had the philosophy that explicit is better than implicit. This enabled them to move quickly by shifting the burden of compiler intelligence onto the programmer, which in many cases makes users smarter about what they're doing, despite the ugly code that resulted. In the last few years, the Rust team has taught the compiler to infer much more and reduced this load, as they will continue to do. Rustc has learned to infer lifetimes instead of forcing them to be declared, and added things to simplify the language such as "impl Trait". Rust is still working on Polonius, a new borrow checker to reduce the amount of gymnastics needed to satisfy rustc, and it surely has even more in the oven. Though, some things are just better when they're specified.
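The sketch below shows the two conveniences named above, using hypothetical function names. The first two functions have identical meaning; lifetime elision lets the second omit the annotation. The third uses "impl Trait" to avoid spelling out a long iterator type.

```rust
/// Lifetimes written out by hand.
fn first_word_explicit<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

/// Same signature with the lifetime elided; rustc fills it in.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

/// `impl Trait` hides the concrete `Filter<Range<u32>, ...>` type,
/// which could not be named anyway because it contains a closure.
fn evens(limit: u32) -> impl Iterator<Item = u32> {
    (0..limit).filter(|n| n % 2 == 0)
}

fn main() {
    println!("{}", first_word_explicit("hello world"));
    println!("{}", first_word("hello world"));
    println!("{:?}", evens(7).collect::<Vec<_>>());
}
```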
How Rust Avoids Bloat
Rust being bloated right now isn't the whole story. Rust has dropped features such as green threads after they proved to be at odds with Rust's philosophy.
Rust also has editions, where a new edition is released that may break source compatibility with code written for earlier editions. This allows Rust to periodically revolutionize itself instead of becoming stale. For example, in 2018, Rust overhauled its module system. In 2021, though, Rust's edition consisted entirely of additions to the language. It is possible that Rust has lost its plasticity, even though the potential to slim down still exists. Also, Rust must still maintain support for these old editions, with some crates still using the 2015 edition. No feature is ever truly removed, out of fear of breaking existing code. Does it now make sense why $2.5 million per year goes to Rust?
Also, like any good FOSS team, Rust is resistant to adding new features, requiring them to go through an RFC process where each feature is proposed, described in detail, and may be approved. A long list of potential features has been tabled, or partially implemented behind a feature gate in nightly, only to be kicked down to a future stable release, if at all. Although processes like this have not kept C++ from becoming intimidating, the RFC process certainly helps and gives conscious direction to Rust's progression.
Finally, for better or worse, Rust also relies on an extensive ecosystem of packages ("crates") in order to keep the core language smaller. Though one could argue that essentials like num-traits should be part of the standard language, this at least shifts responsibility away from the standard library in covering every possible use case. Even so, this poses the issue that multiple Rust crates are likely to cover the same functionality, meaning that more work goes both into selecting the right crate and into understanding the code written using a given crate. For many programmers, Rust isn't just Rust, but Rust with num-traits, rand, Serde, static_assert, lazy_static, clap, etc.
Room for Improvement
So, where is this holy grail language that is about as simple as C, but as fast and as safe as Rust? This design space needs elaboration -- is it even possible? Like any human creation, Rust does have excess complexity, though much of it is not derived from the fundamental problem space, but from the one Rust has chosen to solve. Instead of just a fast, safe systems language, Rust has decided to be much more.
Much of this stems from Rust being multi-paradigm, with both imperative in-place mutable patterns on one hand, and FP immutable patterns on the other. Paradoxically, FP is easier to check as memory safe, but imperative is more similar to how a computer operates. A new language would have to choose one paradigm and stick with it, rather than building out both in-place mutable APIs and immutable FP ones. This is also related to the duplication between memory moves, mutable references, and immutable references, which all seem to warrant their own separate built-ins.
Rust also has a tendency, like C++, to duplicate functionality across similar but different concepts. Why are there both functions and closures? Why are there both generic parameters and associated types? Why do data types with "impl" and modules seem to duplicate functionality between one another, such as namespacing?
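To show how close two of these concepts sit, here is a sketch of the same idea written once with a generic parameter and once with an associated type. All names are hypothetical. The visible difference is that a generic trait can be implemented many times for one type, while an associated type allows exactly one choice per implementor.

```rust
/// Generic parameter: one type may implement the trait many times,
/// once per target type `T`.
trait ConvertTo<T> {
    fn convert(&self) -> T;
}

struct Celsius(f64);

impl ConvertTo<f64> for Celsius {
    fn convert(&self) -> f64 {
        self.0
    }
}

impl ConvertTo<String> for Celsius {
    fn convert(&self) -> String {
        format!("{:.1} C", self.0)
    }
}

/// Associated type: each implementor picks exactly one `Item`.
trait Container {
    type Item;
    fn first(&self) -> Option<&Self::Item>;
}

struct Stack(Vec<i32>);

impl Container for Stack {
    type Item = i32;
    fn first(&self) -> Option<&i32> {
        self.0.first()
    }
}

fn main() {
    let temp = Celsius(21.5);
    // The caller must say which implementation is meant.
    let as_number: f64 = temp.convert();
    let as_text: String = temp.convert();
    println!("{as_number} / {as_text}");

    // No annotation needed: `Item` is fixed by the implementor.
    let stack = Stack(vec![7, 8]);
    println!("{:?}", stack.first());
}
```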
Also, why does Rust have both "const fn"s as well as macros and type genericity? If a function is "const" in Rust, it is guaranteed to work at compile time. Why, then, would const fns not be able to replace the functionality of macros, or to serve as type generators? Meanwhile, work still continues on const generics, which allow Rust functions to take values at compile time in the same place as type parameters. Surely this is useful for things like arrays, which are oddly limited in Rust, but this feature adds even more overlap among compile-time concepts.
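The overlap is easier to see side by side. The sketch below uses hypothetical names and only the subset of const generics that is already stable. A const fn computes an array length at compile time, and a const generic parameter N takes a value in the same position where a type parameter would go.

```rust
/// A `const fn` can be evaluated at compile time.
const fn square(n: usize) -> usize {
    n * n
}

const SIDE: usize = 3;
/// Computed during compilation and usable as an array length.
const CELLS: usize = square(SIDE);

/// Const generic: `N` is a compile-time value, not a type.
fn sum<const N: usize>(values: [u32; N]) -> u32 {
    values.iter().sum()
}

/// Builds an array whose length comes from the `const fn` above.
fn grid() -> [u32; CELLS] {
    [1; CELLS]
}

fn main() {
    println!("cells: {CELLS}");
    println!("grid sum: {}", sum(grid()));
    println!("other sum: {}", sum([1, 2, 3]));
}
```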
Not to mention the growth in Rust's standard library, which seems like it can't stop adding methods to built-ins. Why are there so many stream iterator methods? A dozen of these could be based on passing a simple function into another existing method, and others still would be perfectly fine as a "for" loop. Who is using the xor method on Option, or the billion slice methods? To me, much of this screams "put it in a crate!"
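For a sense of how small some of these methods are, here is a sketch with hypothetical function names. xor_manual reproduces the behavior of Option::xor with a single match, and the two sum functions compute the same result with an iterator chain and with a plain "for" loop.

```rust
/// Hand-written equivalent of `Option::xor`.
fn xor_manual<T>(a: Option<T>, b: Option<T>) -> Option<T> {
    match (a, b) {
        (Some(x), None) | (None, Some(x)) => Some(x),
        _ => None,
    }
}

/// Iterator-chain version: sum of the squares of the even values.
fn sum_even_squares_chain(values: &[i32]) -> i32 {
    values
        .iter()
        .filter(|v| **v % 2 == 0)
        .map(|v| v * v)
        .sum()
}

/// The same computation as a plain `for` loop.
fn sum_even_squares_loop(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        if v % 2 == 0 {
            total += v * v;
        }
    }
    total
}

fn main() {
    println!("{:?}", xor_manual(Some(1), None));
    println!("{}", sum_even_squares_chain(&[1, 2, 3, 4]));
    println!("{}", sum_even_squares_loop(&[1, 2, 3, 4]));
}
```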
Though, doubtlessly the Rust team has thought over these decisions thoroughly and decided to stick with the result, not arbitrarily. Even though there is a lot of growth, the growth is largely self-consistent within the parameters Rust has chosen.
Conclusion
Rust has trailblazed within the design space of systems languages, and I am still going to use it because there is no other language that can do what Rust does.
That said, it is also time for Rust to push against bloat, even though in some ways it is already too late. What I'm hoping to see in the next edition of Rust is not expansionism like in 2021. I'm hoping to see that Rust refines its vision and addresses the public concern about the size of the language. Rust needs to decide what is "The Rust Way", and cut out anything that goes against its core philosophy.
If there is ever going to be a successor to Rust that aims closer to the "holy grail", it will likely either be not as fast, or it will rely on an even more complex compiler. It will, like any language, suffer from the results of its chosen limitations, which will introduce a different set of problems than those in Rust itself. I will continue to watch this space closely, but I am not jumping off the Rust ship.
And for those of you who are avoiding Rust because of the sheer number of concepts, or who feel intimidated by this article, let me just say one more thing. I believe all systems programmers should know enough Rust to master its base concepts. Almost any well-written piece of systems code should be able to be rewritten in safe Rust without an issue. For experienced programmers, Rust is probably the best systems language to start with, since it will teach you many good practices right off the bat.
I'm optimistic about the future of systems programming, and I feel grateful that software engineers can have access to such a great open-source ecosystem of languages and libraries, which is unprecedented compared to any other field.
It is also not a surprise to me when there is friction between the magical thinking rock that is a CPU and soft fleshy humans who are just trying to make life a little easier. With that in mind, I hope language designers will realize that every single thing they add is another thing to learn. Less is more, and simplicity will win the hearts of programmers.