Rust module system

daaitch

Philipp Renoth

Posted on May 5, 2021

Rust’s module system is powerful and strict. It works hand in hand with cargo, the de facto build tool for Rust projects. Following the idiom “convention over configuration”, very little configuration is needed.

Before we dive in, let’s make sure we have the same understanding for modules, packages and crates:

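  • module: Rust language term for a group of code items whose visibility to the outside can be controlled.
  • package / crate: cargo ecosystem terms for a Rust project ( Cargo.toml/.lock + src + …). Strictly speaking, a package contains one or more crates (a crate being one compilation unit, such as a binary or a library), but this article uses the two terms interchangeably.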

Build Targets — bin vs. lib

Before we can create a package, we need to decide whether we’re going to have a binary executable build (bin) or a library build (lib).

  • cargo init --bin / cargo init (default): binary executable target
  • cargo init --lib: library target

If we don’t know yet, we can just start with a bin package, because switching later takes almost no work. A lib package has no main entry-point, so if we just want to hack around without writing tests, the default bin package is also the easier starting point. It’s possible for a package to be both, but we won’t handle that case here. Let’s rather look at what those two target types do.

bin build target

  • input: fn main() entry-point in src/main.rs
  • output: executable binary in target/debug/your-project for cargo build (the path changes with the build profile or the target platform, e.g. target/release/your-project for cargo build --release)

lib build target

  • input: all pub components from src/lib.rs
  • output: Rust library intermediate target/debug/libyour_project.rlib. You probably won’t use it directly, but rather declare a dependency on the lib package itself, so we can forget about it.

Now let’s have a look at the module system itself.

Module system

We’ve talked about bin and lib packages and that they do different things. Fortunately they are both based on the same Rust module system, so to understand what the Rust compiler is doing, we can treat them equally.

Modules everywhere

While other languages push developers towards one language component per file, doing so is considered bad practice in Rust, because Rust has the concept of modules. That also leads to larger source files, so your IDE should support Rust “Outlines” (e.g. Visual Studio Code + rust-analyzer is great) to jump to your components quickly.

Every Rust file (.rs) is a module, so the intention is not to map every component like a struct to your filesystem, but to give it a home and make it part of a module. That said, it would be possible to put every component in its own file, but modules have to be connected to each other, and in a few minutes you’ll see why this is not a good idea.

Let’s look at some examples, as an abstract definition would be more confusing. Here we have a package created by cargo init. We say that it has a module main, and by convention it should also contain the fn main() entry-point, so a bin target package should at least have a main module.

[Figure: directory structure with main.rs]

That was simple, right? Now let’s talk about a more interesting case. Let’s say we’re developing a web server with a database connection. At first we put everything in main.rs and then talk about how to split it.

pub fn main() {
    let config = read_configuration();
    let db = connect_db(&config.db_host);

    let server = Server::new(&config, db);
    server.run();
}

// server

struct Server {
    port: u16,
    db: DbConnection,
}

impl Server {
    fn new(config: &Configuration, db: DbConnection) -> Self {
        Self {
            port: config.server_port,
            db,
        }
    }
    fn run(&self) {}
}

// db

fn connect_db(host: &String) -> DbConnection {
    DbConnection {}
}

// configuration

fn read_configuration() -> Configuration {
    // read env-vars, whatever
    Configuration {
        db_host: "my-db".to_owned(),
        server_port: 8080,
    }
}

struct Configuration {
    db_host: String,
    server_port: u16,
}

// 3rd party library
// Only here so that the example compiles

struct DbConnection {}

To be honest, this is far too little code to really think about splitting it, but let’s assume there is more and that our package grows over time. For the sake of simplicity this example does nothing, but it should help us understand how to use modules in Rust. So we have 5 sections here:

  • fn main(): where everything starts
  • server: our server implementation
  • db: our connect implementation. As we’d likely use a third-party library for the database connection, we only have to write some glue code here.
  • configuration: it’s usually a good idea to read the configuration into a struct, e.g. from env variables or CLI parameters.
  • 3rd party library: ignore that code, it’s only there so that the example compiles

The fn main() stays in main.rs, but we can think of the other sections to be modules, because they are responsible for a specific thing and they should hide details. Let’s assume that we want to split main.rs and add 3 other modules. We have two options for our sub-modules:

  • embedded modules in the main.rs file
  • file-based modules in the /src folder

Embedded Modules

You’ll probably see them less often, but they are easier to understand for now. In order to understand what needs to be changed to put our code in modules, please read the comments in the code.

// main.rs
// This module is called `main`.



// We put everything into modules, so we don't see it anymore.
// We have to make them visible with `use`.
// It's not needed to make everything we "use" visible, but
// the code we write can only contain visible things, like
// we call `read_configuration`, so we need to make it visible.
// On the other hand we're using the `Configuration` but we don't
// need to make it visible, as `fn main()` doesn't contain it.
// It's only the inferred type for `config`.
use config::read_configuration;
use db::connect_db;
use server::Server;

pub fn main() {
    // We could also write this and forgo the use statement on top.
    // `let config = config::read_configuration();`
    let config = read_configuration();
    let db = connect_db(&config.db_host);

    let server = Server::new(&config, db);
    server.run();
}

mod server {
    // Same here. We only see what we define in our module, so we need
    // to make something from another module visible.
    // We can't jump into that directly, but have to traverse to it
    // - either with `super::...` which is the parent module
    // - or with `crate::...` which is the root module of the package
    use super::{config::Configuration, db::DbConnection};

    // struct is used outside, so needs to be `pub`
    pub struct Server {
        // Those attributes are not `pub` as we have `pub fn new()`.
        // From inside the module, everyone can access them.
        port: u16,
        db: DbConnection,
    }

    impl Server {
        // we also want to create a Server from outside
        pub fn new(config: &Configuration, db: DbConnection) -> Self {
            // creating a server with `Server {...}` from outside is not possible
            // as its attributes are not `pub`
            Self {
                port: config.server_port,
                db,
            }
        }

        // also has to be `pub`
        pub fn run(&self) {}
    }
}

mod db {

    // also has to be `pub`
    pub fn connect_db(host: &String) -> DbConnection {
        DbConnection {}
    }

    // 3rd party library
    pub struct DbConnection {}
}

mod config {

    // also has to be `pub`
    pub fn read_configuration() -> Configuration {
        Configuration {
            db_host: "my-db".to_owned(),
            server_port: 8080,
        }
    }

    // also has to be `pub`
    pub struct Configuration {
        // As we see this struct more like a data container
        // it's not bad practice in Rust to simply make those fields `pub`.
        // In this case it really makes sense to not implement a `fn new()` with
        // a confusing parameter list, just `Configuration { ... }`.
        // Other structs like `Vec` don't have `pub` fields, as
        // they have to protect their internal invariants.
        pub db_host: String,
        pub server_port: u16,
    }
}

Let’s summarize what happened to the code to work with modules:

  • The mod keyword followed by a name and a block (mod name { ... }) starts a new submodule in the current module.
  • The file itself already is a module, so we don’t need mod for it.
  • Rust has no “import” or “include” statement for modules. It’s all about visibility.
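
These points can be tried out in a single file. The following is only a minimal sketch: the module greeting and its functions are made-up names for illustration and are not part of the web server example.

```rust
// Hypothetical example: `greeting`, `hello` and `punctuation` are illustrative names.
mod greeting {
    // `pub` makes the function reachable from the parent module.
    pub fn hello(name: &str) -> String {
        format!("Hello, {}{}", name, punctuation())
    }

    // Private: only code inside `greeting` (and its submodules) can call this.
    fn punctuation() -> char {
        '!'
    }
}

// Make `hello` visible here; without this line we would write `greeting::hello`.
use greeting::hello;

fn main() {
    println!("{}", hello("Rust"));
    // The full path works as well and needs no `use`.
    println!("{}", greeting::hello("modules"));
    // `greeting::punctuation()` would not compile here: the function is private.
}
```

Note that nothing is “imported”: hello exists either way, and use only shortens the path we have to write.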

We’ll get to Rust’s visibility rules shortly, but let’s first have a look at file-based modules. We already have one (main.rs), so let’s put all mod {...} sections into separate files.

File-based modules

We now have 4 files: main.rs, config.rs, db.rs and server.rs. First we put the content of each mod block into a new file. We don’t have to change visibility or use statements, simply copy and paste. By convention, module names are lower snake case, and so are the file names.

// config.rs

pub fn read_configuration() -> Configuration {
    Configuration {
        db_host: "my-db".to_owned(),
        server_port: 8080,
    }
}

pub struct Configuration {
    pub db_host: String,
    pub server_port: u16,
}

// db.rs

pub fn connect_db(host: &String) -> DbConnection {
    DbConnection {}
}

// 3rd party library
pub struct DbConnection {}

// server.rs

use super::{config::Configuration, db::DbConnection};

pub struct Server {
    port: u16,
    db: DbConnection,
}

impl Server {
    pub fn new(config: &Configuration, db: DbConnection) -> Self {
        Self {
            port: config.server_port,
            db,
        }
    }

    pub fn run(&self) {}
}

The interesting part is main.rs, where we remove the mod blocks and replace them with simple mod statements (without a block). For the compiler, mod server; means “please add the server module from server.rs and make it visible”. So use server; doesn’t make sense at all, as we already added it. To make our Server visible in the code we say use server::Server;. mod is really only for our own submodules: all of our dependencies are already visible to all modules.

// main.rs

use config::read_configuration;
use db::connect_db;
use server::Server;

mod config;
mod db;
mod server;

pub fn main() {
    let config = read_configuration();
    let db = connect_db(&config.db_host);

    let server = Server::new(&config, db);
    server.run();
}

Module vs. filesystem hierarchy

The module structure of our example is quite flat. But what about nested embedded modules or directories with modules? Wait a second, we should clarify one thing first: the filesystem tree and the module tree do not match?

Filesystem         Modules
- main.rs          - main.rs
- config.rs        |-- config.rs
- db.rs            |-- db.rs
- server.rs        |-- server.rs

That’s correct. For example, config.rs sits next to main.rs on the filesystem, but in Rust it’s a submodule of main, which looks weird at first. We have some files in a directory, and one of them is the super module while the others are submodules? Do we have to read the code to find the super module? No, fortunately we don’t. In this “flat” case it’s clear that main.rs has to be the super module and all other files are submodules. With that in mind, we can now have a look at nested modules.
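
One way to see the module tree, independent of the file layout, is the standard module_path!() macro, which expands to the path of the module it is used in. The sketch below uses embedded modules as stand-ins for config.rs and server.rs; the location functions are hypothetical helpers added only for this illustration.

```rust
// Embedded stand-ins for config.rs and server.rs.
// `location` is a hypothetical helper, not part of the article's example.
mod config {
    pub fn location() -> &'static str {
        // Expands to the path of the surrounding module.
        module_path!()
    }
}

mod server {
    pub fn location() -> &'static str {
        module_path!()
    }
}

fn main() {
    // In the root module the path is just the crate name.
    println!("root:   {}", module_path!());
    // Submodules report `<crate name>::config` and `<crate name>::server`.
    println!("config: {}", config::location());
    println!("server: {}", server::location());
}
```

With file-based modules the result would be the same: the paths follow the mod declarations, not the directory listing.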

Nested modules

Let’s say our server.rs module is growing and we actually want to have more modules. Large files can become hard to read, but again, in Rust hundreds or thousands of lines of code in a file are actually fine, and there is another reason for splitting that I’ll try to explain in the next sections.

From what we know, what options do we have?

  • We can embed submodules into server.rs via e.g. mod routes { ... } block.
  • We could add another file next to it, but that would then be a submodule of main, so that doesn’t work for us.

The first option works, but what about using a new file instead? Of course that’s possible, so let’s check out the next example.

  • We move server.rs to server/mod.rs
  • We create a submodule routes at server/routes.rs
  • Add mod routes; to server/mod.rs
  • The rest of the package doesn’t have to be touched

// /src/server/mod.rs

// we add a submodule here
mod routes;

use super::{config::Configuration, db::DbConnection};

pub struct Server {
    port: u16,
    db: DbConnection,
}

impl Server {
    pub fn new(config: &Configuration, db: DbConnection) -> Self {
        Self {
            port: config.server_port,
            db,
        }
    }

    pub fn run(&self) {
        // Call something from `routes`.
        // Instead of `use` the function we can also call it like this:
        routes::handle_request();
    }
}

// /src/server/routes.rs

pub fn handle_request() {}

That’s it, so the file structure should look like this now.

[Figure: directory structure with submodules]

To correct a previous statement: mod server; means “please add the server module from server.rs or server/mod.rs and make it visible”. (Since the 2018 edition, server.rs may also stay where it is, with its submodules placed in a server/ directory next to it.) That’s how we can nest modules, and we’re almost done understanding how modules work in Rust. But what about accessing nested modules from outside? Can I really hide internal code when I want nobody to use it from outside? Let’s have a look at the visibility rules to figure out how we can access or really hide something.
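
The same nesting can be written with embedded modules, which makes it easy to try in one file. This is a simplified sketch, not the article’s exact code: here handle_request takes a path and returns a String (an assumption made so that there is something to print), and run is a free function instead of a method.

```rust
mod server {
    // Private submodule: it exists, but nothing outside `server` can see it.
    mod routes {
        // `pub` exposes the function to the parent module `server`.
        // The parameter and return type are assumptions for this sketch.
        pub fn handle_request(path: &str) -> String {
            format!("handled {}", path)
        }
    }

    pub fn run() -> String {
        // Relative path from `server` into its submodule.
        routes::handle_request("/health")
    }
}

fn main() {
    println!("{}", server::run());
    // `server::routes::handle_request("/")` would not compile here,
    // because `routes` is private to `server`.
}
```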

Visibility in Rust

Think of it like this: in Rust everything “is there”, but you may not see it from where you currently stand. You can’t use what you can’t see. Here is what we see without use:

  1. everything the module itself defines, pub or not
  2. parent modules via the super keyword (can be chained like relative file paths, e.g. super::super::somemodule)
  3. the root module via the crate keyword (like the Linux filesystem root /)
  4. all dependency packages, from everywhere in our package
  5. (prelude items like String, Box and so on)

But that’s not all, because each of these can expose things to the outside. You can’t see those things directly, but you can access them via their relative or absolute path, or make them visible with the use keyword. The path separator is a double colon (::), e.g. routes::handle_request();.
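
The different path styles can be compared side by side. In this sketch all module and function names (outer, inner, root_helper and so on) are invented for illustration.

```rust
// Private function in the root module.
fn root_helper() -> &'static str {
    "root"
}

mod outer {
    // Private function in `outer`.
    fn outer_helper() -> &'static str {
        "outer"
    }

    pub mod inner {
        fn local() -> &'static str {
            "inner"
        }

        pub fn describe() -> String {
            // 1. Items of the module itself: plain name, `pub` or not.
            let own = local();
            // 2. Parent module via `super`.
            let parent = super::outer_helper();
            // 2. Chained `super`, like `../..` in a relative file path.
            let grandparent = super::super::root_helper();
            // 3. Absolute path starting at the root module.
            let absolute = crate::root_helper();
            // 5. `String` and `format!` come from the prelude / std.
            format!("{}/{}/{}/{}", own, parent, grandparent, absolute)
        }
    }
}

fn main() {
    println!("{}", outer::inner::describe());
}
```

Both super::super::root_helper() and crate::root_helper() name the same function. The relative form keeps working if the whole subtree is moved, while the absolute form keeps working if inner is moved within the crate.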

Modules and pub

For submodules, to “expose something” means that the module has to make it pub. In the other direction, a submodule has full access to its parent modules, but only to them. If a module wants to access a sibling module, it has the same access constraints as its parent, because it has to reach the sibling via a super or crate path: it only sees what the sibling makes pub.
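
A small sketch of both directions, using two invented sibling modules a and b (the names and functions are assumptions for illustration only):

```rust
// Private item in the root module.
fn secret() -> u32 {
    42
}

mod a {
    // A submodule may use private items of its parent.
    pub fn read_parent_secret() -> u32 {
        super::secret()
    }

    fn private_to_a() -> u32 {
        1
    }

    pub fn public_in_a() -> u32 {
        private_to_a() + 1
    }
}

mod b {
    // A sibling is reached through the parent, so only `pub` items of `a` are visible.
    pub fn ask_sibling() -> u32 {
        super::a::public_in_a()
    }
    // `super::a::private_to_a()` would not compile here.
}

fn main() {
    println!("{}", a::read_parent_secret());
    println!("{}", b::ask_sibling());
}
```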

Now we should talk about how to access submodules of a submodule. From outside a module, we can only see what is pub inside it, without exception. Let’s say we need pub const PREFIX: &str = "/app"; from server/routes.rs in our main.rs module. To expose an item of a submodule, server/mod.rs has two options to consider, as server/routes.rs is its submodule:

  • “re-export” the item, e.g. pub use routes::PREFIX;: in main.rs we can access server::PREFIX.
  • “re-export” the submodule, e.g. pub mod routes;: in main.rs we can access server::routes::PREFIX, but then we can also access server::routes::handle_request(), which main.rs has no business calling.

In that case it would be better to expose only PREFIX, but there are cases where a module has so many things in it that you want to put them into submodules and also present them to the outside world that way. The bigger a package is, the more likely it is to expose not one module with everything in it, but a bunch of submodules.
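
Both options can be compared in one file with embedded modules. This is a sketch under assumptions: server_open is an invented second module that exists only to show the second option next to the first, and handle_request returns a string so that there is something to print.

```rust
// Option 1: keep `routes` private and re-export only the constant.
mod server {
    mod routes {
        pub const PREFIX: &str = "/app";

        pub fn handle_request() -> &'static str {
            "handled"
        }
    }

    // `self::` is optional since the 2018 edition; it is written out here
    // so that the sketch also compiles with the 2015 edition.
    pub use self::routes::PREFIX;

    pub fn run() -> &'static str {
        routes::handle_request()
    }
}

// Option 2: expose the whole submodule (hypothetical variant for comparison).
mod server_open {
    pub mod routes {
        pub const PREFIX: &str = "/app";

        pub fn handle_request() -> &'static str {
            "handled"
        }
    }
}

fn main() {
    // Option 1: only the re-exported constant is reachable.
    println!("{}", server::PREFIX);
    println!("{}", server::run());
    // Option 2: everything `pub` in `routes` is reachable, wanted or not.
    println!("{}", server_open::routes::PREFIX);
    println!("{}", server_open::routes::handle_request());
}
```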

When to split modules into submodules?

Nothing stops you from creating a module for every little component, but the main goal of modules is different: hide details, tie code together and design a clear, minimal interface to the outside.

If you’re not sure whether to split or not, simply don’t split. At first, splitting is more likely to make things difficult than to help. I mostly have a main.rs with hundreds of lines of code before I create the first submodule, but when I create it, I have a good overview of all components and how they work together.

How seriously you should take this depends on what you’re building. For bin packages, visibility issues shouldn’t be a big problem: the compiler tells you whether the code still compiles, and you can fix it on the spot. As a lib package maintainer you should check more carefully what you’re going to share. Narrowing visibility breaks the public API, so “a small patch” turns into a major version release, because someone may be using the item. You can also live with it as technical debt and fix it one day together with other big changes, but it’s better to take care up front.

When to use embedded or file modules?

To be honest, I don’t know a good rule of thumb for this. I mostly see and use file modules, but I know two cases where embedded modules are common.

First, it’s convention to embed a tests module at the bottom of a file for unit tests.

// routes.rs

fn handle_request() {}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_handle_request() {
        handle_request();
    }
}

The module is only compiled for cfg(test), it has full access to its parent (we could also test private state, not saying you should), and tests and code stay together, so we don’t even have to switch files to write tests.

Second, some embedded modules are the result of refactoring: moving code into submodules may require a small module that contains nothing but “re-exports”, which is not worth a new file.

Putting together packages and modules

A bin package simply starts at main.rs as root module and it has to contain the fn main() entry-point. main.rs doesn’t need to expose anything.

A lib package starts at lib.rs as root module and dependants only see what lib.rs exposes.

So we have 3 special files in the Rust module system: src/main.rs, src/lib.rs and any mod.rs. Think of it this way: what mod.rs is to its module, main.rs is to a bin package and lib.rs is to a lib package, because the same rules apply. All other file names can be chosen freely.

Well, that’s what it’s all about, although we skipped some details, such as the additional pub visibility modifiers like pub(crate) or pub(super). Have a look at the visibility and privacy section of the Rust reference and you’ll understand them in the context of modules.
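
As a quick taste of those two modifiers, here is a minimal sketch. The function names crate_wide and parent_only are invented for this illustration.

```rust
mod server {
    pub mod routes {
        // Visible everywhere in this crate, but never to dependants of a lib package.
        pub(crate) fn crate_wide() -> &'static str {
            "crate"
        }

        // Visible only in the parent module `server` and its submodules.
        pub(super) fn parent_only() -> &'static str {
            "super"
        }
    }

    pub fn run() -> String {
        // Inside `server`, both functions are visible.
        format!("{} {}", routes::crate_wide(), routes::parent_only())
    }
}

fn main() {
    println!("{}", server::run());
    // Allowed: `main` lives in the same crate.
    println!("{}", server::routes::crate_wide());
    // `server::routes::parent_only()` would not compile here,
    // because the root module is outside of `server`.
}
```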

Summary

  • A Rust module represents code that belongs together because it may share details that the outside world should not be interested in or use.
  • A clear and smart interface to a parent module makes it possible to change code without breaking the interface. Especially lib maintainers should keep an eye on that so as not to break the API.
  • Visibility in Rust is not absolute. What you see depends on where you are: submodules can see everything in their parents, but a parent does not see private things in a submodule.
  • In Rust a submodule can’t make something visible to the parent’s parent. For this case, the parent needs to “re-export”, so it’s like an “export-chain” where every link has to “re-export”.
  • Sibling files in a directory are submodules of the same module. The exceptions are mod.rs, src/main.rs and src/lib.rs, which are the “module barrels” themselves.
  • Finally: don’t overdo modules. It’s fine to start in main.rs or lib.rs and rethink your modules later. You should double-check what you’re going to make pub so that you don’t unnecessarily expose details from your lib package; other mistakes can be fixed quite easily.