Writing A Bundler In Rust - Part 1

alshdavid

David Alsh

Posted on December 21, 2023

Writing A Bundler In Rust - Part 1

Introduction

In today's web development landscape, web applications require a bundler to transpile, combine, and optimize the source code of a project so that end users receive distributable code that is optimally packaged for downloading and executing within their browser.

Bundlers themselves have been around for a while, gaining more features as web applications have become more complex.

Many bundlers have appeared over the years, each learning from their predecessors and bringing with them distinct features and advancements - such as;

  • Dead code elimination (commonly known as tree shaking)
  • Federated modules
  • Hot module reloading
  • Faster build times
  • Faster run times
  • etc

Many web tools are written in JavaScript or TypeScript, like the TypeScript compiler itself. Recently, a popular trend has been to create build tooling using a language better suited to handling the demanding workloads of those tools.

A common choice is Rust - rspack, turbopack.

Another choice is Go - esbuild.

Both languages offer fantastic features to enable building high performance and aggressively multi-threaded bundlers.

An interesting case study for writing a multi-threaded bundler in JavaScript is the Parcel bundler.

Parcel brought to the bundling world many novel features; like incremental bundling, a cache system, a plugin system that is both ergonomic and sensible, and an opinionated zero-configuration (or minimal-configuration) build environment.

This allowed for drastically simplified development environments that had a reduced burden of dependency maintenance.

Lots of npm dependencies to few

Parcel's plugin system allowed for extensibility where required, however by default Parcel ships everything you need to build a modern web application.

In addition - Parcel leverages Node.js worker threads, scheduling jobs across threads, to improve bundling performance.

This series will embark on a journey to create a simple bundler, written in Rust, that is heavily inspired by Parcel's philosophy.

Objective

We are going to write our own bundler in Rust, one that targets building applications which exist at "hyper" scale.

I define hyper scale as an application that has tens of thousands of source files - presumably operating with millions of users (might get lucky one day).

I picked this target because this is a context where conventional bundler tooling tends to struggle and the biggest gains from threading are to be had.

I would like to preserve the plugin model offered by Parcel as best as possible as I find it to be extremely ergonomic, supporting both dynamically loaded plugins written in Rust and plugins written in JavaScript.

This first article will center around the creation of the first phase in the bundling process - transformation.

Bundler Phases At A High Level

Parcel's bundling pipeline follows roughly the following flow:

Flow chart of bundling

You can also view a more descriptive flow diagram on Parcel's website

The Transformation Pipeline

The transformation pipeline is a simple recursive loop (in behavior, not implementation) that traverses over the source files of a project; where imports are extracted and used to trigger the next loop.

The loop exits when there are no more imports to resolve and all the files have been processed.

The transformation pipeline is broken down into 2 major steps

  • Resolving imports
  • Transforming Files

To start off, an "entry" file is passed into the pipeline, the import declarations are extracted resulting in the pipeline discovering new files to transform - continuing the loop.

The new found files have their import declarations extracted, again continuing the loop, stopping once all the imports in a project have been extracted.

Flow chart of the transformer pipeline

During transformation, the file is read, potentially converted from TypeScript to JavaScript then stored in memory within a "store".

Parcel actually writes these transform results to disk but for the sake of simplicity, we will keep them in memory unless it becomes a problem.

Resolution

The first step in the transformation loop is dependency resolution. To understand resolution we must first understand JavaScript imports.

Simply, JavaScript imports adhere to the following syntax:



import something from "specifier";
import "specifier"


Enter fullscreen mode Exit fullscreen mode

The string used as the target of an import is known as the "specifier".

The specifier could be a relative path, absolute path, or an alias with rules described by the runtime or the bundler.



import { fromRelative } from "./relative"
import { fromAbsolute } from "/probably/dont/do/this"

// Rules specific to Node.js that obtains an import from 
// the node_modules folder. This is known as "node module resolution"
import { fromNodeModules } from "npm:lodash/get"

// Rules specific to Node.js that obtains an import from 
// the internal Node.js standard library
import { exists } from "node:fs"

// Special custom rules the developer can specify in the bundler
// to allow path aliases. These are arbitrary, I am using "~root/*"
// but it could be anything from "asd" to "&%$hello/*" 
import { fromAlias } from "~root/some-path"


Enter fullscreen mode Exit fullscreen mode

A bundler written in JavaScript can inherit some of these resolution properties, however a bundler written in Rust must reimplement them.

Thankfully, Parcel has a Rust crate that implements the Node.js resolution algorithm - which we will use as is.

To resolve an import, the bundler needs two pieces of information:

  • A from path - describing the absolute path of the file doing the importing
  • A specifier - describing the thing being imported

The resulting output of an import resolution is an absolute path to the target file of the import statement.

Example



// Imported by: "/absolute/path/to/index.js"
import "./a.js" // Will resolve to: "/absolute/path/to/index.js"


Enter fullscreen mode Exit fullscreen mode

Transforming the file contents and extracting new specifiers

After the specifier has been resolved to an absolute path of the dependency, the absolute path is passed to the transformer.

The transformer will read the file contents from the disk then, using a JavaScript parser like SWC, will convert the source code into an AST.

ast of javascript code

We can then walk the AST, identify the imports, extract the specifiers and give them to the next iteration of the loop.

Once again, Parcel has a Rust crate for its transformation step that we will use in our bundler. We will actually reimplement this later on in the series, but it will be helpful in getting us off the ground.

Example

A file with the contents:



// Current file path: "/absolute/path/to/index.js"
import "./a.js"
import "./b.js"
import "node:fs"
import "npm:lodash"


Enter fullscreen mode Exit fullscreen mode

Will see the transformer return following information to be used in the next iteration of the transformation pipeline loop:



{
    "from_path": "/absolute/path/to/index.js",
    "specifiers": [
        "./a.js",
        "./b.js",
        "node:fs",
        "npm:lodash"
    ]
}


Enter fullscreen mode Exit fullscreen mode

The Benchmark

So let's get started.

The first thing we need is something to bundle. For this I have taken the familiar three-js benchmark used by esbuild, Parcel and other bundlers.

Three JS is a suitable benchmark because it has a large dependency graph, non trivial transformations, is written using valid module syntax and has no external dependencies.

To scale the benchmark up; we take the three-js source code and copy it 250 times. This results in a project that contain approximately 90,000 source files.

vegeta saying over 90000

We then need to generate an entry point that imports each copy and re-exports it to avoid the bundler optimizing away unused imports.



import * as copy_1 from './copy_1/Three.js';
import * as copy_2 from './copy_2/Three.js';
import * as copy_3 from './copy_3/Three.js';
// ... to 250

// To avoid tree shaking:
window.copy_1 = copy_1;
window.copy_2 = copy_2;
window.copy_3 = copy_3;
// ... to 250


Enter fullscreen mode Exit fullscreen mode

Test Hardware

I will be using my desktop computer with the following hardware:

screenshot of screenfetch terminal command

Item Details
CPU AMD Ryzen 7950x - 16 Cores @ 5.8ghz (SMT off)
RAM 98 GB
SSD PCI-E NVMe SSD (silly fast I/O)
OS Fedora Linux 39
Node v20.10.0

How does Parcel Do?

For comparison, let's look at how Parcel's does. Isolating only the transformation circuit from the Parcel build.

Single build using 16 threads:

Image description

Graph charting threads against transformation time:

Image description

We can see that Parcel takes between 60 - 110 seconds to complete the transformation and consumes around 7gb of memory during the process.

There is no additional benefit to transformation time after 4 threads. This is likely due to the overhead of copying, serializing and deserializing data to communicate over postMessage and the main thread being overwhelmed with the coordination.

Let's Build A Bundler (Transformer)

TypeScript

Reviewing the circuit diagram for the transformation pipeline, it's possible to create a simple, single threaded implementation in TypeScript.

This will serve as a familiar environment to describe the flow.



import * as fs from 'node:fs'
import { resolve } from '@parcel/node-resolver-core'
import { transform } from '@parcel/transformer-js'

// Parse CLI arguments
let config = new Config();

// A queue that contains the files to resolve
let todo: Array<{ from_path: string, specifier: string }> = []

// A store that maps the absolute file path to the contents of an asset
let assets = new Map<string, string>()

// A set with the absolute file path of a parent concatenated with the
// absolute file path of its dependency
let asset_graph = new Set<string>()

// Add entry asset to queue
todo.unshift({ 
    from_path: process.cwd(), 
    specifier: config.entry_point,
})

// Loop until the queue is empty
while (todo.length !== 0) {
    // Take an item from the front of the queue
    const { from_path, specifier } = todo.pop()

    // Find the absolute path of the dependency
    let absolute_path = resolve(from_path, specifier)

    // Add that dependency as a child of the parent 
    asset_graph.set(`${from_path}->${absolute_path}`)

    // Continue if the child has already been transformed
    if (assets.has(absolute_path)) {
        continue
    }

    // Read the contents of the dependency
    let contents = fs.readFileSync(absolute_path, 'utf8')

    // Transform the contents and extract any import specifiers
    let result = transform(contents, absolute_path)

    // Add the specifiers to the back of the queue
    for (const specifier in result.specifiers) {
        todo.unshift({ from_path: absolute_path, specifier })
    }

    // Add the contents of the new asset to the asset store
    assets.set(absolute_path, result.content_transformed)
}

// .....
// Begin the next phase


Enter fullscreen mode Exit fullscreen mode

Now do it in Rust

Single threaded implementation

Building the flow as a single threaded circuit is a good starting point and the Rust code is not that different from the TypeScript code above (link to source):



use std::fs;
// ...

fn main() {
    // Queue of files to transform
    let mut queue = VecDeque::<(PathBuf, String)>::new();
    // Map of path -> asset contents
    let mut assets = HashMap::<PathBuf, String>::new();
    // Graph as tuples rather than strings
    let mut asset_graph = HashSet::<(PathBuf, PathBuf)>::new();

    // Add entry asset to queue
    queue.push_back((
        config.project_root.clone(), 
        config.entry_point.clone().to_str().unwrap().to_string()
    ));

    // Loop until the queue is empty
    while let Some((from_path, specifier)) = queue.pop_front() {
        // Resolve and ignore errors
        let absolute_path = resolve(
            &from_path, 
            &specifier,
        ).unwrap();

        // Add to graph
        asset_graph.insert((from_path, absolute_path.clone()));

        // Skip if transformed previously
        if assets.contains_key(&absolute_path) {
            continue;
        }

        // Read file and ignore errors
        let contents = fs::read_to_string(&absolute_path).unwrap();

        // Transform file
        let (content_transformed, specifiers) = transform(
            &contents,
            &absolute_path,
        );

        // Add new specifiers to queue
        for specifier in specifiers.iter() {
            queue.push_back((
                absolute_path.clone(),
                specifier.clone(),
            ));
        }

        // Add asset to the asset store
        assets.insert(absolute_path, content_transformed);
    }
}

// .....
// Begin the next phase


Enter fullscreen mode Exit fullscreen mode

That's pretty sweet... how does the single threaded Rust transformer perform?



cargo build --release

env \
    PM_MEM_UNITS=mb \
    PM_TIME_UNITS=s \
    PM_TRACK=cpu,memory \
        procmon \
            ./target/release/single-threaded-transformer "../parcel/entry/entry.js"


Enter fullscreen mode Exit fullscreen mode

Results

graph showing performance of single threaded rust transformer

Looking at the chart, on a single thread, our home baked transformer circuit was able to transform the 90k source file three-js benchmark in 40 seconds and consumed 600mb of ram while doing it!

As expected, the CPU was locked at 1 core being used at 100% and the memory usage grew linearly as the contents of the files were loaded into memory.

To be fair, Parcel stores a lot more information about assets and the environment the bundler is running in - however this is a pretty impressive result.

Multithreading

Let's try to make this circuit multi-threaded.

The simplest approach is to spawn a number of OS threads, defaulting to the number of cores the CPU has otherwise using explicit thread configuration.

Then we simply protect the shared mutable data structures behind mutexes.

In Rust this is done by wrapping them in an Arc<Mutex<T>> like this:



let queue = Arc::new(Mutex::new(VecDeque::<(PathBuf, String)>::new()));
let assets = Arc::new(Mutex::new(HashMap::<PathBuf, String>::new()));
let asset_graph = Arc::new(Mutex::new(HashSet::<(PathBuf, PathBuf)>::new()));


Enter fullscreen mode Exit fullscreen mode

An Arc is a "smart pointer", which you can think of as a clone-able reference to the value contained within. In effect an Arc<String> can be thought of as a *String.

So here we go (link to source):



fn main() {
    // Parse CLI arguments
    let config = Config::new();
    println!("Threads: {}", config.threads);

    // Channel to notify parked threads when new work comes in
    let (submit_job, on_submitted_job) = crossbeam_channel::unbounded::<Option<(PathBuf, String)>>();
    // Counter of the tasks in flight
    let in_flight = Arc::new(AtomicUsize::new(0));

    // Map of path -> asset contents
    let assets = Arc::new(Mutex::new(HashMap::<PathBuf, String>::new()));
    // Graph as tuples rather than strings
    let asset_graph = Arc::new(Mutex::new(HashSet::<(PathBuf, PathBuf)>::new()));
    // Track complete jobs
    let assets_done = Arc::new(Mutex::new(HashSet::<PathBuf>::new()));

    // Add the entry asset to queue
    in_flight.fetch_add(1, Ordering::Relaxed);
    submit_job.send(Some((
        config.project_root.clone(),
        config.entry_point.clone().to_str().unwrap().to_string()
    ))).unwrap();

    // Hold the threads so we know when they all exit
    // In JavaScript terms, you can sort of think of this as an "Array<Promise<void>>"
    let mut thread_handles = Vec::<JoinHandle<()>>::new();

    // Spawn a bunch of threads, this loop will exit immediately and the main thread will block
    // at the end where we have the "handle.join()" code
    for _ in 0..config.threads {
        // Clone all of the Arcs - this clones the pointer, not the value
        // Rust will move the cloned pointers into the thread
        // You can think of this as a sort of "dependency injection" for each thread
        let assets = assets.clone();
        let asset_graph = asset_graph.clone();
        let assets_done = assets_done.clone();
        let in_flight = in_flight.clone();
        let submit_job = submit_job.clone();
        let on_submitted_job = on_submitted_job.clone();
        let threads_num = config.threads.clone();

        // In JavaScript terms, you can think of "thread::spawn()" as "setTimeout(cb)" if it returned a Promise
        thread_handles.push(thread::spawn(move || {
            while let Ok(msg) = on_submitted_job.recv() {
                let Some((from_path, specifier)) = msg else {
                    break;
                };

                // Resolve and ignore errors
                let absolute_path = resolve(&from_path, &specifier).unwrap();

                // Add to graph
                asset_graph.lock().unwrap().insert((from_path, absolute_path.clone()));

                // Skip if transformed previously
                if !assets_done.lock().unwrap().insert(absolute_path.clone()) {
                    // This job is done, kill all threads if there's no more work
                    if in_flight.fetch_sub(1, Ordering::Relaxed) == 1 {
                        for _ in 0..threads_num { submit_job.send(None).unwrap(); }
                    };
                    continue;
                }

                // Read file and ignore errors
                let contents = fs::read_to_string(&absolute_path).unwrap();

                // Transform file
                let (content_transformed, specifiers) = transform(&contents, &absolute_path);

                for specifier in specifiers.iter() {
                    in_flight.fetch_add(1, Ordering::Relaxed);
                    // Send job to another thread 
                    submit_job.send(Some((absolute_path.clone(), specifier.clone()))).unwrap();
                }

                // Add asset to the asset store
                assets.lock().unwrap().insert(absolute_path, content_transformed);

                // This job is done, kill all threads if there's no more work
                if in_flight.fetch_sub(1, Ordering::Relaxed) == 1 {
                    for _ in 0..threads_num { submit_job.send(None).unwrap(); }
                    continue;
                };
            }
        }));
    }

    // Wait for all the threads to finish
    // Kinda like "Promise.all()"
    for handle in thread_handles {
        handle.join().unwrap();
    }

    println!("Assets: {}", assets.lock().unwrap().len());
}


Enter fullscreen mode Exit fullscreen mode

Results

Running this circuit against the 250x three-js project using 16 threads:

Graph showing the performance of the multi-threaded rust implementation

🤯🤯🤯🤯🤯🤯🤯🤯🤯🤯 3 seconds!?

Doing some napkin math, given the single threaded run took 40 seconds; 40 seconds divided by 16 cores is 2.5 seconds - so a transformation time of 3 seconds does make sense and indicates there is the potential to improve this further at some point.

In reality you never see gains that match dividing the time a single threaded task took by the number of threads, but we did get pretty close!

Another interesting thing to note is that the library I am using to track CPU usage started to lose it with the higher CPU utilization.

Let's have a look at the build time vs thread count looks like:



cargo build --release
for THREADS in $(seq 1 16); do
    env \
        HS_THREADS=$THREADS \
        PM_MEM_UNITS=mb \
        PM_TIME_UNITS=s \
        PM_TRACK=cpu,memory \
        PM_POLL_INTERVAL=250 \
        PM_REPORT=hypersonic_$THREADS.csv \
            procmon \
                ./target/release/multi-threaded-transformer "./parcel/entry/entry.js"
done


Enter fullscreen mode Exit fullscreen mode

graph comparing the build time against the number of threads

The blue line tracks the actual build time.

The green line tracks the "theoretical perfect" build time, calculated as build time of 1 thread / number of threads

Red and yellow track average and maximum CPU usage.

We can see that we are able to track very closely to the theoretical perfect time and it's likely not worth optimizing further to achieve a better time.

A benefit of this approach is that a time of 6 seconds is still very acceptable, limiting the thread count to 7 leaves us with 9 threads to continue working on other tasks while the build runs - like checking emails or watching YouTube videos.

Comparison to Parcel

While this transformer circuit attempts to mimic Parcel's transformation circuit - it does not completely encapsulate the work that Parcel is doing.

For example, Parcel is loading plugins dynamically, it's checking assets to determine which plugins to use on them, detecting configuration and supplying it to plugins.

It also support many more language features, automatically detecting if the file uses TypeScript or JSX. It supports Node's require() syntax as well as JavaScript's import syntax.

It's also generating a much more complex graph of the asset relationships to use for advanced optimizations like dead code elimination.

While we do see a 20x improvement in the time it takes for this circuit to run - this isn't indicative of the real world expectations of how a bundler like Parcel would perform if migrated to Rust.

That said, it does make me excited to see how the Parcel project performs once migrated to Rust - there certainly seems a lot to be gained in the way of performance.

Next Article

In the next article we will dive into bundling. We will walk through analyzing the asset graph to determine when to combine source files and generate a "bundle graph".

FIN 🌴

💖 💪 🙅 🚩
alshdavid
David Alsh

Posted on December 21, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Writing A Bundler In Rust - Part 1
rust Writing A Bundler In Rust - Part 1

December 21, 2023