Speed up your Gatsby application's build time by 300% with incremental builds
Kristian Freeman
Posted on May 14, 2020
🤔 Introduction
Gatsby Incremental Builds is a new feature in the Gatsby framework that enables build caching. When you build your Gatsby application using gatsby build, it's common for a lot of your site to stay the same - for instance, if I add a new blog post to my site, I might find that the only pages that should change are ones where that new blog post may show up: the archive page, the home page, and of course, the blog post page itself. In the past, Gatsby applications would rebuild everything on your site - while it adds to your site's build time, this ensures that every part of the site stays up-to-date.
With the release of Incremental Builds, Gatsby is now able to inspect the .cache and public directories created by past application builds, and determine which parts of the site need to be rebuilt. For everything else that's stayed the same, the build process will just pull in existing data: this leads to much faster build times for most applications.
Gatsby strongly encourages you to try incremental builds via Gatsby Cloud, their hosting service. While the incremental build integration in Gatsby Cloud looks quite slick, the underlying work that makes it possible is integrated into the open-source framework, so we can use it in our existing CI tools without having to pay $99/mo for Gatsby's cloud offering.
In this tutorial, I'll show you how to add Incremental Builds to your site using GitHub Actions - a CI/workflow tool built right into GitHub, and free for public repositories - but you can also adapt this code and the principles behind incremental builds into whatever CI tool you're using.
Gatsby's blog post announcing Incremental Builds promises builds in under ten seconds - in my testing, I haven't found it to be that fast, but the speed implications for many sites are quite impressive.
To test Incremental Builds effectively, I used Gatsby's own documentation site. Remarkably, building the Gatsby docs with GitHub Actions without incremental build optimizations took almost thirty minutes! It's a testament to how big JAMStack sites can be that Gatsby can chug along for thirty minutes finding new pages to build. When I introduced incremental builds in my workflow, the build time dropped to an average of nine minutes - roughly a 70% decrease, or more than three times faster!
Gatsby Documentation Website (gatsbyjs.org/docs)
That being said, for many sites, the additional complexity of caching may not be worth it. In my testing of smaller sites, where the average build time is under a minute, the addition of incremental builds reduced the average build time by mere seconds.
If your site is already building that quickly, other optimizations, such as reducing the time to deploy (an exercise I've been working on with wrangler-action, an action I maintain for deploying Cloudflare Workers applications), will likely be a more effective way to speed up your build/deployment process.
☑️ Guide
If you're looking for a tl;dr about how to enable incremental builds in your project, the process can be reduced to four steps:
Opt into incremental builds with an environment variable
Cache your application's public and .cache directories
Begin building your application
(optional) Add flags to gatsby build to understand how/when files are changing
I'll explore each of these steps through the lens of GitHub Actions, but porting these steps to CircleCI or other CI applications should be fairly straightforward.
Many readers of this tutorial may not be currently using GitHub Actions with their Gatsby applications - to help you get started, I've provided a sample workflow that installs your project's NPM packages and builds the application. While I personally use the Yarn variant, which has the added benefit of caching your NPM packages (another big improvement to build time), you may prefer to use the straightforward NPM variant. Pick one of them and commit it in your repository as .github/workflows/build.yml:
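As a rough sketch of what the npm variant might look like: the action versions, the Node version, and the step names below are my assumptions for illustration, so adjust them to whatever your project actually uses.

```yaml
# .github/workflows/build.yml
# Assumption: action versions and the Node version are illustrative only.
name: Build

# Run the workflow whenever commits are pushed to the repository
on: push

jobs:
  build:
    name: "Build Gatsby app"
    runs-on: ubuntu-latest
    steps:
      # Check out the repository so the workflow can see your code
      - uses: actions/checkout@v2
      # Install the Node version the project should build with
      - uses: actions/setup-node@v1
        with:
          node-version: "12.x"
      # Install the project's npm packages
      - name: Install dependencies
        run: npm install
      # Calls the "build" script from package.json
      - name: Build app
        run: npm run build
```

The Yarn variant would swap the two npm commands for their yarn equivalents and add a caching step for the installed packages.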
Both workflows make use of the build script as a simple alias for gatsby build. We'll iterate on this further in the next section, but for now, ensure that your package.json contains the build script under the scripts object:
{
  "scripts": {
    "build": "gatsby build"
  }
}
I've created a sample repository that you can also refer to on GitHub, whether you'd like to copy-paste the code, or even fork it for your own projects. You can find it at signalnerve/gatsby-incremental-builds-gh-actions-example.
Example Gatsby Incremental Builds + GitHub Actions Project
Gatsby Incremental Builds + GitHub Actions
Example repository showing how Gatsby Incremental Builds can be accomplished using GitHub Actions deploys.
As a proof of concept, an example deployment using Cloudflare Workers is included in this repo. When new commits are made, the workflow will run, caching any existing content (using the .cache and public directories) and not requiring content that hasn't changed to be built again.
Note that this repo is pretty WIP from a documentation perspective, but I do want to shout out @raulfdm who beat me to implementing this with a significantly easier implementation than what I was trying to pull off. Some of the workflow code in this project is based on his work.
Limitations
GitHub Actions' caching feature is currently only supported on push and pull_request event types - this means that any repositories using schedules or repository_dispatch (custom webhook events) will be unable to use…
It's important to understand how the incremental build process works, particularly when a total site rebuild happens, versus an incremental rebuild. When a Gatsby application builds, the content of the site comes from two sources: the code of the site (HTML, CSS, and JavaScript), and data - whether it's internal to the site (Markdown files and other local content), or external (APIs, CMS tools, etc).
Gatsby incremental builds focus on data: when the data from a headless CMS or API changes, Gatsby can compare the current cached version of the data and compute what incremental changes need to happen. When code changes on your site, Gatsby will force a total site rebuild. This is covered in the docs, but I missed it as I was experimenting with this project, so I want to call it out to reduce future confusion. Via the docs linked above:
If there are any changes to code (JS, CSS) the bundling process returns a new webpack compilation hash which causes all pages to be rebuilt.
My preferred way to add the environment flag for opting into incremental builds is via a new script in package.json - this way, we can run the traditional gatsby build command via something like yarn run build, and move on to incremental builds without needing to do anything but change the script we call in CI. To do this, I'll define the build:incremental script in package.json:
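A minimal sketch of that script is below. The variable name GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES is the opt-in flag documented by Gatsby for this feature at the time of writing, not something defined in this tutorial, so check the docs for your Gatsby version before relying on it. The inline VAR=value syntax also assumes a POSIX shell, which is what Linux CI runners provide.

```json
{
  "scripts": {
    "build": "gatsby build",
    "build:incremental": "GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build"
  }
}
```

Keeping the original build script alongside the new one means you can switch back to a full rebuild just by changing which script your workflow calls.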
For incremental builds to work, your build workflow needs to cache any artifacts produced when Gatsby builds your application. At the time of writing, these two folders are public and .cache.
GitHub Actions' caching action, actions/cache, supports persisting directories produced during your workflow. To implement it, we'll add actions/cache to our workflow, and for each directory, pass a path and key to the action, indicating that we want to cache the directory:
# .github/workflows/build.yml
jobs:
  build:
    name: "Build Gatsby app"
    steps:
      # previous steps
      - name: Gatsby Cache Folder
        uses: actions/cache@v1
        with:
          key: gatsby-cache-folder
          path: .cache
      - name: Gatsby Public Folder
        uses: actions/cache@v1
        with:
          key: gatsby-public-folder
          path: public
      - name: Build app
        run: 'yarn run build:incremental'
🛠 Begin building your application
With caching and the new build:incremental script added to your workflow, we can now begin using incremental builds! GitHub Actions is event-based, meaning that the workflow will run when events occur in your repository.
Using the workflow provided in this tutorial, our workflow will be run via the push event, which is triggered whenever a user pushes commits to the repository. At this point, you can begin to work on your application as you normally would - making changes to your data, adding new content, etc. Incremental builds should kick in on your second commit to your repository after merging your workflow updates:
Commit the new workflow improvements: using the incremental builds environment variable, and caching the public and .cache directories
Make any change to your application (first commit: directories will be cached)
Make an additional change to your application – the previously cached data will be loaded at the beginning of the workflow (second commit: incremental builds should begin here!)
Here are some screenshots of my experiments with incremental builds. The first repository is the previously mentioned Gatsby docs repository which takes around thirty minutes to build:
Initial builds for the Gatsby documentation site take, on average, 27 to 30 minutes
When the directories are cached and start being used in the workflow, the build time drops dramatically, down to around nine minutes:
Adding incremental builds reduces the build time by around 70% (more than three times faster)
Initial builds for the blog template take, on average, 110 to 120 seconds
When incremental builds kick in, the build time reduces to a little over a minute:
Adding incremental builds reduces the build time by around 35%
🚩 (Optional) Add gatsby build flags
To better understand when your content is being cached, Gatsby provides some additional flags that can be passed to gatsby build to provide output regarding incremental builds:
--log-pages: outputs file paths that are updated or deleted
--write-to-file: creates .cache/newPages.txt and .cache/deletedPages.txt, which are lists of the changed files inside of the public folder
Because we're building our Gatsby application inside of a CI workflow, I prefer to see the changed files via my workflow's output, using the --log-pages flag. To implement this, we can add the --log-pages flag to the build:incremental script:
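Here's a sketch of what the updated script might look like. As before, the GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES variable name comes from Gatsby's documentation for this feature rather than from this tutorial, so treat it as an assumption to verify against your Gatsby version.

```json
{
  "scripts": {
    "build": "gatsby build",
    "build:incremental": "GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build --log-pages"
  }
}
```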
According to the Gatsby documentation, you should begin to see output like this in your workflow:
success Building production JavaScript and CSS bundles - 82.198s
success run queries - 82.762s - 4/4 0.05/s
success Building static HTML for pages - 19.386s - 2/2 0.10/s
+ success Delete previous page data - 1.512s
info Done building in 152.084 sec
+ info Built pages:
+ Updated page: /about
+ Updated page: /accounts/example
+ info Deleted pages:
+ Deleted page: /test
Done in 154.501 sec
As a further exercise, the --write-to-file flag may be a good way to report how your project is changing via GitHub comments, or potentially to tools like Slack or Discord! Since I'm a "team of one" on many of my sites, I haven't taken the time to implement this, but if you try it, let me know - I'd love to include a sample in this tutorial!
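As a possible first step toward that, a workflow step could simply print the two files once the build finishes. This is only a sketch: it assumes the --write-to-file flag has been added to the build script, the step name is made up, and the existence check is there in case either file is absent on a given build.

```yaml
# .github/workflows/build.yml (a step added after the "Build app" step)
- name: Show changed pages  # hypothetical step name
  run: |
    # Print each list only if Gatsby actually wrote it during this build
    for f in .cache/newPages.txt .cache/deletedPages.txt; do
      if [ -f "$f" ]; then
        echo "== $f =="
        cat "$f"
      fi
    done
```

From there, the same file contents could be passed to whatever posts your GitHub comments or chat messages.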
🙅‍♂️ GitHub Actions Caveat
I want to mention a caveat here around the GitHub Actions + Gatsby incremental builds work, which is the interplay between events and caching.
At time of writing, the actions/cache action provided by GitHub only works on push and pull_request events. This means that if you're building your Gatsby application via other events, the cache won't be restored and incremental builds won't kick in. That includes the very handy schedule event, which allows you to run workflows on a recurring "cron"-style schedule (e.g. "every hour" or "six times a day"), and the repository_dispatch event, which is commonly used as a webhook for triggering new application builds when your external APIs or CMS data change.
This is currently being fixed by the maintainers of the actions/cache action, with a pull request open to bring caching to all workflow events. In the meantime, this means that for many "true" JAMStack applications, where a lot of data lives outside of your actual repository, you may find that this work isn't super useful quite yet. I've seen movement on that PR in the last few days, as I've been writing this tutorial, so I'm hoping it'll be merged in the next few weeks - when that happens, I'll happily remove this caveat, and opt in to super fast incremental builds on all of my Gatsby projects!
🙋‍♂️ Conclusion
I'm really excited about this work, and about the optimizations that the Gatsby team is making to the framework to reduce build times. In my video about incremental builds (embedded at the beginning of this tutorial), I mentioned that this improvement has made me excited again about optimizing my workflows: I'm taking the momentum from Gatsby incremental builds and bringing it to the other things I use GitHub Actions for, like deploying my projects to Cloudflare Workers using wrangler-action.
Since I completed this work, I've come back to my own custom actions and I'm now focusing on trying to reduce the execution time for all of them - I still haven't reached the "under 10 second builds" statistic that the Gatsby team has mentioned, but I'm getting close!
If you enjoyed this tutorial, consider subscribing to the Bytesized YouTube channel! I covered this effort for the channel and I'd love to hear from you in the video comments about other things you'd like to see covered in the Gatsby world. I release new videos over there on a weekly basis covering software development, especially web development, serverless programming, and JAMStack.
I also organize Byteconf, a free + remote developer conference series, where Gatsby has been covered numerous times at our past conferences. Every talk from the past few years of conferences is on the Bytesized channel, but I'll also link a few of my favorite vids we've done on Gatsby for you to check out below!
💬 Are you using Gatsby incremental builds? Let me know in the comments! I'd love to hear if this has made your site faster, and if you've taken this work and integrated it into your other CI tools.