Precaching pages with next-pwa

sfiquet

Sylvie Fiquet

Posted on November 22, 2021


How can you precache all your app pages to emulate a native app experience when offline? While next-pwa allows you to precache JavaScript and CSS files out of the box, there is currently no easy way to precache pages. Here's how I did it.


TL;DR

  • Decide which HTML and JSON files to precache
  • Generate the build id yourself and pass it to the Next.js build via generateBuildId
  • Generate the list of entries to precache and pass it to next-pwa via pwa.additionalManifestEntries
    • Use the build id as the revision for HTML entries
    • Include the build id in the url for JSON entries with revision set to null
    • If you want to precache the content of the public folder, you have to do it yourself
  • To precache the home page HTML, set pwa.dynamicStartUrl to false (the default, true, puts it in the runtime cache instead). Note that this doesn't precache the JSON.
  • Implement your config as a config function so your build functions don't run for every single Next.js command


Introduction

Are you trying to build an offline-first app with Next.js? Are you tearing your hair out trying to coerce next-pwa to precache your pages? If so, keep reading. This post presents what I learned while researching this problem and the code for my solution.

Since this is about solving a specific problem with next-pwa, I'll assume that you're familiar with Next.js and next-pwa and that you're aware that Webpack, Workbox and workbox-webpack-plugin are all involved in next-pwa's functionality.

This post does not cover:

  • Server-side rendered pages: We're purely talking static generation. Pages produced by server-side rendering cannot be precached and are not discussed here.
  • Images: my pages currently don't have images so I didn't look into precaching them. If you're considering precaching images, you'll want to look very closely at the impact on the size of your cache.

It's possible that next-pwa might support precaching pages in the future. Subscribe to issue 252 to keep up to date on that.

In the meantime, let's look at what we're dealing with.

How next-pwa handles precaching

Behind the scenes next-pwa uses workbox-webpack-plugin, whose API consists of two classes, GenerateSW and InjectManifest. Which one it uses depends on whether you provide your own service worker. Either way, the class uses the output of the Webpack build to generate a list of precache entries. That list is called a manifest.

Both classes have an additionalManifestEntries property that allows you to add more files to the precache list. You can specify it through next-pwa's API as pwa.additionalManifestEntries.

additionalManifestEntries is an array of Workbox ManifestEntry objects, with properties url and revision (and an optional integrity).
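To make that shape concrete, here is a minimal sketch of a check for it. isManifestEntry is a hypothetical helper written for illustration (it is not part of Workbox or next-pwa) and only encodes the properties listed above: url and revision are required, revision may be null, and integrity is optional.

```javascript
// Shape of one additionalManifestEntries item, as described above:
// { url: string, revision: string | null, integrity?: string }
function isManifestEntry(entry) {
  if (entry === null || typeof entry !== 'object') return false
  if (typeof entry.url !== 'string' || entry.url.length === 0) return false
  // revision is required, but may be null when the url itself is versioned
  if (!(typeof entry.revision === 'string' || entry.revision === null)) return false
  // integrity is optional, but must be a string when present
  if ('integrity' in entry && typeof entry.integrity !== 'string') return false
  return true
}

const entries = [
  { url: '/about', revision: 'OUEmUvoIwu1Azj0i9Vad1' },
  { url: '/_next/data/OUEmUvoIwu1Azj0i9Vad1/about.json', revision: null },
]

console.log(entries.every(isManifestEntry)) // true
console.log(isManifestEntry({ url: '/about' })) // false: revision is missing
```

Running a check like this over your generated list before handing it to next-pwa can catch a forgotten revision early, at build time rather than in the browser.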

What next-pwa precaches by default

next-pwa's withPWA() function adds its own custom Webpack config to your Next.js config. If your config already has a webpack function, next-pwa's webpack function simply calls it before doing anything else.

With regard to precaching, by default next-pwa precaches the content of the public folder and the .next/static folder generated by the build (served at /_next/static/).

  • public is where you put your static assets (such as favicons) so they can be served at root.

  • /_next/static/ is where Next.js's Webpack-generated assets are served. Its content is automatically precached by workbox-webpack-plugin. This includes all generated JavaScript bundles and CSS files.

Pages are not precached. There is no way workbox-webpack-plugin could automatically precache them because they are generated in a separate step of the build that doesn't involve Webpack. By default next-pwa stores visited pages in a runtime cache. Since that's dependent on user behaviour it is not suitable for our use case.

Custom precaching with next-pwa

In order to precache anything else you have to specify your own precaching entries with pwa.additionalManifestEntries. There are problems with that, however.

First, if you specify additionalManifestEntries, next-pwa stops precaching public. So, if you want to preserve that behaviour, you must do it yourself.

Second, each entry must be a Workbox ManifestEntry object with properties url and revision. Getting the revision right is the tricky bit. So let's look at how next-pwa does it.

For static files in the public folder, next-pwa generates a hash of the content. That makes sense because those files are not affected by the build and are served as is.

For anything generated by the build, next-pwa uses the buildId, which is passed to the webpack function during the build. That reflects the fact that, even if a particular source file has not changed since the last build, its dependencies might have changed, causing the generated file to differ from the previous one.

In our case, pages are generated as HTML and JSON files during the build, so it makes sense to also use the buildId as the revision. In the case of JSON files the buildId is needed regardless, because it is embedded in the url.
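The three revision strategies described above can be summarised in a small sketch. revisionFor is a hypothetical helper (not a next-pwa API); its 'public' branch hashes the content with MD5, like the next-pwa code adapted later in this post.

```javascript
const crypto = require('crypto')

// Hypothetical helper: pick a revision for one precache entry.
// kind is 'public' (static file), 'html' (built page) or 'json' (built page data).
function revisionFor(kind, { buildId, content } = {}) {
  switch (kind) {
    case 'public':
      // Served as is: the content hash only changes when the file changes
      return crypto.createHash('md5').update(content).digest('hex')
    case 'html':
      // Generated by the build: any dependency change may alter it, so use the build id
      return buildId
    case 'json':
      // The build id is already part of the url, so the url itself is versioned
      return null
    default:
      throw new Error(`Unknown kind: ${kind}`)
  }
}

const first = revisionFor('public', { content: Buffer.from('body { color: red }') })
const second = revisionFor('public', { content: Buffer.from('body { color: red }') })
console.log(first === second) // true: same content, same revision
console.log(revisionFor('html', { buildId: 'OUEmUvoIwu1Azj0i9Vad1' })) // OUEmUvoIwu1Azj0i9Vad1
console.log(revisionFor('json', { buildId: 'OUEmUvoIwu1Azj0i9Vad1' })) // null
```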

Finding a way to access the build id caused me a lot of grief. Before going into this, let's look at what files we need to precache.

How to precache pages

In order to precache pages, the first thing to know is what files are involved and where they are served. Which files you need depends on whether you use client-side navigation.

Page files and where to find them

For static generation pages, Next.js generates two types of files: HTML and JSON.

HTML files are generated for all pages. This is the standard way to represent web pages. They are sent in response to a direct request from the browser. That includes standard navigation via pure <a> links. HTML files are accessed through your app's routes, as defined by the structure of the pages folder.

JSON files are only generated for pages defined with a getStaticProps function and are only used for client-side navigation, i.e. through <Link> components. They are fetched by Next.js's router. JSON files are served at /_next/data/. Urls include the build id and have the following format: /_next/data/{buildId}{path}{pageName}.json

Examples (for build id OUEmUvoIwu1Azj0i9Vad1):

HTML                  JSON
/                     /_next/data/OUEmUvoIwu1Azj0i9Vad1/index.json
/about                /_next/data/OUEmUvoIwu1Azj0i9Vad1/about.json
/posts/myfirstpost    /_next/data/OUEmUvoIwu1Azj0i9Vad1/posts/myfirstpost.json
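The mapping in these examples can be expressed as a short function. This is a sketch that assumes no basePath and no i18n locales (either could change the url); jsonUrlForRoute is a hypothetical name. Note the special case: the home page '/' maps to index.json.

```javascript
const path = require('path')

// Hypothetical helper: map a page route to the url Next.js's router fetches
// for that page's getStaticProps data: /_next/data/{buildId}{path}{pageName}.json
// Assumes no basePath and no i18n.
function jsonUrlForRoute(buildId, route) {
  // '/' is served as index.json; trailing slashes are dropped elsewhere
  const pageRoute = route === '/' ? '/index' : route.replace(/\/+$/, '')
  return path.posix.join('/_next/data', buildId, `${pageRoute}.json`)
}

console.log(jsonUrlForRoute('OUEmUvoIwu1Azj0i9Vad1', '/'))
// /_next/data/OUEmUvoIwu1Azj0i9Vad1/index.json
console.log(jsonUrlForRoute('OUEmUvoIwu1Azj0i9Vad1', '/posts/myfirstpost'))
// /_next/data/OUEmUvoIwu1Azj0i9Vad1/posts/myfirstpost.json
```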

Now that we know where to find our files, which ones do we need to precache?

The importance of internal links

The way you implement your internal links affects which files you need to precache.

Standard <a> links

If your internal links are implemented with <a> tags instead of the <Link> component, JSON files are irrelevant to you: you need to precache the HTML files.

Client-side navigation via the <Link> component

When I started with Next.js, I decided to implement a static website first and look into client-side rendering later, so I didn't use <Link>. But <Link> is part of the optimisations that make Next.js websites fast.

If you don't use it, all the common JS files need to be downloaded every time you navigate to a new page. If you do use it, those files are downloaded once, and after that only the JS file specific to the new page is downloaded when you navigate. In addition, any <Link> to a static generation page is prefetched when it appears in the viewport. I also like that <Link> degrades gracefully to a standard <a> link if JavaScript is not available in the browser.

With <Link>, the only HTML file that is downloaded is the one for the first page the user accesses via the browser: typically the home page, but it could be any url they saved as a bookmark or typed directly in the address bar. After that, pages are generated from the page's JS and the page's JSON. If the user reloads the page, the browser starts from scratch with that page's HTML.

That means that you need to precache:

  • all the JSON files for the pages you want to precache
  • at minimum the HTML for the start url

You also need to decide what to do if the user tries to access another page through the browser: reload, bookmark, address bar typing...

JSON files facts
  • No JSON files are generated for pure static pages with no getStaticProps since they can be generated client-side from just the JavaScript (which, as we've seen, is already precached by next-pwa).

  • In the case of dynamic pages, remember that you have one JSON file per path returned by getStaticPaths.

HTML files tips
  • HTML files are typically much bigger than the corresponding JSON files so precaching them all might not be the best approach if there are a lot of them.

  • If you don't precache all HTML files, it's a good idea to provide an offline page. It's easy to add one with next-pwa.

  • Concerning the start url, note that next-pwa assumes that your start url is your homepage. In my case I use a different start url because the homepage is just a landing page, which is not interesting to returning users. So I precache the HTML file for the actual start url as well.

Using a custom build id

The problem is almost solved. Now we need to get hold of the build id so we can generate the manifest entries.

I spent a lot of time trying to work out how to get hold of the build id. I knew that the custom webpack function generated by withPWA was passed the buildId as a parameter but as far as I could tell there was no way for me to hook into withPWA to get the buildId. What to do?

One option was to fork next-pwa and write my own version. I wasn't keen on that: I'd rather use the original library and be able to update it.

I finally realised that I could bypass the problem entirely by supplying the build id to the config via generateBuildId. It's supposed to be used for multi-server deployment but I used it to enable page precaching instead.

By default, Next.js uses nanoid to produce build ids, so I used that too. You don't have to use it; there are other options, such as uuid.
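If you'd rather not add a dependency, Node's built-in crypto module can generate a UUID. This is a sketch assuming Node.js 14.17 or later (where crypto.randomUUID() exists); createBuildId is a hypothetical name. Since the build id ends up in /_next/data/ urls, it has to be url-safe, which a UUID is.

```javascript
const crypto = require('crypto')

// Dependency-free alternative to nanoid (assumes Node.js 14.17+).
// A UUID only contains hex digits and hyphens, so it is safe in urls.
function createBuildId() {
  return crypto.randomUUID()
}

// Generate once, then reuse the same value everywhere, e.g. in
// generateBuildId: () => buildId and getGeneratedPrecacheEntries(buildId)
const buildId = createBuildId()
console.log(buildId) // a random UUID, different on every run
```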

Problem solved. On with the code!


Code

Basic structure of the config file

Now that we've gone through all the concepts, here's the basic structure for the config file:



const withPWA = require('next-pwa')
const { nanoid } = require('nanoid')

function getGeneratedPrecacheEntries(buildId){
  // build list of page entries, using buildId as revision for HTML files and as part of the url for JSON files
  return [] // placeholder: returns an array of Workbox ManifestEntry objects
}

const buildId = nanoid()

module.exports = withPWA({
  generateBuildId: () => buildId,
  pwa: {
    dest: 'public',
    additionalManifestEntries: getGeneratedPrecacheEntries(buildId),
  }
})



We call nanoid() once and store the value in constant buildId, then we use it to generate all the manifest entries. We also make sure that the generateBuildId function returns buildId so that everything is consistent.

Possible improvements:

  • precache the content of the public folder
  • automatically precache your homepage HTML by setting dynamicStartUrl to false (only do this if you don't redirect your home page)


const withPWA = require('next-pwa')
const { nanoid } = require('nanoid')

function getStaticPrecacheEntries(){
  // build list of manifest entries to precache content of public folder
  return [] // placeholder: returns an array of Workbox ManifestEntry objects
}

function getGeneratedPrecacheEntries(buildId){
  // build list of page entries, using buildId as revision for HTML files and as part of the url for JSON files
  return [] // placeholder: returns an array of Workbox ManifestEntry objects
}

const buildId = nanoid()

module.exports = withPWA({
  generateBuildId: () => buildId,
  pwa: {
    dest: 'public',
    additionalManifestEntries: [...getStaticPrecacheEntries(), ...getGeneratedPrecacheEntries(buildId)],
    dynamicStartUrl: false, // precache home page instead of storing it in runtime cache by default
  }
})



This basic code has one issue: although generating the build id and the precache entries is only relevant to the build, all this code is evaluated each time next.config.js is loaded. In other words, it runs for every single Next.js CLI call, such as next start or next dev, in addition to next build. You can convert the config into a config function to prevent that, as we'll see later.

Building the list of static file entries

This is more of a side thing since it's basically copying and adapting the code related to the public folder in next-pwa.

util/staticprecache.js



// ** adapted from next-pwa index.js since it doesn't set up its own entries when additionalManifestEntries is specified
const path = require('path')
const fs = require('fs')
const globby = require('globby')
const crypto = require('crypto')

const getRevision = file => crypto.createHash('md5').update(fs.readFileSync(file)).digest('hex')

// precache files in public folder
function getStaticPrecacheEntries(pwaOptions = {}){ // default avoids a crash when called without options
  // set up properties used in next-pwa code to precache the public folder
  const basePath = pwaOptions.basePath || '/'
  const sw = pwaOptions.sw || 'sw.js'
  const publicExcludes = pwaOptions.publicExcludes || ['!noprecache/**/*']

  let manifestEntries = globby
  .sync(
    [
      '**/*',
      '!workbox-*.js',
      '!workbox-*.js.map',
      '!worker-*.js',
      '!worker-*.js.map',
      '!fallback-*.js',
      '!fallback-*.js.map',
      `!${sw.replace(/^\/+/, '')}`,
      `!${sw.replace(/^\/+/, '')}.map`,
      ...publicExcludes
    ],
    {
      cwd: 'public'
    }
  )
  .map(f => ({
    url: path.posix.join(basePath, `/${f}`),
    revision: getRevision(`public/${f}`)
  }))
  return manifestEntries
}

module.exports = getStaticPrecacheEntries



I dislike copy-pasting code like this, since it needs to be maintained manually to stay in sync with the original library. And the original evolves quickly: I had to add the fallback lines after next-pwa added its fallback functionality.

I put it in its own module to at least contain the mess. My hope is that eventually next-pwa will provide an option to do this.

Building the list of page entries

Now we need to implement getGeneratedPrecacheEntries. It takes the build id as its argument and returns an array of ManifestEntry objects.

All the following code is specific to my app, so it is intended only as an illustration.

For context this is the structure of my pages folder:

  • denizens/[denizen].js
  • about.js
  • denizens.js
  • index.js
  • a few custom files that don't generate urls: _app.js, _document.js and _offline.js

The most notable thing is that I have a dynamic route for which I need to generate page names, like getStaticPaths does. In my case those names come from a JSON data file. That means I/O access, which is slow. If you're fetching from a database or an external API, it's even slower.

Because of this, when you have dynamic pages, getGeneratedPrecacheEntries cannot be a pure function. This is why I don't like the idea of all Next.js CLI commands calling it and why I eventually converted my Next.js config to a config function.

Defining how urls are precached

I represented my precaching as an array of objects called pages. Object properties are:

  • route: string - page route or, in the case of dynamic pages, the path prefix
  • precacheHtml: boolean - are we precaching the HTML?
  • precacheJson: boolean - are we precaching the JSON?
  • dynamicPages: array of page names - only needed for dynamic pages
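Because this list is edited by hand, a small validator can catch typos such as a misspelled flag before they silently drop entries. This is a hypothetical sketch (validatePage is not part of next-pwa) that checks exactly the four properties listed above.

```javascript
// Hypothetical validator for the page descriptors described above:
// { route: string, precacheHtml: boolean, precacheJson: boolean, dynamicPages?: string[] }
// Returns a list of error messages (empty when the descriptor is valid).
function validatePage(page) {
  const errors = []
  if (typeof page.route !== 'string' || !page.route.startsWith('/')) {
    errors.push('route must be a string starting with "/"')
  }
  if (typeof page.precacheHtml !== 'boolean') errors.push('precacheHtml must be a boolean')
  if (typeof page.precacheJson !== 'boolean') errors.push('precacheJson must be a boolean')
  if (page.dynamicPages !== undefined) {
    const ok = Array.isArray(page.dynamicPages) &&
      page.dynamicPages.every(name => typeof name === 'string')
    if (!ok) errors.push('dynamicPages must be an array of strings')
  }
  return errors
}

console.log(validatePage({ route: '/about', precacheHtml: false, precacheJson: true })) // []
console.log(validatePage({ route: 'about', precacheHtml: 'no', precacheJson: true }).length) // 2
```

A misspelled key like precacheJSON shows up as "precacheJson must be a boolean", since the expected key is then missing.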


const pages = [
  {
    route: '/',
    precacheHtml: false, // next-pwa already caches the home page
    precacheJson: false, // no props
  },
  {
    route: '/about',
    precacheHtml: false,
    precacheJson: true,
  },
  {
    route: '/denizens',
    precacheHtml: true, // this is now the start url for A2HS
    precacheJson: true,
  },
  {
    route: '/denizens/',
    precacheHtml: false,
    precacheJson: true,
    dynamicPages: getDenizenPages(),
  },
];



As you can see, for most pages I only precache the JSON file. The only HTML files that are precached are the home page (because it's always cached by next-pwa) and '/denizens' because it's my start url for A2HS (Add to Home Screen). All other HTML requests while offline are dealt with by an offline page (a functionality offered by next-pwa; the offline page is automatically precached).

Concerning the home page, both flags are false because next-pwa already takes care of the HTML file and Next.js doesn't generate a JSON file for a pure static page with no props. It could be removed from pages entirely but it might as well stay there in case things change in the future.

getDenizenPages is a function specific to my application that returns an array of strings representing the page names. It does pretty much the same thing as getStaticPaths in pages/denizens/[denizen].js, except that each item is a string instead of an object with a params attribute. Like getStaticPaths, it reads from a data file.
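One way to keep the two in sync is to derive the page names from the same paths array that getStaticPaths returns. This is a hypothetical sketch (pageNamesFromStaticPaths is not a Next.js API, and the names 'alice' and 'bob' are made up); it only handles a single-segment dynamic param such as [denizen], not catch-all routes.

```javascript
// Hypothetical helper: turn getStaticPaths-style paths, e.g.
// [{ params: { denizen: 'alice' } }], into plain page names for dynamicPages.
// paramName is the dynamic segment, e.g. 'denizen' for pages/denizens/[denizen].js.
function pageNamesFromStaticPaths(paths, paramName) {
  return paths.map(({ params }) => {
    const value = params[paramName]
    if (typeof value !== 'string') {
      throw new Error(`Missing string param "${paramName}"`)
    }
    return value
  })
}

const paths = [
  { params: { denizen: 'alice' } },
  { params: { denizen: 'bob' } },
]
console.log(pageNamesFromStaticPaths(paths, 'denizen')) // [ 'alice', 'bob' ]
```

If the data-reading code that builds paths lives in its own module, both getStaticPaths and the precache list can call it, so a new page can't end up in one list and not the other.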

Generating the manifest entries



const path = require('path')

function getPageJSONPath(buildId, pageRoute){
  return path.posix.join('/_next/data/', buildId, `${pageRoute}.json`);
}

function getJSONEntry(buildId, pageRoute){
  return {
    url: getPageJSONPath(buildId, pageRoute),
    revision: null,
  };
}

function getHTMLEntry(buildId, pageRoute){
  return {
    url: pageRoute,
    revision: buildId,
  };
}



Both getJSONEntry and getHTMLEntry take the build id and the page route as parameters and return a Workbox ManifestEntry object.

For JSON files the build id is already included in the ManifestEntry url, so the revision is set to null. That is what Workbox expects when the url itself carries the versioning information.

Generating the precached entries



function getNormalPageEntries(buildId, page){
  let entries = [];
  if (page.precacheHtml){
    entries.push(getHTMLEntry(buildId, page.route));
  }
  if (page.precacheJson){
    entries.push(getJSONEntry(buildId, page.route));
  }
  return entries;
}

function getDynamicPageEntries(buildId, page){
  let pageList = page.dynamicPages.map(actualPage => path.posix.join(page.route, actualPage));
  let entries = pageList.map(route => getNormalPageEntries(
    buildId, { route: route, precacheHtml: page.precacheHtml, precacheJson: page.precacheJson })
  );
  return entries.reduce((acc, curr) => acc.concat(curr), []);
}

function getPageEntries(buildId, page){
  if (Array.isArray(page.dynamicPages)){
    return getDynamicPageEntries(buildId, page);
  } else {
    return getNormalPageEntries(buildId, page);
  }
}

function getGeneratedPrecacheEntries(buildId){
  return pages.map(page => getPageEntries(buildId, page)).reduce((acc, curr) => acc.concat(curr), []);
}



getNormalPageEntries returns an array of 0 to 2 manifest entries depending on the boolean attributes precacheHtml and precacheJson in the page object parameter.

getDynamicPageEntries builds a list of all the pages for the dynamic page then calls getNormalPageEntries for each page and finally flattens the resulting array before returning it.

getPageEntries returns an array of entries for a given page. It checks whether the page is a dynamic page and calls getNormalPageEntries or getDynamicPageEntries accordingly.

getGeneratedPrecacheEntries is passed the build id and generates the required entries. It calls getPageEntries for each page and flattens the array.
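The map-then-reduce/concat flattening above can also be written with Array.prototype.flatMap, available since Node.js 11. This is a sketch of an equivalent version, not the project's code; to keep it self-contained it inlines the entry helpers and, as an assumption, takes the pages array as a parameter instead of reading a module-level constant.

```javascript
const path = require('path')

// Equivalent of getNormalPageEntries: 0 to 2 entries depending on the flags
function getNormalPageEntries(buildId, { route, precacheHtml, precacheJson }) {
  const entries = []
  if (precacheHtml) entries.push({ url: route, revision: buildId })
  if (precacheJson) {
    entries.push({ url: path.posix.join('/_next/data/', buildId, `${route}.json`), revision: null })
  }
  return entries
}

// flatMap replaces the separate dynamic-page function and its reduce/concat step
function getPageEntries(buildId, page) {
  if (!Array.isArray(page.dynamicPages)) return getNormalPageEntries(buildId, page)
  return page.dynamicPages.flatMap(name =>
    getNormalPageEntries(buildId, { ...page, route: path.posix.join(page.route, name) })
  )
}

function getGeneratedPrecacheEntries(buildId, pages) {
  return pages.flatMap(page => getPageEntries(buildId, page))
}

const pages = [
  { route: '/about', precacheHtml: false, precacheJson: true },
  { route: '/denizens/', precacheHtml: false, precacheJson: true, dynamicPages: ['alice', 'bob'] },
]
console.log(getGeneratedPrecacheEntries('ID', pages).length) // 3
```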

Transforming into a config function

As mentioned above, this code is called every time you use a Next.js CLI command. You can improve on that by making the expensive part build-specific: the answer is to use a config function instead of a config object.

next.config.js



const withPWA = require('next-pwa')
const { PHASE_PRODUCTION_BUILD } = require('next/constants')

module.exports = (phase, { defaultConfig }) => {
  const config = {
    ...defaultConfig,
    pwa: {
      dest: 'public',
      dynamicStartUrl: false, // precache home page instead of storing it in runtime cache by default
    },
  }

  if (phase === PHASE_PRODUCTION_BUILD){
    // Attributes generateBuildId and additionalManifestEntries are only needed
    // for the build and calculating their value is time-consuming.
    // So we add them here, just for the build.
    const getBuildId = require('./util/buildid.js')
    const getStaticPrecacheEntries = require('./util/staticprecache.js')
    const getGeneratedPrecacheEntries = require('./util/precache.js')

    const buildId = getBuildId()

    config.generateBuildId = getBuildId
    config.pwa.additionalManifestEntries = [
      ...getStaticPrecacheEntries({
        // exclude icon-related files from the precache since they are platform specific
        // note: no need to pass publicExcludes to next-pwa, it's not used for anything else
        publicExcludes: [
          '!*.png',
          '!*.ico',
          '!browserconfig.xml',
        ],
      }),
      ...getGeneratedPrecacheEntries(buildId),
    ]
  }

  return withPWA(config)
}



What this does is:

  1. define the common config by adding to the default config
  2. only do the build-specific processing when we're running in the context of PHASE_PRODUCTION_BUILD. This is where we add build-specific attributes generateBuildId and pwa.additionalManifestEntries to the config.
  3. wrap the config in withPWA before returning
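To see that the build-only branch really is skipped for other commands, here is a dependency-free sketch of the same phase check with a call counter. It inlines the string values Next.js exports from next/constants as PHASE_PRODUCTION_BUILD and PHASE_DEVELOPMENT_SERVER, omits withPWA, and uses a hypothetical computeBuildOnly callback in place of getBuildId and the precache functions.

```javascript
// Inlined values of the next/constants phases, so the sketch runs without Next.js
const PHASE_PRODUCTION_BUILD = 'phase-production-build'
const PHASE_DEVELOPMENT_SERVER = 'phase-development-server'

// computeBuildOnly stands in for getBuildId + the precache functions.
// The real config would also wrap the result in withPWA(config).
function makeConfigFunction(computeBuildOnly) {
  return (phase, { defaultConfig }) => {
    const config = {
      ...defaultConfig,
      pwa: { dest: 'public', dynamicStartUrl: false },
    }
    if (phase === PHASE_PRODUCTION_BUILD) {
      // Build-only, potentially slow work runs here and nowhere else
      const { buildId, entries } = computeBuildOnly()
      config.generateBuildId = () => buildId
      config.pwa.additionalManifestEntries = entries
    }
    return config
  }
}

let calls = 0
const configFn = makeConfigFunction(() => {
  calls += 1
  return { buildId: 'test-build', entries: [{ url: '/denizens', revision: 'test-build' }] }
})

configFn(PHASE_DEVELOPMENT_SERVER, { defaultConfig: {} })
console.log(calls) // 0: nothing computed for `next dev`
const buildConfig = configFn(PHASE_PRODUCTION_BUILD, { defaultConfig: {} })
console.log(calls) // 1
console.log(buildConfig.generateBuildId()) // test-build
```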

I moved the build id generation and the precache functions to separate files for readability.

Defining the config as a function gets rid of unnecessary processing when starting the server in production or development mode.

While debugging in Next.js version 11.2, I noticed that the config function was called twice during the build, causing nanoid and my precache functions to be called twice unnecessarily. This has been fixed in version 12.

Until we're ready to upgrade, we can either put up with it or memoize the functions so that the heavy lifting is only done once. The extra call to nanoid doesn't seem to interfere with the precaching, but to be on the safe side I memoized it so that only one build id is generated per process.

util/buildid.js



const { nanoid } = require('nanoid')

let buildId = null // cached for the lifetime of the process

function getBuildId(){
  if (!buildId){
    buildId = nanoid()
  }
  return buildId
}

module.exports = getBuildId



To reiterate, this is not necessary with Next.js v12.
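If you're still on 11.2 and also want to avoid computing the precache list twice, the same memoization idea generalises to functions that take one argument, such as getGeneratedPrecacheEntries(buildId). memoizeByArg is a hypothetical helper, sketched here under that assumption.

```javascript
// Hypothetical helper: cache a one-argument function's result per argument,
// so a second call with the same build id reuses the first result.
function memoizeByArg(fn) {
  const cache = new Map()
  return arg => {
    if (!cache.has(arg)) {
      cache.set(arg, fn(arg))
    }
    return cache.get(arg)
  }
}

// Usage: wrap the expensive precache function once, at module level
let computations = 0
const getEntries = memoizeByArg(buildId => {
  computations += 1 // stands in for reading data files
  return [{ url: '/denizens', revision: buildId }]
})

getEntries('abc')
getEntries('abc')
console.log(computations) // 1
```

This only helps if the wrapped function is created at module level (for instance in util/precache.js), because Node caches required modules per process. Callers also get the same array back each time, so they shouldn't mutate it.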


Limitations

Reliance on an implementation detail of Next.js's router

JSON file urls are served and fetched by Next.js's internal functions. If Next.js decides to change its url scheme, this will break. But it's not like we have an alternative.

Hardcoding of the precache list

Even though I tried to keep the code easy to change by separating the data from the processing, I am still keeping a list of all urls in next.config.js (or util/precache.js in the config function version). It could of course be moved to another file for finer-grained version control, but the fact is that every time you add new urls that need precaching, that list needs to be edited.

I looked briefly into automating it, but for the time being it doesn't seem worth the effort.

  • I only have a few files in my pages folder. Automation feels like overkill.
  • I am not sure it makes sense. Right now I'm precaching all pages. I have two special cases, the home page and the start url. As I add more features, will additional pages be worth precaching? And if there are exceptions, will automation still make sense? I don't know at this stage.

So I went with YAGNI, and for now I'll leave automation as an exercise to the reader 😉.
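For readers who want to try that exercise, here is one possible starting point: a hypothetical listRoutes sketch that walks the pages folder and produces routes shaped like the pages list above. Assumptions: page files are .js only, names starting with "_" (_app.js, _document.js, _offline.js) don't produce urls, and a [param].js file marks a dynamic route whose page names still have to come from your data. API routes, catch-all routes and TypeScript files are not handled specially.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical sketch: derive routes from a pages folder.
// Returns items like { route: '/about', dynamic: false } or { route: '/denizens/', dynamic: true }.
function listRoutes(pagesDir, prefix = '/') {
  const routes = []
  for (const entry of fs.readdirSync(pagesDir, { withFileTypes: true })) {
    if (entry.name.startsWith('_')) continue // _app.js, _document.js, _offline.js...
    if (entry.isDirectory()) {
      // Sub-folder: recurse with a prefix ending in "/", e.g. '/denizens/'
      const subPrefix = path.posix.join(prefix, entry.name, '/')
      routes.push(...listRoutes(path.join(pagesDir, entry.name), subPrefix))
    } else if (entry.name.endsWith('.js')) {
      const name = entry.name.slice(0, -'.js'.length)
      if (/^\[[^\]]+\]$/.test(name)) {
        // Dynamic route: same shape as the '/denizens/' entry in pages
        routes.push({ route: prefix, dynamic: true })
      } else if (name === 'index') {
        routes.push({ route: prefix === '/' ? '/' : prefix.slice(0, -1), dynamic: false })
      } else {
        routes.push({ route: path.posix.join(prefix, name), dynamic: false })
      }
    }
  }
  return routes
}
```

For the pages folder described earlier, this would yield '/', '/about' and '/denizens' as static routes and '/denizens/' as a dynamic one; the precacheHtml and precacheJson decisions would still be yours to make.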

How much precache is too much?

When deciding which pages to precache and whether to precache the HTML files, you need to keep in mind the cost to the user, especially on mobile.

One big difference between precaching JS files and precaching pages is that, with dynamic pages, the number of files can balloon. Depending on your data, you could easily have thousands of pages for one single JS file. That's a lot of files to precache. Is it reasonable?

Two things to consider are the size of the cache and the network data usage. Both need to stay moderate. Remember that not all users have unlimited data contracts.
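A quick way to keep both in check is to add up the size of the files you plan to precache. These are hypothetical helpers, sketched under the assumption that you point totalBytes at the HTML and JSON files your build generates; they measure on-disk size and ignore HTTP compression, so treat the result as a rough upper bound on the download.

```javascript
const fs = require('fs')

// Rough precache size estimate: sum of on-disk sizes of the given files
function totalBytes(files) {
  return files.reduce((sum, file) => sum + fs.statSync(file).size, 0)
}

// Human-readable size using 1024-based units
function formatBytes(bytes) {
  const units = ['B', 'KB', 'MB', 'GB']
  let value = bytes
  let unit = 0
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024
    unit += 1
  }
  return `${value.toFixed(1)} ${units[unit]}`
}

// Back-of-the-envelope check with made-up numbers:
// 1,000 dynamic pages whose JSON files average 5 KB each
console.log(formatBytes(1000 * 5 * 1024)) // 4.9 MB
```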

While I didn't find a definite answer as to how much is reasonable (and it depends on your target userbase), here are some pointers:

Conclusion

We've seen that the trick to precaching pages is to generate your own build id. You can then pass it on to Next.js via generateBuildId and use it to generate the content of pwa.additionalManifestEntries.

The other important thing is to turn your config into a config function. Then you can make sure your expensive build-specific functions only run in the context of the build.

If you've managed to read this far, congratulations! You should now be able to confidently precache your own pages with next-pwa.

Links

If you'd rather see the original code, you can check out my project Anyaral on GitHub. Relevant files are next.config.js and the content of the util folder.

sfiquet / anyaral

Player helper tool for the game World of Twilight. Designed for mobile and offline use. Created with Next, Next-pwa and Tailwind.

Anyaral is a reference app for players of World of Twilight, a table top skirmish game.

Cover image by Arek Socha from Pixabay
Post images from Undraw
