Fix Gatsby 4 matchPath issue
Alexandre Fauchard
Posted on April 25, 2022
Introduction
If your Gatsby website has a lot of pre-generated pages, you may have noticed that your app.js file grows fast. That's because from the moment you add a matchPath to one of your pages, Gatsby will write every pre-generated "child page route" in the app.js file.
💡 By "child page route" I mean every page whose path is at the same level or deeper.
This can affect only part of your website if you use a fallback route or, in the worst case, every page of your site if you implement a custom 404 page.
The result is slower and slower loading on every page, with a catastrophic side effect on SEO.
Let's take an example
We have a news website where journalists publish breaking stories. The site is rebuilt every 10 minutes, and each build generates every news item at the route /news/{id}. But when a journalist publishes a news item, users must be able to see it before the next build.
At any given moment, we can split our pages into two groups: the generated news and the non-generated news.
Problem: both types share the same route /news/{id}. How do we access news created after the last build?
This issue can be solved quickly with a matchPath. All we have to do is add this code anywhere in our createPages function:
/* ... */
createPage({
  component: newsFallbackTemplate,
  context: newsContext,
  matchPath: "/news/*",
  /* ... */
})
/* ... */
The created page is shown each time a user accesses a non-pre-generated page whose path starts with /news/. We just have to handle the API calls in the newsFallbackTemplate component to display our news.
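As a rough illustration of what "handling the API calls" could involve, here is a minimal sketch. The helper names, the injected fetchJson function and the /api/news/ endpoint are assumptions made for the example, not part of the original project.

```javascript
// Hypothetical helper: extract the news id from a path such as "/news/ex_6".
// Returns null when the path is not a single-segment /news/{id} route.
function getNewsIdFromPath(pathname) {
  const match = /^\/news\/([^/]+)\/?$/.exec(pathname)
  return match ? decodeURIComponent(match[1]) : null
}

// Hypothetical loader: `fetchJson` is injected so the sketch does not depend
// on any real API client. The endpoint URL is an assumption.
async function loadNews(pathname, fetchJson) {
  const id = getNewsIdFromPath(pathname)
  if (id === null) {
    return { status: "not-found" }
  }
  const news = await fetchJson(`/api/news/${encodeURIComponent(id)}`)
  return news ? { status: "ok", news } : { status: "not-found" }
}
```

In the template, a call like loadNews(window.location.pathname, yourFetcher) could run inside an effect, with the component rendering a loading state until it resolves.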
In fact, it works really well. The only problem is the way Gatsby handles it. To better understand why, let's look, schematically, at what happens when a user visits your website at the path /news/ex_6.
The loadPage process
When a page is loading, two functions are called.
First, the loader's loadPage function (window.___loader.loadPage) is called. It looks for the component matching the path being loaded.
💡 On an SSG page, it displays the pre-generated files and "confirms" the component in use.
The process gives priority to the most specific path, following its route-matching algorithm. To achieve this, loadPage uses the app.js file to get all paths and matchPaths. Once the component is loaded, the page is displayed.
💡 If you don't have any matchPaths, Gatsby only uses the pre-generated data. But if you have at least one matchPath, Gatsby has to verify that it displays the right component. That's why app.js contains every path and matchPath and grows proportionally with your number of pages.
Then, loadPageSync is executed to double-check this result.
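The "most specific path wins" rule described above can be sketched as follows. This is a deliberately simplified ranking written for illustration; it is not Gatsby's actual implementation, and the scoring weights and function names are assumptions.

```javascript
// Simplified specificity score: static segments outrank named parameters,
// which outrank wildcards. Weights are arbitrary illustration values.
function scoreMatchPath(matchPath) {
  return matchPath
    .split("/")
    .filter(Boolean)
    .reduce((score, segment) => {
      if (segment === "*") return score + 1 // wildcard: least specific
      if (segment.startsWith(":")) return score + 2 // named parameter
      return score + 3 // static segment: most specific
    }, 0)
}

// Returns true when `pathname` satisfies the `matchPath` pattern.
function matches(matchPath, pathname) {
  const pattern = matchPath.split("/").filter(Boolean)
  const actual = pathname.split("/").filter(Boolean)
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] === "*") return true // wildcard swallows the rest
    if (i >= actual.length) return false
    if (!pattern[i].startsWith(":") && pattern[i] !== actual[i]) return false
  }
  return pattern.length === actual.length
}

// Among all matching patterns, keep the one with the highest score.
function pickMostSpecific(matchPaths, pathname) {
  const candidates = matchPaths
    .filter(matchPath => matches(matchPath, pathname))
    .sort((a, b) => scoreMatchPath(b) - scoreMatchPath(a))
  return candidates[0] ?? null
}
```

With the patterns "/*", "/news/*" and "/news/ex_1", the path /news/ex_1 resolves to its own entry while /news/ex_6 falls through to "/news/*".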
In our case, when the browser loads the path /news/ex_6, as we have at least one matchPath page, the loadPage function looks into our app.js file. This file contains an array which looks like this:
[
{"path" : "/news/ex_1", "matchPath" : "news/ex_1"},
{"path" : "/news/ex_2", "matchPath" : "news/ex_2"},
/* ... */
{"path" : "/news/ex_5", "matchPath" : "news/ex_5"},
{"path" : "/xxx", "matchPath" : "news/*"},
]
Using the matching algorithm mentioned earlier, it returns the component corresponding to the path /xxx, as the closest matchPath is news/*. That component should be newsFallbackTemplate, which handles all the API calls to display our news like a pre-generated one. Everything works fine, but in our case we have only 5 pre-generated news items, so the app.js array has only 6 entries. If we had 100K pre-generated news items, the array would be huge, and so would our app.js file.
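To get a feel for that growth, the array above can be reproduced at different sizes and measured. This is only a back-of-the-envelope sketch: the entry shape follows the example in this article (with a leading slash added to each matchPath), the helper names are made up, and the numbers ignore minification and compression. It uses Node's Buffer, so it is meant to be run with Node rather than in the browser.

```javascript
// Build the kind of array shown above for `count` pre-generated news pages,
// plus the single fallback entry.
function buildMatchPathEntries(count) {
  const entries = []
  for (let i = 1; i <= count; i++) {
    entries.push({ path: `/news/ex_${i}`, matchPath: `/news/ex_${i}` })
  }
  entries.push({ path: "/xxx", matchPath: "/news/*" })
  return entries
}

// Size in bytes of the serialized array (before minification or compression).
function serializedSize(entries) {
  return Buffer.byteLength(JSON.stringify(entries), "utf8")
}

for (const count of [5, 1000, 100000]) {
  const bytes = serializedSize(buildMatchPathEntries(count))
  console.log(`${count} pages -> ${(bytes / 1024).toFixed(1)} KiB`)
}
```

The size grows linearly with the number of pages, and every visitor downloads it regardless of which page they open.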
How to fix it
To fix it, we need to replace the way matchPaths are handled.
First, we replace every matchPath with a static path like /__fallback/__xxx. The createPage call we wrote at the beginning to handle non-pre-generated news now looks like this:
/* ... */
createPage({
  component: newsFallbackTemplate,
  context: newsContext,
  path: "/__fallback/__news_fallback",
  /* ... */
})
/* ... */
We now have a static loading page for every former matchPath page.
Then, we add the following code to our newsFallbackTemplate component:
return (
  /* ... */
  <Helmet>
    <script>
      {`
        window.pagePath = undefined
        window.news__routeParams = {
          path: "/__fallback/__news_fallback",
          matchPath: "/news/*"
        };
      `}
    </script>
  </Helmet>
  /* ... */
)
This adds a script to the page which sets, on the global window object, a news__routeParams property holding the static path and the matchPath it replaces.
💡 Replace the news__ prefix with your app name to avoid collisions.
Finally, in the gatsby-browser file, we export an onClientEntry function to handle this data. This function overrides the loadPage and loadPageSync functions to add the logic handling our custom matchPath, only if news__routeParams is defined. The onClientEntry function looks like this:
export const onClientEntry = () => {
  const { ___loader: loader, news__routeParams } = window
  if (!loader) {
    return
  }
  // Keep the original functions so we can call them later
  const originalLoadPage = loader.loadPage
  const originalLoadPageSync = loader.loadPageSync
  if (news__routeParams) {
    loader.loadPage = async rawPath => {/* ... */}
    loader.loadPageSync = rawPath => {/* ... */}
  }
}
loadPage
The loadPage function is the bigger of the two:
loader.loadPage = async rawPath => {
  const path = news__routeParams.path
  const matchPath = news__routeParams.matchPath
  const lastActualPath = window.news__lastActualPath
  const isFallbackPage = !!path
  // This function runs before the URL changes, so a fallback page is still
  // detected while we are leaving it. We need to know whether we are leaving.
  const isLeavingFallbackPage = !!lastActualPath && lastActualPath !== rawPath
  let pageResources
  // On a fallback page that we are not leaving, load the static fallback path
  // instead of the requested one, then set its matchPath: either the one from
  // news__routeParams, or the requested URL with its last part replaced by *.
  if (isFallbackPage && !isLeavingFallbackPage) {
    pageResources = await originalLoadPage(path)
    const rawParts = rawPath.split("/")
    rawParts.splice(rawPath.slice(-1) === "/" ? -2 : -1)
    pageResources.page.matchPath = matchPath ?? [...rawParts, "*"].join("/")
  } else {
    // Non-fallback page, or a fallback page we are leaving
    pageResources = await originalLoadPage(rawPath)
  }
  // Save some data for later
  window.news__lastActualPath = rawPath
  window.news__savedPageRessources = pageResources
  return pageResources
}
As you can see, we check whether we're loading a fallback page (one with a matchPath) or not. For a fallback page, we first load the pageResources (component, page-data.json, ...) with the original function (originalLoadPage), called with news__routeParams.path instead of the actual path (rawPath). Then, we set the matchPath of the pageResources to window.news__routeParams.matchPath.
Thanks to this, if the user loads /news/ex_6 (which is not pre-generated), the loadPage function will load the pageResources of /__fallback/__news_fallback and display them.
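When news__routeParams carries no matchPath, the function above derives one from the requested URL by replacing its last segment with a wildcard. That step can be isolated and tried on its own; the function name below is made up for the example, the logic is the same split/splice/join as in loadPage.

```javascript
// Replace the last segment of a path with "*", with or without trailing slash.
function deriveWildcardMatchPath(rawPath) {
  const rawParts = rawPath.split("/")
  // Drop the last segment; with a trailing slash, also drop the empty string
  // that split() leaves at the end.
  rawParts.splice(rawPath.slice(-1) === "/" ? -2 : -1)
  return [...rawParts, "*"].join("/")
}

console.log(deriveWildcardMatchPath("/news/ex_6")) // "/news/*"
console.log(deriveWildcardMatchPath("/news/ex_6/")) // "/news/*"
console.log(deriveWildcardMatchPath("/news/2022/ex_6")) // "/news/2022/*"
```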
loadPageSync
At the end of the loadPage function, we saved the loaded pageResources into window.news__savedPageRessources. Thanks to this, we can simply return this variable instead of copy-pasting the same logic. Obviously, if there are no saved page resources, we return the result of the original function:
loader.loadPageSync = rawPath => {
  return window.news__savedPageRessources ?? originalLoadPageSync(rawPath)
}
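To check how the two overrides behave together without a browser, they can be exercised against a fake loader. The sketch below is an illustration under assumptions: patchLoader and createFakeLoader are hypothetical names, a plain state object stands in for window, and the fake loader only records the paths it receives instead of fetching real page resources.

```javascript
// Hypothetical factory mirroring the two overrides above.
// `state` plays the role of the window.news__* variables.
function patchLoader(loader, routeParams, state = {}) {
  const originalLoadPage = loader.loadPage
  const originalLoadPageSync = loader.loadPageSync

  loader.loadPage = async rawPath => {
    const isFallbackPage = Boolean(routeParams.path)
    const isLeaving =
      Boolean(state.lastActualPath) && state.lastActualPath !== rawPath
    let pageResources
    if (isFallbackPage && !isLeaving) {
      // Load the static fallback page instead of the requested path
      pageResources = await originalLoadPage(routeParams.path)
      pageResources.page.matchPath = routeParams.matchPath
    } else {
      pageResources = await originalLoadPage(rawPath)
    }
    state.lastActualPath = rawPath
    state.savedPageResources = pageResources
    return pageResources
  }

  loader.loadPageSync = rawPath =>
    state.savedPageResources ?? originalLoadPageSync(rawPath)

  return state
}

// Fake loader (assumption): records which paths the original functions get.
function createFakeLoader() {
  const calls = []
  return {
    calls,
    loadPage: async path => {
      calls.push(path)
      return { page: { path } }
    },
    loadPageSync: path => ({ page: { path } }),
  }
}

async function demo() {
  const loader = createFakeLoader()
  patchLoader(loader, {
    path: "/__fallback/__news_fallback",
    matchPath: "/news/*",
  })
  const first = await loader.loadPage("/news/ex_6") // served by the fallback
  const second = await loader.loadPage("/about") // leaving the fallback page
  const sync = loader.loadPageSync("/about")
  return { calls: loader.calls, first, second, sync }
}
```

Running demo() shows the original loadPage being called with the fallback path for /news/ex_6 and with the real path once the user navigates away.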
You may have noticed that, for this to work, the window.news__routeParams variable must already be set. But how can we have this variable, which lives in the fallback template, if we're still trying to load that template? We add a redirect!
⚠️ That's the limitation of this solution: it only works on a hosted instance, not in development. As this is an SEO issue, you can keep using matchPath in dev mode.
All you have to do is add this just after the createPage call seen before:
createRedirect({
  fromPath: `/news/*`,
  toPath: `/__fallback/__news_fallback`,
  redirectInBrowser: false,
  statusCode: 200,
})
That's it! Now our server transparently redirects (redirectInBrowser: false) to the fallback page, which displays the news dynamically. Once the JavaScript is loaded, loadPage is executed and, thanks to our modification, loads the same component, as does loadPageSync.
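Since the redirect only exists on the hosted site, one way to keep matchPath in dev mode, as suggested in the warning above, is to build the createPage arguments depending on the environment. The helper below is a hypothetical sketch; it assumes NODE_ENV is "development" while running gatsby develop.

```javascript
// Hypothetical helper: builds the createPage arguments for the news fallback.
// In development the matchPath is kept so /news/* still resolves locally;
// in production only the static path is used and the redirect does the rest.
function buildNewsFallbackPage({ isDevelopment, component, context }) {
  const base = { component, context, path: "/__fallback/__news_fallback" }
  return isDevelopment ? { ...base, matchPath: "/news/*" } : base
}

// Possible usage inside createPages (names taken from the snippets above):
// createPage(
//   buildNewsFallbackPage({
//     isDevelopment: process.env.NODE_ENV === "development",
//     component: newsFallbackTemplate,
//     context: newsContext,
//   })
// )
```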
Results
Here are the Lighthouse Treemap results before and after our modifications on a Gatsby 4 project with more than 40K pages.
[Figures: Lighthouse Treemap, before and after]
As you can see, the app.js bundle is drastically smaller!
Maybe Gatsby will solve the problem by itself 🤞🏻