A beginner's guide to understanding your Gatsby starter template

Illustration by Katerina Limpitsouni undraw.co/illustrations

Recently I started migrating an older Drupal-based blog to a combination of Gatsby and Netlify CMS (source), neither of which I had prior experience with. In this blog series I'll talk you through the experiences, hurdles and solutions. In part 3 I'll explain what's happening in your Gatsby starter template. Although a starter kit is a good place to start, for those with no prior knowledge of Gatsby and GraphQL it might look like magic 🧙‍♂️. So let's try and demystify!

For my project I used the Gatsby + Netlify CMS starter, although much of the information will also be applicable to other starters.

Page creation

The most basic entity for Gatsby is a page. If you were to create a "vanilla Gatsby" project, every .js file with a React Component within the src/pages directory would be compiled to an HTML page. This still holds true for your starter project, even though you might find an index.md, index.yml or index.json file in your template.

So, how does the conversion from markdown file to JavaScript happen?

This process happens in the gatsby-node.js file. Let's take a closer look:

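// gatsby-node.js (parts omitted for brevity)
const path = require('path')

// 1️⃣
exports.createPages = ({ actions, graphql }) => {
  const { createPage } = actions

  // 2️⃣
  return graphql(`
    # you'll find a biggg GraphQL query here
  `).then(result => {
    // stop early on query errors instead of failing on undefined data
    if (result.errors) {
      return Promise.reject(result.errors)
    }

    // ℹ your data structure might differ
    const pages = result.data.allMarkdownRemark.edges

    pages.forEach(edge => {
      const id = edge.node.id
      // 3️⃣
      createPage({
        // URL of the page; assumes your query returns a fields.slug value
        path: edge.node.fields.slug,
        component: path.resolve(`src/templates/page.js`),
        // additional data can be passed via context
        context: {
          id,
        },
      })
    })
  })
}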

What we see is that:

1️⃣ The createPages() method of the Gatsby Node API is used to programmatically create new pages.
2️⃣ This method returns a Promise. The Promise returned here wraps a GraphQL query. How and where GraphQL gets its data will be explained in the next section. For now we will assume that this query returns the raw data that we have saved in our markdown/yaml/json file.
3️⃣ For each block of raw page data we create a new page. This is done by the createPage action, which tells Gatsby to create a new page based on the component defined. In our case the component is a JavaScript template file which contains a React Component. As we already know, Gatsby is capable of converting React Components into (static) pages with HTML, CSS and JS.
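
To make step 3 a bit more tangible, here is a minimal sketch that runs in plain Node without Gatsby. The helper createPagesFromEdges, the fake createPage function and the fields.slug values are all made up for illustration; in a real build Gatsby supplies createPage and your query supplies the edges.

```javascript
const path = require('path')

// Hypothetical helper: turns query edges into createPage() calls.
// Assumes every node has an `id` and a `fields.slug`; your data may differ.
function createPagesFromEdges(edges, createPage) {
  edges.forEach(({ node }) => {
    createPage({
      path: node.fields.slug,
      component: path.resolve('src/templates/page.js'),
      context: { id: node.id },
    })
  })
}

// Stand-ins for what Gatsby would pass in at build time.
const created = []
const fakeCreatePage = page => created.push(page)
const fakeEdges = [
  { node: { id: '123', fields: { slug: '/about/' } } },
  { node: { id: '456', fields: { slug: '/blog/hello/' } } },
]

createPagesFromEdges(fakeEdges, fakeCreatePage)

// One page per edge, each carrying its id in the context.
console.log(created.map(page => `${page.path} -> ${page.context.id}`))
```

Every markdown file ends up as one createPage() call, and the id in the context is what the template later uses to fetch its own data.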

Data sourcing in Gatsby

Now that we have a high-level understanding of how pages are created, let's inspect how the page data is created.

We know the source of the page data: index.md. We also know the output: it is part of a GraphQL response. The big question is: how is this transformation done?

The answer can be found in Gatsby's plugins. The plugins are defined in gatsby-config.js:

module.exports = {
  siteMetadata: {
    // ... metadata
  },
  plugins: [
    // several plugins omitted for brevity
    {
      resolve: 'gatsby-source-filesystem',
      options: {
        path: `${__dirname}/src/pages`,
        name: 'pages',
      },
    },
    {
      resolve: 'gatsby-transformer-remark',
    },
    {
      resolve: 'gatsby-plugin-netlify-cms',
      options: {
        modulePath: `${__dirname}/src/cms/cms.js`,
      },
    },
    'gatsby-plugin-netlify', // make sure to keep it last in the array
  ],
}

The first plugin that does its work in Gatsby is very often gatsby-source-filesystem. It recursively reads all files in a given directory, in this case src/pages/ (though there is an ignore option). The source-filesystem plugin then creates an intermediate entity for each file, called a File, which can be picked up by other plugins for further processing.

The File contains metadata on the extension and the path to find this file.

After the File is created, subsequent plugins can process it further. In our example, gatsby-transformer-remark filters all Files with the extensions md and markdown.

Now that the transformer-remark has knowledge about the file's location, it can read the file, parse it and make the processed result available under a new GraphQL type: MarkdownRemark.
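
The hand-off between the two plugins can be modelled roughly like this. It is a simplified sketch in plain Node, not the plugins' actual code: toFileNode and isMarkdownFile are hypothetical helpers, the file paths are invented, and a real File node carries many more fields.

```javascript
const path = require('path')

// Simplified stand-in for the File node that gatsby-source-filesystem creates.
function toFileNode(absolutePath) {
  const parsed = path.parse(absolutePath)
  return {
    absolutePath,
    name: parsed.name,
    extension: parsed.ext.slice(1), // "md" rather than ".md"
  }
}

// Rough model of the filter step described above.
const MARKDOWN_EXTENSIONS = ['md', 'markdown']
const isMarkdownFile = fileNode => MARKDOWN_EXTENSIONS.includes(fileNode.extension)

// Invented example paths.
const fileNodes = [
  '/site/src/pages/index.md',
  '/site/src/pages/about/index.markdown',
  '/site/src/pages/404.js',
].map(toFileNode)

const markdownFiles = fileNodes.filter(isMarkdownFile)
console.log(markdownFiles.map(file => file.absolutePath))
```

Only the two markdown files survive the filter; the 404.js file is left for other parts of Gatsby to handle.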

We now know what happens under the hood on the way from a file to data that is queryable via GraphQL.

A short intro to GraphQL in Gatsby

Learning GraphQL is beyond the scope of this article and worthy of a full course. So I'll keep it short and stick to the bare minimum to get going.

Revisiting `gatsby-node.js`

In the previous section we learned that there is a new MarkdownRemark type added to GraphQL. We can see this when we revisit the query we skipped earlier in gatsby-node.js:

exports.createPages = ({ actions, graphql }) => {
  return graphql(`
    {
      allMarkdownRemark(limit: 1000) {
        edges {
          node {
            id
            frontmatter {
              title
              body
            }
          }
        }
      }
    }
  `).then(result => {
    // create pages
  })
}

We can now understand that allMarkdownRemark will return an array of markdown page data. It will do so in this data structure:

const data = {
  allMarkdownRemark: {
    edges: [{
      node: {
        id: '123',
        frontmatter: {
          title: 'My awesome blog about Gatsby + Netlify CMS',
          body: 'Lorem ipsum'
        }
      }
    }]
  }
};
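
Reading values out of this shape is a matter of walking edges and then node. A small sketch, reusing the example data from above (the second entry is made up to show that edges is a list):

```javascript
// Example response shaped like the structure above; the values are illustrative.
const data = {
  allMarkdownRemark: {
    edges: [
      { node: { id: '123', frontmatter: { title: 'My awesome blog about Gatsby + Netlify CMS', body: 'Lorem ipsum' } } },
      { node: { id: '456', frontmatter: { title: 'A second post', body: 'Dolor sit amet' } } },
    ],
  },
}

// Each edge wraps exactly one node, so unwrap it while mapping.
const titles = data.allMarkdownRemark.edges.map(({ node }) => node.frontmatter.title)
const ids = data.allMarkdownRemark.edges.map(({ node }) => node.id)

console.log(titles)
console.log(ids)
```

In gatsby-node.js the same object arrives as result.data inside the then() callback.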

Querying at the page level

If you want to query a single page, you query by the id field.
The id field is added to the page context (scroll up to createPage) and used in the query defined in the template.

// src/templates/page.js
import React from 'react'
import { graphql } from 'gatsby'

export const query = graphql`
  query PageByID($id: String!) {
    markdownRemark(id: { eq: $id }) {
      html
      frontmatter {
        date(formatString: "MMMM DD, YYYY")
        title
      }
    }
  }
`

const Page = (props) => {
  return (
    <div>
      { props.data.markdownRemark.frontmatter.title }
      <br />{ props.data.markdownRemark.frontmatter.date }
      <div dangerouslySetInnerHTML={{ __html: props.data.markdownRemark.html }} />
    </div>
  )
};

export default Page

A couple of things stand out in this snippet:

There is an exported const called query. Gatsby picks up this constant, executes the query and passes the result to the component as props.data.
We see the $id parameter being injected into the query. As stated before, query variables are looked up in the page context. The page context is set during page creation in gatsby-node.js.
Rather than using allMarkdownRemark, which returns an array, we use markdownRemark as the query type here. This makes sure a single object is returned.
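
The difference between the two query types, and the role of the page context, can be imitated with a few lines of plain JavaScript. This is only a mental model under my own assumptions: the nodes array and both functions are hypothetical stand-ins, not how Gatsby resolves queries internally.

```javascript
// Hypothetical in-memory stand-in for the nodes created from markdown files.
const nodes = [
  { id: '123', frontmatter: { title: 'First post' } },
  { id: '456', frontmatter: { title: 'Second post' } },
]

// Rough analogue of allMarkdownRemark: always a list, wrapped in edges.
const allMarkdownRemark = () => ({ edges: nodes.map(node => ({ node })) })

// Rough analogue of markdownRemark(id: { eq: $id }): one object, or null.
const markdownRemark = ({ id }) => nodes.find(node => node.id === id) || null

// The context passed to createPage() supplies the $id variable.
const pageContext = { id: '456' }
const single = markdownRemark(pageContext)

console.log(allMarkdownRemark().edges.length)
console.log(single.frontmatter.title)
```

The list query gives you every node to loop over, while the single query hands the template exactly the one node whose id was stored in its context.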

That's it, you know all you need to know to go play around with pages and queries! Try to create some new content, content types or fiddle around with queries such as sorting by date.

🧪 Tip: if you are going to play around with GraphQL, be sure to turn on the GraphQL Playground by changing the develop script in your package.json to "develop": "GATSBY_GRAPHQL_IDE=playground gatsby develop". This allows you to use the more intuitive Prisma GraphQL Playground.

This blog is part of a series on how I migrated away from a self-hosted Drupal blog to a modern JAM stack with Gatsby and Netlify CMS.