GitLab CI: speeding up your pipeline with caching

jnmoal

Jean-Nicolas Moal

Posted on April 12, 2023

Introduction

GitLab CI comes with a cache system that is very useful when you want to speed up your pipeline.

In this article, we’ll see how to use it to avoid downloading the same dependencies over and over.

📌 NOTE
Don’t use the cache to store build results; artifacts are made for that.
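
To illustrate the difference, here is a minimal sketch of a job that publishes its build result as an artifact instead of caching it. The job name, the `dist/` folder and the one-week expiry are assumptions made for this example, not part of the project described below.

```yaml
# Hypothetical job: the build output is passed on as an artifact, not a cache
build-package:
  stage: build
  script:
    - mkdir -p dist
    - echo "I am a package" > dist/app.txt   # stands in for a real build command
  artifacts:
    paths:
      - dist/            # later stages in the same pipeline receive this folder
    expire_in: 1 week    # assumed retention period
```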

What you should know about GitLab CI cache

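By default, the cache is stored on the machine where the GitLab Runner is installed.
It can be uploaded to shared storage if the distributed cache is enabled.
If you’re running your runners in Kubernetes, the distributed cache is a must-have, because job pods are short-lived and don’t keep a local cache from one job to the next.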

You can clear the cache when necessary.

You can use a fallback key when your cache doesn’t exist yet. This is useful to share a cache between branches while still keeping some isolation.
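
As a rough sketch, one way to declare a pipeline-wide fallback is the `CACHE_FALLBACK_KEY` variable. The job name, the `-deps` key suffix and the `.deps/` folder below are assumptions chosen for illustration.

```yaml
variables:
  # Assumption: the default branch stores its cache under "<branch>-deps"
  CACHE_FALLBACK_KEY: ${CI_DEFAULT_BRANCH}-deps

# Hypothetical job
test-job:
  script:
    - echo "using cached dependencies"
  cache:
    key: ${CI_COMMIT_REF_SLUG}-deps   # tried first; the fallback is used on a miss
    paths:
      - .deps/
```

With this setup, a brand-new branch would start from the default branch's cache and then build up its own.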

You can have up to 4 caches per job.
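
Several caches are declared as a list under `cache`. The sketch below uses a hypothetical job with two caches; the key suffixes and folder names are made up for the example.

```yaml
# Hypothetical job using two independent caches
lint-and-test:
  script:
    - echo "linting and testing"
  cache:
    - key: ${CI_COMMIT_REF_SLUG}-deps-dev
      paths:
        - .deps/          # assumed dependency folder
    - key: ${CI_COMMIT_REF_SLUG}-lint
      paths:
        - .lint-cache/    # assumed linter cache folder
```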

You can tell GitLab whether you want to pull-push (download and update) the cache, or if you just want to pull it.

❗ IMPORTANT
For security reasons, GitLab keeps separate caches for protected and unprotected branches. This behavior is configurable, but it is enabled by default on your project. You can read more about this in the GitLab documentation.
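
Besides the project-level setting, a single cache can opt out of this separation with the `unprotect` keyword. This is only a sketch: the job, key and folder are assumptions, and sharing a cache this way gives up the isolation the default is meant to provide. The separation may also matter when an unprotected feature branch is expected to fall back to the cache of a protected main branch.

```yaml
# Hypothetical job sharing one cache between protected and unprotected branches
test-job:
  script:
    - echo "using a shared cache"
  cache:
    key: ${CI_COMMIT_REF_SLUG}-deps
    paths:
      - .deps/
    unprotect: true   # assumption: you accept the reduced isolation
```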

Let's play with the caching system

Say you have a Python project for which you need to run linters and test suites, and finally package your application.
You also have the following constraints:

  • Your final package shall only contain what’s necessary to run
  • Only download dependencies when they’ve changed
  • The cache from the main branch is the reference

In this example, we’ll set up the cache so that:

  • Each branch will have its own cache
  • When building on a new branch, the build will fall back to the main branch cache
  • The dependencies will only be downloaded and updated when necessary

For clarity, we’ll only use fake jobs in this example.

Defining rules to manage cache updates

.deps_update_rules:
  rules:
    - if: $CI_COMMIT_BRANCH != "main" ①
      changes:
        paths:
          - my-referent-file.lock ②
        compare_to: refs/heads/main ③
    - if: $CI_COMMIT_BRANCH == "main" ④
      when: always
    - if: $CI_COMMIT_TAG != null ⑤
      when: always
  1. When not on the main branch, only update the dependencies if necessary
  2. Put all the relevant files here
  3. Compares changes with the main branch
  4. On the main branch, always update dependencies
  5. When tagging the repository, update the dependencies

This hidden job defines rules that will be reused by our jobs.
If you aren’t familiar with rules, check out the GitLab documentation on the rules keyword.
Feel free to adapt those rules to your workflow.
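
For instance, a project whose dependencies are described by more than one file could list each of them under `paths`. The sketch below assumes a Poetry project; the hidden job name, the file names and the job itself are illustrative only.

```yaml
# Hypothetical variant of the rules for a Poetry-managed project
.deps_update_rules_poetry:
  rules:
    - if: $CI_COMMIT_BRANCH != "main"
      changes:
        paths:
          - pyproject.toml   # assumed dependency declaration
          - poetry.lock      # assumed lock file
        compare_to: refs/heads/main
    - if: $CI_COMMIT_BRANCH == "main"

# Hypothetical job reusing the rules above
install-deps:
  script:
    - echo "installing dependencies"
  rules: !reference [.deps_update_rules_poetry, rules]
```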

Creating the cache

Before creating the jobs to update our caches, let’s first define some reusable stuff.
We’ll leverage some predefined variables to make this work properly.

variables:
  DEFAULT_CACHE_KEY_PREFIX: ${CI_DEFAULT_BRANCH} ①
  CACHE_KEY_PREFIX: ${CI_COMMIT_REF_SLUG} ②
  DEPS_FOLDER: ${CI_PROJECT_DIR}/.deps ③

.deps_dev_cache: ④
  cache: &deps_dev_cache
    key: ${CACHE_KEY_PREFIX}-deps-dev ⑤
    paths:
      - ${DEPS_FOLDER}
    policy: pull ⑥

.deps_run_cache: ⑦
  cache: &deps_run_cache
    key: ${CACHE_KEY_PREFIX}-deps-run
    paths:
      - ${DEPS_FOLDER}
    policy: pull
  1. Used to define the fallback cache key in the job
  2. Used to define the cache key for the job
  3. The location of the downloaded dependencies (the folder we want to keep)
  4. The cache configuration for jobs that need the dev dependencies
  5. The key used to find the cache, with a suffix to identify the dev dependencies
  6. The default policy we want to use; most jobs only need to download the cache, not update it
  7. The cache configuration for jobs that need the run dependencies
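
The listing declares `DEFAULT_CACHE_KEY_PREFIX` but does not show where it is consumed. One possible way to wire it in, on GitLab versions that support `cache:fallback_keys`, is sketched below for the dev cache. The variable values are repeated so the fragment stands on its own, and it assumes the default branch name is identical to its slug (true for `main`).

```yaml
variables:
  DEFAULT_CACHE_KEY_PREFIX: ${CI_DEFAULT_BRANCH}
  CACHE_KEY_PREFIX: ${CI_COMMIT_REF_SLUG}
  DEPS_FOLDER: ${CI_PROJECT_DIR}/.deps

.deps_dev_cache:
  cache: &deps_dev_cache
    key: ${CACHE_KEY_PREFIX}-deps-dev
    # Assumption: tried when the branch has no cache of its own yet
    fallback_keys:
      - ${DEFAULT_CACHE_KEY_PREFIX}-deps-dev
    paths:
      - ${DEPS_FOLDER}
    policy: pull
```

The run cache would follow the same pattern with the `-deps-run` suffix.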

📌 NOTE
The environment variables with the CI_ prefix are predefined variables.

Now that we have our predefined cache configurations, let’s use them in the jobs that need to update the dependencies.

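dev-dependencies: ①
  stage: prepare
  script:
    # Fake install: write into the cached folder so that later jobs can read it
    - mkdir -p "${DEPS_FOLDER}"
    - echo "I am a dependency" > "${DEPS_FOLDER}/deps.txt"
    - echo "I am a dev dependency" >> "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_dev_cache
    policy: pull-push  # this job is allowed to update the cache
  rules: !reference [.deps_update_rules, rules]

run-dependencies: ②
  stage: prepare
  script:
    # Fake install of the runtime dependencies only
    - mkdir -p "${DEPS_FOLDER}"
    - echo "I am a dependency" > "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_run_cache
    policy: pull-push  # this job is allowed to update the cache
  rules: !reference [.deps_update_rules, rules]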
  1. This job can download all dependency groups, as its cache will be used for linting, testing, and so on.
  2. This job shall download only the run dependencies (for example, poetry install --only main).
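
To give an idea of what the fake scripts stand for, here is a rough sketch of the run-dependencies job for a Poetry project. Everything specific in it is an assumption: the `python:3.11` image, installing Poetry with pip, and pointing Poetry's virtualenv location at the cached folder through `POETRY_VIRTUALENVS_PATH`. The cache is written inline so that the fragment stands on its own.

```yaml
# Sketch only: assumes a Poetry-managed project
run-dependencies:
  stage: prepare
  image: python:3.11                         # assumed image
  variables:
    # Assumption: keep Poetry's virtualenvs inside the cached folder
    POETRY_VIRTUALENVS_PATH: ${CI_PROJECT_DIR}/.deps
  script:
    - pip install poetry
    - poetry install --only main --no-root   # runtime dependencies only
  cache:
    key: ${CI_COMMIT_REF_SLUG}-deps-run
    paths:
      - .deps/
    policy: pull-push
```

The dev variant would drop `--only main` so that the other dependency groups are installed as well.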

Using the cache

Now, we can use those caches to test or package our application.

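test-app:
  stage: test
  script:
    - cat "${DEPS_FOLDER}/deps.txt"  # read from the restored dev cache
  cache:
    <<: *deps_dev_cache

package-app:
  stage: build
  script:
    - cat "${DEPS_FOLDER}/deps.txt"  # read from the restored run cache
  cache:
    <<: *deps_run_cache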
📌 NOTE
Using this method, the pipeline must run at least once on the main branch before the other branches.
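
The jobs above refer to the prepare, test and build stages. Since prepare is not one of GitLab's default stages, the pipeline presumably also needs a stages declaration along these lines. The order shown is an assumption based on the flow described in this article (dependencies first, then tests, then packaging).

```yaml
# Assumed stage order for the jobs in this article
stages:
  - prepare   # dev-dependencies, run-dependencies
  - test      # test-app
  - build     # package-app
```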
