Jean-Nicolas Moal
Posted on April 12, 2023
Introduction
GitLab CI comes with a cache system that is very useful for speeding up your pipelines.
In this article, we’ll see how to put it to work on a concrete example: caching the dependencies of a project.
📌 NOTE
Don’t use the cache to store build results; artifacts are made for this.
What you should know about GitLab CI cache
- By default, the cache is stored on the machine where GitLab Runner is installed.
- It can also be uploaded to shared object storage if the distributed cache is enabled.
- If you’re running your runners in Kubernetes, the distributed cache is a must-have.
- You can clear the cache when necessary.
- You can use a fallback key when your cache doesn’t exist yet. This is useful to share a cache between branches while still keeping some isolation.
- You can have up to 4 caches per job.
- You can tell GitLab whether you want to pull-push (download and update) the cache, or if you just want to pull it.
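To make the last two points concrete, here is a minimal sketch of a job that declares two caches with different policies. The job name `lint` and both cached folders are hypothetical and only serve the illustration.

```yaml
# Hypothetical job, for illustration only.
lint:
  script:
    - echo "linting"
  cache:
    # A job accepts a list of caches (four at most).
    - key: ${CI_COMMIT_REF_SLUG}-deps
      paths:
        - .deps/
      policy: pull        # download only, never upload
    - key: ${CI_COMMIT_REF_SLUG}-lint
      paths:
        - .lint-cache/
      policy: pull-push   # download at job start, upload at job end (the default)
```

Jobs that only read dependencies can stay on `pull`, which saves the upload step at the end of the job.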
❗ IMPORTANT
For security reasons, GitLab keeps separate caches for protected and unprotected branches. This behavior is configurable, but it is enabled by default on your project. You can read more about it in the GitLab documentation.
Let's play with the caching system
Say that you have a Python project for which you need to run linters, test suites... And finally, you want to package your application.
And you have the following constraints:
- Your final package shall only contain what’s necessary to run
- Only download dependencies when they’ve changed
- The cache from the main branch is the reference
In this example, we’ll set up the cache so that:
- Each branch will have its own cache
- When building on a new branch, the build will fall back to the main branch cache
- The dependencies will only be downloaded and updated when necessary
For clarity, we’ll only use fake jobs in this example.
Defining rules to manage cache updates
.deps_update_rules:
  rules:
    - if: $CI_COMMIT_BRANCH != "main"  # ①
      changes:
        paths:
          - my-referent-file.lock  # ②
        compare_to: refs/heads/main  # ③
    - if: $CI_COMMIT_BRANCH == "main"  # ④
      when: always
    - if: $CI_COMMIT_TAG != null  # ⑤
      when: always
① When not on the main branch, we only update the dependencies if necessary.
② Put all the files that should trigger an update here.
③ Changes are compared with the main branch.
④ On the main branch, always update the dependencies.
⑤ When tagging the repository, update the dependencies.
This hidden job defines rules that will be reused by our jobs.
If you aren’t familiar with rules, check out the GitLab documentation on the rules keyword.
Feel free to adapt those rules to your workflow.
Creating the cache
Before creating the jobs to update our caches, let’s first define some reusable stuff.
We’ll leverage some predefined variables to make this work properly.
variables:
  DEFAULT_CACHE_KEY_PREFIX: ${CI_DEFAULT_BRANCH}  # ①
  CACHE_KEY_PREFIX: ${CI_COMMIT_REF_SLUG}  # ②
  DEPS_FOLDER: ${CI_PROJECT_DIR}/.deps  # ③

.deps_dev_cache:  # ④
  cache: &deps_dev_cache
    key: ${CACHE_KEY_PREFIX}-deps-dev  # ⑤
    paths:
      - ${DEPS_FOLDER}
    policy: pull  # ⑥

.deps_run_cache:  # ⑦
  cache: &deps_run_cache
    key: ${CACHE_KEY_PREFIX}-deps-run
    paths:
      - ${DEPS_FOLDER}
    policy: pull
① Used to define the fallback cache key in the job.
② Used to define the cache key to be used by the job.
③ The location of the downloaded dependencies (the folder we want to keep).
④ The cache configuration for jobs that need the dev dependencies.
⑤ The key used to find the cache, with a suffix to identify the dev dependencies.
⑥ The default policy we want to use: most jobs only need to download the cache, not to update it.
⑦ The cache configuration for jobs that need the run dependencies.
📌 NOTE
The environment variables with the CI_ prefix are predefined variables.
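The configuration above defines DEFAULT_CACHE_KEY_PREFIX but doesn’t show where the fallback key is plugged in. Here is a minimal sketch of one way to wire it, assuming GitLab 16.0 or later, where the `cache:fallback_keys` keyword is available. The hidden job name `.deps_dev_cache_with_fallback` is hypothetical and not part of the original configuration.

```yaml
variables:
  DEFAULT_CACHE_KEY_PREFIX: ${CI_DEFAULT_BRANCH}
  CACHE_KEY_PREFIX: ${CI_COMMIT_REF_SLUG}
  DEPS_FOLDER: ${CI_PROJECT_DIR}/.deps

# Hypothetical variant of the dev cache with an explicit fallback.
.deps_dev_cache_with_fallback:
  cache:
    key: ${CACHE_KEY_PREFIX}-deps-dev
    # Tried in order when no cache exists for the key above,
    # for example on the first pipeline of a new branch.
    fallback_keys:
      - ${DEFAULT_CACHE_KEY_PREFIX}-deps-dev
    paths:
      - ${DEPS_FOLDER}
    policy: pull
```

On older GitLab versions, a single fallback can be set through the CACHE_FALLBACK_KEY CI/CD variable instead.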
Now that we have our reusable cache configurations, let’s use them in the jobs that update the dependencies.
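dev-dependencies:  # ①
  stage: prepare
  script:
    # Write inside DEPS_FOLDER: only files under the cached path are saved.
    - mkdir -p "${DEPS_FOLDER}"
    - echo "I am a dependency" > "${DEPS_FOLDER}/deps.txt"
    - echo "I am a dev dependency" >> "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_dev_cache
    policy: pull-push  # override the default: this job updates the cache
  rules: !reference [.deps_update_rules, rules]

run-dependencies:  # ②
  stage: prepare
  script:
    - mkdir -p "${DEPS_FOLDER}"
    - echo "I am a dependency" > "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_run_cache
    policy: pull-push
  rules: !reference [.deps_update_rules, rules]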
① This job can download all the dependency groups, as its cache will be used for linting, testing...
② This job shall download only the run dependencies (for example poetry install --only main).
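For a real project, the fake echo commands would be replaced by an actual install. Here is a rough sketch for Poetry, assuming a `python:3.11` image, Poetry installed with pip, and the virtual environment redirected into the cached folder through the `POETRY_VIRTUALENVS_PATH` environment variable. The job name and the image tag are assumptions made for the illustration, not part of the original pipeline.

```yaml
# Sketch only: job name and image tag are assumptions.
run-dependencies-poetry:
  stage: prepare
  image: python:3.11
  variables:
    # Poetry reads its settings from POETRY_* environment variables;
    # this one makes it create the virtualenv inside the cached folder.
    POETRY_VIRTUALENVS_PATH: ${CI_PROJECT_DIR}/.deps
  script:
    - pip install poetry
    # Install the runtime dependencies only, without the dev groups.
    - poetry install --only main
  cache:
    key: ${CI_COMMIT_REF_SLUG}-deps-run
    paths:
      - .deps/
    policy: pull-push
  # Attach the .deps_update_rules rules here, as the fake jobs do.
```

The jobs that consume this cache would need the same `POETRY_VIRTUALENVS_PATH` value so that Poetry finds the restored environment.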
Using the cache
Now, we can use those caches to test or package our application.
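test-app:
  stage: test
  script:
    # Read the file restored from the dev cache.
    - cat "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_dev_cache

package-app:
  stage: build
  script:
    # Read the file restored from the run cache.
    - cat "${DEPS_FOLDER}/deps.txt"
  cache:
    <<: *deps_run_cache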
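The jobs in this article use a prepare stage, which is not one of GitLab’s default stages (.pre, build, test, deploy, .post), so the pipeline needs an explicit stages declaration. A sketch, assuming the order implied by the jobs above:

```yaml
stages:
  - prepare  # the dependency jobs fill the caches first
  - test     # test-app pulls the dev cache
  - build    # package-app pulls the run cache
```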
📌 NOTE
Using this method, the pipeline must run at least once on the main branch before the other branches.