How to cache dependencies in GitLab

rhamdeew

Rail

Posted on April 8, 2021

Hi everybody!
Today I want to share my experience with dependency caching in GitLab CI.

Why it is needed

I have a small pet project where I usually experiment with new technologies and approaches. The repository for this project is hosted on GitLab, where I configured CI/CD jobs for testing and deploying the project.

The CI job that runs the tests usually completed in 2 minutes. But every time I wondered what was actually being done during that time. One example is installing the Python dependencies.

On the one hand, installing everything from scratch gives clean, reproducible builds (let's say hello to left-pad and mimemagic 😄).

But on the other hand, these actions are performed every time I push changes to the repository. And that's just a pet project.

Let's try to enable caching 🤟

Here is the official GitLab documentation on CI caching, with examples: https://docs.gitlab.com/ee/ci/caching

The project on which I tested CI caching is written in Django and uses Poetry for dependency and virtual environment management.

What .gitlab-ci.yml looked like before the changes



stages:
  - tests
  - deploy

tests:
  stage: tests
  image: python:3.7-slim
  script:
    - apt-get update -qy && apt-get install -y build-essential
    - pip --no-cache-dir install poetry
    - poetry config virtualenvs.create false && poetry install --no-root
    - sed 's/#DATABASE_URL/DATABASE_URL/g' telega/.env.example > telega/.env
    - coverage run --source='.' manage.py test && coverage report -m



Here we install the Debian packages, then install Poetry through pip, and finally install the project dependencies with Poetry.

What .gitlab-ci.yml looks like after the changes



stages:
  - tests
  - deploy

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  key:
    files:
      - poetry.lock
      - .gitlab-ci.yml
    prefix: ${CI_JOB_NAME}
  paths:
    - .venv
    - .cache/pip

tests:
  stage: tests
  image: python:3.7-slim
  script:
    - apt-get update -qy && apt-get install -y build-essential
    - pip install poetry
    - poetry config virtualenvs.in-project true
    - poetry install --no-root
    - sed 's/#DATABASE_URL/DATABASE_URL/g' telega/.env.example > telega/.env
    - poetry run coverage run manage.py test && poetry run coverage report -m



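I added some settings to tell pip and Poetry where packages should be stored: PIP_CACHE_DIR moves pip's cache into the project directory, and virtualenvs.in-project makes Poetry create the virtual environment in .venv inside the project. This matters because GitLab can only cache paths that are inside the project directory. Then I added a 'cache' section and set the poetry.lock and .gitlab-ci.yml files as the key for the cache.

This means that if at least one of these files changes, the cache key changes too and the packages are installed from PyPI again. Otherwise, the cached directories with the already installed packages are reused.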
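To make the idea of a file-based cache key more concrete, here is a minimal Python sketch. It is only an illustration and not GitLab's actual implementation: the `cache_key` helper and the file contents are made up for this example. It shows that a key derived from the contents of poetry.lock and .gitlab-ci.yml stays the same while the files are unchanged and becomes different as soon as one of them changes.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def cache_key(prefix: str, files: Iterable[Path]) -> str:
    """Hypothetical helper: build a key that depends on the files' contents."""
    combined = hashlib.sha256()
    for path in files:
        # Hash each file separately so file boundaries cannot be confused.
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return f"{prefix}-{combined.hexdigest()}"


with tempfile.TemporaryDirectory() as tmp:
    # Made-up stand-ins for the real files in the repository.
    lock_file = Path(tmp) / "poetry.lock"
    ci_file = Path(tmp) / ".gitlab-ci.yml"
    lock_file.write_text("django 3.1\n")
    ci_file.write_text("stages: [tests]\n")

    key_before = cache_key("tests", [lock_file, ci_file])
    key_same = cache_key("tests", [lock_file, ci_file])

    # Simulate a dependency update that rewrites the lock file.
    lock_file.write_text("django 3.2\n")
    key_after = cache_key("tests", [lock_file, ci_file])

print(key_before == key_same)   # True: nothing changed, the cache is reused
print(key_before == key_after)  # False: lock file changed, a new cache is built
```

The prefix plays the same role as `prefix: ${CI_JOB_NAME}` in the config above: it keeps the caches of different jobs apart even when they depend on the same files.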

Results

The CI job's running time decreased from 2 minutes to 1 minute. Of course, checking and unpacking the cache adds an extra step, but it's still faster than installing the dependencies from PyPI.

In the screenshot with the job logs, we can see how pip uses the cache.

GitLab CI job logs

And here we can see that Poetry did not install anything new.

GitLab CI job logs

Dependency caching in GitLab CI/CD is a powerful tool for speeding up jobs and saving resources.
