GitLab CI/CD Pipelines: Best Practices for Monorepos

Chintan Soni

Posted on July 1, 2024

Hello everyone! This article is for those who want to optimize their CI/CD pipelines using best practices in a monorepo setup.

To provide a clear walkthrough, let’s consider the following example:

Project structure:

[Image: project structure]

Initial .gitlab-ci.yml:

stages:
  - build
  - test
  - deploy

build-a:
  stage: build
  script:
    - ...

test-a:
  stage: test
  script:
    - ...

deploy-a:
  stage: deploy
  script:
    - ...

build-b:
  stage: build
  script:
    - ...

test-b:
  stage: test
  script:
    - ...

deploy-b:
  stage: deploy
  script:
    - ...

build-c:
  stage: build
  script:
    - ...

test-c:
  stage: test
  script:
    - ...

deploy-c:
  stage: deploy
  script:
    - ...

The above configuration can quickly become unmanageable as the number of projects in the monorepo increases.

Why is this a problem?

  • Unnecessary Job Triggers: A single commit will trigger all jobs, regardless of the scope of the change. For instance, a commit that only touches project-a will also trigger the jobs for project-b and project-c, which is inefficient.

Screenshot of original pipeline

  • Reduced Readability: The CI/CD configuration becomes less readable and harder to maintain, especially with environment-specific jobs for dev, QA, UAT, and prod.
  • Increased Complexity: The setup becomes fragile, making it easy for anyone to inadvertently disrupt the pipeline. It requires more expertise to understand the scope, impact of changes, and dependencies of jobs.

How to solve this?

We will perform a series of steps to optimize the above pipeline. Let’s start.

Parent-Child Pipelines Architecture

With this approach, each project gets its own child pipeline, defined in a separate CI/CD file inside that project’s directory. Move the project’s jobs from the root file into that project’s .gitlab-ci.yml. Below is the example for project-a; the same can be replicated for project-b and project-c:

project-a/.gitlab-ci.yml:

stages:
  - build
  - test
  - deploy

build-a:
  stage: build
  script:
    - ...

test-a:
  stage: test
  script:
    - ...

deploy-a:
  stage: deploy
  script:
    - ...

Then, link the child pipeline to the parent as below:

Root .gitlab-ci.yml:

stages:
  - triggers

trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml

trigger-project-b:
  stage: triggers
  trigger:
    include: project-b/.gitlab-ci.yml

trigger-project-c:
  stage: triggers
  trigger:
    include: project-c/.gitlab-ci.yml

With this simple refactor, the pipeline structure becomes more manageable:

Screenshot after implementing Parent-child architecture
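
By default, a trigger job is marked as successful as soon as the child pipeline has been created, so a failing child pipeline does not fail the parent. A minimal sketch, assuming you want the parent to wait for the child and reflect its result, adds strategy: depend to the trigger. Whether this suits your setup depends on your GitLab version and workflow:

```yaml
trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml
    # Wait for the child pipeline and mirror its status on this trigger job
    strategy: depend
```

The same key can be added to trigger-project-b and trigger-project-c.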

Use rules: changes

To scope job execution to project-level changes, we can modify the pipeline to trigger jobs only when changes are made to specific projects.

Root .gitlab-ci.yml:

stages:
  - triggers

trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml
  rules:
    - changes:
      - project-a/**/*

trigger-project-b:
  stage: triggers
  trigger:
    include: project-b/.gitlab-ci.yml
  rules:
    - changes:
      - project-b/**/*

trigger-project-c:
  stage: triggers
  trigger:
    include: project-c/.gitlab-ci.yml
  rules:
    - changes:
      - project-c/**/*

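If you see duplicate pipelines (a single push running both a branch pipeline and a merge request pipeline), add a rule that skips merge request pipelines. Place it before the changes rule, because rules are evaluated in order and the first match wins:

trigger-project-a:
  rules:
    # Skip merge request pipelines so a push does not run the pipeline twice
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - changes:
      - project-a/**/*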

Result:

Screenshot after implementing rules:changes
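
One caveat with rules: changes in branch pipelines: when a new branch is pushed, GitLab has no earlier commit on that branch to diff against, so the rule can evaluate to true for every project. A possible workaround, sketched below, is rules:changes:compare_to, which compares against a fixed ref instead. The branch name main is an assumption; substitute your own default branch:

```yaml
trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml
  rules:
    - changes:
        paths:
          - project-a/**/*
        # Assumed default branch: diff against it rather than the previous commit
        compare_to: 'refs/heads/main'
```

Note that compare_to requires the file patterns to be listed under paths.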

Use YAML Anchors

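YAML anchors and GitLab’s extends keyword let you reuse common configuration blocks, which reduces duplication, especially when targeting multiple environments like dev, QA, staging, and prod. The example below uses extends with a hidden job (.base-build); jobs whose names start with a dot never run on their own.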

project-a/.gitlab-ci.yml:

# Hidden job (leading dot): never runs on its own, only supplies shared settings
.base-build:
  stage: build
  image: node:22-alpine
  variables: ...
  before_script:
    - cd project-a

build-a-dev:
  extends: .base-build
  script:
    - export ENV="dev"
    # build steps for dev

build-a-qa:
  extends: .base-build
  script:
    - export ENV="qa"
    # build steps for qa

build-a-staging:
  extends: .base-build
  script:
    - export ENV="staging"
    # build steps for staging

build-a-prod:
  extends: .base-build
  script:
    - export ENV="prod"
    # build steps for prod

If you want to reuse only specific sections of another job, you can use !reference as below:

build-a-dev:
  before_script: !reference [.base-build, before_script]
  script:
    - export ENV="dev"
    # build steps for dev
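
For comparison, here is a minimal sketch of the same reuse written with a native YAML anchor instead of extends. The anchor name base-build is chosen here for illustration. Keep in mind that anchors are resolved by the YAML parser, so they only work within a single file, whereas extends also works across included files:

```yaml
# &base-build puts an anchor on the hidden job's mapping
.base-build: &base-build
  stage: build
  image: node:22-alpine
  before_script:
    - cd project-a

build-a-dev:
  # <<: merges every key of the anchored mapping into this job
  <<: *base-build
  script:
    - export ENV="dev"
```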

Using needs for Proper Job Chaining

We can create dependencies between jobs using needs. A job with needs starts as soon as the jobs it lists have finished, instead of waiting for the entire previous stage to complete.

build-a:
  stage: build
  script:
    - ...

test-a:
  stage: test
  needs: [build-a]
  script:
    - ...

deploy-a:
  stage: deploy
  needs: [test-a]
  script:
    - ...

Result:
Screenshot after implementing needs
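
needs also controls which artifacts a job downloads. The sketch below assumes build-a writes its output to a hypothetical project-a/dist/ directory, and the echo commands stand in for the real build and test steps. Setting artifacts: true is already the default and is spelled out here only for clarity:

```yaml
build-a:
  stage: build
  script:
    - echo "build project-a"
  artifacts:
    paths:
      # Hypothetical output directory, adjust to your build
      - project-a/dist/

test-a:
  stage: test
  needs:
    # Start after build-a finishes and download its artifacts
    - job: build-a
      artifacts: true
  script:
    - echo "test project-a"
```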

Parallel Job Execution

A job with needs: [] does not wait for earlier stages and starts as soon as the pipeline is created. For example, if there’s a check stage before the build stage, with a check-a job performing static code analysis, lint checks, etc., you can let check-a and build-a run in parallel as below:

stages:
  - check
  - build
  - ...

check-a:
  stage: check
  needs: []
  script:
    - ...

build-a:
  stage: build
  needs: []
  script:
    - ...

test-a:
  stage: test
  needs: [build-a]
  script:
    - ...

deploy-a:
  stage: deploy
  needs: [test-a]
  script:
    - ...

Result:

Screenshot for parallel execution
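
Another way to run jobs in parallel is parallel: matrix, which creates one copy of a job per variable value. The sketch below is one possible way to collapse per-environment build jobs into a single definition. The environment names come from the examples above, and the echo command is a placeholder for the real build steps:

```yaml
build-a:
  stage: build
  image: node:22-alpine
  needs: []
  parallel:
    matrix:
      # One job is created per value; all of them run in parallel
      - ENV: [dev, qa, staging, prod]
  before_script:
    - cd project-a
  script:
    - echo "Building project-a for $ENV"
```

Each generated job receives its own value of ENV as a CI/CD variable.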

Source Code

You can find the source code here: https://gitlab.com/iChintanSoni/learning-ci-cd/

Conclusion

Optimizing CI/CD pipelines in a monorepo setup can significantly enhance the efficiency, readability, and maintainability of your projects. By adopting best practices such as using parent-child pipeline architecture, applying rules: changes, leveraging YAML anchors, and strategically utilizing needs for job chaining, you can create a more robust and scalable pipeline.

Together, these techniques cut unnecessary job executions and keep the pipeline manageable as your monorepo grows.

I hope this guide helps you in refining your GitLab CI/CD pipelines. If you have any questions or additional tips, feel free to share them in the comments below. Happy coding!
