Chintan Soni
Posted on July 1, 2024
Hello everyone! This article is for those who want to optimize their CI/CD pipelines using best practices in a monorepo setup.
To provide a clear walkthrough, let’s consider the following example:
Project structure:
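For illustration, assume a layout roughly like the following, with one folder per project and a single root .gitlab-ci.yml (the repository name comes from the source link at the end of this article):

learning-ci-cd/
├── .gitlab-ci.yml
├── project-a/
├── project-b/
└── project-c/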
Initial .gitlab-ci.yml:
stages:
  - build
  - test
  - deploy

build-a:
  stage: build
  script:
    - ...

test-a:
  stage: test
  script:
    - ...

deploy-a:
  stage: deploy
  script:
    - ...

build-b:
  stage: build
  script:
    - ...

test-b:
  stage: test
  script:
    - ...

deploy-b:
  stage: deploy
  script:
    - ...

build-c:
  stage: build
  script:
    - ...

test-c:
  stage: test
  script:
    - ...

deploy-c:
  stage: deploy
  script:
    - ...
The above configuration can quickly become unmanageable as the number of projects in the monorepo increases.
Why is this a problem?
- Unnecessary Job Triggers: A single commit will trigger all jobs, regardless of the scope of the change. For instance, a commit made for changes in project-a will also trigger jobs for project-b and project-c, which is inefficient.
- Reduced Readability: The CI/CD configuration becomes less readable and harder to maintain, especially with environment-specific jobs for dev, QA, UAT, and prod.
- Increased Complexity: The setup becomes fragile, making it easy for anyone to inadvertently disrupt the pipeline. It requires more expertise to understand the scope, impact of changes, and dependencies of jobs.
How to solve this?
We will perform a series of steps to optimize the above pipeline. Let’s start.
Parent-Child Pipelines Architecture
With this approach, you create a child pipeline, i.e. a separate CI/CD file, for each project and move that project's jobs into its own .gitlab-ci.yml. Below is the example for project-a; the same pattern can be replicated for project-b and project-c:
project-a/.gitlab-ci.yml:
stages:
  - build
  - test
  - deploy

build-a:
  stage: build
  script:
    - ...

test-a:
  stage: test
  script:
    - ...

deploy-a:
  stage: deploy
  script:
    - ...
Then, link the child pipelines to the parent as below:
Root .gitlab-ci.yml:
stages:
  - triggers

trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml

trigger-project-b:
  stage: triggers
  trigger:
    include: project-b/.gitlab-ci.yml

trigger-project-c:
  stage: triggers
  trigger:
    include: project-c/.gitlab-ci.yml
With this simple refactor, the pipeline structure becomes more manageable: each project's jobs now run in their own child pipeline, triggered from the root configuration.
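Optionally, you can set strategy: depend on a trigger job so that the parent pipeline waits for the child pipeline and mirrors its status; a failing child pipeline then also fails the parent. For example:

trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml
    strategy: depend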
Use rules: changes
To scope job execution to project-level changes, we can modify the pipeline to trigger jobs only when changes are made to specific projects.
Root .gitlab-ci.yml:
stages:
  - triggers

trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml
  rules:
    - changes:
        - project-a/**/*

trigger-project-b:
  stage: triggers
  trigger:
    include: project-b/.gitlab-ci.yml
  rules:
    - changes:
        - project-b/**/*

trigger-project-c:
  stage: triggers
  trigger:
    include: project-c/.gitlab-ci.yml
  rules:
    - changes:
        - project-c/**/*
If you see duplicate pipelines (the same commit triggering both a branch pipeline and a merge request pipeline), you can add the following rule so the trigger job is skipped for merge request pipelines:
trigger-project-a:
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
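Put together with the changes rule from the previous step, the trigger job would look roughly like this. Rules are evaluated in order, so the merge request rule is listed first so that it takes effect before the changes rule:

trigger-project-a:
  stage: triggers
  trigger:
    include: project-a/.gitlab-ci.yml
  rules:
    - if: '$CI_PIPELINE_SOURCE == "merge_request_event"'
      when: never
    - changes:
        - project-a/**/*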
Use YAML Anchors:
YAML anchors, and the extends keyword used with a hidden .base-build job in the example below, allow common configuration blocks to be reused, reducing redundancy, especially when targeting multiple environments like dev, QA, staging, and prod.
project-a/.gitlab-ci.yml:
.base-build:
  stage: build
  image: node:22-alpine
  variables: ...
  before_script:
    - cd project-a

build-a-dev:
  extends: .base-build
  script:
    - export ENV="dev"
    - ... # build steps for dev

build-a-qa:
  extends: .base-build
  script:
    - export ENV="qa"
    - ... # build steps for qa

build-a-staging:
  extends: .base-build
  script:
    - export ENV="staging"
    - ... # build steps for staging

build-a-prod:
  extends: .base-build
  script:
    - export ENV="prod"
    - ... # build steps for prod
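For comparison, the same reuse can also be written with a literal YAML anchor and merge key; a minimal sketch based on the .base-build job above:

.base-build: &base-build
  stage: build
  image: node:22-alpine
  before_script:
    - cd project-a

build-a-dev:
  <<: *base-build
  script:
    - export ENV="dev"
    - ... # build steps for dev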
If you want to reuse only a specific block from another job, you can use !reference, as below:
build-a-dev:
  before_script: !reference [.base-build, before_script]
  script:
    - export ENV="dev"
    - ... # build steps for dev
Using needs for Proper Job Chaining
We can create explicit dependencies between jobs using needs. A job with needs starts as soon as the jobs it depends on have finished, instead of waiting for the entire previous stage to complete.
build-a:
  stage: build
  script:
    - ...

test-a:
  stage: test
  needs: [build-a]
  script:
    - ...

deploy-a:
  stage: deploy
  needs: [test-a]
  script:
    - ...
Parallel Job Execution
To execute multiple jobs in parallel, consider an example where there is a check stage before the build stage, with a check-a job performing static code analysis, lint checks, and so on. By giving both check-a and build-a an empty needs list, they start at the same time instead of build-a waiting for the check stage to finish:
stages:
  - check
  - build
  - ...

check-a:
  stage: check
  needs: []
  script:
    - ...

build-a:
  stage: build
  needs: []
  script:
    - ...

test-a:
  stage: test
  needs: [build-a]
  script:
    - ...

deploy-a:
  stage: deploy
  needs: [test-a]
  script:
    - ...
Source Code
You can find the source code here: https://gitlab.com/iChintanSoni/learning-ci-cd/
Conclusion
Optimizing CI/CD pipelines in a monorepo setup can significantly enhance the efficiency, readability, and maintainability of your projects. By adopting best practices such as using parent-child pipeline architecture, applying rules: changes, leveraging YAML anchors, and strategically utilizing needs for job chaining, you can create a more robust and scalable pipeline.
These techniques not only help in minimizing unnecessary job executions but also streamline the overall development workflow, making it easier to manage complex projects. By implementing these best practices, you ensure that your CI/CD processes are both efficient and adaptable to the evolving needs of your monorepo.
I hope this guide helps you in refining your GitLab CI/CD pipelines. If you have any questions or additional tips, feel free to share them in the comments below. Happy coding!