Building CI/CD for Vertex AI pipelines: The production
Oleksandr Borodavka
Posted on April 26, 2023
Hi there! As you may already know from the previous articles in this series, we tested some new ideas and tools in a proof of concept (POC) for one simple pipeline. Today we will review a new, generalized version of CI/CD for Vertex AI pipelines that we built based on that experience and some further investigation.
A bit of context
Let's recall some key points to refresh the context:
- Vertex AI is used to build training pipelines.
- GitHub Actions is our CI/CD tool.
- We built a declarative framework that allows us to standardize the format and operations for all our components and pipelines.
- The specifications and implementations of all our components and pipelines are kept in one GitHub repository.
- There are three environments: development (DEV), staging (STAGE), and production (PROD).
Stating the task
What do we want to achieve?
Simply put, we want to automate as much as possible. Ideally, when new changes appear in our code repository, we want them applied in production as quickly as possible and without any manual effort. Moreover, this should happen in a stable, safe, reproducible, and effective way.
Jumping ahead a little, there are three things in our CI/CD practice we are especially happy with. In every deployment environment, our continuous integration system:
- automatically rebuilds changed components and runs their unit and integration tests
- builds the pipelines changed in pull requests
- and rebuilds the pipelines that depend on changed components.
The way of code
Okay, how can we achieve this?
Since developers change the codebase through pull requests, we can use pull request events as the starting points for the workflows. There are two moments we have to automate.
The first one is when a pull request is opened (or updated). Here we want to run the basic, faster checks on the incoming changes to give the developer quick feedback. These are generally unit tests and build jobs (just a build to check that the configurations are valid) on our DEV environment. Then, if everything is fine, we run integration tests for the components and run our pipelines on the STAGE environment to make sure the changes are ready to be merged.
The second moment is when the PR is approved and merged. At this point the changes have already been tested, reviewed, and merged into the main branch, so they are ready for delivery. All the processes run again, but this time on the PROD environment.
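The two moments boil down to a small decision on the pull_request event. The helper below is a hypothetical illustration, not the authors' code: it takes the action field and the merged flag that GitHub includes in the pull_request event payload and returns the environments that should be exercised. In the actual setup this decision is expressed through the workflow triggers and the merged check, not through a function like this.

```python
from typing import List


def environments_for_event(action: str, merged: bool) -> List[str]:
    """Hypothetical helper: map a pull_request event to the environments to exercise.

    'action' and 'merged' mirror fields of GitHub's pull_request event payload.
    """
    if action in ("opened", "synchronize", "reopened"):
        # Fast feedback on DEV first, then integration tests and pipeline runs on STAGE
        return ["development", "staging"]
    if action == "closed" and merged:
        # Reviewed and merged changes are delivered to PROD
        return ["production"]
    # A PR closed without merging (or any other action) triggers nothing
    return []


print(environments_for_event("synchronize", merged=False))  # ['development', 'staging']
print(environments_for_event("closed", merged=True))        # ['production']
print(environments_for_event("closed", merged=False))       # []
```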
The implementation
For both stages, we use GitHub Actions workflows together with some CLI commands of our Python framework, since code-analysis routines are easier to implement with framework-specific code. Reviewing all the code would take too long and would be too specific. Instead, we will look at the general structure and at slightly simplified versions of the workflows that still have enough detail to present the idea.
Here is the structure of the source code:
```
pipelines/
  pipeline1.yaml
  ...
components/
  component1/
    src/
      ...
    tests/
      unit/
        ...
      integration/
        test.yaml
    config.yaml
```
There is a folder with pipeline specifications and a folder with components. Each component contains source files, tests, and a configuration.
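Because of this fixed layout, a list of changed file paths is enough to tell which components and pipelines a pull request touches. The workflows below delegate this to make targets of the framework (get_components, get_pipelines) whose implementation isn't shown, so the function here is only a hypothetical approximation. It assumes components live under components/<name>/ and pipelines are flat pipelines/<name>.yaml files, and it prints JSON arrays because the workflows feed the result to fromJson() to build a job matrix.

```python
import json
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def classify_paths(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split changed file paths into component names and pipeline names.

    Assumes the layout components/<name>/... and pipelines/<name>.yaml;
    any other path is ignored.
    """
    components, pipelines = set(), set()
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == "components":
            # components/<name>/<anything> -> component <name>
            components.add(parts[1])
        elif len(parts) == 2 and parts[0] == "pipelines" and parts[1].endswith(".yaml"):
            # pipelines/<name>.yaml -> pipeline <name>
            pipelines.add(PurePosixPath(parts[1]).stem)
    return sorted(components), sorted(pipelines)


if __name__ == "__main__":
    # The git_diff job produces one space-separated line of changed paths
    diff = ("components/component1/src/train.py components/component1/config.yaml "
            "pipelines/pipeline1.yaml README.md")
    comps, pipes = classify_paths(diff.split())
    # JSON arrays are the format fromJson() in the workflow matrix expects
    print(json.dumps(comps))  # ["component1"]
    print(json.dumps(pipes))  # ["pipeline1"]
```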
And this is what the GitHub Actions directory looks like:
```
.github
  actions
    build_component
    build_pipeline
    run_pipeline
    test_component
    test_component_integration
  workflows
    pr_merged.yml
    pr_opened.yml
```
Here, actions is a directory of reusable composite actions that perform all the operations we need on components and pipelines (build, test, and run). The workflows directory contains two workflows, pr_opened.yml and pr_merged.yml, which GitHub triggers automatically when a pull request is opened/updated or merged, respectively.
PR is opened
Let's take a look at the first workflow:
```yaml
name: MLOps Pull Request - Opened

on:
  # The workflow will be run automatically when a PR is
  # opened to the main branch or changes to the PR are pushed
  pull_request:
    types: [opened, synchronize, reopened]
    branches:
      - "main"
    paths:
      - "pipelines/**/*.yaml"
      - "components/**/*.yaml"
      - "components/**/*.py"

# Use concurrency to ensure that only a single workflow
# using the same concurrency group will run at a time
concurrency: dev_stage_environment

jobs:
  git_diff:
    # Get a list of changed files in the PR for the future analysis
    runs-on: ubuntu-latest
    outputs:
      diff: ${{ steps.getter.outputs.diff }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Get git diff
        id: getter
        run: |
          # echo $(...) joins the changed paths into one space-separated line
          GIT_DIFF="$(echo $(git diff --name-only origin/main...origin/${GITHUB_HEAD_REF}))"
          # ::set-output is deprecated; step outputs are written to $GITHUB_OUTPUT
          echo "diff=$GIT_DIFF" >> "$GITHUB_OUTPUT"

  get_component_list:
    # Get a list of names for the added/changed components
    needs: git_diff
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        env:
          # Pass the diff via an env variable instead of inlining it into the script
          DIFF: ${{ needs.git_diff.outputs.diff }}
        run: |
          # make takes variables as name=value (not --name=value); -s stops make
          # from echoing the recipe, so only the JSON list ends up in NAMES
          NAMES="$(make -s get_components paths="$DIFF")"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"

  test_and_build_components_on_dev:
    # Test and build changed components on the DEV environment
    needs: get_component_list
    # A matrix over an empty list is an error, so skip the job when
    # no components changed (the target is assumed to print [])
    if: ${{ needs.get_component_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: development
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_component_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Test component
        uses: ./.github/actions/test_component
        with:
          component_name: ${{ matrix.name }}
      - name: Build component
        uses: ./.github/actions/build_component
        with:
          component_name: ${{ matrix.name }}

  get_pipeline_list:
    # Get a list of names for the added/changed pipelines
    needs: [ git_diff, test_and_build_components_on_dev ]
    # Run even if the component job was skipped, but never after a failure
    if: ${{ !failure() && !cancelled() }}
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        env:
          DIFF: ${{ needs.git_diff.outputs.diff }}
        run: |
          NAMES="$(make -s get_pipelines paths="$DIFF")"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"

  build_pipelines_on_dev:
    # Build changed pipelines on the DEV environment
    needs: get_pipeline_list
    if: ${{ !failure() && !cancelled() && needs.get_pipeline_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: development
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  get_indirect_pipeline_list:
    # Get a list of names for the indirectly changed pipelines
    # (when a related component was changed)
    needs: [ git_diff, build_pipelines_on_dev ]
    if: ${{ !failure() && !cancelled() }}
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        env:
          DIFF: ${{ needs.git_diff.outputs.diff }}
        run: |
          NAMES="$(make -s get_indirect_pipelines paths="$DIFF")"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"

  build_indirect_pipelines_on_dev:
    # Build indirectly changed pipelines on the DEV environment
    needs: get_indirect_pipeline_list
    if: ${{ !failure() && !cancelled() && needs.get_indirect_pipeline_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: development
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  build_components_and_test_integration_on_stage:
    # Build changed components and run integration tests
    # for them on the STAGE environment
    needs: [ build_pipelines_on_dev, build_indirect_pipelines_on_dev, get_component_list ]
    if: ${{ !failure() && !cancelled() && needs.get_component_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: staging
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_component_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build component
        uses: ./.github/actions/build_component
        with:
          component_name: ${{ matrix.name }}
      - name: Test component integration
        uses: ./.github/actions/test_component_integration
        with:
          component_name: ${{ matrix.name }}

  build_and_run_pipelines_on_stage:
    # Build changed pipelines and run them on the STAGE environment
    needs: [ build_components_and_test_integration_on_stage, get_pipeline_list ]
    if: ${{ !failure() && !cancelled() && needs.get_pipeline_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: staging
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  build_and_run_indirect_pipelines_on_stage:
    # Build indirectly changed pipelines and run them on the STAGE environment
    needs: [ build_and_run_pipelines_on_stage, get_indirect_pipeline_list ]
    if: ${{ !failure() && !cancelled() && needs.get_indirect_pipeline_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: staging
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
```
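GitHub runs this workflow automatically when a pull request to main that touches the watched paths is opened, reopened, or updated. Because of the concurrency setting, only one run of it executes at a time; within a run, however, similar tasks, such as the tests for each changed component, run in parallel thanks to the matrix strategy.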
The logic inside is as follows:
- Analyze the changes in a pull request and find out which components and/or pipelines were affected.
- Build them in the predefined order.
- Run automated tests for the components.
- Run the pipelines to retrain the models and deliver them for final use.
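The first step, and in particular the search for indirectly changed pipelines, requires knowing which components each pipeline uses. The article doesn't show the pipeline specification format, so the sketch below assumes a hypothetical, already-parsed mapping from pipeline name to the component names it uses; the framework's real get_indirect_pipelines target may work differently. Leaving out pipelines that were changed directly is one way to avoid building the same pipeline twice.

```python
from typing import Dict, Iterable, List


def indirect_pipelines(
    pipeline_components: Dict[str, List[str]],
    changed_components: Iterable[str],
    changed_pipelines: Iterable[str],
) -> List[str]:
    """Return pipelines that use a changed component but were not edited themselves.

    pipeline_components is a hypothetical, already-parsed view of
    pipelines/<name>.yaml: pipeline name -> component names it uses.
    """
    changed = set(changed_components)
    direct = set(changed_pipelines)
    return sorted(
        name
        for name, used in pipeline_components.items()
        # Skip pipelines already handled as directly changed, so nothing is built twice
        if name not in direct and changed.intersection(used)
    )


specs = {
    "pipeline1": ["component1", "component2"],
    "pipeline2": ["component2"],
    "pipeline3": ["component3"],
}
print(indirect_pipelines(specs, ["component2"], ["pipeline1"]))  # ['pipeline2']
```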
The workflow is universal: it works with any component or pipeline that follows the framework's conventions. It is also safe, avoids duplicate work, runs the jobs in the right order, and parallelizes them where possible.
This workflow operates on the development and staging environments.
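The claim about running jobs in the right order can be checked mechanically. The sketch below copies the needs relations of the PR-opened workflow into a dictionary and uses Python's standard graphlib module (3.9+) to group the jobs into stages, where every job in a stage depends only on jobs in earlier stages. It is our own illustration, not part of the authors' tooling.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List

# The needs of each job in the PR-opened workflow: job -> jobs it waits for
NEEDS: Dict[str, List[str]] = {
    "git_diff": [],
    "get_component_list": ["git_diff"],
    "test_and_build_components_on_dev": ["get_component_list"],
    "get_pipeline_list": ["git_diff", "test_and_build_components_on_dev"],
    "build_pipelines_on_dev": ["get_pipeline_list"],
    "get_indirect_pipeline_list": ["git_diff", "build_pipelines_on_dev"],
    "build_indirect_pipelines_on_dev": ["get_indirect_pipeline_list"],
    "build_components_and_test_integration_on_stage": [
        "build_pipelines_on_dev", "build_indirect_pipelines_on_dev", "get_component_list"],
    "build_and_run_pipelines_on_stage": [
        "build_components_and_test_integration_on_stage", "get_pipeline_list"],
    "build_and_run_indirect_pipelines_on_stage": [
        "build_and_run_pipelines_on_stage", "get_indirect_pipeline_list"],
}


def stages(needs: Dict[str, List[str]]) -> List[List[str]]:
    """Group jobs so that each job's dependencies all sit in earlier stages."""
    sorter = TopologicalSorter(needs)
    sorter.prepare()  # Also raises CycleError if the graph has a cycle
    result: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # Jobs whose dependencies are all done
        result.append(ready)
        sorter.done(*ready)
    return result


for number, stage in enumerate(stages(NEEDS), start=1):
    print(number, stage)
```

Running it shows ten stages with a single job each: the jobs form one chain, so the parallelism comes from the matrix strategy inside each job rather than from independent jobs running side by side.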
PR is merged
The second workflow runs when a PR is merged.
```yaml
name: MLOps Pull Request - Merged

on:
  # The workflow will be run automatically when a PR is closed
  pull_request:
    types:
      - closed
    branches:
      - "main"
    paths:
      - "pipelines/**/*.yaml"
      - "components/**/*.yaml"
      - "components/**/*.py"

# Use concurrency to ensure that only a single workflow
# using the same concurrency group will run at a time
concurrency: prod_environment

jobs:
  if_merged:
    # There is no trigger type for "merged" (only "closed"),
    # so we check whether the PR was actually merged in the first job
    if: github.event.pull_request.merged == true
    runs-on: ubuntu-latest
    steps:
      - run: echo The PR was merged

  git_diff:
    # Get a list of changed files in the PR for the future analysis
    needs: if_merged
    runs-on: ubuntu-latest
    outputs:
      diff: ${{ steps.getter.outputs.diff }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Get git diff
        id: getter
        run: |
          GIT_DIFF="$(echo $(git diff --name-only ${GITHUB_SHA}^ ${GITHUB_SHA}))"
          # ::set-output is deprecated; step outputs are written to $GITHUB_OUTPUT
          echo "diff=$GIT_DIFF" >> "$GITHUB_OUTPUT"

  get_component_list:
    # Get a list of names for the added/changed components
    needs: git_diff
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        env:
          DIFF: ${{ needs.git_diff.outputs.diff }}
        run: |
          # make takes variables as name=value; -s keeps the recipe out of NAMES
          NAMES="$(make -s get_components paths="$DIFF")"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"

  test_and_build_components_on_prod:
    # Test and build changed components on the PROD environment
    needs: get_component_list
    # Skip when no components changed (a matrix over [] is an error)
    if: ${{ needs.get_component_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: production
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_component_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Test component
        uses: ./.github/actions/test_component
        with:
          component_name: ${{ matrix.name }}
      - name: Build component
        uses: ./.github/actions/build_component
        with:
          component_name: ${{ matrix.name }}
      - name: Test component integration
        uses: ./.github/actions/test_component_integration
        with:
          component_name: ${{ matrix.name }}

  get_pipeline_list:
    # Get a list of names for the added/changed pipelines
    needs: [ git_diff, test_and_build_components_on_prod ]
    # Run even if the component job was skipped, but only for merged PRs
    # (git_diff succeeded) and never after a failure
    if: ${{ !failure() && !cancelled() && needs.git_diff.result == 'success' }}
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        env:
          DIFF: ${{ needs.git_diff.outputs.diff }}
        run: |
          NAMES="$(make -s get_pipelines paths="$DIFF")"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"

  build_and_run_pipelines_on_prod:
    # Build and run changed pipelines on the PROD environment
    needs: [ test_and_build_components_on_prod, get_pipeline_list ]
    if: ${{ !failure() && !cancelled() && needs.get_pipeline_list.result == 'success' && needs.get_pipeline_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: production
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}

  get_indirect_pipeline_list:
    # Get a list of names for the indirectly changed pipelines
    # (when a related component was changed)
    needs: [ git_diff, build_and_run_pipelines_on_prod ]
    if: ${{ !failure() && !cancelled() && needs.git_diff.result == 'success' }}
    runs-on: ubuntu-latest
    outputs:
      names: ${{ steps.getter.outputs.names }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Get list
        id: getter
        env:
          DIFF: ${{ needs.git_diff.outputs.diff }}
        run: |
          NAMES="$(make -s get_indirect_pipelines paths="$DIFF")"
          echo "names=$NAMES" >> "$GITHUB_OUTPUT"

  build_and_run_indirect_pipelines_on_prod:
    # Build indirectly changed pipelines and run them on the PROD environment
    needs: [ build_and_run_pipelines_on_prod, get_indirect_pipeline_list ]
    if: ${{ !failure() && !cancelled() && needs.get_indirect_pipeline_list.result == 'success' && needs.get_indirect_pipeline_list.outputs.names != '[]' }}
    runs-on: ubuntu-latest
    environment: production
    strategy:
      # Use matrix strategy to run the tasks in parallel
      matrix:
        name: ${{ fromJson(needs.get_indirect_pipeline_list.outputs.names) }}
    steps:
      - name: Checkout
        uses: actions/checkout@v3
      - name: Build pipeline
        uses: ./.github/actions/build_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
      - name: Run pipeline
        uses: ./.github/actions/run_pipeline
        with:
          pipeline_name: ${{ matrix.name }}
```
This workflow is very similar to the previous one; the main difference is that it runs all the jobs in the production environment.
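Both workflows serialize their runs through a concurrency group (dev_stage_environment and prod_environment). One detail of GitHub's documented behavior is worth keeping in mind: a group holds at most one running and one pending run, and when a new run is queued while another one is already pending, the older pending run is cancelled (with cancel-in-progress left at its default of false, the running one is not touched). The class below is a simplified model written for illustration; it is not GitHub code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConcurrencyGroup:
    """Simplified model of a GitHub Actions concurrency group (cancel-in-progress: false)."""
    running: Optional[str] = None
    pending: Optional[str] = None
    cancelled: List[str] = field(default_factory=list)

    def queue(self, run_id: str) -> None:
        if self.running is None:
            self.running = run_id  # Nothing in progress: start immediately
            return
        if self.pending is not None:
            self.cancelled.append(self.pending)  # The older pending run is cancelled
        self.pending = run_id  # The newest run waits for the running one

    def finish(self) -> None:
        # The running run completes and the pending run (if any) starts
        self.running, self.pending = self.pending, None


group = ConcurrencyGroup()
for run in ("pr-1", "pr-2", "pr-3"):
    group.queue(run)
print(group.running, group.pending, group.cancelled)  # pr-1 pr-3 ['pr-2']
```

In this setup each merged-PR run only diffs its own merge commit, so if three PRs are merged in quick succession, the cancelled middle run is not covered by the later one and would have to be re-run manually.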
Conclusion
That’s it! The solution presented here has been working well for us for more than six months now. We have some ideas on how to make it even better, and we may share the results with the community in future articles.
I hope this will be useful to you in your MLOps journey and helps you save some time when building your own CI/CD process for ML pipelines.
Please feel free to share any thoughts, questions, or proposals in the comments.
Thank you and happy coding!