Validating Python code with a CodeCatalyst Pipeline.
Simon Hanmer
Posted on January 18, 2024
This article forms part of a series exploring how to validate Python code, deploy it as a Lambda function via Terraform, and, eventually, define a standard template that can be used to create Lambdas.
- Our example Python Code
- What will the pipeline do?
- Creating a Pipeline
- Running the pipeline
- Conclusion
CodeCatalyst is a unified development environment created by AWS.
It has many features, such as blueprints to assist in writing code, integrated Git repositories, pre-definable dev environments and, now, AI integration. However, for me, one of the most useful features is being able to define and use pipelines stored in the code repository.
Pipelines are one of the most commonly used tools for those of us working with code and the cloud, allowing us to automate tasks that run whenever we change our code, whether that's checking the code works, building artefacts and packages, or deploying to our environments.
In this post, I'll share an example pipeline that we can use to validate some Python code as we work on it. Whilst this post doesn't cover all the options available when working with pipelines, it should be enough to explain how the pipeline works and how you can adapt it for your own workflows.
⚠️ To streamline the post, I'll assume you understand how CodeCatalyst works and can create and work with code in repositories.
Our example Python Code
For this post, I'm using some example code from https://github.com/headforthecloud/example-python-lambda, which defines a simple function that could be used as an AWS Lambda function.
The code for the main function, `lambda_function.py`, shown below, just sets up some logging, outputs a message and returns a status code to indicate it ran successfully. It also includes a couple of functions that can be used to demonstrate testing:
```python
#!/usr/bin/env python
""" An example lambda function """
import os
import json
import logging

# Define a logger using the logging library. If LOG_LEVEL is not set,
# default to INFO; otherwise use the value of LOG_LEVEL.
logger = logging.getLogger()
logger.setLevel(os.getenv('LOG_LEVEL', 'INFO'))


def lambda_handler(event, context):
    """ define a lambda_handler function that takes in an event and a context """
    logger.info("Hello from Lambda!")
    return {
        "statusCode": 200,
        "body": json.dumps(event)
    }


def add_x_y(x, y):
    """ This is a simple function that adds two numbers together and returns the result. """
    return x + y


def multiply_x_y(x, y):
    """ This is a simple function that multiplies two numbers together and returns the result. """
    return x * y


# If this file is run directly, run the lambda_handler function with a dummy event and context.
if __name__ == '__main__':
    lambda_handler(None, None)
```
We also have some code written using the PyTest framework, in a folder called `tests`.
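The test code itself isn't reproduced in this post, but a minimal sketch of what a PyTest test module for the functions above might look like is shown below. The file name `tests/test_lambda_function.py` is an assumption for illustration, and the functions are re-declared inline so the sketch is self-contained; in the real repository the tests would import them with `from lambda_function import ...`:

```python
# Hypothetical sketch of a PyTest test module (the name
# tests/test_lambda_function.py is assumed). The functions under test are
# copied inline here so the sketch runs on its own; real tests would import
# them from lambda_function instead.
import json


def add_x_y(x, y):
    return x + y


def multiply_x_y(x, y):
    return x * y


def lambda_handler(event, context):
    return {"statusCode": 200, "body": json.dumps(event)}


def test_add_x_y():
    assert add_x_y(2, 3) == 5


def test_multiply_x_y():
    assert multiply_x_y(2, 3) == 6


def test_lambda_handler_returns_200():
    # The handler echoes the event back in the body as JSON.
    response = lambda_handler({"greeting": "hello"}, None)
    assert response["statusCode"] == 200
    assert json.loads(response["body"]) == {"greeting": "hello"}
```

PyTest discovers any `test_*` functions in the `tests` folder automatically, so no extra registration is needed.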
What will the pipeline do?
In this example, we're going to perform a set of actions which are typical of a pipeline used with Python:
Linting - we do this to make sure our code meets general coding best practices and that it should at least run. For this, we'll use a well-known tool called PyLint.
Vulnerability Scanning - we do this to check that our code doesn't contain any security issues such as hardcoded secret values, possible SQL injection routes and so on. We'll use a tool called Bandit for this.
Automated Testing - we want to make sure that our code performs as expected. To this end, we'll use the PyTest framework and check that our tests work and that we test an appropriate amount of our code.
Reporting - for each step, we will use the CodeCatalyst functionality to generate reports showing the outcome of each step and whether it was successful.
Creating a Pipeline
There are two approaches to generating or modifying a pipeline with CodeCatalyst: either via a visual editor built into https://codecatalyst.aws, or working in the repository and defining a pipeline using a YAML file in the `.codecatalyst/workflows` folder.
📖 The full definition for the pipelines can be found here
For this example, I'll use the latter approach, working with a file called `.codecatalyst/workflows/python-testing-pipeline.yaml`. If you'd like to see the full file, it's available in GitHub.
General Configuration
Firstly, we're going to define where and when the pipeline will run with this code:
```yaml
SchemaVersion: "1.0"
Name: python-testing-pipeline

Compute:
  Type: EC2

Triggers:
  - Type: Push
```
With this, we're saying that the pipeline will be called python-testing-pipeline and that it will be executed on EC2 (we could also use Lambda).
We're also defining that the pipeline should be triggered every time changes are pushed to the repository. We could also have workflows triggered when working with a pull request, or even on a schedule.
Running Actions
Once we've defined when and where the pipeline runs, we need to tell it what steps to carry out. To do this we'll use an `Actions` section, in which each action has a number of these items:
- A name
- Identifier - these are equivalent to GitHub actions - in fact, we can use some GitHub actions (see here for more info). In our examples, we'll use the `aws/build@v1` and `aws/managed-test@v1` actions (these are functionally equivalent and interchangeable).
- Inputs - in this case, we're going to use these to specify that we want to retrieve our code from the `WorkflowSource`, i.e. the repository containing the pipeline, but we could also specify that we want to use artefacts that might contain saved files.
- Configuration steps - we'll use these to list the specific actions we want to perform in the pipeline. With the `build` and `managed-test` actions, we provide a list of `Run` steps which use the Linux shell `bash` to execute the provided commands.
- Outputs - in our example pipeline, we'll use these to define `Reports` that will feed back the results of our actions in CodeCatalyst.
Linting our code
For our first action, we're going to check that our code meets Python best practices. As mentioned earlier, we'll use PyLint, and our pipeline will carry out the following steps:
- Specify that we want to use the code from our repository
- Install PyLint using `pip`
- Ensure that we have a location we can use to store the results of our linting
- Run `pylint` and capture the results in the folder created in the previous step
- Upload the results as a report to CodeCatalyst, using the `PYLINTJSON` format and defining the success criteria which will control whether this pipeline step is successful. In this example, we can specify what level of issues are allowed within a set of categories.
To perform the above, we can use this code:
```yaml
Actions:
  Linting:
    Identifier: aws/build@v1.0.0
    Inputs:
      Sources:
        - WorkflowSource
    Configuration:
      Steps:
        - Run: |
            echo "Installing pylint"
            pip install --user pylint
            export PATH=$PATH:~/.local/bin
        - Run: |
            echo "Check testresults folder exists"
            if [ ! -d tests/testresults ]
            then
              mkdir tests/testresults
            fi
        - Run: |
            echo "Linting Python"
            pylint *py tests/*py > tests/testresults/pylint-output.py
    Outputs:
      Reports:
        PyLintResults:
          Format: PYLINTJSON
          IncludePaths:
            - tests/testresults/pylint-output.py
          SuccessCriteria:
            StaticAnalysisQuality:
              Severity: HIGH
              Number: 1
            StaticAnalysisSecurity:
              Severity: MEDIUM
              Number: 1
            StaticAnalysisBug:
              Severity: MEDIUM
              Number: 1
```
PyLint configuration
We have control over what checks PyLint will carry out by using a configuration file, `.pylintrc`. In our example, we'll use this setup:
```ini
[BASIC]
good-names=i,j,k,x,y,ex,Run,_
fail-under=0.1

[FORMAT]
max-line-length=120
indent-string='    '

[REPORTS]
output-format=json
```
Vulnerability scanning
We're also going to add a section to our actions to check that we don't have any security issues in our code, such as included secrets, SQL injection routes and so on. To do this, we're going to use a tool called Bandit.
The steps are very similar to those for linting:
- Specify that we want to use the code from our repository
- Install Bandit using `pip`
- Ensure that we have a location we can use to store the results of our scans
- Run `bandit` and capture the results in the folder created in the previous step. We'll output the results in `sarif`, a standard format used by scanning tools
- Upload the results as a report to CodeCatalyst, using the `SARIFSA` format and defining the success criteria which will control whether this pipeline step is successful. Again, we'll specify what criteria are needed for a successful run.
To perform the above, we can use this code:
```yaml
vuln_scan:
  Identifier: aws/build@v1.0.0
  Inputs:
    Sources:
      - WorkflowSource
  Configuration:
    Steps:
      - Run: |
          echo "Installing bandit"
          pip install --user bandit bandit-sarif-formatter
          export PATH=$PATH:~/.local/bin
      - Run: |
          echo "Check testresults folder exists"
          if [ ! -d tests/testresults ]
          then
            mkdir tests/testresults
          fi
      - Run: |
          echo "Running Bandit"
          bandit -r . --format sarif --output tests/testresults/bandit-output.sarif --exit-zero
  Outputs:
    Reports:
      BanditResults:
        Format: SARIFSA
        IncludePaths:
          - tests/testresults/bandit-output.sarif
        SuccessCriteria:
          StaticAnalysisFinding:
            Severity: MEDIUM
            Number: 2
```
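To give a feel for the kind of issues Bandit reports, here's a small hypothetical snippet (not part of the example repository) containing two patterns Bandit flags out of the box: a hardcoded password string and use of an insecure hash algorithm:

```python
# Hypothetical snippet illustrating patterns Bandit flags; it is NOT part of
# the example repository, and these patterns should be avoided in real code.
import hashlib

DB_PASSWORD = "hunter2"  # flagged: possible hardcoded password (B105)


def fingerprint(data: bytes) -> str:
    # flagged: use of the insecure MD5 hash algorithm (B324)
    return hashlib.md5(data).hexdigest()


print(fingerprint(b"example"))
```

Running `bandit -r .` over a file like this would produce findings in the SARIF output, which the success criteria above then evaluate.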
Automated testing
Whilst our other steps check our code from a static viewpoint, we want to be sure that our code works as we expect, so we'll have a step included in most pipelines - using automated testing to validate that our code works in the way we want.
In our example, we're going to use the popular PyTest framework, which will use code stored in the `tests` folder to check functionality. For this example, we're going to have a single, simple test to demonstrate how this can be done.
As well as understanding whether our code passes the provided tests, we want to understand how much of our code has been tested, so we'll also capture what is known as code coverage, which records which lines of our code have been exercised by the tests.
Again, our steps follow the now familiar process of installing any required tools, executing them, and then capturing the results as a report within CodeCatalyst, using the following code:
```yaml
unit_tests:
  Identifier: aws/managed-test@v1.0.0
  Inputs:
    Sources:
      - WorkflowSource
  Configuration:
    Steps:
      - Run: |
          echo "Installing pytest"
          pip install --user pytest pytest-cov
          export PATH=$PATH:~/.local/bin
      - Run: |
          echo "Check for requirements"
          # Install extra dependencies only if a requirements.txt file is present
          if [ -r requirements.txt ]
          then
            pip install --user -r requirements.txt
          fi
      - Run: |
          echo "Check testresults folder exists"
          if [ ! -d tests/testresults ]
          then
            mkdir tests/testresults
          fi
      - Run: |
          echo "Running PyTest"
          python -m pytest
  Outputs:
    Reports:
      PyTestResults:
        Format: JUNITXML
        IncludePaths:
          - tests/testresults/junit.xml
        SuccessCriteria:
          PassRate: 100
      CodeCoverage:
        Format: COBERTURAXML
        IncludePaths:
          - tests/testresults/coverage.xml
        SuccessCriteria:
          LineCoverage: 80
```
PyTest configuration
With PyTest, we're going to use two configuration files. `.pytest.ini` is used to define where our tests are and what output we'll generate from the tests. Our example looks like:
```ini
[pytest]
log_level = INFO
addopts =
    -v --no-header --cov=.
    --junitxml=tests/testresults/junit.xml
    --cov-report=xml:tests/testresults/coverage.xml
    --cov-report=term-missing
testpaths = tests
```
We'll also use a `.coveragerc` file to tell PyTest not to include our test files when calculating code coverage:
```ini
[run]
omit = ./tests/*
```
Action ordering
As defined here, there are no constraints on the ordering of the linting, scanning and testing steps, so they will run in parallel.
However, if we want to ensure that a step only runs once a previous step has completed successfully, we can add a `DependsOn` clause to an action. For example, if we wanted our `unit_tests` action to run only if the `Linting` step succeeded, we could change our action definition to include the following lines:
```yaml
unit_tests:
  DependsOn:
    - Linting
  Identifier: aws/managed-test@v1.0.0
  ...
```
Running the pipeline
Once we've created our pipeline and committed it to the code repository in CodeCatalyst along with our Python code, the repository will contain the Lambda code, the `tests` folder, the tool configuration files and the workflow file.
With all of this in place, CodeCatalyst should recognise that it needs to run the pipeline any time there are changes to the code in the repository, including changes to the pipeline configuration file itself. These runs are visible in the CodeCatalyst console under CI/CD > Workflows.
Each pipeline will be listed using the name defined at the start of the configuration, along with each run, showing the status of the run, a run ID, a commit ID that triggered the run, and which repository and branch were used.
Clicking on the ID of a run takes us to the details for that particular run.
As mentioned earlier, because we didn't define any dependencies between the steps, they ran in parallel. The run details also show whether the run was successful and the commit ID that triggered it, along with when the run started and how long it took.
We can also click on any of the steps to see the details of each step, including the output from any commands:
Reporting
As well as being able to see whether a workflow run was successful, we can see any reports generated by clicking on Reports, either in the sidebar or on the run details screen.
The reports screen shows the reports generated by our workflow, when they were generated and whether they were successful (as long as we defined criteria specifying what success means), along with the repository details, the workflow action step that created each report and the type of data it contains.
These reports are, in my view, one of the items that helps CodeCatalyst stand out - it's very simple to define what reports are being generated, what type of data they contain, and what constitutes a successful report.
Clicking through on the report name takes you to the detailed report data:
In the code coverage report, which shows how much of the code has been tested, we can see what the success criteria were and how much of the code we've tested, both as a summary and on a per-file basis. We're also able to click through to the individual files to see which lines were tested or not.
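The pass/fail decision for the coverage report comes down to simple arithmetic. This is a hypothetical sketch of the check implied by the `LineCoverage: 80` success criterion; the function name is illustrative, not a CodeCatalyst API:

```python
# Hypothetical sketch of the pass/fail arithmetic behind the
# "SuccessCriteria: LineCoverage: 80" setting. meets_line_coverage is an
# illustrative name, not part of any CodeCatalyst API.
def meets_line_coverage(covered_lines: int, total_lines: int,
                        threshold_pct: float = 80.0) -> bool:
    """Return True when the percentage of covered lines meets the threshold."""
    return (covered_lines / total_lines) * 100 >= threshold_pct


# 42 of 50 lines covered is 84%, which passes an 80% threshold.
print(meets_line_coverage(42, 50))
```

The coverage numbers themselves come from the Cobertura XML file generated by `pytest-cov`; CodeCatalyst only applies the threshold.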
Conclusion
In my opinion, CodeCatalyst is a useful development tool - by integrating many of the tools required in the SDLC (Software Development Life Cycle), it can provide a very functional working space.
In this example, we've concentrated on how we can define and run pipelines when we make changes to our code, and how we can report on the outcomes of those changes - an area in which I think CodeCatalyst is particularly strong.
If you have any questions, comments or suggestions for other tasks we could use in the pipelines, use the comment box below.