Safely Upgrading our Open Source Dependencies at Scale
Tyler Jang
Posted on June 28, 2024
Trunk Check installs and manages over a hundred linters, code formatters, and other code quality tools. The underlying tools are open source projects that each run on their own release schedule and, like all software, sometimes have bugs. Figuring out which of our 100+ tools are safe to upgrade for our customers is a hard problem. We have to validate every new version of every tool we support, along with its runtimes and upstream dependencies, before shipping it to our customers. With so many integrations across so many languages, it’s important to ensure compatibility and continuity as new versions are released. Even something as simple as renaming a CLI option could break thousands of workflows. Here's how we solved the problem of testing, validating, and automating our linter upgrades.
Black Box Integration Testing
Within our execution model, each linter can have a runtime, a download, a parser, and configuration for how to run. With so many linter idiosyncrasies, it isn't practical to assert on the behavior of each individual step or linter output; instead, we rely on black box testing.
Our aim is to guarantee Trunk output continuity. When a new linter version is released, it should have the same or a similar set of diagnostics as its predecessor. Fortunately, this problem lends itself well to Jest snapshots. We can capture the JSON result of a linter’s trunk check run, or the contents of a formatted file, and use that as our source of truth.
Versioned Linter Snapshots
When we first run tests on a linter, we save a snapshot of its Trunk output, named with the linter’s version. Subsequent test runs will use the most recent matching snapshot for our assertions, allowing us to keep a historical record of linter output and maintain compatibility.
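The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not Trunk's actual implementation (which is built on Jest). It assumes snapshot files follow the `<linter>_v<version>.check.shot` naming pattern used later in this post, that versions are plain dotted integers, and that "most recent matching" means the highest snapshot version that does not exceed the linter version under test. The names `pick_snapshot` and `parse_version` are hypothetical.

```python
import re

# Assumed naming pattern: <linter>_v<version>.check.shot
SNAPSHOT_RE = re.compile(
    r"^(?P<linter>.+)_v(?P<version>\d+(?:\.\d+)*)\.check\.shot$"
)


def parse_version(text):
    """Turn a dotted version such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def pick_snapshot(linter, version, snapshot_names):
    """Return the newest snapshot at or below `version`, or None if there is none."""
    target = parse_version(version)
    candidates = []
    for name in snapshot_names:
        match = SNAPSHOT_RE.match(name)
        if match and match.group("linter") == linter:
            snapshot_version = parse_version(match.group("version"))
            if snapshot_version <= target:
                candidates.append((snapshot_version, name))
    if not candidates:
        return None  # first run for this linter: a new snapshot must be written
    return max(candidates)[1]
```

Under these assumptions, a linter at version 1.1.1 with snapshots for 1.0.0 and 1.2.0 on disk would be checked against the 1.0.0 snapshot, while 1.2.3 would be checked against the 1.2.0 snapshot.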
We detect the latest version of a linter by querying its runtime or by using GitHub APIs. Using our snapshot setup, and the assumption that the latest release of a linter will pass the most recent snapshot (in practice, true about 95% of the time for new releases), we can run nightly tests on the latest versions of all our supported linters. If a linter fails, we can investigate and adapt as needed. If it passes, we can upload its version to our own Release Version Service, which provides the source of truth for linter upgrades.
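As a rough sketch of that nightly flow, the loop below tests the latest release of each linter and publishes only the versions that pass. It is illustrative only: `get_latest_version`, `run_snapshot_test`, and `publish_version` are hypothetical callbacks standing in for the version lookup, the snapshot test run, and the upload to the Release Version Service, none of whose real interfaces are described in this post.

```python
def nightly_validation(linters, get_latest_version, run_snapshot_test, publish_version):
    """Validate the latest release of each linter against its snapshot.

    All three callbacks are hypothetical stand-ins:
      get_latest_version(linter) -> version string
      run_snapshot_test(linter, version) -> True if output matches the snapshot
      publish_version(linter, version) -> records the version as safe to upgrade to
    Returns the (linter, version) pairs that failed and need investigation.
    """
    needs_investigation = []
    for linter in linters:
        version = get_latest_version(linter)
        if run_snapshot_test(linter, version):
            # Passing versions become upgrade candidates for users.
            publish_version(linter, version)
        else:
            # Failing versions are held back until someone looks at them.
            needs_investigation.append((linter, version))
    return needs_investigation
```

The point of the design is that a failing release is never published, so users stay on the last validated version until the failure has been investigated.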
Example of Linters Breaking
With this system in place, let’s see what happens when a linter breaks compatibility. Trunk integrated with the linter tidy when it was released at known_good_version 1.0.0. So, we generated a snapshot named tidy_v1.0.0.check.shot:
{
  "issues": [
    {
      "linter": "tidy",
      "code": "missing-return",
      "file": "test_data/adder.py",
      "line": "8",
      "column": "12",
      "level": "LEVEL_HIGH",
      "message": "'add' does not return a value"
    }
  ]
}
We can see that running tidy with Trunk generates one file issue for adder.py. We expect new versions to return the same result; anything else would indicate a failure or regression worth investigating.
After we generated this snapshot, tidy released versions 1.0.1, 1.1.0, and 1.1.1, all of which had identical output for our test data, so each passed our tests. No need to generate any new snapshots. One week later, tidy released a new version 1.2.0, which renamed one of its CLI arguments. Suddenly, our test reports a failure:
{"issues": [],
"failures": [{
"report": `tidy exited with exit_code=2
stdout: (none)
stderr: |
usage: tidy [options]
tidy: error: Unrecognized option found: --verify`
}]}
Trunk doesn’t know how to run tidy@1.2.0. In response, we can add a new versioned subcommand to the Trunk configuration, and it runs again!
lint:
  definitions:
    - name: tidy
      …
      commands:
        - name: lint
          version: ">=1.2.0"
          output: sarif
          run: tidy --sarif --verify-target {target}
          success_codes: [0, 1]
        - name: lint
          version: ">=1.0.0"
          output: sarif
          run: tidy --sarif --verify {target}
          success_codes: [0, 1]
The output also changed slightly, so we can create another snapshot, and name it tidy_v1.2.0.check.shot. Going forward, new releases of tidy will use this snapshot, but we always have the option to go back in time and verify that older versions still work as intended, or we can add more test data if we need to. Snapshots are lightweight enough that we can store historical information as long as we need, and Trunk accommodates hermetically installing and running old and new linter versions.
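To make the versioned subcommands in the configuration above concrete, here is a small Python sketch of one way such a list could be resolved. It assumes that commands are checked in the order they are listed and that the first one whose constraint is satisfied wins; the post does not spell out Trunk's actual resolution rules, so treat this as an assumption. It only handles the `>=` constraints shown in the example, and `select_command` and `parse_version` are hypothetical names.

```python
def parse_version(text):
    """Turn a dotted version such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def select_command(commands, linter_version):
    """Return the first command whose '>=' constraint admits linter_version.

    Assumption: commands are evaluated in listed order, so the newest
    constraint should appear first, as in the tidy example.
    """
    target = parse_version(linter_version)
    for command in commands:
        constraint = command["version"]
        if not constraint.startswith(">="):
            raise ValueError(f"unsupported constraint: {constraint}")
        if target >= parse_version(constraint[2:].strip()):
            return command
    return None  # no command knows how to run this version


# The two tidy commands from the configuration above.
commands = [
    {"name": "lint", "version": ">=1.2.0", "run": "tidy --sarif --verify-target {target}"},
    {"name": "lint", "version": ">=1.0.0", "run": "tidy --sarif --verify {target}"},
]
```

With this ordering, tidy 1.2.0 and later resolve to the `--verify-target` form, while 1.0.0 through 1.1.1 keep using `--verify`, which is what lets old and new versions coexist in one definition.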
Keeping your linters up to date should be painless. You shouldn’t have to worry about missing downloads, compatibility issues, or sudden flakiness. Trunk will only ever upgrade you to validated linter versions that have passed snapshot tests, sparing you the burden of any bleeding-edge headaches.
Paying Dividends in the Real World
With this system in place, we have already caught several bugs before they could hit user repos.
- Semgrep released a binary that was broken on macOS. We identified this issue and helped push for a quick fix.
- Ansible-lint introduced a bug that executed user playbooks. We blocked this version from installing on user machines until a fix landed.
- Kube-linter changed their download URL schema. Within hours, we had landed a fix supporting this new URL.
- Trunk runs tools hermetically, and several tools (e.g. nixpkgs-fmt, golangci-lint, gofmt) deprecated support for older runtimes. Seeing this, we automatically upgraded the runtimes for users running these linters.
- Using older snapshots of these linters, we can block any potential regression before it happens, remaining backwards-compatible whenever possible.
How to Contribute
Trunk’s plugins repository is open source, and we always welcome contributions of new linter integrations, improvements, and actions. If you want to try out Trunk for yourself, it's just one shell command away.