Stamping Bazel builds with selective delivery
Alex 🦅 Eagle
Posted on October 22, 2021
The obvious next step after building a nice CI pipeline around Bazel is Continuous Deployment. So no surprise that one of the frequent questions on Bazel slack is
How do I release the artifacts built by Bazel?
and the answer is really not well documented anywhere. Here's what I've learned.
Stamping
Bazel is mostly unaware of version control, and that's good because coupling causes unintended feature interactions. But sometimes you want the git SHA to appear in the binary so your monitoring system can tell which version is crash-looping. This is where stamping is used. Bazel keeps two files sitting in bazel-out all the time, stable-status.txt and volatile-status.txt, which are populated from local environment info like the hostname, and can be inputs to build actions.
The files are just sitting there in the output tree after any build:
$ cat bazel-out/stable-status.txt
BUILD_EMBED_LABEL
BUILD_HOST system76-pc
BUILD_USER alexeagle
$ cat bazel-out/volatile-status.txt
BUILD_TIMESTAMP 1634865540
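If you want to pull a single value out of one of these files from a script, something like the following should work. It is only a sketch: status_value is a made-up helper name, and it assumes each line is a key, a space, and then the (possibly empty) value, as in the output above.

```shell
#!/usr/bin/env bash
# Print the value stored under a key in a Bazel status file.
# Assumes lines of the form "KEY value"; the value may be empty or contain spaces.
# status_value is a hypothetical helper name, not something Bazel provides.
status_value() {
  local key="$1" file="$2"
  # Match on the first field, strip the key and one separating space, print the rest.
  awk -v key="$key" '$1 == key { sub(/^[^ ]+ ?/, ""); print; exit }' "$file"
}

# Usage, after any build:
#   status_value BUILD_HOST bazel-out/stable-status.txt
```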
You can fill in more values in these files by adding --workspace_status_command=path/to/my_script.sh to your .bazelrc and writing a script that emits values, often by calling git. Note that adding this flag to every build can mean slow git operations slowing down developers, so you might want to include this flag only on CI.
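Here is a minimal sketch of what such a script might look like. The key name STABLE_GIT_COMMIT and the function name print_status are my own choices for illustration. The script prints one "KEY value" pair per line; Bazel puts keys that start with STABLE_ into stable-status.txt and everything else into volatile-status.txt.

```shell
#!/usr/bin/env bash
# Hypothetical workspace status script (the path/to/my_script.sh from the flag above).
# Bazel reads "KEY value" pairs from stdout. Keys starting with STABLE_ land in
# stable-status.txt; all other keys land in volatile-status.txt.

print_status() {
  local commit
  # --verify prints nothing on failure, so we fall back to "unknown"
  # outside a git checkout instead of failing the build.
  commit="$(git rev-parse --verify HEAD 2>/dev/null || echo unknown)"
  echo "STABLE_GIT_COMMIT ${commit}"
}

print_status
```

Remember to make the script executable, since Bazel runs it directly.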
As an aside, instead of just a git SHA let me recommend https://twitter.com/jakeherringbone/status/1324871225898749953
The "stable" statuses are meant to be relatively constant over rebuilds on your machine. So your username is stable. The stable status file is part of Bazel's cache key for actions. So if your value of --embed_label changes, it will be reflected in the BUILD_EMBED_LABEL line of stable-status.txt and you'll get a cache miss for every stamped action. They will be re-run to find out the new value.
The "volatile" statuses change all the time, like the timestamp. These are not part of an action key, as that would make the cache useless.
Bazel only rebuilds an artifact if the stable stamp or one of the declared inputs changes. Otherwise you can get a cache hit, with a stale value of a volatile stamp.
Due to using a volatile stamp, we had a bug when we made Angular's release process. As a workaround, to make sure all the artifacts were versioned together, we had to do a clean build when releasing. I always felt bad for whoever was doing the push on their laptop and waiting.
This was the wrong approach; the release process should have used stable stamping.
When to stamp
Bazel has a flag --stamp. Very sadly, it is not exposed to Bazel rules in any consistent way, and so many rules have a fixed boolean attribute stamp = True|False. This inconsistency is too bad, and causes a lot of friction around correct stamping.
You should not enable --stamp on your CI builds. When any stable status value changes, you'll bust the cache and re-do a lot of work. Even if you don't use stable status values, some ruleset you depend on might.
This is also a key element of how we'll find the changed artifacts later. We don't want any stamp info in them at all, so their content hash is deterministic.
Finding the releasable artifacts
Use a custom Bazel rule to describe your release artifacts. Delivery styles vary a lot, so I haven't seen one of these that works for everyone. The custom rule can produce a manifest file of whatever info your continuous delivery system needs to know.
After a green CI build and test step, your pipeline should use bazel query to find all of the release artifacts.
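As a sketch of that query step, assuming your custom rule is called release_artifact (a hypothetical name, substitute your own), the kind() query function can select every target of that rule:

```shell
#!/usr/bin/env bash
# List the labels of all targets of one rule kind.
# "release_artifact" in the usage line is a hypothetical rule name.
find_release_targets() {
  local rule_kind="$1"
  # kind() matches a regex against strings like "cc_library rule".
  bazel query "kind('^${rule_kind} rule\$', //...)" --output=label
}

# Usage:
#   find_release_targets release_artifact
```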
Selective release
We could release everything all the time, but
- we don't want to push duplicate artifacts
- stamped artifacts should always reflect the version info of the last change that affected them
- too many versions to pick from would make downstream systems confusing for users to operate
Here's the recipe:
- CI already ran without --stamp, so the release artifacts are deterministic from sources.
- Query for the release artifacts and loop over them
- for each artifact, compute the content hash (or just take the existing .digest output from something like docker that supplies it)
- run a reliable key/value store to act like a bloom filter (Redis SETNX is good for this) which quickly tells you whether the content hash is different than before
- loop over the labels of these newly-seen artifacts and run again with bazel run --stamp thing.deploy or whatever you need to do to promote them to the next stage in the CD pipeline
Since most of the actions in the dependency graph shouldn't be stamp-aware, the last step here should still be fairly incremental.
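The recipe above can be sketched roughly like this. Several things here are assumptions of mine rather than part of any real setup: the input format (one "label path" pair per line, standing in for whatever your manifest provides), the released: key prefix, the function names, and the availability of sha256sum and redis-cli. It relies on SETNX replying 1 when it sets a new key and 0 when the key already existed.

```shell
#!/usr/bin/env bash
# Sketch of the selective-release filter. Assumptions, not from a real pipeline:
#   - stdin carries one "label path" pair per line (hypothetical manifest format)
#   - sha256sum is available (GNU coreutils)
#   - REDIS_CLI names a redis-cli compatible command
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Content hash of one artifact file.
content_hash() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Succeeds only the first time a digest is seen.
# SETNX replies 1 if it set the key, 0 if the key already existed.
is_new_digest() {
  local digest="$1" label="$2" reply
  # stdin is redirected so the store command cannot swallow the manifest lines.
  reply="$("$REDIS_CLI" SETNX "released:${digest}" "$label" </dev/null)"
  [ "$reply" = "1" ]
}

# Read "label path" lines and print only the labels whose content is new.
changed_labels() {
  local label path
  while read -r label path; do
    if is_new_digest "$(content_hash "$path")" "$label"; then
      echo "$label"
    fi
  done
}

# Usage (release_manifest.txt is a hypothetical file name):
#   changed_labels < release_manifest.txt | while read -r label; do
#     bazel run --stamp "${label}.deploy"
#   done
```

One caveat with this shape: the digest is recorded before the stamped release runs, so if the release step fails the artifact will be skipped next time. You may prefer to record the digest only after a successful push.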