Stamping Bazel builds with selective delivery

jakeherringbone

Alex 🦅 Eagle

Posted on October 22, 2021

The obvious next step after building a nice CI pipeline around Bazel is Continuous Deployment. So it's no surprise that one of the most frequent questions on the Bazel Slack is:

How do I release the artifacts built by Bazel?

and the answer is really not well documented anywhere. Here's what I've learned.

Stamping

Bazel is mostly unaware of version control, and that's good because coupling causes unintended feature interactions. But sometimes you want the git SHA to appear in the binary so your monitoring system can tell which version is crash-looping. This is where stamping comes in. Bazel keeps two files in bazel-out at all times, stable-status.txt and volatile-status.txt, which are populated from local environment info like the hostname and can be inputs to build actions.

The files are just sitting there in the output tree after any build:

$ cat bazel-out/stable-status.txt 
BUILD_EMBED_LABEL 
BUILD_HOST system76-pc
BUILD_USER alexeagle
$ cat bazel-out/volatile-status.txt 
BUILD_TIMESTAMP 1634865540
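
Both files use a simple line format: a key, a space, then the value (which may be empty, as with BUILD_EMBED_LABEL above). As a rough sketch of how a stamp-aware script could consume them, the following Python reads one into a dictionary. The names parse_status and read_status are made up for this illustration; they are not part of Bazel.

```python
from pathlib import Path


def parse_status(text):
    """Turn "KEY value" lines into a dict; a key with no value maps to ""."""
    values = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        values[key] = value.strip()
    return values


def read_status(path="bazel-out/stable-status.txt"):
    """Read a status file from the output tree (path relative to the workspace)."""
    return parse_status(Path(path).read_text())
```

Calling read_status() from the workspace root after a build should give back the same keys that cat printed above.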

You can add more values to these files by adding --workspace_status_command=path/to/my_script.sh to your .bazelrc and writing a script that emits values, often by calling git. Note that passing this flag on every build can mean slow git operations holding up developers, so you might want to set it only on CI.
As an aside, instead of just a git SHA let me recommend https://twitter.com/jakeherringbone/status/1324871225898749953
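
Here is a minimal sketch of what such a status script could look like, written in Python rather than shell. It prints one "KEY value" pair per line on stdout. Bazel routes keys that start with STABLE_ into stable-status.txt and everything else into volatile-status.txt. The key names STABLE_GIT_COMMIT and GIT_BRANCH and the helper functions are my own choices for illustration, not anything Bazel defines.

```python
#!/usr/bin/env python3
"""Hypothetical workspace status script; key names here are illustrative."""
import subprocess


def git(*args):
    """Run a git command and return its trimmed stdout, or "" if it fails."""
    try:
        result = subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()


def collect_status():
    """Gather the values to report, falling back to "unknown" outside a repo."""
    return {
        # STABLE_ prefix: lands in stable-status.txt and affects stamped actions.
        "STABLE_GIT_COMMIT": git("rev-parse", "HEAD") or "unknown",
        # No prefix: lands in volatile-status.txt.
        "GIT_BRANCH": git("rev-parse", "--abbrev-ref", "HEAD") or "unknown",
    }


def format_status(values):
    """Render a dict as the "KEY value" lines Bazel reads from stdout."""
    return "\n".join(f"{key} {value}" for key, value in values.items())


if __name__ == "__main__":
    print(format_status(collect_status()))
```

Bazel runs whatever executable the flag points at, so the script needs its executable bit set.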

The "stable" statuses are meant to be relatively constant over rebuilds on your machine, so your username is stable. The stable status file is part of Bazel's cache key for stamped actions. So if your value of --embed_label changes, it will be reflected in the BUILD_EMBED_LABEL line of stable-status.txt and you'll get a cache miss for every stamped action. They will be re-run to pick up the new value.

The "volatile" statuses change all the time, like the timestamp. These are not part of an action key, as that would make the cache useless.

Bazel only rebuilds an artifact if the stable stamp or one of the declared inputs changes. Otherwise you can get a cache hit, with a stale value of a volatile stamp.
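
To make the stale-value problem concrete, here is a toy model of the rule just described. This is emphatically not how Bazel computes its action keys internally; it only mimics the observable behavior: declared inputs and stable status feed the key, volatile status does not. All names are invented for the sketch.

```python
import hashlib


def action_key(declared_inputs, stable_status, volatile_status):
    """Toy cache key for a stamped action.

    declared_inputs maps file name -> content digest. volatile_status is
    accepted but deliberately ignored, mirroring the behavior described above.
    """
    h = hashlib.sha256()
    for name in sorted(declared_inputs):
        h.update(f"input:{name}={declared_inputs[name]}\n".encode())
    for name in sorted(stable_status):
        h.update(f"stable:{name}={stable_status[name]}\n".encode())
    return h.hexdigest()


inputs = {"main.ts": "digest-1"}
first = action_key(inputs, {"BUILD_USER": "alexeagle"}, {"BUILD_TIMESTAMP": "1634865540"})
later = action_key(inputs, {"BUILD_USER": "alexeagle"}, {"BUILD_TIMESTAMP": "1634869999"})
other_user = action_key(inputs, {"BUILD_USER": "someone"}, {"BUILD_TIMESTAMP": "1634869999"})
```

In this model first and later are the same key, so the second build is a cache hit and the artifact keeps the old timestamp, while other_user is a different key and forces a rebuild.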

We hit a bug caused by a volatile stamp when we built Angular's release process. As a workaround, to make sure all the artifacts were versioned together, we had to do a clean build when releasing. I always felt bad for whoever was doing the push on their laptop and waiting.
This was the wrong approach; it should have used stable stamping.

When to stamp

Bazel has a flag --stamp. Very sadly, it is not exposed to Bazel rules in any consistent way, and so many rules have a fixed boolean attribute stamp = True|False. This inconsistency is too bad, and causes a lot of friction around correct stamping.

You should not enable --stamp on your CI builds. When any stable status value changes, you'll bust the cache and re-do a lot of work. Even if you don't use stable status values, some ruleset you depend on might.

This is also a key element of how we'll find the changed artifacts later. We don't want any stamp info in them at all, so their content hash is deterministic.

Finding the releasable artifacts

Use a custom Bazel rule to describe your release artifacts. Delivery styles vary a lot, so I haven't seen one of these that works for everyone. The custom rule can produce a manifest file of whatever info your continuous delivery system needs to know.

After a green CI build and test step, your pipeline should use bazel query to find all of the release artifacts.
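
As a sketch of that query step, assuming your custom rule is called release_artifact (a made-up name, substitute your own), the pipeline could shell out to bazel query with the kind() function and collect the labels:

```python
import subprocess

# Hypothetical rule name; replace with whatever your release rule is called.
RELEASE_RULE_KIND = "release_artifact"


def release_query(rule_kind=RELEASE_RULE_KIND, scope="//..."):
    """Build a query expression matching every target of the release rule."""
    return f'kind("{rule_kind} rule", {scope})'


def parse_labels(query_output):
    """Split bazel query output into a list of labels, dropping blank lines."""
    return [line.strip() for line in query_output.splitlines() if line.strip()]


def find_release_targets():
    """Run bazel query from the workspace root and return the matching labels."""
    result = subprocess.run(
        ["bazel", "query", release_query(), "--output=label"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_labels(result.stdout)
```

Only find_release_targets() needs a Bazel workspace; the other two helpers are plain string handling.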

Selective release

We could release everything all the time, but

  • we don't want to push duplicate artifacts
  • stamped artifacts should always reflect the version info of the last change that affected them
  • downstream systems become confusing to operate when there are too many versions to pick from

Here's the recipe:

  1. CI already ran without --stamp, so the release artifacts are deterministic from sources.
  2. Query for the release artifacts and loop over them.
  3. For each artifact, compute the content hash (or just take the existing .digest output from something like Docker that supplies it).
  4. Run a reliable key/value store to act like a bloom filter (Redis SETNX is good for this), which quickly tells you whether the content hash is different than before.
  5. Loop over the labels of these newly seen artifacts and run again with bazel run --stamp thing.deploy, or whatever you need to do to promote them to the next stage in the CD pipeline.

Since most of the actions in the dependency graph shouldn't be stamp-aware, the last step here should still be fairly incremental.
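
Steps 3 to 5 of the recipe can be sketched roughly like this. To keep the example self-contained, an in-memory class stands in for the shared key/value store; in a real pipeline its set_if_absent method would be backed by something like Redis SETNX. Every name in the sketch is hypothetical, and the .deploy suffix is just the placeholder from the recipe.

```python
import hashlib


def content_hash(path):
    """SHA-256 of a file's bytes, read in chunks so large artifacts are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


class InMemorySeenStore:
    """Stand-in for a shared key/value store with set-if-absent semantics."""

    def __init__(self):
        self._seen = {}

    def set_if_absent(self, key, value):
        """Store value and return True only if key was not present before."""
        if key in self._seen:
            return False
        self._seen[key] = value
        return True


def newly_seen(artifacts, store):
    """artifacts maps label -> built file path; returns labels with new content."""
    fresh = []
    for label, path in artifacts.items():
        if store.set_if_absent(content_hash(path), label):
            fresh.append(label)
    return fresh


def deploy_command(label):
    """Command line for the stamped re-run of one changed artifact."""
    return ["bazel", "run", "--stamp", label + ".deploy"]
```

One thing to watch with this shape: the hash is recorded before the deploy runs, so if the deploy fails you would need to clear that key (or record it only after success) to get a retry.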
