Greg Foster
Posted on August 26, 2021
The standard workflow in git is to create feature branches off of a trunk branch (usually called main
), submit a pull request, and merge approved changes back into the trunk. This is fairly straightforward and follows the patterns of most pre-git version control systems. However, as any experienced developer can tell you, this workflow is not without issues. One of the biggest challenges of the feature branch workflow is that it often induces developers to create massive pull requests once they've finished building. If you're working on a large project over many weeks or months, your feature branch can easily contain 100s of commits, which correspond to 1000s of lines of code changes. Software engineering wisdom suggests that "any pull request longer than 400 lines is unmanageable" - so how can we improve on this widely-used workflow?
To reduce the pain of large pull requests, engineers at many top companies (i.e. Facebook, Uber, Dropbox, Google) have moved towards "stacking" their git branches. Stacking is the practice of creating new branches off of existing feature branches instead of the trunk branch. This allows developers to break down large features into smaller, more manageable changesets, each of which can be reviewed, tested, and landed incrementally. While stacking initially gained popularity at larger companies as a way to help engineers stay unblocked while they waited for code reviews, engineers at companies of all sizes have found value and velocity from writing and reviewing stacked changes (shhh... we are some of the converts). For more on the fundamentals of stacking git branches, you can read more! here, here, here, and here.
Although stacked changes have benefits for both authors and reviewers, they sadly lack first-class support from git and many of the popular git GUIs. Perhaps the biggest conceptual change is that the state in git can no longer be represented as a list of commits on a feature branch + a list of feature branches - instead we need to think of our git state as a DAG (Directed Acyclic Graph) of branches. Interestingly, while many major dev tools (i.e. ETL platforms, deployment pipelines, CI runners) support visualizing DAGs, this is almost entirely missing from git-related tooling.
While developing the CLI for our code review platform Graphite, we wanted to provide an intuitive, DAG-based visualization of stacked git branches.
Let's take a look at the problem more closely - as mentioned, visualizing a list of branches is easy:
git branch
However, git has no native command to visualize a DAG of branches.
Our Solution
After playing around a lot with git internals, we developed a good solution. Let’s start small and consider commits. Each commit references its parents - you can print them by running:
git rev-parse ${SHA}^
PS: The --help
description of git rev-parse
is comically cryptic. It seems like we're not alone in thinking this is weird.
We can also associate branches to revision SHAs by running:
git show-ref --heads
Now we can run a breadth-first search from a branch and stop when we see a revision matching a known branch. Unfortunately, walking through many commits with a series of shell executions isn’t very fast, and doing so doesn't help us discover branch children.
A more efficient approach here would then be to load the commit-graph into memory before walking it. Fortunately, git has a helpful command (with a much more human-readable description) for listing all commits and their parent or child SHAs:
git rev-list --parents --all
This provides all the nodes and edges needed to construct an in-memory graph of all commits, from which we can also construct the graph of all branches. Getting the rev-list for child commits is as easy as changing “--parents” to “--children”.
Unfortunately, loading all commit relationships into memory is often a time-intensive operation. This would be prohibitively slow in a monorepo with millions of commits. If we’re looking to build a performant CLI, we can do better by loading only a subset of commits.
A key simplifying assumption we can make is that all our branches were created off of the trunk branch. In this case, the minimum subset of commit relations needed for computing a branch parent is:
git rev-list --parents child_branch ^$(git merge-base child_branch main)~1
This command roughly translates to “List commits reachable from “child_branch,” but not reachable by the trunk. This lists enough commits to calculate a branch’s parent and nothing more - but if we want the minimum set needed to calculate the graph of all branches, we need to tune it further:
git rev-list --parent ^$(git merge-base --octopus branch_1 branch_2 branch_3...)~1 branch_1 branch_2 branch_3...
Which roughly translates to "all commits unreachable by the oldest shared commit of all branches." This final command is how our open-source CLI tool Graphite currently computes the relationships between stacks of branches. Once we have the full graph of all branches, we can finally visualize complex stacks of branches as a graph:
gt log
You can see the full implementation of our branch graphing command gt ls
on our Github repo.
If you want to see our visualizations in action, simply run:
brew install screenplaydev/tap/graphite && cd <some_repo> && gt ls
Posted on August 26, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.