Do you really know Git? (2)
Terrence Jung
Posted on September 27, 2024
Examples in this blog come from "Pro Git" by Scott Chacon and Ben Straub.
Let's get into some more interesting aspects of Git that can help you understand it on a deeper level.
1 - What happens behind the scenes when we stage and commit?
Suppose we have a directory containing 3 files and we stage them all and commit.
What happens during staging?
- Git computes a checksum for each file using SHA-1 hashing.
- Git stores the current version of each file in the Git repository as a blob (a snapshot of that file's contents).
- Git records the checksum of each file in the staging area.
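The staging steps above can be sketched in a few lines of Python. This is a toy model that assumes a hypothetical three-file working directory. It reproduces how Git names a blob (the SHA-1 of a `blob <size>\0` header followed by the file's bytes). The `object_store` and `index` dictionaries are simplified stand-ins for `.git/objects` and the real index file, and the sketch skips the zlib compression Git applies on disk.

```python
import hashlib

def blob_id(content: bytes) -> str:
    """Return the SHA-1 ID Git assigns to a blob with these contents.

    Git hashes a header ("blob <size>\\0") followed by the raw bytes,
    not the bare file contents.
    """
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical working directory with three files (path -> contents).
working_dir = {
    "README": b"hello\n",
    "LICENSE": b"MIT\n",
    "main.py": b"print('hi')\n",
}

object_store = {}  # toy stand-in for .git/objects: blob ID -> contents
index = {}         # toy staging area: path -> blob ID

def stage(path: str) -> None:
    """Model `git add`: store the file as a blob and record its ID in the index."""
    content = working_dir[path]
    oid = blob_id(content)
    object_store[oid] = content  # identical contents map to the same blob
    index[path] = oid

for path in working_dir:
    stage(path)

for path, oid in sorted(index.items()):
    print(path, oid)
```

As a sanity check, `blob_id(b"")` returns `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known ID of Git's empty blob, which you can compare against running `git hash-object` on an empty file.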
What happens during a commit?
- Git checksums each subdirectory (here, just the root project directory) and stores each one as a tree object in the Git repository. A tree object records the structure of the project at that moment: the file names and the blobs holding their contents.
- Git creates a commit object that contains metadata (author, committer, and commit message) plus a pointer to the root tree object. This allows Git to re-create the entire project snapshot when needed.
At this point, your Git repository contains 5 objects:
- 3 blobs
- 1 tree object (lists directory contents and maps file names to blobs)
- 1 commit object that points to the tree object and contains metadata
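Here is a sketch of the commit step that ends with exactly those five objects. The byte layouts are modeled on Git's object format: tree entries look like `<mode> <name>\0<raw 20-byte SHA-1>` and are sorted by name, and a commit body starts with a `tree` line. The file contents, author, and timestamp are invented, though, so the resulting IDs are illustrative and not meant to be compared against a real repository.

```python
import hashlib

def hash_object(obj_type: str, body: bytes) -> str:
    """SHA-1 of "<type> <size>\\0<body>", the way Git names its objects."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()

objects = {}  # toy object database: object ID -> (type, body)

def write(obj_type: str, body: bytes) -> str:
    oid = hash_object(obj_type, body)
    objects[oid] = (obj_type, body)
    return oid

# 1) Three blobs, one per staged file (hypothetical contents).
files = {"README": b"hello\n", "LICENSE": b"MIT\n", "main.py": b"print('hi')\n"}
blobs = {name: write("blob", data) for name, data in files.items()}

# 2) One tree for the root directory: "<mode> <name>\0<raw SHA-1>" per entry,
#    entries sorted by name. 100644 is the mode of a regular file.
tree_body = b"".join(
    f"100644 {name}\0".encode() + bytes.fromhex(blobs[name])
    for name in sorted(blobs)
)
tree_id = write("tree", tree_body)

# 3) One commit: a pointer to the root tree plus metadata.
#    The identity and timestamp below are made up.
commit_body = (
    f"tree {tree_id}\n"
    "author Jane Doe <jane@example.com> 1700000000 +0000\n"
    "committer Jane Doe <jane@example.com> 1700000000 +0000\n"
    "\n"
    "Initial commit\n"
).encode()
commit_id = write("commit", commit_body)

counts = {}
for obj_type, _ in objects.values():
    counts[obj_type] = counts.get(obj_type, 0) + 1
print(counts)
```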
If we make more changes and commit again, the new commit stores a pointer to the commit that came directly before it (its parent). The commit history would look like this:
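Each new commit stores a pointer to the commit that came before it (its parent), so walking those pointers backward from the newest commit recovers the whole history. A minimal sketch, assuming a simple linear history: the commit IDs are shortened, made-up hashes, and `log` is a hypothetical helper, not Git's implementation of `git log`.

```python
from typing import Dict, List, Optional

# Toy commit graph: commit ID -> parent ID (None for the root commit).
# The IDs are made-up, shortened hashes for illustration only.
parents: Dict[str, Optional[str]] = {
    "98ca9": None,     # first commit
    "34ac2": "98ca9",  # second commit points back to the first
    "f30ab": "34ac2",  # third commit points back to the second
}

def log(start: str) -> List[str]:
    """Follow parent pointers from `start` back to the root commit."""
    history: List[str] = []
    current: Optional[str] = start
    while current is not None:
        history.append(current)
        current = parents[current]
    return history

print(" <- ".join(log("f30ab")))
```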
2 - Using git switch instead of git checkout
You may already be familiar with using git checkout to switch between branches, and that works totally fine.
However, if you want a more intuitive way to create and switch between branches, use git switch (added in Git 2.23)!
- git switch branch-name - switch to an existing branch
- git switch -c new-branch - create a new branch and switch to it
- git switch - (a single dash) - return to the previously checked-out branch
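Under the hood, a branch is just a movable name pointing at a commit, and switching branches means changing which branch HEAD refers to. The `Repo` class below is a hypothetical toy model of that bookkeeping for the three commands above. It ignores the working directory and index, which the real `git switch` also updates.

```python
from typing import Dict, Optional

class Repo:
    """Toy model of branches and HEAD (not Git's real implementation).

    Branches are names pointing at commit IDs; `head` records which
    branch is checked out.
    """

    def __init__(self, initial_commit: str) -> None:
        self.branches: Dict[str, str] = {"master": initial_commit}
        self.head = "master"
        self.previous: Optional[str] = None  # where `git switch -` goes back to

    def switch(self, name: str) -> None:
        """Like `git switch <name>` (or `git switch -` when name is "-")."""
        if name == "-":
            if self.previous is None:
                raise ValueError("no previous branch to return to")
            name = self.previous
        if name not in self.branches:
            raise ValueError(f"no such branch: {name}")
        self.previous, self.head = self.head, name

    def switch_create(self, name: str) -> None:
        """Like `git switch -c <name>`: branch off the current commit, then switch."""
        if name in self.branches:
            raise ValueError(f"branch already exists: {name}")
        self.branches[name] = self.branches[self.head]
        self.switch(name)

repo = Repo("98ca9")
repo.switch_create("testing")  # HEAD -> testing
repo.switch("master")          # HEAD -> master
repo.switch("-")               # back to testing
print(repo.head)
```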
3 - What happens behind the scenes during a merge?
When we merge two branches, we can intuitively say that it just takes two histories and combines their work. But what does that really mean? Let's look at this example:
Let's say we're done with the iss10 branch and want to merge it into master.
git switch master
git merge iss10
Since the commit master points to isn't a direct ancestor of iss10 (the two branches have diverged), Git can't simply move master forward, so it has some work to do. The default merge performs a three-way merge using the two snapshots pointed to by master and iss10 and their common ancestor.
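To choose between a fast-forward and a three-way merge, Git needs the common ancestor (the merge base) of the two branch tips. Here is a minimal sketch on a hypothetical graph in which master (C4) and iss10 (C5) diverged after C2. The helper names are made up, and the real `git merge-base` is far more optimized.

```python
from typing import Dict, List, Set

# Hypothetical history: commit -> list of parents.
parents: Dict[str, List[str]] = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"], "C5": ["C3"],  # iss10
    "C4": ["C2"],                # master
}

def ancestors(commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def merge_bases(a: str, b: str) -> Set[str]:
    """Best common ancestors: common ancestors not reachable from another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != other and c in ancestors(other) for other in common)}

def can_fast_forward(current: str, other: str) -> bool:
    """A fast-forward works only if the current tip is an ancestor of the other tip."""
    return current in ancestors(other)

master, iss10 = "C4", "C5"
print(merge_bases(master, iss10))       # {'C2'}
print(can_fast_forward(master, iss10))  # False, so a three-way merge is needed
```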
Git creates a new snapshot from the result of this three-way merge and automatically creates a new commit that points to it. This is called a merge commit, and it's special because it has more than one parent.
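At the level of whole files, the three-way merge follows a simple rule: if only one side changed a file relative to the common ancestor, take that side's version. This is a simplified sketch that assumes snapshots are plain `path -> contents` dictionaries with made-up contents. Real Git also merges line by line inside files that both sides changed; this sketch just reports those files as conflicts.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # path -> file contents (toy stand-in for a tree of blobs)

def three_way_merge(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Tuple[Snapshot, List[str]]:
    """File-level three-way merge: keep whichever side changed each file."""
    merged: Snapshot = {}
    conflicts: List[str] = []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither changed it)
            result = o
        elif o == b:      # only theirs changed it
            result = t
        elif t == b:      # only ours changed it
            result = o
        else:             # both changed it differently: report a conflict
            conflicts.append(path)
            result = o    # placeholder; real Git would write conflict markers
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts

base = {"index.html": "v1", "style.css": "v1"}    # common ancestor (C2)
ours = {"index.html": "v1", "style.css": "v2"}    # master (C4) changed style.css
theirs = {"index.html": "v2", "style.css": "v1"}  # iss10 (C5) changed index.html

snapshot, conflicts = three_way_merge(base, ours, theirs)
merge_commit = {"tree": snapshot, "parents": ["C4", "C5"]}  # two parents
print(snapshot, conflicts)
```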
At this point, everything is merged and you can delete the iss10 branch (git branch -d iss10).
4 - Two Common Types of Branching Workflows
Long-Running Branches
In this workflow, you keep several branches open at all times, each representing a different stage of your development cycle.
- master/main - code that is entirely stable
- develop/next - code that isn't necessarily stable yet; it's where work is tested for stability before it goes into master
- topic branches - short-lived branches for the feature you're currently working on
You can also view this workflow as work silos, where sets of commits graduate to a more stable silo when they’re fully tested.
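In terms of Git's data model, these silos are just branch pointers at different points along the same line of commits: the more stable the branch, the further back it points. Here is a toy sketch with a hypothetical linear history. "Graduating" work means fast-forwarding the more stable branch to the commit the less stable one points at. The `graduate` helper is made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical linear history: commit -> list of parents.
parents: Dict[str, List[str]] = {
    "C1": [], "C2": ["C1"], "C3": ["C2"], "C4": ["C3"], "C5": ["C4"],
}

# Long-running branches are pointers into the same line of commits.
branches: Dict[str, str] = {"master": "C2", "develop": "C4", "topic": "C5"}

def ancestors(commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def graduate(stable: str, source: str) -> None:
    """Fast-forward branch `stable` to the commit that `source` points at."""
    if branches[stable] not in ancestors(branches[source]):
        raise ValueError(f"{stable} cannot be fast-forwarded to {source}")
    branches[stable] = branches[source]

graduate("master", "develop")  # tested work on develop becomes stable
graduate("develop", "topic")   # the finished topic moves into testing
print(branches)
```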
Topic Branches
This workflow is useful for projects of any size. A topic branch is a short-lived branch that you create and use for a single feature. This technique allows you to context switch quickly and completely, since your work is separated into silos where all changes in that branch have to do with that topic.
Let's follow this workflow:
- Do some work on master
- Branch off to iss91 and do some work
- Branch off iss91 to another branch, iss91v2, and do some work
- Go back to master and do some work
- Branch off to dumbidea to test an idea
Here's what the commit history would look like:
Let’s say you like the work on iss91v2 and dumbidea turned out to be a genius idea. Let’s throw away iss91 (losing commits C5 and C6) and merge in the other two branches.
5 - What is Rebasing?
In Git, there are two ways to integrate changes from one branch into another: merge and rebase. We've already gone over merging, so let's go over a basic rebase.
Let's look at this example of a diverged history:
A merge would perform a three-way merge and create a merge commit. A rebase gets the same final snapshot in a different way: we take the changes introduced in C4 and reapply them on top of C3. In this case, you would switch to experiment (git switch experiment) and rebase it onto master (git rebase master).
What is actually happening here?
- Git finds the common ancestor of the two branches.
- It gets the diff introduced by each commit of the branch you're on (experiment) and saves those diffs to temporary files.
- It resets the current branch (experiment) to the same commit as the branch you're rebasing onto (master).
- Finally, it applies each saved change in turn.
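The four steps above can be sketched in Python. This is a toy model under simplifying assumptions: each commit stores the change it introduced instead of a full snapshot, histories are linear, and there are no conflicts. The C0 to C4 layout mirrors the diverged-history example, and all helper names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List, Optional

# Toy commit store: ID -> {"parent": ..., "changes": {path: new contents}}.
# Real Git stores full snapshots and computes diffs on demand.
commits: Dict[str, dict] = {}

def commit(parent: Optional[str], changes: Dict[str, str]) -> str:
    """Create a commit whose ID depends on its parent and its change."""
    payload = json.dumps({"parent": parent, "changes": changes}, sort_keys=True)
    cid = hashlib.sha1(payload.encode()).hexdigest()[:7]
    commits[cid] = {"parent": parent, "changes": changes}
    return cid

def history(tip: Optional[str]) -> List[str]:
    """Commits from the root up to `tip`, oldest first (single-parent chains only)."""
    chain: List[str] = []
    while tip is not None:
        chain.append(tip)
        tip = commits[tip]["parent"]
    return chain[::-1]

def snapshot(tip: str) -> Dict[str, str]:
    """Rebuild the files at `tip` by applying every change from the root onward."""
    files: Dict[str, str] = {}
    for cid in history(tip):
        files.update(commits[cid]["changes"])
    return files

def rebase(branch_tip: str, onto: str) -> str:
    """Replay the commits unique to `branch_tip` on top of `onto`; return the new tip."""
    branch_history = history(branch_tip)
    onto_history = set(history(onto))
    # 1. Find the common ancestor: the newest branch commit also in `onto`'s history.
    base = next(c for c in reversed(branch_history) if c in onto_history)
    # 2. Save the change introduced by each branch commit after the base.
    saved = [commits[c]["changes"] for c in branch_history[branch_history.index(base) + 1:]]
    # 3. Start from the commit we are rebasing onto ...
    new_tip = onto
    # 4. ... and reapply each saved change in order, creating brand-new commits.
    for change in saved:
        new_tip = commit(new_tip, change)
    return new_tip

c0 = commit(None, {"app.py": "v0"})
c1 = commit(c0, {"README": "v0"})
c2 = commit(c1, {"lib.py": "v0"})                 # common ancestor
c3 = commit(c2, {"app.py": "master change"})      # master
c4 = commit(c2, {"lib.py": "experiment change"})  # experiment

experiment = rebase(c4, onto=c3)
print(history(experiment))
print(snapshot(experiment))
```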
At this point, you can switch back to master and do a fast-forward merge (git merge experiment).
Here, the latest snapshot is exactly the same as the one a default merge would produce. The difference is that rebasing produces a cleaner, linear history.
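A quick way to check that claim is to compare the two outcomes directly. In this toy sketch, file contents are hypothetical and deletions and conflicts are ignored. The merge commit and the rebased commit (C4') end up with the same tree. The merge commit has two parents, while C4' has only one, which is what keeps the rebased history linear.

```python
from typing import Dict

Snapshot = Dict[str, str]  # path -> contents (toy stand-in for a tree)

def changes(old: Snapshot, new: Snapshot) -> Snapshot:
    """Files added or modified between two snapshots (deletions ignored)."""
    return {path: data for path, data in new.items() if old.get(path) != data}

base: Snapshot = {"app.py": "v0", "lib.py": "v0"}           # C2
master: Snapshot = {**base, "app.py": "master change"}      # C3
experiment: Snapshot = {**base, "lib.py": "experiment change"}  # C4

# Merge: one new commit combining both sides' changes, with two parents.
merge_tree = {**base, **changes(base, master), **changes(base, experiment)}
merge_commit = {"parents": ["C3", "C4"], "tree": merge_tree}

# Rebase: replay C4's change on top of C3 as a new commit C4' with one parent.
rebased_tree = {**master, **changes(base, experiment)}
rebased_commit = {"parents": ["C3"], "tree": rebased_tree}

print(merge_tree == rebased_tree)  # True
print(len(merge_commit["parents"]), len(rebased_commit["parents"]))  # 2 1
```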
One more thing. Rebasing sounds too good to be true, so let's discuss its downside. It can be summed up in one line: do not rebase commits that exist outside your repository and that people may have based work on.
Why? Well, think about this workflow:
- You push commits somewhere, and others pull them down and base work on them.
- You rewrite these commits with git rebase and push them up again (which requires a force push).
At this point, your collaborators would have to re-merge their work, and things get very messy when you try to pull their work back into yours.
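The root of the problem is that a commit's ID depends on its parent as well as on its tree, author, committer, and message. Replaying a commit onto a different base therefore always produces a new commit with a new ID. Here is a toy sketch with a made-up `commit_id` function that hashes only the parent and the change. The collaborator's commit still points at the old ID, which is no longer part of the history you published.

```python
import hashlib

def commit_id(parent: str, change: str) -> str:
    """Toy commit ID: like Git's, it changes whenever the parent changes."""
    return hashlib.sha1(f"{parent}:{change}".encode()).hexdigest()[:7]

# You push C4 (built on C2); a collaborator pulls it and builds on top of it.
c4 = commit_id("C2", "experiment change")
collaborator_commit = {"id": commit_id(c4, "their work"), "parent": c4}

# Later you rebase: the same change is replayed onto C3, giving a new commit.
c4_rebased = commit_id("C3", "experiment change")
published_history = ["C2", "C3", c4_rebased]

print(c4 == c4_rebased)                                   # False
print(collaborator_commit["parent"] in published_history)  # False
```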
6 - Which is better? Merge or Rebase?
Now for the age-old debate that has divided programmers: is merge or rebase better? In reality, there's no single answer. It depends on the use case and on the team and project.
Personally, I think there's a misconception that the two are interchangeable and that choosing between them is just personal preference. While they often produce the same final snapshot, it's very important to know how each one works and what situations each can create.
"Pro Git" mentions a good point about comparing the two. When we talk about a project's history, we should ask ourselves how it should be perceived.
One point of view states that your repository's commit history is a record of what actually happened. It's historical and shouldn't be tampered with.
The opposing point of view is that the commit history is the story of how your project was made. There will be drafts before the final, clean version.
You can probably tell that the first point of view favors merging while the second favors rebasing. Again, neither is better than the other; it depends on the project and team.
If you can't decide which to use, you could even get the best of both worlds:
- Rebase local changes before pushing to clean up your work
- Never rebase anything you’ve pushed somewhere
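To follow the second rule, you need to know which commits haven't been pushed yet. In real Git, `git log origin/master..master` lists commits reachable from master but not from the remote-tracking branch origin/master. The sketch below computes the same set difference on a hypothetical graph. It assumes origin/master reflects what you last pushed.

```python
from typing import Dict, List, Set

# Hypothetical history: C1-C3 were pushed earlier, C4-C5 exist only locally.
parents: Dict[str, List[str]] = {
    "C1": [], "C2": ["C1"], "C3": ["C2"], "C4": ["C3"], "C5": ["C4"],
}
refs: Dict[str, str] = {"master": "C5", "origin/master": "C3"}

def reachable(tip: str) -> Set[str]:
    """Every commit reachable from `tip`, including itself."""
    seen: Set[str] = set()
    stack = [tip]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def unpushed(local: str, remote: str) -> Set[str]:
    """Commits reachable from `local` but not from `remote` (like `git log remote..local`)."""
    return reachable(refs[local]) - reachable(refs[remote])

safe_to_rebase = sorted(unpushed("master", "origin/master"))
print(safe_to_rebase)  # ['C4', 'C5']
```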
That's it! Hope you enjoyed this blog and learned more about Git and how you can use it in your own work!