
Internal mechanics of Git commits - Secret to Git's speed and lightweightness

Ayush Poddar

Posted on August 17, 2023

After a series of posts on the practical usage of Git, this post returns to Git's internals. I will walk through what Git does behind the scenes when we make a commit. This knowledge will help explain, when we cover branching in a later post, why branching in Git is fast and extremely lightweight compared to other version control systems.

How does Git store the commit internally?

If you have read my post covering the internals of Git storage, you'd know that Git stores full snapshots of your files when they change and are committed. When you make a commit, Git creates a commit object that contains a pointer to the previous commit(s) (none in the case of the first commit), a pointer to the snapshot of the committed content, and metadata such as the author's name and email, the timestamp, and the commit message.

How is the snapshot created?

Assume that your repository's directory structure looks like this:

dir-1/
|-- 1-bar.txt
|-- 1-foo.txt
`-- dir-2/
    `-- 2-foo.txt

1 directory, 3 files

Assuming that all the files are modified and staged, Git first stores the content of each file as a blob and identifies each blob by a checksum (SHA1 hash) computed from that content.

file                     blob checksum
dir-1/1-bar.txt          da39a3
dir-1/1-foo.txt          1eaf2e
dir-1/dir-2/2-foo.txt    5f2a25

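You can run shasum <filename> to calculate the plain SHA1 checksum of a file. Note that Git prepends a short header (the object type and the content size) before hashing, so the ID Git assigns to a blob differs from the shasum output; git hash-object <filename> prints the ID Git would use.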
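The difference between a plain SHA1 checksum and the ID Git gives a blob can be sketched in a few lines of Python. This is a minimal illustration rather than Git's own code; the function names plain_sha1 and git_blob_id are made up for this example.

```python
import hashlib


def plain_sha1(content: bytes) -> str:
    """SHA1 of the raw bytes, which is what `shasum <filename>` prints."""
    return hashlib.sha1(content).hexdigest()


def git_blob_id(content: bytes) -> str:
    """SHA1 of a "blob <size>" header, a NUL byte, then the raw bytes."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    empty = b""  # contents of an empty file
    print(plain_sha1(empty)[:6])   # da39a3
    print(git_blob_id(empty)[:6])  # e69de2
```

For an empty file the plain checksum starts with da39a3, while the blob ID reported by git hash-object starts with e69de2. Both are SHA1 hashes; only the hashed input differs.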

Next, Git builds a tree for each directory. A tree lists every entry in the directory (its mode, its name, and the checksum of the blob or subtree it refers to), and the checksum of that listing identifies the tree.

So, the "tree" for dir-1/dir-2 contains the blob checksum of dir-1/dir-2/2-foo.txt.

The "tree" for dir-1 contains the blob checksums of 1-bar.txt and 1-foo.txt, in addition to the tree checksum of dir-2.

The tree for dir-1 is the final snapshot.

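To get a feel for how a directory's checksum depends on everything inside it, you can run

shasum <file1> <file2> .. <fileN> | shasum

where each file is a file inside the directory or its subdirectories. This only approximates the idea. Git computes the real tree checksum over the tree object's entries; in a repository with at least one commit, git rev-parse 'HEAD^{tree}' prints the root tree's checksum and git ls-tree HEAD lists its entries.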
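The way a tree checksum is built from blob checksums and nested tree checksums can be sketched as follows. This is an illustrative approximation of Git's tree object format, not Git's implementation; the helper names (object_id, tree_id, snapshot_id) and the file contents are assumptions made for the example.

```python
import hashlib
from typing import Dict, Tuple

FILE_MODE = "100644"  # regular, non-executable file
DIR_MODE = "40000"    # subdirectory, i.e. another tree


def object_id(kind: str, body: bytes) -> str:
    """SHA1 over "<kind> <size>", a NUL byte, then the object body."""
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(entries: Dict[str, Tuple[str, str]]) -> str:
    """entries maps an entry name to (mode, hex ID of its blob or subtree)."""
    def sort_key(name: str) -> bytes:
        # Git orders entries by name, comparing directories as if
        # their name ended with a slash.
        mode, _ = entries[name]
        return (name + "/" if mode == DIR_MODE else name).encode("utf-8")

    body = b""
    for name in sorted(entries, key=sort_key):
        mode, hex_id = entries[name]
        # Each entry is "<mode> <name>", a NUL byte, then the raw 20-byte ID.
        body += f"{mode} {name}".encode("utf-8") + b"\0" + bytes.fromhex(hex_id)
    return object_id("tree", body)


def snapshot_id(foo2_content: bytes) -> str:
    """Root tree ID for the example layout; only 2-foo.txt's content varies."""
    dir_2 = tree_id({"2-foo.txt": (FILE_MODE, object_id("blob", foo2_content))})
    return tree_id({
        "1-bar.txt": (FILE_MODE, object_id("blob", b"")),       # hypothetical content
        "1-foo.txt": (FILE_MODE, object_id("blob", b"foo\n")),  # hypothetical content
        "dir-2": (DIR_MODE, dir_2),
    })


if __name__ == "__main__":
    print(snapshot_id(b"first version\n"))
    print(snapshot_id(b"second version\n"))  # a different root tree ID
```

Changing a single nested file changes its blob ID, which changes the ID of dir-2's tree, which in turn changes the ID of dir-1's tree. That is why one tree checksum is enough to identify the whole snapshot.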

Storage of commit

As already mentioned above, the commit object contains a pointer to the snapshot. In this case, the tree for dir-1 is the snapshot, and the commit object stores that tree's checksum as the pointer.

If it is the first commit of the repository, there is no previous commit, so the commit object has no parent pointer. The next commit will point to the first commit, the third commit will point to the second commit and so on. In Git language, the previous commit is called a commit's parent; a merge commit has more than one parent.
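The shape of a commit object and the parent chain can be sketched like this. It is a simplified illustration, not Git's code: the helper names commit_body and commit_id are made up, the author line is hypothetical, and both commits point at the empty tree only to keep the example short.

```python
import hashlib
from typing import Sequence

# ID of the empty tree, used here as a stand-in snapshot.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity: name, email, Unix timestamp, timezone offset.
WHO = "Jane Doe <jane@example.com> 1692230400 +0000"


def commit_body(tree: str, parents: Sequence[str], message: str,
                who: str = WHO) -> bytes:
    """Text of a commit object: tree pointer, parent pointers, metadata, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # no parent line for a first commit
    lines += [f"author {who}", f"committer {who}", "", message, ""]
    return "\n".join(lines).encode("utf-8")


def commit_id(body: bytes) -> str:
    """SHA1 over "commit <size>", a NUL byte, then the commit text."""
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


first_body = commit_body(EMPTY_TREE, [], "first commit")
first_id = commit_id(first_body)

second_body = commit_body(EMPTY_TREE, [first_id], "second commit")
second_id = commit_id(second_body)

if __name__ == "__main__":
    print(first_body.decode("utf-8"))
    print(second_body.decode("utf-8"))
```

The first commit's text has no parent line, while the second one carries the first commit's ID as its parent. In a real repository, git cat-file -p HEAD prints the same kind of text for the current commit.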

Conclusion

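In this post, you got a glimpse of how Git commits work internally: files are stored as blobs, directories as trees, and a commit is a small object that points to a root tree and to its parent commit(s). That should give you an inkling of the reason behind Git's lightweightness and speed.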
