Git Objects

shiva

root

Posted on October 3, 2020

I have always loved Git. In this series of blog posts I am documenting the Git internals that I have learned along the way.

This post is about Git objects. Simply put, an object is the storage unit of Git: all the files, commits, trees and tags in a Git repository are stored as objects. If you are new to Git this may seem a bit cloudy. Don't worry, the weather will clear up as you go along with the post.

Let's start off by creating a new Git repo to experiment with.

$ mkdir gitinternals && cd gitinternals # create and cd into a dir
$ git init # initialize git
Initialized empty Git repository in /Users/home/gitinternals/.git/

With the above commands you will have initialized a Git repository locally. This creates a .git folder inside your repository, which is where Git stores all its internal data related to this repository.
A quick tree -a on the current folder displays the files and folders created by Git.

$ tree -a
.
└── .git
    ├── HEAD
    ├── config
    ├── description
    ├── hooks
    │   ├── applypatch-msg.sample
    │   ├── commit-msg.sample
    │   ├── fsmonitor-watchman.sample
    │   ├── post-update.sample
    │   ├── pre-applypatch.sample
    │   ├── pre-commit.sample
    │   ├── pre-merge-commit.sample
    │   ├── pre-push.sample
    │   ├── pre-rebase.sample
    │   ├── pre-receive.sample
    │   ├── prepare-commit-msg.sample
    │   └── update.sample
    ├── info
    │   └── exclude
    ├── objects
    │   ├── info
    │   └── pack
    └── refs
        ├── heads
        └── tags

We are interested in the .git/objects directory, which is also referred to as the object database of Git. Initially it just has the info and pack folders; there are no objects yet.

Blob objects

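Git treats the contents of every file that you work with in the repo, be it text, images, audio or video, as a blob (Binary Large OBject). A blob holds only the file's contents; the file name is stored elsewhere, as we will see with tree objects.
Let's create a new file in the folder and check how Git reacts to it.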

$ echo "# Hello world" > README.md
$ tree -a .git/objects/
.git/objects/
├── info
└── pack

Running tree on the objects directory reveals that there are no changes in the object storage. This is because Git has not started tracking the file yet. You can confirm this with git status.

$ git status
On branch master

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
        README.md

nothing added to commit but untracked files present (use "git add" to track)

The README.md file is listed under untracked files, which means Git is not storing anything for it yet. The git status output also tells us how to make Git track the file.

$ git add README.md
$ tree -a .git/objects/
.git/objects/
├── 71
│   └── 6ed1421c738a75abe6e0c4812ad4aacee0e11a
├── info
└── pack

At last we see some activity in the objects folder.
The string 716ed1421c738a75abe6e0c4812ad4aacee0e11a is the SHA-1 hash of the README.md contents, prefixed with a short header that Git adds (the object type and the size in bytes).
The first 2 characters of the hash are used as the folder name and the remaining 38 characters as the file name.
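
To make the folder/file split concrete, here is a small sketch that turns an object hash into the path where a loose object would live. The name object_path is made up for this example, and it only covers loose objects, not packed ones:

```shell
# object_path OID: print the loose-object path for a full object hash.
# Hypothetical helper for illustration; Git does this internally.
object_path() {
  oid=$1
  rest=${oid#??}          # everything after the first two characters
  dir=${oid%"$rest"}      # the first two characters
  echo ".git/objects/$dir/$rest"
}

object_path 716ed1421c738a75abe6e0c4812ad4aacee0e11a
# prints .git/objects/71/6ed1421c738a75abe6e0c4812ad4aacee0e11a
```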

We can verify the hash of the file with the hash-object plumbing command.

$ git hash-object README.md
716ed1421c738a75abe6e0c4812ad4aacee0e11a
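
The hash is not computed over the raw file bytes alone. As far as I understand it, Git first prepends a header of the form "blob <size in bytes>" followed by a NUL byte, and then runs SHA-1 over header plus content. Here is a minimal sketch that reproduces this by hand. It assumes a Linux-style sha1sum is available (on macOS, shasum would be the closest substitute) and that the repository uses the default SHA-1 object format; blob_hash is a helper name invented for this example.

```shell
# blob_hash FILE: print the object ID Git would assign to FILE as a blob.
# Hypothetical helper for illustration; assumes `sha1sum` from coreutils.
blob_hash() {
  size=$(wc -c < "$1" | tr -d ' ')   # content length in bytes
  # Hash "blob <size>" + NUL byte, followed by the raw file content.
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d ' ' -f 1
}

# Same content as the README.md created earlier, written to a scratch file.
readme=$(mktemp)
printf '# Hello world\n' > "$readme"
blob_hash "$readme"   # should match what git hash-object printed for README.md
```

Because the file name is not part of the input, two files with identical contents end up as the same blob.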

We can also check the contents, type and size of an object with the cat-file plumbing command.

$ git cat-file -p 716ed1421c738a75abe6e0c4812ad4aacee0e11a
# Hello world

$ git cat-file -p 716e # no need to type the whole hash; any prefix (at least 4 characters) that uniquely identifies the object works
# Hello world

$ git cat-file -t 716e # checking type of object
blob

$ git cat-file -s 716e # checking size of object
14

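Here we see that the type of the object is blob, which is expected for the file README.md. The size is 14 bytes: the 13 characters of "# Hello world" plus the trailing newline added by echo.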
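
If you want to play with blobs without going through git add, the hash-object command can also write the object into the object database with -w. The sketch below runs in a throwaway repository created with mktemp (an assumption of this example, so nothing in your real repo is touched) and asks cat-file for type and size in one call using --batch-check.

```shell
# Scratch repository so the experiment does not touch real work.
repo=$(mktemp -d) && cd "$repo" && git init -q

# Store content as a blob directly, without a file on disk or the index.
blob=$(printf '# Hello world\n' | git hash-object -w --stdin)

# Ask for "<object id> <type> <size>" in a single call.
info=$(echo "$blob" | git cat-file --batch-check)
echo "$info"               # ends with: blob 14

git cat-file -p "$blob"    # prints the stored content back
```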

Commit Objects

Let's create a new commit in our repository and look at how the objects directory reacts to it.

$ git commit -m "Add readme file"
[master (root-commit) 0948529] Add readme file
 1 file changed, 1 insertion(+)
 create mode 100644 README.md

$ tree -a .git/objects/
.git/objects/
├── 09
│   └── 4852928af802dfe0f463359c7ade3f7a21fffa
├── 71
│   └── 6ed1421c738a75abe6e0c4812ad4aacee0e11a
├── a5
│   └── ef91ee14be786131cbecfd2eb8c7fef8a2510d
├── info
└── pack

We see two more objects, 0948 and a5ef. Let's check the types of these objects with the cat-file command.

$ git cat-file -t 0948
commit

$ git cat-file -t a5ef
tree

We see that 0948 is a commit object and a5ef is a tree object.
Let's check the contents of the commit object.

$ git cat-file -p 0948
tree a5ef91ee14be786131cbecfd2eb8c7fef8a2510d
author root <root@email.com> 1601801983 +0530
committer root <root@email.com> 1601801983 +0530

Add readme file

tree a5ef91ee14be786131cbecfd2eb8c7fef8a2510d

This is the tree that this commit points to. More on trees next :)

author root <root@email.com> 1601801983 +0530
committer root <root@email.com> 1601801983 +0530

The next two lines contain the author and committer info: name, email, and a timestamp (seconds since the Unix epoch, followed by the timezone offset).

Add readme file

And then, separated by a blank line, we have the commit message.
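
To see that a commit really is just this small piece of text pointing at a tree, you can build one by hand with the plumbing commands write-tree and commit-tree. This is only a sketch in a throwaway repository; the identity values are the placeholders used in this post, set through Git's standard environment variables so the example does not depend on your global config.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Identity for this throwaway commit (placeholder values).
export GIT_AUTHOR_NAME=root GIT_AUTHOR_EMAIL=root@email.com
export GIT_COMMITTER_NAME=root GIT_COMMITTER_EMAIL=root@email.com

echo "# Hello world" > README.md
git add README.md

# Snapshot the staged files as a tree object.
tree=$(git write-tree)

# Wrap the tree in a commit object; the message is read from stdin.
commit=$(echo "Add readme file" | git commit-tree "$tree")

git cat-file -p "$commit"   # tree line, author, committer, blank line, message
```

Note that commit-tree only creates the object. Unlike git commit, it does not move any branch to point at it.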

Tree Object

A tree object stores a group of blob objects and other tree objects. Conceptually, it is easiest to think of a tree object as a snapshot of a folder.
Each folder can have multiple files and multiple folders within it. Likewise, a tree object can have multiple blob objects and multiple trees in it.

Let's inspect the contents of the tree object a5ef.

$ git cat-file -p a5ef
100644 blob 716ed1421c738a75abe6e0c4812ad4aacee0e11a    README.md

This tree object has only one entry in it, as we have added only one file to the Git repository.

Each entry in the tree object has four parts:

| part | description |
| --- | --- |
| 100644 | file mode: 100 means it is a regular file and 644 is the file permission (not executable) |
| blob | type of object |
| 716ed1421c738a75abe6e0c4812ad4aacee0e11a | object name (the hash of the blob) |
| README.md | file name |
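
The same four parts can be used to build a tree by hand. The mktree plumbing command reads entries in exactly the format that cat-file -p and ls-tree print (mode, type, object name, then a TAB and the file name). A minimal sketch in a throwaway repository:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# The blob must exist in the object database before a tree can refer to it.
blob=$(printf '# Hello world\n' | git hash-object -w --stdin)

# One entry: "<mode> <type> <object name><TAB><file name>".
tree=$(printf '100644 blob %s\tREADME.md\n' "$blob" | git mktree)

git cat-file -t "$tree"   # tree
git ls-tree "$tree"       # prints the entry back in the same four-part form
```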

Tag Object

Let's add a new tag to our current commit and observe how our objects directory reacts.

$ git tag -a v0.0.1 -m "my version 0.0.1"

$ tree -a .git/objects/
.git/objects/
├── 08
│   └── a57a8e9c4b340f5674b96652595bf6727b35bd
├── 09
│   └── 4852928af802dfe0f463359c7ade3f7a21fffa
├── 71
│   └── 6ed1421c738a75abe6e0c4812ad4aacee0e11a
├── a5
│   └── ef91ee14be786131cbecfd2eb8c7fef8a2510d
├── info
└── pack


$ git cat-file -p 08a5
object 094852928af802dfe0f463359c7ade3f7a21fffa
type commit
tag v0.0.1
tagger root <root@email.com> 1601811513 +0530

my version 0.0.1

Upon checking the contents of the tag object 08a5, we see that it points to the current commit 0948. It also has the tag name, the tagger who created the tag, and the tag message.
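
A new object appeared because we used -a, which creates an annotated tag. As a small side experiment (in a throwaway repository, with placeholder identity values), the sketch below compares it with a plain git tag, which as far as I can tell only creates a ref and no tag object, and then follows the annotated tag back to its commit.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME=root GIT_AUTHOR_EMAIL=root@email.com
export GIT_COMMITTER_NAME=root GIT_COMMITTER_EMAIL=root@email.com

echo "# Hello world" > README.md
git add README.md && git commit -q -m "Add readme file"

git tag -a v0.0.1 -m "my version 0.0.1"   # annotated: creates a tag object
git tag v0.0.1-light                      # lightweight: just a ref to the commit

git cat-file -t v0.0.1          # tag
git cat-file -t v0.0.1-light    # commit

# Follow ("peel") the tag object to the commit it points to.
git rev-parse 'v0.0.1^{commit}'
```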

I hope this gave you some insight into the internal workings of Git. Follow me for more posts on Git internals :)
