How to "Git Copy": copying files while keeping git history
Deckstar
Posted on August 16, 2022
Have you ever wanted to break a long file into several smaller files, but worried about losing all the git blame history? Well, with the bash script in this post, you'll be able to split your file into as many files as you want, while still keeping the git history for every single line!
Table of Contents
Motivation
The script
Demo
Pros and cons of this method
Conclusion
Further reading
Motivation
Why would you ever want to copy a file like this? Basically, it's useful whenever you want to turn one file into multiple, but you still want the git history to stay visible, so people can track the evolution of this file. Not only that, but it ensures that you can also blame, or rather git blame, the right culprits for... interesting coding decisions 😉
I can think of two types of situations that I've encountered often:
- Sometimes, a file grows too big to be easily legible and understandable. In those cases, it makes sense (and is even recommended) to split it up into smaller chunks.
  - For me, this is a pretty regular thing with React components. It often happens that a small, seemingly simple component grows more and more complex, until there's too much unrelated logic in one place.
    - Example: a "page" component may start out with a few "sections", and the sections may have their own, complicated logic that's only got something to do with that section. This would be a prime candidate for splitting into multiple files.
- Other times, you might want to turn one file into multiple, perhaps because they have essentially the same functionality.
  - Example: you may start out developing with one "config" file: config.js. But as your project gets closer to production-ready, you may realise that you want some different settings for "production" mode, and opt to have two files: config.development.js and config.production.js. Copying the config file the regular way would make it look like all the production-mode changes happened all at once, whereas "Git copying" the old config would, you guessed it, let other developers track the evolution of the file.
Two use cases:
- Splitting a large file into smaller files.
- Creating a copy of a file that will always remain similar to the original.
The Script
Without further ado, here's the bash script that you can copy-paste and then use at your own convenience:
#!/bin/bash
# HOW TO USE:
#
# PSEUDO-CODE TEMPLATE:
# `
# bash ./gitCopy.bash {fileToCopy} {...newFiles}
# `
#
# EXAMPLE:
# Let's say you have a file called Section.tsx. Your section has grown very big
# and you want to split it into three subsection files, while preserving
# the git history. Your plan is to copy the section file into three new files, in
# a new subfolder called "subsections".
#
# In that case, you would do something like this:
#
# `
# bash ./gitCopy.bash ./components/Section.tsx ./components/subsections/Section1.tsx ./components/subsections/Section2.tsx ./components/subsections/Section3.tsx
# `
GRAY='\033[1;30m'
GREEN='\033[0;32m'
LIGHT_BLUE='\033[1;34m'
RED='\033[0;31m'
NO_COLOR='\033[0m'
# Print a generic failure message and exit with a non-zero status
fail_and_quit () {
  echo -e "\n${RED}Failed to git copy.${NO_COLOR}"
  exit 1
}

# Require an existing ORIGINAL file plus at least one new file name
if [ ! -f "$1" ] || [ $# -lt 2 ]; then
  echo -e "\n${RED}Invalid inputs${NO_COLOR}"
  cat 1>&2 <<EOF
Usage: $0 ORIGINAL copy1 [... copyN]
Copy ORIGINAL, preserving history for git blame
New history will have N+3 commits
EOF
  exit 1
fi

ORIGINAL="$1"
# Drop ORIGINAL from the arguments; "$@" now holds only the new file names
shift

# Messages
echo -e "\nWill copy ${GRAY}${ORIGINAL}${NO_COLOR} into ${LIGHT_BLUE}+$# files${NO_COLOR}:"
args=("$@")
for i in "${!args[@]}"; do
  echo -e " $i. ${GRAY}${args[$i]}${NO_COLOR}"
done
echo -e "New history will have ${GREEN}+$(($# + 3)) commits ${NO_COLOR}\n"
# /Messages

NEWLINE=$'\n'
# Temporary name that holds ORIGINAL while the copies are merged in
KEEP=$(mkdir -p "$(dirname "$1")" && mktemp ./"$1".XXXXXXXX)
MESSAGE="Copied (with git history):$NEWLINE$NEWLINE$ORIGINAL$NEWLINE$NEWLINE — into: —$NEWLINE$NEWLINE$@"
SPLIT=""
# Remember current commit
ROOT=$(git rev-parse HEAD)
# Check for errors
if [ -z "$ORIGINAL" ]; then
  echo -e "\n${RED}ERROR:${NO_COLOR} Did not get ORIGINAL variable."
  fail_and_quit
elif [ -z "$KEEP" ]; then
  echo -e "\n${RED}ERROR:${NO_COLOR} Did not get KEEP variable."
  fail_and_quit
elif [ -z "$MESSAGE" ]; then
  echo -e "\n${RED}ERROR:${NO_COLOR} Did not get MESSAGE variable."
  fail_and_quit
fi

# Create one commit per new file; each renames $ORIGINAL so the new file carries its history
for f in "$@"; do
  git reset --soft "$ROOT"
  git checkout "$ROOT" -- "$ORIGINAL"
  git mv -f "$ORIGINAL" "$f"
  git commit -n -m "* (create $f)$NEWLINE$NEWLINE$MESSAGE"
  SPLIT="$(git rev-parse HEAD) $SPLIT"
done

# Go back to the starting commit and move $ORIGINAL out of the way
git reset --hard HEAD^
git mv -f "$ORIGINAL" "$KEEP"
git commit -n -m "* (keep $ORIGINAL)$NEWLINE$NEWLINE$MESSAGE"

# Merge the "create" commits back in ($SPLIT is unquoted on purpose: one argument per commit)
git merge $SPLIT -m "* (merge)$NEWLINE$NEWLINE$MESSAGE"
git commit -a -n -m "* (merge)$NEWLINE$NEWLINE$MESSAGE"

# Move $ORIGINAL back where it was
git mv "$KEEP" "$ORIGINAL"
git commit -n -m "$MESSAGE"
# Report
echo -e "\nNew history: ${GRAY}$(git rev-parse --short $ROOT)..$(git rev-parse --short HEAD)${NO_COLOR}"
echo -e "\n${GREEN}Success!${NO_COLOR}\n"
exit 0
I usually title this thing gitCopy.bash. But in principle you could call it whatever you want.
Note as well that a lot of code in here is just UI fluff (in particular, most of the logic with the color- & text-related variables is completely unnecessary to achieve the core functionality). You don't really need all the colorful messages to show up in your terminal when running this script. But I just think that they're nice to have 😉
Demo
As you can see in the GIF below, this script should be pretty easy to use:
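One way to check the result afterwards is to count how many distinct commits git blame sees in each file. The helper below is only a sketch: blame_commit_count is a hypothetical name and is not part of gitCopy.bash. It assumes you run it from inside the repository.

```shell
#!/bin/bash
# Hypothetical helper (not part of gitCopy.bash): for each file given, print
# the number of distinct commits that `git blame` attributes its lines to.
blame_commit_count () {
  local f count
  for f in "$@"; do
    # -s drops author and date, so the first column is the commit hash
    count=$(git blame -s -- "$f" | awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')
    echo "$f: $count"
  done
}

# Usage (file names taken from the config example earlier in the post):
#   blame_commit_count config.js config.production.js
```

After a plain cp followed by a commit, the copy reports 1, because every line is attributed to the commit that added the copy. After a "git copy", the new file should report the same count as the original.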
Pros and cons of this method
Advantages:
- Your git history gets kept in every file.
  - You and others will be able to easily see the git blame records and track how the file changed and why.
- It's very easy to use: just copy a file path, then decide on one or more new ones, press enter and voilà!
- If you change your mind, you can always revert the commits.
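As a rough sketch of backing out: the script ends by printing the range of new history ("New history: ROOT..HEAD"). Assuming the new commits have not been pushed yet, resetting to the first hash in that range discards them. Note that this uses git reset --hard rather than git revert; reverting also works, but the merge commit in the new history would need git revert -m. The function name undo_git_copy is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of gitCopy.bash): discard every commit made
# after the given starting commit. Only sensible while those commits are
# unpushed; it also throws away uncommitted changes to tracked files.
undo_git_copy () {
  local root="$1"
  # Refuse to run unless the argument names an existing commit
  git rev-parse --verify --quiet "${root}^{commit}" >/dev/null || {
    echo "Not a commit: $root" >&2
    return 1
  }
  git reset --hard "$root"
}

# Usage: pass the first hash from the script's "New history" line, e.g.
#   undo_git_copy <first-hash>
```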
Disadvantages:
- This method uses the "octopus-merge" strategy between the temporary branches, which means that your git history will no longer be a perfectly straight line.
  - You probably don't care about this, but there do exist arguments for keeping git log --graph linear, as well as counterarguments against it. Personally, I've done both, but I don't really have much of an opinion on this.
- This method creates a lot of commits, which can be a bit frustrating.
  - How many commits? Well, it's N + 3 for every file that you want to copy, where N is the number of copies that you want to make.
    - So, for example, let's say that you want to copy two different files, with 1 copy of the first file and 3 copies of the second file. For instance, maybe you want to copy config.js into config.production.js, and you want to copy Component.tsx into Subcomponent1.tsx, Subcomponent2.tsx and Subcomponent3.tsx. That would mean 1 + 3 = 4 commits for the config file, and 3 + 3 = 6 commits for the Component. Plus at least 1 more commit if you want to actually edit your new files, bringing the total up to at least 11. That's a lot of commits!
    - The general formula here is: $T = \sum_{i} (N_i + 3)$, where:
      - $T$ is the sum total of the number of commits that you will create by copying all the files, and
      - $N_i$ is the number of copies that you want to create of each file $i$.
    - We can see that the minimum number of commits here is 4 (when copying one file one time).
    - This large number of commits may start to feel overwhelming when reading commit histories, or when reading through pull requests.
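The N + 3 rule above can be turned into a quick calculation. This is just a sketch; the function name total_copy_commits is made up for illustration, and each argument stands for the number of copies made of one original file.

```shell
#!/bin/bash
# Sketch: total number of commits created by "git copying" several files.
# Each argument is the number of copies made of one original file; every
# original contributes (copies + 3) commits, following the N + 3 rule.
total_copy_commits () {
  local total=0 n
  for n in "$@"; do
    total=$(( total + n + 3 ))
  done
  echo "$total"
}

# Example from the text: 1 copy of config.js and 3 copies of Component.tsx
total_copy_commits 1 3   # prints 10
```

Adding the one extra commit for actually editing the new files gives the 11 mentioned in the example.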
Conclusion
So that's "git copying" in a nutshell.
We've seen that this method is an easy and flexible way to keep git history while copying files. However, we've also seen that it produces a lot of commits, and requires an "octopus merge" strategy, which may not be ideal in some teams.
As always with coding tools, you will have to decide for yourself when and whether to use this new trick that I have just presented to you 🙂
Good luck and happy coding!
Further reading
- Raymond Chen's blog post (2019) about splitting files while keeping git history, using a sequence of new file names and new git branches.
- David Sherman's GitLab snippet (2019) documenting a bash script that automatically copies a file as many times as needed to new files, which served as the main inspiration for the script in this post.