Igor Bertnyk
Posted on March 13, 2020
To backup or not to backup?
Don't get me wrong, I like Azure DevOps. There are some frustrations here and there, for example in managing permissions and caching build resources. And each of the Azure DevOps modules (Dashboards/Wiki, Boards, Repos, Pipelines, Test Plans, Artifacts) might not be THE BEST on the market. But integration and ease of use make it greater than the sum of its parts, especially for small and medium-size projects.
Still, there is one thing that puzzles me. Backing up your Git repositories seems to me like common sense and good practice. It can also be a policy in some companies. However, Azure DevOps currently offers no built-in way to do it, either manually or on a schedule. Of course, Microsoft is committed to keeping the data safe, including periodic backups and geo-replication, but we do not have any control over that. And it does not protect against unintentional or malicious actions that lead to data loss.
Microsoft's response to such requests, and I quote: "In current Azure DevOps, there is no out of the box solution to this, you could backup your projects by downloading them as zip to save it on your local and then upload it to restore them. And you also could backup your work items by open them with Excel to save in your local machine."
I mean what, LOL. Excel as a backup tool is possibly a new high in data safety. Anyway, are there ways to wrest control back into our hands?
Of course there are, and today we explore two of them.
Backup repository using plain old git bash script
One of the methods is to use a bash script to get a complete copy of the repository. Let's not run it from our laptop, but rather spin up a small VM in the cloud.
Plan of attack:
- Create a cheap Linux virtual machine in Azure
- Generate new SSH Key Pair
- Add SSH Public key to Azure DevOps
- Create bash script to mirror Git Repo
- Execute that script on schedule
Without diving into too much detail, it is quite easy to create a Linux VM in Azure. It already comes with everything we need: Git and a shell. Then we can SSH into it and create a bash script, which I named "devopsbackup.sh".
The script is rather primitive, but it gets the job done. Essentially, it deletes the previous backup and creates a mirror copy of the Git repo. Don't forget to replace the placeholders in angle brackets with your own values.
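#!/bin/bash
# Name of this script, used as a prefix in error messages.
PROGNAME=$(basename "$0")

# Print an error message to stderr and stop with a non-zero status.
error_exit()
{
    echo "${PROGNAME}: ${1:-"Unknown Error"}" 1>&2
    exit 1
}

echo "Executing Azure DevOps Repos backup"
cd /home/devopsadmin || error_exit "$LINENO: cannot cd to /home/devopsadmin"

# Delete the previous backup and start from an empty directory.
rm -rf repos/
mkdir -p repos
cd repos/ || error_exit "$LINENO: cannot cd to repos/"

# A mirror clone copies all refs (branches, tags) and the full history.
git clone --mirror git@ssh.dev.azure.com:v3/<organization>/<project>/<repo> || error_exit "$LINENO: git clone failed"
exit 0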
Allow script execution:
chmod 0755 devopsbackup.sh
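One weakness of devopsbackup.sh is that it deletes the previous backup before the new clone has succeeded, so a network hiccup can leave you with no backup at all. A minimal sketch of a more defensive variant follows: it clones into a scratch directory, checks the result with "git fsck", and only then replaces the old copy. The function name mirror_backup and the ".incoming" directory prefix are my own illustration, not part of the original script.

```shell
#!/bin/bash
# Sketch: refresh a mirror backup without deleting the old copy first.
# Usage: mirror_backup <repo-url> <backup-root>   (hypothetical helper)
mirror_backup() {
  local url="$1" root="$2"
  local name tmp
  # Derive the directory name from the last URL segment, e.g. "repo.git".
  name="$(basename "$url")"
  name="${name%.git}.git"
  mkdir -p "$root" || return 1
  tmp="$(mktemp -d "$root/.incoming.XXXXXX")" || return 1

  # Clone into a scratch directory; the existing backup is untouched so far.
  if ! git clone --quiet --mirror "$url" "$tmp/$name"; then
    rm -rf "$tmp"
    echo "clone of $url failed, previous backup kept" >&2
    return 1
  fi

  # Check object integrity before trusting the new copy.
  if ! git -C "$tmp/$name" fsck --no-progress >/dev/null 2>&1; then
    rm -rf "$tmp"
    echo "fsck failed for $url, previous backup kept" >&2
    return 1
  fi

  # Swap the verified copy into place and clean up.
  rm -rf "${root:?}/$name"
  mv "$tmp/$name" "$root/$name"
  rmdir "$tmp"
}
```

It would be called as mirror_backup "git@ssh.dev.azure.com:v3/<organization>/<project>/<repo>" /home/devopsadmin/repos. The trade-off is that the disk briefly holds two copies of the repository.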
We also need to generate an SSH key pair with the following command:
ssh-keygen -C "devopsbackup"
By default, the keys are generated in the "~/.ssh" folder. Copy the public key "id_rsa.pub" from there and paste it into Azure DevOps: open the profile settings at the top right and add a new key there.
We can easily schedule our script. Type "crontab -e" in the command line and add something like this to the cron config, adjusting the path to wherever you saved the script:
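On an unattended VM it can be handy to generate a dedicated key without prompts. The sketch below is one way to do that. It assumes Azure DevOps expects an RSA key (newer OpenSSH releases may default to a different type, so the type is requested explicitly), and the function name make_backup_key and the key file name are hypothetical.

```shell
#!/bin/bash
# Sketch: create a dedicated RSA key pair without interactive prompts.
# Usage: make_backup_key <private-key-path>   (hypothetical helper)
make_backup_key() {
  local key="$1"
  mkdir -p "$(dirname "$key")" && chmod 700 "$(dirname "$key")" || return 1
  # -N "" sets an empty passphrase so cron can use the key unattended;
  # anyone who can read the file can then read the repositories.
  # An existing key at the same path is reused, not overwritten.
  [ -f "$key" ] || ssh-keygen -q -t rsa -b 4096 -C "devopsbackup" -f "$key" -N "" || return 1
  # Print the public half, which is what gets pasted into Azure DevOps.
  cat "$key.pub"
}
```

For example, make_backup_key ~/.ssh/devopsbackup_rsa prints the public key to paste. Because the file name is not a default one, Git has to be pointed at it, for instance with GIT_SSH_COMMAND="ssh -i ~/.ssh/devopsbackup_rsa -o IdentitiesOnly=yes" in front of the git clone command.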
20 1 * * * /home/devopsadmin/bin/devopsbackup.sh >/dev/null 2>&1
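Note that the cron entry above sends all output to /dev/null, so a failed backup goes unnoticed. A small logging wrapper is one way around that. This is only a sketch: the function name run_logged and the log location are my own choices.

```shell
#!/bin/bash
# Sketch: run a command, append its output to a log, and record the result.
# Usage: run_logged <logfile> <command> [args...]   (hypothetical helper)
run_logged() {
  local log="$1"; shift
  local status
  {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') start: $*"
    "$@"
    status=$?
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') exit status: $status"
  } >>"$log" 2>&1
  # Pass the command's exit status on to the caller.
  return "$status"
}
```

Placed in a small wrapper script that calls run_logged /home/devopsadmin/devopsbackup.log /home/devopsadmin/bin/devopsbackup.sh, and with cron pointing at that wrapper, every run leaves a timestamped trace and a non-zero status is easy to grep for.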
A next step could be to extend this script with the Azure CLI: pack the mirrored repository into an archive and upload it to Azure Blob Storage or Data Lake.
Alternatively, Azure also has a great feature that allows you to create a daily or weekly backup of your VM. So you can just store a snapshot of the whole VM and not bother with Blob Storage, if you like.
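As a rough sketch of that extension: the first function packs the repos directory into a dated tarball, the second uploads it with "az storage blob upload". The function names, the "devops-backups" container and the AZURE_STORAGE_ACCOUNT variable are assumptions for illustration, and the upload presumes the Azure CLI is installed and already signed in with rights on the container.

```shell
#!/bin/bash
# Sketch: pack the mirrored repos into a dated tarball, then upload it.
# Function names, container name and account variable are hypothetical.

# archive_repos <repos-dir> <output-dir>  -> prints the path of the new archive
# The output directory should live outside the repos directory.
archive_repos() {
  local src="$1" out="$2" archive
  archive="$out/repos-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$out" || return 1
  # -C keeps paths inside the archive relative to the repos directory.
  tar -czf "$archive" -C "$src" . || return 1
  echo "$archive"
}

# upload_archive <archive-path>
# Assumes the Azure CLI is signed in (for example via a managed identity)
# and that AZURE_STORAGE_ACCOUNT names the target storage account.
upload_archive() {
  local archive="$1"
  az storage blob upload \
    --account-name "$AZURE_STORAGE_ACCOUNT" \
    --container-name "devops-backups" \
    --name "$(basename "$archive")" \
    --file "$archive" \
    --auth-mode login
}
```

At the end of the backup script this could be wired up as archive=$(archive_repos /home/devopsadmin/repos /home/devopsadmin/archives) && upload_archive "$archive". The timestamp in the file name keeps older archives from being overwritten.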
Backup default branch using Azure DevOps API
That's all well and good, but is there a more modern way that does not require a dedicated VM, shell scripts and cron? The Azure DevOps REST API looks promising: it lets us work with Azure DevOps data, including work items and repositories. Unfortunately, this API does not have parity with Git, and the full code history cannot be preserved using this method.
However, if all you require is a periodic snapshot of the master branch, it can be used to build a simple backup solution. One advantage over the previous solution is that we can automatically retrieve information about all our projects and repos instead of hardcoding them. So if you add a new project, no modification is required.
Approach:
- Use REST API to retrieve hierarchy of projects, repositories, items and blobs
- Use Azure DevOps token (PAT) for the API authentication
- Use Azure Function with timer trigger to run this on schedule
- Use Azure Blob Storage to keep an archive
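To get a feel for the first two points before moving to an Azure Function, the discovery step can be tried from a shell with curl and jq. This is a sketch under assumptions, not the function's actual code: the helper names are hypothetical, api-version 6.0 is my pick, jq must be installed, and only the first page of projects is read.

```shell
#!/bin/bash
# Sketch: list projects and repositories through the Azure DevOps REST API.
# Helper names are hypothetical; the api-version value is an assumption.
API_VERSION="6.0"

# A PAT is sent as HTTP Basic auth with an empty user name.
auth_header() {
  printf 'Authorization: Basic %s' "$(printf ':%s' "$1" | base64 | tr -d '\n')"
}

# projects_url <organization>
projects_url() {
  printf 'https://dev.azure.com/%s/_apis/projects?api-version=%s' "$1" "$API_VERSION"
}

# repos_url <organization> <project-id-or-name>
repos_url() {
  printf 'https://dev.azure.com/%s/%s/_apis/git/repositories?api-version=%s' "$1" "$2" "$API_VERSION"
}

# list_repos <organization> <pat>
# Prints one line per repo: project name, repo name, default branch (tab separated).
list_repos() {
  local org="$1" pat="$2" project_id
  curl -fsS -H "$(auth_header "$pat")" "$(projects_url "$org")" |
    jq -r '.value[].id' |
    while IFS= read -r project_id; do
      # Project ids are used in the URL so names with spaces need no encoding.
      curl -fsS -H "$(auth_header "$pat")" "$(repos_url "$org" "$project_id")" |
        jq -r '.value[] | [.project.name, .name, (.defaultBranch // "")] | @tsv'
    done
}
```

Running list_repos "<organization>" "$AZURE_DEVOPS_PAT" with a token that has read access to code should print every repository without any of them being hardcoded, which is the advantage described above.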
Without further ado, here is a gist for the Azure Function. It requires the following parameters, which you can set up in Application Settings:
"storageAccountKey", "storageName", "token", "organization"
Conclusion
Comparing these two approaches, we can see that newer is not always better. With the help of a simple shell script we can produce a full copy of the repository that can easily be restored or imported into a new project. On the other hand, if all you want is a periodic repo snapshot, the Azure DevOps REST API and a scheduled Azure Function make that nearly effortless.
That is all for today, and remember that you always have to protect your work, like a cat protects its spoils from a dog in the painting below.
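For completeness, here is what restoring from the mirror could look like. It is a minimal sketch that assumes the target is a freshly created, empty repository you are allowed to push to; the function name restore_mirror is hypothetical.

```shell
#!/bin/bash
# Sketch: push every branch and tag from a mirror backup into an empty repo.
# Usage: restore_mirror <path-to-mirror.git> <target-url>   (hypothetical helper)
restore_mirror() {
  local mirror="$1" target="$2"
  # --mirror pushes all refs and makes the target match the backup exactly,
  # so it should only be pointed at a repository that is meant to be replaced.
  git -C "$mirror" push --quiet --mirror "$target"
}
```

For example: restore_mirror /home/devopsadmin/repos/<repo>.git git@ssh.dev.azure.com:v3/<organization>/<project>/<new-repo>.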
Dirk Valckenburg, A Cat Protecting Spoils from a Dog, 1717
Cover image by Hebi B. from Pixabay