Recursively copy files from server to local
Davide de Paolis
Posted on October 25, 2021
TL;DR
rsync -rv \
  -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" \
  --include '*/' --include='index.html' --exclude='*' \
  --mkpath USER@REMOTE_SERVER:FOLDER_A/FOLDER_B/ backup-deployed/FOLDER_A/FOLDER_B/
This command connects to a remote server over SSH and downloads all files named index.html from a specific path to the same path on your machine.
Read on to understand how it works and how to use it within a script with dynamic paths.
Recently I had to back up some files which, over the course of months and years, had been deployed on a file server.
Of course I could have simply used FileZilla or Cyberduck to connect to the file server and manually navigate and download them, but that process would have been incredibly time-consuming, tedious and error-prone.
So why not refresh some shell skills (or learn something new along the way)?
I knew which files to download thanks to a file we had been using to deploy to the file server, which lists each source folder and its destination path. So I decided to start from that file to retrieve the files programmatically: connect to the server with SSH and copy them to my local folder.
// example of a deployments.json file (why we had that and why the folders are named like that does not really matter)
[
  {
    "src": "public/rendered/www/a/b/",
    "dest": "/var/www/somedomain/AAA/BBB/"
  },
  {
    "src": "public/rendered/www/y/z/",
    "dest": "/var/www/somedomain/YYY/ZZZ/"
  }
  // and so on with hundreds of folders
]
The first thing to do was therefore to get a list of all the destination folders from which content had to be copied.
jq to the rescue! (If you don't know it, I really recommend having a look at its documentation; it is a very powerful and useful command-line JSON processor.)
cat deployments.json | jq -r '.[].dest'
This reads the file, parses it and prints every destination on its own line (the -r flag outputs raw strings, without the JSON quotes).
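As a small sketch of what that output looks like, the snippet below writes a two-entry sample file and prints its destinations. The temporary file and the `list_destinations` helper name are only there for illustration, and the demo assumes `jq` is installed (it is skipped otherwise).

```shell
# Hypothetical helper: print every "dest" value of a deployments file, one per line.
list_destinations() {
  jq -r '.[].dest' "$1"
}

# Build a small sample file with the same shape as deployments.json.
sample_file="$(mktemp)"
cat > "$sample_file" <<'EOF'
[
  { "src": "public/rendered/www/a/b/", "dest": "/var/www/somedomain/AAA/BBB/" },
  { "src": "public/rendered/www/y/z/", "dest": "/var/www/somedomain/YYY/ZZZ/" }
]
EOF

# Only run the demo when jq is available on this machine.
if command -v jq >/dev/null 2>&1; then
  list_destinations "$sample_file"
fi
```

Note that the sample file leaves out the `//` comment lines, since jq expects plain JSON.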
How do we copy stuff from remote to local?
There are many ways to copy files from remote to local and vice versa. I usually prefer rsync over scp: I find it simpler to use, especially when it comes to filters, and I like the fact that it, well, syncs the files that were changed instead of blindly copying whatever is in the folder.
rsync -r -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" USER@REMOTE_SERVER:FOLDER_A/FOLDER_B/ backup-deployed/FOLDER_A/FOLDER_B/
This recursively copies whatever you have in a folder on the remote to a folder on your machine.
How do I pass the list returned from jq to rsync?
Since I wanted the script to be dynamic, I needed each destination in the list to be passed as a parameter to the rsync command above.
Pipes and Args
This was a bit tricky. The pipe passes the destinations along one per line, and xargs runs the rsync command once for each of them, putting the value in place of the placeholder '{}'.
| xargs -I '{}' rsync USER@REMOTE_SERVER:'{}' backup-deployed/'{}'
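A safe way to see what xargs will do is to put `echo` in front of the command, so that each generated line is printed instead of executed. The sketch below uses the two sample destinations from the JSON above and a local `backup-deployed/` folder as the target, as in the TL;DR; the `preview_commands` name is just for illustration.

```shell
# Hypothetical helper: print the rsync command xargs would build for each
# destination. "echo" stands in for the real transfer, so nothing is copied.
preview_commands() {
  printf '%s\n' '/var/www/somedomain/AAA/BBB/' '/var/www/somedomain/YYY/ZZZ/' \
    | xargs -I '{}' echo rsync -r USER@REMOTE_SERVER:'{}' backup-deployed/'{}'
}

preview_commands
```

Because the destinations in the file are absolute paths, the local path ends up with a double slash (`backup-deployed//var/...`), which is harmless.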
Recreate folder structure locally
Yes, the -r parameter of rsync recursively navigates the remote folders to copy their contents, but the script would fail if your machine does not already have the same folder structure.
What to do? How do I tell rsync to create the directory if it does not exist? Does something like mkdir exist there?
Some answers on Stack Overflow addressed the issue but were either too complicated (--rsync-path="mkdir -p <folderstructure>") or misleading (--relative), until I found an interesting comment which pointed me in the right direction.
--mkpath looked like the right parameter to do that, but after adding it my script was failing!
Running man rsync on my machine, I could not find that option listed. Could my version of rsync be outdated?
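For an rsync that does not know --mkpath, one possible workaround is to create the local folder with `mkdir -p` before each transfer. This is only a sketch under assumptions, not part of the final script: the `ensure_local_dir` helper and its variable names are hypothetical, and the demo writes into a throwaway temporary directory.

```shell
# Hypothetical helper: create the local mirror of a remote path under a
# backup root and print the resulting local path.
ensure_local_dir() {
  backup_root="$1"   # e.g. backup-deployed
  remote_path="$2"   # e.g. /var/www/somedomain/AAA/BBB/
  # Strip the leading slash so the path nests cleanly under the backup root.
  local_dir="$backup_root/${remote_path#/}"
  mkdir -p "$local_dir" && printf '%s\n' "$local_dir"
}

# Demo in a throwaway directory so nothing is written next to real data.
demo_root="$(mktemp -d)"
ensure_local_dir "$demo_root/backup-deployed" "/var/www/somedomain/AAA/BBB/"
```

The printed path could then be used as the local target of a plain `rsync -r` call.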
Update rsync
Running rsync --version gave me 2.6.9, which dates back to 2006!
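To check this from a script rather than by eye, a version comparison along these lines could work. It is a sketch under assumptions: `version_at_least` is a hypothetical helper, 3.2.3 is used as the minimum because, as far as the rsync release notes go, that is the release that added --mkpath, and the parsing assumes the first line of `rsync --version` looks like `rsync  version X.Y.Z ...`.

```shell
# Hypothetical helper: succeed when dotted version $1 is >= dotted version $2.
version_at_least() {
  awk -v have="$1" -v want="$2" 'BEGIN {
    split(have, h, "."); split(want, w, ".")
    # Compare major, minor and patch numerically, left to right.
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0
      if (h[i] + 0 < w[i] + 0) exit 1
    }
    exit 0
  }'
}

# Only inspect rsync when it is installed on this machine.
if command -v rsync >/dev/null 2>&1; then
  # Assumption: the version number is the third field of the first line.
  installed="$(rsync --version | awk 'NR == 1 { print $3 }')"
  if version_at_least "$installed" "3.2.3"; then
    echo "rsync $installed should support --mkpath"
  else
    echo "rsync $installed looks too old for --mkpath"
  fi
fi
```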
Honestly, I have no idea why my brand new 2021 MacBook ships with such an old version, but updating rsync is as easy as running:
brew install rsync
That allowed me to install the latest version (currently 3.2.3, from August 2020).
Well, not quite: rsync --version still reported 2.6.9.
I tried installing again and got this warning:
Warning: rsync 3.2.3 is already installed and up-to-date.
That was actually a silly newbie mistake: just close and restart your terminal and it will show you the updated version.
After that my script was complete. Try it a couple of times passing -v and --dry-run to be sure that everything is OK, then let it run in all its awesomeness and copy whatever you have on the server (in my case I exclude everything and include only index.html files, because that was what I was interested in).
Here it is:
cat deployments.json \
  | jq -r '.[].dest' \
  | xargs -I '{}' rsync -rv \
    -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" \
    --include '*/' --include='index.html' --exclude='*' \
    --mkpath USER@REMOTE_SERVER:'{}' backup-deployed/'{}'
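The three filter rules and the --dry-run preview can be tried out locally, without any server, by syncing between two temporary folders. This is only a sketch: the folder layout and file names below are made up for the demo, and it is skipped when rsync is not installed.

```shell
# Build a throwaway source tree with a mix of index.html and other files.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/src/a/b" "$demo_dir/src/c"
echo "page" > "$demo_dir/src/index.html"
echo "page" > "$demo_dir/src/a/b/index.html"
echo "note" > "$demo_dir/src/a/b/other.txt"
echo "note" > "$demo_dir/src/c/readme.txt"

# Only run the demo when rsync is available on this machine.
if command -v rsync >/dev/null 2>&1; then
  # Preview: lists what would be transferred, but writes nothing.
  rsync -rv --dry-run \
    --include '*/' --include='index.html' --exclude='*' \
    "$demo_dir/src/" "$demo_dir/dst/"
  [ -e "$demo_dir/dst" ] || echo "dry run: nothing copied yet"

  # Real run: descend into every folder, keep index.html, skip the rest.
  rsync -r \
    --include '*/' --include='index.html' --exclude='*' \
    "$demo_dir/src/" "$demo_dir/dst/"
  find "$demo_dir/dst" -type f
fi
```

Folders that contain no index.html are still created (empty), because '*/' includes every directory; rsync's -m (--prune-empty-dirs) option is meant to skip those.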
Extra tip: if you want to copy only some of the destinations from the file, just add grep to the recipe:
cat deployments.json \
  | jq -r '.[].dest' \
  | grep "<FILTER_PATH>" \
  | xargs -I '{}' rsync -rv \
    -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" \
    --include '*/' --include='index.html' --exclude='*' \
    --mkpath USER@REMOTE_SERVER:'{}' backup-deployed/'{}'
Hope it helps