Recursively copy files from server to local

dvddpl

Davide de Paolis

Posted on October 25, 2021


TL;DR

rsync -rv \
  -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" \
  --include='*/' --include='index.html' --exclude='*' \
  --mkpath USER@REMOTE_SERVER:FOLDER_A/FOLDER_B/ backup-deployed/FOLDER_A/FOLDER_B/

This command connects to a remote server over SSH and downloads every file named index.html below a specific path, recreating the same folder structure on your machine.

Read on to understand how it works and how to use it within a script with dynamic paths.


Recently I had to back up some files that, over months and years, had been deployed to a file server.

Of course, I could have simply used FileZilla or Cyberduck to connect to the file server, navigate the folders and download the files manually, but that would have been incredibly time-consuming, tedious and error-prone.
So why not refresh some shell skills (or learn something new at the same time)?

Since I already knew which files to download, thanks to a file we had been using to deploy to the file server (it specifies the source and the destination path of each deployment), I decided to use it to retrieve the files programmatically: read the destination paths from that file, connect to the server with SSH and copy the files to my local folder.

Here is an example of the deployments.json file, which in reality lists hundreds of folders (why we had it, and why the folders are named like that, does not really matter):
[
  {
    "src": "public/rendered/www/a/b/",
    "dest": "/var/www/somedomain/AAA/BBB/"
  },
  {
    "src": "public/rendered/www/y/z/",
    "dest": "/var/www/somedomain/YYY/ZZZ/"
  }
]

The first step, therefore, was to get a list of all the destination folders whose content had to be copied.

jq to the rescue! (If you don't know it, I really recommend having a look at its documentation: it is a very powerful and useful command-line JSON processor.)

cat deployments.json | jq -r '.[].dest'

This reads the file, parses it and prints every destination, one per line (-r outputs raw strings, without the JSON quotes).
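Note that jq prints plain lines, not a shell array. If you want a real Bash array to loop over, a minimal sketch could look like this, assuming Bash and jq are installed; the temporary file stands in for the real deployments.json and reuses the two sample entries above, and deployments and dests are hypothetical variable names:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Temporary copy of the two sample entries, so no real file is touched
deployments=$(mktemp)
cat > "$deployments" <<'EOF'
[
  { "src": "public/rendered/www/a/b/", "dest": "/var/www/somedomain/AAA/BBB/" },
  { "src": "public/rendered/www/y/z/", "dest": "/var/www/somedomain/YYY/ZZZ/" }
]
EOF

# Collect jq's line-by-line output into a Bash array.
# A while/read loop is used instead of mapfile because macOS ships Bash 3.2.
dests=()
while IFS= read -r dest; do
  dests+=("$dest")
done < <(jq -r '.[].dest' "$deployments")
rm -f "$deployments"

printf 'Found %d destinations\n' "${#dests[@]}"
printf '  %s\n' "${dests[@]}"
```

The while/read loop keeps each line intact, including any spaces inside a path.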

How do we copy stuff from remote to local?

There are many ways to copy files from remote to local and vice versa. I usually prefer rsync over scp (I find it simpler to use, especially when it comes to filters, and I like the fact that it, well, syncs: it transfers only the files that changed instead of blindly copying whatever is in the folder).

rsync -r -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>"  USER@REMOTE_SERVER:FOLDER_A/FOLDER_B/ backup-deployed/FOLDER_A/FOLDER_B/

This recursively copies whatever you have in a folder on the remote to a folder on your machine.
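To make the pieces of that command explicit, here is a minimal sketch that builds the same call as a Bash array and prints it instead of running it. The key path $HOME/.ssh/id_rsa is an assumption standing in for <PATH_TO_YOUR_ID_RSA_SSH_KEY>, and SSH_KEY, SRC, DEST and cmd are hypothetical names:

```shell
#!/usr/bin/env bash
# Placeholders: replace the key path, user, host and folders with your own values
SSH_KEY="$HOME/.ssh/id_rsa"
SRC="USER@REMOTE_SERVER:FOLDER_A/FOLDER_B/"  # host:path, relative to the remote user's home
DEST="backup-deployed/FOLDER_A/FOLDER_B/"    # a path without host: means the local machine

# One array element per argument: the whole ssh command line after -e stays
# a single argument even though it contains spaces
cmd=(rsync -r -e "ssh -o StrictHostKeyChecking=no -i $SSH_KEY" "$SRC" "$DEST")

# Inspect the arguments one per line; to actually run the copy, use: "${cmd[@]}"
printf '%s\n' "${cmd[@]}"
```

Keeping the arguments in an array is a common Bash idiom for commands whose options contain spaces, as the -e value does here.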

How do I pass the list returned by jq to rsync?

Since I wanted the script to be dynamic, each destination printed by jq had to be passed as a parameter to the rsync command above.

Pipes and xargs

This was a bit tricky. The pipe feeds jq's output to xargs one line at a time, and xargs -I '{}' runs the rsync command once for each line, replacing the placeholder '{}' with that value:

cat deployments.json | jq -r '.[].dest' | xargs -I '{}' rsync -r USER@REMOTE_SERVER:'{}' backup-deployed'{}'
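A quick way to see what xargs -I actually builds is to put echo in front of rsync, so each command is printed rather than executed. In this sketch preview_rsync_cmds is a hypothetical helper, and the two sample paths stand in for the jq output:

```shell
#!/usr/bin/env bash
# Print, instead of run, the rsync command xargs builds for each input line.
# -I '{}' makes xargs run the command once per line, replacing every '{}' with that line.
preview_rsync_cmds() {
  xargs -I '{}' echo rsync -r 'USER@REMOTE_SERVER:{}' 'backup-deployed{}'
}

# Two sample destinations standing in for the output of jq -r '.[].dest'
printf '%s\n' /var/www/somedomain/AAA/BBB/ /var/www/somedomain/YYY/ZZZ/ | preview_rsync_cmds
```

For the first sample path this prints rsync -r USER@REMOTE_SERVER:/var/www/somedomain/AAA/BBB/ backup-deployed/var/www/somedomain/AAA/BBB/, which shows that the destination mirrors the remote path under backup-deployed.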

Recreate folder structure locally

Yes, rsync's -r option recursively walks the remote folders to copy their contents, but by default rsync only creates the last directory of the local destination path, not its missing parents, so the command fails if your machine does not already have the same folder structure.
What to do? How do I tell rsync to create the directory if it does not exist? Is there something like mkdir -p for it?
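Because the destination here is local, one possible fallback for old rsync versions is to create the folder tree yourself with mkdir -p before syncing. A minimal sketch, assuming Bash; make_local_dirs and BACKUP_DIR are hypothetical names:

```shell
#!/usr/bin/env bash
# Fallback for rsync versions without --mkpath: recreate the local folder tree first.
# BACKUP_DIR is a hypothetical name for the local backup root.
BACKUP_DIR="${BACKUP_DIR:-backup-deployed}"

make_local_dirs() {
  local dest
  # One remote path per line on stdin, e.g. the output of jq -r '.[].dest'
  while IFS= read -r dest; do
    [ -n "$dest" ] || continue      # skip blank lines
    mkdir -p "$BACKUP_DIR$dest"     # -p creates missing parents and ignores existing dirs
  done
}
```

For example, jq -r '.[].dest' deployments.json | make_local_dirs, followed by the rsync call without --mkpath.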

Some answers on Stack Overflow addressed the issue, but they were either too complicated (--rsync-path="mkdir -p <folderstructure>") or misleading (--relative), until I found an interesting comment that pointed me in the right direction.

--mkpath looked like the right option for the job, but after adding it my script failed!

Running man rsync on my machine, I could not find that option at all. Could my version of rsync be outdated?

Update rsync

Running rsync --version showed 2.6.9, which dates back to 2006!

Honestly, I have no idea why my brand-new 2021 MacBook ships such an old version, but updating rsync with Homebrew is as easy as running:

brew install rsync 

That installs the latest version (currently 3.2.3, from August 2020).

Well, not quite. rsync --version still showed 2.6.9. I tried installing again and got this message:

Warning: rsync 3.2.3 is already installed and up-to-date.

That was actually a silly newbie mistake:

Just close and restart your terminal, and rsync --version will show the updated version.
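If you want the script to check for this itself, here is a hedged sketch that parses the first line of rsync --version and succeeds only for 3.2.3 or newer, the release where --mkpath was added. supports_mkpath is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Succeeds if an `rsync --version` banner reports version 3.2.3 or newer.
supports_mkpath() {
  local re='version ([0-9]+)\.([0-9]+)\.([0-9]+)'
  [[ $1 =~ $re ]] || return 1          # no x.y.z version found in the banner
  # 10# forces base 10, so a component such as "08" is not read as octal
  local major=$((10#${BASH_REMATCH[1]}))
  local minor=$((10#${BASH_REMATCH[2]}))
  local patch=$((10#${BASH_REMATCH[3]}))
  (( major > 3 || (major == 3 && (minor > 2 || (minor == 2 && patch >= 3))) ))
}
```

Usage could look like: if supports_mkpath "$(rsync --version | head -n 1)"; then echo "--mkpath available"; else echo "rsync too old"; fi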

After that my script was complete. Run it a couple of times with -v and --dry-run to make sure everything looks right, then let it run in all its awesomeness and copy whatever you need from the server (in my case I excluded everything and included only the index.html files, because that was all I was interested in).

Here it is:

cat deployments.json |
  jq -r '.[].dest' |
  xargs -I '{}' rsync -rv \
    -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" \
    --include='*/' --include='index.html' --exclude='*' \
    --mkpath USER@REMOTE_SERVER:'{}' backup-deployed'{}'
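For repeated runs, the same pipeline can be wrapped in a small function. This is a sketch under assumptions, not the exact script above: it reads the paths with a while/read loop instead of xargs (so quotes or spaces in paths are taken literally), and SSH_KEY, REMOTE, BACKUP_DIR, RSYNC_BIN and backup_all are hypothetical names with placeholder defaults:

```shell
#!/usr/bin/env bash
# Hypothetical names with placeholder defaults; override them from the environment.
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_rsa}"        # private key passed to ssh -i
REMOTE="${REMOTE:-USER@REMOTE_SERVER}"         # user@host of the file server
BACKUP_DIR="${BACKUP_DIR:-backup-deployed}"    # local root of the mirrored tree
RSYNC_BIN="${RSYNC_BIN:-rsync}"                # e.g. the full path of a Homebrew rsync >= 3.2.3

# Reads remote paths from stdin, one per line, and mirrors the index.html files
# below each of them locally. Extra arguments (such as --dry-run) go to rsync.
backup_all() {
  local dest
  while IFS= read -r dest; do
    [ -n "$dest" ] || continue                 # skip blank lines
    # </dev/null is defensive: nothing started here can swallow the remaining paths
    "$RSYNC_BIN" -rv \
      -e "ssh -o StrictHostKeyChecking=no -i $SSH_KEY" \
      --include='*/' --include='index.html' --exclude='*' \
      --mkpath "$@" \
      "$REMOTE:$dest" "$BACKUP_DIR$dest" </dev/null
  done
}
```

Usage could look like jq -r '.[].dest' deployments.json | backup_all --dry-run and, once the output looks right, the same command without --dry-run. Pointing RSYNC_BIN at the Homebrew rsync avoids accidentally picking up the old system version.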

Extra tip: if you want to copy only some of the destinations listed in the file, just add grep to the recipe:

cat deployments.json |
  jq -r '.[].dest' |
  grep "<FILTER_PATH>" |
  xargs -I '{}' rsync -rv \
    -e "ssh -o StrictHostKeyChecking=no -i <PATH_TO_YOUR_ID_RSA_SSH_KEY>" \
    --include='*/' --include='index.html' --exclude='*' \
    --mkpath USER@REMOTE_SERVER:'{}' backup-deployed'{}'
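grep treats "<FILTER_PATH>" as a regular expression that may match anywhere in a line (and "." matches any character). If what you want is a literal path prefix, a small Bash filter can take its place; filter_prefix is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print only the input lines that start with the literal prefix given as $1.
filter_prefix() {
  local prefix=$1 dest
  while IFS= read -r dest; do
    # Quoting "$prefix" makes it a literal string rather than a glob pattern
    if [[ $dest == "$prefix"* ]]; then
      printf '%s\n' "$dest"
    fi
  done
}
```

For example: cat deployments.json | jq -r '.[].dest' | filter_prefix /var/www/somedomain/AAA/ | xargs ..., with the rest of the command unchanged.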

Hope it helps
