Using GNU parallel to save time

As a developer I'm constantly reading code, especially from GitHub.

I often clone many repos from a single user or organization, and I've been using GNU parallel to save time when cloning and updating them.

List repos

Clone missing repos

clone-missing.sh

Update repos from a directory in parallel

update-repos.sh

There are more details in the parallel_commands README.

The workflows and scripts described below assume that you already have a GITHUB_TOKEN environment variable capable of listing GitHub repositories and that parallel_commands/bin is in your PATH.

update-repos.sh is the one I use most, every day.
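The two prerequisites above can be checked before running anything. This is a minimal sketch, not part of parallel_commands: the function name check_prereqs is hypothetical, and it only tests that GITHUB_TOKEN is non-empty, not that the token has the right permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of parallel_commands): report missing prerequisites.
check_prereqs() {
  local missing=0
  local tool

  # The token must be set; its permissions are NOT verified here.
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set" >&2
    missing=1
  fi

  # GNU parallel and the scripts named in this post must be reachable via PATH.
  for tool in parallel clone-missing.sh update-repos.sh github-print-organization-repos.sh; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool not found in PATH" >&2
      missing=1
    fi
  done

  return "$missing"
}
```

Calling check_prereqs at the top of a session prints one line per missing item to stderr and returns a non-zero status if anything is missing.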

It does a git pull --rebase and git fetch --prune in all repositories inside a directory in parallel. This can save you and your team lots of time.
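To make the idea concrete, here is a rough sketch of how such an update could be written. It is not the real update-repos.sh: the function names and the choice of 8 parallel jobs are assumptions, and it only looks at immediate subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real update-repos.sh.

# Print every immediate subdirectory of $1 (default: current directory)
# that contains a .git directory.
list_git_repos() {
  local base="${1:-.}"
  local dir
  for dir in "$base"/*/; do
    [ -d "${dir}.git" ] && printf '%s\n' "${dir%/}"
  done
  return 0
}

# Run the two git commands mentioned in the post in each repository.
# Assumption: 8 jobs at a time; tune -j for your machine and network.
update_repos() {
  list_git_repos "${1:-.}" |
    parallel -j 8 'git -C {} pull --rebase && git -C {} fetch --prune'
}
```

Running update_repos ~/github-orgs/PacktPublishing would update every repository in that directory. Adding --dry-run to the parallel call prints the commands without executing them, which is handy for a first check.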

Offline studying

PacktPublishing and oreillymedia have lots of quality repos that are useful for studying.

Imagine that you want to study Kubernetes and clone all repositories from PacktPublishing that contain kubernetes in the name.

Filtering PacktPublishing for kubernetes shows 63 repositories. Cloning 63 repositories by hand would be slow, at the very least.

Here is a nicer way of doing this:

1- Create a list of repos from PacktPublishing

You can use the static file in this gist.

mkdir -p ~/github-orgs/PacktPublishing
cd ~/github-orgs/PacktPublishing
curl -s -L -o PacktPublishing.txt https://gist.githubusercontent.com/joaovitor/e1658abc0946ec9e5528c533f2f502a8/raw/88df9a43c2f9253d686b4ca6eb45bf50c047678b/PacktPublishing.txt

Or generate it with github-print-organization-repos.sh.

This one takes a while: PacktPublishing has more than 6000 repositories, and github-print-organization-repos.sh makes around 600 GitHub API calls to generate the file with the list of repos.

mkdir -p ~/github-orgs/PacktPublishing
cd ~/github-orgs/PacktPublishing
gh_owner=$(basename "$(pwd)"); github-print-organization-repos.sh "${gh_owner}" "${gh_owner}.txt"
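The number of API calls comes from pagination: the GitHub REST API returns organization repositories one page at a time, with at most 100 per page. The sketch below shows that arithmetic and a single page request. The function names are hypothetical, and the real github-print-organization-repos.sh may page differently.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; the real github-print-organization-repos.sh may differ.

# Calls needed to list $1 repositories at $2 per page (ceiling division).
pages_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# URL of one page of an organization's repositories.
# Arguments: organization, page size (the API allows at most 100), page number.
repos_page_url() {
  printf 'https://api.github.com/orgs/%s/repos?per_page=%s&page=%s\n' "$1" "$2" "$3"
}

# Fetch one page as JSON, authenticating with GITHUB_TOKEN.
fetch_repos_page() {
  curl -s -L \
    -H "Accept: application/vnd.github+json" \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    "$(repos_page_url "$1" "$2" "$3")"
}
```

For example, pages_needed 6000 100 prints 60, so a larger page size directly reduces the number of requests.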

2- Clone the missing repos that match kubernetes

cd ~/github-orgs/PacktPublishing
gh_owner=$(basename "$(pwd)"); grep -Ei kubernetes "${gh_owner}.txt" | parallel -j 25 'clone-missing.sh {}; echo job {#} completed {};'
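The "missing" part is what makes this command safe to re-run: repositories that are already on disk are skipped. Here is a minimal sketch of that idea, not the real clone-missing.sh. It assumes each input line is a clone URL such as https://github.com/OWNER/REPO.git, which may not match the actual format of the list file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the real clone-missing.sh.
# Assumption: $1 is a clone URL like https://github.com/OWNER/REPO.git

clone_if_missing() {
  local url="$1"
  local name
  # Derive the target directory from the last URL segment, dropping ".git".
  name=$(basename "$url" .git)

  if [ -d "$name" ]; then
    echo "skip $name (already present)"
  else
    git clone --quiet "$url" "$name"
  fi
}
```

In bash, a function like this can be handed to GNU parallel after export -f clone_if_missing, for example: grep -Ei kubernetes PacktPublishing.txt | parallel -j 25 clone_if_missing {}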

3- Study offline

João Vitor
