Using GNU parallel to save time

joaovitor

João Vitor

Posted on October 29, 2020

Using gnu parallel to save time

As a developer I'm constantly reading code, especially from GitHub.

I often clone all the repos of a user or organization, and I've been using GNU parallel to save time cloning and updating them.

Three workflows where it helps:

- List repos
- Clone missing repos
- Update repos from a directory in parallel

There are more details in the parallel_commands README.


The workflows and scripts below assume that you already have a GITHUB_TOKEN environment variable with permission to list GitHub repositories, and that parallel_commands/bin is in your PATH.


update-repos.sh is the one I use most; I run it daily.

It runs git pull --rebase and git fetch --prune on every repository inside a directory, in parallel. This can save you and your team a lot of time.
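To illustrate the idea, here is a minimal sketch, not the author's update-repos.sh. The function name update_repos_sketch and the default of 10 concurrent jobs are made up for this example. It treats every subdirectory that contains a .git entry as a repository and hands each path to GNU parallel, which runs git pull --rebase followed by git fetch --prune with at most that many jobs at once.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "update repos in parallel" idea (not update-repos.sh):
# pull and prune every git repository directly inside a directory using GNU parallel.
set -euo pipefail

update_repos_sketch() {
  local dir="${1:-.}"   # directory holding one clone per subdirectory
  local jobs="${2:-10}" # how many repos to update at the same time (arbitrary default)
  local git_dir
  # Print one repo path per line; GNU parallel replaces {} with each path
  # (quoting it for the shell) and runs up to $jobs of them concurrently.
  for git_dir in "$dir"/*/.git; do
    # Skip the literal glob when nothing matches, and subdirectories without .git.
    if [ -e "$git_dir" ]; then
      printf '%s\n' "${git_dir%/.git}"
    fi
  done | parallel -j "$jobs" 'git -C {} pull --rebase --quiet && git -C {} fetch --prune --quiet'
}
```

Usage: update_repos_sketch ~/github-orgs/PacktPublishing 10. Letting parallel read paths from stdin keeps the job limit in one place; raising -j mostly trades local CPU and network for wall-clock time.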


Offline studying

The PacktPublishing and oreillymedia organizations have lots of quality repos to study from.

Imagine that you want to study Kubernetes and clone every PacktPublishing repository that contains kubernetes in its name.

Filtering PacktPublishing's repos for kubernetes shows 63 repositories. Cloning 63 repositories by hand would be slow, to say the least.

Here is a nicer way of doing this:

1- Create a list of repos from PacktPublishing

You can use the static file in this gist.

mkdir -p ~/github-orgs/PacktPublishing
cd ~/github-orgs/PacktPublishing
curl -s -L -o PacktPublishing.txt https://gist.githubusercontent.com/joaovitor/e1658abc0946ec9e5528c533f2f502a8/raw/88df9a43c2f9253d686b4ca6eb45bf50c047678b/PacktPublishing.txt

Or generate it with github-print-organization-repos.sh.

This one takes a while: PacktPublishing has more than 6000 repositories, and github-print-organization-repos.sh makes around 600 GitHub API calls to build the file with the list of repos.

mkdir -p ~/github-orgs/PacktPublishing
cd ~/github-orgs/PacktPublishing
gh_owner=$(basename "$PWD"); github-print-organization-repos.sh "${gh_owner}" "${gh_owner}.txt"
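Listing an organization's repos comes down to paging through the GitHub REST endpoint GET /orgs/{org}/repos. The sketch below is hypothetical, not github-print-organization-repos.sh: list_org_repos_sketch is a made-up name, it assumes curl, jq and GITHUB_TOKEN are available, and printing one clone URL per line is an assumption that may not match the format of PacktPublishing.txt. It requests 100 repos per page, the API maximum, and stops at the first empty page.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not github-print-organization-repos.sh): print the clone URL
# of every repo in a GitHub organization by walking GET /orgs/{org}/repos page by page.
# Assumes curl, jq and a GITHUB_TOKEN environment variable.
set -euo pipefail

list_org_repos_sketch() {
  local org="$1" page=1 body
  while :; do
    # per_page=100 is the largest page size the API accepts.
    body=$(curl -fsSL \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer ${GITHUB_TOKEN}" \
      "https://api.github.com/orgs/${org}/repos?per_page=100&page=${page}")
    # An empty JSON array means we have gone past the last page.
    if [ "$(jq 'length' <<<"$body")" -eq 0 ]; then
      break
    fi
    jq -r '.[].clone_url' <<<"$body"
    page=$((page + 1))
  done
}
```

Usage: list_org_repos_sketch PacktPublishing > PacktPublishing.txt. Stopping on an empty page avoids parsing the Link header, at the cost of one extra request at the end.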

2- Clone the missing repos that match kubernetes

cd ~/github-orgs/PacktPublishing
gh_owner=$(basename "$PWD")
# Up to 25 clones at a time (-j 25); {} is the input line, {#} the job sequence number
grep -i kubernetes "${gh_owner}.txt" | parallel -j 25 'clone-missing.sh {}; echo job {#} completed {}'
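As its name suggests, clone-missing.sh clones only the repos you don't have yet, so re-running step 2 is cheap. Here is a hypothetical stand-in, not the real script. It assumes each input line is a git clone URL and that the repo goes into a directory named after the URL without its .git suffix; clone_missing_sketch is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for clone-missing.sh: clone a repo only if its
# directory does not exist yet, so repeated runs skip what is already there.
set -euo pipefail

clone_missing_sketch() {
  local url="$1"
  local name
  # https://github.com/org/repo.git -> repo
  name=$(basename "$url" .git)
  if [ -d "$name" ]; then
    echo "skip ${name}: already cloned"
  else
    git clone --quiet "$url" "$name"
  fi
}
# Export the function so bash shells started by GNU parallel can call it.
export -f clone_missing_sketch
```

With the function exported, and assuming GNU parallel runs its jobs under bash, it drops into the same pipeline: grep -i kubernetes urls.txt | parallel -j 25 clone_missing_sketch {} (urls.txt is a placeholder for a file of clone URLs).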

3- Study offline
