Bash for Project Euler - really???

rodiongork

Rodion Gorkovenko

Posted on January 18, 2020

Bash for Project Euler - really???

Whatever language and stack you prefer, learn bash also! - that wisdom come to me through years. I was doing electronics and POS-terminals in C++, backend in Java, web with PHP and JS, machine learning in BigData with Python, Scala and many horrible names...

And always there were chores to be done with linux/unix shell scripting!

Let's plan "miniseries" on bash, but let's start with math exercise. To see how bash rocks and know when it sucks :)

For example I pick Project Euler #7 - "10001-st Prime". There would be 3 versions of code and they all are in this git repo for reference.

Let's Rock!

For generating primes we usually need array to collect them. I usually start by initializing first few elements manually (this helps avoiding spare stop condition inside the loop).

#!/usr/bin/env bash

primes=(2 3 5 7)

As clever people warn, we add "shebang" header which allows script to properly start if it is executable.

Array is created by enclosing space-separated sequence in parentheses. (non-posix feature, though we can do without them)

Now we create main loop! Just make 20 first primes for test, not thousands...

next=9  # next number to try

while [[ ${#primes[@]} -lt 20 ]] ; do
    #here we put more code
    next=$((next+2))
done

we understand while and do and done, but what is this whimsical expression? Well, logical expressions are tested in square brackets, -lt is "numerical" comparison for "less than" and ${#primes[@]} is curious expression of the length of array. Bash has funny syntax sometimes.

Now the code inside.

At every iteration we are going to test if next is divisible by anything in array primes, starting from beginning.

    i=0
    while true ; do
        p=${primes[i]}
        if [[ $((next % p)) -eq 0 ]] ; then
            # next is divisible by p, not prime, halt
            break
        fi

        # some end condition we add at next step...

        i=$((i+1))
    done

Here we see new features - if - fi block (it supports else too), -eq operator for numeric equality and forms $((...)) for evaluating numeric expressions. Shell heavily deals with text so usually variables are treated as strings, not numbers.

Now how we decide new prime is found? We don't need to go till end of primes, it's enough to stop when p is greater than sqrt(next) - let's add it to the loop:

        if [[ $((p * p)) -gt $next ]] ; then
            primes+=($next)
            break
        fi

Looks good! Let's print out primes after main loop ended to check:

        # various code before...
    next=$((next+2))
done

echo ${primes[@]}

Curiously, syntax for "whole array" is quite close to syntax of taking length... Now I save this into file pe-7.sh, set executable flag and run:

$ chmod a+x pe-7.sh
$ ./pe-7.sh
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71

Wonderful! Now we need two small changes:

  • replace 20 with 10001 in the main loop condition
  • don't print whole array, but only its last element with ${primes[-1]}

Feel free to check the whole code in github - it's brief.

Run the program with time program to see how long it takes :)

$ time ./pe-7.sh 
104743

real    0m12.896s
user    0m12.881s
sys 0m0.004s

Result is correct. But wow, 13 seconds - definitely bash is not intended to be the fastest. However there are faster shell versions (e.g. dash) and on other hand I'll show you how to make it much slower :)


Refactor it

Bash supports functions. Let's extract inner loop into function. It can be checked that running time doesn't increase seriously.

tryprime() {
    local try=$1   # $1 is the 1-st parameter
    i=0
    while true ; do
        p=${primes[i]}
        if [[ $((try % p)) -eq 0 ]] ; then
            break
        fi
        if [[ $((p * p)) -gt $try ]] ; then
            primes+=($try)
            break
        fi
        i=$((i+1))
    done
}

The function takes single parameter and puts it into local variable. Curiously, parameters are not passed in the parentheses :)

We don't return value, rather just add it to primes global array. I'll explain this bit later.

Now call the function in main loop:

next=9
while [[ ${#primes[@]} -lt 10001 ]] ; do
    tryprime $next
    next=$((next+2))
done

Hey, that's neat! Full code in GH. And it still works - took 14 seconds...


Performance Pitfall

Now just small change to make it sloooooow!

Let's return value from function. This is not done by return (which serves for other purpose). We rather print value in the function and capture it from output on call:


    # change this inside function
    # primes+=($try)
    # to this
    echo $try

#and main loop will be:
while [[ ${#primes[@]} -lt 10001 ]] ; do
    found=$(tryprime $next)
    primes+=($found)
    next=$((next+2))
done

The complete code is here. Run it to get... 76 seconds!!!

Well, bash is shell version designed for reach user-friendly features, not for speed. But it is important to know such pitfalls :)

Thanks for reading that far! Let's meet next time to make some web-related work with bash :)

💖 💪 🙅 🚩
rodiongork
Rodion Gorkovenko

Posted on January 18, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related