Bash for Project Euler - really???
Rodion Gorkovenko
Posted on January 18, 2020
Whatever language and stack you prefer, learn bash also! - that wisdom come to me through years. I was doing electronics and POS-terminals in C++, backend in Java, web with PHP and JS, machine learning in BigData with Python, Scala and many horrible names...
And always there were chores to be done with linux/unix shell scripting!
Let's plan "miniseries" on bash, but let's start with math exercise. To see how bash rocks and know when it sucks :)
For example I pick Project Euler #7 - "10001-st Prime". There would be 3 versions of code and they all are in this git repo for reference.
Let's Rock!
For generating primes we usually need array to collect them. I usually start by initializing first few elements manually (this helps avoiding spare stop condition inside the loop).
#!/usr/bin/env bash
primes=(2 3 5 7)
As clever people warn, we add "shebang" header which allows script to properly start if it is executable.
Array is created by enclosing space-separated sequence in parentheses. (non-posix feature, though we can do without them)
Now we create main loop! Just make 20 first primes for test, not thousands...
next=9 # next number to try
while [[ ${#primes[@]} -lt 20 ]] ; do
#here we put more code
next=$((next+2))
done
we understand while
and do
and done
, but what is this whimsical expression? Well, logical expressions are tested in square brackets, -lt
is "numerical" comparison for "less than" and ${#primes[@]}
is curious expression of the length of array. Bash has funny syntax sometimes.
Now the code inside.
At every iteration we are going to test if next
is divisible by anything in array primes
, starting from beginning.
i=0
while true ; do
p=${primes[i]}
if [[ $((next % p)) -eq 0 ]] ; then
# next is divisible by p, not prime, halt
break
fi
# some end condition we add at next step...
i=$((i+1))
done
Here we see new features - if - fi
block (it supports else
too), -eq
operator for numeric equality and forms $((...))
for evaluating numeric expressions. Shell heavily deals with text so usually variables are treated as strings, not numbers.
Now how we decide new prime is found? We don't need to go till end of primes
, it's enough to stop when p
is greater than sqrt(next)
- let's add it to the loop:
if [[ $((p * p)) -gt $next ]] ; then
primes+=($next)
break
fi
Looks good! Let's print out primes
after main loop ended to check:
# various code before...
next=$((next+2))
done
echo ${primes[@]}
Curiously, syntax for "whole array" is quite close to syntax of taking length... Now I save this into file pe-7.sh
, set executable flag and run:
$ chmod a+x pe-7.sh
$ ./pe-7.sh
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71
Wonderful! Now we need two small changes:
- replace
20
with10001
in the main loop condition - don't print whole array, but only its last element with
${primes[-1]}
Feel free to check the whole code in github - it's brief.
Run the program with time
program to see how long it takes :)
$ time ./pe-7.sh
104743
real 0m12.896s
user 0m12.881s
sys 0m0.004s
Result is correct. But wow, 13 seconds - definitely bash is not intended to be the fastest. However there are faster shell versions (e.g. dash
) and on other hand I'll show you how to make it much slower :)
Refactor it
Bash supports functions. Let's extract inner loop into function. It can be checked that running time doesn't increase seriously.
tryprime() {
local try=$1 # $1 is the 1-st parameter
i=0
while true ; do
p=${primes[i]}
if [[ $((try % p)) -eq 0 ]] ; then
break
fi
if [[ $((p * p)) -gt $try ]] ; then
primes+=($try)
break
fi
i=$((i+1))
done
}
The function takes single parameter and puts it into local variable. Curiously, parameters are not passed in the parentheses :)
We don't return value, rather just add it to primes
global array. I'll explain this bit later.
Now call the function in main loop:
next=9
while [[ ${#primes[@]} -lt 10001 ]] ; do
tryprime $next
next=$((next+2))
done
Hey, that's neat! Full code in GH. And it still works - took 14 seconds...
Performance Pitfall
Now just small change to make it sloooooow!
Let's return value from function. This is not done by return
(which serves for other purpose). We rather print value in the function and capture it from output on call:
# change this inside function
# primes+=($try)
# to this
echo $try
#and main loop will be:
while [[ ${#primes[@]} -lt 10001 ]] ; do
found=$(tryprime $next)
primes+=($found)
next=$((next+2))
done
The complete code is here. Run it to get... 76 seconds!!!
Well, bash is shell version designed for reach user-friendly features, not for speed. But it is important to know such pitfalls :)
Thanks for reading that far! Let's meet next time to make some web-related work with bash :)
Posted on January 18, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.