Optimizing recursive functions ππ
Ahmed Osama
Posted on May 22, 2021
If you're not using recursion up until now you're really missing a lot of features and I may assume you haven't come across data structures yet.
I'll assume in this article you already know what a recursive function
is or rather what is the concept so-called recursion
, but in case you don't know, briefly a recursive function is a function that calls itself from within its inner scope.
function inception() {
return inception()
}
inception()
So with that done with, most of us encountered the common error known as stack overflow
or range error
depending on which JS runtime you're using.
In addition to that recursive function exhaust our resources like hell, in terms of memory and time consumption.
So how can we surpass those two problems where we hit the walls of callstack
and memory
?
Well, let me introduce you to two methods that will make your recursive functions a lot faster π on the condition that you implement them correctly.
Tail call optimizations (TCO)
Tail call optimizations, Tail recursion, or Proper tail call are just interchangeable terms for the same concept, but before going through it, I think it's more convenient that we discover how our recursive functions are executed first, and why do they behave viciously to memory?
Consider the following operation as a recursive example.
function factorial(number) {
if (number <= 1) return 1
return number * factorial(number - 1)
}
Surely you've come across this silly example, but let us demonstrate it deeply to understand why this solution is expensive in terms of memory and time complexity.
Well, let's walk through the execution of our function giving it the input number
as the value 5.
The function will have its own execution context where number
is 5, afterwards, this execution context will be added on top of the callstack
as a stack frame, let's ease it a bit and call this very stack frame as frame 5
(Yeah I know, such a creative name π₯±), so this frame will go through the checking if the number is less than or equal to 1 which yields to false
.
Therefore, this frame executes the statement of returning number * factorial(number - 1)
which is equivalent to 5 * factorial(4)
, and the previous operation is repeated with another frame named frame 4
and the same process is being repeated till it reaches the case where number
is decreased to equal 1.
At this stage what do we have on our callstack
?
The callstack
in this case holds within it 5 stack frames where each frame holds the value of number
that was passed to it, and waiting for the next frame to finish its execution to return
the expected output of calling factorial (number - 1)
, so it could compute the value of number * factorial(number - 1)
Well, after the number is decreased to 1, what happens now?
In this case, the callstack
has 5 stack frames on it each one holds is waiting for the next frame to return the value of factorial(number - 1)
to compute its own held value of number * factorial(number - 1)
, and that's where the problem lies, that every stack frame holds its data and we end up having this.
function factorialTCO(number, accum = 1) {
if (number <= 1) return accum
return factorial(number - 1, number * accum)
}
Note: Applying TCO can also be done via defining an inner function (usually named go()) and applying the recursion to it, so you expose the same API to your client code.
function factorialTCO(number) {
function go(number, accum = 1) {
if (number <= 1) return accum
return go(number - 1, accum * number)
}
return go(number)
}
By using tail call optimizations (TCO) we are making every stack frame pass its computed value of number * factorial(number - 1)
to the next stack frame or function call whichever you would like to call it.
Therefore, each stack frame's previous no longer needs to hold any data with it as the computation is passed forward, and thus the garbage collector can freely collect these data held within the stack frames and clear them, now we have less usage π
Note that using TCO assumes that you return only pure recursive call, and by that I mean you must only and only return the recursive function call We will revisit this example once again using another operation that's commonly used flatten
.
Any operation that is performed on the recursive function call makes the compiler of JavaScript hold what each stack frame has in terms of data or function variables, and you cannot have the performance boost given by TCOs.
In the previous example of using the regular factorial function we were operating number * factorial(number - 1)
it was implying to the compiler that it must hold the data as each function call is waiting for the next function call to finish its execution, therefore, TCO cannot be applied.
Hmm but our code is still exposed to stack overflow error
Well, tail-call optimizations aren't responsible for that, but that's where Trampolines
come into action.
Before explaining trampolines I want to consider another example that is consuming much more memory and stack frames and how tail-call optimizations can fix it.
function fibonacci(index) {
if (index === 0) return 0
if (index === 1) return 1
return fibonacci(index - 1) + fibonacci(index - 2)
}
This problem is widely known, but what I am referring to here is the execution of it is extremely heavy as it's two stages recursion or better known as Binary Recursion where each function call invokes another two function calls.
This is overkilling the memory, imagine that our poor factorial
function was exhausting our memory, and it was only recursing once, now we have a function that is recursing twice or binary.
Your stack trace would end up something like that given index
is 5.
That's really where TCO can become very handy, we already stated the fact that TCOs allow your garbage collector to remove those unused data in each stack frame and passing it to the next function call, which is extremely powerful in such case, you can define any recursive function as in TCO position and take advantage of that.
function fibonacciTCO(index) {
// firstFibonacci and secondFibonacci are usually named a and b.
function go(
index,
firstFibonacci = 0,
secondFibonacci = 1,
) {
if (index === 0) return firstFibonacci
if (index === 1) return secondFibonacci
return go(
index - 1,
secondFibonacci,
firstFibonacci + secondFibonacci,
)
}
return go(index)
}
Debugging how this code is run is some sort of hassle and beyond the scope of this article, maybe another time.
But the key point here is that now this function is executing a lot faster than ever.
Umm, yeah that's great, but I cannot execute it on huge inputs that are past the limit of my stack frames, what to do now βΉοΈ?
Meet recursive functions best friend, trampolines.
Trampolines
As shown in the GIF, trampolines for recursive functions is literally making your function calls bounce between two functions, it might sound weird and unreasonable, but trust me, that's how you'll limit your function calls between 6-7 stack frames, lets figure out how.
Now that you have made your recursive function in a tail call position, what's left that you trampolize it, by which I mean to make it bounce-able between your trampoline
utility function and your lovely recursive function factorial, fibonacci, flatten ...etc
.
Well, how can I achieve that? That's super easy, let's define the trampoline
function and explore how it works.
function trampoline(fn) {
return function (...args) {
let result = fn(...args)
while (typeof result == 'function') {
result = result()
}
return result
}
}
If you're not familiar with this style of coding, well that's derived from the functional programming coding paradigm (I have a whole course of 14+ hours on that topic π).
What are we defining here? We are defining a function that accepts your function that should be made bounce-able, and returning an optimized function, if you will, that is already trampolized or ready to be bounced, and that function is awaiting the arguments that should be passed to your original recursive function aka factorial, fibonacci
.
Afterwards, we are looping as long as the return type of calling your function factorial, fibonacci
given the inputs as ...args
is a function, if so, we are invoking the next function call which means that our recursive function hasn't finished its job yet, otherwise we're done here and just returned the value that's returned from your recursive function which is stored in result
.
This approach requires you to alter your recursive functions to return a closure
i.e. wrapping your returned recursive call in a function to be passed to trampoline
.
function factorial(number) {
function go(number, accum = 1) {
if (number <= 1) return accum
return go(number - 1, accum * number)
}
return function () {
return go(number)
}
}
function fibonacci(index) {
function go(index, a = 0, b = 1) {
if (index == 0) return a
if (index == 1) return b
return go(index - 1, b, a + b)
}
return function () {
return go(index)
}
}
let trampFactorial = trampoline(factorial) // pass a reference only to the function
let trampFibonacci = trampoline(fibonacci)
Notice that we are still defining our functions in tail call position to get the advantage of the garbage collector releasing the memory allocated for each stack frame,
But, we aren't implicitly returning go(...args)
but rather returning the recursive function call wrapped within an anonymous function that will be checked within trampoline
if it matches the condition of looping.
Thus, your functions are heavily optimized in terms of memory, time and stack limit, you can run them with inputs up to 1e7 which is 10 million (If my math is right) and even more is possible.
Alrighty aight, that's hella great, but what about complex operations which are commonly required and used?
Let's see the flat
operation which is considered the worst of them all (at least for me).
You can define a regular flat
method as following:
function flat(array, depth = Infinity) {
let result = []
array.forEach(function (item) {
if (!Array.isArray(item)) {
result.push(item)
} else if (depth === 1) {
result = result.concat(item)
} else {
result = result.concat(flat(item, depth - 1))
}
})
return result
}
If you're like me, someone who prefers a more functional style
function flatten(array, depth = Infinity) {
return array.reduce(function (list, item) {
return list.concat(
depth > 0
? depth > 1 && Array.isArray(item)
? flatten(item, depth - 1)
: item
: [item],
)
}, [])
}
Regardless this solution is fucked up in terms of code readability, it's also not optimizable to be in tail call position, notice that we're awaiting each function call to return its value to be concatenated with list.concat
operation, therefore, each stack frame holds within it its value βΉοΈ (Stick with the first solution)
How can we optimize this function using our two new techniques?
Well first, let's re-define it in tail call position, so we free up some memory.
function flat(array, depth = Infinity) {
let result = []
array.forEach(function (item) {
if (!Array.isArray(item)) {
result.push(item)
} else if (depth === 1) {
result = result.concat(item)
} else {
result = flat(item, depth - 1) // Yeeey tail call position, just get rid of operation
// of result.concat so each stack frame can easily forget its held data.
}
})
return result
}
Hmm, I hope it's quite obvious now what's the next step and how to achieve it.
Yup, trampolize that bloody function!! πββοΈ
// {... same code as before}
// just change:
result = function () {
return flat(item, depth - 1)
}
Lastly, just define any variable to hold the returned function from calling trampoline
on our flat
function
let flatten = trampoline(flat)
Hooray, we're done here, our function is now ready to flat up to 30 million items with 3-4 seconds, CAN YOU IMAGINE!
Earlier, we could only flatten 10-20k items in more than 10-15 seconds, now 10-30 million is less than 5 seconds? I don't know, but that sounded insane for me the first time I implemented this method, like Tsk, Imma apply in Google dude, I'm a genius.
Breaking news: this optimized function behaves differently from the default behaviour of any flat function you'd ever see whether in JavaScript, Laravel or anywhere, Let's see why.
The default .flat
JavaScript function that was introduced in ES2019 (I think) and the implementation of the Laravel framework, both maintain the data even if they are duplicates.
Consider the following examples.
let result = Array(1e5)
.fill([[2]])
.flat(2)
$result = Arr::flatten(
array_fill(0, 1e5, [[2]])
);
In both scenarios whether using Laravel or native JavaScript flatten functions, the returned array from flattening those 100k elements of the [[2]]
array is 100k element of the number 2 (Sharingan achieved).
But using our function:
let result = flatten(Array(1e5).fill([[2]]))
Our execution will eliminate all those duplicates, that's no coincidence, remember that we are not concatenating every value, we have eliminated list.concat, result = result.concat
to achieve tail call position.
Therefore, we cannot maintain those values.
But don't frown, it's not a bug, it's a feature, right π?
Why don't we call our cutie function flatUnique
(Modern problems require modern solutions)?
Now our function has a semantic name to what it's really doing.
Still, frowned? Well, yeah you have to, if you're a Laraveler like me, the flatten
function is used almost everywhere in the core of the framework which is not allowing us to use that custom implementation, their test cases will blow up like bitch.
Well luckily, we can use the iterative solution which is a lot faster than the recursive solution, in this case, guess what, JavaScript default implementation is iterative, not recursive, and if you're a functional programmer like me, Ramda.js also implements the flatten function in an iterative manner.
So we can have both functions performing nicely, an iterative one for regular flattening and maintaining all duplicate values, and another recursive one for flattening unique items.
Conclusion
Recursion is really a powerful concept, but it must be implemented right to enjoy all of these great features, Therefore, I'd like to state my first law:
βIf you don't know how to use recursion, don't use recursionβ
Panda's law #1
Although that's not everything about recursion, there's yet more to it, but I believe these are the core concepts you should be aware of.
And, my friend, I really encourage you to implement your algorithms more recursively now that you understand how to gain the utmost power of recursion, but a word of truth, some operations are better performed using iterations, like that flatten
that JavaScript and Ramda.js implement, the iterative solution is a lot faster than the recursive solution in case we want to maintain the same data.
Recursion is one of those concepts that are highly related to data-structures as well, and some common known sorting, searching algorithms, Yeah I know these operations can be implemented iteratively, well, anything that's iterable is recurisveable (If that is even a valid word) and vice-versa, but some problems are solved easily using recursion, binary tree traversing, for example, you really just define a function that either traverses right or left, I haven't seen an iterative solution for it yet, and I don't think that I want to.
I really hope you liked this article and found it useful and not a boring one, lemme know your thoughts ^^
Appendices
Trampolines in PHP && optimizing flatten function
function trampoline(callable $fn)
{
return function (...$args) use ($fn) {
$result = $fn(...$args);
while (is_callable($result)) {
$result = $result();
}
return $result;
};
}
function flatUnique($array, $depth = INF)
{
$result = [];
foreach ($array as $item) {
if (!is_array($item)) {
$result[] = $item;
} elseif ($depth === 1) {
$result = array_merge($result, array_values($item));
} else {
return function () use ($item, $depth) {
return flat($item, $depth - 1);
};
}
}
return $result;
}
$flatten = trampoline('flat');
Iterative flat function
The solution from StackOverFlow also other solutions are provided there, but I find this one the most appropriate and concise.
Once again, if you're functional programming, you'd be saying yikes now as this solution is directly altering the source array, but I believe it's only for demonstration purposes.
function flatten(arr) {
var i = 0
if (!Array.isArray(arr)) {
/* return non-array inputs immediately to avoid errors */
return arr
}
while (i < arr.length) {
if (Array.isArray(arr[i])) {
arr.splice(i, 1, ...arr[i])
} else {
i++
}
}
return arr
}
You can check my GitHub for further material and surely check my course on Functional Programming it's in Arabic for now, but maybe -if wanted- I can make an English version of it, and meanwhile, you can read a free sample of it on the github repo made for it.
Thanks for reading and happy coding πππ
Posted on May 22, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.