The power of reduce
Gregory Witek
Posted on July 27, 2021
reduce
(aka fold
aka inject
aka lfold
) is a very powerful, flexible, and at the same time an unintuitive and controversial function. In this post I'll talk about what makes it both so flexible and unintuitive, and I'll present how other iteration functions like map
or filter
can be implemented on top of reduce
. I'll use JS definition of reduce
as a reference and I'll show what other languages do better in implementing this function.
Basics of reduce
reduce
is a function that works on collections. It usually accepts 2 arguments: a reducer function and an optional initial value. reduce
iterates over the collection, calling the reducer function for every element and passing the output of reducer to the next iteration (with one exception mentioned later). A simple example is calculating a product of all elements of the array:
// returns 2 * 4 * 6 * 8 = 384
[2,4,6,8].reduce((accumulator, el) => accumulator * el, 1);
The reducer function can accept up to 4 arguments:
- accumulator - the output of previous iteration (in the first iteration it takes the default value, or if not provided, the first element of the array)
- element - the current element of the array
- index - the index of the current element of the array
- originalArray - the whole array on which
reduce
is being called.
In the following example, the execution will look like this:
1st iteration: acc = 1 * 2 (output: 2)
2nd iteration: acc = 2 * 4 (output: 8)
3rd iteration: acc = 8 * 6 (output: 48)
4rd iteration: acc = 48 * 8 (output: 384)
If you want to understand it better and see more advanced examples, check the tutorial I recorded:
Use cases
reduce
has traditionally been a part of functional languages, where it acts as kind of equivalent of for
loops. It became more common thanks to a MapReduce framework which allows to easily parallelize operations that aggregate some data. MapReduce splits the work to be done in 2 parts - map
part performs some kind of operation on each piece of data (this part can be done in parallel) and reduce
then gathers all the output from map
and combines the filan result (this part is done sequentially).
Let's say we want to count number of occurences of each word in a piece of text. We can split the text into sentences, and fo each sentence we can calculate number of occurences of each word in parallel. Then we end up with multiple dictionaries, let's say:
{ "dog": 2, "is": 2, "animal": 1, "and": 1, "mammal": 1},
{ "fish": 1, "is": 1, "animal": 1, "too": 1}
Then reduce
function can merge these 2 dictionaries and calculate final output:
{ "dog": 2, "is": 3, "animal": 2, "and": 1, "mammal": 1, "fish": 1, "too": 1 }
Interestingly, reduce
does not need map
to achieve the result above - it's only needed in order to have the first part run in parallel.
Another common use case is to calculate some number that's based on a list of numbers. A good example is sum of squares that has a number of uses in mathematics like in linear regression.
I personally often use reduce
in order to transform one dictionary into another (e.g. I might need to normalize keys, or update values). This is not possible in JavaScript though - I explain it a bit later in the article.
The controversy
For a number of reasons, reduce
is a controversial function among programmers. In JS it gets quite a bad rep, like in the widely retweeted example below:
It's not the only example, though. In Python, reduce
was removed from the standard library and moved to functools
library. It's still shipped as part of the Python language distribution, but in order to use it, you need to explicitly import it.
There are a number of reasons why reduce
gets a bad reputation, the main of them being: for every use of reduce
there's at least one more intuitive and more readable alternative.
For loops and other options
First argument for not using reduce
is that many languages (mainly imperative/OO) there are always more idiomatic and intuitive ways to write code than useing reduce
. The main solution is to use for
loop, forEach
function, or some kind of equivalent. Let's take the example from the previous section:
[2,4,6,8].reduce((accumulator, el) => accumulator * el, 1);
Another way to write is
const product = 1;
for (const el in [2,4,6,8]) {
product *= el;
}
For programmers coming from other imperative languages, the latter version certainly feels more familiar. Is it clearly better though? I'm not so sure.
Readability
The second argument is quite similar, but focuses on reduce
function itself - a lot of people say the function is hard to read. I partially agree with this. Most of the time I have little problem understanding what is the goal of reduce
just by having a quick look, but because it can return anything, it's not as meaningful and intuitive as map
or filter
. What's more, if you want to use reduce
in multiple programming languages, you'll have to remember that each of them has a different number and order of arguments!
There's one more thing that adds to the problem - the initial value, which is an optional parameter in reduce
and which changes a lot about how the function works. If you have a collection of 10 elements, you can expect that it'll trigger 10 iterations, however if you don't pass the initial value to the function, there'll be only 9 iterations. That's because the first element of the collection will become the initial value. In a lot of cases, like when calculating a sum or a product, it doesn't matter, but when you want to calculate sum of squares, that missing initial value will break the function!
function sumSquares(ary) {
return ary.reduce((acc, el) => acc + el * el);
}
sumSquares([1,2,3,4]); // => 30, works!
sumSquares([4,3,2,1]); // => 18, broken!
Limitations
The last reason applies to some specific langauges, for example JavaScript - reduce
was added to JS as a half-baked thing, working only on arrays. The same function in other languages can be used on other types of collections. In Ruby as long as a class includes the Enumerable
module, it gets reduce
function. In Python, where reduce
is used very rarely, you still can use it with dictionaries. I believe reduce
would be way more useful in JavaScript if only it was possible to call it on other types of collections.
Write everything in reduce!
While I agree with the arguments I presented above, I still believe that understanding reduce
can be very helpful, especially if you ever consider learning functional languages. It's really a powerful function. Actually, reduce
is so flexible, that a lot of collection functions can be rewritten using reduce
. Let's try it out!
Warning: don't try to do it in your apps. The original implementations of the functions below are certainly better (and probably much, much faster).
forEach
First, something easy: forEach
is a reduce
that calls a passed callback and does not return any value.
function foreach(array, cb) {
array.reduce((_acc, el) => cb(el));
}
map
map
is reduce
where we start with empty array and in every iteration we add result of the callback function to the accumulator.
function map(array, cb) {
return array.reduce((acc, el) => [...acc, cb(el)], []);
}
A slightly more readable (and faster, I guess) version, with 2 statements, would look like this:
function map(array, cb) {
return array.reduce((acc, el) => {
acc.push(cb(el));
return acc;
}
}
flatMap
This one's quite complicated! flatMap
behaves similarly to map
except that it always returns a flat (1-dimensional) array. If the provided callback returns an array, map returns an array of arrays, while flatMap
, as the name suggests, flattens the output. It could be implemented this way:
function flatMap(array, cb) {
return array.reduce((acc, el) => [...acc, ...cb(el)], []);
}
However, if the cb
does not return an array (we can't guarantee it does), we need to add something more. There are a few different ways to deal with it, the most trivial is to just flatten the outer array. It's not a pretty solution (and oh it is SO slow), but it will do.
function flatMap(array, cb) {
return array.reduce((acc, el) => [...acc, ...cb(el)].flatten(), []);
}
filter
Next, filter
returns elemets of orignal array, but only those that meet the provided expectation (read: where cb(el)
returns truthy value). First, let me implement it using 2 statements to make it easier to read.
function filter(array, cb) {
return array.reduce((acc, el) => {
if (cb(el)) acc.push(el);
return acc;
}, []);
}
Now the same can be rewritten with a single statement, though it's less intuitive.
function filter(array, cb) {
return array.reduce((acc, el) => {
return cb(el) ? [...acc, el] : acc;
}, []);
}
some
some
returns true if callback function returns true
(or any truthy value) for any of the elements in the array. It can be written in pseudocode as cb(array[0]) || cb(array[1]) || ... || cb(array[n-1])
. In order to implement it with reduce
I'll be carrying on the boolean value over each iteration.
function some(array, cb) {
return array.reduce((acc, el) => acc || Boolean(cb(el)), false);
}
every
every
is a sibling function to some
and returns true
if the callback function returns true
for every element of the array. It can be written as fun(array[0]) && fun(array[1]) && ... && fun(array[n-1])
. Similarly I'll carry a boolean value as acc
.
function every(array, cb) {
return array.reduce((acc, el) => acc && Boolean(cb(el)), true);
}
includes
includes
could actually be implemented using some
. For the sake of consistency I'll just keep using the reduce
directly though. In this case we don't have a callback to use, instead we need to check if any element is equal to provided value.
function includes(array, value) {
return array.reduce((acc, el) => acc && (el === value), false);
}
As a side note, the 3 functions above are examples where using reduce
introduces a performance penalty (they'll iterate over the whole array even if they could stop earlier). One more reason not to use this code in any serious application.
find
find
returns the first element that meets a criteria specified by the callback function. In terms of implementation, it's similar to some
with a twist. Just like with some
we're going to pass a certain falsy value and as soon as it becomes truthy, we're going to pass it till the end of the iteration process. The twist is that the value we need to pass is not the output of the callback function, but the element on which the function is called.
function find(array, cb) {
return array.reduce((acc, el) => {
if (acc) return acc;
if (cb(el)) return el;
}, null);
}
Earlier in this post I said I'd try to write the reduce
with only a single expression. It's possible in this case as well, though just as before it's harder to understand:
function find(array, cb) {
return array.reduce((acc, el) => acc || (cb(el) && el)), null);
}
The cb(el) && el
part will return false
if the element does not meet provided requirement, or it will return the value of el
if it does. Then the first part, acc || ...
will either return acc
(output of previous iteration), unless it's a falsy value, in which case it'll return the 2nd part explained above.
findIndex
findIndex
initially seemed more challenging to implement, because somehow I need to keep track of the index together with the element. Then I remembered that the reducer function takes 4 arguments, and not only 2! The 3rd argument is the current index, and 4th one is the array on which the reduce
is called (I'm still thinking how to use it in practice). So findIndex
will be almost identical to find
.
function findIndex(array, cb) {
array.reduce((acc, el, i) => {
if (acc) return acc;
if (cb(el)) return i;
}, null);
}
lastIndexOf
lastIndexOf
is almost the same, except that first we check whether the current element meets the expectation, and only if it doesn't, then we return the last on that did. In short: we swap the order.
function lastIndexOf(array, cb) {
array.reduce((acc, el, i) => {
if (cb(el)) return i;
if (acc) return acc;
}, null);
}
Similarly to find
, the findIndex
and lastIndexOf
functions (why isn't it called findLastIndex
by the way? and why there's no findLast
function?) could be rewritten using a single expression, the only difference is the order and the logical operators used.
Can reduce do everything?
Looking at the list of array functions in JS and I was wondering if there's anything that can't be implemented with reduce
. Initially I had 3 ideas:
- Functions that modify the original array -
reduce
comes from languages with immutable data structures, so modifying original array (with functions likecopyWithin
) was a long shot, but because the reducer accepts original array as a parameter, it is possible (I am 99.99% sure it's always bad idea, though - don't do it at home!) - Sorting - ok, when that idea came to my mind I thought it was really stupid, but maybe it's possible to implement some kind of bubble sort with
reduce
? Well, it seems I was not the only person who wondered about it! - Finally, I found something -
Array
class has methods likekeys
andentries
, and those functions return iterators. I tried to implement them withreduce
, but I failed miserably, so I assume it can't be done (correct me if I'm wrong!).
What's the point?
This was a fun exercise, but my point here is that each function has its place. reduce
gets a lot of bad rep in JS and for good reasons. It's limiting yet overcomplicated and I still don't remember the order of parameters in reducer, although I used it a number of times. Still, it's good to understand it, so that you can use it from time to time.
Oh, and of course - check out other languages where reduce
work also for dictionaries, sets, or other collection types. Languages like Elixir, Haskell or Ruby make reduce
more powerful and intuitive at the same time!
Posted on July 27, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 28, 2024