Reduce
Homam
Posted on May 18, 2019
Recently I needed to parse a semi-structured long text document and convert it into a data structure. As a lazy programmer I did not want to copy and paste the text a thousand times by hand. My solution was quite simple: read the document line-by-line, keep track of each line that I have not successfully parsed yet in an array, and try to parse the array at the end of each iteration, and empty the array every time parsing succeeds. And repeat till EOF.
This is how parsers work generally. My little hack was easy to do only because I contained my logic inside reduce function.
This experience reminded me that I have to write about the power and utility of reduce function.
Read this post if you are getting on board of functional programming train.
Summing up Numbers
Let’s create a function to sum up the numbers inside an array. (you can try these snippets in your browser console)
let oneToTen = [1,2,3,4,5,6,7,8,9,10]
let sum = function(arr) {
let acc = 0 // the accumulated sum
for(var i = 0; i < arr.length; i++) {
let a = arr[i] // ith item in the array
acc += a
}
return acc
}
sum(oneToTen)
Simple, yes! But like most things in programming, there is a nicer way for doing this:
oneToTen.reduce((acc, a) => acc + a, 0)
reduce
function is very powerful and it indeed looks magical if it is the first time you're seeing it.
Reduce is known by many other names: Aggregate in .NET Linq, fold in Scala, foldl in Haskell, Erlang, accumulate in C++. Check the full list at Foldl Wikipedia page.
In JavaScript Array.prototype.reduce receives two arguments. The first one is a function and the second argument is the initial value (or the seed) of the reduction process (here it is 0).
Here’s a more verbose version of the above code:
oneToTen.reduce(function(acc, a) {
return acc + a;
}, 0)
You can compare acc
and a
variables in this version with the similarly named variables in the loop version earlier.
So how does it work?
The function inside reduce (which we call reduction or aggregation function) gets called multiple times, exactly once per item in the array. This is very similar to the operation inside the body of for. At each step the reduction function returns the current accumulated value by summing up the previous accumulated value (acc
) and the current item in the array a
.
Let’s add some logs to see the result at each step:
let oneToTen = [1,2,3,4,5,6,7,8,9,10]
oneToTen.reduce((acc, a) => {
console.log(`acc = ${acc}, a = ${a}`)
return acc + a
}, 0)
reduce
is an abstraction over looping operations. We can convert any operation on arrays to reduce.
Probably counting the number of items in an array is one of the simplest and the most-common things that we do with arrays. JavaScript array natively support Array.prototype.length
. But since it is an operation on arrays we can also use reduce to count the size of our array:
['a', 'b', 'c', 'd', 'e'].reduce((acc, _a) => acc + 1, 0)
The length of an array does not depend on the actual value of each item in the array. That’s why we do not use the parameter _a
in the above code.
Here the seed value of reduce is 0; reduce returns the seed value if the array that it is operating on is empty.
Of course you should continue using Array.prototype.length
and most of the native array functions in your production code. Or use a library like Ramda. Many examples here are for demonstrating the generality and power of reduce function.
So far the reduce operations that we have seen produced a numeric result. Now let’s check string concatenation.
Standard Array.prototype.join
concatenates an array of strings, using its argument and returns the concatenated string. We can also define it using reduce:
['reduce', 'is', 'cool'].reduce((acc, a) => acc + ' ' + a, '')
// " reduce is cool"
Notice the extra space at the beginning of the string.
We have the extra space because we started reducing with an empty string. The value of the first acc
is the initial empty string. Then in the reduction function we added a space and then the word "reduce"
:
['reduce', 'is', 'cool'].reduce((acc, a) => {
console.log(`acc = '${acc}', a = '${a}'`)
return acc + ' ' + a
}, '')
// " reduce is cool"
We can solve this easily by not passing any initial value to the reduce:
['reduce', 'is', 'cool'].reduce((acc, a) => acc + ' ' + a)
// "reduce is cool"
But I argue that this implementation is also problematic because it fails for an empty array.
We can deal with the unnecessary space using an if expression. We check if acc is equal to the empty string (that means we are in the first iteration):
['reduce', 'is', 'cool']
.reduce((acc, a) => acc === '' ? a : acc + ' ' + a, '')
If you are not used to if-then-else expressions in JavaScript, the code above is equivalent to this:
['reduce', 'is', 'cool'].reduce((acc, a) => {
if(acc === '') {
return a;
} else {
return acc + ' ' + a;
}
}, '')
I prefer if-then-else expressions here because they ensure that I would not be forgetting the else clause. Every if in this tutorial will need an else.
I also always pass a seed value to reduce functions.
We can create the join function:
function join(c, arr) {
return arr.reduce((acc, a) => {
if(acc === '') {
return a;
} else {
return acc + c + a;
}
}, '')
}
join('*', ['reduce', 'is', 'cool'])
Or more concisely:
let join = (c, arr) => arr.reduce(
(acc, a) => (acc === '' ? '' : acc + c) + a
, '')
Array functions
Let’s explore defining some basic array operations with reduce starting with map
:
let map = (f, arr) => arr.reduce((acc, a) => {
const mappedA = f(a) // apply f to the current item in the array
return acc.concat([mappedA])
},[])
// the initial seed is an empty array, this is the result of reduction if the input array is empty
map(x => x * 2, oneToTen)
// [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
And filter
:
let filter = (f, arr) => arr.reduce((acc, a) => {
const include = f(a)
return include ? acc.concat([a]) : acc
}, [])
filter(
x => x.startsWith('A')
, ['Orange', 'Apple', 'Pearl', 'Avocado', 'Pomegranate']
)
// ["Apple", "Avocado"]
We can see the pattern now.
identity just creates array with the exact same elements of the array that it receives, without doing any other operation:
let identity = arr => arr.reduce((acc, a) => acc.concat([a]), [])
identity(['a', 'b', 'c', 'd', 'e', 'f'])
// ['a', 'b', 'c', 'd', 'e', 'f']
Now let’s define reverse function using reduce. Check how its definition is different from identity:
let reverse = arr => arr.reduce((acc, a) => [a].concat(acc), [])
reverse(['a', 'b', 'c', 'd', 'e', 'f'])
// ["f", "e", "d", "c", "b", "a"]
take
returns the first N items in the array as a new array:
let take = (howMany, arr) => arr.reduce(
(acc, a) => acc.length === howMany ? acc : acc.concat([a])
, []
)
take(3, ['a', 'b', 'c', 'd'])
// ['a', 'b', 'c']
head
is a function that returns the first item in an array (similar to arr[0]
). And last returns its last item of an array:
let head = arr => arr.reduce((acc, *_a*) => acc)
let last = arr => arr.reduce((*_acc*, a) => a)
head(['a', 'b', 'c', 'd']) // "a"
last(['a', 'b', 'c', 'd']) // "d"
And a bit of sanity check:
head(reverse(['a', 'b', 'c', 'd'])) === last(['a', 'b', 'c', 'd'])
// true
drop
function removes the first N item in the array and returns the rest. We can define drop using take and reverse:
let drop = (howMany, arr) => {
const reversedArr = reverse(arr)
const topN = take(arr.length - howMany, reversedArr)
return reverse(topN)
}
drop(3, ['a','b','c','d','e']) // ["d", "e"]
This definition is not very efficient, because we iterate through the array three times: (reverse
, take
, reverse
).
We can simply count the items in the array and exclude the items which their index is less than N:
drop = (howMany, arr) => arr.reduce(
(acc, a) => {
// current index in array
const currentIndex = acc.currentIndex + 1
const result = currentIndex >= howMany
? acc.result.concat([a])
: acc.result
return {currentIndex, result}
}
, {currentIndex: -1, result: []} //the initial seed of aggregation
)
.result
drop(3, ['a','b','c','d','e']) // ["d", "e"]
Remember that JavaScript array index starts from 0.
Here the initial (seed) value of the reduction process is not a simple array or an empty string or number 0, but it is an object with two fields:
{currentIndex: -1, result: []}
Note that the aggregation (reduction) function returns a similar object.
currentIndex
keeps the count of the items in the array.
result
keeps track of the result of our reduction process.
At the end of the reduction currentIndex
is equal to the length of the array minus one and result contains the final result of the drop operation.
This implementation iterates through the array only once.
We can use de-structuring to make this function shorter and depending on your taste more or less readable:
drop = (howMany, arr) => arr.reduce(
({ currentIndex, result }, a) =>
currentIndex + 1 >= howMany
? { currentIndex: currentIndex + 1, result: result.concat([a]) }
: { currentIndex: currentIndex + 1, result: result }
, { currentIndex: -1, result: [] }
).result
The seed value
The idea of reducing using complex objects as seed values is very powerful. For example we can calculate the sum and the product of the items in an array simultaneously by going through the array only once:
[1,2,3,4,5,6,7,8,9,10].reduce((acc, a) => {
return {
sum: acc.sum + a,
product: acc.product * a
}
}, {sum: 0, product: 1})
Here the choice of {sum: 0, product: 1}
for initial seed is not trivial. 0
is the neutral element of sum operation and 1
is the neutral element of product.
The result of reducing an empty array is equal to the seed value of reduction.
[].reduce((acc, a) => {
return {
sum: acc.sum + a,
product: acc.product * a
}
}, {sum: 0, product: 1})
Let’s study the choice of seed value for sum and product functions in more details:
let sum = arr => arr.reduce((acc, a) => acc + a, 0)
let product = arr => arr.reduce((acc, a) => acc * a, 1)
The idea is that the seed value i must be chosen so that for our reduction function f
and for every a
that is an element of our array:
f(a, i) == f(i, a) == a
Seed value is the neutral element of the reduction function.
For example for product function, where f = (acc, a) => acc * a
, the seed value must be 1 so:
1 * a == a * 1 == a
Pipe
pipe
function receives a list of functions and applies them one after another to its input. By utilizing pipe
we can avoid defining temporary local variables for one-time use:
function addTwoPlusOneOverSeven(a) {
const b = 2 * a
const c = b + 1
const d = c / 7
return c
}
// will become
function addTwoPlusOneOverSeven(a) {
return pipe([
x => x * 2
, x => x + 1
, x => x / 7
])(a)
}
In other words, more generally pipe creates a new function by composing the functions in its input array:
const addTwoPlusOneOverSeven = pipe([
x => x * 2
, x => x + 1
, x => x / 7
])
Defining pipe using reduce is quite easy:
let pipe = arr => arr.reduce(
(acc, next) => x => next(acc(x))
, x => x
)
Note the seed value x => x
. This is identity function that is the neutral element of composition. It is akin to 0
for sum
or 1
for product
.
Here our reduction function is: f = (acc, next) => x => next(acc(x))
Note that acc
and next are both functions and f
compose them together one after the other.
id = x => x
is the neutral element because for every function h
that we can think of:
f(id, h) == f(h, id) == h
pipe([
x => x * 2
, x => x + 1
, x => x / 7
, x => `((10 * 2) + 1) / 7 = ${x}`
])(10)
// "((10 * 2) + 1) / 7 = 3"
Moving Average
Lastly I want to show how we can implement efficient moving average, and some basic statistics using reduce:
let movingAverage = (size, arr) => arr.reduce((acc, a) => {
let currentWindow = acc.currentWindow.concat([a])
currentWindow = currentWindow.length > size
? drop(1, currentWindow)
: currentWindow
return {
currentWindow,
result: currentWindow.length == size
? acc.result.concat([sum(currentWindow) / size])
: acc.result
}
}, {currentWindow: [], result: []})
let {result} = movingAverage(3, [2,5,6,4,1])
let expected = [sum([2,5,6])/3, sum([5,6,4])/3, sum([6,4,1])/3]
{result, expected}
// result = [4.333333333333333, 5, 3.6666666666666665]
Basic descriptive statistics in one-go:
let stats = data => data.reduce(
({count, sum, mean, vari, min, max}, x) => {
const k = 1 / (count + 1)
const mean_ = mean + k * (x - mean)
const ssr_ = (count - 1) * vari + k *
count * (x - mean) * (x - mean)
return {
count: count + 1
, sum: sum + x
, mean: mean_
, vari: ssr_ / Math.max(1, count)
, min: isNaN(min) || x < min ? x : min
, max: isNaN(max) || x > max ? x : max
}
}
, {count: 0, sum: 0, mean: 0, vari: 0, min: NaN, max: NaN}
)
stats([3,4,2,2,4,3,2,2,4,5])
/*
{
count: 10,
sum: 31,
mean: 3.1,
vari: 1.2111111111111112,
min: 2,
max: 5
}
*/
Here I am using Welford variance algorithm to compute the variance. This algorithm also works with streams.
We need to sort our array to calculate other statistics like median or quartiles.
Defining Reduce
Now, to learn how reduce works internally, let’s define our own version reduce function.
reduce is an abstraction over recursion. At each iteration we produce the result by calling the reduction function f
over the current element in the array and the result of of the latest iteration of the reduction.
let reduce = (f, seed) => arr => {
if(arr.length === 0){
// result of reducing an empty array is the initial seed
// the array is empty if it is the last iteration
return seed
} else {
const [a, ...tail] = arr
const result = f(seed, a)
// result is the initial seed of the next iteration
return reduce(f, result)(tail)
}
}
reduce((acc, a) => acc + a, 0)(oneToTen)
// 55
Or equivalently we can define reduce using iteration:
reduce = (f, seed) => arr => {
if(arr.length == 0) {
// result of reducing an empty array is the initial seed
return seed
} else {
let result = seed
for(var i = 0; i < arr.length; i++) {
const a = arr[i]
result = f(result, a)
}
return result
}
}
reduce((acc, a) => acc + a, 0)(oneToTen)
// 55
I hope you agree that our definition using recursion is more elegant. It captures some truth about reduce. It clearly shows that reduce is an abstraction over recursion of the elements in an array.
The iterative version though is faster in JavaScript, because many JavaScript engines do not support tail-call-optimization technique.
Reducing from Right
Standard Array.prototype.reduce
reduces the array from left-to-right. This means it first applies the reduction operation to the seed value and the first element of the array, creates a new seed value, drops the first element and repeats.
We can also reduce arrays from right-to-left:
let reduceRight = (f, seed) => arr => {
if(arr.length === 0){
// result of reducing an empty array is the initial seed
return seed
} else {
const [a, ...tail] = arr
const result = reduceRight(f, seed)(tail)
// first result is the seed,
// second result is f applied to the seed and
// the last element of the the array, ...
return f(result, a)
}
}
// 4 - (3 - (2 - (1 - 0))) = 2
let leftReduceResult = [1,2,3,4].reduce((acc, a) => a - acc, 0)
// (((4 - 0) - 3) - 2) - 1 = -2
let rightReduceResult = reduceRight((acc, a) => a - acc, 0)([1,2,3,4])
console.log({leftReduceResult, rightReduceResult})
Right-to-left reduction is especially efficient with linked-list data structure.
ECMAScript supports Array.prototype.reduceRight:
[1,2,3,4].reduceRight((acc, a) => a - acc, 0)
// -2
Scan
No piece of writing about reduce is complete without mentioning scan.
scan
returns an array that contains the result of each step in reduction. Scan is especially useful in stream processing when we are dealing with effectively infinite streams (Check RxJS scan).
let scan = (f, x0) => arr => arr.reduce(
({prev, result}, a) => {
const current = f(prev, a);
return {prev: current, result: result.concat([current])};
}
, {prev: x0, result: []}
).result
let sum_scan = scan(
(total, a) => total + a
, 0
)
sum_scan(oneToTen)
// [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
Note that we used reduce to define scan while the final item in the array that scan produces is the result of reduce:
last(scan(f, s, arr)) == reduce(f, s, arr)
Or using pipe:
pipe([
scan(f, s)
, last
]) == reduce(f, s)
Or in math notation:
last ∘ scan = reduce
I hope I got you into the zen of reduce.
Posted on May 18, 2019
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 14, 2024
November 10, 2024