Using Array.prototype.includes() vs Set.prototype.has() to filter arrays

arnaud

Arnaud

Posted on July 2, 2020

Using Array.prototype.includes() vs Set.prototype.has() to filter arrays

During a code review, I noticed a piece of code that was calling includes inside filter.

The code looked something like this:

const valuesToKeep = [2, 4];
const valuesToTest = [1, 2, 3, 4];
const values = valuesToTest.filter((n) => valuesToKeep.includes(n)); // [2, 4]

The comment I left was along the lines:

"Hey, be careful, filter is O(n) and includes is O(n) so this is O(n2). This would perform as O(n) if you used a Set, BUT since the arrays are so small it probably does not make a difference".

To illustrate the comment with code, this is what I had in mind:

const valuesToKeep = [2, 4];
const valuesToTest = [1, 2, 3, 4];
const setToKeep = new Set(valuesToKeep);
const values = valuesToTest.filter((n) => setToKeep.has(n)); // [2, 4]

My comment didn't sit well with me because saying "hey this will perform OK because the data is a certain way" is not a good idea: the data may change, or maybe I'm just wrong.

So I decided to put this to test. I'm going to generate two arrays containing random integers: an array of values to keep, and an array of values to test. The premise is that the array of values to keep is much smaller than the array of values to test, so we're going to make the array of values to test 10 times bigger than the array of values to keep.

// array of values to keep
const valuesToKeep = Array.from({ length: LENGTH }, () => getRandomInt());

// array of values to check
const valuesToTest = Array.from({ length: LENGTH * 10 }, () =>
  getRandomInt()
);

Then we're going to run two tests: one using includes, and one using has, and we're going to start with LENGTH at 10, and increase it every time, as my premise is that for small array it won't matter much, but we want to see WHEN it starts mattering:

// filter using includes
console.time("includes");
valuesToTest.filter((v) => valuesToKeep.includes(v)); // n2
console.timeEnd("includes");

// filter using has
console.time("has");
const valuesToKeepSet = new Set(valuesToKeep);
valuesToTest.filter((v) => valuesToKeepSet.has(v)); // n
console.timeEnd("has");

And here are the results:

Length of values to keep:  1
Length of values to test:  10
includes: 0.207ms
has: 0.190ms

Length of values to keep:  10
Length of values to test:  100
includes: 0.020ms
has: 0.017ms

Length of values to keep:  100
Length of values to test:  1000
includes: 0.204ms
has: 0.071ms

Length of values to keep:  1000
Length of values to test:  10000
includes: 9.942ms
has: 1.307ms

Length of values to keep:  10000
Length of values to test:  100000
includes: 131.686ms
has: 8.016ms

Length of values to keep:  100000
Length of values to test:  1000000
includes: 1324.318ms
has: 71.495ms

So yes, I am right that with a small quantity of data, Array.includes and Set.has perform roughly the same, but we can see how quickly performance degrades, and the change is so small that it's hard to justify not making it, even for small data samples. Should the size of the data increase, especially the size of the valuesToKeep array, the code is future proof.

TLDR: when matching a value against a list of values, convert the Array to a Set first.

💖 💪 🙅 🚩
arnaud
Arnaud

Posted on July 2, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related