Find Duplicates in an Array

hminaya

Hector Minaya

Posted on September 11, 2020

Find Duplicates in an Array

The Problem 🤔?

Write a function that will take in an array of integers and will return all duplicate elements.

Sample Dataset

let sampleData = [54,32,5,11,35,32,17,3,3,22,4,1,6,11];
Enter fullscreen mode Exit fullscreen mode

Expected return

[ 32, 3, 11 ]
Enter fullscreen mode Exit fullscreen mode

Approach #1 - Brute force

Let's create an Array to hold repeated elements.

    let repeatedElements = [];
Enter fullscreen mode Exit fullscreen mode

Next we will loop over the array.

    // This is also known as O(n) in Big-O notation since
    // we have to iterate over all of the items in the array
    for(let i = 0; i < sampleData.length; i++) {

    }
Enter fullscreen mode Exit fullscreen mode

Inside of the loop we will need to loop again and compare each integer to every other integer in the array to determine if they are duplicates.

    for(let i = 0; i < sampleData.length; i++) {
        // Added for clarity, not needed since we can access
        // sampleData[i] directly in our next loop.
        let item = sampleData[i];

        // Now we need to loop over the array again to see if
        // we find the same item again.
        // 
        // Unfortunately this adds O(n^2) complexity 😢
        for (ii = 0; ii < sampleData.length; ii++) {

            // Is it the same integer in a different position?
            if ( (item === sampleData[ii]) && (i !== ii) ) {

                // Add to our array so we can return.
                repeatedElements.push(item)
            }
        }
    }
Enter fullscreen mode Exit fullscreen mode

Here is the full code 👇

let sampleData = [54,32,5,11,35,32,17,3,3,22,4,1,6,11];

function FindDuplicatesUsingBruteForce(sampleData) {

    let repeatedElements = [];

    for(let i = 0; i < sampleData.length; i++) {

        let item = sampleData[i];    
        for (ii = 0; ii < sampleData.length; ii++) {
            if ( (item === sampleData[ii]) && (i !== ii) ) {
                repeatedElements.push(item)
            }
        }
    }

    return repeatedElements;
}

console.log(FindDuplicatesUsingBruteForce(sampleData));

// returns: [ 32, 11, 32, 3, 3, 11 ]
// It actually returns items more than once, but
// I'll ignore this for now.
Enter fullscreen mode Exit fullscreen mode

Be honest, at some point we've all written similar code 🤷‍♂️. This will get you the result we are looking for, but it's the slowest path that will take up the most resources 🤦‍♂️.

This is mostly due to the inner loop, it turns the algorithm into O(n^2).

If your dataset is small you won't notice the difference, but it will quickly slow down and 💣.

Don't use this approach 🛑.

Approach #2 - Using Arrays

Now let's try a slightly different approach, we will avoid the inner loop by using an extra array, which may or may not make this more efficient.

This extra array will keep track of the items that we have already seen.

    let uniqueElements = [];
    let repeatedElements = [];
Enter fullscreen mode Exit fullscreen mode

Next up is the same loop as our first approach, which we will use for all other approaches.

    for(let i = 0; i < sampleData.length; i++) {

    }
Enter fullscreen mode Exit fullscreen mode

Inside our loop we need to keep track of items we have already seen 👀.

    for(let i = 0; i < sampleData.length; i++) {

        // This is where it starts to get interesting. If
        // we have already seen this number we will add it
        // to our array of duplicated elements.
        //
        // What is the Big-O complexity of Array.includes?
        // I'll come back to this.
        if (uniqueElements.includes(sampleData[i])) {
            repeatedElements.push(sampleData[i]);
        }

    }
Enter fullscreen mode Exit fullscreen mode

Plus new items 🔍.

    for(let i = 0; i < sampleData.length; i++) {

        if (uniqueElements.includes(sampleData[i])) {
            repeatedElements.push(sampleData[i]);
        } else {
            // Add to our unique elements to track items we have 
            // already seen
            uniqueElements.push(sampleData[i]);
        }

    }
Enter fullscreen mode Exit fullscreen mode

Here is the full code 👇

let sampleData = [54,32,5,11,35,32,17,3,3,22,4,1,6,11];

function FindDuplicatesUsingArrays(sampleData) {

    let uniqueElements = [];
    let repeatedElements = [];

    for(let i = 0; i < sampleData.length; i++) {

        if (uniqueElements.includes(sampleData[i])) {
            repeatedElements.push(sampleData[i]);
        } else {
            uniqueElements.push(sampleData[i]);
        }

    }

    return repeatedElements;
}

console.log(FindDuplicatesUsingArrays(sampleData));

// returns: [ 32, 3, 11 ]
Enter fullscreen mode Exit fullscreen mode

This seems more efficient than our previous approach and it may be, but it all depends on uniqueElements.includes 🤔.

Why? We are relying on the javascript implementation of includes which is a linear search of items in an Array.

If we go back to how data structures work we will remember that an array is very efficient O(1) if we look up an item by it's key/position, but terribly inefficient O(n) if we look up an item by it's value since we'll have to traverse the array until we find it's value 🤦‍♂️.

Is it more efficient than our first approach? Yes, but there are better ways to do this.

Bonus: An Array in javascript is not an Array 🙃.

Approach #3 - Using a Map()

What else can we try? What data structure has an O(1) lookup? A HashTable 😎.

    // As with a lot of things in JavaScript a Map isn't exactly a 
    // HashTable, but it's close enough for this problem.
    let uniqueElements = new Map();
    let repeatedElements = [];
Enter fullscreen mode Exit fullscreen mode

Instead of uniqueElements.includes we will use the uniqueElements.has method of our Map.

    for(let i = 0; i < sampleData.length; i++) {

        // Since a HashTable lookup is O(1) we have greatly improved
        // our performance by just using a different data structure!!!
        if (uniqueElements.has(sampleData[i])) {
            repeatedElements.push(sampleData[i]);
        } else {
            uniqueElements.set(sampleData[i], sampleData[i]);
        }

    }
Enter fullscreen mode Exit fullscreen mode

Here is the full code 👇

let sampleData = [54,32,5,11,35,32,17,3,3,22,4,1,6,11];

function FindDuplicatesUsingMap(sampleData) {

    let uniqueElements = new Map();
    let repeatedElements = [];

    for(let i = 0; i < sampleData.length; i++) {

        if (uniqueElements.has(sampleData[i])) {
            repeatedElements.push(sampleData[i]);
        } else {
            uniqueElements.set(sampleData[i], sampleData[i]);
        }

    }

    return repeatedElements;
}

console.log(FindDuplicatesUsingMap(sampleData));

// returns: [ 32, 3, 11 ]
Enter fullscreen mode Exit fullscreen mode

So, how fast is this approach? Let's try and compare 👇

let sampleData = [];

// 50k array of random numbers
for (let i = 0; i < 50000; i++) {
    sampleData[i] = Math.floor((Math.random() * 50000) + 1);
}

/*
 Add here the 3 functions we just looked at
 */

// Let's run them all on the same array and time it.

console.time("FindDuplicatesUsingBruteForce");
FindDuplicatesUsingBruteForce(sampleData);
console.timeEnd("FindDuplicatesUsingBruteForce");

console.time("FindDuplicatesUsingArrays");
FindDuplicatesUsingArrays(sampleData);
console.timeEnd("FindDuplicatesUsingArrays");

console.time("FindDuplicatesUsingMap");
FindDuplicatesUsingMap(sampleData);
console.timeEnd("FindDuplicatesUsingMap");

Enter fullscreen mode Exit fullscreen mode

Results 👇

Alt Text

Edit: There are dozens of different solutions to this problem, some more efficient in terms of space or time than the ones outlined here. If you'd like to share one please go ahead in the comments 👇

💖 💪 🙅 🚩
hminaya
Hector Minaya

Posted on September 11, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Find Duplicates in an Array
javascript Find Duplicates in an Array

September 11, 2020