Algorithms: Closest to TwoSum

Today I was introduced to another algorithm. It was framed to me as similar to TwoSum, but with a major caveat- Rather than figuring out if/which two numbers in an array added to a target number, it asked to figure out the two numbers that summed closest to the target.

If that's not immediately clear, let's look at an example case.

If we have a set of [1,5,12,6,13], with a target of 12, the closest we can get is either 1+12, which is 13 (distance of 1), or 5+6, which is 11 (also distance of 1).

It's not too hard to do it in O(n^2) time. We can just compute all possible sums, and either compare them at the end (O(n) space complexity), or maintain a lowestDistance value that we update as we continue to navigate our array (O(1) space). That might look something like this:

function closestTwo(arr, target){
  let lowestDistance = Infinity;
  let nums = [null, null];
  for(let i = 0; i < arr.length; i++){
    for(let j = i+1; j < arr.length; j++){
      const sum = arr[i] + arr[j];
      const dist = Math.abs(target-sum);
      if(dist<lowestDistance){
        lowestDistance = dist;
        nums[0] = arr[i];
        nums[1] = arr[j];
      }
    }
  }
  return nums;
}

However, we can actually do better than O(n^2). How much better? We'll find out.

At first, I was perplexed by this; with the way that the problem is framed, I guessed that the solution might be similar to twoSum, and that it would have something to do with that type of inverted thinking. Here are a few of the routes that I went down.

We could do the twoSum thing and subtract our target from each number, storing them in a datastructure that we can check quickly, like a Set.
- However, if we're talking about being "close", rather than spot on, we can't fuzz the thing we're going to give to our .has()- I want to delimit it by a certain range, but even the closest number could end up being very far from our target, and if we have to check each step in the range, it ends up being extremely slow.
We could sort the array. it's O(nlogn), which would make the behavior of the array more predictable
- But how do we find a solution from there? Perhaps a binary search is possible?
- If we do a binary search, how do we know what we're looking for?

Well, the sort and binary search kind of actually works. It's still not the fastest, but we can do it in O(nlogn) time, which is the best time complexity I've gotten so far, even though it can be optimized further.

Here's how this approach works:

Start one pointer at the beginning of the array
Subtract it from the target sum
Whatever the remainder is, binary search through all items after it and grab the closest value you can find
Move the pointer to the next item in the array and repeat the process
Compare to previous lowest distance and keep whatever's a closer answer
Repeat until you've traversed the entire array

Let's write it.

function closestTwo(arr, target){
  arr.sort((a,b)=>a-b);
  let lowestDist = Infinity;
  let lowestNums = [null,null];
  for(let i=0; i<arr.length; i++){
    const searchTarget = target - arr[i];
    let guess = Math.floor((i + arr.length)/2);
    let lastGuess = guess;
    let lastDist = Math.abs(searchTarget - guess);
    while(Math.abs(guess-lastGuess)!=1){
      let dist;
      if(searchTarget>guess){
        guess = Math.floor((guess + arr.length)/2);
        dist = Math.abs(searchTarget - arr[guess]);
      }
      if(searchTarget<guess){
        guess = Math.floor((i+guess)/2);
        dist = Math.abs(searchTarget - arr[guess]);
      }
      if(dist > lastDist){
        break;
      }
      lastGuess = guess;
      lastDist = dist;
    }
    if(lastDist<lowestDist){
      lowestDist = lastDist;
      lowestNums[0] = arr[i];
      lowestNums[1] = arr[lastGuess];
    }
  }
  return lowestNums
}

So, this is well and fine, but it's doing 2 O(nlogn) operations. The first is that we sort it, and the second is that we're iterating through the array, plus doing a binary search for each of those iterations. That's as good as the time complexity gets, but we can tweak it a bit to do one O(nlogn) operation (sorting), and one O(n) operation.

Let's talk about how we do this.

Remember when I said earlier that the behavior of our array gets a lot more predictable after we've sorted it? Let's think about how we can use that to our advantage. If we sort from low to high, we know that the closer you are to the beginning of the array, the lower your number is, and the more you move towards the end, the higher the number is. The lowest possible sum is the two first items of the array, and the highest possible sum is the two last items of the array.

[1,2,3,4,5] has a lowest possible sum of 1+2, and a highest possible sum of 4+5- but how do we figure out the stuff in between? The magic of it is that we can do so by moving around a couple of pointers, which will inevitably converge to the closest possible sum. The way that we ensure that we're moving closer to the desired solution is that we'll use two pointers- one at the start and one at the end of our array. Here's how it works:

Sort the array
Put a pointer at the beginning of the array
Put a pointer at the end of the array
sum the two values we're pointing at
Is our sum higher or lower than the target?
If the sum is too high, move the end pointer to the next lowest item. If it's too low, move the low pointer to the next higher item
Find the sum of the two values again
If that sum has a higher distance than the last sum, return the previous values
Otherwise, continue the process.

Here's what that looks like:

function closestSum(arr, target){
  arr.sort((a,b)=>a-b);
  let lowPointer = 0;
  let highPointer = arr.length-1;
  let nums = [null,null];
  let closestDist = Infinity;
  while((lowPointer+1)!==highPointer){
    const sum = arr[lowPointer] + arr[highPointer];
    const dist = Math.abs(target-sum);
    if (dist < closestDist){
      closestDist = dist;
      nums[0] = arr[lowPointer];
      nums[1] = arr[highPointer];
      if(sum<target){
        lowPointer++;
      }
      if(sum>target){
        highPointer--;
      }
    }else{
      break;
    }
  }
  return nums;
}

It's also not the easiest to read, but we're basically either scooching our low pointer up, or our high pointer down. We know that it's over if our guesses are getting worse, or if our pointers are directly next to each other, at which point we can simply break out of our loop and return our values.

Major takeaways:

Making comparisons and analogies is good, but don't let yourself get trapped in them - I might have been able to solve this faster if I didn't already know twoSum
An initial sort is often key to manipulating your dataset, and can provide you with valuable tools for searching it (binary search in O(logn) in particular).
There are other types of intuitions to develop about handling your dataset- I couldn't intuit that moving pointers from the outside in would guarantee full coverage of the array, but it's now clear to me that any subarray can be reached by doing so. This may prove useful for other algorithms.
This solution also works for twoSum, though the time complexity is slightly worse b/c of the sort. If the array were pre-sorted, we'd be able to do this in both O(n) time and O(1) space.

Blog

Algorithms: Closest to TwoSum

mari tang

Join Our Newsletter. No Spam, Only the good stuff.

Related