How to implement a hash table in Java (Part 2)

marcioendo

Marcio Endo

Posted on October 14, 2022

How to implement a hash table in Java (Part 2)

In the first post of this series we've implemented the initial version of our hash table. The current implementation allows for "putting" and "getting" mappings. But it is only possible to do so in a very limited set of conditions; the previous post focused on implementing happy path test cases.

In this second post of the series we'll start to deviate from that path. We will do it in a incremental manner, as we have been working so far.

In this post we will add test cases that:

  • uses keys having negative hash code values;
  • replaces the value of an existing mapping; and
  • have the get method handle non-existing keys.

Let's continue.

Before we continue

While not strictly required, I recommend reading the first part of this series if you haven't already. The "Writing style" section, in particular, explains why the examples are written the way they are.

Iteration 05: support negative hash code values

Up until this point, we have been working with keys having positive hash code values.

Let's introduce keys with negative hash code values.

Iteration 05: test case

Let's start with a test case:

public class HashTableTest {
  @Test(description = """
  put() & get() methods

  - negative hash code
  """)
  public void iter05() {
    var ht = new HashTable<Integer, String>();
    assertEquals(ht.size(), 0);

    assertEquals(ht.put(-1, "Minus One"), null);
    assertEquals(ht.size(), 1);
    assertEquals(ht.get(-1), "Minus One");

    assertEquals(ht.put(-2, "Minus Two"), null);
    assertEquals(ht.size(), 2);
    assertEquals(ht.get(-2), "Minus Two");
  }
}
Enter fullscreen mode Exit fullscreen mode

It is a variation of the test written in the previous iteration. The difference is that the keys are guaranteed to produce negative hash code values.

Iteration 05: running our test

Let's run this new test with the implementation from the previous iteration:

java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 4
    at iter03.HashTable.put0(HashTable.java:33)
    at iter02.HashTable.put(HashTable.java:25)
    at iter05.HashTableTest.iter05(HashTableTest.java:32)
Enter fullscreen mode Exit fullscreen mode

It fails as it tries to insert the key at a negative array index.

This is expected. We mentioned it during iteration #03.

The problem is in our current bucket method implementation:

// current implementation
protected int bucket(Object key) {
  var hc = key.hashCode();

  return hc % keys.length;
}
Enter fullscreen mode Exit fullscreen mode

If a key's hash code is negative then the method result will also be negative.

Iteration 05: implementation

To prevent the method from returning a negative value, we can use the absolute value of the key's hash code instead:

public class HashTable<K, V> extends iter04.HashTable<K, V> {
  @Override
  protected int bucket(Object key) {
    int hc = key.hashCode();

    return Math.abs(hc) % keys.length;
  }
}
Enter fullscreen mode Exit fullscreen mode

It works. Our test passes.

Iteration 06: distinct hash code values having the same absolute value

As mentioned in this post's introduction, our objective is to incrementally deviate from the happy path.

In the previous iteration, the keys of both mappings we added to our map:

  • had the same sign, both were negative; and
  • had different absolute values.

Let's, in this iteration, do something a little different. We will, once more, add two mappings. But the key's hash code values:

  • will have distinct signs, i.e., one will be positive and the other negative; and
  • will have the same absolute value.

Iteration 06: test case

Here's our test case:

public class HashTableTest {
  @Test(description = """
  put() & get() methods

  - negative hash code
  - positive hash code
  """)
  public void iter06() {
    var ht = new HashTable<Integer, String>();
    assertEquals(ht.size(), 0);

    assertEquals(ht.put(-3, "Minus Three"), null);
    assertEquals(ht.size(), 1);
    assertEquals(ht.get(-3), "Minus Three");

    assertEquals(ht.put(3, "Three"), null);
    assertEquals(ht.size(), 2);
    assertEquals(ht.get(3), "Three");
  }
}
Enter fullscreen mode Exit fullscreen mode

It is very similar to the last few tests. Except the keys were selected so their hash code values meet our requirements.

Iteration 06: running our test

Let's run this test with the implementation from the previous iteration:

java.lang.UnsupportedOperationException: Implement me
    at iter03.HashTable.put1(HashTable.java:43)
    at iter03.HashTable.put0(HashTable.java:39)
    at iter02.HashTable.put(HashTable.java:25)
    at iter06.HashTableTest.iter06(HashTableTest.java:37)
Enter fullscreen mode Exit fullscreen mode

It fails as it reaches a branch of our code we did not implement yet. We can see from the stack trace it is a code written during iteration #03.

Iteration 06: hash collision

The current test failure is expected. We mentioned it in the "same key or collision?" section of the previous post.

Remember that, in the previous iteration, we fixed the negative array index issue by using the absolute value of the key's hash code. But, in doing so, it caused a positive and a negative hash code having the same absolute value to yield the same "bucket".

Take, for example, two distinct keys A and B having the following hash code values:

  • A = +2
  • B = -2

As the following is our current bucket implementation:

int hc = key.hashCode();

return Math.abs(hc) % keys.length;
Enter fullscreen mode Exit fullscreen mode

Both keys will be put in the same "bucket". When keys.length is 4, for example, both keys will be put at the bucket = 2.

When two distinct keys produce the same "bucket" it configures a situation called hash collision.

Any hash table must handle hash collisions. At the same time, a "good" hash function will try to minimize hash collisions.

Iteration 06: implementation

We will handle hash collisions. As mentioned, we must.

But not right now. We can make our test pass by improving our hash function.

The following implementation makes our test pass:

public class HashTable<K, V> extends iter05.HashTable<K, V> {
  @Override
  protected int bucket(Object key) {
    int hc = key.hashCode();

    return Math.floorMod(hc, keys.length);
  }
}
Enter fullscreen mode Exit fullscreen mode

Our solution uses the Math.floorMod static method. From its Javadocs we can see that:

  • as the divisor, keys.length, is always positive the result of floorMod is guaranteed to be positive. Therefore, the bucket method is guaranteed to return a positive value (or zero);
  • when the hash code value hc is positive, the bucket return value is equivalent to:
  return hc % keys.length;
Enter fullscreen mode Exit fullscreen mode
  • when the hash code value hc is negative, it might give a distinct result than:
  return Math.abs(hc) & keys.length;
Enter fullscreen mode Exit fullscreen mode

Put in other words, when two distinct keys produce different hash code values having the same absolute value, our new solution causes fewer hash collisions than the previous solution.

The following listing tries to illustrate this fact for an array of length 4:

Math.floorMod( 1, 4) = 1
Math.floorMod(-1, 4) = 3

Math.floorMod( 2, 4) = 2
Math.floorMod(-2, 4) = 2

Math.floorMod( 3, 4) = 3
Math.floorMod(-3, 4) = 1

Math.floorMod( 4, 4) = 0
Math.floorMod(-4, 4) = 0

Math.floorMod( 5, 4) = 1
Math.floorMod(-5, 4) = 3
Enter fullscreen mode Exit fullscreen mode

The previous solution would cause hash collisions for all of listed positive/negative pairs. The new solution, on the other hand, causes hash collisions only for the (2, -2) and (4, -4) pairs.

A small improvement. An improvement nonetheless.

This technique is used by:

Iteration 07: replace the value mapped to an existing key

When the put method is invoked with a key-value pair such that:

  • the map already contains a mapping for the specified key; and
  • the existing mapped value is distinct from the specified value.

Then the put method must replace the existing mapped value to the specified one.

We will implement this use-case in this iteration.

Iteration 07: test case

And here's the test case for this iteration:

public class HashTableTest {
  @Test(description = """
  put() method

  - replace the value mapped to an existing key
  """)
  public void iter07() {
    var ht = new HashTable<Integer, String>();
    assertEquals(ht.size(), 0);

    assertEquals(ht.put(1, "Won"), null);
    assertEquals(ht.size(), 1);

    assertEquals(ht.put(1, "One"), "Won");
    assertEquals(ht.size(), 1);

    assertEquals(ht.put(2, "Two"), null);
    assertEquals(ht.size(), 2);

    assertEquals(ht.get(1), "One");
    assertEquals(ht.get(2), "Two");
  }
}
Enter fullscreen mode Exit fullscreen mode

We start by associating the Won string to the 1 key.

Next, we replace the Won value to the "correct" One value. As it is a replacement, we verify that the put returns the old value. We additionally verify that the map size remains the same.

Finally we add the 2 = Two mapping. And, for completeness, we verify that we can correctly "get" both mappings from the map.

Iteration 07: running our test

The test fails when run against the implementation of the previous iteration:

java.lang.UnsupportedOperationException: Implement me
    at iter03.HashTable.put1(HashTable.java:43)
    at iter03.HashTable.put0(HashTable.java:39)
    at iter02.HashTable.put(HashTable.java:25)
    at iter07.HashTableTest.iter07(HashTableTest.java:35)
Enter fullscreen mode Exit fullscreen mode

As in the previous iteration, our test fails as it reaches a branch of our code we did not implement yet. The stack trace indicates it is the code written during iteration #03.

Iteration 07: implementation

Here's an implementation that make our test pass:

public class HashTable<K, V> extends iter06.HashTable<K, V> {
  @Override
  protected final V put1(K key, V value, int bucket, Object existing) {
    if (existing.equals(key)) {
      return putReplace(key, value, bucket);
    }

    return put2(key, value, bucket);
  }

  protected V put2(K key, V value, int bucket) {
    throw new UnsupportedOperationException("Implement me");
  }

  @SuppressWarnings("unchecked")
  protected final V putReplace(K key, V value, int bucket) {
    var oldValue = values[bucket];

    keys[bucket] = key;

    values[bucket] = value;

    return (V) oldValue;
  }
}
Enter fullscreen mode Exit fullscreen mode

It starts by overriding the put1 method defined during iteration #03.
Remember, the put1 parameters are:

  • key = the key the put method was invoked with;
  • value = the value the put method was invoked with;
  • bucket = the computed array index for key; and
  • existing = the existing key, if one exists, found at the computed bucket index.

If the existing key is equal to the specified key it means the current mapped value must be replaced.

Iteration 07: the putReplace method

The actual replacement is done by the putReplace method:

@SuppressWarnings("unchecked")
protected final V putReplace(K key, V value, int bucket) {
  var oldValue = values[bucket];

  keys[bucket] = key;

  values[bucket] = value;

  return (V) oldValue;
}
Enter fullscreen mode Exit fullscreen mode

It starts by assigning the current mapped value to the oldValue variable.

Next, both the key and the value are stored at their respective arrays.

Finally, the oldValue is returned, as required by the put method contract.

Note that, since the put method only allows values of type V to be associated, the cast operation is safe. As the cast is safe, we can safely suppress the unchecked conversion warning.

Iteration 07: hash collision

If the existing key is not equal to the specified key it means we have hit a hash collision.

We will handle it eventually, but, for now, we simply invoke the yet to be implemented put2 method:

protected V put2(K key, V value, int bucket) {
  throw new UnsupportedOperationException("Implement me");
}
Enter fullscreen mode Exit fullscreen mode

Iteration 08: querying for a non-existing mapping

Let's focus once more on the get method.

In this iteration we will try to get the mapping of a non-existing key. We expected the get method to return null in this case.

Iteration 08: test case

We write a minimal test exercising our use-case:

public class HashTableTest {
  @Test(description = """
  get() method

  - non-existing key should return null
  """)
  public void iter08() {
    var ht = new HashTable<Integer, String>();

    assertEquals(ht.get(1), null);

    assertEquals(ht.put(1, "One"), null);

    assertEquals(ht.get(1), "One");
  }
}
Enter fullscreen mode Exit fullscreen mode

We start by immediately querying for a non-existing key. We assert that the get method returns null in this case.

For completeness, we associate a value to the previously queried key and verify that the get method now returns the associated value.

Iteration 08: running our test

Let's run our new test against the implementation of the previous iteration:

java.lang.UnsupportedOperationException: Implement me
    at iter04.HashTable.get0(HashTable.java:37)
    at iter04.HashTable.get(HashTable.java:33)
    at iter08.HashTableTest.iter08(HashTableTest.java:31)
Enter fullscreen mode Exit fullscreen mode

It fails at the code written during iteration #04. To be more precise, it fails because it reaches a branch left unimplemented during that iteration:

if (key.equals(candidate)) {
  return (V) values[bucket];
}

// method get0 just throws UOE
return get0(key, bucket, candidate);
Enter fullscreen mode Exit fullscreen mode

It reaches this path as the key candidate found at the bucket index is not equal to the key that was supplied to the get method.

Iteration 08: implementation

Here is an implementation that make the test pass:

public class HashTable<K, V> extends iter07.HashTable<K, V> {
  @Override
  protected final V get0(Object key, int bucket, Object candidate) {
    if (candidate == null) {
      return null;
    }

    return get1(key, bucket);
  }

  protected V get1(Object key, int bucket) {
    throw new UnsupportedOperationException("Implement me");
  }
}
Enter fullscreen mode Exit fullscreen mode

It is somewhat trivial.

If candidate is null it means there was not key found at the bucket index. In other words, the map does not contain the specified key. In this case, and as required by the get method contract, it must return null.

On the other hand, if candidate is not null then we have hit a hash collision once again. Except, this time, it occurs during a "read" operation.

We will handle hash collisions in the next post of this series. For now, we invoke a get1 method that just throws UnsupportedOperationException.

In the next blog post of this series

In this second blog post of the series we started to deviate from the "happy path". In doing so we have incrementally improved our hash table implementation:

  • it now supports keys with negative hash code values;
  • the hash function produces fewer hash collisions when two distinct hash code values have the same absolute value;
  • the put method now supports replacing an existing value mapping; and
  • the get method now properly handles a non-existing key mapping.

In the next blog posts of this series we will:

  • handle hash collisions; and
  • perform a rehash operation, i.e., resize the inner arrays so our hash table can hold more mappings

Stay tuned!

The source code for all of the examples in this post can be found at this GitHub repository.

💖 💪 🙅 🚩
marcioendo
Marcio Endo

Posted on October 14, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related