How to implement a hash table in Java (Part 3)

marcioendo

Marcio Endo

Posted on October 21, 2022

How to implement a hash table in Java (Part 3)

In the previous post of this series we've improved on the initial version of our hash table. It is capable of:

  • "putting" mappings;
  • "getting" mappings; and
  • replacing the value of existing mappings.

However, it still misses a major functionality: it does not handle hash collisions yet.

That's what we will work on during this third post of our series. We will have our hash table handle hash collisions during both put and get operations.

Let's continue.

Before we continue

While not strictly required, I recommend reading the previous parts of this series if you haven't already:

In particular:

Iteration 09: two testing conveniences

In order to properly test that our hash table is capable of handling hash collisions we need:

  • an easy way to create key values capable of causing hash collisions; and
  • a way to assert the internal disposition of our hash table contents.

Therefore, in this iteration, we will introduce two testing conveniences:

  • a Key class for causing hash collisions; and
  • a toString implementation for displaying the contents of our hash table.

Iteration 09: the Key class

Two distinct keys a and b are guaranteed to cause a hash collision in our hash table if:

  • they are distinct, i.e., !a.equals(b) evaluates to true; and
  • they both have the same hash code values, i.e., a.hashCode() == b.hashCode() evaluates to true.

The following class allows to easily create instances satisfying these criteria:

public final class Key {
  private final String value;

  private final int hash;

  public Key(String value, int hash) {
    this.value = value;
    this.hash = hash;
  }

  @Override
  public final boolean equals(Object obj) {
    return obj == this
        || obj instanceof Key that
            && Objects.equals(value, that.value);
  }

  @Override
  public final int hashCode() { return hash; }

  @Override
  public final String toString() { return value; }
}
Enter fullscreen mode Exit fullscreen mode

As an example, the following test passes:

var a = new Key("A", 123);
var b = new Key("B", 123);

assertNotEquals(a, b);
assertEquals(a.hashCode(), 123);
assertEquals(b.hashCode(), 123);
Enter fullscreen mode Exit fullscreen mode

Of course the class also makes it easy to violate the equals and hashCode contract.

For example, we can easily create two instances that are equal to each other but have distinct hash code values. This is a violation of the hashCode contract.

Keep in mind we are only using it to cause hash collisions in a testing environment. In other words, I am not advocating for the use of a class like this one in production.

Iteration 09: test case

Let's now focus on the toString method. Here's our test case:

public class HashTableTest {
  @Test(description = """
  toString() method

  - renders an ascii table with columns:

  1. array index
  2. key toString()
  3. value toString()
  """)
  public void iter09() {
    var ht = new HashTable<Key, String>();

    var a = new Key("AAA", 4);
    var c = new Key("CCC", 6);

    ht.put(a, "aaa");
    ht.put(c, "ccc");

    assertEquals(
      ht.toString(),
      """
      +-----+-----+-----+
      | idx | key | val |
      +-----+-----+-----+
      |   0 | AAA | aaa |
      |   1 |     |     |
      |   2 | CCC | ccc |
      |   3 |     |     |
      +-----+-----+-----+
      """
    );
  }
}
Enter fullscreen mode Exit fullscreen mode

First, we create two instances, a and c, of our Key class. As seen in the previous section, their hash code values are given by the constructor's second argument. The hash code values were chosen so that the mapping stays at a known array index. Given their values, we know that:

  • key a will be stored at index idx = 0; and
  • key c will be stored at index idx = 2.

Remember, we know how our hash function is implemented. We also know the length of the internal keys and values arrays. Therefore, we can "manually" compute the array index.

Next, we associate the keys to their values.

We then write how we want the toString output to look like. It is a table showing the internal contents of our hash table. It has three columns:

  • the array index
  • the key toString output
  • the value toString output

When the key and the value are null their respective table cells are blank.

Iteration 09: implementation

Here's an implementation that make the test pass:

public class HashTable<K, V> extends iter08.HashTable<K, V> {
  @Override
  public final String toString() {
    var sb = new StringBuilder();

    sb.append(
      """
      +-----+-----+-----+
      | idx | key | val |
      +-----+-----+-----+
      """);

    for (int idx = 0; idx < keys.length; idx++) {
      var key = keys[idx];

      if (key != null) {
        var value = values[idx];

        sb.append("| %3d | %3s | %3s |\n".formatted(idx, key, value));
      } else {
        sb.append("| %3d |     |     |\n".formatted(idx));
      }
    }

    sb.append("+-----+-----+-----+\n");

    return sb.toString();
  }
}
Enter fullscreen mode Exit fullscreen mode

It starts by writing the table header.

It then loops through the keys array. If a non-null key is found, it writes, in order, the current array index, the key itself and the mapped value. If the key is null it only writes the array index.

It ends by writing the table footer.

Iteration 10: handling hash collision during put operations

Let's (at last) begin the implementation of the hash collision handling code of our hash table. We'll focus on put operations initially.

In this iteration we will put a series of distinct mappings that "want to stay in the same bucket".

Iteration 10: test case

With the help of our Key class created earlier, we create our test case:

public class HashTableTest {
  @Test(description = """
  put() method

  - handle hash collisions
  - no value replacement
  """)
  public void iter10() {
    var ht = new HashTable<Key, String>();

    var a = new Key("AAA", 2);
    var b = new Key("BBB", 2);
    var c = new Key("CCC", 2);

    assertEquals(ht.put(a, "aaa"), null);
    assertEquals(ht.put(b, "bbb"), null);
    assertEquals(ht.put(c, "ccc"), null);

    assertEquals(
      ht.toString(),
      """
      +-----+-----+-----+
      | idx | key | val |
      +-----+-----+-----+
      |   0 | CCC | ccc |
      |   1 |     |     |
      |   2 | AAA | aaa |
      |   3 | BBB | bbb |
      +-----+-----+-----+
      """
    );
  }
}
Enter fullscreen mode Exit fullscreen mode

We create three Key instances a, b and c in such way that:

  • they are all distinct to each other; and
  • they all produce the same hash code value.

Therefore, keys a, b and c are guaranteed to cause hash collisions in our hash table. When associated with a value via the put method, they will all try to occupy the bucket idx = 2.

But, of course, only a single key can occupy that bucket at any given time. In other words, the first put will occupy the bucket; the keys of the subsequent put operations will have to look for alternative buckets.

The final internal disposition in our hash table is given by the toString assertion. How did we get to this exact disposition depicted by the toString output?

Iteration 10: linear probing

As all of the keys in our current test produce the same hash code value, they will all try to occupy the same bucket.

The hash table is empty when we associate key a to its value. So the a mapping occupies the bucket with index idx = 2.

When we try to associate key b to its value, that bucket it is already taken by key a. So we have key b occupy the next empty cell in the array. The next empty bucket is the one with index idx = 3 and that's where the b mapping is stored.

When we try to associate key c to its value it also wants to stay at bucket idx = 2. As it is occupied by key a, it tries the next cell. As the next cell is occupied by key b it tries the next cell.
Except there's no next cell: we are at the end of the array. So it tries again from the start of the array, idx = 0. As idx = 0 is empty that's where key c stays.

So that's how we got to the toString assertion:

assertEquals(
  ht.toString(),
  """
  +-----+-----+-----+
  | idx | key | val |
  +-----+-----+-----+
  |   0 | CCC | ccc |
  |   1 |     |     |
  |   2 | AAA | aaa |
  |   3 | BBB | bbb |
  +-----+-----+-----+
  """
);
Enter fullscreen mode Exit fullscreen mode

This technique of resolving hash collisions by "probing" for the next empty cell is called linear probing. You should know that there are more ways to handle hash collisions. This is just one of the possible solutions.

In the JDK, the following JDK classes use the same linear probing solution:

The first is the one returned by the Map.of methods when two or more mapping are specified. The second is the one returned by the Set.of methods when three or more elements are specified. I should note that these are the values returned by the methods at time of writing.

Iteration 10: running our test

Running our test against the previous implementation fails:

java.lang.UnsupportedOperationException: Implement me
    at iter07.HashTable.put2(HashTable.java:29)
    at iter07.HashTable.put1(HashTable.java:25)
    at iter03.HashTable.put0(HashTable.java:39)
    at iter02.HashTable.put(HashTable.java:25)
    at iter10.HashTableTest.iter10(HashTableTest.java:38)
Enter fullscreen mode Exit fullscreen mode

It reaches a branch of our code we did not implement yet. The stack trace indicates it is part of the code written during iteration #07.

Iteration 10: implementation

The following code makes our current test pass:

public class HashTable<K, V> extends iter09.HashTable<K, V> {
  @Override
  protected V put2(K key, V value, int bucket) {
    var index = bucket + 1;

    while (true) {
      if (index == keys.length) {
        index = 0;
      }

      var existing = keys[index];

      if (existing == null) {
        return putInsert(key, value, index);
      }

      index++;
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

It overrides the put2 method created during iteration #07. This method is invoked with an unresolved hash collision. The parameters are:

  • key = the key the put method was invoked with;
  • value = the value the put method was invoked with; and
  • bucket = the computed array index for key.

Being a hash collision the specified bucket index is already occupied by a mapping. We need to look for an empty bucket.

Iteration 10: looking for an empty bucket

We start looking at the next adjacent bucket:

var index = bucket + 1;
Enter fullscreen mode Exit fullscreen mode

The index variable indicates our current array index candidate.

The search itself is done in a while loop:

while (true) {
  if (index == keys.length) {
    index = 0;
  }

  ...

  index++;
}
Enter fullscreen mode Exit fullscreen mode

At the very end of the loop the index variable is incremented. So, if an empty bucket is not found at the current candidate, we continue the search with the next one.

Except index might already be the last index of the array. When this is the case, we have to continue the search from the beginning of the array. That's the responsibility of the if statement at the beginning of the loop.

At the "core" of the loop, we test the current index candidate:

var existing = keys[index];

if (existing == null) {
  return putInsert(key, value, index);
}
Enter fullscreen mode Exit fullscreen mode

If the hash table contains no mappings at the current index candidate it means we have found our bucket. We proceed to store the specified key-value pair, at the current index, by invoking the putInsert method.

Iteration 11: hash collisions and value replacements

During iteration #07 of Part 2 we allowed the put method to replace the value of an existing mapping.

In this iteration we will implement the same use-case. Except we will do it for a mapping participating in a hash collision.

Iteration 11: test case

Here's our test case:

public class HashTableTest {
  @Test(description = """
  put() method

  - hash collision
  - value replacement
  """)
  public void iter11() {
    var ht = new HashTable<Key, String>();

    var a = new Key("AAA", 2);
    var b = new Key("BBB", 2);

    assertEquals(ht.put(a, "aaa"), null);
    assertEquals(ht.put(b, "123"), null);
    assertEquals(ht.put(b, "bbb"), "123", "\n" + ht.toString());

    var c = new Key("CCC", 3);

    assertEquals(ht.put(c, "ccc"), null);

    assertEquals(
      ht.toString(),
      """
      +-----+-----+-----+
      | idx | key | val |
      +-----+-----+-----+
      |   0 | CCC | ccc |
      |   1 |     |     |
      |   2 | AAA | aaa |
      |   3 | BBB | bbb |
      +-----+-----+-----+
      """
    );
  }
}
Enter fullscreen mode Exit fullscreen mode

We create two Key instances. Keys a and b are guaranteed to cause a hash collision.

We start by associating key a to the "aaa" string value.

Then, immediately after we associate key b to the "123" value, we replace its value with "bbb". As this is a replacement operation, we verify that the put method returns the previous value.

Finally, to additionally verify the correctness of our put method implementation, we associate the "ccc" value to key c. Unlike keys a and b, key c wants to stay at index idx = 3. In other words, it does not cause a hash collision with neither key a nor key b. But, as its "natural bucket" is already taken by key b, it must be put in an alternative location. So it is stored at the beginning of the array.

Iteration 11: running the test

Let's run our current test against the implementation of the previous iteration:

java.lang.AssertionError: 
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
|   0 | BBB | bbb |
|   1 |     |     |
|   2 | AAA | aaa |
|   3 | BBB | 123 |
+-----+-----+-----+
 expected [123] but found [null]
    at org.testng.Assert.fail(Assert.java:110)
    at org.testng.Assert.failNotEquals(Assert.java:1413)
    at org.testng.Assert.assertEqualsImpl(Assert.java:149)
    at org.testng.Assert.assertEquals(Assert.java:131)
    at org.testng.Assert.assertEquals(Assert.java:655)
    at iter11.HashTableTest.iter11(HashTableTest.java:39)
Enter fullscreen mode Exit fullscreen mode

The test fails: the put method does not return the previous mapping.

In fact, observing the error message, it seems our hash table now contains different mappings for the same key.

Iteration 11: implementation

Here's an implementation that makes our test pass:

public class HashTable<K, V> extends iter10.HashTable<K, V> {
  @Override
  protected final V put2(K key, V value, int bucket) {
    var index = bucket + 1;

    while (true) {
      if (index == keys.length) {
        index = 0;
      }

      var existing = keys[index];

      if (existing == null) {
        return putInsert(key, value, index);
      } 

      if (existing.equals(key)) {
        return putReplace(key, value, index);
      }

      index++;
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

Once again, we override the put2 method. It is the same method we worked on during the previous iteration.

The implementation is almost identical to the one from the last iteration. The difference is an additional if statement:

var existing = keys[index];

...

if (existing.equals(key)) {
  return putReplace(key, value, index);
}
Enter fullscreen mode Exit fullscreen mode

It tests whether existing is equal to the specified key. When the test evaluates to true, the corresponding mapped value must be replaced. We do so by invoking the putReplace method.

Iteration 11: infinite loop?

You may have noticed that our while statement contains an always true expression:

while (true) {
  ...
}
Enter fullscreen mode Exit fullscreen mode

So, unless one of the return statements in its "body" is executed, the while statement will run indefinitely. In fact, we can create a test case that does just that:

@Test(enabled = false, description = """
put() method

- guaranteed to cause an infinite loop
""")
public void iter11_infiniteLoop() {
  var ht = new HashTable<Key, String>();

  var a = new Key("AAA", 2);
  var b = new Key("BBB", 2);
  var c = new Key("CCC", 2);
  var d = new Key("DDD", 2);
  var e = new Key("EEE", 2);

  ht.put(a, "aaa");
  ht.put(b, "bbb");
  ht.put(c, "ccc");
  ht.put(d, "ddd");
  ht.put(e, "eee");
}
Enter fullscreen mode Exit fullscreen mode

Notice that we are trying to put more mappings than the hash table's capacity of 4.

The infinite loop is expected in this case. And there's nothing wrong (as far as I can tell) with our while loop.

The actual problem is that our hash table does not grow to accommodate more mappings. If the arrays are large enough, then an empty bucket will always be found.

Don't worry. We will implement the resize operation in the next post of this series.

Iteration 12: handling hash collision during get operations

We are done implementing the hash collision handling for put operations.

We can now focus on the get operation. Let's now have our get method support retrieving the values associated to keys that cause hash collisions.

Iteration 12: test case

This is our test case:

public class HashTableTest {
  @Test(description = """
  get() test case

  - hash collision
  - non existing key
  """)
  public void iter12() {
    var ht = new HashTable<Key, String>();

    var a = new Key("AAA", 3);
    var b = new Key("BBB", 3);
    var c = new Key("CCC", 3);

    assertEquals(ht.put(a, "aaa"), null);
    assertEquals(ht.put(b, "bbb"), null);
    assertEquals(ht.put(c, "ccc"), null);

    assertEquals(ht.get(a), "aaa");
    assertEquals(ht.get(b), "bbb");
    assertEquals(ht.get(c), "ccc");

    var d = new Key("DDD", 3);

    assertEquals(ht.get(d), null);
  }
}
Enter fullscreen mode Exit fullscreen mode

The first section of our test is very similar to the one we wrote during iteration #10:

  • we create three Key instances that produce the same hash code value; and
  • we proceed to associate a distinct value to each key.

Next, we try to retrieve the values we have just inserted by invoking the get method. We verify that the value returned is the one we expect it to be.

For completeness, we verify that the get method returns null when the hash table does not contain a mapping for the specified key:

var d = new Key("DDD", 3);

assertEquals(ht.get(d), null);
Enter fullscreen mode Exit fullscreen mode

Note that key d has the same hash code value as the other three keys. In other words, if we were to put it in our map, it would cause a hash collision at the same bucket as the other keys.

Iteration 12: running our test

As expected, running our test against the current implementation fails:

java.lang.UnsupportedOperationException: Implement me
    at iter08.HashTable.get1(HashTable.java:29)
    at iter08.HashTable.get0(HashTable.java:25)
    at iter04.HashTable.get(HashTable.java:33)
    at iter12.HashTableTest.iter12(HashTableTest.java:42)
Enter fullscreen mode Exit fullscreen mode

It reaches the method get1. It was left unimplemented during iteration #08.

Iteration 12: implementation

The following is the iteration's implementation:

public class HashTable<K, V> extends iter11.HashTable<K, V> {
  @SuppressWarnings("unchecked")
  @Override
  protected final V get1(Object key, int bucket) {
    var index = bucket + 1;

    while (true) {
      if (index == keys.length) {
        index = 0;
      }

      var existing = keys[index];

      if (existing == null) {
        return null;
      }

      if (key.equals(existing)) {
        return (V) values[index];
      }

      index++;
    }
  }
}
Enter fullscreen mode Exit fullscreen mode

We override the get1 method created during iteration #08. It is called when a hash collision occurs during a get operation. The parameters are:

  • key = the key the get method was invoked with; and
  • bucket = the computed array index for key.

Iteration 12: looking for the specified key

The implementation is very similar to the one of the put operation:

In fact, the "outer layer" is identical to the put operation: we perform the search in a while loop starting at the next adjacent bucket.

var index = bucket + 1;

while (true) {
  if (index == keys.length) {
    index = 0;
  }

  ...

  index++;
}
Enter fullscreen mode Exit fullscreen mode

The differences are in the "core" of the loop.

If the current bucket is empty, it means our hash table does not contain the specified key. We must return null in this case:

var existing = keys[index];

if (existing == null) {
  // empty bucket == key not found
  return null;
}
Enter fullscreen mode Exit fullscreen mode

On the other hand, if the bucket is not empty and the stored existing key is equal to the specified key, it means we have found the key. In this case, we return the corresponding value from the values array:

var existing = keys[index];
...
if (key.equals(existing)) {
  // we've found the key!
  // return the corresponding value
  return (V) values[index];
}
Enter fullscreen mode Exit fullscreen mode

In the next blog post of this series

In this third blog post of our series our hash table is finally capable of handling hash collisions. We have implemented the linear probing technique for both the put and the get operations.

But the implementation is still incomplete. Trying to put a fifth mapping to our hash table causes an infinite loop.

To fix this issue, our hash table needs to dynamically grow so it can store more mappings. In doing so, all of the key-value mappings must be re-inserted.

This is what we call a rehash operation and will be the topic of the next and final blog post of this series.

Stay tuned!

The source code for all of the examples in this post can be found at this GitHub repository.

💖 💪 🙅 🚩
marcioendo
Marcio Endo

Posted on October 21, 2022

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related