Data Structures: Hash Tables I

This is a 2-part series on Hash Table data structure. In this first part, we will learn what hash tables are, how they work, the various parts of a hash table, the benefits of using a hash table, and some real applications of hash tables. In the second part, we shall build a hash table in Javascript. Let's get into it.

What is a Hash Table?

A hash table, also known as a hash map or dictionary, is a data structure made using an array and a hash function, that is used to store data in a way that makes data retrievable faster than in an array.

In an array, to directly access data, we need to first know its index in the array; otherwise, we have to loop through it; hash tables circumvent this need by assigning the data an index in the array using a hash function. Hash tables store data in pairs key and value the keys are used in the hashing process to determine an index for the data (value).

Hash Function

Hashing is the process of converting a given value to another value. The hash function in a hash table takes in a key and transforms it into a value hash code that matches an index in the table(array). An example is a function that accepts a string, gets the length of the string, multiplies it by 3, and mod the results by the number of spaces in the table.


 function hashFunc (string){
   return (string.length * 3) % array.length
}

Using this hash function, we can see that each name gets a number that determines its location in the table. When retrieving the names, all we have to do is get the hash code and we know exactly where in the table the data is stored.

Collisions

If we keep adding names to our names hash table eventually there will be a point when 2 names will have the same hash code, this is called a collision. There is no guarantee that 2 keys won't have the same hash code index in a table, so we need to prepare for when this happens, there are 2 main ways to resolve collisions; open addressing and separate chaining.

Open addressing

Open addressing, also known as closed hashing, is a collision resolution technique that uses probing to resolve collisions by locating other free slots/buckets in the table. How these free slots are located depends on the type of probing implemented. The 3 most common probing techniques are:

Linear probing: In linear probing If a spot is occupied we add 1 to the hash code to check the next spot in the table until a free spot is found. An example of linear probing is *hashFunc(name) + n* where *n* is the number of consecutive collisions.

Quadratic probing: In quadratic probing the hash code is added to the result of a quadratic equation that changes every time we have a collision. An example of a quadratic probing formula is (hashFunc(name) + n²) % table.size Quadratic probing ensures that when collisions occur, the hash code moves further away from that spot in the table.

Double Hashing: This involves using a secondary hash function to create a new hash code based on the previous. An example of a double hash function would look like this (hashFunc(name) + n * secondHashFunc(name)) % table.size it is recommended for n in this situation to be a prime number so should the table size.


 function secondHashFunc (string){
   return (string.length % table.size) - k // k is a random prime number less than table.size
}

Load Factor

The efficiency of open addressing is heavily dependent on the load factor of the hash table. The load factor is the result of dividing the number of entries in the array by the size of the table. The closer the load factor is to 1, the lower the efficiency of open addressing. If the load factor of the hash table becomes 1, open addressing can result in an infinite loop. The solution to this issue is to resize the hash table by creating more slots for data and re-hashing the data in the table. it is recommended to always have more slots in the table than needed between 30% to 50% more. Resizing is usually carried out when the load factor is close to or at (0.8).

Separate Chaining

Separate chaining is the easiest and most commonly used collision resolution technique; separate chaining involves using arrays, linked lists, maps etc; to store all data that share an index. Compared to open addressing, separate chaining proves to be much faster at conflict resolution and maintaining the efficiency of the table. The load factor has very little effect on the efficiency of separate chaining; a major downside of it, though, is that it consumes more memory and slots in the hash table can go unused.

Uses / Real-Life Applications of Hash Table

Hash tables are a very useful data structure for when you need to retrieve, insert and delete data, fast. Hash tables have a lot of real-life applications which include:

File system address management: In some file managers, hash tables are used to keep track of where files are located on the disk.
Programming languages: Some programming languages such as Python, Java, Ruby, etc, have hash tables built into them; hash tables are also used when compiling certain programming languages.
Games: Certain board games such as tic-tac-toe, chess, checkers etc, use hash tables to keep track of the board.
Phone Directory: All mobile phones have a phonebook app on them. Hash tables are the perfect data structure for keeping track of people's names and phone numbers.
Database indexing: Some databases use hash tables to quickly store and retrieve data.

I personally find hash tables to be a very interesting data structure and would love to build something using them. Thanks for reading, in the next part we are going to implement a hash table in Javascript, see you then bye 👋

Blog