Hash Table in Data Structures (2024)

06 Jun 2024

Advanced

2.79K Views

25 min read

Learn via Video Course & by Doing Hands-on Labs

Data Structures & Algorithms Free Course with Certificate

Free Course Hands-On Labs

Free Course Self Paced Courses Hands-On Labs

Hash Table in Data Structures: An Overview

In the previous tutorial, we saw what is hashing and how it works. We saw that a hash table is a data structure that stores data in an array format. The table maps keys to values using a hash function. In this DSA tutorial, we'll explore the hash table in a little detail like its working, implementation, types, etc.

To further enhance your understanding and application of hashing concepts, consider enrolling in the Dsa Course Online, to gain comprehensive insights into effective data structure utilization for improved problem-solving and time management.

What is a Hash Table in Data Structures?

A Hash table is a data structure used to insert, look up, and remove key-value pairs quickly. It implements an associative array. Here, each key is translated by a hash function into a distinct index in an array. The index functions as a storage location for the matching value.

How Does the Hash Table Work?

Initialize the Hash Table: Start with an empty array of a fixed size (let's say 10). Each index in the array represents a bucket in the hash table.
Insertion: When inserting a key-value pair into the hash table, the hash function is used to calculate the index where the pair should be stored. The hash function takes the key as input and returns an integer value.
Let's say we want to insert the key "apple" with the value "fruit" into the hash table. The hash function calculates the index as follows: hash("apple") = 5 Since the hash value is 5, we store the key-value pair in index 5 of the array: [empty, empty, empty, empty, empty, ("apple", "fruit"), empty, empty, empty, empty]
See Also
Deletion in Hash Tables using Python - GeeksforGeeks
Retrieval: To retrieve a value associated with a given key, we use the same hash function to calculate the index. We then look for the key-value pair in that index.
For example, if we want to retrieve the value for the key "apple," the hash function calculates the index as 5. We check index 5 in the array and find the key-value pair ("apple", "fruit"): [empty, empty, empty, empty, empty, ("apple", "fruit"), empty, empty, empty, empty]
Collision: Hash tables can encounter collisions when two different keys map to the same index. This can happen if the hash function is not perfectly distributed or if the array size is relatively small compared to the number of keys.
For example, if we try to insert the key "banana" and the hash function calculates the index as 5, we encounter a collision. We search for the next available slot and find index 6 is empty. So, we store the key-value pair ("banana", "fruit") in index 6: [empty, empty, empty, empty, empty, ("apple", "fruit"), ("banana", "fruit"), empty, empty, empty] [empty, ("apple", "fruit"), ("banana", "fruit"), empty, empty, empty]

Read More - Data Structure Interview Questions and Answers

Hash Table Data Structure Algorithm

Initialize an array of fixed size to serve as the underlying storage for the hash table. The size of the array depends on the expected number of key-value pairs to be stored.
Define a hash function that takes a key as input and generates a unique hash code.
To insert a key-value pair into the hash table, apply the hash function to the key to determine the index. If the calculated index is already occupied, handle collisions using a collision resolution strategy.
To retrieve the value associated with a key, apply the hash function to the key to determine the index. If the index contains a value, compare the stored key with the given key to ensure a match. If the keys match, return the corresponding value. If the index is empty or the keys don't match, the key is not present in the hash table.
To remove a key-value pair from the hash table, apply the hash function to the key to determine the index. If the index contains a value, compare the stored key with the given key to ensure a match. If the keys match, remove the key-value pair from the table. If the index is empty or the keys don't match, the key is not present in the hash table

Implementation of Hash Table in Different Programming Languages

Python
Java
C++

 def get(self, key): index = self.hashFunction(key) bucket = self.table[index] for kvp in bucket: if kvp[0] == key: return kvp[1] return "Key not found" def remove(self, key): index = self.hashFunction(key) bucket = self.table[index] for kvp in bucket: if kvp[0] == key: bucket.remove(kvp) return# Main functionif __name__ == "__main__": hashTable = HashTable(10) # Insert key-value pairs hashTable.insert(1, "Value 1") hashTable.insert(11, "Value 11") hashTable.insert(21, "Value 21") hashTable.insert(2, "Value 2") # Retrieve values print(hashTable.get(1)) # Output: Value 1 print(hashTable.get(11)) # Output: Value 11 print(hashTable.get(21)) # Output: Value 21 print(hashTable.get(2)) # Output: Value 2 print(hashTable.get(5)) # Output: Key not found # Remove a key-value pair hashTable.remove(11) # Try to retrieve the removed value print(hashTable.get(11)) # Output: Key not found

 import java.util.ArrayList; import java.util.LinkedList; import java.util.List; // Hash Table class class HashTable { // Bucket is a list of key-value pairs private class KeyValuePair { int key; String value; KeyValuePair(int key, String value) { this.key = key; this.value = value; } } private List<LinkedList<KeyValuePair>> table; private int size; // Constructor HashTable(int tableSize) { size = tableSize; table = new ArrayList<>(size); for (int i = 0; i < size; i++) { table.add(new LinkedList<>()); } } // Hash function private int hashFunction(int key) { return key % size; } // Insert a key-value pair into the hash table void insert(int key, String value) { int index = hashFunction(key); LinkedList<KeyValuePair> bucket = table.get(index); // Check if the key already exists in the bucket for (KeyValuePair kvp : bucket) { if (kvp.key == key) { // Key found, update the value kvp.value = value; return; } } // Key not found, add a new key-value pair bucket.add(new KeyValuePair(key, value)); } // Retrieve the value associated with a given key String get(int key) { int index = hashFunction(key); LinkedList<KeyValuePair> bucket = table.get(index); // Search for the key in the bucket for (KeyValuePair kvp : bucket) { if (kvp.key == key) { // Key found, return the value return kvp.value; } } // Key not found return "Key not found"; } // Remove a key-value pair from the hash table void remove(int key) { int index = hashFunction(key); LinkedList<KeyValuePair> bucket = table.get(index); // Search for the key in the bucket for (KeyValuePair kvp : bucket) { if (kvp.key == key) { // Key found, remove the pair bucket.remove(kvp); return; } } } } // Main function class Main { public static void main(String[] args) { HashTable hashTable = new HashTable(10); // Insert key-value pairs hashTable.insert(1, "Value 1"); hashTable.insert(11, "Value 11"); hashTable.insert(21, "Value 21"); hashTable.insert(2, "Value 2"); // Retrieve values System.out.println(hashTable.get(1)); // Output: Value 1 System.out.println(hashTable.get(11)); // Output: Value 11 System.out.println(hashTable.get(21)); // Output: Value 21 System.out.println(hashTable.get(2)); // Output: Value 2 System.out.println(hashTable.get(5)); // Output: Key not found // Remove a key-value pair hashTable.remove(11); // Try to retrieve the removed value System.out.println(hashTable.get(11)); // Output: Key not found } }

 #include <iostream> #include <vector> #include <list> // Hash Table class class HashTable { private: // Bucket is a list of key-value pairs typedef std::pair<int, std::string> KeyValuePair; typedef std::list<KeyValuePair> Bucket; std::vector<Bucket> table; int size; public: // Constructor HashTable(int tableSize) : size(tableSize) { table.resize(size); } // Hash function int hashFunction(int key) { return key % size; } // Insert a key-value pair into the hash table void insert(int key, const std::string& value) { int index = hashFunction(key); Bucket& bucket = table[index]; // Check if the key already exists in the bucket for (auto& kvp : bucket) { if (kvp.first == key) { // Key found, update the value kvp.second = value; return; } } // Key not found, add a new key-value pair bucket.push_back(std::make_pair(key, value)); } // Retrieve the value associated with a given key std::string get(int key) { int index = hashFunction(key); Bucket& bucket = table[index]; // Search for the key in the bucket for (const auto& kvp : bucket) { if (kvp.first == key) { // Key found, return the value return kvp.second; } } // Key not found return "Key not found"; } // Remove a key-value pair from the hash table void remove(int key) { int index = hashFunction(key); Bucket& bucket = table[index]; // Search for the key in the bucket for (auto iter = bucket.begin(); iter != bucket.end(); ++iter) { if (iter->first == key) { // Key found, remove the pair bucket.erase(iter); return; } } } }; // Main function int main() { HashTable hashTable(10); // Insert key-value pairs hashTable.insert(1, "Value 1"); hashTable.insert(11, "Value 11"); hashTable.insert(21, "Value 21"); hashTable.insert(2, "Value 2"); // Retrieve values std::cout << hashTable.get(1) << std::endl; // Output: Value 1 std::cout << hashTable.get(11) << std::endl; // Output: Value 11 std::cout << hashTable.get(21) << std::endl; // Output: Value 21 std::cout << hashTable.get(2) << std::endl; // Output: Value 2 std::cout << hashTable.get(5) << std::endl; // Output: Key not found // Remove a key-value pair hashTable.remove(11); // Try to retrieve the removed value std::cout << hashTable.get(11) << std::endl; // Output: Key not found return 0; }

Output

Value 1Value 11Value 21Value 2Key not foundKey not found

Characteristics of a Good Hash Function

Deterministic: A hash function should always produce the same hash value for the same input. This property ensures consistency and allows for easy verification and comparison.
Uniform distribution: The hash function should evenly distribute the hash values across the output space. This property helps to minimize collisions and ensures that different inputs have a low probability of producing the same hash value.
Fixed output size: A good hash function has a fixed output size, regardless of the input size. This property enables efficient storage and comparison of hash values.
Fast computation: Hash functions should be computationally efficient to handle large amounts of data quickly. Speed is crucial for applications that involve hashing large datasets or frequently computing hash values.
Avalanche effect: A small change in the input should result in a significant change in the output hash value. This property ensures that even a minor modification in the input leads to a completely different hash value, making it difficult to predict or reverse-engineer the original input.
Resistance to collisions: While it is impossible to eliminate collisions (different inputs mapping to the same hash value), a good hash function minimizes their occurrence. It should be difficult to find two different inputs that produce the same hash value.
Non-invertibility: Given a hash value, it should be computationally infeasible to determine the original input that generated it. This property provides security and helps protect sensitive information.
Well-defined behavior: The hash function should have an unambiguous definition for all possible inputs. It should handle edge cases and corner cases gracefully, without producing unexpected results or errors.
Security properties: For cryptographic applications, a good hash function should exhibit additional security properties such as resistance to preimage attacks (finding an input given its hash value) and second preimage attacks (finding a different input with the same hash value).

Examples of widely-used hash functions that possess these characteristics include:

(Secure Hash Algorithm 3)
BLAKE2
MD5 (although it is considered less secure due to vulnerabilities)

Note that the choice of a hash function depends on the specific requirements of the application, and different hash functions may be suitable for different use cases.

Hash Collision

A hash collision refers to a situation where two different inputs produce the same hash value or hash code when processed by a hash function. In other words, it occurs when two distinct pieces of data result in an identical hash output.
Hash collisions can have practical implications depending on the context, e.g. in hash tables, collisions can lead to degraded performance if not handled efficiently.
Cryptographic hash functions, used in areas like data security and digital signatures, have stricter requirements to prevent collisions. These functions aim to provide a high level of collision resistance, making it computationally infeasible to find two inputs that produce the same hash value.
Cryptographic hash collisions are of particular concern because they can undermine the integrity and security of cryptographic systems. Researchers and cryptographers continually evaluate and analyze hash functions to ensure their collision resistance and security properties, as vulnerabilities or weaknesses discovered in hash functions could have significant implications for various applications that rely on them.

Types of Hash Collision

There are primarily two types of hash collisions: accidental or random collisions and intentional or malicious collisions. Let's explore each type in more detail:

Accidental Collisions: Accidental collisions occur when two different inputs produce the same hash value due to the nature of hashing algorithms and the limited range of possible hash values. These collisions are unintended and usually considered rare and coincidental. The probability of accidental collisions depends on the quality of the hashing algorithm and the size of the hash space. Good hashing algorithms strive to minimize the chances of accidental collisions by distributing hash values uniformly across the output space.
Intentional Collisions: Intentional collisions occur when an attacker purposefully generates two or more different inputs that produce the same hash value. These collisions are often the result of exploiting vulnerabilities or weaknesses in the hashing algorithm. Intentional collisions can be a significant security concern in various scenarios, such as digital signatures, certificate authorities, password hashing, and data integrity checks.
There are two main categories of intentional collisions:
1. Preimage Attacks: In a preimage attack, an attacker tries to find any input that matches a given hash value. This type of attack is aimed at breaking the one-way property of a hash function, which means that it is computationally infeasible to determine the original input from its hash value.
2. Collision Attacks: In a collision attack, an attacker aims to find two different inputs that produce the same hash value. The goal is to break the collision resistance property of a hash function, which ensures that it is computationally difficult to find any two inputs with the same hash value

Hash Table Applications

Dictionaries: Hash tables are commonly used to implement dictionary data structures, where words or terms are associated with their definitions or values. Hashing allows for quick lookup and retrieval of definitions based on the provided key.
Caching: Hash tables are employed in caching systems to store frequently accessed data. By using a hash table, the most recently used or important items can be stored, and retrieval can be performed in constant time, improving overall system performance.
Symbol tables: In compilers and interpreters, hash tables are used to implement symbol tables. Symbol tables store information about variables, functions, classes, and other symbols used in a program. Hashing enables efficient lookup and manipulation of symbols during compilation or interpretation.
Database indexing: Hash tables are used in database systems to implement indexing structures. Hash-based indexing allows for fast data retrieval based on a specific key or attribute, speeding up query processing in databases.
Set membership: Hash tables are utilized to implement sets, where membership operations like adding, removing, and checking for the presence of an element are performed efficiently. Hashing allows for constant-time lookup and insertion operations, making sets suitable for tasks such as duplicate removal and membership testing.
Caching of computed values: Hash tables are employed to store pre-computed results or expensive computations. By caching the results based on the input parameters, subsequent requests can be satisfied instantly, avoiding redundant calculations.
Spell checkers: Hash tables are used in spell-checking algorithms to store a dictionary of valid words. By hashing the words, the algorithm can quickly check if a given word is present in the dictionary, helping identify potential misspellings.
Cryptographic data structures: Hash tables are employed in cryptographic systems for tasks like password storage and digital signatures. Hash functions convert sensitive data into fixed-length hashes, and hash tables provide an efficient way to store and retrieve hashed values securely.

Summary

The hash table in data structures is a powerful and versatile algorithm that can be used to store and easily access data. It is a comprehensive search mechanism that allows for the efficient retrieval of values without wasting computing resources. By utilizing hash tables, organizations can quickly index large datasets and accomplish their tasks more efficiently. Additionally, it helps in categorizing the elements into appropriate buckets dependent on some calculated value before performing any search.

FAQs

Q1. What is a Hash Table?

A Hash table is a data structure used to insert, look up, and remove key-value pairs quickly.

Q2. What is meant by Hash Collision?

A hash collision refers to a situation where two different inputs produce the same hash value or hash code when processed by a hash function.

Q3. What are the Types of Hash Collision?

There are primarily two types of hash collisions: accidental or random collisions and intentional or malicious collisions.

FAQs

What is a hash table in data structure? ›

Hash tables are a type of data structure in which the address/ index value of the data element is generated from a hash function. This enables very fast data access as the index value behaves as a key for the data value.

Read On ›

What is HashMap and HashTable in data structure? ›

While both classes use keys to look up values, there are some important differences, including: A HashTable doesn't allow null keys or values; a HashMap does. A HashTable is synchronized to prevent multiple threads from accessing it at once; a HashMap isn't.

Discover More Details ›

What is the hash function in data structure? ›

Any function that converts data of any size into fixed-size values is a hash function. Hash values, hash codes, digests, or just hashes are the names given to the results of a hash function. The values are typically used to index a hash table, a fixed-size table. Table of Contents.

What is a HashMap in data structure? ›

Hash maps are a common data structure used to store key-value pairs for efficient retrieval. A value stored in a hash map is retrieved using the key under which it was stored.

See Details ›

What is a real life example of a hash table? ›

There are many practical examples of hash tables used in every-day life. A popular example is in username-password databases. Every time someone signs up on a website using a username and password, that information must be stored somewhere for later retrieval.

Find Out More ›

When to use a hash table? ›

Use Hash tables when we need to store key-value pairs and perform frequent lookup, insert, or delete operations. Use Sets when we need to store unique elements, don't care about duplicates, and just need to perform operations like checking if an element is in the set or not.

Tell Me More ›

What is the purpose of hashing? ›

Hashing is commonly used to ensure data integrity. By generating a hash value for an amount of data, such as a file or message, a user can later compare it with the hash value of the received data to verify if any changes or corruption occurred during transmission. Efficient data retrieval.

Show Me More ›

How is a hash table stored in memory? ›

Our hash table stores keys and their values interleaved, allowing key-value pairs to be retrieved with a single memory access. Each slot of the cuckoo hash table stores a 64-bit entry, with the key and its value stored in the upper and lower 32 bits, respectively.

Explore More ›

What are the three types of hashing? ›

Understanding Three Types of Hashing. In the realm of data security and integrity, understanding the intricacies of Three Types of hashing - MD5, SHA-2 , and CRC32 - is paramount. Each algorithm serves a distinct purpose in safeguarding digital assets and ensuring the authenticity of information.

What is a real life example of HashMap? ›

Real-Life Use Case: Phone Book Application

In this example, we use a HashMap to store the contact information, with names as keys and phone numbers as values. Using the put() method, we add entries to the phone book, and with the get() method, we retrieve a phone number by providing the associated name.

Show Me More ›

Is a HashMap an array? ›

Hash map is implemented as an array, in which every element includes a list. The lists contain (key, value) pairs. The user can search from the hash map based on the key, and they can also add new key-value pairs into it. Each key can appear at most once in the hash map.

Read The Full Story ›

What is a HashMap good for? ›

Benefits of Using HashMaps

Efficiency: HashMap offers constant-time performance for basic operations like get and put, assuming the hash function disperses the elements properly among the buckets. Null Allowed: HashMap allows one null key and multiple null values, providing flexibility in storing data.

See Details ›

What is the difference between index and hash table? ›

Hashing is more appropriate for bigger databases that need to provide rapid and direct access to records without the need for an index, while indexing is best suited for smaller databases where quick read operations and ordered data retrieval are necessary.

Get More Info Here ›

What is the advantage of a hash table as a data structure? ›

The main advantage of hash tables over other data structures is speed . The access time of an element is on average O(1), therefore lookup could be performed very fast. Hash tables are particularly efficient when the maximum number of entries can be predicted in advance.

Is a Python dictionary a hash table? ›

In Python, dictionaries are implemented as hash tables. Resizing a hash table is primarily about managing this balance.

Why is it called a hash table? ›

In French, a hash table is called "table de hachage", the related verb "hacher" means to chop/to mince (food mostly). The verb to hash has the same meaning in English. So as other have pointed out it is called hash, because you chop your input that you put in pieces in different places (your table entries).

View Details ›