Understanding Hash Tables and Hash Functions*

*Understanding Hash Tables and Hash Functions**#

A hash table (or hash map) is a data structure that efficiently stores and retrieves key-value pairs using a hash function. The function converts a key into an index used to store and locate values quickly.

Real-World Use Cases of Hash Tables#

Hash tables are widely used in software development, databases, and networking due to their fast access times. Here are some key applications:

Databases and Caching
- Indexing: Hash tables help optimize database search operations.
- Caching: Used in applications like Redis and Memcached to store frequently accessed data.
Compilers and Interpreters
- Symbol Tables: Stores variable names, function names, and their corresponding memory addresses.
Networking
- Routing Tables: Hash tables map IP addresses to network devices for efficient packet routing.
- DNS Resolution: Maps domain names to IP addresses.
Cybersecurity
- Cryptographic Hashing: Used to store passwords securely (e.g., SHA-256, bcrypt).
- Data Integrity Verification: Hash functions check if files have been altered.
Operating Systems
- Process Scheduling: Hash tables store process states efficiently.
- File Systems: Used in file indexing systems.
Machine Learning & AI
- Feature Hashing: Converts large input spaces into a fixed-size representation.
- Nearest Neighbor Search: Accelerates similarity searches.
Blockchain and Cryptography
- Bitcoin and Ethereum use Merkle Trees, which rely on hash functions for data integrity.
Web Development
- Session Management: Hash maps store user sessions efficiently.
- Auto-Complete Features: Maps user input to predicted results.

Because of their efficiency and versatility, hash tables are a foundational data structure across various computing fields.

How Hash Tables Work#

Hashing Function: A function converts a key (e.g., a string or number) into an integer.
Indexing: The computed hash determines the storage location in an array.
Collision Handling: Since multiple keys can hash to the same index (collision), hash tables use methods like chaining or open addressing.

Understanding Hash Functions#

A hash function maps input data (keys) to an integer index in a hash table.

Example of a simple hash function in Java:

private int hash(String key) {
    return Math.abs(key.hashCode()) % SIZE;  // Uses Java's built-in hashCode()
}

The hash() method uses Java’s hashCode() function.
The modulus operation ensures the index falls within the valid range of the array.

Key Properties of a Good Hash Function#

Deterministic: The same input always produces the same output.
Uniform Distribution: Keys should be distributed evenly across the table.
Efficient Computation: Fast calculation of hash values.
Minimal Collisions: Different keys should ideally generate unique outputs.

Collision Handling Methods#

Since different keys may hash to the same index, hash tables use collision resolution strategies.

1. Chaining (Separate Chaining)#

Each index in the table holds a linked list (or another data structure) to store multiple key-value pairs.
If a collision occurs, new elements are simply appended to the linked list at the index.

Chaining Implementation in Java#

import java.util.LinkedList;

class HashTableChaining {
    private static final int SIZE = 10;
    private LinkedList<Entry>[] table;

    public HashTableChaining() {
        table = new LinkedList[SIZE];
        for (int i = 0; i < SIZE; i++) {
            table[i] = new LinkedList<>();
        }
    }

    private int hash(String key) {
        return Math.abs(key.hashCode()) % SIZE;
    }

    public void insert(String key, String value) {
        int index = hash(key);
        for (Entry entry : table[index]) {
            if (entry.key.equals(key)) {
                entry.value = value;
                return;
            }
        }
        table[index].add(new Entry(key, value));
    }

    public String search(String key) {
        int index = hash(key);
        for (Entry entry : table[index]) {
            if (entry.key.equals(key)) {
                return entry.value;
            }
        }
        return null;
    }

    public void delete(String key) {
        int index = hash(key);
        table[index].removeIf(entry -> entry.key.equals(key));
    }

    private static class Entry {
        String key, value;
        public Entry(String key, String value) { this.key = key; this.value = value; }
    }

    public static void main(String[] args) {
        HashTableChaining ht = new HashTableChaining();
        ht.insert("name", "Alice");
        ht.insert("age", "25");
        System.out.println(ht.search("name")); // Output: Alice
        ht.delete("name");
        System.out.println(ht.search("name")); // Output: null
    }
}

2. Open Addressing#

Instead of using separate lists, open addressing stores all entries directly in the table. If a collision occurs, the algorithm searches for an available slot using a probing technique.

Types of Open Addressing#

Linear Probing: Check the next available slot sequentially.
Quadratic Probing: Check slots at increasing intervals (e.g., 1², 2², 3²…).
Double Hashing: Use a secondary hash function to find new slots.

Open Addressing Implementation in Java (Linear Probing)#

class HashTableOpenAddressing {
    private static final int SIZE = 10;
    private Entry[] table;
    private static final Entry DELETED = new Entry(null, null); // Marker for deleted entries

    public HashTableOpenAddressing() {
        table = new Entry[SIZE];
    }

    private int hash(String key) {
        return Math.abs(key.hashCode()) % SIZE;
    }

    public void insert(String key, String value) {
        int index = hash(key);
        while (table[index] != null && table[index] != DELETED) {
            index = (index + 1) % SIZE; // Linear probing
        }
        table[index] = new Entry(key, value);
    }

    public String search(String key) {
        int index = hash(key);
        while (table[index] != null) {
            if (table[index] != DELETED && table[index].key.equals(key)) {
                return table[index].value;
            }
            index = (index + 1) % SIZE;
        }
        return null;
    }

    public void delete(String key) {
        int index = hash(key);
        while (table[index] != null) {
            if (table[index].key.equals(key)) {
                table[index] = DELETED; // Mark as deleted
                return;
            }
            index = (index + 1) % SIZE;
        }
    }

    private static class Entry {
        String key, value;
        public Entry(String key, String value) { this.key = key; this.value = value; }
    }

    public static void main(String[] args) {
        HashTableOpenAddressing ht = new HashTableOpenAddressing();
        ht.insert("name", "Alice");
        ht.insert("age", "25");
        System.out.println(ht.search("name")); // Output: Alice
        ht.delete("name");
        System.out.println(ht.search("name")); // Output: null
    }
}

Choosing the Right Hashing Method#

For general hash tables: MurmurHash or FNV-1a
For security: SHA-256
For simplicity: DJB2 or Remainder
For handling collisions:
- Use chaining when working with dynamic-sized tables.
Use open addressing when memory efficiency is critical.