Luke a Pro

Luke Sun

Developer & Marketer

🇺🇦
EN||

Hash Map

| , 4 minutes reading.

Hash Map: The Speed of Light

The Story: The Librarian with Teleportation

Imagine a library with 10 million books. In a normal library (an Array or a List), you have to walk past every shelf to find a book. Even in a sorted library (Binary Search), you still have to check 24 shelves.

In 1953, H.P. Luhn (the same genius behind hashing) proposed a different kind of library. In this library, the book’s title is the secret address. You say the title “The Great Gatsby,” apply a magic formula, and it tells you: “It’s on shelf #429.” You don’t walk; you teleport to shelf #429.

This is the Hash Map (or Hash Table). It is the most important data structure in modern software because it makes searching practically instantaneous (O(1)O(1) time).

Why do we need it?

If you ever used a dict in Python, an Object in JavaScript, or a Map in Java, you used a Hash Map.

  • Database Indexes: How does a database find your record among billions? It uses a Hash index.
  • Caching: When you want to save a result (like a user’s profile) so you don’t have to fetch it again, you put it in a Map.
  • Counting: Counting the frequency of every word in a book.

How the Algorithm “Thinks”

The algorithm is a matchmaker between keys and memory.

  1. The Formula (Hashing): It takes your Key (e.g., “Luke”) and turns it into a large number using a Hash Function.
  2. The Address (Compression): It takes that large number and shrinks it (usually with a Modulo % operator) to fit the size of its internal storage (an Array).
  3. The Storage: It puts the Value at that specific index.
  4. The Conflict Resolution: What if two different keys (e.g., “Luke” and “Sun”) want the same index?
    • Chaining: It creates a small list at that index to hold both.
    • Open Addressing: It looks for the next empty seat nearby.

Engineering Context: The Load Factor

A Hash Map is fast, but it is memory-hungry.

  • Space vs Time: To keep it fast, the map must be mostly empty. If the map gets too full (a high “Load Factor”), collisions increase, and O(1)O(1) speed degrades into O(N)O(N) (the speed of a slow list).
  • Resizing: When the map hits about 75% capacity, it “Resizes”—it creates a massive new warehouse and re-calculates every single address. This is a heavy operation, which is why we often pre-allocate space if we know the size.

Implementation (Python)

class SimpleHashMap:
    def __init__(self, size=100):
        self.size = size
        # A list of lists to handle collisions via Chaining
        self.table = [[] for _ in range(self.size)]

    def _hash(self, key):
        # Python's built-in hash() handles the dirty work
        return hash(key) % self.size

    def put(self, key, value):
        index = self._hash(key)
        # Update if exists, else add
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        self.table[index].append((key, value))

    def get(self, key):
        index = self._hash(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        return None # Not found

Summary

The Hash Map teaches us that knowledge is about location. By turning the content of a question into the address of its answer, we bypass the need for searching. It reminds us that in engineering, sometimes the best way to speed up a process is to spend a little more on space.