Luke a Pro

Luke Sun

Developer & Marketer

đŸ‡ș🇩

Rabin-Karp Algorithm

| , 5 minutes reading.

Rabin-Karp: The Digital Fingerprint

The Story: The Efficiency of the Seal

In ancient times, if a King wanted to verify if a letter was authentic, he didn’t read the whole letter first. He looked at the Wax Seal. If the seal matched his signet ring, he knew the letter was genuine.

In 1987, Michael O. Rabin and Richard M. Karp applied this ancient wisdom to computer science. They realized that comparing two strings character-by-character is slow. But comparing two numbers is instantaneous.

They designed a way to turn any string into a unique “number” (a Hash). If the numbers match, the strings likely match. This turned string searching from a linguistic problem into a mathematical one.

Why do we need it?

Rabin-Karp shines in Multiple Pattern Search and Plagiarism Detection.

  • Plagiarism Checkers: If you want to check if a student copied a paragraph from 10,000 different articles, Rabin-Karp can scan all of them simultaneously by comparing hashes.
  • Intrusion Detection: Searching for thousands of virus signatures in a single pass of a network packet.
  • Rolling Hash: It is the core of rsync, allowing it to find which parts of a file have changed without sending the whole file.

How the Algorithm “Thinks”

The algorithm is a mathematician who uses a Rolling Hash.

  1. The Fingerprint (Hash): It turns the pattern apple into a number, say 12345.

  2. The Sliding Window: It slides a window of 5 characters over the text.

    • “hello” →\to 56789 (Not a match)
    • “world” →\to 43210 (Not a match)
  3. The “Rolling” Magic: This is the secret. To calculate the hash of the next window, it doesn’t re-read all 5 characters. It mathematically “rolls out” the first character and “rolls in” the new one. Hash(next) = (Hash(current) - FirstChar) * Base + NewChar. This makes the hash calculation O(1)O(1) for every step.

  4. The Final Verification: If the hash matches, it performs a quick character-by-character check just to be 100% sure there wasn’t a “Hash Collision” (two different strings with the same number).

Engineering Context: The Power of Parallelism

The brilliance of Rabin-Karp is that it turns string matching into something a GPU or a Specialized Chip can do very well.

  • Complexity: Average case is O(N+M)O(N + M), but it allows you to search for KK patterns in almost the same time as one.
  • Trade-off: It is sensitive to the choice of the Hash Function. If the hash is bad, you get too many “false alarms” (collisions), and the speed drops.

Implementation (Python)

def rabin_karp(text, pattern):
    n, m = len(text), len(pattern)
    if m > n: return -1
    
    # A simple rolling hash (polynomial rolling hash)
    base = 256
    prime = 101 # A prime number to keep hashes small
    
    p_hash = 0 # Hash of pattern
    t_hash = 0 # Hash of current window in text
    h = pow(base, m - 1) % prime # The value of the highest digit
    
    # 1. Initial Fingerprint
    for i in range(m):
        p_hash = (base * p_hash + ord(pattern[i])) % prime
        t_hash = (base * t_hash + ord(text[i])) % prime
        
    # 2. Slide and Roll
    for i in range(n - m + 1):
        if p_hash == t_hash:
            # 3. Final Verification (to avoid collisions)
            if text[i:i+m] == pattern:
                return i
        
        if i < n - m:
            # 4. The Rolling Math: Roll out text[i], Roll in text[i+m]
            t_hash = (base * (t_hash - ord(text[i]) * h) + ord(text[i+m])) % prime
            if t_hash < 0: t_hash += prime
            
    return -1

# Example Usage
# print(rabin_karp("AABAACAADAABAABA", "AABA")) # Output: 0

Summary

Rabin-Karp teaches us that abstraction is a shortcut. By turning complex data into simple numbers, we can process the world at the speed of math. It reminds us that sometimes, the best way to understand a face is to look at its fingerprint.