Luke a Pro

Luke Sun

Developer & Marketer

🇺🇦
EN||

Cuckoo Filter

| , 3 minutes reading.

Cuckoo Filter

Introduction: Bloom Filter’s Missing Feature

Bloom Filters (Chapter 4.4) are great, but they have a fatal flaw: You cannot delete items. If you try to unset a bit, you might accidentally delete 10 other items that shared that bit.

Cuckoo Filter is a modern probabilistic data structure that solves this. It supports deletion, uses less space for low false-positive rates, and offers better lookup performance by utilizing a technique called “Cuckoo Hashing.”

What Problem does it solve?

  • Input: An element.
  • Output: False (Definitely not present) or True (Probably present).
  • The Promise: Membership testing with deletion support and higher space efficiency for high-precision requirements.

Core Idea (Intuition)

Think of a Cuckoo Bird. It kicks other birds out of their nests to lay its own eggs.

  1. Buckets: Instead of a bit array, we have an array of buckets, each holding a few “Fingerprints” (small hashes of the items).
  2. Two Nests: Every item has two potential bucket locations.
  3. The Kick Out: If both potential buckets are full, the new item kicks out an existing resident. The kicked-out resident then flies to its own alternative bucket, potentially kicking out someone else. This “cuckoo” behavior continues until everyone finds a home.

Why use it over Bloom Filter?

  • ✅ Deletion: You can safely remove an item by deleting its fingerprint from its bucket.
  • ✅ Performance: It only requires checking 2 buckets, which is very cache-friendly compared to Bloom Filter’s KK random bit lookups.
  • ✅ Space: It is more space-efficient than Bloom Filter when the target false-positive rate is very low (e.g., < 3%).

Typical Business Scenarios

  • ✅ Dynamic Blacklists: A firewall that needs to block IPs temporarily and then unblock them (Deletion support is key).

  • ✅ Inventory Management: Checking if a specific SKU is in a massive warehouse cache where items are frequently added and removed.

  • ✅ Deduplication in Streams: Identifying if we have seen this specific transaction ID in the last hour.

  • ❌ Full Capacity: If the filter gets too full, the “kicking out” process can enter an infinite loop or fail. You must keep the load factor below 95%\approx 95\%.

Performance & Complexity

  • Time: O(1)O(1) for lookup, insertion, and deletion.
  • Space: Slightly better than Bloom Filter for high precision; supports deletion.

Summary

“Cuckoo Filter is the ‘Smart Bloom Filter’. By storing fingerprints and allowing items to move between nests, it provides the one thing Bloom Filters can’t: a ‘Delete’ button.”