Luke a Pro

Luke Sun

Developer & Marketer

Search & Retrieval: Overview

Search & Retrieval: The Art of Not Looking

The Great Library Problem

Imagine you enter the Library of Babel, which contains every book ever written. If you want to find a single sentence, you have two choices:

  1. The Brute Force: Read every book one by one. You will die before you finish the first shelf.
  2. The Index: Look at a pre-computed catalog that tells you exactly which book and which page contains that sentence.

In software engineering, Searching is almost never about “looking at data.” It is about building structures that allow you to skip 99.99% of the data.
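To make the two choices concrete, here is a minimal sketch in Python. The `BOOKS` corpus and every function name are hypothetical, invented only for illustration. It contrasts reading every book with consulting a pre-computed catalog.

```python
from collections import defaultdict

# Hypothetical toy corpus: book id -> text (an assumption for illustration).
BOOKS = {
    "book-1": "the quick brown fox",
    "book-2": "a lazy dog sleeps",
    "book-3": "the dog chases the fox",
}


def brute_force_search(books, word):
    """Brute force: read every book, so cost grows with the total text."""
    return sorted(
        book_id for book_id, text in books.items() if word in text.split()
    )


def build_catalog(books):
    """Pre-compute a word -> set-of-book-ids catalog (a tiny inverted index)."""
    catalog = defaultdict(set)
    for book_id, text in books.items():
        for word in text.split():
            catalog[word].add(book_id)
    return catalog


def catalog_search(catalog, word):
    """The index: jump straight to the matching books, skipping the rest."""
    return sorted(catalog.get(word, set()))


CATALOG = build_catalog(BOOKS)
print(brute_force_search(BOOKS, "dog"))
print(catalog_search(CATALOG, "dog"))
```

Both calls return the same books. The difference is that the catalog lookup never touches the books that do not contain the word.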

The Evolution of Finding

Search technology has evolved through four distinct “Souls”:

| Strategy | The Soul / Metaphor | Representative | Best For |
| --- | --- | --- | --- |
| Indexing | The Library Catalog: Mapping keywords to locations before the search starts. | Inverted Index | Full-text Search (Elasticsearch, Lucene) |
| Prefixing | The Predictive Typist: Finding words by their shared beginnings. | Trie / Radix Tree | Autosuggest / Routing (Search bars, IP routing) |
| Matching | The Pattern Recognizer: Finding a specific needle inside a specific haystack. | KMP / Boyer-Moore | Log Analysis / Bioinformatics (Grep, DNA sequencing) |
| Similarity | The Mind Reader: Finding things that “mean” the same thing, even if they look different. | Vector Search / LSH | Recommendation / AI (Pinterest, ChatGPT, Spotify) |
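As a taste of the “Predictive Typist,” here is a minimal trie sketch in Python. The class and method names (`Trie`, `insert`, `starts_with`) are assumptions made for this example, not an API from any particular library, and a production autosuggest would typically add ranking and result limits.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # next character -> TrieNode
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        """Walk or create one node per character, then mark the end."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def starts_with(self, prefix):
        """Return every stored word that begins with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []  # no stored word shares this prefix
        # Collect all words in the subtree below the prefix node.
        results = []
        stack = [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                results.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(results)


suggestions = Trie()
for word in ["search", "seat", "sea", "trie"]:
    suggestions.insert(word)
print(suggestions.starts_with("sea"))
```

Reaching the prefix node takes one step per typed character, no matter how many words are stored. Only the words that share the prefix are ever visited.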

The algorithm you choose depends largely on the Scale and the Tolerance for Error:

  • Small & Exact: Use a Hash Map or a Trie.
  • Large & Textual: Use an Inverted Index.
  • Massive & Fuzzy: Use LSH or Vector Search (Approximate Nearest Neighbors).
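The “Massive & Fuzzy” case is the least intuitive of the three, so here is a rough sketch of one LSH flavor, random-hyperplane hashing for cosine similarity. This is an illustrative assumption, not the only LSH scheme. The function names, the seed, and the bit count are all made up for this example.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplanes (Gaussian normal vectors)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


planes = make_hyperplanes(num_bits=8, dim=3)
a = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]     # same angle as `a`, different length
opposite = [-1.0, -2.0, -3.0]        # points the other way

print(signature(a, planes) == signature(same_direction, planes))
print(signature(a, planes) == signature(opposite, planes))
```

Vectors pointing in similar directions tend to share signature bits, so the signature can serve as a bucket key. A query then compares itself only against vectors in the same or nearby buckets, accepting occasional misses in exchange for skipping most of the data.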

Engineering Mindset: Pre-computation is Freedom

The fundamental law of search is: You pay for your search speed during the write phase. If you want to find things in $O(1)$ or $O(\log N)$, you must spend $O(N)$ time and significant disk space building an index when the data arrives. Search is the ultimate “Trade-off” between storage and latency.
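A small Python sketch of that trade-off, using only the standard library. The function names are hypothetical. Note that the sorted variant actually pays $O(N \log N)$ at write time, slightly more than the $O(N)$ floor mentioned above.

```python
import bisect


def build_hash_index(keys):
    """Write phase: O(N) time and O(N) extra memory to build a hash set."""
    return set(keys)


def build_sorted_index(keys):
    """Write phase: O(N log N) time to sort a copy of the keys."""
    return sorted(keys)


def sorted_contains(sorted_index, key):
    """Read phase: binary search, O(log N) comparisons."""
    pos = bisect.bisect_left(sorted_index, key)
    return pos < len(sorted_index) and sorted_index[pos] == key


keys = [42, 7, 19, 3]
hash_index = build_hash_index(keys)      # paid once, when the data arrives
sorted_index = build_sorted_index(keys)  # paid once, when the data arrives

print(19 in hash_index)                   # O(1) on average
print(sorted_contains(sorted_index, 19))  # O(log N)
```

The hash set gives the fastest exact lookups. The sorted list costs more to build but also supports range and "nearest key" queries, which a hash set cannot answer without a full scan.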

Summary

In this section, we will move from the ancient wisdom of string matching to the futuristic world of vector embeddings. We will learn that finding a needle isn’t about having better eyes—it’s about making the haystack so organized that the needle has nowhere to hide.

Let’s start with the foundation of the modern web: the Inverted Index.