Chapter 5: Concurrency Control (MVCC & Locks) — Magic of Parallel Universes
1. Definition
MVCC (Multi-Version Concurrency Control) is a concurrency control method used by databases to provide concurrent access to data.
The core philosophy is: Read/Write Decoupling. When a transaction is writing to a row, other reading transactions are not blocked. Instead, they read an older version (Snapshot) of the data. This allows each transaction to see the data as it existed in its own “parallel universe.”
2. Technical Depth: Undo Logs and Read Views
Taking MySQL InnoDB as an example:
- Undo Log: When a transaction modifies data, InnoDB does not overwrite the old record. Instead, it copies the old data into an Undo Log and links it to the new record via a pointer, forming a Version Chain.
- Read View: When a transaction starts, InnoDB generates a Read View—a snapshot of all “active” (uncommitted) transaction IDs at that moment.
- Visibility Logic: When reading, the database compares the version ID of a row against the Read View. If the ID belongs to an active transaction, the version is invisible, and the engine traverses the Undo Log chain to find the most recent committed version.
3. Visualizing the Invisible: Snapshot Reads
sequenceDiagram
participant TxA as Transaction A (Reader)
participant TxB as Transaction B (Writer)
participant DB as Row (ID=1)
participant Undo as Undo Log
TxB->>DB: 1. UPDATE ID=1 SET Age=30 (was 20)
Note over DB: Row Locked (X-Lock)<br/>Age = 30<br/>Undo Log Created: Age=20
TxA->>DB: 2. SELECT * FROM users WHERE ID=1
Note over DB: Lock Detected (TxB is active)
DB->>Undo: 3. TxA reads old version from Undo Log
Undo-->>TxA: 4. Returns Age=20 (Snapshot)
TxB->>DB: 5. COMMIT
Note over DB: Row Age=30 becomes permanent4. Real-World Case: GitHub’s Primary Key Contention (2018)
Background: GitHub was performing a major MySQL database migration. Phenomenon: The site experienced brief outages and write failures.
The Internal Cause: Auto-increment Contention & Next-Key Locks. While MVCC eliminates read-write conflicts, write-write conflicts still require locking.
- Auto-inc Lock: During a massive data ingest (
INSERT INTO ... SELECT), MySQL’s auto-increment lock became a serialized bottleneck under high concurrency. - Next-Key Lock: Under the Repeatable Read (RR) isolation level, uniqueness checks (e.g.,
REPLACE INTOorINSERT ON DUPLICATE) may trigger Next-Key Locks (locking index gaps) depending on the index type and execution plan. In highly concurrent conflict scenarios, this led to severe lock wait chains.
Lesson: MVCC is not a magic wand. In extreme high-concurrency write scenarios, understanding lock granularity (Record vs. Gap vs. Next-Key) is critical.
5. Detailed Defense & Optimization
A. Isolation Level Trade-offs
- Read Committed (RC): Generates a new Read View for every query. Suitable for high-concurrency apps where slight non-repeatable reads are acceptable.
- Repeatable Read (RR): Generates a Read View once at the start of the transaction. MySQL default; mitigates phantom reads in locking read scenarios (via Next-Key locks), though standard snapshot reads may still observe range anomalies depending on implementation.
B. Avoid Long Transactions
- Long-running transactions force the database to keep old Read Views active.
- This prevents the Purge thread from cleaning up Undo Logs, causing the History List Length to skyrocket. This not only consumes disk space but slows down all queries as they must traverse longer version chains.
C. Optimistic Locking
For non-critical sections, use application-level optimistic locking via version numbers to avoid database row locks entirely.
UPDATE products SET stock = stock - 1
WHERE id = 1 AND version = 5;6. References
- MySQL 8.0 Reference: InnoDB Multi-Versioning
- VLDB Journal: An Empirical Evaluation of In-Memory MVCC
- GitHub Engineering: MySQL High Availability
