How MySQL Prevents Most Phantom Reads

Overview

MySQL’s InnoDB storage engine uses a sophisticated combination of locking mechanisms and MVCC (Multi-Version Concurrency Control) to prevent phantom reads in the REPEATABLE READ isolation level. This makes MySQL’s implementation more restrictive than the SQL standard, effectively providing near-Serializable behavior while maintaining better performance.

Key Mechanisms

Next-Key Locking

Next-key locking is InnoDB’s primary mechanism for preventing phantom reads. It combines:

  • Record locks: Lock existing rows
  • Gap locks: Lock the spaces between index records

This combination ensures that no new rows can be inserted in the gaps where phantom reads could occur.

Gap Locking

Gap locks specifically target the empty spaces between index records:

  • Prevents INSERT operations in those gaps
  • Only applies to indexed columns
  • Can be disabled (though not recommended)

Consistent Nonlocking Reads (MVCC)

For regular SELECT statements, MySQL uses MVCC snapshots:

  • Each transaction sees a consistent view of data
  • No locking overhead for read operations
  • Phantom reads are prevented through snapshot isolation

Practical Demonstration

Setup: Creating Test Environment

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Create test table
CREATE TABLE employees (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(50),
salary DECIMAL(10,2),
department VARCHAR(30),
INDEX idx_salary (salary),
INDEX idx_department (department)
);

-- Insert initial data
INSERT INTO employees (name, salary, department) VALUES
('Alice', 50000, 'Engineering'),
('Bob', 60000, 'Engineering'),
('Charlie', 55000, 'Marketing'),
('Diana', 70000, 'Engineering');

Scenario 1: Regular SELECT (MVCC Protection)

Session A (Transaction 1):

1
2
3
4
5
6
7
-- Start transaction with REPEATABLE READ
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;

-- First query
SELECT * FROM employees WHERE salary > 55000;
-- Results: Bob (60000), Diana (70000)

Session B (Transaction 2):

1
2
3
4
-- Insert new high-salary employee
INSERT INTO employees (name, salary, department)
VALUES ('Eve', 65000, 'Engineering');
COMMIT;

Back to Session A:

1
2
3
4
5
6
-- Repeat the same query
SELECT * FROM employees WHERE salary > 55000;
-- Results: Still Bob (60000), Diana (70000)
-- Eve is NOT visible - phantom read prevented!

COMMIT;

Scenario 2: SELECT FOR UPDATE (Next-Key Locking)

Session A (Transaction 1):

1
2
3
4
5
6
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;

-- Query with FOR UPDATE
SELECT * FROM employees WHERE salary BETWEEN 50000 AND 60000 FOR UPDATE;
-- This creates next-key locks on the range

Session B (Transaction 2):

1
2
3
4
-- Try to insert in the locked range
INSERT INTO employees (name, salary, department)
VALUES ('Frank', 55000, 'Sales');
-- This will BLOCK until Transaction 1 commits

Session A continues:

1
2
3
4
5
6
-- Repeat the query
SELECT * FROM employees WHERE salary BETWEEN 50000 AND 60000 FOR UPDATE;
-- Results remain consistent

COMMIT;
-- Now Session B's INSERT will proceed

Scenario 3: Gap Locking Visualization

1
2
3
4
5
6
7
8
9
-- Current salary values: 50000, 55000, 60000, 70000
-- Gap locks are placed between these values:

-- Gaps protected by next-key locks:
-- (-∞, 50000)
-- (50000, 55000)
-- (55000, 60000)
-- (60000, 70000)
-- (70000, +∞)

Types of Locks Used

Record Locks

1
2
3
-- Locks specific existing rows
SELECT * FROM employees WHERE id = 1 FOR UPDATE;
-- Locks only the row with id = 1

Gap Locks

1
2
3
-- Locks gaps between index values
SELECT * FROM employees WHERE salary > 55000 FOR UPDATE;
-- Locks gaps: (55000, 60000), (60000, 70000), (70000, +∞)

Next-Key Locks

1
2
3
-- Combination of record + gap locks
SELECT * FROM employees WHERE salary >= 55000 FOR UPDATE;
-- Locks: record(55000) + gap(55000, 60000) + record(60000) + gap(60000, 70000) + etc.

Important Limitations and Caveats

Index Dependency

Gap locking only works effectively with indexed columns:

1
2
3
4
5
-- This uses gap locking (salary is indexed)
SELECT * FROM employees WHERE salary > 50000 FOR UPDATE;

-- This may not prevent phantoms effectively (name is not indexed)
SELECT * FROM employees WHERE name LIKE 'A%' FOR UPDATE;

Disabling Gap Locks

Gap locking can be disabled, which reintroduces phantom read risks:

1
2
3
4
-- Disable gap locking (NOT recommended)
SET SESSION innodb_locks_unsafe_for_binlog = 1;
-- or
SET SESSION transaction_isolation = 'READ-COMMITTED';

Different Behavior by Query Type

Query Type Locking Mechanism Phantom Prevention
SELECT MVCC snapshot ✅ Yes
SELECT FOR UPDATE Next-key locks ✅ Yes
SELECT FOR SHARE Next-key locks ✅ Yes
UPDATE Next-key locks ✅ Yes
DELETE Next-key locks ✅ Yes

4. Edge Cases Where Phantoms Can Still Occur

1
2
3
4
5
6
7
8
9
10
-- Case 1: Non-indexed column queries
SELECT * FROM employees WHERE name LIKE 'Z%' FOR UPDATE;
-- May not prevent phantoms effectively

-- Case 2: After updating a row in the same transaction
START TRANSACTION;
SELECT * FROM employees WHERE salary > 50000;
UPDATE employees SET salary = 55000 WHERE id = 1;
SELECT * FROM employees WHERE salary > 50000;
-- Second SELECT might see changes from other committed transactions

Best Practices

Use Indexed Columns for Range Queries

1
2
3
4
5
-- Good: Uses index for gap locking
SELECT * FROM employees WHERE salary BETWEEN 50000 AND 70000 FOR UPDATE;

-- Less effective: No index on name
SELECT * FROM employees WHERE name BETWEEN 'A' AND 'M' FOR UPDATE;

Understand Your Query Patterns

1
2
3
4
5
-- For read-only queries, regular SELECT is sufficient
SELECT COUNT(*) FROM employees WHERE department = 'Engineering';

-- For queries that need to prevent concurrent inserts
SELECT * FROM employees WHERE department = 'Engineering' FOR UPDATE;

Monitor Lock Contention

For MySQL 8.0+:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Check current locks
SELECT * FROM performance_schema.data_locks;

-- Check lock waits
SELECT * FROM performance_schema.data_lock_waits;

-- More detailed lock information
SELECT
dl.OBJECT_SCHEMA,
dl.OBJECT_NAME,
dl.LOCK_TYPE,
dl.LOCK_MODE,
dl.LOCK_STATUS,
dl.LOCK_DATA
FROM performance_schema.data_locks dl;

-- Check which transactions are waiting
SELECT
dlw.REQUESTING_ENGINE_TRANSACTION_ID as waiting_trx,
dlw.BLOCKING_ENGINE_TRANSACTION_ID as blocking_trx,
dl.LOCK_MODE as waiting_lock_mode,
dl.LOCK_TYPE as waiting_lock_type,
dl.OBJECT_NAME as table_name
FROM performance_schema.data_lock_waits dlw
JOIN performance_schema.data_locks dl
ON dlw.REQUESTING_ENGINE_LOCK_ID = dl.ENGINE_LOCK_ID;

For MySQL 5.7 and earlier:

1
2
3
4
5
-- Check current locks (deprecated in 8.0)
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCKS;

-- Check lock waits (deprecated in 8.0)
SELECT * FROM INFORMATION_SCHEMA.INNODB_LOCK_WAITS;

Performance Considerations

Advantages

  • Prevents phantom reads without full table locking
  • MVCC provides excellent read performance
  • Better concurrency than Serializable isolation

Trade-offs

  • Gap locks can increase lock contention
  • More complex lock management overhead
  • Potential for deadlocks in high-concurrency scenarios

Conclusion

MySQL InnoDB’s approach to preventing phantom reads is highly effective, combining:

  • MVCC snapshots for regular SELECT operations
  • Next-key locking for locking reads and modifications
  • Gap locking to prevent insertions in critical ranges

This makes MySQL’s REPEATABLE READ isolation level more restrictive than the SQL standard, effectively preventing most phantom read scenarios while maintaining good performance characteristics. However, understanding the limitations and edge cases is crucial for designing robust database applications.

Testing Your Understanding

Try these scenarios in your own MySQL environment:

  1. Test MVCC behavior: Use two sessions with regular SELECT statements
  2. Test gap locking: Use SELECT FOR UPDATE with range queries
  3. Test limitations: Try queries on non-indexed columns
  4. Observe lock contention: Monitor INFORMATION_SCHEMA.INNODB_LOCKS during concurrent operations

Understanding these mechanisms will help you design more robust database applications and troubleshoot concurrency issues effectively.