LinkedIn Hook

"Your monthly report ran twice and produced two different totals — same query, same database, no writes in between. The auditor is asking questions."

Most developers learn about isolation levels by reading a table on Wikipedia, memorizing the words "dirty read, non-repeatable read, phantom read," and then never thinking about them again until production breaks. Phantom reads are the sneakiest of the three because they do not corrupt a row you already saw — they conjure a row you never asked for, in the middle of a transaction that thought it owned the data.

READ COMMITTED protects you from dirty reads but lets non-repeatable reads through. REPEATABLE READ stops non-repeatable reads but, in the SQL standard, still allows phantoms. PostgreSQL's REPEATABLE READ goes further — it uses snapshot isolation and prevents phantoms entirely. MySQL's InnoDB takes a different route, using gap locks to block other writers from inserting into a range you scanned. Same isolation level name, two completely different mechanisms.

If you do not know which path your database takes, you cannot reason about your own transactions — and "it works on my machine" becomes "it crashed on production at month-end."

In Lesson 11.5, I break down phantom reads, REPEATABLE READ semantics in PostgreSQL and MySQL, and how snapshot isolation differs from gap locking.

Read the full lesson -> [link]

#PostgreSQL #Databases #Transactions #IsolationLevels #InterviewPrep #BackendDevelopment

Phantom Read and REPEATABLE READ thumbnail

What You'll Learn

The precise definition of a phantom read and how it differs from a non-repeatable read
What the SQL standard says about REPEATABLE READ and why the standard allows phantoms
How PostgreSQL implements REPEATABLE READ using snapshot isolation, and why phantoms cannot occur
How MySQL InnoDB uses next-key locks (record + gap locks) to block phantoms at REPEATABLE READ
Two runnable PostgreSQL two-session examples that demonstrate snapshot behavior
When to reach for REPEATABLE READ vs READ COMMITTED vs SERIALIZABLE
Common mistakes around assuming phantoms cannot happen because the standard says so

The Library Card Catalog Analogy — Why Phantoms Appear

Imagine you walk into an old library and ask the librarian for every book about Roman history published before 1950. The librarian opens the card catalog drawer, flips through cards, and reads you a list of seventeen titles. You write them down. A minute later, you ask the librarian to re-count the books on that list to confirm — same drawer, same query — and she says "seventeen, confirmed."

Now imagine that while you were writing down the list, another patron quietly slipped a brand-new card for a book published in 1948 (just rediscovered in storage) into the drawer. You ask the librarian a third time, "How many books about Roman history before 1950?" and she answers "eighteen." You did not change your question. The books you already wrote down are still there, unchanged. But a new one appeared inside the range you asked about — a row that did not exist in your view a moment ago.

That is a phantom read. Notice what it is not: nobody modified the seventeen original entries. None of the rows you already saw changed. A new row appeared inside the predicate published_before_1950 = true, and your second scan returned a different count than your first.

Now imagine the librarian had taken a photograph of the entire drawer the moment you walked in, and answered every one of your questions from the photograph instead of the live drawer. No matter how many cards another patron slips in while you are working, your photograph still shows seventeen. Your transaction sees a frozen world. That is snapshot isolation — PostgreSQL's strategy for REPEATABLE READ. Alternatively, imagine the librarian locked the drawer the moment you started reading, refusing to let any patron insert anything into the "before 1950" section until you were done. That is gap locking — MySQL InnoDB's strategy. Same end result (no phantoms), wildly different mechanism, different consequences for concurrency.

+---------------------------------------------------------------+
|           NON-REPEATABLE READ vs PHANTOM READ                 |
+---------------------------------------------------------------+
|                                                                |
|  NON-REPEATABLE READ (single row changed):                     |
|    T1: SELECT balance WHERE id=7  -> 100                       |
|    T2: UPDATE accounts SET balance=200 WHERE id=7; COMMIT      |
|    T1: SELECT balance WHERE id=7  -> 200  (same row, new val)  |
|                                                                |
|  PHANTOM READ (new row appeared in range):                     |
|    T1: SELECT COUNT(*) WHERE amount > 1000  -> 5               |
|    T2: INSERT INTO orders (amount) VALUES (5000); COMMIT       |
|    T1: SELECT COUNT(*) WHERE amount > 1000  -> 6  (NEW row!)   |
|                                                                |
|  KEY DIFFERENCE:                                               |
|   - Non-repeatable: existing row mutated underneath you        |
|   - Phantom:        new row appeared inside your predicate     |
|                                                                |
+---------------------------------------------------------------+

What the SQL Standard Actually Says

The ANSI/ISO SQL standard defines four isolation levels by listing which anomalies they forbid:

+----------------------+----------+--------------+----------+
| Isolation Level      | Dirty    | Non-Repeat   | Phantom  |
+----------------------+----------+--------------+----------+
| READ UNCOMMITTED     | possible | possible     | possible |
| READ COMMITTED       | NO       | possible     | possible |
| REPEATABLE READ      | NO       | NO           | possible |
| SERIALIZABLE         | NO       | NO           | NO       |
+----------------------+----------+--------------+----------+

By the strict standard, REPEATABLE READ guarantees that any row you read at the start of the transaction will read the same value if you read it again — but it does not guarantee that the set of rows matching a predicate will stay the same. A new row inserted by another committed transaction can appear in your second scan. That is a phantom.

The reason the standard permits phantoms at this level is historical: locking implementations of REPEATABLE READ would lock the rows you actually read but not the gaps between them, so other transactions remained free to insert new rows into those gaps. Forbidding phantoms required either a more aggressive locking strategy (next-key locks) or a totally different approach (snapshot isolation). The standard left the door open for cheap implementations.

In practice, every modern database that calls itself REPEATABLE READ goes beyond the strict standard. PostgreSQL forbids phantoms via snapshots. MySQL InnoDB forbids phantoms via gap locks. Oracle has no REPEATABLE READ at all (it offers READ COMMITTED and SERIALIZABLE, where SERIALIZABLE is implemented as snapshot isolation similar to Postgres REPEATABLE READ). SQL Server's REPEATABLE READ uses range locks similar to InnoDB. Knowing the standard is not enough — you have to know your database.

PostgreSQL REPEATABLE READ — Snapshot Isolation

When a PostgreSQL transaction runs at REPEATABLE READ, the database takes a snapshot the moment the transaction's first non-trivial statement executes. Every read inside that transaction sees the database exactly as it looked at the snapshot moment. Other transactions can insert, update, and delete freely; their committed changes are simply invisible to your transaction. There is no row locking, no gap locking — just a versioned view of the data backed by PostgreSQL's MVCC (Multi-Version Concurrency Control) engine.

Because the snapshot is frozen for the entire transaction, phantom reads cannot occur in PostgreSQL REPEATABLE READ. A new row inserted and committed by another transaction has a transaction ID greater than your snapshot's, so MVCC filters it out of every scan. You will see the same rows on the first SELECT, the second SELECT, and the thousandth SELECT.

Example 1 — A phantom that does NOT happen at REPEATABLE READ

Run this in two psql sessions side by side. The comments show the order of operations.

-- Setup (run once in any session)
CREATE TABLE orders (
  id     serial PRIMARY KEY,
  amount numeric NOT NULL
);

INSERT INTO orders (amount) VALUES (100), (500), (2000), (3500), (4000);
-- Five rows total. Three of them (2000, 3500, 4000) have amount > 1000.

-- SESSION A
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- First scan: snapshot is taken right here, on this statement.
SELECT COUNT(*) AS big_orders FROM orders WHERE amount > 1000;
--  big_orders
-- ------------
--           3

-- SESSION B (run AFTER Session A's first SELECT)
BEGIN;
INSERT INTO orders (amount) VALUES (9999);
COMMIT;
-- The new row is now visible to any new transaction.

-- SESSION A (continue, still inside the same transaction)
SELECT COUNT(*) AS big_orders FROM orders WHERE amount > 1000;
--  big_orders
-- ------------
--           3
-- The 9999 row is INVISIBLE because it was inserted after our snapshot.
-- No phantom read, even though Session B committed an inserting transaction.

SELECT amount FROM orders WHERE amount > 1000 ORDER BY amount;
--  amount
-- --------
--    2000
--    3500
--    4000
-- Same three rows we saw the first time. The snapshot is rock solid.

COMMIT;

-- SESSION A (after COMMIT, start a NEW transaction)
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT COUNT(*) FROM orders WHERE amount > 1000;
--  count
-- -------
--      4
-- Now we see 4, because the new transaction took a fresh snapshot
-- AFTER Session B's commit. The snapshot belongs to the transaction,
-- not to the connection.
COMMIT;

+---------------------------------------------------------------+
|     POSTGRES REPEATABLE READ TIMELINE                         |
+---------------------------------------------------------------+
|                                                                |
|  Time   Session A                  Session B                   |
|  ----   -----------------------    -----------------------     |
|   t0    BEGIN REPEATABLE READ                                  |
|   t1    SELECT  -> count=3                                     |
|         (snapshot frozen here)                                 |
|   t2                               INSERT amount=9999          |
|   t3                               COMMIT                      |
|   t4    SELECT  -> count=3                                     |
|         (9999 invisible -> MVCC)                               |
|   t5    COMMIT                                                 |
|   t6    BEGIN REPEATABLE READ                                  |
|   t7    SELECT  -> count=4                                     |
|         (new snapshot includes 9999)                           |
|                                                                |
+---------------------------------------------------------------+

The snapshot is taken on the first statement of the transaction, not on the BEGIN. That is a subtle but important detail — if you BEGIN, then sit idle for a minute, then run your first SELECT, the snapshot reflects the state of the database at the moment of that SELECT, not at BEGIN.

Example 2 — Write Skew is still possible at REPEATABLE READ

PostgreSQL's snapshot isolation prevents phantom reads in the classical sense, but it does not prevent every concurrency anomaly. The famous remaining hole is write skew: two transactions read overlapping data, make decisions based on what they read, and write disjoint rows that together violate an invariant.

-- Setup: two on-call doctors, one of whom must always be on duty.
CREATE TABLE doctors (
  id      serial PRIMARY KEY,
  name    text NOT NULL,
  on_call boolean NOT NULL
);

INSERT INTO doctors (name, on_call) VALUES
  ('Alice', true),
  ('Bob',   true);
-- Invariant: at least one doctor must have on_call=true at all times.

-- SESSION A (Alice wants to go off-call)
BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT COUNT(*) FROM doctors WHERE on_call = true;
--  count
-- -------
--      2
-- Two doctors are on call, so it is safe for Alice to leave.

-- SESSION B (Bob wants to go off-call, in parallel)
BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT COUNT(*) FROM doctors WHERE on_call = true;
--  count
-- -------
--      2
-- Bob also sees two doctors on call (his snapshot was taken
-- BEFORE Session A wrote anything). Safe to leave, he thinks.

-- SESSION A
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;

-- SESSION B
UPDATE doctors SET on_call = false WHERE name = 'Bob';
COMMIT;
-- Both commits succeed. Each transaction wrote a DIFFERENT row,
-- so no write conflict was detected. Final state:
SELECT name, on_call FROM doctors;
--  name  | on_call
-- -------+---------
--  Alice | false
--  Bob   | false
-- INVARIANT VIOLATED: nobody is on call.

This is write skew, and it is the reason SERIALIZABLE exists. PostgreSQL's REPEATABLE READ stops phantom reads but cannot detect that two transactions read the same predicate and made decisions that conflict at the predicate level. To prevent write skew, escalate to SERIALIZABLE, which uses Serializable Snapshot Isolation (SSI) to detect dangerous read-write patterns and abort one of the offending transactions. We cover this in Lesson 11.6.

MySQL InnoDB REPEATABLE READ — Gap Locks (Contrast)

MySQL InnoDB takes a completely different path to the same goal. Its REPEATABLE READ is also the default isolation level, but it relies on next-key locks — a combination of a row lock on a matching index entry plus a gap lock on the open range immediately before that entry. When a transaction runs SELECT ... FOR UPDATE WHERE amount > 1000, InnoDB locks every matching row and every gap in the index where a new matching row could be inserted. Other sessions that try to insert into those gaps block until the first transaction commits or rolls back.

Plain SELECT (without FOR UPDATE) in InnoDB at REPEATABLE READ is a consistent read served from an MVCC snapshot, very much like Postgres. The phantom protection kicks in primarily for locking reads (SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE) and for the underlying scans that DML statements perform.

+---------------------------------------------------------------+
|     POSTGRES vs MYSQL  (REPEATABLE READ STRATEGY)             |
+---------------------------------------------------------------+
|                                                                |
|  POSTGRESQL                       MYSQL InnoDB                 |
|  ----------------------------     -------------------------    |
|  MVCC snapshot per txn            MVCC snapshot for plain      |
|                                   SELECT, gap locks for        |
|                                   locking reads and DML        |
|                                                                |
|  No reader blocks any writer      Writers block on gap locks   |
|  No writer blocks any reader      Inserts blocked across gaps  |
|                                                                |
|  Phantoms: prevented by           Phantoms: prevented by       |
|    invisibility (snapshot)          blocking inserts (locks)   |
|                                                                |
|  Cost: extra storage for old      Cost: more contention,       |
|    row versions; vacuum            potential deadlocks on      |
|    must clean them up              gap locks                   |
|                                                                |
|  Write skew: still possible       Write skew: still possible   |
|    -> use SERIALIZABLE             -> use SERIALIZABLE         |
|                                                                |
+---------------------------------------------------------------+

The practical implication: in PostgreSQL, REPEATABLE READ is essentially "free" at read time (no extra blocking compared to READ COMMITTED), and the cost is paid in storage and vacuum overhead. In MySQL, REPEATABLE READ can introduce surprising blocking when one session does a ranged locking read and another tries to insert into that range — a behavior that bites teams migrating between the two databases.

Example 3 — Repeatable Aggregates with REPEATABLE READ in Postgres

A common reason to choose REPEATABLE READ is running a multi-statement report where every statement must observe the same database state. Without it, a long report can produce internally inconsistent numbers because rows shift between statements.

-- Setup
CREATE TABLE invoices (
  id         serial PRIMARY KEY,
  customer   text NOT NULL,
  amount     numeric NOT NULL,
  status     text NOT NULL  -- 'paid' or 'unpaid'
);

INSERT INTO invoices (customer, amount, status) VALUES
  ('Alice',  100, 'paid'),
  ('Alice',  200, 'unpaid'),
  ('Bob',    300, 'paid'),
  ('Bob',    400, 'unpaid'),
  ('Carol',  500, 'paid');

-- SESSION A: end-of-day reconciliation report
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Total revenue across all invoices
SELECT SUM(amount) AS total_billed FROM invoices;
--  total_billed
-- --------------
--          1500

-- How much of that is collected?
SELECT SUM(amount) AS collected FROM invoices WHERE status = 'paid';
--  collected
-- -----------
--        900

-- Outstanding receivables
SELECT SUM(amount) AS outstanding FROM invoices WHERE status = 'unpaid';
--  outstanding
-- -------------
--          600

-- SESSION B (interleaved between Session A's queries)
BEGIN;
INSERT INTO invoices (customer, amount, status) VALUES ('Dave', 9999, 'unpaid');
UPDATE invoices SET status = 'paid' WHERE id = 2;  -- Alice's 200 is now paid
COMMIT;

-- SESSION A continues — same transaction, same snapshot
SELECT SUM(amount) AS total_billed FROM invoices;
--  total_billed
-- --------------
--          1500
-- Still 1500. The new 9999 row is invisible.

SELECT SUM(amount) AS collected FROM invoices WHERE status = 'paid';
--  collected
-- -----------
--        900
-- Still 900. The status update to row id=2 is invisible.

-- Sanity check: do the parts add up to the whole?
-- 900 (collected) + 600 (outstanding) = 1500 (total billed). YES.
COMMIT;

If Session A had run at READ COMMITTED instead, the second total_billed query might have returned 11499 (including Dave's new invoice) while collected returned 1100 (because Alice's 200 was now paid) and outstanding returned 10399 — and the auditor would ask why 1100 + 10399 != 11499. REPEATABLE READ gives you the report-level invariant: all queries in this transaction see exactly the same world.

Example 4 — Range Predicate Stability

This example shows that even predicates that match a range of values (not just a count) are stable inside a Postgres REPEATABLE READ transaction.

-- Setup
CREATE TABLE temperature_readings (
  id        serial PRIMARY KEY,
  sensor    text NOT NULL,
  reading_c numeric NOT NULL,
  taken_at  timestamptz NOT NULL DEFAULT now()
);

INSERT INTO temperature_readings (sensor, reading_c) VALUES
  ('s1', 22.0),
  ('s2', 23.5),
  ('s3', 21.0),
  ('s4', 24.5);

-- SESSION A: monitoring loop checking for out-of-range readings
BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT id, sensor, reading_c
  FROM temperature_readings
 WHERE reading_c BETWEEN 21 AND 24
 ORDER BY id;
--  id | sensor | reading_c
-- ----+--------+-----------
--   1 | s1     |      22.0
--   2 | s2     |      23.5
--   3 | s3     |      21.0
-- (s4 = 24.5 is outside the range)

-- SESSION B (runs while Session A is mid-transaction)
BEGIN;
INSERT INTO temperature_readings (sensor, reading_c) VALUES ('s5', 22.7);
UPDATE temperature_readings SET reading_c = 22.0 WHERE sensor = 's4';
COMMIT;

-- SESSION A continues
SELECT id, sensor, reading_c
  FROM temperature_readings
 WHERE reading_c BETWEEN 21 AND 24
 ORDER BY id;
--  id | sensor | reading_c
-- ----+--------+-----------
--   1 | s1     |      22.0
--   2 | s2     |      23.5
--   3 | s3     |      21.0
-- The new s5 row (22.7) is invisible. The s4 update (24.5 -> 22.0)
-- is invisible. The result is exactly the same as the first query.
COMMIT;

+---------------------------------------------------------------+
|     SNAPSHOT STABILITY ACROSS A REPEATABLE READ TXN           |
+---------------------------------------------------------------+
|                                                                |
|  Snapshot taken                            Transaction         |
|  at first query                            commits             |
|       v                                       v                |
|       +---------------------------------------+                |
|       |                                       |                |
|  ---->| query  query  query  query  query     |---->           |
|       |   |      |      |      |      |       |                |
|       +---+------+------+------+------+-------+                |
|           |      |      |      |      |                       |
|        Same   Same    Same   Same   Same                      |
|        view   view    view   view   view                      |
|                                                                |
|  Other sessions inserting/updating/deleting:                   |
|  -> their changes are invisible to this transaction            |
|  -> they do NOT block, they are NOT blocked                    |
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Two-panel comparison: LEFT panel labeled 'PostgreSQL REPEATABLE READ' shows a transparent cyan (#4fc3f7) glass dome over a database, with arrows of new rows from outside bouncing off harmlessly. Inside the dome, a frozen snapshot of three rows. RIGHT panel labeled 'MySQL InnoDB REPEATABLE READ' shows a database with pink (#ff5c8a) padlocks placed in the gaps between indexed rows, blocking incoming INSERT arrows. White monospace labels: 'snapshot isolation' (left) and 'gap locks' (right). Bottom caption in white: 'Same name. Different machinery. Same outcome: no phantoms.' Subtle dotted grid background."

Common Mistakes

1. Assuming REPEATABLE READ allows phantoms because the SQL standard says so. The standard permits phantoms at REPEATABLE READ as a historical concession to lock-based implementations. Modern databases all go beyond the standard at this level: PostgreSQL via snapshots, MySQL InnoDB via next-key locks, SQL Server via range locks. Quoting the standard in code review is technically correct and practically wrong. Always verify the actual behavior of your database — and be aware that if you ever migrate, the implementation can change underneath you.

2. Assuming REPEATABLE READ also prevents write skew. Snapshot isolation prevents phantom reads but cannot detect that two concurrent transactions read overlapping predicates and made conflicting decisions on disjoint rows. The on-call doctor example is the canonical illustration: each transaction sees two on-call doctors, each takes one offline, both commit, and the invariant is gone. If you need predicate-level safety, escalate to SERIALIZABLE — that is exactly what it exists for.

3. Thinking the snapshot is taken on BEGIN. PostgreSQL takes the snapshot on the first non-trivial statement of the transaction, not on BEGIN. If you BEGIN, then idle for a minute, then run your first SELECT, the snapshot reflects the state of the database at the moment of that SELECT. This matters for long-running batch jobs that open a transaction early as a "safety net" — the snapshot is younger than they expect.

4. Mixing locking reads and snapshot reads in MySQL InnoDB and being surprised by gap locks. In MySQL, plain SELECT at REPEATABLE READ uses a snapshot, but SELECT ... FOR UPDATE and UPDATE/DELETE scans take next-key locks. A team migrating from PostgreSQL to MySQL often writes the same SELECT ... FOR UPDATE WHERE status = 'pending' query and is shocked when concurrent inserts of new pending rows start blocking. The phantom protection is real but the cost is contention.

5. Picking REPEATABLE READ "for safety" when READ COMMITTED would do. REPEATABLE READ is the right choice when one transaction issues multiple queries that must agree with each other — reports, reconciliations, exports. For ordinary OLTP work where each transaction reads a row, decides, and writes it back, READ COMMITTED is usually fine and avoids the increased serialization-failure rate of REPEATABLE READ. Defaulting everything to REPEATABLE READ "just in case" is cargo-culting and increases retry traffic.

Interview Questions

1. "Define a phantom read precisely. How is it different from a non-repeatable read?"

A phantom read happens when a transaction runs the same predicate query twice and the second execution returns a different set of rows because another transaction committed an INSERT (or DELETE) that changed which rows match the predicate. The rows you already saw are unchanged — the anomaly is that new rows appeared inside (or existing rows disappeared from) the range you scanned. A non-repeatable read, by contrast, is when an existing row that you already read has its column values mutated by another committed transaction, so reading the same row by primary key returns a different value the second time. The two anomalies operate at different granularities: non-repeatable read is a row-level anomaly, phantom is a predicate-level anomaly. READ COMMITTED prevents neither, REPEATABLE READ in the strict standard prevents only non-repeatable reads, and SERIALIZABLE prevents both. In practice, PostgreSQL and MySQL InnoDB both prevent phantoms at REPEATABLE READ via different mechanisms.

2. "How does PostgreSQL's REPEATABLE READ prevent phantoms? Does it use locking?"

PostgreSQL uses snapshot isolation, not locking. When a REPEATABLE READ transaction runs its first non-trivial statement, the database records a snapshot of the transaction system — essentially a list of which transaction IDs were committed at that moment. Every subsequent read in that transaction filters rows through MVCC: a row version is visible only if it was committed by a transaction ID present in the snapshot. New rows inserted and committed by other transactions after the snapshot have higher transaction IDs and are filtered out. Updated rows produce new versions that are also invisible; the transaction continues to see the old versions. This means readers do not block writers and writers do not block readers, and phantoms cannot occur because no row inserted after the snapshot is ever visible to the transaction. The cost is paid in storage (old row versions persist) and in vacuum work to reclaim that space later. It also means PostgreSQL never takes range or gap locks for plain reads, which is a major contrast with MySQL InnoDB.

3. "MySQL InnoDB also calls its default isolation level REPEATABLE READ. Does it work the same way as PostgreSQL?"

Same name, different mechanism. MySQL InnoDB uses MVCC snapshots for plain consistent reads (SELECT without locking clauses), so a series of plain SELECTs inside a transaction will see a stable view, similar to Postgres. But for locking reads (SELECT ... FOR UPDATE, SELECT ... LOCK IN SHARE MODE) and for the scans performed by UPDATE/DELETE statements, InnoDB takes next-key locks: a record lock on each matching index entry plus a gap lock on the open interval between index entries. This blocks other sessions from inserting new rows that would fall inside the locked range, which is how InnoDB prevents phantoms at REPEATABLE READ. The practical consequences differ from PostgreSQL: InnoDB writers can block other writers waiting on gap locks, deadlocks on gap locks are common in high-concurrency workloads, and migrating a query like SELECT ... FOR UPDATE WHERE status = 'pending' between the two databases can have very different concurrency characteristics. Knowing both mechanisms is essential when you reason about contention.

4. "Can you still hit concurrency anomalies at PostgreSQL REPEATABLE READ? Give an example."

Yes — write skew is the famous one. Two transactions read overlapping data, make decisions based on what they read, and write to disjoint rows. Because each transaction writes a different row, there is no write-write conflict for PostgreSQL to detect, both commits succeed, and the combined effect violates an invariant that each transaction individually believed it was preserving. The classic example is two on-call doctors checking that at least one doctor must be on duty: each transaction reads "two doctors are on call," each removes a different doctor, both commit, and now nobody is on call. Snapshot isolation cannot prevent this because the conflict is at the predicate level, not at any specific row. The fix is SERIALIZABLE, which in PostgreSQL is implemented as Serializable Snapshot Isolation (SSI) and tracks read-write dependencies between concurrent transactions. When SSI detects a dangerous pattern that could lead to write skew, it aborts one transaction with a serialization failure, and the application retries.

5. "When would you actually choose REPEATABLE READ in production? When would you choose READ COMMITTED instead?"

Choose REPEATABLE READ when a single transaction issues multiple queries that must observe a consistent snapshot of the database — typically multi-query reports, reconciliations, exports, and any analytical workload where the parts must add up to the whole. The classic case is an end-of-day finance report computing total billed, total collected, and total outstanding: at READ COMMITTED, concurrent commits between queries produce numbers that do not reconcile, and finance loses faith in the report. REPEATABLE READ guarantees that all queries see the same snapshot. Choose READ COMMITTED for ordinary OLTP work where each transaction does a small, local read-modify-write on specific rows. READ COMMITTED has the lowest serialization-failure rate because each statement sees the latest committed state, transactions rarely conflict, and retries are uncommon. Defaulting OLTP to REPEATABLE READ leads to higher retry traffic with no real safety benefit; defaulting reports to READ COMMITTED leads to inconsistent numbers and angry stakeholders. Match the isolation level to the shape of the work.

Quick Reference — Cheat Sheet

+---------------------------------------------------------------+
|           PHANTOM READ AND REPEATABLE READ                    |
+---------------------------------------------------------------+
|                                                                |
|  PHANTOM READ DEFINITION:                                      |
|    Same predicate, two scans, different row sets               |
|    Caused by another transaction's INSERT/DELETE               |
|                                                                |
|  vs NON-REPEATABLE READ:                                       |
|    Non-repeatable: same row, different value                   |
|    Phantom:        same query, different row count             |
|                                                                |
|  STANDARD ISOLATION LEVELS (anomalies allowed):                |
|    READ UNCOMMITTED  -> dirty + non-rep + phantom              |
|    READ COMMITTED    -> non-rep + phantom                      |
|    REPEATABLE READ   -> phantom (per standard)                 |
|    SERIALIZABLE      -> none                                   |
|                                                                |
|  POSTGRESQL REPEATABLE READ:                                   |
|    Snapshot isolation via MVCC                                 |
|    Phantoms: PREVENTED (better than standard)                  |
|    Write skew: still possible -> use SERIALIZABLE              |
|    Snapshot taken on first statement, NOT on BEGIN             |
|    No reader/writer blocking                                   |
|                                                                |
|  MYSQL InnoDB REPEATABLE READ:                                 |
|    Plain SELECT  -> MVCC snapshot                              |
|    Locking read  -> next-key (record + gap) locks              |
|    Phantoms: PREVENTED                                         |
|    Cost: gap-lock contention and deadlocks possible            |
|                                                                |
|  SYNTAX (PostgreSQL):                                          |
|    BEGIN ISOLATION LEVEL REPEATABLE READ;                      |
|    -- or --                                                    |
|    BEGIN;                                                      |
|    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;            |
|                                                                |
|  WHEN TO USE:                                                  |
|    Multi-query reports / reconciliations    -> REPEATABLE READ |
|    Per-row OLTP read-modify-write           -> READ COMMITTED  |
|    Predicate-level invariants (no skew)     -> SERIALIZABLE    |
|                                                                |
+---------------------------------------------------------------+

Concern	READ COMMITTED	REPEATABLE READ (PG)	SERIALIZABLE (PG)
Dirty read	prevented	prevented	prevented
Non-repeatable read	possible	prevented	prevented
Phantom read	possible	prevented (snapshot)	prevented
Write skew	possible	possible	prevented (SSI)
Reader blocks writer	no	no	no
Writer blocks reader	no	no	no
Serialization failures	rare	possible	common, must retry
Best for	OLTP row writes	Multi-query reports	Predicate invariants

Prev: Lesson 11.4 -- Non-Repeatable Read and READ COMMITTED Next: Lesson 11.6 -- SERIALIZABLE and Isolation Comparison

This is Lesson 11.5 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

Phantom Read and REPEATABLE READ