Database Interview Prep
Joins

Join Performance

Nested Loop, Hash, Merge, EXPLAIN ANALYZE, and the Driving Table

LinkedIn Hook

"Your join was 12ms in dev and 47 seconds in production. Same SQL. Same indexes. Different driving table."

Most developers learn JOIN syntax in an afternoon and then spend the next five years confused about why the same query behaves like a completely different animal once the data grows past a million rows. The query plan changes. The join algorithm changes. The "obvious" index is suddenly ignored. The query that finished during a coffee break in staging now blocks the connection pool in production at 3 PM on a Tuesday.

Here is the brutal truth nobody tells you in bootcamp: there is no such thing as "a join" inside the database. There are at least three completely different physical algorithms — Nested Loop, Hash Join, and Merge Join — and the planner picks one based on row count estimates, available indexes, sort orders, and work_mem. A nested loop on 10 rows is the fastest thing in SQL. A nested loop on 10 million rows is a career-ending outage. The only thing that decides which one happens is whether the planner believed the row count from its statistics.

And then there is the N+1 problem — the most common application-level join bug on Earth. Your ORM "helpfully" hides the join. Your code looks like a clean for-loop over users. The database sees 1 + 1000 round trips and your p99 latency goes through the roof.

In Lesson 5.6, I break down how the planner picks a join algorithm, how to read EXPLAIN ANALYZE for joins, how to index join columns properly, why join order matters, what the "driving table" really means, and how to kill N+1 forever.

Read the full lesson -> [link]

#SQL #Database #PostgreSQL #QueryPerformance #Joins #BackendDevelopment #InterviewPrep


Join Performance thumbnail


What You'll Learn

  • The three physical join algorithms — Nested Loop, Hash Join, Merge Join — and when each one wins
  • How to read EXPLAIN ANALYZE output for joins, including actual vs estimated row counts
  • Why indexing the join column on the inner side is the single biggest lever for nested loop performance
  • What the "driving table" (outer table) is and why the planner picks the smaller one
  • How join order affects both correctness and performance, and what join_collapse_limit does
  • Why the N+1 query problem destroys application latency and how to detect it
  • The MySQL differences — until 8.0.18 MySQL had no Hash Join at all, only Nested Loop variants
  • How to force a join algorithm with enable_* planner flags when debugging

The Library Lookup Analogy — Three Ways to Match Two Lists

Imagine you work at a library, and you are given two lists. List A has 20 patron names. List B has 2,000,000 book records, each tagged with the patron who borrowed it. Your job: produce a combined list showing each patron and every book they borrowed. This is exactly a join.

You have three completely different ways to do it.

Strategy 1 — Nested Loop. Walk down List A one patron at a time. For each patron, scan all 2 million book records looking for matches. This is fine if List A has 5 patrons and List B has 10 books. It is catastrophic if both lists are large, because the work is |A| * |B| — 20 * 2,000,000 = 40 million comparisons. But if List B has an index on the patron column, you do not scan all 2 million books — you do a logarithmic lookup per patron, and 20 * log(2,000,000) is tiny. This is why nested loop with an index on the inner side is the fastest join in SQL when the outer side is small.

Strategy 2 — Hash Join. Pick the smaller list (List A, 20 patrons). Build a hash table from it, keyed by patron name. Now walk through List B once, and for each book, check the hash table — O(1) per lookup. Total work is |A| + |B|, regardless of whether indexes exist. This is the workhorse algorithm when one side fits in memory and there is no useful index.

Strategy 3 — Merge Join. Sort both lists by patron name. Then walk them in lockstep, like merging two already-sorted decks of cards. Total work is |A| log |A| + |B| log |B| for the sort, then |A| + |B| for the merge. The win comes when both inputs are already sorted — for example, because both come from indexes on the join column. Then the sort cost vanishes and merge join is the fastest option of all for very large equi-joins.

The database does not pick one algorithm "because it is the best." It picks based on the row counts it expects to see, the indexes available, whether inputs are already sorted, and how much memory it has to build a hash table. And the entire performance story of joins comes down to: did the planner pick the right algorithm, and was its row count estimate close to reality?

+---------------------------------------------------------------+
|           THE THREE JOIN ALGORITHMS                            |
+---------------------------------------------------------------+
|                                                                |
|  NESTED LOOP                                                   |
|    for each row in OUTER:                                      |
|      for each row in INNER where key matches:                  |
|        emit (outer, inner)                                     |
|    Best when:  outer is small, inner has an index on join col  |
|    Cost:       |outer| * lookup_cost_on_inner                  |
|                                                                |
|  HASH JOIN                                                     |
|    build: hash table from SMALLER side, keyed by join col      |
|    probe: scan LARGER side, hash-lookup each row               |
|    Best when:  one side fits in work_mem, equi-join only       |
|    Cost:       |build| + |probe|                               |
|                                                                |
|  MERGE JOIN                                                    |
|    sort: both sides by join col (free if already sorted)       |
|    merge: walk both in lockstep, advance smaller key           |
|    Best when:  both sides huge, both already sorted by join col|
|    Cost:       sort(both) + |left| + |right|                   |
|                                                                |
+---------------------------------------------------------------+
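The three strategies can be sketched in a few lines of Python. This is a toy model over lists of (key, value) pairs, illustrative of the algorithms themselves, not of how any real database implements them:

```python
from collections import defaultdict

def nested_loop_join(outer, inner, index=None):
    """For each outer row, find matching inner rows. With an index
    (a dict mapping key -> list of values), each probe is a lookup
    instead of a full scan of the inner list."""
    result = []
    for k, ov in outer:
        if index is not None:
            matches = index.get(k, [])                      # indexed probe
        else:
            matches = [iv for ik, iv in inner if ik == k]   # scan inner per outer row
        result.extend((k, ov, iv) for iv in matches)
    return result

def hash_join(outer, inner):
    """Build a hash table from the smaller side, probe with the larger."""
    build, probe, flipped = ((outer, inner, False) if len(outer) <= len(inner)
                             else (inner, outer, True))
    table = defaultdict(list)
    for k, v in build:
        table[k].append(v)
    result = []
    for k, pv in probe:
        for bv in table.get(k, []):
            result.append((k, pv, bv) if flipped else (k, bv, pv))
    return result

def merge_join(outer, inner):
    """Sort both sides, then walk them in lockstep."""
    a, b = sorted(outer), sorted(inner)
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][0] < b[j][0]:
            i += 1
        elif a[i][0] > b[j][0]:
            j += 1
        else:
            k, i2, j2 = a[i][0], i, j
            while i2 < len(a) and a[i2][0] == k:
                i2 += 1
            while j2 < len(b) and b[j2][0] == k:
                j2 += 1
            result.extend((k, a[x][1], b[y][1])
                          for x in range(i, i2) for y in range(j, j2))
            i, j = i2, j2
    return result

patrons = [(1, "alice"), (2, "bob")]
books   = [(1, "sql 101"), (2, "b-trees"), (1, "plans")]
```

All three produce the same set of rows; the only difference is how much work each does to find the matches, which is exactly what the planner is choosing between.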

Setting Up the Sample Tables

Every example below uses these two tables. Run them in a scratch Postgres database so you can follow along with your own EXPLAIN output.

-- Two related tables: customers and their orders.
-- Customers is small (10k rows), orders is large (1M rows).
CREATE TABLE customers (
  id          SERIAL PRIMARY KEY,
  name        TEXT NOT NULL,
  country     TEXT NOT NULL,
  signup_date DATE NOT NULL
);

CREATE TABLE orders (
  id           SERIAL PRIMARY KEY,
  customer_id  INTEGER NOT NULL REFERENCES customers(id),
  total        NUMERIC(10,2) NOT NULL,
  status       TEXT NOT NULL,
  created_at   TIMESTAMPTZ NOT NULL
);

-- Generate 10,000 customers
INSERT INTO customers (name, country, signup_date)
SELECT
  'Customer ' || g,
  (ARRAY['US','UK','DE','FR','JP'])[1 + (g % 5)],
  DATE '2024-01-01' + (g % 365)
FROM generate_series(1, 10000) AS g;

-- Generate 1,000,000 orders distributed across those customers
INSERT INTO orders (customer_id, total, status, created_at)
SELECT
  1 + (g % 10000),
  (random() * 500)::NUMERIC(10,2),
  (ARRAY['pending','shipped','delivered','cancelled'])[1 + (g % 4)],
  TIMESTAMPTZ '2024-01-01' + (g || ' minutes')::INTERVAL
FROM generate_series(1, 1000000) AS g;

-- Always run ANALYZE after a bulk insert. The planner is blind without
-- statistics, and a blind planner picks the wrong join algorithm.
ANALYZE customers;
ANALYZE orders;

The foreign key on orders.customer_id does not create an index automatically in Postgres — and that is the single most common join performance bug in real systems. We will add and remove the index below to see the difference.


Reading EXPLAIN ANALYZE for Joins

EXPLAIN ANALYZE actually executes the query and reports the plan along with real timing and row counts. The first thing to check on any join plan is estimated rows vs actual rows — if they are off by 10x or more, the planner picked the wrong algorithm and you need to fix statistics, not the SQL.

-- A small, selective join: one customer's orders.
EXPLAIN ANALYZE
SELECT c.name, o.total, o.created_at
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=0.29..1234.56 rows=100 width=40)
              (actual time=0.05..0.42 rows=100 loops=1)
   ->  Index Scan using customers_pkey on customers c
         (cost=0.29..8.30 rows=1 width=20)
         (actual time=0.02..0.02 rows=1 loops=1)
         Index Cond: (id = 42)
   ->  Seq Scan on orders o
         (cost=0.00..1226.00 rows=100 width=24)
         (actual time=0.03..0.39 rows=100 loops=1)
         Filter: (customer_id = 42)
         Rows Removed by Filter: 999900
 Planning Time: 0.21 ms
 Execution Time: 0.45 ms

Read this top-down. The planner picked Nested Loop: walk customers (1 row from a primary key lookup), for each row scan orders looking for matching customer_id. The outer side is 1 row, perfect for nested loop. But notice Rows Removed by Filter: 999900 — Postgres scanned the entire orders table for one customer because there is no index on orders.customer_id. The query still finishes in under a millisecond because we only ran it once, but multiply this by a thousand customers and you get a thousand full table scans.
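Some napkin math shows why "multiply by a thousand customers" is fatal. This is a sketch; the ~100 matches per customer comes from the sample data (1M orders spread over 10k customers):

```python
import math

ORDERS = 1_000_000
CUSTOMERS_QUERIED = 1_000
MATCHES_PER_CUSTOMER = 100   # 1M orders / 10k customers in the sample data

# No index on orders.customer_id: every probe is a full scan of orders.
rows_without_index = CUSTOMERS_QUERIED * ORDERS

# With the B-tree index: a ~log2(N) descent plus the matching rows.
rows_with_index = CUSTOMERS_QUERIED * (math.ceil(math.log2(ORDERS))
                                       + MATCHES_PER_CUSTOMER)

print(rows_without_index)   # 1_000_000_000 rows touched
print(rows_with_index)      # 120_000 rows touched
```

A billion rows touched versus 120 thousand, for exactly the same result set.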

-- Add the missing index on the foreign key column.
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Run the same query again
EXPLAIN ANALYZE
SELECT c.name, o.total, o.created_at
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=0.71..405.30 rows=100 width=40)
              (actual time=0.04..0.12 rows=100 loops=1)
   ->  Index Scan using customers_pkey on customers c
         (cost=0.29..8.30 rows=1 width=20)
         (actual time=0.02..0.02 rows=1 loops=1)
         Index Cond: (id = 42)
   ->  Index Scan using idx_orders_customer_id on orders o
         (cost=0.42..396.00 rows=100 width=24)
         (actual time=0.01..0.09 rows=100 loops=1)
         Index Cond: (customer_id = 42)
 Planning Time: 0.18 ms
 Execution Time: 0.15 ms

Now the inner side uses Index Scan using idx_orders_customer_id — no filter rows removed, no sequential scan. This is the textbook fast nested loop join: small outer side, indexed inner side, logarithmic lookup per outer row.


When the Planner Switches to Hash Join

Nested loop is great for small outer sides. The moment the outer side gets large, the planner switches to a hash join — because doing a million index lookups is slower than building one hash table and probing it once.

-- A large join: every customer with at least one order.
-- Outer side is now 10,000 rows, inner side is 1,000,000.
EXPLAIN ANALYZE
SELECT c.country, COUNT(*) AS order_count, SUM(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.country;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 HashAggregate  (cost=23456.78..23456.83 rows=5 width=40)
                (actual time=312.45..312.47 rows=5 loops=1)
   Group Key: c.country
   ->  Hash Join  (cost=295.00..18456.00 rows=1000000 width=15)
                  (actual time=4.12..198.30 rows=1000000 loops=1)
         Hash Cond: (o.customer_id = c.id)
         ->  Seq Scan on orders o
               (cost=0.00..15406.00 rows=1000000 width=11)
               (actual time=0.01..58.20 rows=1000000 loops=1)
         ->  Hash  (cost=170.00..170.00 rows=10000 width=12)
                   (actual time=4.05..4.06 rows=10000 loops=1)
               Buckets: 16384  Batches: 1  Memory Usage: 519kB
               ->  Seq Scan on customers c
                     (cost=0.00..170.00 rows=10000 width=12)
                     (actual time=0.01..1.50 rows=10000 loops=1)
 Planning Time: 0.32 ms
 Execution Time: 312.95 ms

The planner built a hash table from the smaller side (customers, 10k rows, 519kB in memory — comfortably under work_mem), then sequentially scanned orders and probed the hash table once per row. The index on orders.customer_id is not used — and that is correct. With one million outer rows, doing one million B-tree lookups would be far slower than one sequential scan plus one in-memory hash probe per row.

This is the part most developers get wrong: adding more indexes does not always help. If the planner decides a hash join is cheaper than a nested loop, the index on the join column is dead weight. The right index strategy depends on which join algorithm the planner picks, which depends on the size of the outer side, which depends on your WHERE clause.
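A toy cost model makes the trade-off concrete. The per-unit costs below are invented for illustration (real planners use measured page and CPU cost constants), but the shape of the result is the point: nested loop cost grows linearly with the outer side, while hash join cost is dominated by a fixed scan of the big table.

```python
INNER_ROWS = 1_000_000    # orders
IDX_PROBE  = 4.0          # assumed cost of one B-tree probe
SCAN_ROW   = 0.01         # assumed cost per row of a sequential scan
HASH_ROW   = 0.02         # assumed cost per hash build/probe operation

def nested_loop_cost(outer_rows):
    # one indexed probe per outer row
    return outer_rows * IDX_PROBE

def hash_join_cost(outer_rows):
    # build a table from the outer side, then scan and probe all of inner
    return outer_rows * HASH_ROW + INNER_ROWS * (SCAN_ROW + HASH_ROW)

# The crossover: the outer size where hash join becomes cheaper.
crossover = next(n for n in range(1, INNER_ROWS)
                 if hash_join_cost(n) < nested_loop_cost(n))
print(crossover)   # a few thousand rows with these made-up constants
```

Below the crossover, the index on the join column is the whole ballgame; above it, the index is irrelevant and the hash join's sequential scan wins.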

+---------------------------------------------------------------+
|           ALGORITHM SELECTION DECISION TREE                    |
+---------------------------------------------------------------+
|                                                                |
|   How big is the OUTER (driving) side after WHERE filters?    |
|                                                                |
|     1 to ~1000 rows                                            |
|        + inner has index on join col                          |
|        -> NESTED LOOP with index scan (fastest)                |
|                                                                |
|     1000 to ~100k rows                                         |
|        + smaller side fits in work_mem                        |
|        -> HASH JOIN (workhorse for medium joins)               |
|                                                                |
|     Both sides huge AND both already sorted on join col        |
|        -> MERGE JOIN (rare but unbeatable when it fits)        |
|                                                                |
|     Both sides huge AND not sorted                             |
|        -> HASH JOIN with batching (spills to disk if needed)   |
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). Three vertical algorithm panels side by side. LEFT panel labeled 'Nested Loop' in sky blue (#4fc3f7) shows a small outer table with arrows looping into a large inner table that has a glowing index icon. MIDDLE panel labeled 'Hash Join' shows a small table being converted into a hash bucket grid, with a large table streaming past and probing the buckets. RIGHT panel labeled 'Merge Join' shows two pre-sorted columns being walked in parallel by two pointers. Below each panel, a small cost formula in white monospace. Rose (#ff5c8a) warning at the bottom: 'Wrong algorithm = 1000x slowdown.'"


Indexing Join Columns — The Single Biggest Lever

The most common join performance fix in any database is also the simplest: put a B-tree index on every foreign key column. Postgres does not create one automatically when you declare REFERENCES, and the absence of that index is the root cause of half the slow joins in production.

-- Without idx_orders_customer_id, this query runs as a Hash Join
-- because nested loop with sequential scan on the inner side would
-- be even worse. Cost: ~300ms on 1M rows.
DROP INDEX idx_orders_customer_id;

EXPLAIN ANALYZE
SELECT c.name, COUNT(*)
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.country = 'US'
GROUP BY c.name;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 HashAggregate  (cost=21500.00..21525.00 rows=2000 width=20)
                (actual time=298.10..299.02 rows=2000 loops=1)
   ->  Hash Join  (cost=200.00..20000.00 rows=200000 width=12)
                  (actual time=2.50..240.10 rows=200000 loops=1)
         Hash Cond: (o.customer_id = c.id)
         ->  Seq Scan on orders o   -- full scan of 1M rows
         ->  Hash
               ->  Seq Scan on customers c
                     Filter: (country = 'US')
                     Rows Removed by Filter: 8000
 Execution Time: 299.45 ms

-- Recreate the index. Now the planner has options.
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- For a SMALL outer side (one country = 2k customers), the planner
-- might still pick hash. For a single customer, it picks nested loop.
EXPLAIN ANALYZE
SELECT c.name, COUNT(*)
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.id IN (1, 2, 3, 4, 5)   -- tiny outer side
GROUP BY c.name;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 HashAggregate  (cost=2050.00..2055.00 rows=5 width=20)
                (actual time=2.10..2.12 rows=5 loops=1)
   ->  Nested Loop  (cost=0.71..2025.00 rows=500 width=12)
                    (actual time=0.05..1.85 rows=500 loops=1)
         ->  Index Scan using customers_pkey on customers c
               Index Cond: (id = ANY ('{1,2,3,4,5}'::integer[]))
         ->  Index Scan using idx_orders_customer_id on orders o
               Index Cond: (customer_id = c.id)
 Execution Time: 2.20 ms

The same query against 5 customers takes 2 ms with the index and ~300 ms without (when it falls back to a full hash join over 1M rows). That is a 150x difference from a single CREATE INDEX.

Composite indexes for multi-column joins. When you join on two columns (ON a.x = b.x AND a.y = b.y), a composite index on (x, y) on the inner table lets the planner serve both equality conditions from one index lookup. The leading column should be the more selective one.
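The mechanics are easy to see in the toy model: joining on two columns is the same algorithm with a tuple key, which is what a composite index hands the planner in a single lookup (a sketch with invented column names x and y):

```python
from collections import defaultdict

def hash_join_composite(left, right):
    """Equi-join on two columns: the hash key is the tuple (x, y)."""
    table = defaultdict(list)
    for row in left:
        table[(row["x"], row["y"])].append(row["id"])
    return [(lid, r["id"])
            for r in right
            for lid in table.get((r["x"], r["y"]), [])]

left  = [{"id": "a", "x": 1, "y": 2}, {"id": "b", "x": 1, "y": 3}]
right = [{"id": "r1", "x": 1, "y": 2}, {"id": "r2", "x": 9, "y": 9}]
print(hash_join_composite(left, right))   # [('a', 'r1')]
```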


The Driving Table — Why "Outer" Isn't Always First

In execution plans, the driving table (also called the outer table of a nested loop) is the one the database iterates over once, while the other side is probed for each row. The planner almost always picks the smaller side as the driving table for nested loop, because the cost is |outer| * lookup_cost. Halving the outer halves the work.
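The asymmetry is easy to check with the sample tables' sizes and an idealized log2 probe cost (a sketch that ignores constant factors):

```python
import math

CUSTOMERS = 10_000
ORDERS = 1_000_000

# Drive from the small side: 10k probes into the 1M-row orders index.
drive_from_customers = CUSTOMERS * math.log2(ORDERS)

# Drive from the big side: 1M probes into the 10k-row customers index.
drive_from_orders = ORDERS * math.log2(CUSTOMERS)

print(round(drive_from_orders / drive_from_customers))   # ~67x more work
```

Same tables, same index lookups, roughly 67x difference purely from which side drives the loop.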

This means: the order you write tables in the FROM clause does not determine the driving table. Postgres reorders joins freely (up to join_collapse_limit, default 8) based on cost estimates. You can write FROM orders o JOIN customers c and the planner will still drive from customers if customers is smaller after filtering.

-- Both of these produce identical plans, because the planner
-- reorders joins to minimize cost.
EXPLAIN
SELECT * FROM customers c JOIN orders o ON o.customer_id = c.id
WHERE c.id = 42;

EXPLAIN
SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id
WHERE c.id = 42;

+---------------------------------------------------------------+
|           NESTED LOOP DATA FLOW                                |
+---------------------------------------------------------------+
|                                                                |
|   DRIVING (outer) side -> small, scanned once                  |
|                                                                |
|     customers (after WHERE)                                    |
|     +--------+                                                 |
|     | id=42  |                                                 |
|     +--------+                                                 |
|         |                                                      |
|         | for each outer row, probe inner                      |
|         v                                                      |
|     orders (inner side)                                        |
|     +-------------------------------+                          |
|     | Index lookup on customer_id   |  <- O(log N) per probe   |
|     +-------------------------------+                          |
|         |                                                      |
|         v                                                      |
|     emit (customer, order) pairs                               |
|                                                                |
|   RULE: the planner almost always drives from the smaller,     |
|         more selective side; write whichever FROM order        |
|         reads best and the planner will fix it.                |
|                                                                |
+---------------------------------------------------------------+

The exception is big queries. join_collapse_limit (default 8) caps how many explicit JOINs the planner will flatten and reorder exhaustively, and once the number of FROM items crosses geqo_threshold (default 12), Postgres switches to genetic query optimization (GEQO) and the search for a join order becomes heuristic. For huge joins you may need to write them in a sensible order or raise those limits. For the 99% case, write whatever FROM order is most readable and let the planner do its job.


Join Order and Multi-Way Joins

When three or more tables join, the planner has to pick a join order — (A join B) join C or A join (B join C). The cost of these can differ by orders of magnitude, because the intermediate result size depends on join selectivity.

-- Three-way join: customers -> orders -> order_items
CREATE TABLE order_items (
  id        SERIAL PRIMARY KEY,
  order_id  INTEGER NOT NULL REFERENCES orders(id),
  sku       TEXT NOT NULL,
  qty       INTEGER NOT NULL
);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);

EXPLAIN ANALYZE
SELECT c.name, o.id, oi.sku, oi.qty
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
WHERE c.country = 'JP' AND o.status = 'shipped';
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=...)
   ->  Hash Join  (cost=...)
         Hash Cond: (o.customer_id = c.id)
         ->  Seq Scan on orders o
               Filter: (status = 'shipped')
         ->  Hash
               ->  Seq Scan on customers c
                     Filter: (country = 'JP')
   ->  Index Scan using idx_order_items_order_id on order_items oi
         Index Cond: (order_id = o.id)

Notice how the planner mixed two algorithms: Hash Join between customers and orders (medium-sized inputs after filtering), then Nested Loop to attach order_items via its index on order_id. This is the planner doing the right thing — pick the cheapest algorithm at each level based on the intermediate row counts.

If the planner picks a bad order, the most common fix is better statistics: ANALYZE the tables, increase default_statistics_target for the relevant columns, or create extended statistics for correlated columns. As a last resort, you can rewrite the query as a CTE with MATERIALIZED to force a specific evaluation order — but use this sparingly, because it removes the planner's ability to optimize across the boundary.


The N+1 Problem — Death by a Thousand Joins

The N+1 query problem is not a database bug. It is an application-level pattern where ORM code "joins" tables by issuing one query for the parent rows and then one additional query per parent to fetch its children — N parent rows produce 1 + N total queries. Each query is fast in isolation, but the accumulated round-trip latency ruins the request.

// The classic N+1 in Node.js / Sequelize / Prisma pseudocode
const customers = await db.query('SELECT * FROM customers WHERE country = $1', ['US']);
// 1 query so far. Now we "enrich" each customer with their orders.
for (const c of customers) {
  c.orders = await db.query('SELECT * FROM orders WHERE customer_id = $1', [c.id]);
}
// If country=US has 2000 customers, this is 1 + 2000 = 2001 queries.
// Even at 1ms each, that is 2 seconds of pure round trip latency.

The fix is to do a single join query and let the database hand you all the data at once.

-- One query, one round trip. The database does the join in-process,
-- using a hash join on 2000 customers x ~200k orders.
SELECT c.id, c.name, o.id AS order_id, o.total, o.created_at
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE c.country = 'US'
ORDER BY c.id, o.created_at;

In ORM land, this is called eager loading: Sequelize include, Prisma include, ActiveRecord includes, Hibernate JOIN FETCH. Use it any time you know you will need related rows. The only acceptable form of N+1 is when you might or might not need the children based on a runtime condition — and even then, batch the lookups with WHERE id IN (...).
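A tiny simulation makes the round-trip math visible. FakeDB below is a hypothetical stand-in for a driver that counts round trips instead of talking to a real database:

```python
class FakeDB:
    """Counts round trips; stands in for a real driver/ORM."""
    def __init__(self, orders_by_customer):
        self.orders_by_customer = orders_by_customer
        self.round_trips = 0

    def orders_for(self, customer_id):
        self.round_trips += 1          # one round trip per call
        return self.orders_by_customer.get(customer_id, [])

    def orders_for_many(self, customer_ids):
        self.round_trips += 1          # the WHERE id IN (...) shape: one trip
        return {c: self.orders_by_customer.get(c, []) for c in customer_ids}

customer_ids = list(range(2000))
db = FakeDB({c: [f"order-{c}"] for c in customer_ids})

for c in customer_ids:                 # the N+1 pattern
    db.orders_for(c)
n_plus_one = db.round_trips            # 2000 trips (plus the parent query)

db.round_trips = 0
db.orders_for_many(customer_ids)       # batched lookup
batched = db.round_trips               # 1 trip
```

The database work is roughly the same either way; what the batch eliminates is 1,999 network round trips.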

+---------------------------------------------------------------+
|           N+1 vs JOIN — ROUND TRIP MATH                        |
+---------------------------------------------------------------+
|                                                                |
|   N+1 PATTERN (2000 customers):                                |
|     1 query for customers                                      |
|     2000 queries for orders, one per customer                  |
|     ----                                                       |
|     2001 queries x 1ms RTT = 2.0 seconds                       |
|     Database CPU time:        ~50ms                            |
|     Network latency:          1.95 seconds (97% wasted)        |
|                                                                |
|   SINGLE JOIN:                                                 |
|     1 query, hash join                                         |
|     ----                                                       |
|     1 query x 1ms RTT + 80ms execution = 81ms                  |
|     25x faster, scales the same on bigger inputs               |
|                                                                |
|   HOW TO DETECT:                                               |
|     - Query log shows hundreds of similar queries per request  |
|     - p99 latency scales linearly with parent row count        |
|     - APM shows "many small DB calls" instead of "few big ones"|
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). Two horizontal lanes labeled 'N+1 Anti-Pattern' (top, rose #ff5c8a) and 'Single JOIN' (bottom, sky blue #4fc3f7). The top lane shows 2001 small query arrows streaming between an app box and a database box, with a clock icon showing 2000ms. The bottom lane shows one fat arrow between the same two boxes with a clock showing 81ms. White monospace labels: '2001 round trips' vs '1 round trip'. Subtitle: '25x faster, same data.'"


MySQL Differences

The Postgres-centric examples above need a few notes for MySQL:

  • Hash Join was only added to MySQL in 8.0.18 (2019). Before that, MySQL had only nested loop joins (with a few variants like Block Nested Loop). On older MySQL versions, large joins without an index on the inner side were often catastrophic — the workaround was to always make sure both sides had indexes on the join column.
  • MySQL has no Merge Join algorithm at all. It relies on nested loop with index lookups and, since 8.0.18, on hash join; as of 8.0.20, hash join also replaced the old Block Nested Loop fallback for joins that cannot use an index.
  • EXPLAIN format is different. MySQL EXPLAIN shows a row per table with columns like type (ALL, index, ref, eq_ref, range), key, rows, and Extra. To see actual times like Postgres, use EXPLAIN ANALYZE (8.0.18+) or EXPLAIN FORMAT=TREE.
  • Foreign keys auto-create indexes in MySQL/InnoDB, unlike Postgres. So one of the most common Postgres bugs (missing FK index) does not exist in MySQL. The trade-off is that MySQL's auto-indexing is sometimes wasteful when you would not have wanted that exact index shape.
  • Optimizer hints (/*+ HASH_JOIN(...) */, STRAIGHT_JOIN) exist in MySQL and are sometimes necessary to override bad optimizer decisions. Postgres has no equivalent — you tweak the planner with enable_nestloop = off, enable_hashjoin = off etc. for debugging only, never in production.

Forcing a Join Algorithm for Debugging

When you suspect the planner picked the wrong algorithm, you can flip the corresponding enable_* flag at the session level to force a different choice and compare.

-- Force Postgres to NOT use a hash join, just for this session.
-- Useful for "what would happen if we went back to nested loop?"
SET enable_hashjoin = off;

EXPLAIN ANALYZE
SELECT c.country, COUNT(*) FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.country;

-- Reset to default
SET enable_hashjoin = on;

These flags are debugging tools, not production knobs. Never put them in your application. If the planner is consistently picking a worse plan, the answer is to fix statistics, add an index, rewrite the query, or use extended statistics — not to disable an entire algorithm.


Common Mistakes

1. Forgetting to index the foreign key column. Postgres does not auto-index foreign keys. Every REFERENCES declaration should be paired with a CREATE INDEX on the referencing column. Without it, the planner is forced to use sequential scans on the inner side of a nested loop, or fall back to hash joins that do not benefit from selectivity. This is the single most common slow-join cause in real Postgres systems and is invisible until production load.

2. Trusting the planner with stale statistics. The planner uses pg_statistic to estimate row counts, and those estimates drive every join decision. If you bulk-loaded data and never ran ANALYZE, the estimates are garbage and the planner picks bad algorithms. Run ANALYZE after every bulk load, and consider increasing default_statistics_target (default 100) to 500 or 1000 for tables where joins are critical.

3. Confusing "rows" with "actual rows" in EXPLAIN ANALYZE. The first number (rows=) is the planner's estimate. The second (actual rows=) is what really happened. If they differ by 10x or more, the plan was chosen on bad assumptions and is probably wrong. Always compare them — a great plan with rows=10000 (actual rows=10) is fine, but rows=10 (actual rows=10000000) means the planner thought it was doing a tiny nested loop and instead did a 10-million-row catastrophe.

4. Hiding N+1 behind ORM convenience methods. ORMs make it easy to lazy-load related rows by walking object references in code. That convenience is a footgun. Always use eager loading (include, JOIN FETCH) when you know you will iterate over children, and use query logging in development to catch N+1 patterns before they reach production. APM tools like Datadog and New Relic flag N+1 automatically.

5. Adding indexes "just in case" without checking the plan. Indexes are not free. They consume disk, slow down writes, and sometimes the planner ignores them entirely (because hash join was cheaper). The right workflow is: write the query, run EXPLAIN ANALYZE, identify the bottleneck (sequential scan on a hot join column, sort node spilling to disk, hash batches), and add the specific index that fixes it. Do not bulk-create indexes on every column you might join on.
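Mistake 3, comparing estimated vs actual rows, is mechanical enough to script. A sketch that flags misestimated nodes in EXPLAIN ANALYZE text output; the regex assumes the standard "rows=N width=N) (actual time=... rows=M" shape and is illustrative, not production-grade:

```python
import re

def misestimated_nodes(plan_text, factor=10):
    """Return (estimated, actual) row-count pairs that differ by >= factor."""
    pairs = re.findall(
        r"rows=(\d+)\s+width=\d+\).*?actual\s+time=[^)]*?rows=(\d+)",
        plan_text, re.S)
    flagged = []
    for est, act in pairs:
        est, act = int(est), int(act)
        ratio = max(est, act) / max(min(est, act), 1)   # guard zero-row nodes
        if ratio >= factor:
            flagged.append((est, act))
    return flagged

plan = """Nested Loop  (cost=0.1..9.9 rows=10 width=8) (actual time=0.1..950.0 rows=1000000 loops=1)
  ->  Seq Scan on t  (cost=0.0..5.0 rows=50 width=4) (actual time=0.0..0.1 rows=55 loops=1)"""
print(misestimated_nodes(plan))   # [(10, 1000000)]
```

The 10-vs-1,000,000 node is exactly the "tiny nested loop that turned into a catastrophe" signature; the 50-vs-55 node is healthy.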


Interview Questions

1. "What are the three main physical join algorithms in Postgres, and when does the planner pick each one?"

Postgres has three join algorithms: Nested Loop, Hash Join, and Merge Join. Nested Loop iterates the outer (driving) side once and probes the inner side for each outer row — it is the fastest option when the outer side is small (a few hundred rows or less) and the inner side has an index on the join column, because the cost is roughly |outer| * log(|inner|). Hash Join builds an in-memory hash table from the smaller side, then sequentially scans the larger side and probes the hash table once per row — it is the workhorse for medium-to-large equi-joins where one side fits in work_mem. Merge Join sorts both sides by the join key and walks them in lockstep — it wins when both inputs are very large and already sorted by the join column (for example, both come from index scans), because the sort cost vanishes. The planner picks based on row count estimates from pg_statistic, so the entire decision hinges on whether ANALYZE has been run recently and whether the estimates match reality.

2. "What is the driving table in a nested loop join, and how does the planner choose it?"

The driving table (also called the outer table) is the side of a nested loop that gets iterated once, while the other side is probed for each outer row. The total cost of a nested loop is |driving| * lookup_cost_on_inner, so the planner always wants the smaller side after WHERE filters as the driving table — halving the driving side halves the work. Critically, the order you write tables in the FROM clause does not determine the driving table. Postgres reorders joins freely up to join_collapse_limit (default 8) based on cost estimates, so FROM a JOIN b and FROM b JOIN a produce identical plans. The right intuition: apply the WHERE filters mentally, ask which side has fewer rows after filtering, and expect that side to become the driving table. That is also the side where selectivity matters most; the other side is the one that needs an index on the join column.
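The cost formula above can be made tangible. In this sketch the inner side's index is modeled as a dict, so each driving row costs one probe; driving with the small side means the size of the big side barely matters. All names here are illustrative.

```python
# Sketch of why the smaller side should drive a nested loop join.
# The inner side's index is modeled as a dict: one probe per driving row.

def indexed_nested_loop(outer, inner_index):
    probes = 0
    out = []
    for key, val in outer:
        probes += 1                        # one index lookup per driving row
        for iv in inner_index.get(key, []):
            out.append((key, val, iv))
    return out, probes

small = [(i, "s") for i in range(10)]                 # 10 rows after filtering
big_index = {i: ["b"] for i in range(100_000)}        # 100k rows, indexed
rows, probes = indexed_nested_loop(small, big_index)
print(probes)  # 10 probes: total work tracks the driving side, not the big one
```

Flip the sides and the same join costs 100,000 probes, which is the planner's entire motivation for picking the smaller post-filter side as the driver.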

3. "Explain the N+1 query problem and three ways to detect it."

The N+1 query problem is an application-level anti-pattern where code fetches a parent collection with one query, then issues one additional query per parent row to fetch its children — so N parents produce 1 + N total queries. Each query is fast individually (a few milliseconds) but the cumulative round-trip latency can turn a single-page render into multi-second wall-clock time. The fix is to issue one join query that fetches parents and children together, called eager loading in ORM terminology. To detect N+1: first, enable query logging in development and look for hundreds of similar queries differing only by an ID parameter — that is the classic signature. Second, check whether your p99 latency scales linearly with the number of parent rows on a page, because true joins have nearly constant overhead while N+1 grows with N. Third, use APM tools (Datadog, New Relic, Sentry Performance) which detect the pattern automatically by clustering similar SQL fingerprints. The fix is always one of: ORM eager loading (include, JOIN FETCH), a manual JOIN query, or batched lookups with WHERE id IN (...).
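The third detection technique, clustering SQL fingerprints, is roughly what APM tools do internally. A hypothetical sketch: normalize each logged query by replacing literals with a placeholder, then flag any fingerprint that repeats suspiciously often. (The threshold and regex here are simplifications; real tools normalize strings, IN-lists, and more.)

```python
import re
from collections import Counter

def fingerprint(sql):
    # Replace numeric literals with '?' so queries differing only by an
    # ID parameter collapse onto one fingerprint
    return re.sub(r"\b\d+\b", "?", sql)

# A simulated dev query log showing the classic N+1 signature
query_log = ["SELECT * FROM users"] + [
    f"SELECT * FROM orders WHERE user_id = {i}" for i in range(50)
]

counts = Counter(fingerprint(q) for q in query_log)
suspects = [fp for fp, n in counts.items() if n >= 10]
print(suspects)
```

One fingerprint repeated 50 times in a single request is the N+1 tell; a healthy page issues each fingerprint a handful of times at most.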

4. "You add an index on a join column and the query gets slower. How can that happen?"

This is more common than people think and the cause is almost always that the planner switched algorithms based on the new "cheaper looking" index, but its row estimates were wrong. Three concrete scenarios. First: with the new index, the planner thinks "I can do a nested loop with index scan now," but the outer side has 500,000 rows instead of the estimated 500, so it runs 500,000 index lookups instead of one hash join scan. Second: the index forces the planner to choose an Index Scan over a Bitmap Heap Scan when many rows match, causing random I/O across the table heap instead of sequential I/O. Third: the new index causes the planner to use a different join order entirely, picking a worse intermediate result size. The diagnosis in all cases is EXPLAIN ANALYZE before and after the index, comparing estimated vs actual rows and the chosen join algorithm. The fix is usually to run ANALYZE, increase default_statistics_target for the affected column, or add extended statistics — not to drop the index.

5. "How does MySQL's join handling differ from Postgres, and why does it matter?"

The biggest difference is that MySQL only got a Hash Join algorithm in version 8.0.18 (October 2019). Before that, MySQL relied entirely on Nested Loop Joins (with optimizations like Block Nested Loop for non-indexed inner sides). This meant that on older MySQL, every large equi-join without an index on the inner side was a near-quadratic disaster, and the standard advice was "always index every join column." Postgres has had hash and merge joins for decades, so it gracefully handles unindexed joins by switching algorithms. MySQL also has no Merge Join algorithm at all, even today — it covers the same use cases with hash join and indexed nested loops. On the upside, MySQL/InnoDB automatically creates an index on every foreign key, which prevents the most common Postgres bug (forgetting to index a FK column). Practical implication: if you are migrating a query from MySQL to Postgres, you have to manually add indexes on every FK column, or you will see surprising slow joins in production despite the SQL being identical.


Quick Reference — Join Performance Cheat Sheet

+---------------------------------------------------------------+
|           JOIN PERFORMANCE CHEAT SHEET                         |
+---------------------------------------------------------------+
|                                                                |
|  THREE ALGORITHMS:                                             |
|    Nested Loop  -> small outer + indexed inner                 |
|    Hash Join    -> medium-large, one side fits in work_mem     |
|    Merge Join   -> both sides huge AND already sorted          |
|                                                                |
|  EXPLAIN ANALYZE — WHAT TO LOOK FOR:                           |
|    - Algorithm chosen (Nested Loop / Hash Join / Merge Join)   |
|    - Estimated rows vs actual rows (10x off = bad stats)       |
|    - "Rows Removed by Filter: N" -> missing index              |
|    - "Sort Method: external merge" -> spilling to disk         |
|    - "Hash Batches: > 1" -> work_mem too small                 |
|                                                                |
|  INDEXING RULES:                                               |
|    - ALWAYS index FK columns in Postgres (not auto)            |
|    - Composite index on multi-column joins (selective first)   |
|    - Inner side of nested loop NEEDS the index                 |
|    - Hash join does NOT use indexes on the join column         |
|                                                                |
|  DRIVING TABLE:                                                |
|    - Smaller side after WHERE filters                          |
|    - Planner reorders FROM freely, do not micro-optimize       |
|    - join_collapse_limit (default 8) caps the search           |
|                                                                |
|  N+1 PROBLEM:                                                  |
|    - Symptom: 1 + N round trips, latency scales with N         |
|    - Detect: query log, p99 vs N, APM SQL fingerprints         |
|    - Fix: eager loading / JOIN / WHERE id IN (...)             |
|                                                                |
|  MYSQL NOTES:                                                  |
|    - Hash join only since 8.0.18 (2019)                        |
|    - NO merge join algorithm                                   |
|    - FK columns auto-indexed (unlike Postgres)                 |
|    - Use EXPLAIN FORMAT=TREE or EXPLAIN ANALYZE                |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           KEY RULES                                            |
+---------------------------------------------------------------+
|                                                                |
|  1. ANALYZE after every bulk load — bad stats = bad joins     |
|  2. Index every FK column in Postgres                          |
|  3. Compare estimated vs actual rows in EXPLAIN ANALYZE        |
|  4. Smaller side becomes the driving table — let planner pick  |
|  5. Hash join ignores join-column indexes — that is correct    |
|  6. N+1 is an application bug, not a database bug              |
|  7. Eager load whenever you know you need related rows         |
|  8. Do not use enable_* flags in production                    |
|  9. Three plans, three different time complexities             |
| 10. The right index depends on which algorithm is chosen       |
|                                                                |
+---------------------------------------------------------------+
| Concern | Wrong Way | Right Way |
|---|---|---|
| FK index | Trust REFERENCES | CREATE INDEX on FK column |
| Stats | Hope planner guesses right | ANALYZE after bulk load |
| Plan reading | Look at estimated rows only | Compare estimated vs actual |
| Join order | Reorder FROM clause manually | Let planner reorder, fix stats |
| N+1 | Lazy-load in app loops | Eager load with JOIN |
| Hash spill | Ignore "Batches: 8" warning | Increase work_mem for query |
| Algorithm tuning | SET enable_hashjoin=off in app | Fix stats / index / query shape |
| Multi-col join | Two single-col indexes | Composite index, selective first |

Prev: Lesson 5.5 -- CROSS and SELF JOIN | Next: Lesson 6.1 -- Subqueries


This is Lesson 5.6 of the Database Interview Prep Course -- 12 chapters, 58 lessons.
