LEFT JOIN
Preserving the Left Side, Finding Missing Rows, and the ON vs WHERE Trap
LinkedIn Hook
"You wrote a LEFT JOIN to find customers with no orders. Then you added a WHERE clause and quietly turned it back into an INNER JOIN."
LEFT JOIN is the most misused join in SQL. Every developer learns the mechanics in five minutes — "keep all rows from the left table, fill in NULLs when the right side has no match" — and then spends the next five years getting bitten by the same three bugs. They put a filter on the right table in the WHERE clause and wonder why their LEFT JOIN suddenly drops half its rows. They write
WHERE right_table.col IS NULLto find missing matches and confuse it with filtering for actual NULL values. They count rows usingCOUNT(*)and inflate the number because the unmatched left rows still count as one.The truth is that LEFT JOIN is not just "INNER JOIN that keeps unmatched rows." It is a fundamentally different operation that changes how every subsequent clause in the query behaves. The ON clause filters the right side BEFORE the join preserves left rows. The WHERE clause filters AFTER the join, so any condition on right-side columns will silently delete the unmatched rows that LEFT JOIN promised to keep. Get those two confused and your "find customers with no orders" query returns zero rows when there are thousands.
And then there is the secret superpower of LEFT JOIN: the anti-join pattern.
LEFT JOIN ... WHERE right.id IS NULLis the canonical SQL idiom for "rows in A that have no match in B." It is faster thanNOT EXISTSon some engines, slower on others, and equivalent toEXCEPTfor single-key cases. Knowing when to reach for each is one of the cleanest tells of a senior backend engineer in a SQL interview.In Lesson 5.3, I break down LEFT JOIN end to end: LEFT OUTER JOIN semantics, NULL preservation rules, the ON vs WHERE trap, the anti-join idiom for finding missing rows, multi-table chains, COUNT pitfalls, and how the planner actually executes a LEFT JOIN under the hood.
Read the full lesson -> [link]
#SQL #Database #PostgreSQL #MySQL #BackendDevelopment #InterviewPrep #Joins
What You'll Learn
- What LEFT JOIN (LEFT OUTER JOIN) actually means and why "outer" is the keyword that matters
- How NULLs appear in the result for unmatched right-side rows — and what that breaks downstream
- The single most common LEFT JOIN bug: filtering right-side columns in WHERE instead of ON
- The anti-join idiom —
LEFT JOIN ... WHERE right.id IS NULL— and when to prefer it overNOT EXISTS - How to chain multiple LEFT JOINs and what happens when the second join no longer matches
- Why
COUNT(*)lies after a LEFT JOIN and how to count correctly - How the planner executes a LEFT JOIN (Hash Left Join, Nested Loop Left Join) and what it costs
- MySQL vs PostgreSQL differences in LEFT JOIN behavior
The Guest List Analogy — Everyone Gets a Seat, Even If Their Plus-One Bailed
Imagine you are running a wedding reception. You have a guest list — every confirmed invitee — and a separate list of meal selections that guests sent in over the past month. Your job is to print a seating chart that shows every guest and, next to their name, the meal they chose.
If you only print the guests who actually returned a meal card, you will end up with an INNER JOIN: a clean list, but missing the cousins who never RSVP'd, the friend-of-a-friend you added last week, and the boss who hates filling out forms. None of them will have a seat at the wedding because they "didn't match" a row in the meal table. That is a disaster.
What you actually want is a LEFT JOIN: print every guest from the guest list, and next to each, show the meal they chose — or write "TBD" if they never sent a card. The guest list is preserved completely. Unmatched rows get a placeholder. Nobody is dropped, nobody is duplicated.
That placeholder — the "TBD" — is exactly what SQL does with NULL. A LEFT JOIN promises that every row from the left table appears in the result at least once, and any column from the right table that has no match becomes NULL. The left table is the source of truth for which rows exist. The right table is decoration that fills in when available.
+---------------------------------------------------------------+
| LEFT JOIN AS A GUEST LIST |
+---------------------------------------------------------------+
| |
| GUESTS (left) MEAL CARDS (right) |
| +------------+ +------------+ |
| | Alice | | Alice -> Fish | |
| | Bob | | Carol -> Beef | |
| | Carol | | Eve -> Vegan | |
| | Dave | +------------+ |
| | Eve | |
| +------------+ |
| |
| LEFT JOIN (guests preserved) |
| -> |
| |
| +-----------------------+ |
| | Alice | Fish | |
| | Bob | NULL (TBD) | <- preserved with NULL |
| | Carol | Beef | |
| | Dave | NULL (TBD) | <- preserved with NULL |
| | Eve | Vegan | |
| +-----------------------+ |
| |
| Every guest appears exactly once. Missing meal -> NULL. |
| |
+---------------------------------------------------------------+
This analogy gives you the right mental model for everything that follows. Whenever you find yourself confused about a LEFT JOIN, ask: "Am I treating the left table as my source of truth? Am I letting unmatched right rows show as NULL, or am I quietly dropping them?"
LEFT JOIN vs LEFT OUTER JOIN — Same Thing
The keyword OUTER is optional. LEFT JOIN and LEFT OUTER JOIN are exactly the same operation in every major SQL engine. The "outer" in the name distinguishes it from "inner" join — inner joins keep only matching rows, outer joins keep extra rows from one or both sides. LEFT OUTER JOIN keeps the left side, RIGHT OUTER JOIN keeps the right side, FULL OUTER JOIN keeps both. Most developers and most codebases write LEFT JOIN for brevity, and that is the convention this lesson uses.
-- These two queries are identical
SELECT * FROM customers c LEFT JOIN orders o ON c.id = o.customer_id;
SELECT * FROM customers c LEFT OUTER JOIN orders o ON c.id = o.customer_id;
Setting Up the Sample Tables
Every example in this lesson uses these two tables. Run them in psql or your favorite Postgres client to follow along.
-- Sample tables used throughout this lesson
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
city TEXT
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER REFERENCES customers(id),
total NUMERIC(10, 2),
status TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
);
INSERT INTO customers (name, city) VALUES
('Alice', 'Seattle'),
('Bob', 'Portland'),
('Carol', 'Seattle'),
('Dave', 'Austin'),
('Eve', 'Denver');
-- Note: Bob and Dave have NO orders at all
INSERT INTO orders (customer_id, total, status) VALUES
(1, 120.00, 'shipped'), -- Alice
(1, 55.50, 'shipped'), -- Alice (second order)
(3, 300.00, 'pending'), -- Carol
(5, 999.00, 'shipped'); -- Eve
After loading: 5 customers, 4 orders, and Bob (id=2) and Dave (id=4) have no orders.
Example 1: Basic LEFT JOIN — Every Customer, With or Without Orders
The simplest use of LEFT JOIN is a "show me everyone, plus their stuff if they have any" query. The customer table is the left side because customers are the source of truth — we want every one of them in the result, regardless of whether they have placed an order.
-- Show every customer and their order info if any
SELECT
c.id AS customer_id,
c.name AS customer_name,
o.id AS order_id,
o.total AS order_total,
o.status AS order_status
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
ORDER BY c.id, o.id;
-- customer_id | customer_name | order_id | order_total | order_status
-- ------------+---------------+----------+-------------+--------------
-- 1 | Alice | 1 | 120.00 | shipped
-- 1 | Alice | 2 | 55.50 | shipped
-- 2 | Bob | NULL | NULL | NULL
-- 3 | Carol | 3 | 300.00 | pending
-- 4 | Dave | NULL | NULL | NULL
-- 5 | Eve | 4 | 999.00 | shipped
Look closely at the result. Every customer appears at least once. Alice appears twice because she has two orders — LEFT JOIN multiplies the left row once per matching right row, just like INNER JOIN does. Bob and Dave appear with NULL in every right-side column because they had no matching order. This is the entire definition of LEFT JOIN.
+---------------------------------------------------------------+
| LEFT JOIN ROW EXPANSION |
+---------------------------------------------------------------+
| |
| 1 Alice -> matches 2 orders -> emits 2 rows |
| 2 Bob -> matches 0 orders -> emits 1 row, right=NULL |
| 3 Carol -> matches 1 order -> emits 1 row |
| 4 Dave -> matches 0 orders -> emits 1 row, right=NULL |
| 5 Eve -> matches 1 order -> emits 1 row |
| |
| TOTAL: 6 result rows from 5 left rows + 4 right rows |
| |
| Rule: left row count <= result row count |
| result = sum over left rows of max(1, matches) |
| |
+---------------------------------------------------------------+
The ON vs WHERE Trap — The #1 LEFT JOIN Bug
This is the most important section in the entire lesson. If you understand nothing else about LEFT JOIN, understand this: filtering a right-side column in the WHERE clause silently turns your LEFT JOIN into an INNER JOIN.
Here is why. The ON clause is evaluated during the join — it decides which right rows match each left row. If no right row matches, LEFT JOIN preserves the left row with NULLs. The WHERE clause is evaluated after the join is complete — it filters the joined result set. NULL values from preserved left rows fail almost every condition (NULL = anything is NULL, which is not TRUE), so they get silently dropped.
The Wrong Way
-- BUG: trying to "find customers and their shipped orders, including
-- customers with no shipped orders". This query DOES NOT do that.
SELECT
c.name,
o.id AS order_id,
o.status
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
WHERE o.status = 'shipped' -- The trap
ORDER BY c.id;
-- name | order_id | status
-- ------+----------+---------
-- Alice | 1 | shipped
-- Alice | 2 | shipped
-- Eve | 4 | shipped
--
-- WHERE NO BOB. NO DAVE. NO CAROL.
-- The LEFT JOIN preserved them with status=NULL, then WHERE
-- evaluated "NULL = 'shipped'" -> NULL -> NOT TRUE -> dropped.
-- This query is now functionally an INNER JOIN.
The Right Way — Filter In ON
-- Correct: move the right-side filter into the ON clause.
-- Now "shipped" is part of the matching condition, not a post-filter.
SELECT
c.name,
o.id AS order_id,
o.status
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
AND o.status = 'shipped' -- Filter happens BEFORE preservation
ORDER BY c.id;
-- name | order_id | status
-- ------+----------+---------
-- Alice | 1 | shipped
-- Alice | 2 | shipped
-- Bob | NULL | NULL <- preserved
-- Carol | NULL | NULL <- preserved (her one order was 'pending')
-- Dave | NULL | NULL <- preserved
-- Eve | 4 | shipped
Notice what changed for Carol. Her only order had status pending, so when the filter is in ON, no order matches her — she is preserved with NULL. When the filter was in WHERE, her single matching row got filtered out and she disappeared entirely.
+---------------------------------------------------------------+
| ON vs WHERE EVALUATION ORDER |
+---------------------------------------------------------------+
| |
| STAGE 1: ON clause |
| - Decides which right rows match each left row |
| - LEFT JOIN preserves unmatched left rows with NULL |
| - Filters here apply BEFORE preservation |
| |
| STAGE 2: WHERE clause |
| - Filters the post-join result set |
| - NULL values from preserved rows fail almost every test |
| - Filters here can DELETE preserved rows |
| |
| THE RULE: |
| Right-side filters -> put in ON |
| Left-side filters -> put in WHERE (or ON, no difference) |
| IS NULL on right -> put in WHERE (anti-join, see below) |
| |
+---------------------------------------------------------------+
The mental model: ON happens during the join; WHERE happens after. If a filter on the right side is in WHERE, you have promised "preserve these rows" and then immediately broken the promise.
Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). Two side-by-side flow diagrams labeled 'Filter in ON' (left, sky blue) and 'Filter in WHERE' (right, rose). Each shows three boxes: 'Customers' -> 'LEFT JOIN' -> 'Result'. The left flow shows Bob, Carol, Dave preserved as NULL rows in the result with sky blue checkmarks. The right flow shows the same NULL rows being struck through with rose X marks under a 'WHERE status = shipped' label. Bottom caption in white monospace: 'WHERE on right-side columns turns LEFT JOIN into INNER JOIN.'"
The Anti-Join Idiom — Finding Missing Rows
LEFT JOIN's killer feature is finding rows in one table that have no match in another. The pattern is: do a LEFT JOIN, then in the WHERE clause require that the right-side primary key IS NULL. The only rows that survive are the ones that had no match — exactly the ones LEFT JOIN preserved with NULLs.
This is called the anti-join because it returns the inverse of an INNER JOIN.
-- Find every customer who has never placed an order
SELECT
c.id,
c.name,
c.city
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
WHERE o.id IS NULL -- The anti-join filter
ORDER BY c.id;
-- id | name | city
-- ---+------+----------
-- 2 | Bob | Portland
-- 4 | Dave | Austin
The query reads almost like English: "Give me every customer where the joined order id is NULL — meaning no order joined to them." The reason WHERE o.id IS NULL works here, even though I just told you that WHERE on right-side columns is dangerous, is that we are intentionally filtering on the post-join NULL marker. We are not trying to preserve rows — we are trying to find precisely the preserved-NULL rows.
Critical detail: the column you check for NULL must be a right-side column that is not nullable in the source table — typically the right-side primary key. If you check a nullable column, you cannot tell whether NULL came from "no match" or from "matched row that legitimately had NULL." The primary key is never NULL in a real row, so a NULL primary key after LEFT JOIN can only mean "no match found."
Anti-Join vs NOT EXISTS vs NOT IN
There are three common ways to express "find rows in A with no match in B" in SQL. They are not always equivalent.
-- 1. Anti-join via LEFT JOIN
SELECT c.* FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;
-- 2. NOT EXISTS subquery
SELECT c.* FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
-- 3. NOT IN subquery (DANGEROUS with NULLs)
SELECT c.* FROM customers c
WHERE c.id NOT IN (SELECT customer_id FROM orders);
All three return the same result for our sample data. But there are differences:
- NOT IN is broken if any value in the subquery is NULL, because
x NOT IN (1, 2, NULL)evaluates to NULL (not TRUE) for every row. Iforders.customer_idcould ever be NULL, NOT IN silently returns zero rows. NOT EXISTS and anti-join LEFT JOIN are immune to this. - NOT EXISTS is usually the most readable and the planner can almost always optimize it into the same plan as the anti-join LEFT JOIN (Postgres turns both into a Hash Anti Join). Many style guides recommend NOT EXISTS as the default.
- Anti-join LEFT JOIN wins when you also need columns from the right table for matched rows in a different branch of the same query, or when an existing query already has the LEFT JOIN and adding
WHERE right.pk IS NULLis a one-line change.
For new code, prefer NOT EXISTS for clarity. For existing LEFT JOIN queries that already have the join, prefer the anti-join idiom because adding a NULL check is cheaper than restructuring the query.
Example 2: Counting With LEFT JOIN — Why COUNT(*) Lies
A subtle bug: counting orders per customer with LEFT JOIN looks easy until you check the results.
-- WRONG: counts the unmatched-NULL row as 1 order
SELECT
c.name,
COUNT(*) AS order_count -- The trap
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY c.id;
-- name | order_count
-- ------+-------------
-- Alice | 2
-- Bob | 1 <- BUG: Bob has 0 orders, not 1
-- Carol | 1
-- Dave | 1 <- BUG: Dave has 0 orders, not 1
-- Eve | 1
-- RIGHT: COUNT a right-side column. NULL values are not counted.
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
GROUP BY c.id, c.name
ORDER BY c.id;
-- name | order_count
-- ------+-------------
-- Alice | 2
-- Bob | 0
-- Carol | 1
-- Dave | 0
-- Eve | 1
The rule is straightforward: COUNT(*) counts rows including the synthetic NULL rows that LEFT JOIN injects for unmatched left rows. COUNT(column) counts rows where column IS NOT NULL — so counting a right-side primary key correctly returns 0 for unmatched left rows. Always count a right-side column when summarizing after a LEFT JOIN.
The same trap applies to SUM, AVG, and other aggregates. SUM(o.total) returns NULL for Bob and Dave (no orders), which displays as NULL in the output. If you want zero instead, wrap it: COALESCE(SUM(o.total), 0).
Example 3: Chained LEFT JOINs and the "First Match" Effect
Real queries often chain multiple LEFT JOINs. The rule to remember is that once the chain encounters NULLs, every subsequent LEFT JOIN on those NULLs also produces NULLs — a single "no match" cascades through the rest of the chain.
-- Add a third table
CREATE TABLE shipments (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
carrier TEXT,
shipped_at TIMESTAMPTZ
);
INSERT INTO shipments (order_id, carrier, shipped_at) VALUES
(1, 'UPS', '2026-04-11'),
(4, 'FedEx', '2026-04-12');
-- Note: orders 2 and 3 have no shipment row
-- Chain: customers -> orders -> shipments, all LEFT JOIN
SELECT
c.name,
o.id AS order_id,
o.status,
s.carrier
FROM customers c
LEFT JOIN orders o
ON c.id = o.customer_id
LEFT JOIN shipments s
ON o.id = s.order_id
ORDER BY c.id, o.id;
-- name | order_id | status | carrier
-- ------+----------+---------+---------
-- Alice | 1 | shipped | UPS
-- Alice | 2 | shipped | NULL <- order exists, no shipment
-- Bob | NULL | NULL | NULL <- preserved through both joins
-- Carol | 3 | pending | NULL <- order exists, no shipment
-- Dave | NULL | NULL | NULL <- preserved through both joins
-- Eve | 4 | shipped | FedEx
Notice the cascade. Bob has no order, so the o.id is NULL after the first LEFT JOIN. The second LEFT JOIN tries to match o.id = s.order_id, but NULL = anything is NULL (not TRUE), so no shipment can ever match. Bob is preserved with NULL across both right tables. This is the correct behavior, and it is why LEFT JOIN chains "just work" — once a row is preserved, it stays preserved through every subsequent LEFT JOIN.
The danger: if you switched the second LEFT JOIN to an INNER JOIN, the chain would suddenly drop Bob and Dave because INNER JOIN would not preserve the NULL o.id. Mixing LEFT and INNER joins in the same chain is a classic source of bugs — usually the first LEFT JOIN you wrote is the one whose preservation gets silently undone by a downstream INNER JOIN.
Example 4: LEFT JOIN With Aggregates in a Subquery
A common pattern is "show me each customer with their total spending and order count, including customers who spent zero." The clean way is to aggregate the right side first in a subquery, then LEFT JOIN.
-- Pre-aggregate orders, then LEFT JOIN to customers.
-- This avoids the COUNT(*) and row-multiplication traps entirely.
SELECT
c.id,
c.name,
c.city,
COALESCE(o.order_count, 0) AS order_count,
COALESCE(o.total_spent, 0.00) AS total_spent
FROM customers c
LEFT JOIN (
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total) AS total_spent
FROM orders
GROUP BY customer_id
) o ON c.id = o.customer_id
ORDER BY c.id;
-- id | name | city | order_count | total_spent
-- ---+-------+----------+-------------+-------------
-- 1 | Alice | Seattle | 2 | 175.50
-- 2 | Bob | Portland | 0 | 0.00
-- 3 | Carol | Seattle | 1 | 300.00
-- 4 | Dave | Austin | 0 | 0.00
-- 5 | Eve | Denver | 1 | 999.00
Why pre-aggregate? Two reasons. First, the subquery uses COUNT(*) correctly because there are no NULL right rows yet — the GROUP BY happens before any join. Second, the LEFT JOIN now joins one row per customer to one aggregated row per customer (or zero rows for the missing customers), so there is no row multiplication. The COALESCE wraps display NULLs as zero, which is what most reports want.
This pattern is so common in production code that it has a name: the LATERAL aggregation pattern when written with LEFT JOIN LATERAL in Postgres, or simply "aggregate-then-join" in dialect-neutral SQL.
How the Planner Executes a LEFT JOIN
A LEFT JOIN is not magic — the planner picks one of a few execution strategies depending on table sizes, indexes, and selectivity.
-- Look at how Postgres plans a LEFT JOIN
EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id;
-- Hash Right Join (cost=1.11..2.27 rows=6 width=...)
-- Hash Cond: (o.customer_id = c.id)
-- -> Seq Scan on orders o
-- -> Hash
-- -> Seq Scan on customers c
--
-- Note: Postgres sometimes flips a LEFT JOIN to a Hash RIGHT JOIN
-- internally because it builds the hash on the smaller side. The
-- semantics are identical -- left rows are still preserved.
The three common physical plans:
- Nested Loop Left Join — for each left row, probe the right side (usually via an index). Best when the left side is small and the right side has an index on the join key. Postgres preserves unmatched left rows by emitting them with NULLs after the inner-loop probe finds nothing.
- Hash Left Join — build a hash table on the right side, then scan the left side and look up each row. Best for medium-to-large joins without a useful index. Unmatched left rows are emitted with NULLs after the hash lookup misses.
- Merge Left Join — sort both sides on the join key and merge them, emitting unmatched left rows as the merge cursor advances past them. Best when both sides are already sorted (e.g. by index).
The cost of preservation is essentially zero in all three plans — the planner just emits the unmatched left row with NULL right columns instead of skipping it. A LEFT JOIN is not meaningfully slower than the equivalent INNER JOIN. The myth that LEFT JOIN is always slower comes from cases where developers write LEFT JOIN out of habit when INNER JOIN would suffice, missing optimizations the planner could have applied to a stricter join (like join elimination, where unused INNER joins can be removed entirely).
MySQL vs PostgreSQL Differences
The core LEFT JOIN semantics are identical across engines, but there are minor differences worth knowing.
+---------------------------------------------------------------+
| LEFT JOIN: POSTGRES vs MYSQL |
+---------------------------------------------------------------+
| |
| FEATURE | POSTGRES | MYSQL |
| -------------------------+------------+---------- |
| LEFT JOIN keyword | yes | yes |
| LEFT OUTER JOIN | yes | yes |
| USING (col) | yes | yes |
| NATURAL LEFT JOIN | yes | yes (avoid) |
| LEFT JOIN LATERAL | yes | yes (8.0.14+) |
| STRAIGHT_JOIN hint | no | yes |
| Hash anti-join optimize | yes | partial |
| Join elimination | yes | limited |
| |
| Both engines: ON vs WHERE trap is identical, anti-join |
| pattern is identical, NULL preservation rules are identical. |
| |
+---------------------------------------------------------------+
The biggest practical difference: MySQL historically had a weaker query planner for anti-join patterns and would sometimes produce a Nested Loop where Postgres produced a Hash Anti Join. Modern MySQL (8.0+) has closed most of that gap. The semantics are the same — only the execution cost may differ.
Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). A flowchart titled 'Anti-Join Decision Tree' in white monospace. Three branches from a top decision diamond labeled 'Find rows in A with no match in B': branch one (sky blue) leads to 'NOT EXISTS - default for new code', branch two (sky blue) leads to 'LEFT JOIN ... WHERE pk IS NULL - already have the join', branch three (rose) leads to 'NOT IN - DANGER if NULLs possible' with a warning icon. Below: small EXPLAIN snippet showing 'Hash Anti Join' in sky blue."
Common Mistakes
1. Filtering right-side columns in WHERE instead of ON.
This is the bug. LEFT JOIN orders o ON c.id = o.customer_id WHERE o.status = 'shipped' looks like "customers and their shipped orders" but actually means "customers who have at least one shipped order." Bob and Dave (no orders) and Carol (only a pending order) all silently disappear, because NULL = 'shipped' is NULL and WHERE filters out anything that is not TRUE. The fix is to move the filter into the ON clause: ON c.id = o.customer_id AND o.status = 'shipped'. Now the filter is part of the join condition, not a post-join filter, and unmatched left rows survive.
2. Using COUNT(*) after a LEFT JOIN.
COUNT(*) counts every row, including the synthetic rows where LEFT JOIN preserved unmatched left rows with NULL. So a customer with zero orders shows up as having "1 order" in the count. The fix is to count a right-side column that is non-nullable in the source table — typically the right-side primary key: COUNT(o.id). NULLs are not counted, so unmatched left rows correctly show as 0.
3. Confusing IS NULL anti-join with filtering for actual NULL values.
WHERE o.status IS NULL after a LEFT JOIN catches both "this row had no matching order" AND "this row matched an order whose status column happens to be NULL." If you want only the unmatched rows, check the right-side primary key (which is never NULL in a real row), not a nullable column. WHERE o.id IS NULL is the correct anti-join filter; WHERE o.status IS NULL is not.
4. Mixing LEFT JOIN and INNER JOIN in the same chain by accident.
A query like customers LEFT JOIN orders LEFT JOIN shipments correctly preserves customers without orders. Change the second join to INNER and the chain silently drops customers without orders, because the INNER JOIN cannot match a NULL o.id. The first LEFT JOIN's promise is undone by the second join's strictness. Always read a multi-join chain top to bottom and ask: does any downstream join require a column that an upstream LEFT JOIN might have left NULL?
5. Using NOT IN with a nullable subquery column.
WHERE c.id NOT IN (SELECT customer_id FROM orders) looks equivalent to the anti-join LEFT JOIN, but if any orders.customer_id is NULL, the entire NOT IN predicate evaluates to NULL for every row and the query returns zero results. This bug is silent — no error, just an empty result. Use NOT EXISTS or the anti-join LEFT JOIN pattern instead; both handle NULLs correctly.
Interview Questions
1. "Explain the difference between filtering a right-side column in the ON clause versus the WHERE clause of a LEFT JOIN."
The ON clause is evaluated as part of the join itself — it decides which right rows match each left row. If no right row matches under the ON conditions, LEFT JOIN preserves the left row with NULL values in the right-side columns. The WHERE clause is evaluated after the join is complete, against the joined result set. The crucial consequence is that any condition on a right-side column in WHERE will silently delete the preserved-NULL rows that LEFT JOIN was supposed to keep, because NULL = anything is NULL (not TRUE), and WHERE only keeps rows where the condition is TRUE. So LEFT JOIN orders o ON c.id = o.customer_id WHERE o.status = 'shipped' is functionally an INNER JOIN — every customer without a shipped order disappears. The fix is to move the right-side filter into ON: ON c.id = o.customer_id AND o.status = 'shipped'. The filter then participates in the matching step, unmatched left rows stay preserved with NULLs, and the LEFT JOIN actually behaves like a LEFT JOIN. The general rule: filters on the left-side table are equivalent in ON or WHERE; filters on the right-side table belong in ON unless you specifically want anti-join semantics.
2. "How do you find all customers who have never placed an order using a LEFT JOIN, and why does the IS NULL check work?"
The pattern is: LEFT JOIN customers to orders on the foreign key, then in the WHERE clause require orders.id IS NULL. The query is SELECT c.* FROM customers c LEFT JOIN orders o ON c.id = o.customer_id WHERE o.id IS NULL. It works because LEFT JOIN preserves every customer row, and for customers with no matching order, every right-side column comes back as NULL. By checking the right-side primary key for NULL, we filter the result down to exactly the preserved-NULL rows — which are exactly the customers with no orders. The reason we check the primary key specifically is that a real order row can never have a NULL primary key, so a NULL o.id after the join can only come from "no match found." If we instead checked a nullable column like o.status IS NULL, we would also catch matched orders whose status happened to be NULL, which is a different question. This pattern is called the anti-join idiom and is one of the most common uses of LEFT JOIN in production SQL.
3. "What's the difference between COUNT(*) and COUNT(column) after a LEFT JOIN, and which should you use?"
COUNT(*) counts every row in the result set, including the synthetic rows that LEFT JOIN inserts when a left row has no matching right row. So if you write SELECT c.name, COUNT(*) FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name, a customer with zero orders shows up as "1" because LEFT JOIN gave them one row with NULL right columns and COUNT(*) counts it. COUNT(column) counts only rows where column IS NOT NULL, so counting a right-side column — typically the right-side primary key — correctly returns 0 for unmatched left rows. The right query is SELECT c.name, COUNT(o.id) FROM customers c LEFT JOIN orders o ON c.id = o.customer_id GROUP BY c.id, c.name. The same pitfall applies to other aggregates: SUM(o.total) returns NULL (not 0) for customers with no orders, so wrap it in COALESCE(SUM(o.total), 0) if you want zero. And for queries that need both clean counts and clean sums, the cleanest pattern is to pre-aggregate the right side in a subquery and then LEFT JOIN that aggregate to the customers table.
4. "Compare LEFT JOIN with IS NULL versus NOT EXISTS versus NOT IN for finding rows with no match. When would you use each?"
All three express anti-join semantics, but they differ in NULL handling and performance. The LEFT JOIN with IS NULL pattern (LEFT JOIN orders o ON c.id = o.customer_id WHERE o.id IS NULL) is correct, handles NULLs safely, and reads naturally if you already have the LEFT JOIN for other reasons. NOT EXISTS (WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id)) is usually the most readable for new code, handles NULLs safely, and the planner often produces the same physical plan as the anti-join LEFT JOIN — Postgres turns both into a Hash Anti Join. NOT IN (WHERE c.id NOT IN (SELECT customer_id FROM orders)) is the dangerous one: if any value in the subquery is NULL, the entire predicate evaluates to NULL for every row and the query returns zero results, silently. This is the SQL three-valued-logic bug that catches everyone exactly once. For new code, prefer NOT EXISTS as the default because it is the most explicit about intent. For existing queries that already have a LEFT JOIN, prefer adding the IS NULL filter because it is a one-line change. Avoid NOT IN entirely unless you are 100% certain the subquery column is non-nullable.
5. "What happens when you chain multiple LEFT JOINs together — for example, customers LEFT JOIN orders LEFT JOIN shipments — and what changes if you switch the second join to INNER?"
In a pure LEFT JOIN chain, every row from the leftmost table is preserved through the entire chain. If a customer has no orders, the first LEFT JOIN preserves the customer row with o.id = NULL. The second LEFT JOIN then tries to match o.id = s.order_id, but NULL = anything is NULL, so no shipment can match — and the second LEFT JOIN preserves the row again with s.id = NULL. The cascade is correct: once a row is preserved, it stays preserved. The result is that every customer appears in the output at least once, even if they have no orders or no shipments. Now if you switch the second join to INNER, the contract changes: INNER JOIN drops any row whose join condition is not TRUE, and NULL = anything is not TRUE, so customers with no order are silently dropped at the second join — undoing the preservation that the first LEFT JOIN promised. This is one of the most common multi-join bugs: developers add an INNER JOIN late in the chain "for performance" and accidentally turn the entire query back into an INNER-equivalent on the leftmost table. The rule of thumb: in a chain that starts with LEFT JOIN, every subsequent join should also be LEFT JOIN unless you have a specific reason to drop preserved rows.
Quick Reference — LEFT JOIN Cheat Sheet
+---------------------------------------------------------------+
| LEFT JOIN CHEAT SHEET |
+---------------------------------------------------------------+
| |
| SYNTAX: |
| SELECT ... FROM left |
| LEFT JOIN right ON left.k = right.k |
| |
| SEMANTICS: |
| Every left row appears in result at least once |
| Unmatched right columns are NULL |
| LEFT JOIN == LEFT OUTER JOIN (same thing) |
| |
| ON vs WHERE: |
| ON -> filters BEFORE preservation (keeps NULLs) |
| WHERE -> filters AFTER preservation (drops NULLs) |
| Right-side filters belong in ON |
| |
| ANTI-JOIN (find missing rows): |
| LEFT JOIN right ON ... |
| WHERE right.primary_key IS NULL |
| Always check the PK, never a nullable column |
| |
| COUNT TRAP: |
| COUNT(*) -> counts NULL rows as 1 (WRONG) |
| COUNT(right.id) -> counts only matched rows (RIGHT) |
| SUM, AVG -> wrap in COALESCE(..., 0) if you want zero |
| |
| CHAINS: |
| LEFT -> LEFT -> LEFT preserves through entire chain |
| LEFT -> INNER drops the preserved rows -- usually a bug |
| |
| ALTERNATIVES TO LEFT JOIN + IS NULL: |
| NOT EXISTS -> safest, most readable for new code |
| NOT IN -> DANGER if subquery column nullable |
| EXCEPT -> works for single-key cases |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| KEY RULES |
+---------------------------------------------------------------+
| |
| 1. LEFT JOIN preserves every left row, fills NULL on right |
| 2. Right-side filters belong in ON, not WHERE |
| 3. WHERE on right columns silently turns LEFT into INNER |
| 4. Anti-join: LEFT JOIN ... WHERE right.PK IS NULL |
| 5. Always check the PK for IS NULL, never a nullable column |
| 6. COUNT(*) lies after LEFT JOIN -- use COUNT(right.id) |
| 7. Wrap SUM/AVG in COALESCE if you want zero, not NULL |
| 8. Pre-aggregate the right side when summarizing per left row |
| 9. Don't mix LEFT and INNER in the same chain by accident |
| 10. Prefer NOT EXISTS for new code, anti-join for existing |
| |
+---------------------------------------------------------------+
| Concern | Wrong Way | Right Way |
|---|---|---|
| Filtering right side | WHERE o.status = 'shipped' | ON c.id = o.customer_id AND o.status = 'shipped' |
| Counting per left row | COUNT(*) | COUNT(o.id) |
| Summing per left row | SUM(o.total) | COALESCE(SUM(o.total), 0) |
| Find missing rows | WHERE o.status IS NULL | WHERE o.id IS NULL (the PK) |
| Find missing rows (alt) | c.id NOT IN (SELECT customer_id ...) | NOT EXISTS (SELECT 1 ...) |
| Multi-join chain | LEFT JOIN ... INNER JOIN ... | LEFT JOIN ... LEFT JOIN ... |
| LEFT JOIN keyword | LEFT OUTER JOIN (verbose) | LEFT JOIN (idiomatic) |
| Aggregate per left row | Aggregate after join | Pre-aggregate in subquery |
Prev: Lesson 5.2 -- INNER JOIN Next: Lesson 5.4 -- RIGHT and FULL OUTER JOIN
This is Lesson 5.3 of the Database Interview Prep Course -- 12 chapters, 58 lessons.