Common Table Expressions
WITH Clauses, Recursive CTEs, and Materialization
LinkedIn Hook
"Your 200-line query has six nested subqueries, three of them identical. Nobody on the team will touch it. Not even the person who wrote it."
Most production SQL gets ugly the same way. A simple SELECT grows a subquery, then another subquery references the first one and has to repeat the whole thing, then a third level wraps the second one to add a filter. By the time the query ships, it is a parenthesis pyramid that you read inside-out, and the only way to debug it is to print the intermediate result with
LIMIT 5at three different depths.Common Table Expressions — the
WITHclause — are the single biggest readability win in SQL. They let you name an intermediate result, reference it by that name later in the query, and read the whole thing top-to-bottom like a script. A query that was an inside-out subquery hairball becomes a sequence of named steps, each one a paragraph long, each one independently testable.CTEs also unlock something subqueries cannot do at all: recursion. A recursive CTE walks a tree or graph by repeatedly joining a result back to itself, and it is the standard way to query employee hierarchies, category trees, bill-of-materials explosions, and graph paths in pure SQL — no application-side loops, no procedural code.
The catch: pre-Postgres 12, every CTE was an "optimization fence" — the planner materialized the result and could not push predicates into it. Many engineers learned to avoid CTEs for performance reasons. Postgres 12 changed the rules: CTEs are now inlined by default, and you only get materialization if you ask for it with
MATERIALIZED.In Lesson 6.4, I break down CTEs end to end: WITH-clause syntax, when to prefer a CTE over a subquery, recursive CTEs for hierarchies, and the materialization rules in modern Postgres.
Read the full lesson -> [link]
#SQL #Database #PostgreSQL #MySQL #BackendDevelopment #InterviewPrep #CTE #RecursiveSQL
What You'll Learn
- The
WITHclause syntax and how it introduces named intermediate result sets - Why CTEs read top-to-bottom like a script while nested subqueries read inside-out
- When a CTE is genuinely better than a subquery (and when it is just style)
- Multiple CTEs in a single query and how each can reference earlier CTEs
- Recursive CTEs with the
WITH RECURSIVEkeyword for trees, graphs, and number series - Anchor member, recursive member, and the
UNION ALLjoin in a recursive CTE - The Postgres 12+ materialization rules — inlined by default,
MATERIALIZEDandNOT MATERIALIZEDhints - How CTEs differ from subqueries, derived tables, and views in performance and scope
- MySQL 8+ CTE support and dialect differences
The Recipe Card Analogy — Name Your Steps
Imagine a chef writing down a complex dessert. Option one: cram everything into a single run-on sentence. "Take the result of whisking the eggs that you separated from the yolks that came from the dozen you bought from the farm that opens at six..." By the time you reach the verb, you have lost the subject. You read the sentence three times, you sketch a diagram on a napkin, and you still are not sure what step comes first.
Option two: write the recipe as a list of named sub-recipes. "Step 1: meringue. Whisk the egg whites with sugar until stiff peaks form. Step 2: custard. Combine the yolks with cream and warm to 82 C. Step 3: assembly. Fold the meringue into the custard." Now anybody can read it. Each step has a name, each step is a paragraph, each step references the previous step's output by that name. If you want to know what "meringue" means, you scroll up to step 1. If you want to test step 2 in isolation, you can — you do not have to execute step 3 to see what custard looks like.
That is exactly what a Common Table Expression does for SQL. A nested subquery is the run-on sentence: the engine reads it inside-out, the human reads it inside-out, and the structure is buried in parentheses. A CTE is the named recipe step: you write WITH meringue AS (...), custard AS (...) SELECT ... FROM custard JOIN meringue, and the query reads top-to-bottom. Each named result is testable on its own — you can swap the final SELECT for SELECT * FROM meringue and inspect just that step.
+---------------------------------------------------------------+
| NESTED SUBQUERY vs CTE |
+---------------------------------------------------------------+
| |
| NESTED SUBQUERY (read inside-out): |
| SELECT ... |
| FROM ( |
| SELECT ... |
| FROM ( |
| SELECT ... |
| FROM orders |
| WHERE ... <- start reading here |
| ) AS step1 |
| WHERE ... |
| ) AS step2 |
| WHERE ... <- finish reading here |
| |
| CTE (read top-to-bottom): |
| WITH |
| step1 AS (SELECT ... FROM orders WHERE ...), |
| step2 AS (SELECT ... FROM step1 WHERE ...) |
| SELECT ... FROM step2 WHERE ... |
| |
| ^^^^ start at the top, follow the arrow -> |
| |
+---------------------------------------------------------------+
The win is not just aesthetic. A CTE-shaped query is easier to review in a pull request, easier to debug (swap the final SELECT for SELECT * FROM step1), easier to extend (add step3 without re-indenting half the file), and easier for a new team member to understand without spending fifteen minutes mentally unfolding parentheses.
WITH Clause Syntax — Naming Intermediate Results
A CTE is introduced with the WITH keyword, followed by one or more named query definitions, followed by a final SELECT, INSERT, UPDATE, or DELETE that uses the named results. The named results behave like temporary tables that live only for the duration of the surrounding statement.
-- Sample tables used throughout this lesson
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INT,
total NUMERIC,
created_at TIMESTAMPTZ
);
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name TEXT,
tier TEXT
);
INSERT INTO customers (name, tier) VALUES
('Alice', 'gold'), ('Bob', 'silver'), ('Carol', 'gold'),
('Dave', 'bronze'), ('Eve', 'gold');
INSERT INTO orders (customer_id, total, created_at) VALUES
(1, 120.00, '2026-04-01'), (1, 80.00, '2026-04-05'),
(2, 55.50, '2026-04-02'), (3, 300.00, '2026-04-03'),
(3, 150.00, '2026-04-08'), (4, 42.00, '2026-04-04'),
(5, 999.00, '2026-04-06'), (5, 220.00, '2026-04-09');
A Single Named CTE
-- Simplest CTE: one named step, then a final SELECT.
-- The CTE 'customer_revenue' is a named, reusable result set
-- that exists only for the duration of this statement.
WITH customer_revenue AS (
SELECT customer_id, SUM(total) AS revenue
FROM orders
GROUP BY customer_id
)
SELECT c.name, c.tier, cr.revenue
FROM customer_revenue cr
JOIN customers c ON c.id = cr.customer_id
ORDER BY cr.revenue DESC;
-- name | tier | revenue
-- ------+--------+---------
-- Eve | gold | 1219.00
-- Carol | gold | 450.00
-- Alice | gold | 200.00
-- Bob | silver | 55.50
-- Dave | bronze | 42.00
The exact same query written as a derived table (subquery in the FROM clause) works, but it is harder to scan:
SELECT c.name, c.tier, cr.revenue
FROM (
SELECT customer_id, SUM(total) AS revenue
FROM orders
GROUP BY customer_id
) cr
JOIN customers c ON c.id = cr.customer_id
ORDER BY cr.revenue DESC;
For a single intermediate step, the difference is small. For three, four, or five steps, the CTE form is dramatically cleaner.
Multiple CTEs Chained Together
-- Multiple CTEs separated by commas. Each CTE can reference any
-- CTE defined before it (top-to-bottom), so you can build a pipeline
-- of named transformations.
WITH
-- Step 1: per-customer revenue
customer_revenue AS (
SELECT customer_id, SUM(total) AS revenue, COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
),
-- Step 2: classify customers based on revenue
-- Note this CTE references 'customer_revenue' from step 1.
classified AS (
SELECT
customer_id,
revenue,
order_count,
CASE
WHEN revenue >= 1000 THEN 'whale'
WHEN revenue >= 200 THEN 'regular'
ELSE 'small'
END AS segment
FROM customer_revenue
),
-- Step 3: count customers per segment
segment_summary AS (
SELECT segment, COUNT(*) AS customer_count, SUM(revenue) AS total_rev
FROM classified
GROUP BY segment
)
-- Final SELECT can use any of the CTEs above
SELECT segment, customer_count, total_rev
FROM segment_summary
ORDER BY total_rev DESC;
-- segment | customer_count | total_rev
-- --------+----------------+-----------
-- whale | 1 | 1219.00
-- regular | 2 | 650.00
-- small | 2 | 97.50
Three CTEs, three named steps, read top-to-bottom. Compare this to the equivalent query written as nested subqueries: it would be a five-level pyramid of parentheses that you read from the inside out.
Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). Vertical pipeline diagram showing three rounded boxes stacked top to bottom, each labeled with a CTE name in white monospace: 'customer_revenue', 'classified', 'segment_summary'. Sky blue (#4fc3f7) arrows point downward connecting each box, labeled 'references'. At the bottom, a fourth box labeled 'final SELECT' in rose (#ff5c8a). To the right of each CTE, show a small example row of its output in monospace. Title at top: 'CTE Pipeline' in white monospace."
Recursive CTEs — Walking Trees and Graphs
Recursive CTEs are the killer feature. A regular subquery cannot reference itself; a CTE marked RECURSIVE can. This unlocks tree traversal, graph walking, transitive closure, number generation, and any other pattern where you build up a result by repeatedly joining a previous step back to itself.
A recursive CTE has three parts joined by UNION ALL:
- Anchor member — the starting set of rows (the "seed"). Runs once.
- Recursive member — a query that references the CTE name and produces the next round of rows. Runs repeatedly until it produces zero new rows.
- Termination — happens automatically when the recursive member returns no new rows.
+---------------------------------------------------------------+
| RECURSIVE CTE EXECUTION |
+---------------------------------------------------------------+
| |
| WITH RECURSIVE chain AS ( |
| SELECT ... FROM seed_rows <- anchor (round 0) |
| UNION ALL |
| SELECT ... FROM table JOIN chain <- recursive (rounds 1+) |
| ) |
| |
| Round 0: anchor produces N0 rows |
| | |
| v |
| Round 1: recursive joins those N0 -> N1 new rows |
| | |
| v |
| Round 2: recursive joins those N1 -> N2 new rows |
| | |
| v |
| ... |
| | |
| v |
| Round K: recursive returns 0 rows -> STOP |
| |
| Final result = N0 + N1 + N2 + ... + N(K-1) |
| |
+---------------------------------------------------------------+
Example: Employee Hierarchy
-- An employee table with a self-referencing manager_id.
-- This is the canonical "tree stored as a parent pointer" shape.
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT,
manager_id INT REFERENCES employees(id)
);
INSERT INTO employees (id, name, manager_id) VALUES
(1, 'Alice (CEO)', NULL),
(2, 'Bob (VP Eng)', 1),
(3, 'Carol (VP Sales)', 1),
(4, 'Dave (Eng Mgr)', 2),
(5, 'Eve (Engineer)', 4),
(6, 'Frank (Engineer)', 4),
(7, 'Grace (Sales Rep)', 3);
-- Walk the tree from Alice downward, accumulating depth and a path.
WITH RECURSIVE org_chart AS (
-- Anchor member: the root of the tree (CEO has no manager)
SELECT
id,
name,
manager_id,
1 AS depth,
name AS path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive member: join employees back to the CTE itself
-- to find the children of the rows we found in the previous round.
SELECT
e.id,
e.name,
e.manager_id,
oc.depth + 1,
oc.path || ' -> ' || e.name
FROM employees e
JOIN org_chart oc ON e.manager_id = oc.id
)
SELECT depth, name, path
FROM org_chart
ORDER BY path;
-- depth | name | path
-- ------+-------------------+----------------------------------------------
-- 1 | Alice (CEO) | Alice (CEO)
-- 2 | Bob (VP Eng) | Alice (CEO) -> Bob (VP Eng)
-- 3 | Dave (Eng Mgr) | Alice (CEO) -> Bob (VP Eng) -> Dave (Eng Mgr)
-- 4 | Eve (Engineer) | Alice (CEO) -> Bob (VP Eng) -> Dave -> Eve
-- 4 | Frank (Engineer) | Alice (CEO) -> Bob (VP Eng) -> Dave -> Frank
-- 2 | Carol (VP Sales) | Alice (CEO) -> Carol (VP Sales)
-- 3 | Grace (Sales Rep) | Alice (CEO) -> Carol (VP Sales) -> Grace
Read it as a story. Round 0 (anchor): find every employee whose manager_id is NULL — that is the root, Alice. Round 1 (recursive): for each row found in round 0 (Alice), find every employee whose manager is one of those rows — that yields Bob and Carol. Round 2: for each row found in round 1, find their direct reports — Dave (under Bob) and Grace (under Carol). Round 3: Eve and Frank under Dave. Round 4: Dave's children have no children, the recursive step returns zero rows, recursion stops.
The depth column is incremented by one each round, and the path column accumulates the chain of names — a textbook example of "carry state forward through the recursion."
Safety: Always Plan a Termination
A recursive CTE that never produces zero new rows will run forever (or until your database kills it). The most common cause is a cycle in the data — A reports to B, B reports to A — which makes the recursion bounce between them. The safest defenses:
-- 1. Limit by depth
WITH RECURSIVE walk AS (
SELECT id, parent_id, 1 AS depth FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, w.depth + 1
FROM nodes n
JOIN walk w ON n.parent_id = w.id
WHERE w.depth < 50 -- hard depth cap
)
SELECT * FROM walk;
-- 2. Detect cycles by carrying the visited path
WITH RECURSIVE walk AS (
SELECT id, parent_id, ARRAY[id] AS visited FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id, w.visited || n.id
FROM nodes n
JOIN walk w ON n.parent_id = w.id
WHERE NOT (n.id = ANY(w.visited)) -- skip already-visited nodes
)
SELECT * FROM walk;
-- 3. Postgres 14+ has built-in CYCLE detection syntax
WITH RECURSIVE walk AS (
SELECT id, parent_id FROM nodes WHERE parent_id IS NULL
UNION ALL
SELECT n.id, n.parent_id
FROM nodes n JOIN walk w ON n.parent_id = w.id
) CYCLE id SET is_cycle USING path
SELECT * FROM walk WHERE NOT is_cycle;
Generating a Series of Numbers
Recursive CTEs are also a portable way to build a number series — useful in dialects that lack generate_series:
-- Generate integers from 1 to 10 (Postgres has generate_series, but
-- this same pattern works in MySQL 8+, SQL Server, and others).
WITH RECURSIVE nums(n) AS (
SELECT 1 -- anchor
UNION ALL
SELECT n + 1 FROM nums WHERE n < 10 -- recursive, bounded
)
SELECT n FROM nums;
-- 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). Tree diagram of an organization chart with seven nodes labeled Alice (CEO) at top, branching to Bob and Carol, then Bob to Dave, Dave to Eve and Frank, Carol to Grace. Use sky blue (#4fc3f7) for nodes and edges. Each node has a small depth label (1, 2, 3, 4) in rose (#ff5c8a). To the right, a vertical execution timeline showing 'Round 0', 'Round 1', 'Round 2', 'Round 3' with arrows pointing to the rows discovered in each round. Title at top: 'Recursive CTE Walk' in white monospace."
CTE Materialization — The Big Postgres 12 Change
Before Postgres 12, every CTE was an optimization fence: the planner would always execute the CTE as an independent step, materialize its result into a temporary buffer, and then read from that buffer in the outer query. Predicates from the outer query could not be pushed down into the CTE, so a query like WITH big AS (SELECT * FROM huge_table) SELECT * FROM big WHERE id = 5 would scan the entire huge_table even though only one row was ultimately needed.
This was a real performance trap. Many engineers learned to rewrite CTEs as subqueries or derived tables specifically to avoid the fence, which made the code harder to read for purely planner reasons.
Postgres 12 changed the default. A CTE that is referenced exactly once and is not recursive and has no side effects is now inlined — the planner treats it like a regular subquery and can push predicates down, choose join orders freely, and avoid materializing. CTEs that are referenced more than once, are recursive, or contain side effects (INSERT, UPDATE, DELETE) still materialize because re-executing them would be wrong or wasteful.
You can override the default with explicit hints:
-- MATERIALIZED: force the CTE to be materialized (the old fence behavior).
-- Useful when you reference the CTE multiple times and want to compute
-- it exactly once, or when you specifically want to prevent the planner
-- from inlining a complex expression.
WITH expensive AS MATERIALIZED (
SELECT customer_id, complex_calculation(orders.*)
FROM orders
)
SELECT * FROM expensive WHERE customer_id = 42;
-- Result: the CTE is evaluated in full, then filtered.
-- The customer_id = 42 predicate is NOT pushed into the CTE.
-- NOT MATERIALIZED: force inlining even when the CTE is referenced
-- multiple times. Use this when the underlying query is cheap and
-- you want maximum planner freedom.
WITH cheap AS NOT MATERIALIZED (
SELECT id, name FROM customers WHERE tier = 'gold'
)
SELECT * FROM cheap c1 JOIN cheap c2 ON c1.id < c2.id;
-- Result: the planner inlines the CTE into both references and
-- can re-execute it twice if that produces a better plan.
-- Default (no keyword): inlined when single-use, materialized otherwise.
WITH gold AS (SELECT id FROM customers WHERE tier = 'gold')
SELECT * FROM orders WHERE customer_id IN (SELECT id FROM gold);
-- Single reference, no recursion, no side effects -> INLINED.
-- The planner sees this as equivalent to a subquery and can use
-- a hash join, semi-join, or whatever produces the best plan.
+---------------------------------------------------------------+
| POSTGRES 12+ CTE MATERIALIZATION RULES |
+---------------------------------------------------------------+
| |
| DEFAULT BEHAVIOR (no keyword): |
| Single reference + not recursive + no side effects |
| -> INLINED (predicate pushdown enabled) |
| Multiple references OR recursive OR side effects |
| -> MATERIALIZED |
| |
| EXPLICIT HINTS: |
| WITH x AS MATERIALIZED (...) |
| -> always materialize (old fence behavior) |
| WITH x AS NOT MATERIALIZED (...) |
| -> always inline (cannot be used with recursion) |
| |
| WHEN TO FORCE MATERIALIZED: |
| - Expensive CTE referenced multiple times |
| - CTE contains a volatile function (random(), now()) |
| - You want a planner barrier for debugging |
| |
| WHEN TO FORCE NOT MATERIALIZED: |
| - Cheap CTE used multiple times where re-exec is fine |
| - You want predicate pushdown for performance |
| |
+---------------------------------------------------------------+
Pre-Postgres 12 servers (still common in long-running production deployments) do not understand the MATERIALIZED and NOT MATERIALIZED keywords and always materialize every CTE. If you are writing portable SQL, do not rely on inlining — assume materialization unless you have explicitly tested on a known version.
MySQL 8+ supports CTEs and recursive CTEs with the same WITH and WITH RECURSIVE syntax. MySQL does not have the MATERIALIZED / NOT MATERIALIZED keywords; its optimizer makes its own inlining decisions. Older MySQL versions (5.7 and earlier) do not support CTEs at all — you must rewrite as derived tables.
CTE vs Subquery vs View vs Temp Table
CTEs sit in a family of "name an intermediate result" features, and choosing the right one matters.
| Feature | Lifetime | Reusable across queries? | Indexable? | Best for |
|---|---|---|---|---|
| Subquery (derived table) | Single statement | No | No | Single-use, simple |
CTE (WITH) | Single statement | No | No | Multi-step, readable, recursive |
| View | Persistent | Yes (same query) | No (use materialized view for that) | Reused query logic, abstraction |
| Materialized view | Persistent (refreshed) | Yes | Yes | Expensive query reused often |
| Temp table | Session | Yes (in same session) | Yes | Multi-statement workflow, intermediate sorts |
A CTE is the right tool when: the intermediate result is used only inside one statement, the query has multiple logical steps, you need recursion, or readability of a complex query matters more than micro-optimizing the plan. A view is right when the same query logic should be reused across many statements. A materialized view is right when the query is expensive and the data only needs to be fresh on a schedule (hourly, nightly). A temp table is right when you need to materialize a result, index it, and use it across multiple subsequent statements in the same session.
Common Mistakes
1. Assuming pre-Postgres-12 fence behavior on a modern server. Engineers who learned SQL on Postgres 9 or 10 often "know" that CTEs are slow because they are optimization fences, and rewrite every CTE as a derived table. On Postgres 12 and newer this is wrong: a single-reference, non-recursive CTE is inlined and performs identically to a subquery. Check your server version and trust the modern default.
2. Recursive CTE with no termination condition.
Forgetting the bounding predicate (WHERE n < 10 in the number series, WHERE depth < 50 in a tree walk, or a cycle check) produces an infinite loop. Postgres has a statement_timeout to save you eventually, but the right fix is to design termination into the recursive member from the start.
3. Treating CTEs as temp tables for reuse across statements.
A CTE only exists for the duration of the single statement that defines it. You cannot define a CTE in one query and reference it in the next. If you need that, use a temporary table (CREATE TEMP TABLE) or a session-scoped variable.
4. Using CTEs purely to "speed things up." A CTE is a readability and recursion tool, not a performance tool. On Postgres 12+, an inlined CTE has the same plan as the equivalent subquery; on older Postgres, CTEs are usually slower because of the fence. Pick the form that reads best for the human and let the planner handle performance.
5. Naming CTEs after the query, not the result.
Calling a CTE step1 or tmp defeats half the point of using a CTE. The whole reason to introduce a name is to make the query self-documenting. Call it customer_revenue, gold_customers, org_chart — names that describe what the rows represent. A reviewer should be able to understand the query without reading the CTE body.
Interview Questions
1. "What is a Common Table Expression and when would you use one instead of a subquery?"
A Common Table Expression is a named temporary result set introduced by the WITH keyword that exists only for the duration of the single statement that defines it. The form is WITH name AS (SELECT ...) SELECT ... FROM name, and you can chain multiple CTEs in one WITH clause separated by commas, with each CTE allowed to reference any CTE defined before it. The main reasons to prefer a CTE over a subquery are readability and recursion. A query with three or four logical steps reads top-to-bottom as a sequence of named transformations instead of as an inside-out pyramid of parentheses, which makes pull request reviews, debugging, and onboarding dramatically easier. A CTE is also the only way to express recursion in SQL — a regular subquery cannot reference itself, but a WITH RECURSIVE CTE can, which is how you walk hierarchies, transitive closures, and graph paths in pure SQL. On Postgres 12+ a single-reference CTE is inlined by the planner, so you do not pay a performance cost for the readability win.
2. "Walk me through how a recursive CTE is executed."
A recursive CTE has two halves separated by UNION ALL. The first half is the anchor member, which is a non-recursive query that produces the seed rows for the recursion — for example, finding the root of a tree where parent_id IS NULL. The anchor runs exactly once. The second half is the recursive member, which is a query that references the CTE name itself and produces a new set of rows by joining the previously discovered rows back to the underlying table. The execution model is iterative: round 0 runs the anchor and stores its rows, round 1 runs the recursive member against the round-0 rows and stores any new rows it produces, round 2 runs the recursive member against the round-1 rows, and so on. Recursion stops automatically when the recursive member produces zero new rows in a given round. The final result of the CTE is the union of all rows produced across all rounds. The two failure modes to watch for are infinite recursion (caused by cycles in the data or a missing termination condition) and explosive growth (caused by joining without a deduplicating key, producing combinatorially many paths). The defenses are a depth cap in the recursive member's WHERE clause, a visited-set array to detect cycles, or Postgres 14+ CYCLE syntax.
3. "What changed about CTE materialization in PostgreSQL 12, and why does it matter?"
Before Postgres 12, every CTE was an optimization fence. The planner always materialized the CTE result into a temporary buffer before reading from it, and predicates from the outer query could not be pushed into the CTE definition. That meant WITH big AS (SELECT * FROM huge) SELECT * FROM big WHERE id = 5 would scan the entire huge table to materialize big, then filter to one row — even though a regular subquery with the same WHERE clause would scan only the matching index entry. This caused real performance problems and led many engineers to avoid CTEs entirely for non-trivial queries. Postgres 12 changed the default: a CTE that is referenced exactly once, is not recursive, and has no side effects is now inlined by the planner and treated like an ordinary subquery, so predicate pushdown, join reordering, and other optimizations all work normally. CTEs that are referenced multiple times or are recursive or contain side effects still materialize. You can override the default explicitly with WITH x AS MATERIALIZED (...) to force the old fence behavior or WITH x AS NOT MATERIALIZED (...) to force inlining. The practical impact is that on Postgres 12+, you can use CTEs freely for readability without sacrificing query performance — which removes the main historical objection to using them.
4. "How is a CTE different from a view, a materialized view, and a temporary table?"
A CTE exists only for the duration of one SQL statement — once the statement finishes, the named result is gone. A view is a stored query definition that lives in the database catalog and can be referenced from any number of subsequent queries; it does not store data, it stores the query, and every reference re-executes the underlying SQL. A materialized view stores both the query definition and a snapshot of the result rows on disk, can be indexed, and is refreshed on demand or on a schedule — it is the right choice when an expensive query is read many times between writes. A temporary table is a real table that lives in a special schema for the duration of a session (or transaction, depending on ON COMMIT settings); you can index it, you can reference it from many statements in the same session, and unlike a CTE it survives across multiple SELECTs. Pick a CTE for in-statement readability and recursion, a view for reusable query logic without storage, a materialized view for expensive reads with stale-but-fast semantics, and a temp table for multi-statement workflows where you need indexes on intermediate results.
5. "What is the difference between WITH and WITH RECURSIVE? Does MySQL support both?"
WITH introduces one or more non-recursive Common Table Expressions, where each CTE can reference earlier CTEs but no CTE can reference itself. WITH RECURSIVE permits the recursive form, where a CTE's definition uses UNION ALL (or UNION) to combine an anchor query with a recursive query that references the CTE name itself. The RECURSIVE keyword applies to the whole WITH clause, not to individual CTEs — once you write WITH RECURSIVE, any CTE in that clause may be recursive, though they do not have to be. MySQL 8.0 and later support both forms with the same syntax as PostgreSQL — WITH name AS (...) and WITH RECURSIVE name AS (...) both work, and recursive CTEs in MySQL follow the same anchor + UNION ALL + recursive member structure. MySQL 5.7 and earlier do not support CTEs at all, so code targeting older MySQL must rewrite CTEs as derived tables and cannot express recursive queries in pure SQL. Other notable differences: MySQL does not support the MATERIALIZED / NOT MATERIALIZED hints, and its optimizer makes its own inlining decisions; Postgres 14+ adds the CYCLE and SEARCH clauses for declarative cycle detection and ordered traversal in recursive CTEs.
Quick Reference — CTE Cheat Sheet
+---------------------------------------------------------------+
| CTE CHEAT SHEET |
+---------------------------------------------------------------+
| |
| BASIC SYNTAX: |
| WITH name AS (SELECT ...) |
| SELECT ... FROM name; |
| |
| MULTIPLE CTEs: |
| WITH |
| a AS (SELECT ...), |
| b AS (SELECT ... FROM a), |
| c AS (SELECT ... FROM b) |
| SELECT ... FROM c; |
| |
| RECURSIVE: |
| WITH RECURSIVE walk AS ( |
| SELECT ... FROM seed -- anchor |
| UNION ALL |
| SELECT ... FROM t JOIN walk -- recursive |
| ) |
| SELECT * FROM walk; |
| |
| MATERIALIZATION (Postgres 12+): |
| WITH x AS (...) -> inlined if single-use |
| WITH x AS MATERIALIZED (...) -> always materialize |
| WITH x AS NOT MATERIALIZED (...) -> always inline |
| |
| CYCLE DETECTION (Postgres 14+): |
| WITH RECURSIVE walk AS (...) |
| CYCLE id SET is_cycle USING path |
| |
| WRITES IN A CTE (data-modifying CTE): |
| WITH moved AS ( |
| DELETE FROM src WHERE ... RETURNING * |
| ) |
| INSERT INTO dst SELECT * FROM moved; |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| KEY RULES |
+---------------------------------------------------------------+
| |
| 1. CTEs name intermediate results so queries read top-down |
| 2. Each CTE can reference any CTE defined earlier |
| 3. WITH RECURSIVE = anchor + UNION ALL + recursive member |
| 4. Recursive CTEs need a termination condition (depth, cycle) |
| 5. Postgres 12+ inlines single-use CTEs by default |
| 6. Use MATERIALIZED to force a planner fence |
| 7. CTEs live for ONE statement only -- not across queries |
| 8. Name CTEs after the result, not after a step number |
| 9. MySQL 8+ supports CTEs; 5.7 and earlier do not |
| 10. Prefer a CTE over a subquery when readability matters |
| |
+---------------------------------------------------------------+
| Concern | Wrong Way | Right Way |
|---|---|---|
| Multi-step query | Nested subqueries inside-out | Chain of named CTEs top-to-bottom |
| Walking a hierarchy | App-side loop with N+1 queries | WITH RECURSIVE single statement |
| Reused intermediate | Repeat the subquery | One CTE referenced twice (or temp table) |
| Cross-statement reuse | CTE in each query | View or temp table |
| Performance fear (PG12+) | Avoid CTEs entirely | Trust inlining, profile if in doubt |
| Recursion safety | Hope the data has no cycles | Depth cap or visited set or CYCLE clause |
| CTE naming | tmp, step1, x | customer_revenue, org_chart |
| Pre-PG12 fence | Assume always | Check version, use MATERIALIZED if needed |
Prev: Lesson 6.3 -- UNION, INTERSECT, EXCEPT Next: Lesson 6.5 -- Window Functions
This is Lesson 6.4 of the Database Interview Prep Course -- 12 chapters, 58 lessons.