B-Tree Index
The Default Workhorse of Postgres and MySQL
LinkedIn Hook
"Your query went from 4.2 seconds to 1.8 milliseconds after adding a single index. Same data, same query, same hardware. Why?"
The answer is almost always the same three letters: B-Tree. It is the index type your database reaches for when you type CREATE INDEX without specifying a kind. It is what backs every primary key in PostgreSQL and every clustered index in MySQL InnoDB. It is the data structure that makes "find a row in a billion-row table" feel instantaneous, and yet most developers could not draw one on a whiteboard if their interview depended on it.
A B-Tree is not a binary tree. It is a balanced, high-fan-out tree where every node holds dozens or hundreds of keys, every leaf is exactly the same distance from the root, and lookups touch at most three or four disk pages even on tables with hundreds of millions of rows. That balance is the magic. It is what guarantees O(log n) search regardless of insert order, what makes range scans cheap, and what lets a multi-column index serve queries on its leftmost prefix.
Most engineers know that indexes "make queries faster." Senior engineers know why: because the B-Tree turns a linear scan of a heap into a logarithmic walk of a tree, because each internal node fits in a single 8KB page, and because PostgreSQL's planner can choose between an index scan, an index-only scan, and a bitmap heap scan based on the shape of that tree.
In Lesson 9.3, I break down the B-Tree index from the ground up: structure, balance, search complexity, range scans, multi-column behavior, and the leftmost prefix rule that decides whether your composite index actually gets used.
Read the full lesson -> [link]
#PostgreSQL #Database #Indexing #BTree #SQL #BackendDevelopment #InterviewPrep
What You'll Learn
- The structure of a B-Tree: root, internal nodes, leaves, and why it is not a binary tree
- Why a B-Tree must stay balanced and how Postgres maintains that balance on every insert
- The O(log n) search complexity and why it really means "three or four disk reads"
- Range scans and the leaf-level linked list that makes BETWEEN queries cheap
- Why B-Tree is the default index in PostgreSQL, MySQL InnoDB, SQL Server, and Oracle
- Multi-column B-Trees and how composite keys are sorted lexicographically
- The leftmost prefix rule that decides which queries can use your composite index
- When a B-Tree is the wrong choice (and which lesson covers the alternatives)
The Phone Book Analogy — Why a Tree Beats a List
Imagine looking up "Robinson, James" in a printed phone book. You do not start at page 1 and scan every name. You flip to roughly the middle, see "Mason," realize "Robinson" comes later, and flip forward. You repeat — each flip cuts the remaining pages in half. Within six or seven flips, you are looking at the right page. A million names, found in seconds, with no index card system at all.
Now imagine the same phone book with no alphabetical order. Names appear in the order people moved into town. Finding "Robinson, James" means reading every single entry until you stumble on it. A million names becomes a million reads. The data is identical. The organization is what makes the difference.
A B-Tree index is the phone book version of your database table. Without an index, finding WHERE user_id = 847291 means scanning every row in the heap — a sequential scan, the database equivalent of reading the whole phone book. With a B-Tree on user_id, the database starts at the root, picks the right child based on the key ranges it sees, descends one level, picks again, and lands on the leaf containing row 847291 in just a handful of page reads.
+---------------------------------------------------------------+
| SEQUENTIAL SCAN (No Index) |
+---------------------------------------------------------------+
| |
| SELECT * FROM users WHERE user_id = 847291; |
| |
| Row 1 -> check id, no match |
| Row 2 -> check id, no match |
| Row 3 -> check id, no match |
| ... |
| Row 847291 -> match! return row |
| Row 847292 -> keep scanning (no LIMIT, so it runs to the end) |
| ... |
| Row 10M -> done |
| |
| Cost: O(n) — touches every page in the heap |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| B-TREE INDEX SCAN |
+---------------------------------------------------------------+
| |
| SELECT * FROM users WHERE user_id = 847291; |
| |
| Root page -> "847291 is in the third child" |
| Internal -> "847291 is in the seventh child" |
| Leaf page -> "847291 -> heap tuple at (page 4821, slot 12)" |
| Heap fetch -> read the actual row |
| |
| Cost: O(log n) — 3 to 4 page reads for a billion rows |
| |
+---------------------------------------------------------------+
That is the entire promise of a B-Tree: logarithmic search on a structure that fits naturally onto disk pages, stays balanced under any insert pattern, and supports range scans as a free side effect of how the leaves are linked.
What a B-Tree Actually Looks Like
A B-Tree is a tree where every node holds many keys, not just one or two. PostgreSQL uses an 8KB page as the unit of a node, and a single page can hold hundreds of keys depending on their size. This high fan-out is what keeps the tree shallow: three levels at fan-out 200 already address 8 million leaf entries, and four levels address 1.6 billion.
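A quick back-of-the-envelope check of that fan-out math. The fan-out of 200 is an illustrative figure; real fan-out depends on key size and page overhead:

```python
# Rough capacity of a B-Tree by depth, assuming a uniform fan-out.
# Fan-out 200 is a plausible value for 8KB pages with small keys.

def leaf_entries(fanout: int, levels: int) -> int:
    """Number of leaf entries addressable by a tree of the given depth."""
    return fanout ** levels

print(leaf_entries(200, 3))  # 8000000  (8 million at depth 3)
print(leaf_entries(200, 4))  # 1600000000  (1.6 billion at depth 4)
```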
+---------------------------------------------------------------------+
| B-TREE STRUCTURE (simplified) |
+---------------------------------------------------------------------+
| |
| [ 50 | 100 ] <- root |
| / | \ |
| / | \ |
| [10|25|40] [60|75|90] [120|150|180] <- internal |
| / | | \ / | | \ / | | \ |
| / | | \ / | | \ / | | \ |
| [..][..][..][..] [..][..][..][..] [..][..][..][..] <- leaves|
| ^ |
| | |
| leaf nodes hold (key -> heap tuple pointer) |
| and are linked left-to-right for range scans: |
| |
| [1..9] -> [10..24] -> [25..39] -> [40..49] -> [60..74] ... |
| |
+---------------------------------------------------------------------+
Three things are worth burning into your memory:
- Every leaf is at the same depth. This is the "balanced" in B-Tree. A search for the smallest key takes exactly the same number of page reads as a search for the largest key. There is no degenerate worst case.
- Internal nodes only store keys and child pointers. The actual row data (the heap tuple) lives in the table heap. The leaves store (key, heap_pointer) pairs; in PostgreSQL the heap pointers are called TIDs.
- Leaves are linked horizontally. Each leaf knows its right sibling. This is what turns a BETWEEN query into a single descent plus a sequential walk along the leaf level, with no need to climb back to the root.
PostgreSQL's B-Tree is technically a B+Tree variant (the Lehman-Yao concurrent B-link tree, to be exact). MySQL InnoDB uses a B+Tree for both the clustered primary index and all secondary indexes. SQL Server, Oracle, SQLite, and DB2 all use the same family. When someone says "B-Tree index" in a database conversation, they almost always mean B+Tree, and the distinction rarely matters at the SQL layer.
Why Balance Matters — and How It Is Maintained
A binary search tree (BST) gives you O(log n) search on average — but only if inserts arrive in random order. Insert sorted data into a plain BST and you get a linked list: every new key becomes the right child of the previous one, the tree degenerates into a stick, and search degrades to O(n). That is a disaster for a database, where primary keys are often inserted in monotonically increasing order (think auto-incrementing IDs, timestamps, UUIDv7).
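The degeneration is easy to demonstrate. Here is a minimal, purely illustrative sketch of a naive BST (no real database uses one): a thousand sorted inserts produce a tree a thousand levels deep.

```python
# A naive BST degenerates into a linked list under sorted inserts.
# Illustrative only -- this is the failure mode B-Trees exist to avoid.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    node = root
    while True:  # iterative insert to avoid recursion limits
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return
            node = node.right

def depth(root):
    d, node = 0, root
    while node:            # under sorted inserts the tree is a pure
        d += 1             # right spine, so walking right children
        node = node.right  # measures its full depth
    return d

root = Node(0)
for k in range(1, 1000):   # keys arrive in increasing order
    insert(root, k)
print(depth(root))          # 1000 -- search is O(n), not O(log n)
```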
A B-Tree solves this by rebalancing on every insert and delete. When a leaf overflows (more keys than fit in a page), it splits: half the keys go to a new sibling leaf, and a separator key is pushed up into the parent. If the parent overflows, it splits too, and so on up to the root. If the root itself splits, a new root is created and the tree grows by one level. Conversely, when a leaf underflows (too few keys after deletes), it borrows from a sibling or merges with one.
+---------------------------------------------------------------------+
| LEAF SPLIT (insert 35 into a full leaf) |
+---------------------------------------------------------------------+
| |
| BEFORE |
| [ 50 ] |
| / \ |
| [10|20|30|40] [60|70|80|90] |
| |
| Insert 35 -> left leaf is full. Split it: |
| |
| AFTER |
| [ 30 | 50 ] |
| / | \ |
| [10|20] [30|35|40] [60|70|80|90] |
| |
| The separator key (30) was promoted into the parent. |
| All leaves are still at the same depth. Tree stays balanced. |
| |
+---------------------------------------------------------------------+
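The split rule in the diagram can be sketched in a few lines of Python, assuming for illustration a toy leaf capacity of four keys (real leaves hold hundreds):

```python
# Sketch of a leaf split: the overflowing leaf divides in half and the
# first key of the right half is promoted into the parent as a separator.

def split_leaf(leaf: list) -> tuple:
    """Split an overflowing leaf into (left, right, separator_key)."""
    mid = len(leaf) // 2
    left, right = leaf[:mid], leaf[mid:]
    return left, right, right[0]

leaf = [10, 20, 30, 40]            # full leaf (toy capacity: 4 keys)
leaf = sorted(leaf + [35])         # insert 35 -> five keys, overflow
left, right, separator = split_leaf(leaf)
print(left, right, separator)      # [10, 20] [30, 35, 40] 30
```

This matches the diagram: the separator 30 moves up into the parent, and both resulting leaves sit at the same depth as before.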
The cost of a split is small — a few page writes — and it happens infrequently because each leaf holds hundreds of keys. The benefit is enormous: no matter the insert order, the tree stays balanced and search stays logarithmic. A table with 10 million rows always has a B-Tree of depth 3 or 4. A table with 10 billion rows still only has a depth of 5 or 6.
This is the property that lets you treat indexes as a black box. You do not need to know whether your data arrived sorted, reversed, random, or hash-distributed — the B-Tree will look the same shape either way and your queries will run in the same time.
The O(log n) Promise — In Disk Pages, Not CPU Cycles
When textbooks say B-Tree search is O(log n), the base of the logarithm is the fan-out — typically 100 to 500 in real systems, not 2 like a binary tree. This is the difference that matters in practice:
| Rows | log_2(n) | log_200(n) | Real B-Tree depth |
|---|---|---|---|
| 10,000 | 13.3 | 1.7 | 2 |
| 1,000,000 | 19.9 | 2.6 | 3 |
| 100,000,000 | 26.6 | 3.5 | 4 |
| 10,000,000,000 | 33.2 | 4.3 | 5 |
A B-Tree search on a billion-row table touches four or five pages. PostgreSQL caches the upper levels of the tree (root and internal nodes) in shared buffers, so in practice only the leaf page and the heap fetch ever hit disk. The actual cost of a "find one row by primary key" is two I/O operations — usually both served from RAM — regardless of table size.
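You can reproduce those logarithms directly with Python's math module (fan-out 200 is the same illustrative figure as above):

```python
import math

# The base of the logarithm is the fan-out, so depth grows far more
# slowly than in a binary tree.
for rows in (10_000, 1_000_000, 100_000_000, 10_000_000_000):
    print(f"{rows:>14,}  log2 = {math.log2(rows):4.1f}  "
          f"log200 = {math.log(rows, 200):3.1f}")
```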
This is why senior engineers reach for "add an index" as the first lever for any slow WHERE column = ? query. The math is just that good.
Example 1 — Watching a B-Tree Earn Its Keep
Let us set up a real table, run a query without an index, then add one and watch the plan change.
-- Create a 1 million row test table of users
CREATE TABLE users (
user_id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL,
country TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Populate with 1 million synthetic rows
INSERT INTO users (email, country)
SELECT
'user' || g || '@example.com',
(ARRAY['US','UK','DE','FR','JP','BR','IN','CA'])[1 + (g % 8)]
FROM generate_series(1, 1000000) AS g;
-- Run ANALYZE so the planner has fresh statistics
ANALYZE users;
-- Query by email WITHOUT an index — sequential scan
EXPLAIN ANALYZE
SELECT user_id, country
FROM users
WHERE email = 'user742318@example.com';
Sample output:
QUERY PLAN
-------------------------------------------------------------------------------
Seq Scan on users (cost=0.00..23334.00 rows=1 width=14)
(actual time=42.118..183.502 rows=1 loops=1)
Filter: (email = 'user742318@example.com'::text)
Rows Removed by Filter: 999999
Planning Time: 0.087 ms
Execution Time: 183.541 ms
183 milliseconds, 999,999 rows examined and discarded. Now add a B-Tree index and rerun:
-- Create a B-Tree index on email (B-Tree is the default — no USING clause)
CREATE INDEX idx_users_email ON users (email);
EXPLAIN ANALYZE
SELECT user_id, country
FROM users
WHERE email = 'user742318@example.com';
Sample output:
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using idx_users_email on users
(cost=0.42..8.44 rows=1 width=14)
(actual time=0.041..0.043 rows=1 loops=1)
Index Cond: (email = 'user742318@example.com'::text)
Planning Time: 0.121 ms
Execution Time: 0.062 ms
183 ms -> 0.06 ms. Roughly 3,000x faster on the same hardware, same data. The plan changed from Seq Scan to Index Scan using idx_users_email. The index descended three levels, found the matching leaf, followed the heap pointer, and returned the row. Notice that CREATE INDEX did not require USING btree — B-Tree is what you get by default in PostgreSQL.
Example 2 — Range Scans for Free
The leaf-level horizontal pointers are what make B-Trees so good at range queries. Once the descent finds the lower bound, the database walks the leaves left-to-right until it crosses the upper bound — no extra tree descents required.
-- Add a B-Tree index on created_at to support time-range queries
CREATE INDEX idx_users_created_at ON users (created_at);
-- Range query: find all users created in a 1-hour window
EXPLAIN ANALYZE
SELECT user_id, email
FROM users
WHERE created_at >= NOW() - INTERVAL '1 hour'
AND created_at < NOW();
Sample output:
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using idx_users_created_at on users
(cost=0.42..142.18 rows=4127 width=32)
(actual time=0.038..2.114 rows=4163 loops=1)
Index Cond: ((created_at >= (now() - '01:00:00'::interval))
AND (created_at < now()))
Planning Time: 0.094 ms
Execution Time: 2.298 ms
The plan shows a single Index Scan with an Index Cond that contains both bounds. The database descended once to find the first matching leaf entry, then walked rightward across leaf pages until it passed the upper bound. Every key it touched was a hit — there is no filtering and discarding.
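The descend-once-then-walk pattern can be mimicked with a sorted Python list standing in for the leaf level; bisect plays the role of the single tree descent, and the keys are made up for illustration:

```python
import bisect

# A sorted list stands in for the linked leaf level of a B-Tree index.
leaf_level = [5, 12, 18, 25, 31, 47, 52, 68, 74, 90]

def range_scan(keys, low, high):
    """One 'descent' (binary search) to the lower bound, then a
    sequential walk until a key crosses the upper bound."""
    i = bisect.bisect_left(keys, low)        # the single tree descent
    out = []
    while i < len(keys) and keys[i] < high:  # walk the linked leaves
        out.append(keys[i])
        i += 1
    return out

print(range_scan(leaf_level, 18, 52))  # [18, 25, 31, 47]
```

Every key touched after the descent is a hit, which is exactly why the Index Cond in the plan above contains both bounds and nothing is filtered out afterward.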
This is why you should reach for B-Tree when you need:
- Equality: WHERE x = ?
- Range: WHERE x BETWEEN ? AND ?, WHERE x > ?, WHERE x < ?
- Sorted output: ORDER BY x (the leaves are already in order, so no separate sort step is needed)
- Prefix matching on text: WHERE x LIKE 'foo%' (the prefix is itself a range; in PostgreSQL this requires the C collation or a text_pattern_ops index)
What B-Tree is not good at: substring search (LIKE '%foo%'), full-text search, geospatial queries, or set membership on JSON arrays. Those need GIN, GiST, or BRIN — the topic of Lesson 9.4.
Multi-Column B-Trees and the Leftmost Prefix Rule
A B-Tree can index more than one column. The keys are stored as tuples, sorted lexicographically: first by the first column, then by the second column within each value of the first, and so on.
-- Composite index on (country, created_at)
CREATE INDEX idx_users_country_created ON users (country, created_at);
Conceptually, the index now holds keys like:
('BR', '2026-04-12 10:01'),
('BR', '2026-04-13 14:22'),
('BR', '2026-04-14 09:03'),
...
('CA', '2026-04-10 08:15'),
('CA', '2026-04-11 19:44'),
...
('DE', '2026-04-09 06:30'),
...
The database can use this index efficiently for any query that starts from the leftmost column and uses each subsequent column with an equality (or a range only on the rightmost referenced column). This is the leftmost prefix rule, and it is the single most important thing to understand about composite indexes.
+---------------------------------------------------------------------+
| LEFTMOST PREFIX RULE — INDEX (country, created_at) |
+---------------------------------------------------------------------+
| |
| USABLE (index scan): |
| |
| WHERE country = 'BR' |
| WHERE country = 'BR' AND created_at > '2026-01-01' |
| WHERE country IN ('BR','CA','US') |
| ORDER BY country, created_at |
| |
| NOT USABLE (will fall back to seq scan unless another index helps): |
| |
| WHERE created_at > '2026-01-01' -- skips country |
| WHERE country LIKE '%BR%' -- not a prefix |
| |
+---------------------------------------------------------------------+
The intuition: the index is sorted by country first, so the database can binary-search for country = 'BR'. Within all the BR entries, the rows are sorted by created_at, so a range on created_at after equality on country is also efficient. But a query that filters only on created_at cannot use this index — the created_at values for BR, CA, DE, etc. are interleaved across the entire tree. There is no contiguous range to scan.
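The lexicographic ordering is easy to see with plain Python tuples (hypothetical country/date values; bisect stands in for the tree descent):

```python
import bisect

# Composite keys sort lexicographically: by the first column, then by
# the second column within each value of the first.
index = sorted([
    ('BR', '2026-04-12'), ('CA', '2026-04-10'), ('BR', '2026-04-14'),
    ('DE', '2026-04-09'), ('CA', '2026-04-11'), ('BR', '2026-04-13'),
])

# country = 'BR' -> one contiguous slice of the (simulated) leaf level
lo = bisect.bisect_left(index, ('BR', ''))
hi = bisect.bisect_left(index, ('BR', '\uffff'))  # sentinel above any date
print(index[lo:hi])  # the three BR entries, adjacent in the index

# created_at alone -> matches are interleaved across countries, so
# there is no contiguous range and every tuple must be inspected
late = [t for t in index if t[1] >= '2026-04-11']
print(late)  # includes BR and CA rows scattered through the index
```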
Example 3 — Watching the Leftmost Prefix Rule in Action
-- Query 1: uses the leftmost column (country) — index works
EXPLAIN ANALYZE
SELECT COUNT(*) FROM users
WHERE country = 'BR';
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=4521.20..4521.21 rows=1 width=8)
-> Index Only Scan using idx_users_country_created on users
(cost=0.42..4209.20 rows=124800 width=0)
(actual time=0.052..18.301 rows=125000 loops=1)
Index Cond: (country = 'BR'::text)
Execution Time: 22.108 ms
Note the Index Only Scan — Postgres did not even need to visit the heap because both columns it needed were already in the index.
-- Query 2: uses both columns left-to-right — index works
EXPLAIN ANALYZE
SELECT user_id FROM users
WHERE country = 'BR'
AND created_at >= '2026-04-01';
QUERY PLAN
-------------------------------------------------------------------------------
Index Scan using idx_users_country_created on users
(cost=0.42..1842.11 rows=12480 width=8)
(actual time=0.061..4.812 rows=12503 loops=1)
Index Cond: ((country = 'BR'::text)
AND (created_at >= '2026-04-01'::timestamptz))
Execution Time: 5.901 ms
Both predicates pushed into Index Cond. The descent finds (BR, 2026-04-01) and walks leaves rightward until country changes.
-- Query 3: skips the leftmost column — index does NOT help
EXPLAIN ANALYZE
SELECT user_id FROM users
WHERE created_at >= '2026-04-01';
QUERY PLAN
-------------------------------------------------------------------------------
Bitmap Heap Scan on users (cost=1124.22..14201.18 rows=99840 width=8)
(actual time=8.412..62.118 rows=100024 loops=1)
Recheck Cond: (created_at >= '2026-04-01'::timestamptz)
-> Bitmap Index Scan on idx_users_created_at
(cost=0.00..1099.26 rows=99840 width=0)
(actual time=8.224..8.225 rows=100024 loops=1)
Index Cond: (created_at >= '2026-04-01'::timestamptz)
Execution Time: 68.901 ms
The planner ignored idx_users_country_created entirely and used idx_users_created_at instead. If we had not created that single-column index, this query would have fallen back to a sequential scan. A composite index is not a substitute for the indexes its non-leftmost columns might need on their own.
Example 4 — Index-Only Scans and Covering Indexes
When every column the query needs is already in the index, Postgres can answer from the index alone — it never visits the heap. This is called an Index Only Scan, and it is one of the biggest wins B-Trees offer.
-- Create a covering index that includes the email column
-- INCLUDE adds columns to the leaf payload but NOT to the sort key
CREATE INDEX idx_users_country_inc_email
ON users (country) INCLUDE (email);
EXPLAIN ANALYZE
SELECT email FROM users
WHERE country = 'JP';
QUERY PLAN
-------------------------------------------------------------------------------
Index Only Scan using idx_users_country_inc_email on users
(cost=0.42..3284.20 rows=124800 width=24)
(actual time=0.038..14.122 rows=125000 loops=1)
Index Cond: (country = 'JP'::text)
Heap Fetches: 0
Execution Time: 17.841 ms
Heap Fetches: 0 — the heap was never touched. The query was answered entirely from the B-Tree leaves. INCLUDE is the right tool here because we want email in the payload for retrieval, but we do not want it to participate in the sort order (which would bloat the tree and slow down inserts).
Common Mistakes
1. Indexing every column "just in case."
Each B-Tree index costs disk space, slows down INSERT/UPDATE/DELETE (because the tree must be maintained), and bloats the working set in shared buffers. Indexes are not free. Add them when a query plan or pg_stat_statements tells you a specific query needs one — not preemptively. A common rule of thumb: index foreign keys, index columns in WHERE/JOIN/ORDER BY of frequent queries, and stop there until profiling proves otherwise.
2. Misunderstanding the leftmost prefix rule.
Creating INDEX (a, b, c) and expecting it to speed up WHERE b = ? or WHERE c = ? is the most common composite-index mistake. The index is sorted by a first — without an equality (or IN) on a, the database cannot find a contiguous range to scan. If you need to query by b alone, create a separate index on b. The order of columns in a composite index is not cosmetic; it determines which queries the index can serve.
3. Indexing a low-cardinality column.
A B-Tree on a boolean column, or on a gender column with two values, is almost never useful. The planner will likely choose a sequential scan anyway because the index would have to return half the heap, and a bitmap heap scan plus heap fetches is slower than just reading the heap directly. B-Trees shine on selective columns — ones where each value matches a small fraction of rows.
4. Forgetting to ANALYZE after bulk loads.
The planner uses statistics from pg_statistic to decide whether to use an index. If you bulk-load a table and never run ANALYZE, the planner has no idea how selective your index is and may pick a sequential scan even when an index scan would be 1000x faster. After any large load or migration, always run ANALYZE table_name (or wait for autovacuum to do it).
5. Wrapping the indexed column in a function in WHERE.
WHERE LOWER(email) = 'foo@bar.com' cannot use a plain index on email; the index stores the original values, not their lowercased form. Either store the column already lowercased, or create a functional index: CREATE INDEX ON users (LOWER(email)). The same applies to WHERE created_at::date = '2026-04-14'; use a range condition (WHERE created_at >= '2026-04-14' AND created_at < '2026-04-15') instead.
Interview Questions
1. "Why is a B-Tree the default index type in PostgreSQL and MySQL, and when would you choose something else?"
A B-Tree is the default because it efficiently supports the most common query patterns: equality, range, sorted retrieval, and prefix matching on text — all in O(log n) time with a structure that maps cleanly onto fixed-size disk pages. It also stays balanced under any insert order, which is critical for tables with monotonically increasing primary keys. You would choose something else when the query pattern does not fit B-Tree semantics: GIN for full-text search and JSONB containment, GiST for geometric and range types, BRIN for very large append-only tables where physical ordering correlates with the indexed value (like time-series logs), and Hash for simple equality lookups when you do not need range scans (rarely used in practice because B-Tree handles equality just as well). The decision tree is essentially: start with B-Tree; only deviate when the query type or data shape rules it out.
2. "Explain the leftmost prefix rule for composite B-Tree indexes. Give an example of a query that uses an index (a, b, c) and one that does not."
A composite B-Tree index sorts its keys lexicographically by the listed columns. To use the index efficiently, a query must filter on the columns from left to right, with equality on every column except optionally a range on the rightmost referenced one. For an index on (a, b, c), the query WHERE a = 1 AND b = 2 AND c > 5 uses the full index — it finds (1, 2, 5) in one descent and walks the leaves until b or a changes. The query WHERE b = 2 cannot use the index, because rows with b = 2 are scattered across every distinct value of a — there is no contiguous range to scan. Likewise, WHERE a = 1 AND c = 5 can use the index for the a = 1 part, but c = 5 becomes a filter rather than an index condition, because between the a = 1 rows the c values are sorted within each b, not globally. The practical implication is that column order in a composite index is a deliberate engineering choice, not an alphabetical convenience.
3. "What is an index-only scan, and how does the INCLUDE clause help?"
An index-only scan is a query plan where Postgres answers a query entirely from the index leaves without ever fetching from the heap. It works when every column the query selects (plus every column it filters on) is already present in the index. This is much faster than a regular index scan because each matching key avoids a heap I/O. The INCLUDE clause lets you add columns to the leaf payload without making them part of the sort key — the index is still sorted only by the key columns, so insert and search performance stay the same, but the extra columns are available for retrieval. For example, CREATE INDEX ON orders (customer_id) INCLUDE (status, total) lets SELECT status, total FROM orders WHERE customer_id = 42 run as an index-only scan, even though status and total are not part of the sort order. There is one caveat: the visibility map must indicate that the relevant heap pages are all-visible, otherwise Postgres still has to visit the heap to check tuple visibility. Frequent VACUUM keeps the visibility map up to date.
4. "A query that used to use an index suddenly switched to a sequential scan after a data load. What are the likely causes and fixes?"
The most common cause is stale statistics. After a bulk insert, pg_statistic still reflects the old row count and value distribution, so the planner thinks the table is small and a sequential scan is cheap. The fix is ANALYZE table_name, or letting autovacuum catch up. Other causes: the new data dramatically changed the selectivity of the indexed column (a query that returned 0.1% of the table now returns 30%, making the planner correctly prefer a sequential scan); the table grew so large that the index itself became bloated and the planner's cost model now favors a sequential scan; an UPDATE rewrote enough tuples to invalidate the visibility map, forcing index-only scans to do heap fetches anyway; or someone accidentally wrapped the column in a function that disables the index. The diagnosis path is: run EXPLAIN ANALYZE to confirm the plan, run ANALYZE and re-check, look at pg_stat_user_indexes to see whether the index is being used by other queries, and check for index bloat with pgstattuple.
5. "How does a B-Tree maintain balance during inserts, and why does that matter for primary keys generated by a sequence?"
When an insert lands in a leaf that is already full, the leaf splits: half the keys move to a new sibling, a separator key is promoted into the parent, and if the parent is also full it splits in turn. This rebalancing propagates upward and may eventually create a new root, increasing the tree depth by one. Crucially, every leaf remains at exactly the same depth; there is no degenerate path. This matters for sequence-generated primary keys because every new row inserts at the rightmost leaf. In a naive BST, this would produce a long right spine and O(n) search, but in a B-Tree the rightmost leaf simply splits when it fills, the new leaf becomes the rightmost, and the tree stays balanced. PostgreSQL even has a fastpath optimization that caches the rightmost leaf page, so strictly increasing inserts usually skip the descent from the root. The result is that auto-incrementing primary keys are not just acceptable but actually the best-case insert pattern for a B-Tree, because splits happen at predictable locations and the upper levels of the tree stay hot in cache.
Quick Reference — B-Tree Cheat Sheet
+---------------------------------------------------------------+
| B-TREE INDEX CHEAT SHEET |
+---------------------------------------------------------------+
| |
| CREATE (B-Tree is the default — no USING clause needed) |
| CREATE INDEX idx_name ON table (column); |
| CREATE INDEX idx_name ON table USING btree (column); |
| |
| COMPOSITE (column order MATTERS — leftmost prefix rule) |
| CREATE INDEX ON orders (customer_id, created_at); |
| |
| COVERING (INCLUDE adds payload without changing sort) |
| CREATE INDEX ON orders (customer_id) INCLUDE (status, total); |
| |
| PARTIAL (index only the rows you query) |
| CREATE INDEX ON orders (customer_id) |
| WHERE status = 'pending'; |
| |
| FUNCTIONAL (index a computed expression) |
| CREATE INDEX ON users (LOWER(email)); |
| |
| DROP / REBUILD |
| DROP INDEX idx_name; |
| REINDEX INDEX idx_name; |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| WHEN B-TREE IS THE RIGHT CHOICE |
+---------------------------------------------------------------+
| |
| YES: |
| - Equality: WHERE x = ? |
| - Range: WHERE x BETWEEN ? AND ? |
| - Inequality: WHERE x > ?, x < ?, x >= ?, x <= ? |
| - Sorting: ORDER BY x |
| - Prefix: WHERE x LIKE 'foo%' |
| - IN list: WHERE x IN (?, ?, ?) |
| - NULL check: WHERE x IS NULL |
| |
| NO (use a different index type): |
| - Substring: WHERE x LIKE '%foo%' -> GIN (pg_trgm) |
| - Full text: to_tsvector @@ to_tsquery -> GIN |
| - JSONB key: WHERE doc @> '{"k":1}' -> GIN |
| - Geometry: ST_Within / ST_DWithin -> GiST |
| - Time-series: huge append-only tables -> BRIN |
| |
+---------------------------------------------------------------+
| Concern | Wrong Way | Right Way |
|---|---|---|
| Default index | Specify USING btree everywhere | Just CREATE INDEX — B-Tree is default |
| Composite order | Alphabetical | Most selective / most filtered first |
| LIKE '%foo%' | B-Tree | GIN with pg_trgm |
| Function in WHERE | WHERE LOWER(x) = ? with index on x | Functional index on LOWER(x) |
| Bulk load | Skip ANALYZE | Always ANALYZE after large loads |
| Covering query | Add column to sort key | Use INCLUDE |
| Low-cardinality column | Index it anyway | Skip the index |
| Range + equality | Range column first | Equality column first, range last |
Prev: Lesson 9.2 -- Primary and Secondary Indexes Next: Lesson 9.4 -- GIN, GiST, Hash, BRIN Indexes
This is Lesson 9.3 of the Database Interview Prep Course -- 12 chapters, 58 lessons.