What Is an Index?
The Book Index Analogy and Why Reads Get Faster
LinkedIn Hook
"Your query was fast on 10,000 rows. It is now 47 seconds slow on 10 million. The data did not get more complex -- you just never added an index."
Almost every junior backend engineer has lived this exact moment. You build a feature, test it locally with a few hundred rows, ship it to production, and three months later the support channel lights up because the dashboard takes a minute to load. The query did not change. The schema did not change. What changed is that PostgreSQL is now reading every single row on disk to answer a question it could have answered in milliseconds -- if only there had been an index.
An index is not magic. It is a smaller, sorted, purpose-built data structure that lives next to your table and lets the database jump straight to the rows you want, instead of scanning every page from the beginning. The cost is real -- writes get slower, disk usage grows, and the planner has to think harder -- but the read speedup is often a thousand times or more. The difference between a competent backend engineer and a great one is knowing exactly when that trade is worth it.
In Lesson 9.1, I break down what a database index actually is -- the book-index analogy that finally makes it click, how a query uses an index step by step, the trade-offs that make indexing a craft and not a checkbox, and the specific cases where adding an index would actually make your system slower.
Read the full lesson -> [link]
#PostgreSQL #Database #Indexing #BackendDevelopment #SQL #InterviewPrep #DataEngineering
What You'll Learn
- What an index actually is at the data-structure level, and why it is separate from the table
- The book-index analogy that makes indexing intuitive forever
- How a SQL query uses an index, step by step, with EXPLAIN ANALYZE output
- The exact trade-offs: faster reads vs slower writes, extra storage, and planner overhead
- When an index helps (selective filters, joins, sorts) and when it does not (small tables, low-selectivity columns, write-heavy hot paths)
- The difference between a sequential scan, an index scan, and an index-only scan
- PostgreSQL syntax for CREATE INDEX, with notes on MySQL differences
The Book Index Analogy -- Why a Database Index Even Exists
Pick up any thick technical book -- say, a 900-page reference on PostgreSQL. Suppose you want to find every page that mentions the word "vacuum." You have two options.
Option A. Open the book at page 1, read every word, mark every occurrence of "vacuum," and continue until page 900. This is exhaustive, guaranteed correct, and unbearably slow. You would spend hours.
Option B. Flip to the back of the book where the index lives. The index is a separate, alphabetically sorted list. You scan down to "V," find "vacuum," and read off the page numbers: 42, 188, 401, 766. You then jump directly to those four pages. Total time: thirty seconds.
That is exactly what a database index is. The table is the body of the book -- rows stored in whatever order they happened to be inserted, with no sorting and no shortcut. An index is a separate, sorted data structure (almost always a B-tree in PostgreSQL) that maps a value -> a pointer to the row's location on disk. When you ask "find all users where email = 'ada@example.com'," the database can either scan every row in the table (the slow way) or look up ada@example.com in the email index, get back a single row pointer, and fetch just that one row (the fast way).
The crucial insight is that the index is not a copy of the table -- it is a much smaller auxiliary structure that only stores the indexed column(s) plus a pointer (called the TID in PostgreSQL, short for tuple identifier). A million-row user table might be 500 MB on disk; an index on the email column might be 30 MB. The index is small enough to keep in RAM even when the table is not, which is half the reason index lookups feel instantaneous.
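The whole "sorted structure plus row pointer" idea fits in a few lines of code. Here is a toy sketch in Python -- a sorted list standing in for a B-tree's leaf entries, with list positions standing in for TIDs. This is an illustration of the concept, not how PostgreSQL actually stores anything:

```python
import bisect

# Toy "heap": rows in arbitrary insertion order, like a Postgres table file.
heap = [
    {"id": 1, "email": "zoe@x.com"},
    {"id": 2, "email": "ben@x.com"},
    {"id": 3, "email": "ada@x.com"},
    {"id": 4, "email": "lin@x.com"},
]

# Toy "index": (indexed value, row pointer) pairs, kept sorted by value.
# A real TID is (page number, slot); here the pointer is just a list position.
email_idx = sorted((row["email"], pos) for pos, row in enumerate(heap))

def lookup(index, value):
    """Binary-search the index, then fetch the row from the heap."""
    i = bisect.bisect_left(index, (value,))
    if i < len(index) and index[i][0] == value:
        return heap[index[i][1]]          # the "heap fetch"
    return None

print(lookup(email_idx, "ada@x.com"))     # finds the row without scanning the heap
```

The binary search is the point: the lookup never touches rows that cannot match, which is exactly what the table scan cannot avoid.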
+---------------------------------------------------------------+
| NO INDEX -- SEQUENTIAL SCAN |
+---------------------------------------------------------------+
| |
| users table (1,000,000 rows on disk) |
| |
| row 1: id=1 email=zoe@x.com ... |
| row 2: id=2 email=ben@x.com ... |
| row 3: id=3 email=ada@x.com <-- match! |
| row 4: id=4 email=lin@x.com ... |
| ... |
| row 1000000: ... |
| |
| Query: WHERE email = 'ada@x.com' |
| Strategy: read EVERY row, check email column, keep matches |
| Cost: O(n) -- proportional to table size |
| Time on 1M rows: ~4200 ms |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| WITH INDEX -- B-TREE LOOKUP |
+---------------------------------------------------------------+
| |
| users_email_idx (B-tree, sorted by email) |
| |
| [ lin@x.com ] |
| / \ |
| [ ben ] [ zoe ] |
| / \ / \ |
| [ ada ] [ jay ] [ tom ] [ zoe ] |
| | |
| +--> TID (page 7, slot 3) --> jump to row in table |
| |
| Query: WHERE email = 'ada@x.com' |
| Strategy: walk the B-tree (3-4 hops), get TID, fetch 1 row |
| Cost: O(log n) -- proportional to TREE DEPTH, not table size |
| Time on 1M rows: ~0.4 ms |
| |
+---------------------------------------------------------------+
That is a 10,000x speedup on a single point lookup. And it gets better: as the table grows from 1M rows to 100M rows, the sequential scan gets 100x slower, while the B-tree lookup gets only one or two extra hops slower. Indexes do not just make queries fast -- they make queries stay fast as your data grows.
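The "stays fast as data grows" claim is easy to sanity-check with arithmetic. A seq scan touches every row; a B-tree lookup touches one page per tree level, and each level multiplies capacity by the page fanout. The fanout of 300 below is an assumed ballpark for an 8 KB page (the real number depends on key size):

```python
import math

FANOUT = 300  # assumed entries per B-tree page; varies with key size

def seq_scan_rows(n):
    # A sequential scan reads every row, no matter what.
    return n

def btree_levels(n):
    # A B-tree lookup reads one page per level: log base FANOUT of n.
    return math.ceil(math.log(n, FANOUT))

for n in (1_000_000, 100_000_000):
    print(f"{n:>11,} rows: seq scan reads {seq_scan_rows(n):,} rows, "
          f"B-tree walks {btree_levels(n)} levels")
```

Growing the table 100x makes the seq scan 100x more work but adds only one level to the tree -- which is the asymmetry the diagram above is drawing.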
Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Split comparison. LEFT panel labeled 'Sequential Scan' shows a long vertical stack of 1,000,000 row icons in dim white, with a magenta (#ff5c8a) arrow sweeping top-to-bottom and a clock reading '4200 ms'. RIGHT panel labeled 'Index Scan (B-tree)' shows a small cyan (#4fc3f7) tree with 4 levels, a glowing path from root to leaf, and a single arrow jumping to one highlighted row in a small table icon, with a clock reading '0.4 ms'. Bottom caption in white monospace: '10,000x faster -- because we stopped reading what we did not need.'"
How a Query Actually Uses an Index -- Step by Step
Let us walk through what happens inside PostgreSQL when you run a WHERE query against an indexed column. The mechanics are the same for almost every B-tree index.
+---------------------------------------------------------------+
| QUERY EXECUTION WITH AN INDEX |
+---------------------------------------------------------------+
| |
| 1. Parser -> turns SQL text into a parse tree |
| |
| 2. Planner -> looks at table stats and available indexes, |
| estimates cost of Seq Scan vs Index Scan, |
| picks the cheaper plan |
| |
| 3. Executor -> runs the chosen plan: |
| |
| a. Walk the B-tree from root to leaf |
| (typically 3-5 page reads, all cached in RAM) |
| |
| b. Read the matching leaf entries |
| (each entry = (indexed value, TID pointer)) |
| |
| c. For each TID, fetch the actual row from the heap |
| (this is the "heap fetch" -- one disk I/O per row) |
| |
| d. Apply any remaining filters and return rows |
| |
| 4. Result -> rows stream back to the client |
| |
+---------------------------------------------------------------+
The phrase "heap fetch" is worth pausing on. The index gets you to a TID quickly, but the row itself still lives in the heap (the unordered table file). For a single-row lookup, that is one extra read and basically free. For a range query that returns 100,000 rows, those are 100,000 random reads scattered across the disk -- and at that point the planner may decide that a sequential scan, which reads pages in order, is actually faster. This is exactly why the planner is allowed to ignore indexes: sometimes scanning is genuinely cheaper.
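You can see the planner's dilemma with a deliberately simplified cost model. The constants below are invented for illustration -- they are not PostgreSQL's actual seq_page_cost/random_page_cost machinery, and they ignore caching -- but they show why an index wins for a handful of rows and loses for a large fraction of the table:

```python
# Toy planner cost model -- constants are invented, NOT Postgres's real ones.
ROWS_PER_PAGE = 120        # assumed heap density
SEQ_PAGE_COST = 1.0        # cost of one sequential page read
RANDOM_PAGE_COST = 4.0     # cost of one random page read (heap fetch via TID)

def seq_scan_cost(total_rows):
    # Seq scan: read every heap page, in order.
    return (total_rows / ROWS_PER_PAGE) * SEQ_PAGE_COST

def index_scan_cost(matching_rows):
    # Index scan: ignore the cheap B-tree walk; charge one random
    # heap fetch per matching row.
    return matching_rows * RANDOM_PAGE_COST

total = 1_000_000
for selectivity in (0.0001, 0.001, 0.01, 0.1):
    matches = int(total * selectivity)
    winner = "index" if index_scan_cost(matches) < seq_scan_cost(total) else "seq scan"
    print(f"{selectivity:>7.2%} selectivity ({matches:>7,} rows): {winner} is cheaper")
```

In this toy model the crossover sits well below 1% selectivity; with realistic caching the real crossover is higher, but the shape of the trade is the same: past some fraction of the table, ordered sequential reads beat scattered random ones.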
Runnable Example 1 -- Building a Table and Watching the Plan Change
Let us prove all of this with real PostgreSQL output. We will create a million-row table, run a query without an index, add the index, and re-run the query.
-- Create a users table and load 1 million rows of fake data.
-- generate_series is a Postgres built-in that produces N rows for testing.
CREATE TABLE users (
id BIGSERIAL PRIMARY KEY,
email TEXT NOT NULL,
full_name TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
INSERT INTO users (email, full_name)
SELECT
'user' || g || '@example.com',
'User Number ' || g
FROM generate_series(1, 1000000) AS g;
-- Update planner statistics so EXPLAIN gives realistic numbers.
ANALYZE users;
Now run a query that filters by email, with no index on email yet. The PRIMARY KEY on id already created an index on id, but not on email.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, full_name
FROM users
WHERE email = 'user873214@example.com';
Output (typical, on a laptop):
QUERY PLAN
-----------------------------------------------------------------------------
Seq Scan on users (cost=0.00..23334.00 rows=1 width=24)
(actual time=98.412..421.073 rows=1 loops=1)
Filter: (email = 'user873214@example.com'::text)
Rows Removed by Filter: 999999
Buffers: shared hit=8334
Planning Time: 0.118 ms
Execution Time: 421.114 ms
Read that carefully. The plan is a Seq Scan, which means PostgreSQL read every page of the users table (shared hit=8334 pages) and threw away 999,999 rows just to find the one we asked for. Total wall-clock time: 421 ms for a single-row lookup.
Now add the index and re-run the exact same query.
-- Create a B-tree index on the email column.
-- B-tree is the default in PostgreSQL when you do not specify a type.
CREATE INDEX users_email_idx ON users (email);
-- Re-analyze so the planner knows about the new index.
ANALYZE users;
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, full_name
FROM users
WHERE email = 'user873214@example.com';
New output:
QUERY PLAN
---------------------------------------------------------------------------------------
Index Scan using users_email_idx on users
(cost=0.42..8.44 rows=1 width=24)
(actual time=0.039..0.041 rows=1 loops=1)
Index Cond: (email = 'user873214@example.com'::text)
Buffers: shared hit=4
Planning Time: 0.187 ms
Execution Time: 0.062 ms
The plan switched to Index Scan using users_email_idx. The buffer count dropped from 8334 to 4 -- the database now reads only four 8 KB pages: three to walk the B-tree from root to leaf, and one to fetch the actual row from the heap. Wall-clock time: 0.062 ms, a roughly 6,800x speedup for a single line of SQL: CREATE INDEX users_email_idx ON users (email).
MySQL note. The same statements work in MySQL/InnoDB: CREATE INDEX users_email_idx ON users (email); and EXPLAIN SELECT .... The output format differs (type: ref instead of Index Scan, rows: 1 instead of buffer hits), but the underlying B-tree mechanics are the same. The biggest difference is that InnoDB tables are clustered by primary key -- the table itself is stored as a B-tree on the PK, so secondary indexes store the PK value (not a TID) and require a second B-tree walk to fetch the row. PostgreSQL uses heap tables with TID pointers, which is simpler but means every secondary index pays the heap-fetch cost.
The Trade-Offs -- Why We Do Not Index Every Column
If indexes make reads thousands of times faster, why not index every column on every table? Because indexes are not free. Every index you add imposes three real costs.
Cost 1. Writes get slower. Every INSERT, UPDATE, and DELETE on the table must also update every index that references the changed columns. A table with 6 indexes pays 7 write costs (one for the heap, six for the indexes) on every insert. On a write-heavy table -- say, an event log ingesting 10,000 rows per second -- adding an unnecessary index can drop your throughput by 30% or more.
Cost 2. Storage grows. A B-tree index on a BIGINT column adds roughly 30-40 bytes per row, including overhead. Index a TEXT column with average length 50 characters and you are looking at 80-100 bytes per row. On a 100M-row table, that is 8-10 GB per index. Your users table that was 50 GB on disk is now 80 GB after you added three indexes "just in case."
Cost 3. The planner has more work. Every additional index is another plan the planner has to consider when optimizing a query. The cost is small per query, but on tables with 15-20 indexes the planner overhead is measurable, and -- worse -- the planner sometimes picks the wrong index, especially when statistics are stale. Fewer, well-chosen indexes give the planner an easier job and produce more stable performance.
+---------------------------------------------------------------+
| THE INDEX TRADE-OFF, AT A GLANCE |
+---------------------------------------------------------------+
| |
| GOOD: indexes speed up |
| - WHERE clauses on the indexed column(s) |
| - JOINs on the indexed column(s) |
| - ORDER BY on the indexed column(s) (sorted scan) |
| - MIN/MAX queries (one B-tree hop to the leaf edge) |
| - Uniqueness enforcement (UNIQUE indexes) |
| |
| BAD: indexes slow down |
| - INSERT, UPDATE, DELETE (every index must be maintained) |
| - VACUUM and bulk loads |
| - Disk usage and backup size |
| - Planner time on tables with many indexes |
| |
| RULE OF THUMB: |
| Index columns you READ by, FILTER by, JOIN on, and SORT by |
| Do NOT index every column "just in case" |
| |
+---------------------------------------------------------------+
Runnable Example 2 -- Measuring the Write Cost of an Index
Let us prove that writes really do get slower with each added index. We will time a bulk insert into a table with no indexes, then add three indexes and re-time the same insert.
-- Fresh table, no indexes besides the implicit primary key.
CREATE TABLE events (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
payload TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Time a 500k-row insert with only the PK index present.
\timing on
INSERT INTO events (user_id, event_type, payload)
SELECT
(random() * 100000)::BIGINT,
(ARRAY['click','view','purchase','signup'])[1 + (random()*3)::INT],
repeat('x', 80)
FROM generate_series(1, 500000);
-- Time: 1842.314 ms
Now add three more indexes -- the kind a junior engineer might add "to be safe" -- and run the same insert again.
CREATE INDEX events_user_id_idx ON events (user_id);
CREATE INDEX events_event_type_idx ON events (event_type);
CREATE INDEX events_created_at_idx ON events (created_at);
-- Same insert, same row count, same data shape.
INSERT INTO events (user_id, event_type, payload)
SELECT
(random() * 100000)::BIGINT,
(ARRAY['click','view','purchase','signup'])[1 + (random()*3)::INT],
repeat('x', 80)
FROM generate_series(1, 500000);
-- Time: 4731.802 ms
The same insert went from 1.84 seconds to 4.73 seconds -- a 2.6x slowdown -- purely from maintaining three extra B-trees. On a high-throughput ingestion pipeline, that is the difference between keeping up with traffic and falling behind during a peak. This is why adding an index is an engineering decision, not a free upgrade.
When NOT to Index -- The Cases Beginners Get Wrong
The biggest indexing mistakes happen not from forgetting to add an index, but from adding indexes that make things worse. Five situations where you should think twice.
1. The table is tiny. If your countries table has 250 rows, an index is pointless. PostgreSQL will scan the whole table in under a millisecond regardless, and the planner will (correctly) ignore any index you create. Anything below ~1000 rows is generally not worth indexing.
2. The column has very low selectivity. Selectivity is the fraction of rows that any given value matches. An index on a boolean is_active column where 95% of users are active is worthless -- the planner will skip the index and seq-scan, because reading 950,000 rows via random heap fetches is slower than reading them in order. Rule of thumb: if a value matches more than ~5-10% of the table, an index will not help and may hurt.
3. The table is write-heavy and the column is almost never queried. If you ingest 10k events/sec into an events table and only query the column once a day in a batch job, the daily query can afford to seq-scan. Adding the index would tax every insert all day to save a few seconds in one batch. Move the analytical query to a read replica or an OLAP store instead.
4. The query never uses the column you indexed. This sounds obvious, but it happens constantly: someone indexes created_at and then writes WHERE EXTRACT(YEAR FROM created_at) = 2024. The function call wraps the column, so the index cannot be used. The fix is either to write WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01', or to create a functional index on EXTRACT(YEAR FROM created_at).
5. You are about to bulk-load the table. Loading a billion rows into an indexed table is dramatically slower than loading into an unindexed table and creating the indexes afterward. Drop indexes before a big load, recreate them after. PostgreSQL even has CREATE INDEX CONCURRENTLY so you can build the index without blocking writes once the load is done.
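Point 4 is easy to reproduce without a Postgres instance: SQLite's planner has the exact same limitation, and Python's built-in sqlite3 module lets you watch it happen. SQLite's plan text differs from Postgres's EXPLAIN, but its SEARCH-vs-SCAN distinction maps directly onto index scan vs sequential scan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
con.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [(f"user{n}@example.com",) for n in range(1000)],
)
con.execute("CREATE INDEX users_email_idx ON users (email)")

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = con.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(r[-1] for r in rows)

# Bare column: the index matches -> SEARCH ... USING INDEX users_email_idx
print(plan("SELECT id FROM users WHERE email = 'user42@example.com'"))

# Function-wrapped column: the index no longer matches -> SCAN users
print(plan("SELECT id FROM users WHERE lower(email) = 'user42@example.com'"))
```

Same table, same index, same logical question -- but wrapping the column in lower() forces a full scan, in SQLite just as in PostgreSQL.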
Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). A central decision tree titled 'Should I add an index?' in white monospace. Root question: 'Does a query filter, join, or sort on this column?'. Branch NO -> red 'Do not index'. Branch YES -> next question: 'Is the table > 1000 rows AND selectivity < 10%?'. NO -> magenta (#ff5c8a) 'Probably not worth it'. YES -> next: 'Are writes more critical than this read?'. YES -> magenta 'Skip or move to replica'. NO -> cyan (#4fc3f7) glowing box: 'Add the index. CREATE INDEX ... ON table (column)'. Subtle dot grid background."
Index Scan vs Index-Only Scan -- A Free Speedup
There is one more variant worth knowing. If your query selects only columns that are already in the index, PostgreSQL can answer it without ever touching the heap. This is called an index-only scan, and it is often 2-5x faster than a regular index scan because it skips the heap-fetch step entirely.
-- Suppose we frequently check whether an email exists -- the query needs
-- only the email column, and the index on (email) already contains it.
EXPLAIN (ANALYZE, BUFFERS)
SELECT email FROM users WHERE email = 'user873214@example.com';
QUERY PLAN
---------------------------------------------------------------------------------------
Index Only Scan using users_email_idx on users
(cost=0.42..4.44 rows=1 width=22)
(actual time=0.025..0.027 rows=1 loops=1)
Index Cond: (email = 'user873214@example.com'::text)
Heap Fetches: 0
Buffers: shared hit=4
Planning Time: 0.094 ms
Execution Time: 0.045 ms
The line Heap Fetches: 0 is the magic -- the database answered the query entirely from the index, never touching the underlying table. This is the foundation of covering indexes, which we will explore in Lesson 9.4.
Common Mistakes
1. Indexing every column "to be safe."
Adding indexes by reflex feels productive but actively harms write performance, bloats storage, and clutters the planner's options. Every index needs to justify itself with a real query that uses it. If you cannot point to a WHERE, JOIN, or ORDER BY that will hit the index, do not create it. A good heuristic: most well-tuned OLTP tables have between 2 and 6 indexes total, including the primary key.
2. Forgetting that the primary key is already an index.
PRIMARY KEY in PostgreSQL automatically creates a unique B-tree index. Beginners sometimes write CREATE INDEX users_id_idx ON users (id) after declaring id as the primary key, ending up with two identical indexes that both must be maintained on every write. Check \d tablename in psql to see what indexes already exist before adding new ones.
3. Wrapping the indexed column in a function.
Writing WHERE LOWER(email) = 'ada@x.com' against an index on email will not use the index -- the function call breaks the match. Either store the column in normalized form (always lowercase), use a functional index like CREATE INDEX ON users (LOWER(email)), or use a case-insensitive collation. The same trap applies to WHERE created_at::date = '2024-01-01', WHERE id + 0 = 5, and any other expression that wraps the column.
4. Indexing low-selectivity boolean or enum columns.
An index on is_deleted BOOLEAN is almost always useless on its own, because the value distribution is too uneven. The planner will seq-scan anyway. If you really need to query "all not-deleted users fast," consider a partial index like CREATE INDEX ON users (id) WHERE is_deleted = false, which only indexes the rows you actually care about and is dramatically smaller.
5. Not running ANALYZE after creating an index or bulk-loading data.
PostgreSQL's planner makes decisions based on table statistics. Stale statistics can cause it to ignore a brand-new index, or to use the wrong one. After any large data change or new index, run ANALYZE tablename; (or rely on autovacuum, which does it eventually). When debugging "why is my index not used?", ANALYZE is the first thing to try.
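The partial-index fix from mistake 4 also exists in SQLite, so it can be sketched with the stdlib sqlite3 module. The key rule (true in PostgreSQL as well) is that a partial index is usable only when the query's WHERE clause implies the index's WHERE clause:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# id is a plain column here (not the primary key), so lookups on it
# have no built-in index to fall back on.
con.execute("CREATE TABLE users (id INTEGER, is_deleted INTEGER NOT NULL)")
con.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(n, 1 if n % 20 == 0 else 0) for n in range(10_000)],  # ~5% deleted
)
# Partial index: only the live rows are indexed, so it stays small.
con.execute("CREATE INDEX users_live_idx ON users (id) WHERE is_deleted = 0")

def plan(sql):
    return " | ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# The query's WHERE implies the index predicate -> partial index is usable.
print(plan("SELECT id FROM users WHERE id = 42 AND is_deleted = 0"))

# Without is_deleted = 0 the predicate is not implied -> full table scan.
print(plan("SELECT id FROM users WHERE id = 42"))
```

Drop the is_deleted = 0 condition and the planner cannot prove the index covers the rows you want, so the same lookup degrades to a scan -- a useful reminder that partial indexes trade generality for size.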
Interview Questions
1. "Explain in plain English what a database index is and why it makes queries faster."
A database index is a separate, sorted data structure -- almost always a B-tree -- that stores the values of one or more columns plus a pointer to the row's location on disk. It is exactly analogous to the index at the back of a book: instead of reading every page to find a topic, you look up the topic in the alphabetical index and jump straight to the relevant pages. Without an index, the database performs a sequential scan, reading every row to check if it matches the query, which is O(n) in the table size. With a B-tree index, the database walks the tree from root to leaf in O(log n) hops, finds the matching pointer, and fetches just the relevant rows. On a million-row table, that is the difference between hundreds of milliseconds and a fraction of a millisecond. The speedup grows as the table grows, which is why indexing is what keeps databases fast at scale.
2. "What are the trade-offs of adding an index? Why not index every column?"
Indexes speed up reads but impose three real costs. First, writes get slower: every INSERT, UPDATE, or DELETE must also update every index that references the changed columns, so a table with six indexes pays seven write costs per row. Second, storage grows: each index consumes 30-100 bytes per row, which on a 100M-row table is several gigabytes per index. Third, the query planner has more options to consider, and on tables with too many indexes the planner can spend measurable time choosing -- and sometimes pick the wrong one. The right number of indexes is usually small: pick the columns you genuinely query by, join on, and sort by, and skip the rest. Indexing every column is the database equivalent of caching every value in your application -- it sounds defensive but often makes things slower overall.
3. "When would you choose NOT to add an index, even if a query filters on a column?"
Several situations argue against indexing. If the table is small (under a thousand rows or so), the planner will seq-scan regardless and the index is wasted work. If the column has low selectivity -- meaning a typical value matches more than 5-10% of rows -- the index will not be used because random heap fetches are slower than a sequential scan. If the table is heavily write-loaded and the column is queried only rarely, the per-write maintenance cost outweighs the occasional read benefit, and you are better off running the rare query against a read replica or OLAP store. If the query wraps the column in a function, like WHERE LOWER(email) = ..., a plain index will not match anyway. Finally, before a large bulk load, you typically drop indexes and recreate them after the load completes, because building an index on prebuilt data is much faster than maintaining it incrementally during the insert.
4. "What is the difference between an Index Scan and an Index-Only Scan in PostgreSQL?"
An Index Scan walks the B-tree to find matching entries, then performs a heap fetch -- a separate disk read into the actual table -- for each matching row to retrieve any columns the query needs that are not stored in the index. An Index-Only Scan happens when every column the query selects is already present in the index, so PostgreSQL can answer the query without touching the heap at all. The Heap Fetches: 0 line in EXPLAIN ANALYZE confirms it. Index-only scans are typically 2-5x faster than regular index scans because they avoid the random I/O of heap lookups, and they are the foundation of covering indexes, where you deliberately include extra columns in an index (via INCLUDE in PostgreSQL) so that common queries can be answered entirely from the index. Index-only scans depend on the visibility map being up to date, which is maintained by VACUUM.
5. "A junior engineer adds CREATE INDEX ON users (LOWER(email)) and another index CREATE INDEX ON users (email) and asks why the first one is needed. What do you tell them?"
The two indexes serve different queries because PostgreSQL matches indexes by the exact expression in the WHERE clause. CREATE INDEX ON users (email) only helps queries like WHERE email = 'Ada@X.com', where the comparison is on the raw column. As soon as you write WHERE LOWER(email) = 'ada@x.com', the planner sees a function call wrapping the column and refuses to use the plain email index, falling back to a sequential scan. The functional index CREATE INDEX ON users (LOWER(email)) solves this by storing the lowercased values in the B-tree directly, so the function-wrapped query can be satisfied by an index lookup. The deeper lesson is that indexes match the shape of the expression, not just the column. If you frequently query in case-insensitive form, you either need a functional index, a generated column, or a citext column type. This is one of the most common reasons indexes "mysteriously" do not get used.
Quick Reference -- Cheat Sheet
+---------------------------------------------------------------+
| INDEXING -- LESSON 9.1 CHEAT SHEET |
+---------------------------------------------------------------+
| |
| WHAT AN INDEX IS: |
| A separate, sorted data structure (B-tree by default) |
| that maps column value -> row pointer (TID). |
| Like the index at the back of a book. |
| |
| WHEN IT HELPS: |
| - WHERE col = value (point lookups) |
| - WHERE col BETWEEN ... (range scans) |
| - JOIN ON a.x = b.y (both sides) |
| - ORDER BY col (sorted output for free) |
| - MIN(col), MAX(col) (one B-tree hop to the edge) |
| |
| WHEN IT DOES NOT HELP: |
| - Tiny tables (< 1000 rows) |
| - Low-selectivity columns (booleans, status enums) |
| - Function-wrapped predicates (LOWER(col), col + 0) |
| - Write-heavy, rarely-read columns |
| |
| COSTS: |
| - Slower writes (every index updated per INSERT/UPDATE) |
| - More disk + RAM (30-100 bytes per row per index) |
| - Planner overhead and risk of wrong-index choice |
| |
| SYNTAX (PostgreSQL): |
| CREATE INDEX users_email_idx ON users (email); |
| CREATE UNIQUE INDEX ... ON users (email); |
| CREATE INDEX CONCURRENTLY ... ON users (email); |
| DROP INDEX users_email_idx; |
| \d users -- show all indexes on a table |
| |
| DEBUG: |
| EXPLAIN (ANALYZE, BUFFERS) SELECT ...; |
| Look for: Seq Scan vs Index Scan vs Index Only Scan |
| After changes: ANALYZE tablename; |
| |
+---------------------------------------------------------------+
| Concept | Without Index | With Index |
|---|---|---|
| Lookup complexity | O(n) -- scan every row | O(log n) -- walk a B-tree |
| 1M-row point lookup | ~400 ms | ~0.05 ms |
| Insert cost | 1 heap write | 1 heap write + 1 write per index |
| Disk usage | Just the table | Table + ~30-100 bytes/row per index |
| Planner choice | Seq Scan | Index Scan, Index Only Scan, or Bitmap Scan |
| Best for | Small tables, low-selectivity columns | Selective filters, joins, sorts |
Prev: Lesson 8.5 -- Cardinality and Ordinality Next: Lesson 9.2 -- Primary and Secondary Indexes
This is Lesson 9.1 of the Database Interview Prep Course -- 12 chapters, 58 lessons.