Performance Optimization
The Capstone Checklist for Fast PostgreSQL
LinkedIn Hook
"Your dashboard takes 18 seconds to load. The CTO wants it under 1. Where do you even start?"
Most engineers reach for the same instinct: "add an index." Sometimes that works. Most of the time it papers over a deeper issue — a query plan choosing a sequential scan because statistics are stale, an ORM firing 600 N+1 queries behind the scenes, a connection pool starved by long-running transactions, a table that has not been vacuumed since last quarter, or a read replica that nobody is actually reading from.
Database performance is not one problem. It is a stack of problems, each at a different layer, each with its own diagnostic tool. Query tuning lives at the SQL layer with EXPLAIN ANALYZE. Index strategy lives at the schema layer. Connection pooling lives at the application boundary. VACUUM and ANALYZE live in the storage engine. Partitioning lives at the table layout. Caching lives in front of the database entirely. Read replicas live at the topology level. And N+1 lives in the code that calls the database.
The engineers who consistently ship fast systems do not memorize tricks. They follow a checklist, top to bottom, and they measure before and after every change. They know that the answer to "why is this slow?" is almost never the first thing they suspect.
Lesson 12.3 is the capstone of the Database Interview Prep Course. We tie together everything from indexes (Chapter 6) to transactions (Chapter 8) to replication (Chapter 11), and turn it into one performance optimization checklist you can apply to any slow Postgres system.
Read the full lesson -> [link]
#PostgreSQL #DatabasePerformance #SQLOptimization #BackendEngineering #InterviewPrep
What You'll Learn
- How to read EXPLAIN ANALYZE output and find the line that is actually expensive
- A repeatable index strategy: which columns to index, which order, and when to stop
- Why connection pooling is the difference between 200 and 20,000 requests per second
- What VACUUM and ANALYZE actually do and why ignoring them silently kills performance
- When partitioning helps and when it just moves the problem around
- The three layers of caching (query, application, edge) and where each one belongs
- How read replicas eliminate read pressure without sacrificing write consistency
- How to spot and eliminate N+1 query patterns from ORM code
- A complete top-to-bottom checklist you can apply to any slow PostgreSQL system
The Hospital Triage Analogy — Why You Need a Checklist
Imagine an emergency room where every patient who walks in gets the same treatment: a chest X-ray. Some patients have broken ankles. Some have headaches. Some are having strokes. The chest X-ray catches a few real problems, misses most, and wastes time on everyone. The hospital is busy but the outcomes are terrible.
A real ER does not work that way. It runs triage. A nurse measures vital signs in a fixed order — airway, breathing, circulation, then disability, then exposure. Each step has its own tool. Each step rules out a category of problems before moving to the next. The patient with chest pain gets an EKG, not a leg X-ray. The patient with a head wound gets a CT, not a blood pressure cuff. Triage is a checklist, and the checklist is what makes the ER fast and accurate.
Database performance optimization is triage. A slow query has dozens of possible causes — a missing index, stale statistics, a lock conflict, a bloated table, a starved connection pool, an N+1 in the application, a network round trip, a cold cache. If you guess at the cause and "add an index" to every problem, you are giving every patient a chest X-ray. You will fix some queries by accident and miss the rest. The fix is to follow a fixed sequence of diagnostics, top to bottom, measuring at each step, and only acting on what the data shows.
+---------------------------------------------------------------+
| THE GUESS-AND-INDEX APPROACH (broken) |
+---------------------------------------------------------------+
| |
| slow query -> "add an index" -> still slow -> "add another" |
| -> "maybe a JOIN hint" -> "rewrite the ORM" |
| -> "blame the database" -> 3 days lost |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| THE TRIAGE CHECKLIST (works) |
+---------------------------------------------------------------+
| |
| slow query |
| | |
| v |
| [1] EXPLAIN ANALYZE -> find the expensive node |
| | |
| v |
| [2] Index strategy -> can the planner use an index? |
| | |
| v |
| [3] Statistics -> is ANALYZE up to date? |
| | |
| v |
| [4] N+1 elimination -> how many queries per request? |
| | |
| v |
| [5] Connection pool -> are we waiting on a connection? |
| | |
| v |
| [6] Caching -> can we skip the DB entirely? |
| | |
| v |
| [7] Read replicas -> can we offload reads? |
| | |
| v |
| [8] Partitioning -> is the table too big to scan? |
| |
+---------------------------------------------------------------+
The order matters. Steps 1 through 4 are free — they cost developer time but no infrastructure. Steps 5 through 8 cost real money, real ops complexity, and real risk. Always exhaust the cheap steps before reaching for the expensive ones. A team that adds a Redis cache before fixing an N+1 is paying for a 50ms cache to hide a problem they could have eliminated entirely with a single JOIN.
Step 1 — EXPLAIN ANALYZE: Find the Expensive Line
EXPLAIN ANALYZE runs the query and returns the actual execution plan with real timings, real row counts, and real I/O. It is the single most important tool in the database performance toolbox. Without it you are guessing. With it, the slow line is usually obvious within thirty seconds.
-- Setup: a sample orders table for the rest of the lesson
CREATE TABLE orders (
id bigserial PRIMARY KEY,
customer_id bigint NOT NULL,
status text NOT NULL,
total_cents bigint NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
-- Insert 2 million rows so the planner has something to think about
INSERT INTO orders (customer_id, status, total_cents, created_at)
SELECT
(random() * 100000)::bigint,
(ARRAY['pending','paid','shipped','refunded'])[1 + (random() * 3)::int],
(random() * 50000)::bigint,
now() - (random() * interval '365 days')
FROM generate_series(1, 2000000);
-- Update planner statistics so EXPLAIN ANALYZE is honest
ANALYZE orders;
-- The slow query: recent paid orders for one customer
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total_cents, created_at
FROM orders
WHERE customer_id = 42
AND status = 'paid'
AND created_at >= now() - interval '30 days'
ORDER BY created_at DESC
LIMIT 20;
Sample output (before any index):
Limit (cost=53412.10..53414.43 rows=20 width=24) (actual time=412.831..412.847 rows=20 loops=1)
Buffers: shared hit=18234 read=4116
-> Sort (cost=53412.10..53413.62 rows=608 width=24) (actual time=412.829..412.838 rows=20 loops=1)
Sort Key: created_at DESC
Sort Method: top-N heapsort Memory: 27kB
-> Seq Scan on orders (cost=0.00..53395.00 rows=608 width=24) (actual time=0.142..412.490 rows=587 loops=1)
Filter: ((customer_id = 42) AND (status = 'paid') AND (created_at >= (now() - '30 days'::interval)))
Rows Removed by Filter: 1999413
Planning Time: 0.184 ms
Execution Time: 412.901 ms
How to read this output:
- The bottom node is a Seq Scan. Postgres scanned all 2 million rows and threw away 1,999,413 of them. That is the smoking gun.
- actual time=...412.490 on that node means the scan itself took 412 ms. Everything above it added rounding-error overhead. The scan IS the problem.
- Buffers: shared hit=18234 read=4116 shows the I/O cost: 4116 pages had to come from disk, the rest were already in the page cache. After a few runs they will all be hits, but the scan still touches them all.
- cost= is the planner's estimate. actual time= is reality. When they disagree wildly, statistics are stale (Step 3).
The fix in this case is an index, but only because EXPLAIN ANALYZE told us a sequential scan was the bottleneck. If the same query had shown an index scan with a slow Sort node on top, the answer would have been a different index ordering. If it had shown an index scan with a slow Nested Loop, the answer would have been a JOIN rewrite. Always read the plan before acting.
MySQL note: MySQL uses EXPLAIN ANALYZE (8.0.18+) with similar semantics, or EXPLAIN FORMAT=JSON for more detail. The vocabulary differs slightly: MySQL says "type: ALL" for what Postgres calls a Seq Scan.
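Reading the plan can be semi-automated. Below is a hypothetical helper, not part of any Postgres tooling: plain Python over the text output shown above, approximating each node's self time by subtracting the totals of its direct children (only exact when loops=1, which is enough to point you at the right node).

```python
import re

def self_times(plan_text):
    """Rank plan nodes by approximate self time: a node's actual end
    time minus the end times of its direct children."""
    pat = re.compile(
        r"( *)(?:->\s*)?([A-Za-z][A-Za-z ]+?)\s*\(cost=.*actual time=[\d.]+\.\.([\d.]+)"
    )
    nodes = []
    for line in plan_text.splitlines():
        m = pat.match(line)
        if m:  # (indent depth, node name, inclusive end time in ms)
            nodes.append((len(m.group(1)), m.group(2), float(m.group(3))))
    out = []
    for i, (indent, name, total) in enumerate(nodes):
        child = 0.0
        for j in range(i + 1, len(nodes)):
            if nodes[j][0] <= indent:
                break  # left node i's subtree
            # direct child: no node between i and j sits at a shallower depth
            if all(nodes[k][0] >= nodes[j][0] for k in range(i + 1, j)):
                child += nodes[j][2]
        out.append((name, round(total - child, 3)))
    return sorted(out, key=lambda n: -n[1])

plan = """\
Limit  (cost=53412.10..53414.43 rows=20 width=24) (actual time=412.831..412.847 rows=20 loops=1)
  ->  Sort  (cost=53412.10..53413.62 rows=608 width=24) (actual time=412.829..412.838 rows=20 loops=1)
        ->  Seq Scan on orders  (cost=0.00..53395.00 rows=608 width=24) (actual time=0.142..412.490 rows=587 loops=1)
"""
print(self_times(plan)[0])  # prints: ('Seq Scan on orders', 412.49)
```

Run against the sample plan, the Seq Scan surfaces immediately; the Sort and Limit together account for well under a millisecond.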
Step 2 — Index Strategy: Composite, Order, and Covering
Once EXPLAIN ANALYZE has identified a sequential scan you cannot afford, the next question is which index will the planner actually use. Three rules cover 90% of cases.
Rule 1 — Composite index column order matches WHERE clause. A B-tree index on (a, b, c) can be used for WHERE a = ?, WHERE a = ? AND b = ?, and WHERE a = ? AND b = ? AND c = ?, but not for WHERE b = ? alone. Put the most selective equality column first.
Rule 2 — Sort columns at the end. If your query orders by created_at DESC, putting created_at at the end of the composite lets the planner walk the index in reverse and skip the sort entirely.
Rule 3 — Covering indexes (INCLUDE) avoid heap fetches. If your SELECT only needs columns that are in the index, Postgres can answer the whole query from the index pages without ever touching the table. This is called an index-only scan and it is dramatically faster.
-- The right index for the query above:
-- equality columns first, sort column last, selected columns INCLUDEd
CREATE INDEX idx_orders_customer_status_created
ON orders (customer_id, status, created_at DESC)
INCLUDE (total_cents);
-- Re-run the same EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, total_cents, created_at
FROM orders
WHERE customer_id = 42
AND status = 'paid'
AND created_at >= now() - interval '30 days'
ORDER BY created_at DESC
LIMIT 20;
Sample output (with index):
Limit (cost=0.43..8.62 rows=20 width=24) (actual time=0.041..0.118 rows=20 loops=1)
Buffers: shared hit=8
-> Index Only Scan using idx_orders_customer_status_created on orders
(cost=0.43..249.12 rows=608 width=24) (actual time=0.040..0.114 rows=20 loops=1)
Index Cond: ((customer_id = 42) AND (status = 'paid') AND (created_at >= (now() - '30 days'::interval)))
Heap Fetches: 0
Planning Time: 0.221 ms
Execution Time: 0.142 ms
412 ms -> 0.14 ms. A 2,900x speedup, from one index. Note Heap Fetches: 0 — the index alone answered the query because total_cents was in the INCLUDE clause and id is the primary key (Postgres includes the heap pointer automatically). Buffers: shared hit=8 versus the previous 22,350.
When NOT to add an index:
- Tables under ~10,000 rows. A sequential scan of a small table is faster than the index lookup.
- Columns with very low cardinality (e.g. a boolean with a 50/50 split). The planner will reject the index anyway.
- Write-heavy tables where the index maintenance cost outweighs the read speedup. Every INSERT, UPDATE, and DELETE has to update every index on the table.
- "Just in case" indexes. Every unused index costs disk, memory, and write throughput. Run pg_stat_user_indexes periodically and drop indexes with idx_scan = 0.
MySQL note: InnoDB tables are clustered on the primary key, so secondary indexes always include the PK columns implicitly. MySQL 8 added true descending indexes; earlier versions could only scan an ascending B-tree in reverse, which does not help mixed ASC/DESC orderings.
Step 3 — VACUUM and ANALYZE: Keep Statistics Honest
PostgreSQL uses MVCC (multi-version concurrency control). Every UPDATE writes a new row version and marks the old one as dead. Every DELETE marks the row as dead but does not remove it. VACUUM is the background process that reclaims space from dead rows. ANALYZE updates the planner's statistics about column distributions. Both run automatically via autovacuum, but the defaults are conservative and on busy tables they fall behind.
When statistics are stale, the planner makes bad decisions. It might think a column has 100 distinct values when it actually has 100,000, choose a nested loop where a hash join would be 50x faster, or pick a sequential scan because it underestimates how selective a WHERE clause is. The query "suddenly got slow last Tuesday" is often a stale-statistics story.
-- Check when each table was last vacuumed and analyzed
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
round(100.0 * n_dead_tup / NULLIF(n_live_tup + n_dead_tup, 0), 1) AS dead_pct,
last_autovacuum,
last_autoanalyze
FROM pg_stat_user_tables
WHERE n_live_tup > 1000
ORDER BY n_dead_tup DESC
LIMIT 10;
Sample output:
schemaname | relname | n_live_tup | n_dead_tup | dead_pct | last_autovacuum | last_autoanalyze
------------+---------+------------+------------+----------+------------------------------+------------------------------
public | orders | 2000000 | 487612 | 19.6 | 2026-04-11 03:14:22+00 | 2026-04-11 03:14:25+00
public | events | 15400000 | 3120000 | 16.8 | 2026-04-09 22:01:11+00 | 2026-04-09 22:01:48+00
public | sessions | 920000 | 410000 | 30.8 | 2026-04-12 11:03:00+00 | 2026-04-08 04:12:00+00
Anything with dead_pct > 20% is hurting query performance. The sessions table is also showing a stale last_autoanalyze — four days behind on a high-churn table is a planner-quality problem.
-- Manual VACUUM ANALYZE for an urgent fix.
-- VACUUM does not block reads or writes; ANALYZE is fast.
VACUUM (ANALYZE, VERBOSE) sessions;
-- Tune autovacuum to be more aggressive on this specific table
ALTER TABLE sessions SET (
autovacuum_vacuum_scale_factor = 0.05, -- vacuum at 5% dead, default 20%
autovacuum_analyze_scale_factor = 0.02 -- analyze at 2% changed, default 10%
);
The rule: never run VACUUM FULL on a production table during business hours. It rewrites the entire table and takes an ACCESS EXCLUSIVE lock — your application is offline until it finishes. Use pg_repack or pg_squeeze extensions for online table rewrites instead.
Step 4 — Eliminate N+1: One Query, Not One Hundred
N+1 is the most common application-side performance bug, and EXPLAIN ANALYZE will never find it because each individual query is fast. The pattern: you fetch a list of N parents, then loop over them and fetch each parent's children one at a time. One query becomes N+1 queries. Each is 1 ms in isolation, but 100 of them is 100 ms — and the load on the database is 100x what it should be.
-- The N+1 anti-pattern (executed by an ORM in a loop):
-- Query 1: get the customers
SELECT id, name FROM customers WHERE region = 'EU' LIMIT 100;
-- Then in application code, for each of the 100 customers:
SELECT id, total_cents FROM orders WHERE customer_id = $1 ORDER BY created_at DESC LIMIT 5;
SELECT id, total_cents FROM orders WHERE customer_id = $2 ORDER BY created_at DESC LIMIT 5;
-- ...100 more round trips...
The fix is to express the entire intent as one query and let the database do the join.
-- The single-query rewrite using LATERAL to get top-N per group
SELECT
c.id AS customer_id,
c.name,
o.id AS order_id,
o.total_cents,
o.created_at
FROM customers c
LEFT JOIN LATERAL (
-- For each customer, pull the 5 most recent orders.
-- LATERAL lets the inner query reference c.id from the outer query.
SELECT id, total_cents, created_at
FROM orders
WHERE customer_id = c.id
ORDER BY created_at DESC
LIMIT 5
) o ON true
WHERE c.region = 'EU'
ORDER BY c.id, o.created_at DESC;
Sample output (fragment):
customer_id | name | order_id | total_cents | created_at
-------------+------------+----------+-------------+------------------------
12 | Acme GmbH | 918273 | 42100 | 2026-04-13 09:11:02+00
12 | Acme GmbH | 917001 | 18900 | 2026-04-12 14:22:31+00
12 | Acme GmbH | 911842 | 96400 | 2026-04-11 08:05:14+00
17 | Bjorn AB | 918100 | 7200 | 2026-04-13 12:04:00+00
...
(500 rows)
One round trip. One query plan. Total time about 4 ms instead of 100 ms, and the database does ~1% the work. Most ORMs support this via "eager loading" or "includes" — Sequelize include, Prisma include, ActiveRecord includes, SQLAlchemy selectinload. Learn the syntax for your ORM and use it religiously.
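The before/after is easy to feel in miniature. A small sketch using Python's stdlib sqlite3 as a stand-in database (SQLite has no LATERAL, so the rewrite is a plain LEFT JOIN; the tables and data are invented): the trace callback counts every statement the "application" fires.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER);
    INSERT INTO customers VALUES (1,'Acme','EU'),(2,'Bjorn','EU'),(3,'Cora','US');
    INSERT INTO orders VALUES (10,1,500),(11,1,700),(12,2,900);
""")

queries = []
conn.set_trace_callback(queries.append)   # record every executed statement

# The N+1 pattern: one query for the parents, one per parent for children.
customers = conn.execute(
    "SELECT id, name FROM customers WHERE region = 'EU'").fetchall()
for cid, _name in customers:
    conn.execute(
        "SELECT id, total_cents FROM orders WHERE customer_id = ?",
        (cid,)).fetchall()
n_plus_1 = len(queries)                   # 1 + N round trips

# The rewrite: one JOIN replaces the whole loop.
queries.clear()
rows = conn.execute("""
    SELECT c.id, c.name, o.id, o.total_cents
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    WHERE c.region = 'EU'
    ORDER BY c.id
""").fetchall()
print(n_plus_1, len(queries))             # prints: 3 1
```

With 2 EU customers the loop costs 3 queries; with 100 it costs 101. The JOIN stays at 1 regardless of N.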
To detect N+1 in production, log queries with their request ID and alert when a single request fires more than a threshold (say, 20). Tools like pg_stat_statements show per-query call counts and average duration:
-- Find the queries called most often (N+1 candidates always rank high)
SELECT
calls,
round(mean_exec_time::numeric, 2) AS avg_ms,
round(total_exec_time::numeric, 0) AS total_ms,
query
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;
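One way to wire the per-request threshold into application code is a query budget. A hypothetical sketch (the query_budget helper is mine, and sqlite3's trace callback stands in for whatever query hook your driver or ORM exposes):

```python
import contextlib
import sqlite3

@contextlib.contextmanager
def query_budget(conn, limit=20):
    """Raise when one 'request' fires more queries than `limit`."""
    count = [0]
    def _bump(_sql):
        count[0] += 1
    conn.set_trace_callback(_bump)        # count every statement in scope
    try:
        yield count
    finally:
        conn.set_trace_callback(None)
        if count[0] > limit:
            raise RuntimeError(f"N+1 suspect: {count[0]} queries in one request")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER)")
try:
    with query_budget(conn, limit=5):
        for i in range(10):               # a simulated N+1 loop
            conn.execute("SELECT * FROM t WHERE id = ?", (i,))
    msg = "ok"
except RuntimeError as err:
    msg = str(err)
print(msg)                                # prints: N+1 suspect: 10 queries in one request
```

In production you would log or emit a metric instead of raising, but the shape is the same: count per request, alert over a threshold.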
Step 5 — Connection Pooling: Don't Starve the Database
Every PostgreSQL connection is a real OS process consuming roughly 5-10 MB of RAM and a slot in the shared resource limits. The default max_connections is 100. If your application opens a fresh connection per request, you will hit that limit at modest load and every subsequent request will block waiting for a free slot. Worse, even at 100 active backends Postgres spends a meaningful fraction of CPU just context-switching between them.
The fix is a connection pool: a long-lived set of N connections that the application borrows from and returns to. The pool sits between your app and the database. Two layers usually exist in production:
- Application-side pool (HikariCP for JVM, pg-pool for Node, psycopg_pool for Python): limits how many connections one process can hold.
- External pooler (PgBouncer, Pgpool-II, AWS RDS Proxy): limits total connections across all application processes and multiplexes thousands of client sessions onto a small number of real backend connections.
+---------------------------------------------------------------+
| CONNECTION POOLING TOPOLOGY |
+---------------------------------------------------------------+
| |
| 1000 web workers |
| | |
| | each holds 1-5 client conns |
| v |
| PgBouncer (transaction pool) |
| | |
| | multiplexes onto 50 real backend conns |
| v |
| PostgreSQL (max_connections = 100) |
| |
| Without PgBouncer: 1000 web workers -> 1000 conns -> CRASH |
| With PgBouncer: 1000 web workers -> 50 conns -> happy |
| |
+---------------------------------------------------------------+
Pool sizing rule of thumb: start with pool_size = (cores * 2) + spindles per database server. For an 8-core RDS instance with SSDs, that is around 16-20 connections. More is almost always worse because the bottleneck moves from "waiting on a connection" to "fighting over CPU and locks."
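The borrow/return semantics are worth internalizing even if you never implement them. A toy sketch, illustration only (TinyPool and its sqlite3 stand-in connections are invented; use your driver's pool or PgBouncer in production):

```python
import queue
import sqlite3

class TinyPool:
    """A fixed set of long-lived connections; borrowers block when empty."""
    def __init__(self, factory, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())        # open all connections up front

    def acquire(self, timeout=5.0):
        return self._q.get(timeout=timeout)  # raises queue.Empty on timeout

    def release(self, conn):
        self._q.put(conn)                 # hand it back, never close it

pool = TinyPool(lambda: sqlite3.connect(":memory:"), size=2)
c1 = pool.acquire()
c2 = pool.acquire()
try:
    pool.acquire(timeout=0.1)             # pool exhausted: this borrower waits
    exhausted = False
except queue.Empty:
    exhausted = True
pool.release(c1)                          # a return unblocks the next borrower
c3 = pool.acquire(timeout=0.1)
print(exhausted, c3 is c1)                # prints: True True
```

The key property: load beyond the pool size turns into waiting, not into new backend processes fighting for CPU and locks.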
-- Diagnose connection saturation in real time
SELECT
state,
count(*) AS conns,
max(now() - state_change) AS oldest
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state
ORDER BY conns DESC;
Sample output:
state | conns | oldest
---------------------+-------+-----------------
active | 18 | 00:00:00.412
idle | 62 | 00:14:22.110
idle in transaction | 7 | 00:01:08.330
(null) | 2 | 00:03:11.001
The killer here is idle in transaction with an oldest of 1 minute 8 seconds. Those connections are holding locks and blocking vacuum. Application code probably forgot to commit or rollback after an error. Set idle_in_transaction_session_timeout = '60s' to have Postgres kill them automatically.
MySQL note: MySQL connections are threads, not processes, so they are cheaper than Postgres connections — but pooling still matters because each thread holds buffer memory and lock state. ProxySQL fills the same role as PgBouncer.
Step 6 — Caching: Skip the Database Entirely
The fastest query is the one you do not run. Caching pushes results closer to the user — into application memory, into Redis, into a CDN — so most reads never touch Postgres at all. Three layers exist, and each has a different invalidation story.
Layer 1 — Postgres internal cache (shared_buffers). Already in front of your queries. Tune shared_buffers to ~25% of system RAM. Nothing to do in application code.
Layer 2 — Application cache (Redis, Memcached, in-process LRU). You write the cache lookup explicitly. Cache the result of expensive queries with a TTL or invalidate on write. Best for "read 1000x, write 1x" data: catalogs, configuration, user profiles.
Layer 3 — Edge cache (CDN, reverse proxy). Cache entire HTTP responses at the edge. Best for anonymous, public content. Invalidation is via HTTP Cache-Control headers or explicit purge APIs.
+---------------------------------------------------------------+
| CACHING LAYERS (request path) |
+---------------------------------------------------------------+
| |
| browser -> CDN -> app server -> Redis -> Postgres |
| | | | | |
| | | | +-> 50ms |
| | | +-> 1ms ^ |
| | +-> 10us | |
| +-> 5ms | |
| |
| Hit at CDN: 5ms total, DB load = 0 |
| Hit at Redis: 1ms total, DB load = 0 |
| Miss to DB: 50ms, then warm Redis for next time |
| |
+---------------------------------------------------------------+
The cardinal cache rule: invalidation is harder than caching. Pick a strategy and write it down before adding a single line of cache code. The two strategies that actually work in practice:
- TTL-only (read-through). Cache for N seconds. Accept that data can be stale for up to N seconds. Simple and bulletproof.
- Write-through invalidation. Every write to the database also deletes (or updates) the matching cache key. Catches all writes, but only if every write path goes through the same code. One stray
UPDATEfrom a migration script and your cache is permanently wrong.
Anything more complex (cache-aside with versioning, event-driven invalidation via change data capture) belongs to teams that have already maxed out steps 1-5.
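The TTL-only strategy fits in a few lines. A minimal read-through sketch (the class name and fake loader are mine; the clock is injected only so the demo is deterministic):

```python
import time

class TTLCache:
    """TTL-only read-through: data may be stale for up to `ttl` seconds,
    and there is no invalidation code to get wrong."""
    def __init__(self, ttl, loader, clock=time.monotonic):
        self.ttl, self.loader, self.clock = ttl, loader, clock
        self._store = {}                  # key -> (expires_at, value)
        self.db_hits = 0                  # how often we fell through to the DB

    def get(self, key):
        now = self.clock()
        entry = self._store.get(key)
        if entry and entry[0] > now:
            return entry[1]               # cache hit: the DB never sees this read
        self.db_hits += 1
        value = self.loader(key)          # the "expensive query"
        self._store[key] = (now + self.ttl, value)
        return value

fake_now = [0.0]                          # controllable clock for the demo
cache = TTLCache(ttl=30, loader=lambda k: f"profile:{k}",
                 clock=lambda: fake_now[0])
cache.get("u1"); cache.get("u1"); cache.get("u1")
print(cache.db_hits)                      # prints: 1 (two reads served from cache)
fake_now[0] = 31.0                        # TTL elapsed; next read reloads
cache.get("u1")
print(cache.db_hits)                      # prints: 2
```

In a real service the loader would run the SQL and the store would be Redis, but the contract is identical: one DB hit per key per TTL window.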
Step 7 — Read Replicas: Scale Reads Horizontally
Once a single primary cannot handle read traffic, the next move is a read replica. The primary streams its WAL (write-ahead log) to one or more standby servers. Reads can be routed to any replica. Writes still go to the primary. Most workloads are 90%+ reads, so adding two replicas can roughly triple read capacity overnight.
Two trade-offs you must understand:
Replication lag. A replica is always slightly behind the primary, usually by milliseconds, sometimes by seconds under load. A user who writes data and immediately reads it back from a replica might see the old version. The fix is read-your-writes consistency: route the read back to the primary if the user has written within the last N seconds, or wait until the replica has replayed past the write's WAL position (compare pg_last_wal_replay_lsn() on the replica; PostgreSQL 17 adds the pg_wal_replay_wait() procedure for exactly this).
Eventual consistency at the application boundary. Pages that mix recent writes (e.g. "you just left a comment") with bulk reads (e.g. a comment list) can show inconsistent state. Either route the whole page to the primary, or design the UI to render the user's own write optimistically.
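The "route back to the primary for N seconds after a write" policy is a few lines of application code. A sketch (class and names invented; the injected clock keeps the demo deterministic):

```python
import time

class ReplicaRouter:
    """Read-your-writes routing: reads go to a replica unless the same
    user wrote within the sticky window."""
    def __init__(self, window_s=5.0, clock=time.monotonic):
        self.window_s, self.clock = window_s, clock
        self._last_write = {}             # user_id -> time of last write

    def record_write(self, user_id):
        self._last_write[user_id] = self.clock()

    def target(self, user_id):
        last = self._last_write.get(user_id)
        if last is not None and self.clock() - last < self.window_s:
            return "primary"              # replica may not have replayed it yet
        return "replica"

now = [100.0]
router = ReplicaRouter(window_s=5.0, clock=lambda: now[0])
print(router.target("alice"))             # prints: replica
router.record_write("alice")              # alice posts a comment
print(router.target("alice"))             # prints: primary (she sees her own write)
now[0] += 6.0                             # sticky window elapsed
print(router.target("alice"))             # prints: replica
```

The window should comfortably exceed your observed replay_lag; 2-5 seconds covers most healthy setups.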
-- On the primary: check replication status
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lag_bytes,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
Sample output:
client_addr | state | sent_lag_bytes | replay_lag_bytes | write_lag | flush_lag | replay_lag
-------------+-----------+----------------+------------------+--------------+--------------+----------------
10.0.2.14 | streaming | 0 | 4096 | 00:00:00.001 | 00:00:00.002 | 00:00:00.012
10.0.2.15 | streaming | 0 | 1048576 | 00:00:00.018 | 00:00:00.044 | 00:00:00.310
Replica 10.0.2.15 is 310 ms behind the primary in replay — fine for analytics, dangerous for "did my comment save?" reads. Set per-route policies: write paths and immediate read-after-write -> primary. Background jobs, dashboards, reports -> replica.
MySQL note: MySQL replication is logical (statement- or row-based binlog events) by default, while Postgres streaming replication is physical (byte-for-byte WAL). Postgres also offers logical replication (publications and subscriptions, since version 10), which is the closer analogue but has different operational semantics. The application-side pattern is identical.
Step 8 — Partitioning: When the Table Is Just Too Big
Partitioning splits one logical table into many physical sub-tables based on a key (commonly date, customer ID, or region). The planner only scans the partitions that match the WHERE clause — called partition pruning — so a query for "yesterday's events" never touches last year's data.
Partitioning is a real operational commitment. Indexes, constraints, and statistics live per-partition. Migrations are more complex. Do not partition until your largest table is at least ~50 GB or has tens of millions of rows being scanned for time-bounded queries.
-- Partition the events table by month using declarative range partitioning
CREATE TABLE events (
id bigserial,
user_id bigint NOT NULL,
event_type text NOT NULL,
payload jsonb,
occurred_at timestamptz NOT NULL,
PRIMARY KEY (id, occurred_at)
) PARTITION BY RANGE (occurred_at);
-- Create monthly partitions
CREATE TABLE events_2026_03 PARTITION OF events
FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
CREATE TABLE events_2026_04 PARTITION OF events
FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
CREATE TABLE events_2026_05 PARTITION OF events
FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');
-- Index each partition (Postgres 11+ propagates an index created on the
-- parent to every partition automatically)
CREATE INDEX ON events (user_id, occurred_at DESC);
CREATE INDEX ON events (occurred_at);  -- serves the time-range scan below
-- Query: planner prunes to one partition automatically
EXPLAIN
SELECT count(*) FROM events
WHERE occurred_at >= '2026-04-10' AND occurred_at < '2026-04-15';
Sample output:
Aggregate (cost=8412.30..8412.31 rows=1 width=8)
-> Index Only Scan using events_2026_04_occurred_at_idx on events_2026_04
(cost=0.42..8398.10 rows=5680 width=0)
Index Cond: ((occurred_at >= '2026-04-10') AND (occurred_at < '2026-04-15'))
Only events_2026_04 was touched. The other partitions might as well not exist for this query. Partition pruning happens at plan time when the bounds are constant, and at execution time when they are parameters. Drop old partitions with DROP TABLE events_2025_04 — instant, no DELETE, no vacuum, no bloat.
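Partitions are usually pre-created by a scheduled job ahead of need. A sketch of the DDL generation (the helper name is mine; the layout mirrors the events example above):

```python
import datetime

def monthly_partition_ddl(parent, year, month):
    """Build CREATE TABLE ... PARTITION OF DDL for one calendar month."""
    start = datetime.date(year, month, 1)
    end = (datetime.date(year + 1, 1, 1) if month == 12
           else datetime.date(year, month + 1, 1))
    name = f"{parent}_{start:%Y_%m}"      # e.g. events_2026_06
    return (f"CREATE TABLE {name} PARTITION OF {parent}\n"
            f"    FOR VALUES FROM ('{start}') TO ('{end}');")

print(monthly_partition_ddl("events", 2026, 6))
```

A cron job runs this a month or two ahead so inserts never land on a missing partition; the mirror-image job drops partitions past the retention window.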
Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Vertical stack of 8 hexagonal stages numbered 1-8, each glowing cyan (#4fc3f7) on the left edge with a pink (#ff5c8a) right edge. Labels in white monospace: 1 EXPLAIN ANALYZE, 2 INDEX STRATEGY, 3 VACUUM/ANALYZE, 4 N+1 ELIMINATION, 5 CONNECTION POOLING, 6 CACHING, 7 READ REPLICAS, 8 PARTITIONING. A pink arrow on the right side runs top to bottom labeled 'cheaper -> costlier'. At the very bottom, a stopwatch icon transitions from '18s' (red) to '180ms' (cyan). Subtle grid pattern overlay."
Common Mistakes
1. Adding indexes without reading the query plan.
Throwing indexes at slow queries is the optimization equivalent of taking antibiotics for a viral infection. It might help, it might hurt, and you have no way to know without EXPLAIN ANALYZE. Worse, every unused index slows down every write to the table forever. The discipline is: never create an index without a plan that proves it will be used and a benchmark that proves it helps.
2. Optimizing without a baseline measurement.
"It feels faster now" is not a metric. Capture the before-state with EXPLAIN (ANALYZE, BUFFERS) and a wall-clock timing. Make exactly one change. Capture the after-state. Compare. If you change five things at once, you cannot tell which one helped, which one hurt, and which one was placebo.
3. Treating Redis as a fix for missing indexes. Adding a cache layer in front of a slow query does not fix the query — it just hides it until the cache expires. The first request after expiry pays the full cost, and any cache miss takes the system back to the broken state. Always fix the underlying query first, then add caching only if you still need to.
4. Forgetting that VACUUM and ANALYZE exist.
Every "the database was fine yesterday and slow today" mystery has a stale-statistics or table-bloat explanation hiding inside it. Schedule VACUUM ANALYZE on hot tables weekly even if autovacuum is enabled, and monitor pg_stat_user_tables for anything with dead_pct > 20%.
5. Reaching for partitioning too early. Partitioning is operationally expensive. Indexes, constraints, foreign keys, and migrations all become more complex. A 5 GB table does not need partitioning — it needs a better index. Wait until you have a real time-series workload and tens of GB before adding the partition complexity.
Interview Questions
1. "Walk me through how you would diagnose a query that suddenly became slow in production."
I would start by reproducing it against the production schema with EXPLAIN (ANALYZE, BUFFERS) and look for the most expensive node in the plan. Sequential scans on large tables, sort nodes that spill to disk, and nested loops with high iteration counts are the usual suspects. I would compare estimated row counts to actual row counts — if they disagree by an order of magnitude, statistics are stale and the answer is ANALYZE. If the plan looks reasonable but the query is still slow, I would check pg_stat_activity for lock waits and pg_stat_user_tables for bloat. Only after I have a specific diagnosis from the plan would I make a change, and I would re-run EXPLAIN ANALYZE after to prove the change actually helped. The trap to avoid is guessing — adding indexes or rewriting joins without evidence wastes hours and can make things worse.
2. "What is the difference between an index scan, an index-only scan, and a bitmap index scan, and when does each one win?"
An index scan walks the B-tree, gets a row pointer for each match, and visits the heap to fetch the actual row. It is best when the query returns a small number of rows in index order. An index-only scan is the same but skips the heap fetch entirely because every column the query needs is already stored in the index — either because the index covers them naturally or because of an INCLUDE clause. It is dramatically faster when the table is too big to fit in memory. A bitmap index scan builds a bitmap of all matching row pointers, sorts them by physical page, and visits the heap once per page. It is best when many rows match (so the savings from sequential I/O outweigh the bitmap-build overhead), and Postgres can combine bitmaps from multiple indexes with AND/OR. The planner picks among them based on row count estimates and correlation statistics.
3. "How does an N+1 query problem manifest, and why is it invisible to traditional database monitoring?"
N+1 happens when application code fetches a list of N items and then loops over them, firing one extra query per item to load related data. Each individual query is small and fast — often under 1 ms — so monitoring tools that alert on slow queries see nothing wrong. The damage shows up at the request level: a page that should have run 1 query runs 101, multiplying database round-trip time and connection pool pressure. Detection requires per-request query counting, which usually means application middleware that logs query counts per request ID, or pg_stat_statements ranking by calls rather than by total_exec_time. The fix is to express the join in SQL using JOIN, LATERAL, or array aggregation, and to use the ORM's eager loading APIs (include, selectinload, includes) consistently.
4. "When would you reach for a connection pooler like PgBouncer instead of just relying on the application driver's pool?"
The application-side pool limits connections per application process, but in any system with many processes — multiple web workers, multiple containers, autoscaling groups, serverless functions — the total connection count is the sum across all of them. A handful of small instances can easily exceed Postgres max_connections, and short-lived processes (Lambda, cron jobs) cannot keep their pools warm. PgBouncer (or RDS Proxy, Pgpool-II) sits between the application and the database and multiplexes thousands of incoming client connections onto a small number of real backend connections, typically using transaction pooling mode. This is essential for serverless, valuable for any horizontally scaled deployment, and harmless for a single-process app. The trade-off is that some Postgres features (session-level state, prepared statements, LISTEN/NOTIFY, advisory locks) are restricted in transaction pooling mode, so the application must be written to avoid them.
5. "Explain the trade-offs of read replicas for scaling. When does it backfire?"
Read replicas scale read throughput by streaming WAL from the primary to one or more standbys, letting application code route reads to a replica and writes to the primary. They are excellent for read-heavy workloads (analytics, dashboards, content APIs) and they isolate long-running reports from transactional traffic. They backfire in three situations. First, replication lag breaks read-after-write expectations — a user who creates a comment and immediately reloads the page may see the old version because the replica has not caught up. Second, routing complexity leaks into application code: every query needs a "primary or replica" decision, and getting it wrong causes subtle inconsistency bugs. Third, replicas do not scale writes at all — if the bottleneck is INSERTs or UPDATEs, adding replicas makes things worse because the primary now also pays the WAL streaming cost. The cure for read-after-write issues is to either route post-write reads to the primary for a short window, or to use synchronous replication for critical paths and accept the latency cost.
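One way to make read-after-write safe without always routing to the primary is to compare WAL positions. A sketch using built-in functions (PostgreSQL 10+ names; the LSN value shown is illustrative):

```sql
-- On the primary, immediately after the write, record the WAL position:
SELECT pg_current_wal_lsn();   -- e.g. returns '0/3000060'

-- On the replica, before serving the follow-up read, check that it has
-- replayed at least that far; if this returns false, fall back to the primary:
SELECT pg_last_wal_replay_lsn() >= '0/3000060'::pg_lsn;
```

This keeps the "read from primary" window exactly as long as the lag actually is, rather than a fixed guess.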
Quick Reference — Performance Optimization Cheat Sheet
+---------------------------------------------------------------+
| THE 8-STEP CHECKLIST |
+---------------------------------------------------------------+
| |
| [1] EXPLAIN ANALYZE |
| EXPLAIN (ANALYZE, BUFFERS) <query> |
| Find the slowest node. Read it before changing anything. |
| |
| [2] INDEX STRATEGY |
| CREATE INDEX ... (eq_cols..., sort_col DESC) |
| INCLUDE (selected_cols) |
| Equality first, sort last, INCLUDE for index-only scan. |
| |
| [3] VACUUM / ANALYZE |
| VACUUM (ANALYZE) <table> |
| Check pg_stat_user_tables: n_dead_tup vs n_live_tup (> ~20%). |
| |
| [4] N+1 ELIMINATION |
| Use JOIN / LATERAL / array_agg / ORM eager loading. |
| Watch pg_stat_statements ORDER BY calls DESC. |
| |
| [5] CONNECTION POOLING |
| pool_size = (cores * 2) + spindles |
| Add PgBouncer for serverless or many app processes. |
| Kill idle_in_transaction with session timeout. |
| |
| [6] CACHING |
| Layer: Postgres shared_buffers -> Redis -> CDN |
| Pick an invalidation strategy: TTL or write-through. |
| |
| [7] READ REPLICAS |
| Route reads to replica, writes to primary. |
| Watch pg_stat_replication for replay_lag. |
| |
| [8] PARTITIONING |
| PARTITION BY RANGE (date_col) |
| Only when table > ~50 GB or millions of rows scanned. |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| DIAGNOSTIC QUERIES TO MEMORIZE |
+---------------------------------------------------------------+
| |
| Slowest queries by total time: |
| SELECT query, calls, total_exec_time, mean_exec_time |
| FROM pg_stat_statements ORDER BY total_exec_time DESC; |
| |
| Tables that need vacuum/analyze: |
| SELECT relname, n_dead_tup, last_autovacuum |
| FROM pg_stat_user_tables ORDER BY n_dead_tup DESC; |
| |
| Unused indexes (drop candidates): |
| SELECT indexrelname, idx_scan FROM pg_stat_user_indexes |
| WHERE idx_scan = 0; |
| |
| Active connections by state: |
| SELECT state, count(*) FROM pg_stat_activity |
| GROUP BY state; |
| |
| Replication lag: |
| SELECT client_addr, replay_lag |
| FROM pg_stat_replication; |
| |
+---------------------------------------------------------------+
| Symptom | First Diagnostic | Likely Fix |
|---|---|---|
| Slow single query | EXPLAIN (ANALYZE, BUFFERS) | Add or fix index |
| Was fast, now slow | pg_stat_user_tables dead_pct | VACUUM ANALYZE |
| Slow page, fast queries | Per-request query count | Eliminate N+1 |
| "Too many connections" error | pg_stat_activity state count | Add PgBouncer |
| High DB CPU on read traffic | Read/write ratio in app | Add read replicas |
| Same query hit 1000x/sec | pg_stat_statements calls | Add Redis cache |
| Slow scans on huge table | Table size + query bounds | Partition by date |
| Stale data after write | Replication lag | Route reads to primary briefly |
Prev: Lesson 12.2 -- Stored Procedures and Functions Next: (Course Complete)
Course Complete — congratulations! You have worked through every chapter from the relational model to advanced performance tuning. You now have the vocabulary, the diagnostic instincts, and the production checklists to walk into any database interview — or any 3 AM incident — with confidence. Bookmark the cheat sheet above. The next slow query you meet will be easier than the last one.
This is Lesson 12.3 of the Database Interview Prep Course -- 12 chapters, 58 lessons. You have completed the course.