Database Interview Prep
CRUD Operations

CREATE

INSERT Data: Single Row, Multi-Row, RETURNING, and UPSERT

LinkedIn Hook

"Your import job has been running for six hours. It is inserting 2 million rows, one INSERT statement at a time, inside a for loop in application code. You could have finished it in 90 seconds."

Most developers learn INSERT INTO table VALUES (...) on day one and never revisit it. That is how you end up with production pipelines that fire a million round-trips at the database, crash halfway through, leave orphaned rows behind, and silently overwrite data because nobody reached for ON CONFLICT.

INSERT looks trivial, but the gap between "the syntax that works" and "the syntax a senior engineer uses" is huge. Multi-row INSERT is atomic and ten times faster than a loop. INSERT ... RETURNING hands you the generated primary key in the same round trip. INSERT ... SELECT copies millions of rows between tables without ever leaving the database. ON CONFLICT DO UPDATE gives you an idempotent upsert that survives retries. And COPY blows all of them out of the water for bulk loads.

In Lesson 3.1, I break down every production-grade form of INSERT in PostgreSQL: column lists, defaults, multi-row batching, RETURNING, INSERT ... SELECT, ON CONFLICT upserts, and when to reach for COPY or MERGE instead.

Read the full lesson -> [link]

#SQL #PostgreSQL #Database #BackendDevelopment #InterviewPrep #DataEngineering


CREATE thumbnail


What You'll Learn

  • Basic INSERT INTO syntax and why the column list is not optional in real code
  • Multi-row INSERT in a single statement vs firing many statements in a loop — atomicity and the real performance gap
  • The DEFAULT keyword and how it interacts with SERIAL, IDENTITY, now(), and schema defaults
  • INSERT ... SELECT for copying or transforming rows from one table into another
  • INSERT ... RETURNING to get generated IDs, timestamps, and computed columns without a second round trip
  • ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE SET (UPSERT) with correct conflict targets
  • When to abandon INSERT entirely and reach for COPY for bulk loads
  • The SQL standard MERGE statement and how it compares to PostgreSQL's ON CONFLICT

The Ledger Analogy — Adding New Entries to a Book

Imagine an old-fashioned accounting ledger: a leather book with ruled rows, one row per transaction. Each row has columns — date, description, debit, credit, balance. When a new sale happens, the bookkeeper opens the ledger, finds the next blank row, and writes the new entry with a fountain pen. The row is permanent, the column order is fixed, and once the ink dries, it is part of the official record.

Now imagine the bookkeeper has to enter 500 sales from a day's receipts. Option A: open the ledger, write one row, close the ledger, go get the next receipt, reopen the ledger, write the next row. Option B: sort all 500 receipts on the desk first, open the ledger once, and write all 500 rows in one sitting. Option A takes all afternoon. Option B takes twenty minutes. The ledger itself is the same — the difference is how many times you opened it.

That is exactly the difference between firing 500 single-row INSERT statements in a loop and sending one multi-row INSERT (or better, a COPY). Each round trip to the database is the bookkeeper opening and closing the ledger: parse, plan, acquire locks, write WAL, fsync, release locks, send result. Batch the work, and those fixed costs amortise across every row.

+---------------------------------------------------------------+
|           THE LEDGER MODEL OF INSERT                          |
+---------------------------------------------------------------+
|                                                                |
|  SINGLE ROW (one receipt at a time):                           |
|    INSERT INTO ledger (date, desc, amt) VALUES (...);          |
|    -> 1 round trip, 1 parse, 1 plan, 1 WAL flush               |
|                                                                |
|  MULTI ROW (a stack of receipts, one sitting):                 |
|    INSERT INTO ledger (date, desc, amt) VALUES                 |
|      (...), (...), (...), (...), (...);                        |
|    -> 1 round trip, 1 parse, 1 plan, 1 WAL flush, N rows       |
|                                                                |
|  BULK COPY (dump the whole shoebox onto the desk):             |
|    COPY ledger (date, desc, amt) FROM STDIN;                   |
|    -> streaming protocol, no per-row parsing, fastest path     |
|                                                                |
+---------------------------------------------------------------+

With the analogy in place, let's walk through every form of INSERT you will actually use in production.


The Basic INSERT — Syntax and Why the Column List Matters

The minimum viable INSERT names the table, lists the columns, and supplies a row of values.

-- Assume a simple users table
CREATE TABLE users (
  id          BIGSERIAL PRIMARY KEY,
  email       TEXT NOT NULL UNIQUE,
  full_name   TEXT NOT NULL,
  role        TEXT NOT NULL DEFAULT 'member',
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Basic single-row INSERT with an explicit column list
INSERT INTO users (email, full_name)
VALUES ('ada@example.com', 'Ada Lovelace');
-- Output: INSERT 0 1
-- id, role, and created_at are filled in by their DEFAULTs.

You can technically omit the column list:

-- Works today, breaks tomorrow
INSERT INTO users
VALUES (DEFAULT, 'grace@example.com', 'Grace Hopper', 'admin', DEFAULT);

This "works" because PostgreSQL binds values to columns by position in the table's current definition. The moment someone adds a deleted_at column, runs a migration that reorders columns, or inserts a new status field, every unqualified INSERT in your codebase silently shifts. Column names cost three extra seconds to type and save three hours of debugging.

Rule: always write the column list. The only exception is throwaway psql one-liners.
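The positional-shift failure mode can be simulated outside the database. This is a plain-Python sketch (the column names and migration are illustrative) of how values pair with columns strictly by position:

```python
# Plain-Python model of positional binding: values pair with columns by
# position, so a migration that adds a column mid-table silently shifts
# every later value into the wrong column.
def bind_positionally(columns, values):
    return dict(zip(columns, values))

before = ["id", "email", "full_name", "role"]
after  = ["id", "email", "status", "full_name", "role"]  # migration added status

row = (1, "grace@example.com", "Grace Hopper", "admin")

print(bind_positionally(before, row)["full_name"])  # Grace Hopper  (correct)
print(bind_positionally(after, row)["status"])      # Grace Hopper  (name shifted into status)
print(bind_positionally(after, row)["full_name"])   # admin         (role shifted into full_name)
```

An explicit column list is the SQL equivalent of binding by name instead of by position.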


DEFAULT Values and the DEFAULT Keyword

Every column in the table above has sensible defaults: id from the sequence, role as 'member', created_at as now(). You can explicitly request the default for a column with the DEFAULT keyword:

-- Explicitly ask for the default role, override created_at
INSERT INTO users (email, full_name, role, created_at)
VALUES (
  'linus@example.com',
  'Linus Torvalds',
  DEFAULT,                        -- -> 'member'
  '2024-01-15 10:00:00+00'        -- explicit timestamp
);

The DEFAULT keyword is useful in three situations:

  1. Multi-row inserts where some rows want the default and others do not.
  2. Identity columns declared GENERATED ALWAYS AS IDENTITY, where supplying an explicit value is an error (unless you add OVERRIDING SYSTEM VALUE) and you must write DEFAULT or omit the column.
  3. Readability when you want to make it obvious at the call site that a column is using its schema default rather than a hand-rolled value.

If you simply omit a column from the column list, PostgreSQL applies its default automatically — you do not need to write DEFAULT explicitly. Use the keyword only when you need it.


Multi-Row INSERT — One Statement, Many Rows

This is the single highest-leverage INSERT feature, and the one most often missed by juniors.

-- Insert five users in ONE statement
INSERT INTO users (email, full_name, role) VALUES
  ('alice@example.com',   'Alice Chen',    'admin'),
  ('bob@example.com',     'Bob Martinez',  'member'),
  ('carol@example.com',   'Carol Ng',      'member'),
  ('dave@example.com',    'Dave Patel',    'member'),
  ('eve@example.com',     'Eve Ibrahim',   'member');
-- Output: INSERT 0 5

This is not syntactic sugar for five separate inserts. A single multi-row INSERT gives you three concrete wins over a loop:

1. Atomicity

A multi-row INSERT is one statement. Either every row lands or none of them do. If row 4 violates a UNIQUE constraint, the whole statement rolls back and rows 1-3 are not in the table. If you ran five separate INSERTs without wrapping them in a transaction, rows 1-3 would already be committed when row 4 failed, leaving you with a partial, inconsistent dataset.

2. Performance

The performance gap is enormous and comes from amortising fixed per-statement costs across every row:

+---------------------------------------------------------------+
|     PER-STATEMENT COST BREAKDOWN (conceptual)                 |
+---------------------------------------------------------------+
|                                                                |
|   Loop of 1000 single-row INSERTs:                             |
|     1000 x (network round trip)                                |
|     1000 x (parse + plan)                                      |
|     1000 x (acquire locks / write WAL record / fsync?)         |
|     1000 x (send result, update session state)                 |
|                                                                |
|   One INSERT with 1000 VALUES tuples:                          |
|        1 x (network round trip)                                |
|        1 x (parse + plan)                                      |
|        1 x (WAL batched, single fsync at commit)               |
|        1 x (send result)                                       |
|                                                                |
|   Typical speedup on LAN: 10x - 50x                            |
|   Typical speedup over WAN: 100x+                              |
|                                                                |
+---------------------------------------------------------------+

3. Fewer lock acquisitions

Every statement takes and releases row/page locks. Batching reduces lock churn and contention with concurrent writers.

Practical limits: PostgreSQL accepts very large multi-row inserts, but the query parser allocates memory proportional to the statement size. Batch sizes of 500 to 5000 rows per statement hit a sweet spot — much larger and you risk parser memory pressure and statement timeouts; much smaller and you leave performance on the table. For millions of rows, reach for COPY instead.
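On the client side, the batching advice usually means a small helper. Here is a sketch in Python; the function name and default batch size are illustrative assumptions, and the $1, $2, ... numbered placeholders follow the style node-postgres and asyncpg expect, not any specific driver's API:

```python
# Hypothetical helper: split rows into chunks and build one parameterized
# multi-row INSERT per chunk. Table and column names must come from
# trusted code, never user input (identifiers cannot be parameterized).
def build_batched_inserts(table, columns, rows, batch_size=1000):
    statements = []
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        tuples, params = [], []
        for i, row in enumerate(chunk):
            base = i * len(columns)
            slots = ", ".join(f"${base + j + 1}" for j in range(len(columns)))
            tuples.append(f"({slots})")
            params.extend(row)
        sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
               f"VALUES {', '.join(tuples)}")
        statements.append((sql, params))
    return statements

sql, params = build_batched_inserts(
    "users", ["email", "full_name"],
    [("a@example.com", "A"), ("b@example.com", "B")],
)[0]
print(sql)     # INSERT INTO users (email, full_name) VALUES ($1, $2), ($3, $4)
print(params)  # ['a@example.com', 'A', 'b@example.com', 'B']
```

Each (sql, params) pair is one round trip carrying an entire chunk, which is exactly the amortisation the cost breakdown above describes.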


INSERT ... SELECT — Copying and Transforming Between Tables

You can use the result of a SELECT as the source of rows for an INSERT. This keeps all the data inside the database — no round trips to the application — and works for thousands or millions of rows.

-- Suppose we have a staging table from a nightly import
CREATE TABLE staging_users (
  email      TEXT,
  full_name  TEXT,
  signup_src TEXT
);

-- Copy qualifying rows from staging into the real users table,
-- applying transformations and filters along the way.
INSERT INTO users (email, full_name, role)
SELECT
  lower(email),                          -- normalize case
  initcap(full_name),                    -- Title Case
  CASE signup_src
    WHEN 'enterprise' THEN 'admin'
    ELSE 'member'
  END
FROM staging_users
WHERE email IS NOT NULL
  AND email LIKE '%@%'                   -- basic sanity filter
  AND NOT EXISTS (                       -- skip already-imported
    SELECT 1 FROM users u WHERE u.email = lower(staging_users.email)
  );
-- Output: INSERT 0 1247

INSERT ... SELECT is the workhorse of ETL inside the database, archival jobs (INSERT INTO orders_archive SELECT * FROM orders WHERE created_at < ...), and denormalisation. It runs as a single transaction and a single statement, so atomicity and the batching performance win both apply.

One subtlety: when both the source and target are the same table, be careful about the visibility of rows you just inserted. PostgreSQL uses a snapshot of the source at the start of the statement, so self-inserts do not loop forever, but always double-check with a WHERE filter on an existing condition.


INSERT ... RETURNING — Getting the Generated ID in One Round Trip

Every ORM user has hit this problem: you insert a row with a SERIAL primary key, and now you need that key to insert child rows into another table. The naive patterns are INSERT followed by SELECT currval('users_id_seq'), or worse, SELECT max(id) FROM users WHERE email = .... The currval approach is session-local, so it is not actually racy, but it costs an extra round trip and falls apart for multi-row inserts; the max(id) approach is an outright race condition.

PostgreSQL's answer is RETURNING, which attaches a SELECT-like projection to the INSERT and hands the result back in the same statement.

-- Single row: get back the generated id and created_at
INSERT INTO users (email, full_name)
VALUES ('marie@example.com', 'Marie Curie')
RETURNING id, created_at;
-- id  |         created_at
-- ----+-------------------------------
-- 42  | 2026-04-13 10:15:32.114+00

-- Multi-row: RETURNING works for batches too
INSERT INTO users (email, full_name) VALUES
  ('nikola@example.com', 'Nikola Tesla'),
  ('rosalind@example.com', 'Rosalind Franklin')
RETURNING id, email;
-- id  |        email
-- ----+----------------------
-- 43  | nikola@example.com
-- 44  | rosalind@example.com

-- RETURNING * is valid and returns every column of every inserted row
INSERT INTO users (email, full_name)
VALUES ('alan@example.com', 'Alan Turing')
RETURNING *;

RETURNING works on INSERT, UPDATE, and DELETE in PostgreSQL. It is a PostgreSQL extension rather than part of the SQL standard, though several other databases have since imitated it. It is also perfect for capturing server-generated timestamps, computed columns, and columns modified by BEFORE triggers: you see exactly what was written, not what you sent. Every modern Postgres driver and ORM (node-postgres, SQLAlchemy, Prisma, Hibernate) exposes it.


Handling Conflicts — ON CONFLICT (UPSERT)

Real systems retry requests. A user clicks "Sign Up" twice. A Kafka consumer re-delivers a message. A cron job runs on two hosts because somebody forgot to turn off the old one. You need inserts that are idempotent — running them twice with the same input produces the same result, without errors.

PostgreSQL's ON CONFLICT clause turns a unique-constraint violation from an error into a controlled decision: skip the row, or update the existing row.

ON CONFLICT DO NOTHING — Silent Skip

-- "Insert this user unless the email is already taken."
-- No error on duplicates — the row is silently skipped.
INSERT INTO users (email, full_name)
VALUES ('ada@example.com', 'Ada Lovelace')
ON CONFLICT (email) DO NOTHING
RETURNING id;
-- If the email already exists, RETURNING returns zero rows.
-- This is the signal your application uses to detect the skip.

The (email) is the conflict target — it names the unique constraint or unique index that, if violated, should trigger the conflict clause. You must point to a column (or set of columns) covered by a unique constraint or unique index, otherwise PostgreSQL raises an error at plan time.

ON CONFLICT DO UPDATE — The True UPSERT

-- "Insert this user, or update the existing row with the new values."
-- This is the canonical idempotent upsert.
INSERT INTO users (email, full_name, role)
VALUES ('ada@example.com', 'Ada King-Lovelace', 'admin')
ON CONFLICT (email) DO UPDATE
SET
  full_name  = EXCLUDED.full_name,
  role       = EXCLUDED.role,
  -- Columns left out of SET keep their old values automatically;
  -- listing created_at here just makes that intent explicit:
  created_at = users.created_at
RETURNING id, (xmax <> 0) AS was_update;
-- xmax <> 0 is a classic trick: on the UPDATE path xmax is non-zero,
-- on the INSERT path it is zero. You can tell which branch fired.

Two things to notice:

  1. EXCLUDED is the pseudo-table containing the row the INSERT was trying to add. Inside DO UPDATE SET, use EXCLUDED.column to refer to the incoming value and table_name.column (or just the bare name) to refer to the existing value.
  2. The conflict target must exist as a unique constraint or unique index. ON CONFLICT (email) requires either UNIQUE (email) or a CREATE UNIQUE INDEX on email. Without it, Postgres cannot know which row is "the conflict."

Practical UPSERT Example — Page View Counter

CREATE TABLE page_views (
  page_path   TEXT PRIMARY KEY,
  view_count  BIGINT NOT NULL DEFAULT 0,
  last_seen   TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Record a page view. First hit inserts a new row. Every subsequent
-- hit updates the existing row atomically. Zero application logic.
INSERT INTO page_views (page_path, view_count, last_seen)
VALUES ('/blog/insert-data', 1, now())
ON CONFLICT (page_path) DO UPDATE
SET
  view_count = page_views.view_count + 1,
  last_seen  = EXCLUDED.last_seen;

This pattern — "insert the initial value, or increment the existing one" — is the single most common use of ON CONFLICT DO UPDATE in the wild. It is race-safe because the whole operation happens under a row-level lock inside one statement.

Partial Conflict Targets and WHERE

You can narrow the conflict further with an index predicate or a WHERE clause on the DO UPDATE branch:

-- Only upsert the row when the incoming version is newer
INSERT INTO documents (id, body, version)
VALUES (7, 'new body', 5)
ON CONFLICT (id) DO UPDATE
SET body = EXCLUDED.body, version = EXCLUDED.version
WHERE documents.version < EXCLUDED.version;
-- If the stored version is already >= 5, the UPDATE is skipped.
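To make the guard's semantics concrete, here is a pure-Python model of the same decision (a dict stands in for the documents table; unlike the SQL statement, this sketch is not race-safe and is for illustration only):

```python
# Pure-Python model of the version-guarded upsert: insert if absent,
# update only when the incoming version is strictly newer, otherwise skip.
def upsert_document(store, doc_id, body, version):
    existing = store.get(doc_id)
    if existing is None:
        store[doc_id] = {"body": body, "version": version}  # INSERT path
        return "inserted"
    if existing["version"] < version:                       # WHERE guard
        store[doc_id] = {"body": body, "version": version}  # DO UPDATE path
        return "updated"
    return "skipped"                                        # guard filtered it out

docs = {}
upsert_document(docs, 7, "v5 body", 5)     # -> inserted
upsert_document(docs, 7, "stale body", 3)  # -> skipped (stored version is newer)
upsert_document(docs, 7, "v6 body", 6)     # -> updated
```

The SQL version gets the same three-way behaviour in one atomic statement, with the row lock handling concurrency for you.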

Bulk Inserts — When to Reach for COPY

Multi-row INSERT is great up to a few thousand rows. Beyond that, PostgreSQL offers COPY, a streaming protocol specifically designed for bulk data loading. It skips most of the per-statement parser overhead, avoids constructing a huge parse tree, and on most hardware is 2-10x faster than even a well-batched multi-row INSERT.

-- Load a CSV file from the server's filesystem (requires superuser
-- or membership in the pg_read_server_files role).
COPY users (email, full_name, role)
FROM '/var/lib/postgresql/imports/users.csv'
WITH (FORMAT csv, HEADER true);

-- Load from the client via psql's \copy (no special privileges).
-- In psql:
--   \copy users (email, full_name, role) FROM 'users.csv' CSV HEADER

-- In Node.js (node-postgres + pg-copy-streams):
--   const stream = client.query(copyFrom('COPY users (email, full_name) FROM STDIN CSV'));
--   fs.createReadStream('users.csv').pipe(stream);

Performance tips for bulk loads (millions of rows):

  • Prefer COPY over INSERT ... VALUES for more than a few thousand rows.
  • Wrap multi-statement loads in a single transaction so WAL flushes once at commit.
  • Drop non-essential indexes before the load and rebuild them after — maintaining B-tree indexes row by row is the #1 bulk-load killer.
  • Disable triggers on the target table during the load if business rules allow (ALTER TABLE ... DISABLE TRIGGER USER).
  • Consider UNLOGGED tables for scratch staging areas — they skip WAL entirely but are truncated on crash.
  • Use synchronous_commit = off for the loading session if you can tolerate losing the last few seconds on crash.

COPY does not support ON CONFLICT. The standard workaround is to COPY into a staging table first, then INSERT ... SELECT ... ON CONFLICT ... from staging into the real table.
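On the client side, preparing the staging payload usually means serialising rows to CSV. A minimal Python sketch; a driver call such as psycopg2's copy_expert could then stream the buffer to COPY ... FROM STDIN:

```python
import csv
import io

# Build an in-memory CSV payload for a COPY ... FROM STDIN load.
# The csv module handles quoting and embedded commas/quotes, which
# hand-rolled string joins routinely get wrong.
def rows_to_csv(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

payload = rows_to_csv([
    ("ada@example.com", "Ada Lovelace", "enterprise"),
    ("bob@example.com", 'Bob "Bobby" Martinez', "organic"),  # quoting handled
])
# e.g. with psycopg2:
#   cur.copy_expert(
#       "COPY staging_users (email, full_name, signup_src) "
#       "FROM STDIN WITH (FORMAT csv)",
#       payload)
```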


A Note on MERGE

PostgreSQL 15 added the SQL-standard MERGE statement, which other databases (Oracle, SQL Server, DB2) have had for years:

MERGE INTO users AS u
USING (VALUES ('ada@example.com', 'Ada King-Lovelace', 'admin')) AS s(email, full_name, role)
ON u.email = s.email
WHEN MATCHED THEN UPDATE SET full_name = s.full_name, role = s.role
WHEN NOT MATCHED THEN INSERT (email, full_name, role) VALUES (s.email, s.full_name, s.role);

MERGE is more expressive than ON CONFLICT — it supports multiple match clauses, a DELETE branch, and conditional WHEN predicates. But for the common case of "insert or update," ON CONFLICT is shorter, more battle-tested, and has one critical advantage: it is race-safe under concurrent writes without extra locking. MERGE can raise unique-constraint errors under concurrency because it does not use the same insertion-path conflict detection. For upsert-style idempotent writes, prefer ON CONFLICT; reach for MERGE when you need its multi-branch logic on top of a snapshot you already hold.


Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Center: a large white monospace rectangular table labelled 'users' with ruled rows. Three arrows flow into it: (1) a single sky blue (#4fc3f7) arrow labelled 'INSERT VALUES (...)', (2) a thicker sky blue bundled arrow labelled 'INSERT VALUES (...), (...), (...)' splitting into five row highlights, (3) a rose (#ff5c8a) lightning bolt labelled 'COPY FROM STDIN' streaming dozens of rows at once. Below the table, a rose callout box labelled 'ON CONFLICT (email) DO UPDATE' with a small loop icon showing idempotency. To the right, a sky blue 'RETURNING id' badge catching generated keys as they exit. White monospace labels throughout. Subtle grid pattern overlay."


Common Mistakes

1. Omitting the column list. INSERT INTO users VALUES (...) binds values to columns by position in the current table definition. Any migration that adds, removes, or reorders columns silently shifts every value to the wrong column. Your 'admin' role ends up in the full_name field and nobody notices until the audit report. Always write the column list explicitly — the three seconds you save are never worth the hours you lose.

2. Firing one INSERT at a time in a loop. The classic ORM sin: for (const row of rows) { await db.insert(row); }. Each iteration is a full round trip, parse, plan, WAL flush, and result. Two thousand rows over a transatlantic link can take ten minutes. The fix is a single multi-row INSERT (or COPY for large loads). If your ORM hides multi-row inserts, either use its bulk API (insertMany, createMany, bulkCreate) or drop to raw SQL for the hot path.

3. Not using RETURNING and doing a second SELECT for the generated ID. INSERT INTO orders (...) VALUES (...); SELECT max(id) FROM orders WHERE user_id = ?; is both slower (extra round trip) and a race condition (another session might insert between your statements). RETURNING id gives you the exact id of the row you just wrote, atomically, in the same statement. Every mainstream Postgres driver supports it.

4. Wrong conflict target in ON CONFLICT. ON CONFLICT (email) DO UPDATE ... requires a unique constraint or unique index on email. Only DO NOTHING may omit the conflict target, in which case it fires on any unique or exclusion constraint violation, which is rarely what you want on a table with several unique constraints; DO UPDATE will not even parse without a target. And naming a column set that is not covered by a unique constraint raises "there is no unique or exclusion constraint matching the ON CONFLICT specification" when the statement runs. Always name the exact constraint (ON CONFLICT ON CONSTRAINT users_email_key) or the exact column set that has a unique index.

5. Using ON CONFLICT DO UPDATE as a blind overwrite. SET col = EXCLUDED.col for every column of the table sounds convenient but silently clobbers any field the incoming payload does not know about — including audit columns, soft-delete flags, and fields updated by other services. Be deliberate: list exactly the columns that should change on the update path, and preserve the rest. The xmax <> 0 trick or a dedicated WHERE clause on the DO UPDATE branch gives you finer control.
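One way to stay deliberate is to generate the SET clause from an explicit whitelist. A hypothetical helper (the function name is illustrative):

```python
# Generate a DO UPDATE SET clause from an explicit column whitelist, so
# the upsert only touches fields the incoming payload actually owns.
# Audit columns and soft-delete flags stay untouched unless listed.
# Column names must come from trusted code, never user input.
def conflict_update_set(columns):
    return "SET " + ", ".join(f"{c} = EXCLUDED.{c}" for c in columns)

print(conflict_update_set(["full_name", "role"]))
# SET full_name = EXCLUDED.full_name, role = EXCLUDED.role
```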


Interview Questions

1. "What is an UPSERT, and how do you implement it safely in PostgreSQL?"

An UPSERT is "insert a row, or update the existing row if a unique constraint conflicts" — a single idempotent operation that is safe to retry. In PostgreSQL you write it as INSERT ... VALUES (...) ON CONFLICT (conflict_target) DO UPDATE SET col = EXCLUDED.col, .... The conflict target names a unique constraint or unique index, and the EXCLUDED pseudo-table inside the DO UPDATE branch refers to the row you were trying to insert. The whole operation happens under a row-level lock inside one statement, so it is race-safe under concurrent writers — two sessions upserting the same key serialise cleanly. Before PostgreSQL 9.5 introduced ON CONFLICT, people hand-rolled upserts with a CTE, a SELECT ... FOR UPDATE, and retry loops, all of which were racy or slow. Since 9.5, ON CONFLICT is the correct answer for almost every upsert. PostgreSQL 15's MERGE offers more expressive multi-branch logic but is not automatically race-safe for upserts — stick with ON CONFLICT unless you specifically need MERGE.

2. "What is the difference between INSERT and COPY, and when would you use each?"

INSERT is a SQL statement: it is parsed, planned, and executed row-by-row through the query engine, applying all constraints, triggers, and per-row bookkeeping. It supports everything — RETURNING, ON CONFLICT, subqueries, expressions, multi-row values, INSERT ... SELECT. COPY is a bulk loader with its own wire protocol: it streams rows directly into the table's storage layer, skipping most of the per-statement parser overhead and processing rows in tight batches. On typical hardware, COPY is 2-10x faster than even a well-batched multi-row INSERT. The trade-off is that COPY does not support ON CONFLICT, RETURNING, or per-row expressions — it takes raw text or binary rows and drops them into the table, obeying constraints and triggers but not much else. Use INSERT for transactional work (one row to a few thousand), user-facing writes, anything that needs RETURNING or conflict handling, and any time you need SQL expressions in the values. Use COPY for bulk data loading (tens of thousands of rows or more), initial data imports, ETL pipelines, and restore operations. A common pattern combines both: COPY into a staging table, then INSERT ... SELECT ... ON CONFLICT ... from staging into the final table.

3. "What does INSERT ... RETURNING do, and why is it better than SELECT currval() after the insert?"

RETURNING attaches a projection to an INSERT (also UPDATE and DELETE) so the statement returns selected columns from the affected rows as if it were a SELECT. You can return the generated primary key, server-side defaults like created_at = now(), computed columns, or anything the BEFORE triggers wrote. For a multi-row INSERT, you get one result row per inserted row, in insertion order. Compared to SELECT currval('seq_name') or SELECT lastval(), RETURNING has three advantages. First, it works in one round trip instead of two, which matters over a WAN. Second, it works for multi-row inserts — currval only returns the most recent value. Third, it works for values that did not come from a sequence — timestamps, UUIDs generated by gen_random_uuid(), values computed by triggers, and anything else the database decides at write time. And unlike SELECT max(id) FROM table WHERE ..., RETURNING is race-free: it returns the exact row you inserted, not whatever row happened to be the max when the second query ran.

4. "Why should you prefer a multi-row INSERT over a loop of single-row inserts in application code?"

Three reasons, in order of importance. Atomicity: a multi-row INSERT is one statement, so either every row lands or none do. A loop of single inserts, even inside a transaction, is still N separate statements — if any one fails, you roll back the whole transaction, which is fine, but the failure mode is "we tried 1000 inserts, the 743rd failed, now figure out why" rather than "the one statement succeeded or not." Performance: each statement carries fixed costs for network round trip, parsing, planning, and result handling. Batching 1000 rows into one statement amortises those costs across all rows and is typically 10x-50x faster on a LAN and 100x+ faster over a WAN. It also reduces lock churn and WAL record overhead. Simpler error handling: one statement means one error code to inspect, not a loop with a partially-populated target table if you forgot the transaction. The only caveat is statement size — PostgreSQL can handle very large multi-row inserts but they consume parser memory proportional to the statement, so batch into chunks of 500-5000 rows for very large loads, or switch to COPY entirely for millions of rows.

5. "Walk me through what happens when two sessions simultaneously UPSERT the same row."

Session A sends INSERT ... ON CONFLICT (id) DO UPDATE ... and Session B sends an identical statement a millisecond later. Both statements try to insert the row. The first one to reach the unique index acquires the row-level lock and proceeds with the insert path, writing the new row into the heap and the index. The second one, on its attempt to insert into the unique index, hits the duplicate and enters the conflict branch — but before it can run DO UPDATE, it has to wait on the row lock held by the first session. When the first session commits, the second session sees the row the first session just wrote, applies the DO UPDATE branch against that row, and commits. Both statements succeed. No deadlock, no lost update, no unique-violation error surfacing to the application. This is why ON CONFLICT is the right primitive for idempotent retries: the lock-and-retry dance is built into the statement, not layered on top in application code. Contrast with a hand-rolled "SELECT then INSERT or UPDATE" pattern, which has a gap between the SELECT and the write where another session can race in and cause either a unique-violation error or a lost update, depending on isolation level.


Quick Reference — INSERT Cheat Sheet

+---------------------------------------------------------------+
|           INSERT CHEAT SHEET                                  |
+---------------------------------------------------------------+
|                                                                |
|  BASIC SINGLE ROW:                                             |
|    INSERT INTO t (c1, c2) VALUES (v1, v2);                     |
|                                                                |
|  MULTI-ROW (fast, atomic):                                     |
|    INSERT INTO t (c1, c2) VALUES                               |
|      (v1, v2), (v3, v4), (v5, v6);                             |
|                                                                |
|  USE A DEFAULT:                                                |
|    INSERT INTO t (c1, c2, c3) VALUES (v1, DEFAULT, v3);        |
|                                                                |
|  COPY FROM ANOTHER TABLE:                                      |
|    INSERT INTO t (c1, c2)                                      |
|    SELECT c1, c2 FROM src WHERE ...;                           |
|                                                                |
|  RETURN GENERATED KEYS:                                        |
|    INSERT INTO t (c1) VALUES (v1) RETURNING id, created_at;    |
|                                                                |
|  SKIP DUPLICATES:                                              |
|    INSERT INTO t (c1) VALUES (v1)                              |
|    ON CONFLICT (c1) DO NOTHING;                                |
|                                                                |
|  UPSERT:                                                       |
|    INSERT INTO t (c1, c2) VALUES (v1, v2)                      |
|    ON CONFLICT (c1) DO UPDATE                                  |
|    SET c2 = EXCLUDED.c2;                                       |
|                                                                |
|  BULK LOAD (fastest):                                          |
|    COPY t (c1, c2) FROM STDIN WITH (FORMAT csv, HEADER true);  |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           KEY RULES                                            |
+---------------------------------------------------------------+
|                                                                |
|  1. Always write the explicit column list                      |
|  2. Prefer ONE multi-row INSERT over N single-row inserts      |
|  3. Use RETURNING to grab generated ids in one round trip      |
|  4. Name the conflict target explicitly in ON CONFLICT         |
|  5. Use EXCLUDED.col to reference incoming values in DO UPDATE |
|  6. INSERT ... SELECT keeps data movement inside the database  |
|  7. Reach for COPY when you have more than a few thousand rows |
|  8. Batch multi-row INSERTs at 500-5000 rows per statement     |
|                                                                |
+---------------------------------------------------------------+
Concern                  | Wrong Way                           | Right Way
-------------------------+-------------------------------------+-------------------------------------
Column binding           | INSERT INTO t VALUES (...)          | INSERT INTO t (c1, c2) VALUES (...)
Bulk writes              | Loop of single-row inserts          | One multi-row INSERT or COPY
Generated IDs            | SELECT currval() after insert       | INSERT ... RETURNING id
Duplicate handling       | Catch unique-violation in app       | ON CONFLICT DO NOTHING / UPDATE
Conflict target          | ON CONFLICT DO UPDATE (no target)   | ON CONFLICT (email) DO UPDATE
Incoming value ref       | VALUES.col (invalid)                | EXCLUDED.col
Millions of rows         | Many multi-row INSERTs              | COPY into staging, then merge
ETL between tables       | SELECT then loop + INSERT           | INSERT ... SELECT
Idempotent retries       | Manual SELECT-then-INSERT           | INSERT ... ON CONFLICT
Index overhead in bulk   | Load with indexes in place          | Drop indexes, load, rebuild



This is Lesson 3.1 of the Database Interview Prep Course -- 12 chapters, 58 lessons.
