Database Interview Prep
Sequences and Auto-Increment

UUID Primary Keys

v4 vs v7, Index Fragmentation, and Distributed Trade-offs

LinkedIn Hook

"Your team switched primary keys to UUIDs to feel modern, and your insert throughput dropped by 60 percent overnight."

Most engineers reach for UUIDs the moment a system grows beyond a single database. UUIDs feel safer than integers — clients can mint them offline, two services can insert without coordination, and nothing leaks the row count of your business to anyone scraping the URL. So the migration ships, the API gets cleaner, and then the database starts crying. INSERTs slow down. The B-tree fragments. The buffer cache hit rate collapses. Backups balloon.

The reason is almost always the same: someone picked UUID v4 — fully random, 122 bits of entropy — as the primary key on a clustered index. Every new row lands at a random spot in the B-tree, splitting pages, evicting hot pages from cache, and turning what should be an append-only workload into a random-write workload. The numbers on a billion-row table are brutal: roughly 4x the write amplification, 2x the index size, and a buffer cache that touches the entire index instead of just the right edge.

The fix is not to abandon UUIDs. The fix is to use UUID v7 — a time-ordered UUID standardized in RFC 9562 (2024) that puts a millisecond timestamp in the high bits, restoring the monotonic insert pattern that makes B-trees fast, while keeping the distributed, client-mintable, non-leaking properties that made you choose UUIDs in the first place.

In Lesson 7.3, I break down UUID primary keys end to end: v4 vs v7, the trade-offs against integer PKs, why index fragmentation kills v4 at scale, the storage cost of 16-byte keys, when distributed systems actually need UUIDs, and exactly how to use Postgres uuid with gen_random_uuid() and uuidv7().

Read the full lesson -> [link]

#SQL #PostgreSQL #UUID #Database #BackendDevelopment #DistributedSystems #InterviewPrep


UUID Primary Keys thumbnail


What You'll Learn

  • What a UUID actually is — 128 bits, the canonical 8-4-4-4-12 hex format, and the binary representation
  • The difference between UUID v4 (random) and UUID v7 (time-ordered) and why v7 was added in RFC 9562
  • Why random UUIDs cause B-tree index fragmentation and how that translates to real performance loss
  • The storage cost of UUIDs vs integers — 16 bytes vs 4 or 8, and how it cascades through every index
  • When UUIDs are genuinely the right choice — distributed systems, offline clients, public APIs, sharding
  • How to use Postgres uuid type with gen_random_uuid() for v4 and the uuidv7() extension or function for v7
  • The MySQL alternative: BINARY(16) storage with application-side generation
  • The hybrid pattern — BIGINT internal PK + UUID external ID — and when to reach for it

The Hotel Room Numbering Analogy — Sequential Halls vs Random Lottery

Picture a hotel with one thousand rooms. The traditional way to number them is sequential: room 101, 102, 103, all the way up to room 1099. When a new wing opens, you continue from 1100. Housekeeping pushes a single cart down a single hallway in order, the elevator stops at predictable floors, and the front desk can find any room in seconds because it knows exactly where each number lives. This is how an integer primary key works. Every new row is the next number, every insert lands at the right edge of the index, and the database never has to shuffle anything.

Now imagine a hotel where every guest is assigned a room number drawn from a lottery — twelve random digits. Room 487,201,553,910 might be next to room 091,338,772,004 in the registry, but they are on opposite ends of the building. Housekeeping cannot push a cart in any sensible order. The front desk has to look up the location of every single guest individually. When a new guest arrives, you do not just add them to the end of the list — you find the right alphabetical slot, possibly in the middle of a binder, possibly forcing you to rearrange surrounding entries. This is what UUID v4 does to a B-tree index: every insert is a random lookup followed by a random write.

Now picture a third hotel. Room numbers are still long and globally unique, but the first part of every number is the timestamp the guest checked in: 20260414-1422-.... Two guests checking in within the same minute get adjacent numbers. Housekeeping can still walk a single hallway in time order, the front desk can find recent guests near the end of the binder, and you keep the property that no two guests in any of your hotels worldwide will ever collide. This is UUID v7: globally unique, distributed-friendly, and time-ordered so the database can treat inserts like an append.

+---------------------------------------------------------------------+
|           THREE WAYS TO NUMBER ROWS                                  |
+---------------------------------------------------------------------+
|                                                                       |
|  INTEGER (BIGSERIAL)                                                  |
|    1, 2, 3, 4, 5, ...                                                 |
|    -> 8 bytes, sequential, fastest possible inserts                   |
|    -> Leaks row count, requires server-side coordination              |
|                                                                       |
|  UUID v4 (random)                                                     |
|    7b21c4e9-...   2a8b6f0c-...   91d4-...                             |
|    -> 16 bytes, fully random, client-mintable                         |
|    -> KILLS B-tree locality. Random writes everywhere.                |
|                                                                       |
|  UUID v7 (time-ordered, RFC 9562)                                     |
|    018f3a2c-7b21-7...    018f3a2c-7b22-7...    018f3a2c-7b23-7...     |
|    [---- ms timestamp ---] [---- random tail ----]                    |
|    -> 16 bytes, sortable, client-mintable, NO collision risk          |
|    -> Inserts append at the right edge like a serial column           |
|                                                                       |
+---------------------------------------------------------------------+

What a UUID Actually Is

A UUID is 128 bits — 16 bytes — formatted as 32 hexadecimal characters split into five groups by hyphens: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx. The character M encodes the version (1-8), and the high bits of N encode the variant (almost always RFC 4122 / 9562). Everything else is payload determined by the version.

+---------------------------------------------------------------------+
|           UUID v4 LAYOUT (122 bits of entropy)                       |
+---------------------------------------------------------------------+
|                                                                       |
|  7b21c4e9 - 9d31 - 4 c4e - 8d31 - 2a8b6f0c4d11                        |
|  [random ] [rand] ^[rnd] ^[rnd] [-----random-----]                    |
|                   |     |                                              |
|                   |     +-- variant bits (10)                          |
|                   +-------- version = 4                                |
|                                                                       |
|  -> 122 random bits. No order, no time, no meaning.                   |
|                                                                       |
+---------------------------------------------------------------------+

+---------------------------------------------------------------------+
|           UUID v7 LAYOUT (RFC 9562, May 2024)                        |
+---------------------------------------------------------------------+
|                                                                       |
|  018f3a2c-7b21 - 7 c4e - 8d31 - 2a8b6f0c4d11                          |
|  [-- unix_ts_ms ---] ^[rnd] ^[rnd] [-----random-----]                 |
|                      |     |                                          |
|                      |     +-- variant bits                            |
|                      +-------- version = 7                             |
|                                                                       |
|  -> 48 bits = millisecond Unix timestamp (sortable to ~10000 AD)      |
|  -> 74 bits of randomness (still collision-safe within a ms)          |
|  -> Lexicographic sort == chronological sort                          |
|                                                                       |
+---------------------------------------------------------------------+

The key insight: a v4 UUID and a v7 UUID look almost identical, but v7 puts the high-entropy bits after the timestamp instead of throughout. That tiny rearrangement is the difference between a B-tree that fragments and a B-tree that appends.


The B-tree Fragmentation Problem — Why v4 Hurts

A B-tree primary key index is sorted. New rows are inserted in key order: the database walks down the tree, finds the leaf page that should contain the new key, and writes it there. With a sequential key like BIGSERIAL, every insert lands on the rightmost leaf page — the same page over and over until it fills, at which point a new page is allocated to its right and inserts continue there. This is the cheapest possible workload for a B-tree: one hot page in cache, no splits anywhere else, perfect packing.

With a random UUID v4, every insert lands on a different leaf page. The page is almost certainly not in the buffer cache, so the database reads it from disk. The page is likely full, so the database splits it in half — leaving both halves at roughly 50 percent fill — and writes both back to disk. Then the next insert hits another random page. After a few million rows you have an index that is twice as large as it should be, every leaf page is half-empty, and every insert touches a cold page.

+---------------------------------------------------------------------+
|           SEQUENTIAL INSERTS (BIGSERIAL or UUID v7)                  |
+---------------------------------------------------------------------+
|                                                                       |
|        [root]                                                         |
|       /  |  \                                                         |
|      .   .   .                                                        |
|             [leaf N-1] [leaf N] <- HOT PAGE, all writes go here       |
|                              ^                                        |
|                              +-- next insert appends here              |
|                                                                       |
|  -> 1 page in cache, append-only, ~95% fill                           |
|                                                                       |
+---------------------------------------------------------------------+

+---------------------------------------------------------------------+
|           RANDOM INSERTS (UUID v4)                                   |
+---------------------------------------------------------------------+
|                                                                       |
|                 [root]                                                |
|            /  /   |   \  \                                            |
|           .  .    .    .  .                                           |
|     [leaf] [leaf] [leaf] [leaf] [leaf] [leaf] [leaf] [leaf] ...       |
|       ^      ^               ^             ^                  ^       |
|       |      |               |             |                  |       |
|       insert insert         insert        insert              insert   |
|                                                                       |
|  -> Every insert hits a different page                                |
|  -> Random page split + cold buffer reads                             |
|  -> ~50% fill, 2x index size, 4x write amplification                  |
|                                                                       |
+---------------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). Side-by-side B-tree comparison. LEFT panel labeled 'UUID v4 — random inserts': a wide chaotic B-tree with rose (#ff5c8a) arrows pointing to many different leaf pages, half of them shaded gray to indicate cold pages, with a small label 'cache miss'. RIGHT panel labeled 'UUID v7 — sequential inserts': a clean B-tree with sky blue (#4fc3f7) arrows all pointing at the rightmost leaf page, with a small label 'hot page, append-only'. White monospace labels. Below both, a comparison strip: 'index size: 2.0x vs 1.0x', 'write amp: 4x vs 1x', 'cache hit: 30% vs 99%'."


Storage Cost — 16 Bytes Cascades Everywhere

A BIGINT primary key is 8 bytes. A UUID is 16 bytes. That sounds like a small difference until you remember that the primary key appears in every secondary index as the row pointer. A table with five secondary indexes pays the 8-byte penalty six times per row. On a billion-row table, that is 48 GB of extra index storage before you store a single byte of actual data.

+---------------------------------------------------------------------+
|           STORAGE COST PER ROW (1 billion rows)                      |
+---------------------------------------------------------------------+
|                                                                       |
|  PK type    | PK bytes | per row | + 5 indexes | total extra          |
|  -----------|----------|---------|-------------|--------------        |
|  INT        |  4       |    4    |    24       |     -                |
|  BIGINT     |  8       |    8    |    48       |  +24 GB              |
|  UUID v4    | 16       |   16    |    96       |  +72 GB              |
|  UUID v7    | 16       |   16    |    96       |  +72 GB              |
|                                                                       |
|  -> UUIDs cost 72 GB more index storage than INT                      |
|     on a 1B-row table with 5 secondary indexes                        |
|                                                                       |
+---------------------------------------------------------------------+

Storing UUIDs as strings (VARCHAR(36)) instead of the native binary type is a much worse mistake — you pay 36 bytes per occurrence, plus the cost of comparing characters instead of integers. Always use the native uuid type in Postgres or BINARY(16) in MySQL.


Example 1: Postgres uuid Type with gen_random_uuid (v4)

-- Create a table with a UUID v4 primary key.
-- gen_random_uuid() is built into Postgres 13+ (no extension needed).
CREATE TABLE orders_v4 (
    id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id bigint NOT NULL,
    amount_cents integer NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Insert without specifying id — the default fires.
INSERT INTO orders_v4 (customer_id, amount_cents)
VALUES (101, 4999), (102, 12500), (103, 899)
RETURNING id, created_at;

Sample output:

                  id                  |          created_at
--------------------------------------+-------------------------------
 7b21c4e9-9d31-4c4e-8d31-2a8b6f0c4d11 | 2026-04-14 14:22:01.123+00
 a8f0c4d1-1f3a-4b2c-a04e-6b8d2f1c0a93 | 2026-04-14 14:22:01.124+00
 3e9d31c4-2a8b-4f0c-9d31-7b21c4e98d31 | 2026-04-14 14:22:01.125+00
(3 rows)

Notice the IDs are in no particular order. If you sort them lexicographically, you get a different sequence than the insertion order. This is the property that makes v4 random — and the property that fragments the index.


Example 2: Postgres UUID v7 — Time-Ordered Primary Keys

Postgres 17 ships uuidv7() as a built-in function. On Postgres 13-16 you can install the pg_uuidv7 extension, or generate v7 values in the application layer.

-- Postgres 17+: built-in uuidv7().
-- For older versions, use the pg_uuidv7 extension:
--   CREATE EXTENSION pg_uuidv7;
CREATE TABLE orders_v7 (
    id          uuid PRIMARY KEY DEFAULT uuidv7(),
    customer_id bigint NOT NULL,
    amount_cents integer NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Insert three rows in quick succession.
INSERT INTO orders_v7 (customer_id, amount_cents)
VALUES (201, 1500), (202, 7800), (203, 2299)
RETURNING id, created_at;

-- Sort by id and observe that the order matches insertion order.
SELECT id, created_at
FROM orders_v7
ORDER BY id;

Sample output:

                  id                  |          created_at
--------------------------------------+-------------------------------
 018f3a2c-7b21-7c4e-8d31-2a8b6f0c4d11 | 2026-04-14 14:22:05.001+00
 018f3a2c-7b22-7a93-9d31-3e9d31c4f0c4 | 2026-04-14 14:22:05.002+00
 018f3a2c-7b23-7d11-a04e-1f3a4b2ca93e | 2026-04-14 14:22:05.003+00
(3 rows)

The first 12 hex characters (018f3a2c-7b21) are the millisecond timestamp. Rows inserted within the same millisecond differ only in their random tail. Lexicographic ordering equals chronological ordering, which is exactly what the B-tree wants.


Example 3: Measuring Fragmentation with pgstattuple

-- Install once.
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Insert 1 million rows into both tables, then measure index bloat.
INSERT INTO orders_v4 (customer_id, amount_cents)
SELECT (random() * 1000)::int, (random() * 10000)::int
FROM generate_series(1, 1000000);

INSERT INTO orders_v7 (customer_id, amount_cents)
SELECT (random() * 1000)::int, (random() * 10000)::int
FROM generate_series(1, 1000000);

-- Compare the leaf-page fill ratios.
SELECT 'v4' AS variant,
       pg_size_pretty(pg_relation_size('orders_v4_pkey')) AS index_size,
       avg_leaf_density
FROM pgstatindex('orders_v4_pkey')
UNION ALL
SELECT 'v7',
       pg_size_pretty(pg_relation_size('orders_v7_pkey')),
       avg_leaf_density
FROM pgstatindex('orders_v7_pkey');

Sample output:

 variant | index_size | avg_leaf_density
---------+------------+------------------
 v4      |    62 MB   |       52.31
 v7      |    32 MB   |       91.04
(2 rows)

The v4 index is roughly twice as large with leaf pages only half full. The v7 index packs nearly perfectly because every insert appended to the right edge. This single measurement is why teams running into write throughput walls migrate from v4 to v7.


Example 4: The Hybrid Pattern — Internal BIGINT, External UUID

Sometimes you want the best of both worlds: a fast sequential BIGINT primary key for joins and indexes, and a UUID for external API exposure so you do not leak row counts.

-- Internal id is BIGSERIAL — used for foreign keys and joins.
-- public_id is UUID v7 — exposed in URLs and APIs.
CREATE TABLE invoices (
    id          bigserial PRIMARY KEY,
    public_id   uuid NOT NULL UNIQUE DEFAULT uuidv7(),
    customer_id bigint NOT NULL,
    total_cents integer NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- API queries hit the public_id index; internal joins use id.
SELECT id, public_id, total_cents
FROM invoices
WHERE public_id = '018f3a2c-7b21-7c4e-8d31-2a8b6f0c4d11';

Sample output:

   id   |              public_id              | total_cents
--------+--------------------------------------+-------------
 482193 | 018f3a2c-7b21-7c4e-8d31-2a8b6f0c4d11 |       4999
(1 row)

This pattern keeps the clustered primary key narrow (8 bytes) while still hiding internal counts from the outside world. The trade-off is one extra index (UNIQUE (public_id)), which is small compared to making every secondary index UUID-wide.

Napkin AI Visual Prompt: "Dark navy gradient (#0a0f1f -> #111a2e). A horizontal flow diagram showing a request entering from the left labeled 'API request: GET /invoices/018f3a2c-...'. An arrow points to a sky blue (#4fc3f7) box labeled 'public_id index (UUID v7)' which then arrows to a rose (#ff5c8a) box labeled 'PK id (BIGINT)' which then arrows to 'row data'. Below the diagram, two side-by-side chips: 'External: UUID v7 — no count leak' and 'Internal: BIGINT — fast joins, narrow indexes'. White monospace labels."


MySQL — BINARY(16) Storage Pattern

MySQL has no native uuid type. The wrong way is CHAR(36) (36 bytes per UUID). The right way is BINARY(16), generated either in the application or with UUID_TO_BIN().

-- MySQL 8.0+. Use BINARY(16) and the UUID_TO_BIN/BIN_TO_UUID helpers.
-- The second argument to UUID_TO_BIN swaps the time fields so that
-- v1 UUIDs become sortable — for v7 you do NOT need swapping.
CREATE TABLE orders_mysql (
    id           BINARY(16) PRIMARY KEY,
    customer_id  BIGINT NOT NULL,
    amount_cents INT NOT NULL,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- v7 must be generated in the application layer (no built-in v7 in MySQL).
-- Round-trip a UUID for display.
INSERT INTO orders_mysql (id, customer_id, amount_cents)
VALUES (UUID_TO_BIN('018f3a2c-7b21-7c4e-8d31-2a8b6f0c4d11'), 201, 1500);

SELECT BIN_TO_UUID(id) AS id, customer_id, amount_cents
FROM orders_mysql;

InnoDB clusters the table by primary key, so the same fragmentation rules apply: random UUIDs destroy locality, time-ordered UUIDs preserve it. The fix is the same — generate v7 in the app and store as BINARY(16).


When to Actually Use UUIDs (and When Not To)

UUIDs are not the default. Reach for them when at least one of these is true:

  • Distributed writes — multiple databases, services, or shards need to mint IDs without coordinating.
  • Offline clients — mobile apps creating records while disconnected, syncing later without ID collisions.
  • Public APIs — you do not want competitors to scrape your row counts by incrementing IDs.
  • Idempotency keys — clients send a UUID with each request so retries are safe.
  • Merging databases — combining two systems without renumbering everything.

Stick with BIGSERIAL / IDENTITY when:

  • You have a single primary database doing all the writes.
  • The IDs are internal and never exposed to the outside world.
  • You care more about insert throughput and index size than about distributed minting.
  • You want effortless cursor pagination (WHERE id > $last_seen).
+---------------------------------------------------------------------+
|           DECISION FLOWCHART                                          |
+---------------------------------------------------------------------+
|                                                                       |
|  Q1: Do clients or remote services mint IDs?                          |
|       NO  -> use BIGSERIAL / IDENTITY                                 |
|       YES -> Q2                                                       |
|                                                                       |
|  Q2: Are IDs exposed in URLs / APIs?                                  |
|       NO  -> BIGSERIAL is fine, generate UUIDs only when needed       |
|       YES -> use UUID v7 (or hybrid: BIGINT pk + UUID public_id)      |
|                                                                       |
|  Q3: Random or time-ordered?                                          |
|       Random v4   -> only if you need maximum unpredictability        |
|                      (security tokens, share links — NOT primary keys)|
|       Time-ordered v7 -> default for distributed primary keys         |
|                                                                       |
+---------------------------------------------------------------------+

Common Mistakes

1. Using UUID v4 as a clustered primary key on a hot table. This is the single most expensive mistake on this page. Every random insert fragments the B-tree, doubles the index size, and turns a write-once workload into random I/O. If you must use UUIDs as a primary key, use v7. If you cannot use v7, use the hybrid pattern with an internal BIGINT PK and an external UUID column.

2. Storing UUIDs as VARCHAR(36) or CHAR(36). A 36-character string is more than twice the size of the native binary representation, comparisons are slower, and you waste cache on the hyphens. Always use Postgres uuid or MySQL BINARY(16). The pg_size_pretty(pg_relation_size(...)) numbers will speak for themselves.

3. Generating UUIDs on the database when the client could mint them. One of the main reasons to use UUIDs is so clients can mint IDs offline and avoid round-trips. If your application always calls gen_random_uuid() server-side, you are paying the storage cost of UUIDs without getting the architectural benefit. Mint them in the app.

4. Exposing UUID v7 in public URLs without realizing it leaks creation time. The high 48 bits of a v7 UUID are a Unix millisecond timestamp. Anyone holding a v7 UUID can decode exactly when the row was created. For most apps that is fine. For privacy-sensitive systems (medical, dating, anonymous reporting), use v4 for external IDs and time-ordered keys only internally.

5. Forgetting that UUIDs make cursor pagination harder. WHERE id > $last_seen works beautifully with a sequential integer. With UUID v4 it works against random order — completely useless. With UUID v7 it works again because v7 is sortable, but you need to remember that fact and use v7 specifically.


Interview Questions

1. "Why is UUID v4 considered bad for primary keys, and what does v7 fix?"

UUID v4 is 122 bits of pure randomness, which means every newly generated value lands at a random position in the B-tree primary key index. Each insert touches a different leaf page, almost always cold in the buffer cache, frequently triggering a page split that leaves both halves half-full. Over time the index grows to roughly twice its theoretical minimum size, write amplification climbs, and the buffer cache is forced to track the entire index instead of just the right edge. UUID v7, standardized in RFC 9562 in May 2024, replaces the high bits with a 48-bit Unix millisecond timestamp, so lexicographic order matches insertion order. Inserts append to the rightmost leaf page just like a BIGSERIAL would, the index packs to ~95 percent fill, and the cache hit rate stays high. v7 keeps every benefit of UUIDs — global uniqueness, client-side generation, no count leak — while restoring the locality that makes B-trees fast.

2. "What are the trade-offs between a BIGINT primary key and a UUID primary key?"

BIGINT wins on raw efficiency: 8 bytes versus 16 bytes per row, narrower secondary indexes, perfectly sequential inserts, and free cursor pagination. UUIDs win on distribution: any client or service can mint a unique ID without contacting a central sequence, IDs do not leak the row count of your business, and merging two databases never requires renumbering. The storage difference cascades — every secondary index pays the extra 8 bytes per row, so a billion-row table with five indexes costs about 72 GB more on UUIDs. The throughput difference depends on the variant: UUID v4 is dramatically slower because of B-tree fragmentation, while UUID v7 is roughly comparable to BIGINT for inserts. The pragmatic answer is to use BIGINT by default, switch to UUID v7 only when you genuinely need distributed minting or non-leaking external IDs, and consider the hybrid pattern (internal BIGINT PK + external UUID column) when you want both.

3. "Explain how UUID v7 maintains both global uniqueness and time ordering."

A UUID v7 packs a 48-bit Unix millisecond timestamp into the high bits, followed by 4 bits encoding version 7, then 74 bits of randomness, with the variant bits fixed by RFC 9562. Two UUIDs generated in different milliseconds are guaranteed to sort in time order because their timestamp prefixes differ. Two UUIDs generated within the same millisecond fall back to comparing the random tail, which is 74 bits of entropy — enough to make a collision astronomically unlikely even if a single process generates millions of UUIDs per second. Implementations can also reserve some of the random bits for a monotonic counter that increments within a millisecond, guaranteeing strict ordering even for same-millisecond inserts. The net result is that lexicographic byte order, numeric order, and chronological order all agree, which is precisely the property that lets a B-tree treat v7 inserts as appends.

4. "When should you use the hybrid pattern of an internal BIGINT primary key plus an external UUID column?"

Use the hybrid pattern when you need UUIDs in your public API (URLs, webhook payloads, mobile sync) but you do not want to pay the UUID cost on every internal join and secondary index. The internal BIGINT stays as the clustered primary key, foreign keys point at it, and every secondary index is narrow. The UUID column is UNIQUE so external requests look up the row by its public ID with one extra index hop. The trade-off is one additional unique index, which is far cheaper than widening the primary key by 8 bytes across every other index in the table. This pattern is especially common in large e-commerce, SaaS, and content systems where billions of rows would make UUID-wide indexes painful but where the public API absolutely cannot expose sequential integer IDs.

5. "How would you migrate a table from UUID v4 to UUID v7 in production?"

Migration is harder than starting fresh because the existing v4 IDs cannot become v7 — they have no embedded timestamp. The realistic options are: (1) leave existing rows alone, change the column default to uuidv7() going forward, and accept that the table is mixed; the index stays partially fragmented but new inserts append cleanly. (2) Add a new uuid_v7 column populated by uuidv7() for new rows and backfilled with synthesized v7 values from created_at for old rows, then promote it to the primary key during a maintenance window. (3) For a complete rebuild, do a logical dump and restore in v7 order — this rewrites the clustered index from scratch and gives you a perfectly packed B-tree, but requires significant downtime or a parallel cutover. In practice option (1) is what most teams do because it stops the bleeding without a risky migration: the fragmented v4 portion gradually shrinks as old rows are deleted and new v7 rows append cleanly.


Quick Reference — Cheat Sheet

+---------------------------------------------------------------------+
|           UUID PRIMARY KEY CHEAT SHEET                               |
+---------------------------------------------------------------------+
|                                                                       |
|  POSTGRES SYNTAX:                                                     |
|    id uuid PRIMARY KEY DEFAULT gen_random_uuid()  -- v4 (built-in)   |
|    id uuid PRIMARY KEY DEFAULT uuidv7()           -- v7 (PG17+)      |
|    CREATE EXTENSION pg_uuidv7;                    -- v7 on PG13-16   |
|                                                                       |
|  MYSQL SYNTAX:                                                        |
|    id BINARY(16) PRIMARY KEY                                          |
|    SELECT BIN_TO_UUID(id), UUID_TO_BIN('...');                        |
|    -- Generate v7 in the application layer                            |
|                                                                       |
|  HYBRID PATTERN:                                                      |
|    id        bigserial PRIMARY KEY                                    |
|    public_id uuid NOT NULL UNIQUE DEFAULT uuidv7()                    |
|                                                                       |
|  ALWAYS:                                                              |
|    - Use the native binary type (uuid / BINARY(16)), never CHAR(36)   |
|    - Prefer v7 over v4 for primary keys                               |
|    - Mint UUIDs in the app when possible                              |
|                                                                       |
+---------------------------------------------------------------------+
ConcernBIGINT (BIGSERIAL)UUID v4 (random)UUID v7 (time-ordered)
Size8 bytes16 bytes16 bytes
Insert patternSequential appendRandom — page splitsSequential append
Index fill ratio~95%~50%~95%
Index size (1B rows)Baseline~2x baseline~1x baseline (vs v4)
Client-mintableNoYesYes
Distributed safeNo (needs central seq)YesYes
Leaks row countYesNoNo
Leaks creation timeNoNoYes (high 48 bits)
Cursor paginationTrivial (id > $last)Useless (random order)Works (id > $last)
Postgres syntaxbigserialgen_random_uuid()uuidv7() (PG17+)
MySQL syntaxBIGINT AUTO_INCREMENTBINARY(16) + appBINARY(16) + app
Best forInternal PKs, single DBSecurity tokens, share linksDistributed PKs, public APIs
Worst forDistributed writesHot primary keysPrivacy-sensitive IDs

Prev: Lesson 7.2 -- AUTO_INCREMENT, SERIAL, IDENTITY Next: Lesson 8.1 -- One-to-One Relationships


This is Lesson 7.3 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

On this page