Database Interview Prep
Database Fundamentals

Data Types

Picking the Right Column Type in PostgreSQL

LinkedIn Hook

"Your e-commerce app is off by one cent on every tenth invoice. Finance is furious. The bug? Someone used FLOAT for the price column."

Data types feel like a boring schema decision, but they are one of the most consequential choices you will ever make in a database. Pick INT when you needed BIGINT, and your user_id column wraps around at 2.1 billion. Pick FLOAT when you needed DECIMAL, and your accounting system silently loses pennies. Pick VARCHAR(255) because someone on Stack Overflow said so, and you have propagated a MySQL 4.0 myth into a Postgres 16 codebase where it means absolutely nothing.

The worst part is that none of these mistakes blow up immediately. Your TIMESTAMP column without a timezone works perfectly in dev because the server happens to be UTC. It fails six months later when a user in Sydney files a bug report because their "created at midnight" order shows up on the wrong day. Your TEXT vs VARCHAR debate rages in code review while the actual performance difference in Postgres is exactly zero.

The fix is not complicated. Learn the categories — numeric, string, temporal, boolean, JSON, UUID, array — know the storage size of each, and pick the type that matches what the column actually means. DECIMAL for money. TIMESTAMPTZ for anything a human cares about. JSONB when you need to query inside the blob. UUID when IDs must be globally unique. BIGINT by default for primary keys on anything that might grow.

In Lesson 1.2, I break down every common PostgreSQL data type, show the storage cost, call out the MySQL differences, and list the mistakes that keep showing up in production.

Read the full lesson -> [link]

#PostgreSQL #Database #SQL #DataModeling #BackendDevelopment #InterviewPrep


Data Types thumbnail


What You'll Learn

  • The major PostgreSQL data type categories: numeric, string, boolean, temporal, JSON, UUID, and array
  • Exact storage size of each type and the range limits you need to know
  • When to use DECIMAL instead of FLOAT or DOUBLE PRECISION (money, rates, quantities)
  • The difference between CHAR, VARCHAR, VARCHAR(n), and TEXT — and why the VARCHAR(255) convention is a myth in Postgres
  • TIMESTAMP vs TIMESTAMPTZ, DATE, and why "store everything in UTC" is only half the story
  • JSON vs JSONB and when to reach for a document type at all
  • UUID generation strategies (v4 vs v7) and the tradeoff against BIGSERIAL
  • Native ARRAY columns and why they exist in Postgres but not in MySQL
  • MySQL differences that bite when you port a schema across engines

The Tool Drawer Analogy — Right Tool for the Job

Imagine a carpenter with a drawer full of tools. There is a tape measure for distances, a level for angles, a caliper for fine measurements, a scale for weight. Each tool has a purpose. You do not measure a wall with a caliper — it only goes up to six inches. You do not weigh a bag of flour with a tape measure — that is not what a tape measure is for. A good carpenter reaches for the tool that fits the job, and the tool tells anyone watching what kind of measurement matters here.

A database column is the same. The data type you pick is a contract: it tells the engine how to store the value, how to compare it, how to index it, and how to reject bad input. An INT column promises "this will never exceed two billion" — and the engine reserves exactly four bytes per row, no more, no less. A DECIMAL(12,2) column promises "this is money, accurate to the cent, never rounded" — and the engine uses an exact base-10 representation so 0.1 + 0.2 equals 0.3, not 0.30000000000000004. A TIMESTAMPTZ column promises "this is a moment in time that everyone in the world will agree on" — and the engine stores UTC internally while accepting and returning values in whatever timezone the session is using.

The wrong type is not just wasted bytes. It is a silent lie. A FLOAT column for prices is a column that will round away money. A VARCHAR(20) column for phone numbers is a column that will reject every international format with a country code and a space. A TIMESTAMP (without time zone) for booking times is a column that will happily accept "2026-04-13 22:00" with zero information about what that string means — is it UTC? New York? Tokyo? Nobody knows, and nobody can reconstruct it later.

+---------------------------------------------------------------+
|           DATA TYPE CATEGORIES IN POSTGRESQL                  |
+---------------------------------------------------------------+
|                                                                |
|  NUMERIC    -> SMALLINT, INT, BIGINT, SERIAL, BIGSERIAL,       |
|                DECIMAL / NUMERIC, REAL, DOUBLE PRECISION       |
|                                                                |
|  STRING     -> CHAR(n), VARCHAR(n), VARCHAR, TEXT              |
|                                                                |
|  BOOLEAN    -> BOOLEAN (true / false / NULL)                   |
|                                                                |
|  TEMPORAL   -> DATE, TIME, TIMESTAMP, TIMESTAMPTZ, INTERVAL    |
|                                                                |
|  JSON       -> JSON (text), JSONB (binary, indexed)            |
|                                                                |
|  UUID       -> UUID (16 bytes, globally unique)                |
|                                                                |
|  ARRAY      -> any_type[] (native Postgres, not in MySQL)      |
|                                                                |
|  OTHER      -> BYTEA, ENUM, INET, CIDR, GEOMETRY, TSVECTOR     |
|                                                                |
+---------------------------------------------------------------+

Numeric Types — Integers, Serials, Decimals, and Floats

Integer Family

Postgres has three integer widths, and the only thing you need to know is their range and size.

TypeSizeRange
SMALLINT2 bytes-32,768 to 32,767
INTEGER (INT)4 bytes-2.1 billion to 2.1 billion
BIGINT8 bytes-9.2 quintillion to 9.2 quintillion

Default rule: use INTEGER for counts, ages, quantities, and foreign keys to small tables. Use BIGINT for primary keys on anything that might one day exceed two billion rows — which is most user-facing tables. The four bytes you save with INT are not worth the migration pain when you hit the ceiling at 3 AM.

Auto-Incrementing IDs — SERIAL and IDENTITY

SERIAL is Postgres shorthand for "INTEGER with a sequence attached that auto-increments on insert." BIGSERIAL is the BIGINT version. They are not real types — they are macros that expand to an INT/BIGINT column plus a sequence plus a default.

-- Old style (still works, still common)
CREATE TABLE users_old (
  id BIGSERIAL PRIMARY KEY,
  email TEXT NOT NULL
);

-- Modern SQL-standard style, preferred in Postgres 10+
CREATE TABLE users (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  email TEXT NOT NULL
);

GENERATED ALWAYS AS IDENTITY is the SQL standard equivalent and is preferred for new schemas. It makes the auto-generation explicit, prevents accidental inserts of the id column, and avoids the sequence-permission quirks that SERIAL has.

MySQL difference: MySQL uses AUTO_INCREMENT on INT/BIGINT columns. There is no SERIAL keyword.

Exact Decimals — NUMERIC / DECIMAL

For anything that must be exact — money, percentages, tax rates, inventory quantities — use NUMERIC(precision, scale), which is the same as DECIMAL(precision, scale) in Postgres.

  • precision = total number of digits
  • scale = digits to the right of the decimal point

NUMERIC(12, 2) stores up to ten digits before the decimal and two after — enough for 99,999,999.99. Storage is variable, roughly 2 bytes per group of 4 digits plus a small header. It is slower than fixed-width integers, but the arithmetic is exact.

-- Price in USD, accurate to the cent, up to ten million dollars
price NUMERIC(12, 2) NOT NULL

-- Tax rate as a percentage with four decimal places (e.g. 8.2500)
tax_rate NUMERIC(6, 4) NOT NULL

Floating Point — REAL and DOUBLE PRECISION

  • REAL — 4 bytes, ~6 significant digits (IEEE 754 single precision)
  • DOUBLE PRECISION — 8 bytes, ~15 significant digits (IEEE 754 double precision)

These are inexact. They are the right choice for scientific measurements, sensor readings, physics simulations, and machine learning features — places where a tiny rounding error is fine. They are the wrong choice for anything representing money, counts, or user-visible numbers that must add up.

The classic demonstration:

SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION;
-- Result: 0.30000000000000004

SELECT 0.1::NUMERIC + 0.2::NUMERIC;
-- Result: 0.3

String Types — CHAR, VARCHAR, VARCHAR(n), TEXT

Postgres gives you four string types, and here is the headline: in Postgres, they are all stored the same way internally and have essentially identical performance. The choice is about semantics and validation, not speed.

TypeMeaning
CHAR(n)Fixed length, blank-padded to exactly n characters
VARCHAR(n)Variable length with a maximum of n characters, enforced by the engine
VARCHARVariable length, unlimited
TEXTVariable length, unlimited (the canonical Postgres choice)

CHAR(n) is almost never what you want. It pads shorter values with trailing spaces, which confuses comparisons and wastes space. Only reach for it when you are modelling fixed-width codes that are always exactly n characters (ISO country codes, for example) and you really want the engine to enforce the length.

VARCHAR(n) is useful when you have a genuine business rule that says "this value must not exceed n characters" — for example, a Twitter post limited to 280 characters. In that case the length limit is data validation, not performance tuning.

TEXT is the right default for almost everything else: names, descriptions, comments, emails, URLs, tokens. It has no length limit, costs you nothing extra, and does not require guessing a maximum.

The VARCHAR(255) myth. A huge number of schemas use VARCHAR(255) for every string column because that is what some old tutorial said. The origin is MySQL: in MySQL 4.1 and earlier, 255 was the largest VARCHAR that could use a single byte for the length prefix, so it was a micro-optimization. In Postgres, that rule never applied. VARCHAR(255) in Postgres is exactly the same as TEXT with a weirdly specific and usually meaningless length cap. Pick TEXT unless you have a real reason.

MySQL difference: In MySQL, CHAR, VARCHAR(n), and TEXT are genuinely different storage types with different tradeoffs, and TEXT columns cannot have a default value or be fully indexed without a prefix length. The Postgres advice "just use TEXT" does not translate directly.


Boolean — BOOLEAN

Postgres has a real BOOLEAN type with three values: TRUE, FALSE, and NULL. It accepts TRUE, 't', 'true', 'y', 'yes', 'on', '1' as true, and the obvious opposites as false. Storage is 1 byte.

is_active BOOLEAN NOT NULL DEFAULT TRUE

MySQL difference: MySQL does not have a real BOOLEAN type. BOOLEAN and BOOL are just aliases for TINYINT(1) — a 1-byte integer where 0 means false and everything else means true. It works, but you lose the type-level guarantee, and SELECT is_active FROM users returns 1 and 0 instead of true and false.


Temporal Types — DATE, TIMESTAMP, TIMESTAMPTZ, INTERVAL

This is where the most painful bugs hide. Postgres gives you:

TypeSizeMeaning
DATE4 bytesCalendar date, no time, no zone
TIME8 bytesTime of day, no date, no zone
TIMESTAMP8 bytesDate + time, no timezone information
TIMESTAMPTZ8 bytesDate + time, stored as UTC, displayed in session zone
INTERVAL16 bytesA span of time (e.g. '3 days 4 hours')

The key insight: TIMESTAMP does not mean "UTC timestamp." It means "a date and time with no timezone attached." If you store '2026-04-13 14:00:00' in a TIMESTAMP column, the engine has no idea what zone that refers to. It is just text-as-a-number.

TIMESTAMPTZ (which is TIMESTAMP WITH TIME ZONE) is almost always what you want. Despite the name, it does not store a timezone. It stores a moment in UTC, and it uses the session's TimeZone setting to interpret input and format output. Two clients in New York and Tokyo inserting "right now" will store the same underlying UTC value, and each will see it in their own wall-clock time when they query.

-- WRONG: a booking time with no zone is ambiguous
booking_at TIMESTAMP NOT NULL

-- RIGHT: an unambiguous moment in time
booking_at TIMESTAMPTZ NOT NULL DEFAULT now()

Use DATE when you literally mean a calendar date with no time component — birthdays, holidays, report windows, fiscal periods. Use TIMESTAMP (without TZ) only for local wall-clock times that are intentionally zone-agnostic, such as "store opens at 09:00 local time" where local is resolved by the application.

MySQL difference: MySQL has DATETIME and TIMESTAMP, and they behave differently from each other and from Postgres. MySQL's TIMESTAMP is stored as UTC and converted on read (closer to Postgres TIMESTAMPTZ) but has a narrow range (1970 to 2038). MySQL's DATETIME stores the literal value with no zone (like Postgres TIMESTAMP). There is no direct equivalent to Postgres TIMESTAMPTZ with full range.


JSON and JSONB — Documents Inside a Row

Postgres has two JSON types:

  • JSON — stored as text exactly as written, re-parsed on every read. Preserves whitespace and key order.
  • JSONB — stored as a decomposed binary structure. Slightly slower to write, much faster to query, supports indexing (GIN), and is what you should almost always use.
-- Dynamic user preferences that vary per row
preferences JSONB NOT NULL DEFAULT '{}'::jsonb

JSONB lets you query inside the document:

-- Find users who have notifications enabled
SELECT id, email
FROM users
WHERE preferences->>'notifications' = 'true';

-- Index a specific path for fast lookups
CREATE INDEX idx_users_theme ON users ((preferences->>'theme'));

-- Or index the whole document for arbitrary key/value queries
CREATE INDEX idx_users_prefs ON users USING GIN (preferences);

When to reach for JSONB: genuinely schemaless data, user-defined fields, third-party API payloads you want to keep verbatim, feature-flag configs. When not to: anything with a stable shape you will query regularly. A real column is faster, smaller, type-checked, and far easier to reason about than a JSONB path expression.

MySQL difference: MySQL has a JSON type (since 5.7) that is closer to Postgres JSONB in behavior — it is stored binary and supports path queries — but indexing works differently and generally requires generated columns.


UUID — Globally Unique Identifiers

A UUID is a 128-bit (16-byte) identifier that is globally unique without coordination. Postgres has a built-in UUID type and operators, and you can generate them with gen_random_uuid() (from the pgcrypto extension, built in since Postgres 13).

id UUID PRIMARY KEY DEFAULT gen_random_uuid()

Why use UUID over BIGSERIAL:

  • IDs are generatable on the client before talking to the database (useful for offline-first apps and distributed systems)
  • No information leak — sequential IDs tell competitors how many users you have
  • Merging data from multiple databases does not collide

Why not:

  • 16 bytes vs 8 bytes for BIGINT — larger indexes, more memory
  • Random UUIDs (v4) are terrible for B-tree insert locality because every new row goes to a random page
  • UUID v7 (time-ordered, draft standard) fixes the locality problem and is the modern choice when your client or database supports it

Rule of thumb: BIGINT IDENTITY for internal tables where you never expose the ID. UUID v7 for anything external-facing or distributed.


Arrays — Multiple Values in One Column

Postgres has native array types. Any column type can be made into an array by adding [].

tags TEXT[] NOT NULL DEFAULT '{}'
scores INTEGER[]
-- Find rows where the tags array contains 'urgent'
SELECT * FROM tasks WHERE 'urgent' = ANY(tags);

-- GIN index for fast array containment queries
CREATE INDEX idx_tasks_tags ON tasks USING GIN (tags);

Arrays are ideal for small, bounded, naturally-ordered lists (tags, role names, scores per question). They are not a replacement for a join table — once you need to query or update individual elements, enforce foreign keys on them, or join against them, use a proper child table.

MySQL difference: MySQL has no native array type. The closest equivalents are JSON arrays or a child table.


Putting It All Together — One Table That Uses Every Type

-- A realistic users table that demonstrates each type category.
-- Run this on Postgres 13+ with the pgcrypto extension available.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE app_users (
  -- UUID primary key, generated server-side at insert time.
  -- gen_random_uuid() produces a random v4 UUID (16 bytes).
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

  -- Internal sequential ID for compact indexing and joins.
  -- BIGINT IDENTITY is the SQL-standard way to declare an
  -- auto-incrementing column in Postgres 10+.
  internal_id BIGINT GENERATED ALWAYS AS IDENTITY UNIQUE,

  -- TEXT is the canonical Postgres string type — no length limit,
  -- no performance penalty, no VARCHAR(255) cargo cult.
  email TEXT NOT NULL UNIQUE,

  -- VARCHAR(n) is appropriate when the length is a genuine rule.
  -- A username capped at 32 characters is a business constraint,
  -- not a storage optimization.
  username VARCHAR(32) NOT NULL UNIQUE,

  -- A fixed 2-character country code: CHAR(2) is one of the
  -- rare justified uses of CHAR because the length is exact.
  country_code CHAR(2) NOT NULL,

  -- Real BOOLEAN, not TINYINT(1). Three-valued logic supported.
  is_active BOOLEAN NOT NULL DEFAULT TRUE,

  -- Exact decimal for money. NUMERIC(12, 2) holds up to
  -- 9,999,999,999.99 — more than enough for account balances.
  -- Never use FLOAT or DOUBLE PRECISION for currency.
  account_balance NUMERIC(12, 2) NOT NULL DEFAULT 0,

  -- FLOAT is fine for inexact measurements like a trust score.
  trust_score DOUBLE PRECISION NOT NULL DEFAULT 0.5,

  -- DATE is for calendar dates with no time component.
  -- A birthday in New York and Tokyo is the same date.
  date_of_birth DATE,

  -- TIMESTAMPTZ stores an absolute moment in UTC internally,
  -- and displays it in the session's configured timezone.
  -- This is almost always what you want for "created_at" columns.
  created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),

  -- TIMESTAMP (without TZ) for local wall-clock times that
  -- are intentionally zone-agnostic (e.g. "user prefers 09:00").
  preferred_contact_time TIMESTAMP,

  -- JSONB for flexible, queryable user preferences.
  -- Use JSONB, not JSON — binary storage, indexable, faster.
  preferences JSONB NOT NULL DEFAULT '{}'::jsonb,

  -- Native TEXT array for a small, bounded tag list.
  -- For anything larger or FK-constrained, use a child table.
  tags TEXT[] NOT NULL DEFAULT '{}'
);

-- A GIN index lets us query inside the JSONB document fast.
CREATE INDEX idx_app_users_prefs ON app_users USING GIN (preferences);

-- A GIN index on the tags array for containment queries.
CREATE INDEX idx_app_users_tags ON app_users USING GIN (tags);

-- Example insert showing how each type is written literally.
INSERT INTO app_users (
  email, username, country_code, account_balance,
  date_of_birth, preferences, tags
)
VALUES (
  'ada@example.com',
  'ada',
  'GB',
  1234.56,
  '1815-12-10',
  '{"theme": "dark", "notifications": true}'::jsonb,
  ARRAY['founder', 'vip']
);

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Central diagram: a PostgreSQL table schema with labeled columns color-coded by type category — sky blue (#4fc3f7) for numeric, white monospace for strings, rose (#ff5c8a) for temporal, sky blue for JSONB, rose for UUID. On the right, a size column showing 4B / 8B / 16B next to each type. At the top, a banner 'Right Type, Right Column'. Subtle grid pattern overlay."

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Split comparison labeled 'FLOAT vs NUMERIC for money'. LEFT panel shows '0.1 + 0.2 = 0.30000000000000004' in rose (#ff5c8a) with a red X. RIGHT panel shows '0.1 + 0.2 = 0.3' in sky blue (#4fc3f7) with a checkmark. Below, a receipt image with one cent circled. Title: 'Never Store Money in FLOAT'. White monospace font throughout."


Common Mistakes

1. Using FLOAT or DOUBLE PRECISION for money. Floating-point arithmetic is base-2, and most decimal fractions cannot be represented exactly. 0.1 + 0.2 is 0.30000000000000004, not 0.3. Summing a million tiny transactions produces visible drift that accountants notice and auditors care about. The fix is NUMERIC(p, s) with enough precision for your largest expected amount and scale equal to the smallest unit you track (2 for cents, 4 for basis points). It is slower than FLOAT, but "slower" here means microseconds per row, not anything a user will ever feel.

2. VARCHAR(255) cargo culting. The 255-character limit comes from a MySQL 4.x length-prefix optimization that was irrelevant even then for most queries and is entirely irrelevant in Postgres. In Postgres, VARCHAR(255), VARCHAR, and TEXT are stored the same way and have identical performance. Pick TEXT for unlimited strings and VARCHAR(n) only when n reflects a real business rule ("usernames must be under 32 characters"). Do not copy an arbitrary cap from an old schema without asking what it is enforcing.

3. Using TIMESTAMP instead of TIMESTAMPTZ. TIMESTAMP in Postgres stores a naked date+time with no timezone — the engine has no idea what zone that value refers to. It works in dev because everything happens to be UTC, then breaks in production when a user in Sydney inserts "now" and another process in New York reads it as local time. TIMESTAMPTZ stores an absolute moment in UTC and converts to the session zone on display, which is what "this event happened at this instant" actually means. Default to TIMESTAMPTZ for anything that represents a real-world event. The only legitimate TIMESTAMP (no TZ) use case is a literal wall-clock time intentionally detached from a zone.

4. Obsessing over TEXT vs VARCHAR performance in Postgres. This debate still shows up in code review in 2026. In Postgres, TEXT, VARCHAR, and VARCHAR(n) use exactly the same underlying storage (a variable-length varlena). There is no performance difference. The only reason to use VARCHAR(n) is to enforce a length constraint as validation, and even then a CHECK (length(column) <= n) constraint does the same thing more explicitly. Stop arguing about it and pick one.

5. Storing dates as strings. Every so often a schema appears with birthday VARCHAR(10) holding values like '1990-03-15' or, worse, '03/15/1990'. Now you cannot do date arithmetic, cannot sort correctly by date (string comparison of '10/01/1990' vs '2/01/1990' is wrong), cannot validate that February has 28 or 29 days, and cannot index efficiently for range queries. Use DATE. The engine will reject bad input, store it in 4 bytes, and let you write WHERE birthday BETWEEN '1980-01-01' AND '1989-12-31' with full index support.


Interview Questions

1. "Why should you use NUMERIC instead of FLOAT for monetary values?"

Because floating-point types use base-2 IEEE 754 representation, and most decimal fractions (like 0.1, 0.2, 0.99) have no exact binary form. Arithmetic on them introduces small rounding errors that compound. A classic demonstration: 0.1 + 0.2 in DOUBLE PRECISION is 0.30000000000000004, not 0.3. For a single row this is invisible, but summing a million transactions produces errors that accountants, auditors, and regulators care about. NUMERIC(precision, scale) (also spelled DECIMAL) uses an exact base-10 representation — every digit is stored literally — so arithmetic is exact within the declared scale. The tradeoff is that it is slower than fixed-width integers or floats, but "slower" here means a few microseconds per operation, which is irrelevant compared to the cost of miscounting money. The rule is simple: any column that represents currency, a quantity that must sum exactly, or a percentage that feeds into money — use NUMERIC. Use FLOAT or DOUBLE PRECISION only for inherently inexact measurements like sensor readings, ML features, or scientific computations.

2. "What is the difference between TIMESTAMP and TIMESTAMPTZ in PostgreSQL, and which should you use?"

TIMESTAMP WITHOUT TIME ZONE (usually written just TIMESTAMP) stores a date and time with no timezone information attached. The engine stores the literal values you gave it and has no idea what zone they refer to. TIMESTAMP WITH TIME ZONE (TIMESTAMPTZ) stores an absolute moment in UTC internally, converts input to UTC based on the client's session timezone, and converts output back to the session timezone on read. Despite the name, it does not actually store a timezone per row — it stores a UTC instant. For almost every real-world use case — "when did this event happen," "when was this row created," "when does this booking start" — you want TIMESTAMPTZ, because the question is about a specific moment that everyone in the world agrees on. The only legitimate use of plain TIMESTAMP is a wall-clock time that is intentionally zone-agnostic, like "the store opens at 09:00 local time" where the application resolves "local" at query time. A very common production bug is using TIMESTAMP because the name sounds right, having it work in dev because the server happens to be UTC, and then breaking when users in different zones insert or read values.

3. "When should you choose UUID over BIGSERIAL for a primary key, and what are the tradeoffs?"

Use UUID when IDs need to be generatable without talking to the database (offline-first mobile apps, distributed systems writing to multiple shards), when merging data from multiple databases without collisions, or when you want to avoid leaking information like "how many users we have" through sequential IDs. BIGSERIAL (or the SQL-standard BIGINT GENERATED ALWAYS AS IDENTITY) is 8 bytes, monotonically increasing, cache-friendly for B-tree indexes, and simpler to reason about. The tradeoffs for UUID are: (a) 16 bytes instead of 8, which doubles the size of every primary and foreign key index, (b) random UUIDs (v4) are terrible for insert locality because every new row lands on a random B-tree page, causing more I/O and write amplification, and (c) they are harder for humans to read and copy around. UUID v7, which is time-ordered, fixes the locality issue and is the modern choice for external IDs. A common pattern is to use BIGINT IDENTITY for internal primary keys and joins, plus a separate UUID column for external APIs — getting the index efficiency of BIGINT and the opacity of UUID where it matters.

4. "In PostgreSQL, what is the difference between VARCHAR(n), VARCHAR, and TEXT?"

Functionally and performance-wise in Postgres, they are all the same underlying storage — a variable-length varlena — and there is no performance difference between them. The only differences are semantic. VARCHAR(n) enforces a maximum length of n characters at insert/update time and rejects values that exceed it. VARCHAR (no length) and TEXT are interchangeable and allow strings of any size (up to 1 GB). The common recommendation in Postgres is to default to TEXT and only use VARCHAR(n) when n represents a real business rule — a username capped at 32 chars, a Twitter post capped at 280, an ISBN at 13. The widespread VARCHAR(255) pattern is a myth inherited from MySQL, where 255 used to be a storage optimization. In Postgres it has always been meaningless. Note that MySQL is genuinely different: there, CHAR, VARCHAR, and TEXT have different storage and indexing behaviors, so the "just use TEXT" advice is Postgres-specific.

5. "When should you use JSONB in PostgreSQL, and when should you use regular columns instead?"

Use JSONB when the shape of the data is genuinely dynamic or per-row — user preferences where every user has different keys, third-party API payloads you want to store verbatim, feature flag configurations, sparse attributes, or schemaless metadata. JSONB is stored in a decomposed binary format, supports operators like ->, ->>, and @> for path queries, and can be indexed with a GIN index for fast containment lookups or with an expression index on a specific path. The opposite case is any field with a stable, known shape that you will query or filter on regularly — email, status, created_at, user_id. A real typed column is smaller, faster, type-checked by the engine, enforceable with constraints and foreign keys, and far easier to query than data->>'email'. The failure mode is using JSONB as a lazy replacement for schema design: throwing every field into a data JSONB column so you can "add fields without migrations," which sacrifices type safety, referential integrity, query performance, and readability. JSONB is a scalpel for genuinely dynamic data, not a hammer for avoiding schema work.


Quick Reference — Data Types Cheat Sheet

+---------------------------------------------------------------+
|           POSTGRESQL DATA TYPES CHEAT SHEET                   |
+---------------------------------------------------------------+
|                                                                |
|  NUMERIC:                                                      |
|  SMALLINT         2B   -32K..32K                               |
|  INTEGER          4B   +/-2.1B         counts, small FKs       |
|  BIGINT           8B   +/-9.2 quint.   PKs, large counts       |
|  NUMERIC(p,s)    var   exact decimal   MONEY, rates            |
|  REAL             4B   ~6 digits       inexact scientific      |
|  DOUBLE PRECISION 8B   ~15 digits      inexact scientific      |
|                                                                |
|  STRING:                                                       |
|  CHAR(n)          n    blank-padded    fixed codes only        |
|  VARCHAR(n)      var   max n chars     real length rule        |
|  TEXT            var   unlimited       DEFAULT for strings     |
|                                                                |
|  BOOLEAN:                                                      |
|  BOOLEAN          1B   t/f/NULL        real boolean            |
|                                                                |
|  TEMPORAL:                                                     |
|  DATE             4B   YYYY-MM-DD      birthdays, holidays     |
|  TIME             8B   HH:MM:SS        time-of-day only        |
|  TIMESTAMP        8B   no zone         local wall-clock only   |
|  TIMESTAMPTZ      8B   UTC internally  DEFAULT for timestamps  |
|  INTERVAL        16B   '3 days 4 h'    durations               |
|                                                                |
|  SEMI-STRUCTURED:                                              |
|  JSON            var   text, reparsed  rarely correct          |
|  JSONB           var   binary+indexed  DEFAULT for JSON        |
|  ARRAY (type[])  var   native arrays   small bounded lists     |
|                                                                |
|  IDENTIFIERS:                                                  |
|  UUID            16B   globally unique external IDs, sharding  |
|  BIGINT IDENTITY  8B   sequential      internal PKs            |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           KEY RULES                                            |
+---------------------------------------------------------------+
|                                                                |
|  1. Money -> NUMERIC(p, s). Never FLOAT.                       |
|  2. Timestamps -> TIMESTAMPTZ. TIMESTAMP is almost never right |
|  3. Strings -> TEXT by default. VARCHAR(n) only for real caps  |
|  4. VARCHAR(255) is a MySQL 4.x myth. Stop copying it.         |
|  5. JSON -> JSONB, never plain JSON                            |
|  6. Primary keys -> BIGINT IDENTITY or UUID v7                 |
|  7. Dates -> DATE. Never store as VARCHAR.                     |
|  8. BOOLEAN is a real type in Postgres. Use it.                |
|                                                                |
+---------------------------------------------------------------+
NeedRight TypeWrong Type
Money, prices, balancesNUMERIC(12, 2)FLOAT, DOUBLE PRECISION
Created/updated timestampsTIMESTAMPTZTIMESTAMP, VARCHAR
Birthdays, holidaysDATETIMESTAMPTZ, VARCHAR(10)
Names, emails, descriptionsTEXTVARCHAR(255)
Username capped at 32VARCHAR(32)TEXT (no validation)
Country codeCHAR(2)VARCHAR(2)
Active flagBOOLEANSMALLINT, CHAR(1)
Internal primary keyBIGINT IDENTITYINT SERIAL (too small)
External, shareable IDUUID (v7 ideal)sequential INT
User preferences blobJSONBTEXT with parsed JSON
Small tag listTEXT[]comma-separated TEXT
Event durationINTERVALtwo TIMESTAMPTZ columns

Prev: Lesson 1.1 -- What is a Database Next: Lesson 1.3 -- Constraints


This is Lesson 1.2 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

On this page