Database Interview Prep
Normalization

Third Normal Form (3NF)

Killing Transitive Dependencies

LinkedIn Hook

"You changed the city for one ZIP code. The other 4,217 rows still say the old city."

Most teams normalize to 2NF, see no obvious duplication, and stop. Then a customer moves, an address gets edited, and a single UPDATE leaves the database in two contradictory states at once. The query that powers your shipping label generator now produces both 'Brooklyn' and 'Brooklin' for the same ZIP, and nobody knows which one is right.

Third Normal Form is the rule that catches this: every non-key column must depend on the key, the whole key, and nothing but the key. If zip_code -> city and zip_code is itself a non-key column, then city is depending on a non-key — a transitive dependency — and you have just built yourself an update anomaly factory.

The fix is mechanical. Find every chain key -> A -> B, lift the dangling end (B) into its own table keyed by A, and replace the original column with a foreign key. The data shrinks, the anomalies vanish, and the only place a city name lives is the one row that owns it.

In Lesson 10.4, I break down 3NF: the formal definition, the ZIP-code-to-city anomaly that every interview asks about, the difference between 2NF and 3NF, and the runnable PostgreSQL refactor that turns a broken table into a clean one.

Read the full lesson -> [link]

#SQL #Database #PostgreSQL #Normalization #DataModeling #BackendDevelopment #InterviewPrep


Third Normal Form (3NF) thumbnail


What You'll Learn

  • The formal definition of Third Normal Form and the famous Codd mnemonic about depending on the key
  • What a transitive dependency is and why it is the silent killer of 2NF designs
  • The classic zip_code -> city -> state anomaly and the three update bugs it creates
  • How 3NF differs from 2NF — partial dependencies vs transitive dependencies
  • A step-by-step PostgreSQL refactor of a broken customers table into a clean 3NF schema
  • The difference between 3NF and BCNF (and why most apps stop at 3NF)
  • Insertion, update, and deletion anomalies that 3NF eliminates
  • When to deliberately denormalize away from 3NF for read performance

The Office Building Directory Analogy — Why Department Names Belong on One Door

Imagine a giant office building with one paper directory at the front desk. Each row of the directory lists an employee: name, employee ID, office number, department code, and department name. Five hundred employees, fifty departments. The directory is "organized" because every column tells you something useful — but watch what happens when the marketing department renames itself from "Marketing" to "Brand & Growth."

The poor receptionist has to walk down the list and update every single row that has department code MKT. Forty-two employees. Forty-two edits. If she misses one, the directory now shows two department names for the same code, and the next visitor who looks up "MKT" sees a contradiction. There is no single source of truth for "what is department MKT called?" — the answer is scattered across forty-two places that are supposed to agree but might not.

Now imagine the directory is split. The employee directory lists name, ID, office, and department code — nothing else. A separate one-page sheet lists each department code and its name. Fifty rows, one per department. When marketing renames itself, the receptionist updates a single row on the department sheet. Done. Every employee row still references MKT, and MKT now resolves to the new name automatically.

That is exactly the 3NF refactor. The employee table was storing department_name even though department_name is determined by department_code, and department_code is not the table's primary key. The chain employee_id -> department_code -> department_name is a transitive dependency: a non-key column (department_name) depending on another non-key column (department_code). Third Normal Form bans this. Lift department_name out into a departments table keyed by department_code, and the anomaly disappears.

+---------------------------------------------------------------+
|           BEFORE 3NF — Transitive Dependency Hidden In Plain  |
|           Sight                                                |
+---------------------------------------------------------------+
|                                                                |
|  employees                                                     |
|  +----+--------+--------+----------+------------------+        |
|  | id | name   | office | dept_code | dept_name       |        |
|  +----+--------+--------+----------+------------------+        |
|  |  1 | Alice  | 301    | MKT      | Marketing       |        |
|  |  2 | Bob    | 302    | MKT      | Marketing       |        |
|  |  3 | Carol  | 410    | ENG      | Engineering     |        |
|  |  4 | Dave   | 411    | ENG      | Engineering     |        |
|  |  5 | Eve    | 412    | ENG      | Enginering      |  <-- typo
|  +----+--------+--------+----------+------------------+        |
|                                                                |
|  Functional dependencies:                                      |
|    id          -> name, office, dept_code                      |
|    dept_code   -> dept_name     <-- transitive!                |
|                                                                |
|  Result: dept_name is duplicated, and row 5 contradicts        |
|  rows 3 and 4. The database has no way to enforce that         |
|  ENG always means "Engineering".                                |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           AFTER 3NF — Transitive Dependency Lifted Out        |
+---------------------------------------------------------------+
|                                                                |
|  employees                                                     |
|  +----+--------+--------+-----------+                          |
|  | id | name   | office | dept_code |                          |
|  +----+--------+--------+-----------+                          |
|  |  1 | Alice  | 301    | MKT       |                          |
|  |  2 | Bob    | 302    | MKT       |                          |
|  |  3 | Carol  | 410    | ENG       |                          |
|  |  4 | Dave   | 411    | ENG       |                          |
|  |  5 | Eve    | 412    | ENG       |                          |
|  +----+--------+--------+-----------+                          |
|              |                                                  |
|              | FK references departments.code                  |
|              v                                                  |
|  departments                                                   |
|  +-----------+--------------+                                   |
|  | code (PK) | name         |                                   |
|  +-----------+--------------+                                   |
|  | MKT       | Marketing    |                                   |
|  | ENG       | Engineering  |                                   |
|  +-----------+--------------+                                   |
|                                                                |
|  Now dept_name lives in exactly ONE row. A rename is a         |
|  single UPDATE. A typo is impossible because the FK rejects    |
|  any code that is not in departments.                          |
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Split comparison: LEFT side labeled 'BEFORE 3NF' shows a wide employees table with a rose (#ff5c8a) highlight on the duplicated dept_name column and a red typo 'Enginering' on one row. A rose dashed arrow connects 'dept_code' to 'dept_name' labeled 'transitive dependency'. RIGHT side labeled 'AFTER 3NF' shows a narrower employees table and a small departments lookup table joined by a sky blue (#4fc3f7) foreign key line. White monospace labels throughout. Title at top: 'Lift the Transitive Dependency'."


The Formal Definition

A relation is in Third Normal Form if and only if:

  1. It is already in Second Normal Form (no partial dependencies on a composite key), and
  2. No non-prime attribute is transitively dependent on any candidate key.

A "non-prime attribute" is any column that is not part of any candidate key. A "transitive dependency" exists when A -> B and B -> C and B is not itself a key. The chain A -> B -> C makes C depend on A only by way of B — and that indirection is exactly what 3NF forbids.

The shortcut version every interviewer wants to hear is Codd's mnemonic, polished by Kent in 1983:

"Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key."

  • "The key" — 1NF (atomic values, a real key exists).
  • "The whole key" — 2NF (no partial dependency on a composite key).
  • "Nothing but the key" — 3NF (no transitive dependency through another non-key column).

If a column tells you something about a non-key column instead of about the key, it does not belong in this table.


The ZIP Code Anomaly — The Canonical Example

Every database textbook uses the same example because it is impossible to misread. Consider an order-shipping table:

+----+---------+---------------+----------+----------+-------+
| id | name    | street        | zip_code | city     | state |
+----+---------+---------------+----------+----------+-------+
|  1 | Alice   | 1 Main St     | 11201    | Brooklyn | NY    |
|  2 | Bob     | 2 Elm St      | 11201    | Brooklyn | NY    |
|  3 | Carol   | 9 Oak Ave     | 11201    | Brooklin | NY    |  <-- typo
|  4 | Dave    | 4 Pine St     | 94110    | SF       | CA    |
|  5 | Eve     | 5 Cedar Rd    | 94110    | S.F.     | CA    |  <-- inconsistent
+----+---------+---------------+----------+----------+-------+

The functional dependencies are:

id        -> name, street, zip_code     (key determines its row)
zip_code  -> city, state                 (a ZIP belongs to one city/state)

The second line is the problem. city and state do not depend on id directly — they depend on zip_code, which is itself a non-key column. The chain is id -> zip_code -> city, a textbook transitive dependency.

What goes wrong:

  • Update anomaly. Brooklyn renames a neighborhood and ZIP 11201 should now show "Brooklyn Heights." You have to update every row that uses 11201. Miss one and you have two cities for one ZIP forever.
  • Insertion anomaly. You want to record that ZIP 99950 is in Ketchikan, AK — but nobody from Ketchikan has placed an order yet. There is nowhere to put that fact, because the table is keyed on customer id. The information has no home until a customer happens to need it.
  • Deletion anomaly. Carol is the only customer in ZIP 11215. You delete her row. You just lost the only record that 11215 maps to Park Slope, Brooklyn. The fact is gone even though it had nothing to do with Carol personally.

3NF says: lift (zip_code -> city, state) out into its own table.

customers                              zip_codes
+----+-------+-----------+----------+   +----------+----------+-------+
| id | name  | street    | zip_code |   | zip (PK) | city     | state |
+----+-------+-----------+----------+   +----------+----------+-------+
|  1 | Alice | 1 Main St | 11201    |   | 11201    | Brooklyn | NY    |
|  2 | Bob   | 2 Elm St  | 11201    |   | 94110    | SF       | CA    |
|  3 | Carol | 9 Oak Ave | 11201    |   | 99950    | Ketchikan| AK    |
|  4 | Dave  | 4 Pine St | 94110    |   +----------+----------+-------+
|  5 | Eve   | 5 Cedar Rd| 94110    |
+----+-------+-----------+----------+
                |
                +---- FK references zip_codes.zip

Each ZIP-to-city fact lives in exactly one row. Renames are one UPDATE. New ZIPs can be inserted with no customer attached. Deleting a customer cannot delete a ZIP record.


Example 1 — Spotting and Fixing a 3NF Violation in PostgreSQL

Let us build the broken table and walk through the refactor end to end. This is a runnable script — paste it into psql and watch the anomalies appear, then disappear.

-- The broken table — 2NF compliant, but NOT 3NF.
-- id is the only candidate key, so there is no partial dependency
-- (2NF is automatically satisfied for any single-column key). But
-- city and state depend on zip_code, not on id. That is a transitive
-- dependency and a 3NF violation.
CREATE TABLE customers_bad (
  id        SERIAL PRIMARY KEY,
  name      TEXT NOT NULL,
  street    TEXT NOT NULL,
  zip_code  TEXT NOT NULL,
  city      TEXT NOT NULL,
  state     CHAR(2) NOT NULL
);

INSERT INTO customers_bad (name, street, zip_code, city, state) VALUES
  ('Alice', '1 Main St',  '11201', 'Brooklyn', 'NY'),
  ('Bob',   '2 Elm St',   '11201', 'Brooklyn', 'NY'),
  ('Carol', '9 Oak Ave',  '11201', 'Brooklin', 'NY'),  -- typo
  ('Dave',  '4 Pine St',  '94110', 'SF',       'CA'),
  ('Eve',   '5 Cedar Rd', '94110', 'S.F.',     'CA');  -- inconsistent

-- Demonstrate the anomaly: one ZIP, three different city spellings
SELECT zip_code, COUNT(DISTINCT city) AS distinct_cities
FROM customers_bad
GROUP BY zip_code;
 zip_code | distinct_cities
----------+-----------------
 11201    |               2
 94110    |               2
(2 rows)

Two ZIPs, four different city spellings. The database has no way to stop this because nothing constrains city to be a function of ZIP.

Now the refactor:

-- Step 1: create the lookup table for the dependency we are lifting out.
CREATE TABLE zip_codes (
  zip   TEXT    PRIMARY KEY,
  city  TEXT    NOT NULL,
  state CHAR(2) NOT NULL
);

-- Step 2: seed it with the canonical, deduplicated values.
INSERT INTO zip_codes (zip, city, state) VALUES
  ('11201', 'Brooklyn', 'NY'),
  ('94110', 'San Francisco', 'CA');

-- Step 3: create the clean customers table that references it.
CREATE TABLE customers (
  id       SERIAL PRIMARY KEY,
  name     TEXT NOT NULL,
  street   TEXT NOT NULL,
  zip_code TEXT NOT NULL REFERENCES zip_codes(zip)
);

-- Step 4: migrate the data. Note we no longer carry city/state.
INSERT INTO customers (name, street, zip_code)
SELECT name, street, zip_code FROM customers_bad;

-- Step 5: prove the FK now enforces consistency. This must FAIL.
INSERT INTO customers (name, street, zip_code)
VALUES ('Frank', '6 Birch Ln', '00000');
ERROR:  insert or update on table "customers" violates foreign key
constraint "customers_zip_code_fkey"
DETAIL:  Key (zip_code)=(00000) is not present in table "zip_codes".

The bad insert is impossible. The database now refuses to record a customer whose ZIP is not a real, known ZIP. And renaming "SF" to "San Francisco" is a single UPDATE on a single row:

UPDATE zip_codes SET city = 'San Francisco' WHERE zip = '94110';

-- A join shows the new value for everyone who lives in 94110
SELECT c.name, c.street, z.city, z.state
FROM customers c
JOIN zip_codes z ON z.zip = c.zip_code
ORDER BY c.id;
 name  |   street    |     city      | state
-------+-------------+---------------+-------
 Alice | 1 Main St   | Brooklyn      | NY
 Bob   | 2 Elm St    | Brooklyn      | NY
 Carol | 9 Oak Ave   | Brooklyn      | NY
 Dave  | 4 Pine St   | San Francisco | CA
 Eve   | 5 Cedar Rd  | San Francisco | CA
(5 rows)

Five customers, one city update. The typo for Carol is gone — she gets the canonical value automatically because the city is no longer stored on her row at all.


Example 2 — Employees and Departments

A second classic. The employees table stores department_name and department_manager_email even though both are determined by department_id.

CREATE TABLE employees_bad (
  id                       SERIAL PRIMARY KEY,
  name                     TEXT NOT NULL,
  email                    TEXT NOT NULL,
  department_id            INT NOT NULL,
  department_name          TEXT NOT NULL,
  department_manager_email TEXT NOT NULL
);

INSERT INTO employees_bad
  (name, email, department_id, department_name, department_manager_email) VALUES
  ('Alice', 'alice@co.io', 10, 'Engineering', 'cto@co.io'),
  ('Bob',   'bob@co.io',   10, 'Engineering', 'cto@co.io'),
  ('Carol', 'carol@co.io', 20, 'Marketing',   'cmo@co.io'),
  ('Dave',  'dave@co.io',  10, 'Engineering', 'cto@co.io');

-- Functional dependencies:
--   id            -> name, email, department_id
--   department_id -> department_name, department_manager_email
-- The second line is transitive. 3NF violation.

The 3NF refactor splits departments out:

CREATE TABLE departments (
  id            SERIAL PRIMARY KEY,
  name          TEXT NOT NULL UNIQUE,
  manager_email TEXT NOT NULL
);

INSERT INTO departments (id, name, manager_email) VALUES
  (10, 'Engineering', 'cto@co.io'),
  (20, 'Marketing',   'cmo@co.io');

CREATE TABLE employees (
  id            SERIAL PRIMARY KEY,
  name          TEXT NOT NULL,
  email         TEXT NOT NULL UNIQUE,
  department_id INT  NOT NULL REFERENCES departments(id)
);

INSERT INTO employees (name, email, department_id)
SELECT name, email, department_id FROM employees_bad;

-- Promote a new CTO. Single row update; everyone sees it on the next join.
UPDATE departments SET manager_email = 'new-cto@co.io' WHERE id = 10;

SELECT e.name, d.name AS dept, d.manager_email
FROM employees e
JOIN departments d ON d.id = e.department_id
ORDER BY e.id;
 name  |    dept     |  manager_email
-------+-------------+-----------------
 Alice | Engineering | new-cto@co.io
 Bob   | Engineering | new-cto@co.io
 Carol | Marketing   | cmo@co.io
 Dave  | Engineering | new-cto@co.io
(4 rows)

One write, three correct rows. And inserting a new department before any employee joins it is now legal — there is a departments table that holds the fact independently.


Example 3 — A Subtle Violation You Almost Miss

Not every transitive dependency screams "ZIP code." Watch this one:

CREATE TABLE invoices_bad (
  id              SERIAL PRIMARY KEY,
  customer_id     INT NOT NULL,
  amount_cents    INT NOT NULL,
  tax_rate_pct    NUMERIC(5,2) NOT NULL,  -- e.g. 8.25
  tax_cents       INT NOT NULL             -- = amount_cents * tax_rate_pct / 100
);

tax_cents is fully determined by amount_cents and tax_rate_pct. It is a derived value — and that derivation is a kind of transitive dependency from the key through other non-key columns. If anyone ever updates tax_rate_pct without recomputing tax_cents, the row contradicts itself.

The 3NF-aligned fix is to not store the derived column at all. Compute it on read with a generated column or a view:

CREATE TABLE invoices (
  id           SERIAL PRIMARY KEY,
  customer_id  INT NOT NULL,
  amount_cents INT NOT NULL,
  tax_rate_pct NUMERIC(5,2) NOT NULL,
  tax_cents    INT GENERATED ALWAYS AS
    (ROUND(amount_cents * tax_rate_pct / 100)) STORED
);

INSERT INTO invoices (customer_id, amount_cents, tax_rate_pct)
VALUES (1, 10000, 8.25), (2, 25000, 8.25), (3, 5000, 6.00);

SELECT id, amount_cents, tax_rate_pct, tax_cents FROM invoices;
 id | amount_cents | tax_rate_pct | tax_cents
----+--------------+--------------+-----------
  1 |        10000 |         8.25 |       825
  2 |        25000 |         8.25 |      2063
  3 |         5000 |         6.00 |       300
(3 rows)

The generated column physically stores the derived value but the database — not the application — guarantees it stays consistent with its inputs. There is no UPDATE path that can leave the row contradictory.


2NF vs 3NF — The Exact Difference

Both rules are about non-key attributes depending on the wrong thing. The difference is what kind of wrong thing.

+---------------------------------------------------------------+
|           2NF vs 3NF — Side By Side                           |
+---------------------------------------------------------------+
|                                                                |
|  2NF VIOLATION (partial dependency):                           |
|    Composite key (order_id, product_id)                        |
|    product_name depends only on product_id                     |
|    -> product_name is determined by PART of the key            |
|                                                                |
|  3NF VIOLATION (transitive dependency):                        |
|    Single key id                                               |
|    zip_code depends on id   (fine)                             |
|    city     depends on zip_code  (NOT fine)                    |
|    -> city is determined by a NON-KEY column                   |
|                                                                |
|  RULE OF THUMB:                                                |
|    2NF asks: "does this column depend on the WHOLE key?"       |
|    3NF asks: "does this column depend on the KEY at all,       |
|               or is it really about another column?"           |
|                                                                |
+---------------------------------------------------------------+

If your table has only a single-column primary key, you cannot have a 2NF violation by definition (there is no "part" of a one-column key). Tables with single-column surrogate keys (SERIAL PRIMARY KEY) skip 2NF entirely and only need to worry about 3NF. This is one reason surrogate keys are popular in practice — they remove an entire class of normalization mistakes.


3NF vs BCNF — A Quick Preview

3NF says non-key columns must depend only on the key. Boyce-Codd Normal Form tightens this: every non-trivial functional dependency must have a superkey on its left side. The difference only matters when a table has multiple overlapping candidate keys, which is rare in application schemas. For 99% of real-world tables, 3NF and BCNF are the same thing in practice. Lesson 10.5 covers the cases where they diverge.


When to Deliberately Denormalize Away From 3NF

3NF is a default, not a religion. There are real reasons to store derived or duplicated data:

  • Read-heavy reporting tables. Joining to zip_codes on every query may cost more than the storage. Materialized views or denormalized read tables are fair game.
  • Immutable historical snapshots. An orders table should store the customer's address as it was at the time of the order, not via a join to the current address. That is not a 3NF violation — it is a different fact ("address at order time") that genuinely belongs on the order row.
  • Full-text search and analytics. Columns like tsvector summaries or pre-aggregated counters are denormalized on purpose for performance.

The rule: normalize first, then denormalize with a clear, written reason and a mechanism (trigger, materialized view, scheduled job) to keep the duplicated data consistent.


Common Mistakes

1. Stopping at 2NF and assuming the design is "normalized enough." 2NF only catches partial dependencies on composite keys. A table with a single-column surrogate key is automatically in 2NF — but it can still have brutal 3NF violations. Always run the 3NF check separately: for each non-key column, ask "is this fact really about the primary key, or is it about another column in this row?" If the answer is "about another column," lift it out.

2. Confusing immutable historical snapshots with 3NF violations. Storing the customer's address on an orders row is not a violation if the intent is "address at the time of purchase." That is a different functional fact than "current address." The mistake is normalizing too aggressively and breaking historical accuracy. The test: would changing the source row in the parent table also be expected to retroactively change this row? If yes, it is a violation. If no, it is a snapshot.

3. Storing derived columns without a generated column or trigger. Putting tax_cents next to amount_cents and tax_rate_pct is a transitive dependency through computation. If you really need the column physically present (for indexing, for example), use GENERATED ALWAYS AS ... STORED so the database — not your application — owns the invariant. Never let two code paths compute the same value and trust them to agree.

4. Lifting out a dependency but forgetting the foreign key. A 3NF refactor that creates a zip_codes table but does not add REFERENCES zip_codes(zip) to customers.zip_code only solves half the problem. Without the FK, you can still insert a customer with a ZIP that does not exist in the lookup table, and you are back to inconsistencies. The structural change and the constraint are one fix, not two.

5. Treating 3NF as a performance enemy. Many engineers denormalize "for speed" before they have measured anything. A 3NF schema with proper indexes and joins is fast for the vast majority of OLTP workloads — Postgres is extremely good at hash and merge joins. Denormalize only when you have a profiled query that proves the join is the bottleneck, and document the tradeoff so the next person knows the duplication is intentional.


Interview Questions

1. "Define Third Normal Form in your own words and give a one-sentence example of a violation."

A relation is in 3NF if it is in 2NF and no non-key column is transitively dependent on the primary key — meaning no non-key column is determined by another non-key column. The mnemonic is "every non-key attribute must depend on the key, the whole key, and nothing but the key." A one-sentence example: storing zip_code, city, and state on a customers table is a 3NF violation because city and state are determined by zip_code (a non-key column), not directly by the customer's primary key, so the same ZIP can end up with conflicting city values across different rows.

2. "Explain the difference between 2NF and 3NF. Can a table be in 2NF but not 3NF?"

Yes — that is exactly the case 3NF is designed to catch. 2NF only forbids partial dependencies on a composite key: a non-key column depending on only part of a multi-column key. 3NF forbids transitive dependencies: a non-key column depending on another non-key column, regardless of whether the key is composite. A table with a single-column primary key is automatically in 2NF (there is no "part" of a one-column key to depend on partially) but it can still be in violation of 3NF. The classic example is a customers table with a SERIAL PRIMARY KEY that stores zip_code, city, and state — it is trivially in 2NF, but it has an obvious transitive dependency id -> zip_code -> city, so it is not in 3NF. The 2NF check and the 3NF check are separate steps and both must be performed.

3. "Walk me through the anomalies that a 3NF violation creates. Use the customer/ZIP example."

There are three classical anomalies. The update anomaly: if ZIP 11201 is renamed from "Brooklyn" to "Brooklyn Heights," every row using that ZIP must be updated, and missing one creates a permanent contradiction. The insertion anomaly: you cannot record the fact that ZIP 99950 belongs to Ketchikan, AK until at least one customer with that ZIP exists, because the table is keyed on customer — there is no place for ZIP-to-city facts to live independently. The deletion anomaly: if the only customer in ZIP 11215 is deleted, the database also loses the only record that 11215 maps to Park Slope, even though the deletion had nothing to do with that fact. All three vanish once (zip_code -> city, state) is lifted into a zip_codes table with zip as its primary key, and customers.zip_code becomes a foreign key reference.

4. "When is it acceptable to violate 3NF deliberately?"

When the duplication serves a clear, documented purpose that 3NF cannot express. The most common cases are immutable historical snapshots — for example, an orders row should record the shipping address as it was at the time of the order, not a foreign key to a mutable addresses table, because changing the customer's current address must not retroactively rewrite past invoices. Other valid cases include materialized views and denormalized read-side tables for high-throughput reporting, full-text search indexes (tsvector), and pre-aggregated counters where the cost of computing the aggregate live exceeds the cost of keeping it in sync. In every case, the rule is: normalize first, then denormalize with a written reason and a mechanism (trigger, generated column, scheduled job, or application-level invariant) that keeps the duplicate consistent.

5. "How do you actually find 3NF violations in an existing schema during a code review?"

I scan each table looking for groups of columns that "travel together" — a set of columns that are always set, updated, and meaningful as a unit, and that look like they describe something other than the row's primary entity. Examples: zip_code, city, state traveling together on a customers table (they describe a place, not a customer); department_name, department_manager_email on an employees table (they describe a department); currency_code, currency_symbol, currency_name on a transactions table. For each such group, I ask "is there a non-key column here that determines the others?" If yes, that is a transitive dependency and a 3NF candidate to extract. I also look for _name columns sitting next to _id columns — those are almost always denormalized lookups that should live in a separate table behind a foreign key.


Quick Reference — 3NF Cheat Sheet

+---------------------------------------------------------------+
|           3NF AT A GLANCE                                      |
+---------------------------------------------------------------+
|                                                                |
|  DEFINITION:                                                   |
|    1. Already in 2NF                                           |
|    2. No non-key column transitively depends on the key        |
|       (i.e. no non-key -> non-key dependency)                  |
|                                                                |
|  MNEMONIC (Codd / Kent):                                       |
|    "Every non-key attribute must depend on the key,            |
|     the whole key, and nothing but the key."                   |
|                                                                |
|  THE VIOLATION SIGNATURE:                                      |
|    id          -> non_key_A                                    |
|    non_key_A   -> non_key_B   <-- transitive, kill this        |
|                                                                |
|  THE FIX:                                                      |
|    1. Create a new table keyed on non_key_A                    |
|    2. Move non_key_B (and any other non_key_A-dependents)      |
|       into the new table                                       |
|    3. Drop those columns from the original table               |
|    4. Add a FOREIGN KEY from original.non_key_A                |
|       to new_table.non_key_A                                   |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           ANOMALIES 3NF ELIMINATES                             |
+---------------------------------------------------------------+
|                                                                |
|  UPDATE   : rename one fact -> N rows must change              |
|  INSERT   : cannot record a fact until a parent row exists     |
|  DELETE   : removing a parent row destroys an independent fact |
|                                                                |
+---------------------------------------------------------------+
SmellLikely ViolationFix
zip_code, city, state on customers3NF: zip_code -> cityLift to zip_codes table
department_name next to department_id3NF: dept_id -> dept_nameLift to departments table
currency_symbol next to currency_code3NF: code -> symbolLift to currencies table
tax_cents stored alongside amount and rateDerived/transitiveGENERATED ALWAYS AS ... STORED
Same _name repeated across thousands of rows3NF lookupExtract + FK
Address on an orders rowNOT a violationSnapshot — leave alone

Prev: Lesson 10.3 -- Second Normal Form Next: Lesson 10.5 -- Boyce-Codd Normal Form (BCNF)


This is Lesson 10.4 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

On this page