LinkedIn Hook

"Your schema is in 3NF. Your data still has anomalies. How is that possible?"

Most engineers stop at Third Normal Form and call it a day. 3NF is good enough for the vast majority of schemas, and textbooks treat it as the practical finish line. But there is a sharper, stricter cousin lurking just beyond it — Boyce-Codd Normal Form — and it exists because 3NF leaves one specific loophole open.

The loophole shows up when a table has overlapping candidate keys. Imagine a course-scheduling table where (student, course) is unique and (student, instructor) is also unique, because every instructor only teaches one course. The table satisfies 3NF — every non-key attribute depends on a key. But there is still a hidden functional dependency from instructor to course, and that dependency causes update anomalies that 3NF does not catch.

BCNF closes the loophole with one rule: for every functional dependency X -> Y in your table, X must be a superkey. No exceptions. No "well, the right-hand side is a prime attribute, so it's fine." If a determinant is not a candidate key, you decompose.

In Lesson 10.5, I break down BCNF in PostgreSQL: the formal definition, the classic overlapping-candidate-key example, the decomposition algorithm, and the trade-off you make when BCNF and dependency preservation collide.

Read the full lesson -> [link]

#PostgreSQL #DatabaseDesign #Normalization #BCNF #SQL #InterviewPrep

Boyce-Codd Normal Form (BCNF) thumbnail

What You'll Learn

The formal definition of BCNF and how it differs from 3NF in one precise rule
Why 3NF can leave update anomalies behind even when it is technically satisfied
The classic overlapping-candidate-key example that motivates BCNF
How to find a BCNF-violating functional dependency in any schema
The decomposition algorithm to convert a 3NF table into BCNF
The trade-off between BCNF and dependency preservation, and when to stop at 3NF
How to verify BCNF compliance against a list of functional dependencies

The Two Locks On A Door — Why One More Rule Matters

Imagine your front door has two locks. The lower lock keeps out casual intruders and works ninety-nine percent of the time. The upper deadbolt is slower to engage and you almost never need it — but on the one night a determined thief shows up, only the deadbolt actually keeps them out. Most homeowners never bother with the deadbolt. They live their entire lives without an incident. But the people who do install it sleep a little better, because they know there is no scenario where their door is "almost locked."

Third Normal Form is the lower lock. It catches the obvious problems: transitive dependencies, redundant non-key data, the kind of mistakes a junior engineer makes their first week. For ninety-nine percent of schemas, 3NF is enough — the data is clean, the anomalies are gone, and shipping is more important than perfection. But there is a specific situation where 3NF is "almost locked." It happens when a table has two or more candidate keys that share an attribute, and one of the non-key columns turns out to functionally determine part of a key. 3NF technically allows this. BCNF does not.

That is exactly what Boyce-Codd Normal Form is — the deadbolt on top of 3NF. One rule, ruthlessly applied: for every non-trivial functional dependency X -> Y in the table, X must be a superkey. No "but the right-hand side is part of some key" exception. No "well, it's already 3NF." If a determinant is not a candidate key, the table is not in BCNF, and you decompose. The result is a schema where every fact about the data is anchored to a key, every update touches exactly one row, and there is genuinely no way for two copies of the same fact to disagree.

+---------------------------------------------------------------+
|           3NF (THE LOWER LOCK)                                |
+---------------------------------------------------------------+
|                                                                |
|  Rule: Every non-key attribute must depend on the key,        |
|        the whole key, and nothing but the key.                |
|                                                                |
|  Loophole: A non-key column may determine PART of a key       |
|            (a "prime attribute") and 3NF still passes.        |
|                                                                |
|  Example FD that 3NF allows but BCNF rejects:                 |
|     instructor -> course                                       |
|     (where 'course' is part of the candidate key)              |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           BCNF (THE DEADBOLT)                                  |
+---------------------------------------------------------------+
|                                                                |
|  Rule: For every non-trivial FD X -> Y in the table,          |
|        X must be a SUPERKEY. No exceptions.                   |
|                                                                |
|  Effect: Every determinant is a candidate key. Every fact     |
|          lives in exactly one row. Update anomalies cannot    |
|          exist by construction.                               |
|                                                                |
|  Cost: Some BCNF decompositions lose the ability to enforce   |
|        an original functional dependency without a join.     |
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Split comparison: LEFT side labeled '3NF' shows a single table with overlapping candidate keys highlighted in pink (#ff5c8a) and a small red warning icon next to a 'instructor -> course' arrow. RIGHT side labeled 'BCNF' shows the same data split into two clean tables connected by a sky-blue (#4fc3f7) foreign key arrow with a glowing padlock icon. White monospace labels. A pink arrow at the bottom labeled 'decompose'."

The Formal Definition

A relation R is in Boyce-Codd Normal Form if and only if, for every non-trivial functional dependency X -> Y that holds on R, X is a superkey of R.

Unpacking the jargon:

Functional dependency X -> Y means: if you know the value of X, you can determine exactly one value of Y. "Knowing the SSN tells you the person's birth date" is a functional dependency from SSN to birth date.
Non-trivial means Y is not a subset of X. "Knowing first_name and last_name tells you first_name" is trivial and does not count.
Superkey means a set of columns that uniquely identifies every row. Every candidate key is a superkey, and so is any superset of a candidate key.

Compare this to 3NF, which says: for every non-trivial FD X -> Y, either X is a superkey or Y is a prime attribute (a column that belongs to some candidate key). BCNF removes the second escape hatch. That single removal is the entire difference.

+---------------------------------------------------------------+
|           3NF vs BCNF — THE ONE LINE DIFFERENCE               |
+---------------------------------------------------------------+
|                                                                |
|  3NF:  For every FD X -> Y, either                            |
|          (a) X is a superkey, OR                              |
|          (b) Y is part of some candidate key                  |
|                                                                |
|  BCNF: For every FD X -> Y,                                   |
|          (a) X is a superkey                                  |
|                                                                |
|        ^^^ option (b) deleted ^^^                             |
|                                                                |
+---------------------------------------------------------------+

The takeaway: BCNF is strictly stronger. Every BCNF table is also in 3NF, but not every 3NF table is in BCNF. The gap between them is small in practice but real, and the next section shows exactly where it shows up.

The Classic Overlapping-Candidate-Key Example

Consider a university scheduling table. The rules of the university are:

A student can take many courses, and a course has many students.
Each course is taught by exactly one instructor (no team teaching).
An instructor teaches exactly one course (the university is small).

We want to record which student is taught by which instructor in which course. The naive design is one table:

CREATE TABLE class_enrollment (
    student_id  INT  NOT NULL,
    course      TEXT NOT NULL,
    instructor  TEXT NOT NULL,
    PRIMARY KEY (student_id, course)
);

INSERT INTO class_enrollment (student_id, course, instructor) VALUES
    (1, 'Databases',     'Dr. Codd'),
    (2, 'Databases',     'Dr. Codd'),
    (3, 'Databases',     'Dr. Codd'),
    (1, 'Algorithms',    'Dr. Knuth'),
    (4, 'Algorithms',    'Dr. Knuth'),
    (2, 'Compilers',     'Dr. Aho');

Let's enumerate the functional dependencies:

(student_id, course) -> instructor — given a student and a course, the instructor is determined (because each course has one instructor).
(student_id, instructor) -> course — given a student and an instructor, the course is determined (because each instructor teaches exactly one course).
instructor -> course — given an instructor alone, the course is determined (rule 3 of the university).

The candidate keys are (student_id, course) and (student_id, instructor). They overlap on student_id. Every column appears in some candidate key, which means every column is a prime attribute.

Is this table in 3NF? Yes. The only "interesting" FD is instructor -> course. The right-hand side course is part of a candidate key, so 3NF's escape hatch (b) applies. 3NF is satisfied.

Is this table in BCNF? No. The FD instructor -> course has instructor on the left, and instructor alone is not a superkey — knowing only the instructor does not uniquely identify a row, because the same instructor appears in multiple rows (one per student). BCNF is violated.

+---------------------------------------------------------------+
|           THE ANOMALY 3NF DID NOT CATCH                        |
+---------------------------------------------------------------+
|                                                                |
|  class_enrollment:                                             |
|  +------------+-------------+--------------+                   |
|  | student_id | course      | instructor   |                   |
|  +------------+-------------+--------------+                   |
|  |     1      | Databases   | Dr. Codd     |  <-- Codd teaches |
|  |     2      | Databases   | Dr. Codd     |      Databases    |
|  |     3      | Databases   | Dr. Codd     |      stored 3x    |
|  |     1      | Algorithms  | Dr. Knuth    |                   |
|  |     4      | Algorithms  | Dr. Knuth    |                   |
|  |     2      | Compilers   | Dr. Aho      |                   |
|  +------------+-------------+--------------+                   |
|                                                                |
|  Update anomaly: Dr. Codd switches to teaching 'Data Systems'.|
|  -> You must update 3 rows. Miss one and the table claims    |
|     Codd teaches two different courses.                       |
|                                                                |
|  Insertion anomaly: A new instructor Dr. Lamport is hired to  |
|  teach 'Distributed Systems' but has no students yet.         |
|  -> You cannot record this fact without a fake student row.   |
|                                                                |
|  Deletion anomaly: Student 2 drops 'Compilers' (the last      |
|  Compilers row).                                              |
|  -> You lose the fact that Dr. Aho teaches Compilers at all.  |
|                                                                |
+---------------------------------------------------------------+

These are the exact same anomalies that normalization is supposed to prevent. 3NF missed them because the dependency instructor -> course happens to point at a prime attribute. BCNF catches them because BCNF does not care whether the right-hand side is prime — it only cares whether the left-hand side is a key.

Decomposing Into BCNF

The decomposition algorithm is mechanical. Find a non-trivial FD X -> Y where X is not a superkey, then split the table into two:

A table with X and everything X determines (the closure of X).
A table with X and everything else from the original table.

The shared column X becomes the join key between them.

For the class enrollment example, the violating FD is instructor -> course. So we split:

DROP TABLE class_enrollment;

-- Table 1: facts about instructors (instructor -> course)
CREATE TABLE instructor_course (
    instructor TEXT PRIMARY KEY,
    course     TEXT NOT NULL
);

-- Table 2: facts about which student studies under which instructor
CREATE TABLE student_instructor (
    student_id INT  NOT NULL,
    instructor TEXT NOT NULL,
    PRIMARY KEY (student_id, instructor),
    FOREIGN KEY (instructor) REFERENCES instructor_course(instructor)
);

INSERT INTO instructor_course (instructor, course) VALUES
    ('Dr. Codd',  'Databases'),
    ('Dr. Knuth', 'Algorithms'),
    ('Dr. Aho',   'Compilers');

INSERT INTO student_instructor (student_id, instructor) VALUES
    (1, 'Dr. Codd'),
    (2, 'Dr. Codd'),
    (3, 'Dr. Codd'),
    (1, 'Dr. Knuth'),
    (4, 'Dr. Knuth'),
    (2, 'Dr. Aho');

Now the original query — "what course is student 1 taking with whom?" — becomes a join:

SELECT si.student_id, ic.course, si.instructor
FROM student_instructor si
JOIN instructor_course   ic ON ic.instructor = si.instructor
WHERE si.student_id = 1
ORDER BY ic.course;

 student_id |   course   | instructor
------------+------------+------------
          1 | Algorithms | Dr. Knuth
          1 | Databases  | Dr. Codd
(2 rows)

Verify each new table is in BCNF:

instructor_course: the only non-trivial FD is instructor -> course. instructor is the primary key, so it is a superkey. BCNF holds.
student_instructor: the only candidate key is (student_id, instructor) and there are no other FDs. BCNF holds trivially.

Anomalies are now impossible. Dr. Codd switching courses is a one-row update. Hiring Dr. Lamport with no students is a one-row insert into instructor_course. Student 2 dropping Compilers does not delete the fact that Dr. Aho teaches Compilers, because that fact lives in instructor_course independently.

A Second Worked Example — Address Book

Consider a customer address table:

CREATE TABLE customer_address (
    customer_id INT  NOT NULL,
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    zip_code    TEXT NOT NULL,
    PRIMARY KEY (customer_id)
);

INSERT INTO customer_address (customer_id, street, city, zip_code) VALUES
    (101, '1 Main St',    'Springfield', '01103'),
    (102, '5 Oak Ave',    'Springfield', '01103'),
    (103, '7 Pine Rd',    'Cambridge',   '02139'),
    (104, '12 Elm St',    'Cambridge',   '02139'),
    (105, '88 River Way', 'Boston',      '02108');

Functional dependencies:

customer_id -> street, city, zip_code (the primary key determines everything).
zip_code -> city (in the US, a ZIP code determines its city).

Is this in 3NF? Let's check the second FD. zip_code is not a superkey. Is city a prime attribute? No — city appears in no candidate key. So 3NF's escape hatch (b) does NOT apply, and customer_address is not even in 3NF. (This is the classic transitive dependency: customer_id -> zip_code -> city.)

So this example actually fails at 3NF, not just BCNF. Let's fix it once and observe that the fixed schema is BCNF too:

DROP TABLE customer_address;

CREATE TABLE zip_codes (
    zip_code TEXT PRIMARY KEY,
    city     TEXT NOT NULL
);

CREATE TABLE customer_address (
    customer_id INT  PRIMARY KEY,
    street      TEXT NOT NULL,
    zip_code    TEXT NOT NULL REFERENCES zip_codes(zip_code)
);

INSERT INTO zip_codes (zip_code, city) VALUES
    ('01103', 'Springfield'),
    ('02139', 'Cambridge'),
    ('02108', 'Boston');

INSERT INTO customer_address (customer_id, street, zip_code) VALUES
    (101, '1 Main St',    '01103'),
    (102, '5 Oak Ave',    '01103'),
    (103, '7 Pine Rd',    '02139'),
    (104, '12 Elm St',    '02139'),
    (105, '88 River Way', '02108');

SELECT ca.customer_id, ca.street, z.city, ca.zip_code
FROM customer_address ca
JOIN zip_codes        z ON z.zip_code = ca.zip_code
ORDER BY ca.customer_id;

 customer_id |    street    |    city     | zip_code
-------------+--------------+-------------+----------
         101 | 1 Main St    | Springfield | 01103
         102 | 5 Oak Ave    | Springfield | 01103
         103 | 7 Pine Rd    | Cambridge   | 02139
         104 | 12 Elm St    | Cambridge   | 02139
         105 | 88 River Way | Boston      | 02108
(5 rows)

In zip_codes, the only FD is zip_code -> city and zip_code is the primary key. BCNF. In customer_address, the only FD is customer_id -> street, zip_code and customer_id is the primary key. BCNF. The schema is now both 3NF and BCNF, and renaming Cambridge to "Cambridge, MA" is a one-row update.

This example shows that most real-world fixes are actually 3NF fixes, and BCNF only matters when the trickier overlapping-key situation arises.

When BCNF And Dependency Preservation Collide

There is one honest catch with BCNF, and interviewers love to ask about it.

Some schemas have a functional dependency that cannot be preserved under BCNF decomposition without joining tables. The textbook example is a course-room-time scheduling table where:

(course, time) -> room
room -> course

If you decompose to satisfy BCNF (the second FD violates it), the resulting tables make it impossible to enforce the first FD with a single-table constraint — you would need a multi-table check involving a join. This is called the non-dependency-preserving decomposition problem.

The trade-off:

Stay at 3NF (which you can always achieve with a dependency-preserving decomposition) and accept the small redundancy as the price for cheap constraint enforcement.
Go to BCNF and accept that some constraints can only be enforced via triggers, materialized views, or application code.

In practice, ninety-nine percent of schemas can reach BCNF without losing any dependencies, so this is mostly an edge case — but it is the reason 3NF, not BCNF, is the conventional default in textbooks. The rule of thumb most senior engineers follow: aim for BCNF, and only fall back to 3NF if BCNF would force you to give up a constraint that the database needs to enforce by itself.

Common Mistakes

1. Assuming 3NF and BCNF are the same thing. They are not. BCNF is strictly stronger. The difference shows up only when the table has overlapping candidate keys and a non-key column functionally determines part of a key. If you cannot picture that scenario, you cannot reason about BCNF correctly. Any time you have two candidate keys that share a column, slow down and check for hidden FDs.

2. Forgetting to enumerate all the candidate keys. Most BCNF violations are missed because the engineer only thought of the obvious primary key and never asked "is there another set of columns that is also unique?" The class-enrollment example is invisible until you notice that (student_id, instructor) is also unique. Always list every candidate key before checking BCNF.

3. Pushing every schema to BCNF regardless of cost. BCNF can sometimes destroy dependency preservation, forcing you to enforce business rules in application code. For a low-traffic schema with no real anomaly risk, this trade is bad. Stop at 3NF when BCNF would cost you a database-enforced constraint that actually matters. Engineering is about picking the right point on the curve, not the rightmost point.

4. Confusing prime attributes with primary key columns. A "prime attribute" is any column that belongs to any candidate key, not just the chosen primary key. In the class-enrollment example, course and instructor are both prime attributes because they each belong to some candidate key. Missing this leads to wrong 3NF analyses and wrong BCNF analyses.

5. Decomposing without checking that the new tables are in BCNF too. Decomposition is recursive. After splitting on one violating FD, the resulting tables may still violate BCNF on a different FD. Always re-verify BCNF on every output table, and split again if needed. The algorithm terminates because each step strictly reduces the number of attributes per table.

Interview Questions

1. "What is the precise difference between 3NF and BCNF?"

3NF and BCNF differ by exactly one rule. Both require that for every non-trivial functional dependency X -> Y in a table, X must be a superkey. 3NF, however, allows an escape hatch: if X is not a superkey, the FD is still acceptable as long as Y is a prime attribute (a column belonging to some candidate key). BCNF removes that escape hatch — there are no exceptions, every determinant must be a superkey, period. This means every BCNF table is automatically in 3NF, but not every 3NF table is in BCNF. The gap shows up only when a table has overlapping candidate keys and a non-key column functionally determines part of a key. In that situation, 3NF technically passes but update anomalies still exist; BCNF catches them and forces a decomposition.

2. "Show me an example of a table that is in 3NF but not in BCNF."

The classic example is a class-enrollment table with columns (student_id, course, instructor), where the rules are: each course has exactly one instructor, and each instructor teaches exactly one course. The candidate keys are (student_id, course) and (student_id, instructor). The hidden functional dependency is instructor -> course. The table is in 3NF because the right-hand side course is a prime attribute (it belongs to one of the candidate keys), satisfying 3NF's escape hatch. But the table is not in BCNF because the left-hand side instructor alone is not a superkey — the same instructor appears in many rows, one per student. The result is the same kind of redundancy and update anomalies that normalization is supposed to prevent: if Dr. Codd starts teaching a renamed course, you must update many rows, and a single missed row leaves the table inconsistent.

3. "How do you decompose a table to put it into BCNF?"

The algorithm is mechanical. First, enumerate all functional dependencies and all candidate keys. Find any non-trivial FD X -> Y where X is not a superkey — this is the BCNF violation. Then split the table into two: one new table containing X and everything that X functionally determines (the attribute closure of X), and a second new table containing X plus all the other attributes of the original table. The shared column X becomes the join key between them, and in PostgreSQL you would model it as a primary key in the first table and a foreign key in the second. After the split, verify that both new tables are in BCNF. If either still has a violating FD, recurse. The algorithm terminates because each step strictly reduces the number of attributes in at least one table.

4. "What is the trade-off between BCNF and dependency preservation?"

A decomposition is dependency preserving if every functional dependency from the original table can be enforced on one of the new tables without joining them back together. 3NF decomposition is always both lossless and dependency preserving — that is a guarantee. BCNF decomposition is always lossless, but it is sometimes not dependency preserving. The classic case is a schema with FDs (course, time) -> room and room -> course: any BCNF decomposition leaves you unable to enforce the first FD on a single table, so you would need a trigger, materialized view, or application-level check to maintain it. The trade-off is: BCNF gives you stricter anomaly prevention, but in rare cases it costs you the ability to enforce a business rule purely in the database. Most engineers aim for BCNF by default and only fall back to 3NF when BCNF would force them to give up a database-enforced constraint that actually matters in production.

5. "If 3NF is usually good enough, why does BCNF exist at all?"

BCNF exists because 3NF leaves a real loophole, and in any schema where that loophole opens, the resulting anomalies are exactly the ones normalization is supposed to eliminate. The loophole is small — it requires overlapping candidate keys, which most everyday schemas do not have — but when it shows up, 3NF gives you false confidence. The table looks normalized, the textbook says it passes, and yet a single instructor change forces a multi-row update with a real risk of inconsistency. BCNF is the stricter rule that closes that loophole by requiring every determinant to be a candidate key, no exceptions. The reason textbooks treat 3NF as the practical default is that achieving 3NF is always possible without sacrificing dependency preservation, while BCNF occasionally is not. So the honest answer is: aim for BCNF when it is free, accept 3NF when BCNF would cost you a database-enforced constraint, and always know which one you actually have.

Quick Reference — Cheat Sheet

+---------------------------------------------------------------+
|           NORMAL FORMS COMPARED                                |
+---------------------------------------------------------------+
|                                                                |
|  1NF:  Atomic columns, no repeating groups, no arrays         |
|        in a single cell.                                      |
|                                                                |
|  2NF:  1NF + every non-key attribute depends on the WHOLE     |
|        primary key (no partial dependencies on a              |
|        composite key).                                        |
|                                                                |
|  3NF:  2NF + no transitive dependencies. Every non-key        |
|        attribute depends on the key, the whole key, and       |
|        nothing but the key.                                   |
|                                                                |
|  BCNF: 3NF + every determinant must be a superkey.            |
|        No exception for prime attributes.                     |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           BCNF RULE IN ONE LINE                                |
+---------------------------------------------------------------+
|                                                                |
|  For every non-trivial FD X -> Y in the table,                |
|  X must be a SUPERKEY of the table.                           |
|                                                                |
|  Equivalently: every determinant is a candidate key.          |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           DECOMPOSITION RECIPE                                 |
+---------------------------------------------------------------+
|                                                                |
|  1. List all functional dependencies.                          |
|  2. List all candidate keys.                                   |
|  3. Find an FD X -> Y where X is NOT a superkey.              |
|  4. Split:                                                     |
|       Table A:  X + closure(X)                                 |
|       Table B:  X + (original attrs - closure(X) + X)          |
|  5. Recurse on each new table until BCNF holds.                |
|  6. Verify decomposition is lossless (it always is by         |
|     this algorithm).                                          |
|  7. Check whether all original FDs are still enforceable      |
|     on a single table. If not, decide whether to              |
|     fall back to 3NF.                                         |
|                                                                |
+---------------------------------------------------------------+

Property	1NF	2NF	3NF	BCNF
Atomic values	yes	yes	yes	yes
No partial dep on key	-	yes	yes	yes
No transitive dep	-	-	yes	yes
Every det is a key	-	-	-	yes
Always achievable	yes	yes	yes	yes
Always dep preserving	yes	yes	yes	no
Always lossless	yes	yes	yes	yes

Question	If yes ->	If no ->
Are columns atomic?	check 2NF	fix to 1NF
Do all non-key cols depend on whole key?	check 3NF	fix to 2NF
Any transitive dependencies?	check BCNF	fix to 3NF
Is every determinant a superkey?	BCNF	decompose

Prev: Lesson 10.4 -- Third Normal Form Next: Lesson 11.1 -- What Is a Transaction?

This is Lesson 10.5 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

Boyce-Codd Normal Form (BCNF)