Composite Key and Super Key
Multi-Column Identity and the Full Key Hierarchy
LinkedIn Hook
"Your primary key is a single
idcolumn. Your junction table has two. Your reporting query joins six tables on three-column keys. Do you actually understand what a composite key is — or are you just copying Stack Overflow answers?"Most developers can recite "primary key" and "foreign key" in their sleep. Ask them to explain the difference between a super key and a candidate key, or why a composite primary key on a junction table is often better than a surrogate
id, and the room goes quiet. The vocabulary of keys is not academic trivia — it is the language the database uses to reason about identity, uniqueness, and referential integrity.A super key is any set of columns that uniquely identifies a row — even if that set includes useless extras. A candidate key is a minimal super key with no redundant columns. A composite key is simply any key (candidate, primary, or otherwise) made of more than one column. These are not competing concepts; they are nested layers of the same idea: super key contains candidate key contains primary key.
Get this hierarchy wrong and you end up with junction tables that allow duplicate rows, composite indexes that never get used because the column order is wrong, and foreign keys that fall apart the moment someone updates a "natural" identifier. Get it right and your schema enforces real-world constraints the application layer cannot violate.
In Lesson 2.4, I break down composite keys and super keys: the formal definitions, the set-theoretic relationship between them, when to declare a composite primary key on a junction table, when to reach for a surrogate instead, and how composite column order determines whether your indexes actually do anything.
Read the full lesson -> [link]
#Database #SQL #DataModeling #CompositeKey #BackendDevelopment #InterviewPrep
What You'll Learn
- The formal definition of a super key and why it can contain redundant columns
- How composite keys differ from single-column keys and where they naturally appear
- The nested relationship: super key ⊇ candidate key ⊇ primary key
- When to declare a composite primary key on a junction table with real SQL
- Why composite index column order determines whether queries use the index
- How composite keys behave when used as the target of a foreign key
- The tradeoff between composite natural keys and surrogate
idcolumns - Common mistakes: confusing composite keys with compound indexes, bad column order, and cascading join pain
The Student ID Card Analogy — When One Column Is Not Enough
Imagine you work at a university registration office. A student walks up and says "I am Alex." Is that enough to find their record? No — there are forty students named Alex across the campus. "I am Alex Chen." Better, but there are still three. "I am Alex Chen, born March 14, 2003." Now the record is unique — first name plus last name plus date of birth together identify exactly one student.
No single one of those fields is enough. First name alone is not unique. Last name alone is not unique. Even first and last together are not unique. Only the combination of all three pins down the student. That combination is a composite key — a key made of more than one column, where only the whole set has the uniqueness property.
Now suppose you add the student's home city to that set: (first_name, last_name, date_of_birth, home_city). This four-column combination is still unique — of course it is, because the first three columns were already unique, and adding more columns can only make things more specific, not less. But the home city is doing no work. Remove it and uniqueness is preserved. This larger set is still a super key, but it is not a candidate key because it is not minimal.
And if the university issues each student a nine-digit ID card, then student_id alone is unique. That single column is also a super key (a trivial one), a candidate key (it is minimal), and usually the one chosen as the primary key. Meanwhile (first_name, last_name, date_of_birth) remains a valid candidate key that the registrar could have chosen instead — it just did not win the primary key election.
+---------------------------------------------------------------+
| THE STUDENT IDENTITY HIERARCHY |
+---------------------------------------------------------------+
| |
| SUPER KEY (any uniquely-identifying set, extras allowed): |
| - {student_id} |
| - {student_id, first_name} |
| - {first_name, last_name, date_of_birth} |
| - {first_name, last_name, date_of_birth, home_city} |
| - {first_name, last_name, date_of_birth, shoe_size, ...} |
| |
| CANDIDATE KEY (minimal super key, no redundant column): |
| - {student_id} |
| - {first_name, last_name, date_of_birth} |
| |
| PRIMARY KEY (the one candidate chosen as the PK): |
| - {student_id} |
| |
| COMPOSITE KEY (any key with > 1 column, at any level): |
| - {first_name, last_name, date_of_birth} |
| - {first_name, last_name, date_of_birth, home_city} |
| |
+---------------------------------------------------------------+
This is the mental model to carry into every schema design conversation. Super key, candidate key, and primary key are not three unrelated ideas — they are three nested rings. Composite is an orthogonal adjective: it describes how many columns a key has, not which ring it sits in.
Formal Definitions
Let R be a relation (table) with a set of attributes (columns) A = {a1, a2, ..., an}. A super key of R is any subset K of A such that no two rows of R can share the same values for all attributes in K. In plain English: a super key is any combination of columns that is guaranteed unique across every row.
A candidate key is a super key K such that no proper subset of K is also a super key. That is, if you remove any single column from a candidate key, uniqueness is lost. Candidate keys are the minimal super keys.
A primary key is one candidate key chosen by the schema designer to be the canonical row identifier. Every table has at most one primary key, even though it may have several candidate keys. The primary key is also used by most databases as the default clustering key and as the natural target for foreign keys.
A composite key is any key — super, candidate, or primary — that consists of two or more columns. The word "composite" only describes arity (how many columns), not role. A single-column primary key is not composite; a two-column primary key is composite. A five-column super key is composite. A six-column candidate key is composite.
+---------------------------------------------------------------+
| THE SET DIAGRAM (ASCII VENN) |
+---------------------------------------------------------------+
| |
| +--------------------------------------------------+ |
| | SUPER KEYS | |
| | (any uniquely-identifying set) | |
| | | |
| | +------------------------------------+ | |
| | | CANDIDATE KEYS | | |
| | | (minimal super keys) | | |
| | | | | |
| | | +------------------------+ | | |
| | | | PRIMARY KEY | | | |
| | | | (the chosen candidate) | | | |
| | | +------------------------+ | | |
| | | | | |
| | | (other candidates = alternate) | | |
| | +------------------------------------+ | |
| | | |
| | (non-minimal super keys live out here) | |
| +--------------------------------------------------+ |
| |
| SUPER KEY ⊇ CANDIDATE KEY ⊇ PRIMARY KEY |
| |
| "Composite" is ORTHOGONAL — it describes arity, |
| not which ring the key lives in. |
| |
+---------------------------------------------------------------+
Notice that every primary key is a candidate key, and every candidate key is a super key, but not vice versa. Notice also that "composite" is not a ring in this diagram — it is a property any key in any ring can have.
Worked Example — A Student Enrollment Table
The archetypal place to meet a composite primary key is a junction table (also called an associative table or bridge table) that models a many-to-many relationship. A student can enroll in many courses, and a course can enroll many students. You cannot model this with a foreign key on either students or courses — you need a third table that holds one row per (student, course) pair.
-- Parent tables: each has its own surrogate primary key
CREATE TABLE students (
student_id BIGSERIAL PRIMARY KEY,
first_name TEXT NOT NULL,
last_name TEXT NOT NULL,
email TEXT NOT NULL UNIQUE
);
CREATE TABLE courses (
course_id BIGSERIAL PRIMARY KEY,
course_code TEXT NOT NULL UNIQUE, -- e.g. 'CS101'
title TEXT NOT NULL
);
-- Junction table with a COMPOSITE PRIMARY KEY on (student_id, course_id).
-- The PK is the pair itself — no surrogate id column.
CREATE TABLE enrollments (
student_id BIGINT NOT NULL,
course_id BIGINT NOT NULL,
enrolled_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
grade TEXT,
-- Composite primary key: the pair must be unique across the table.
-- This single declaration enforces "a student cannot enroll in the
-- same course twice" at the database level. The application layer
-- cannot violate it, even with a race condition.
PRIMARY KEY (student_id, course_id),
-- Each component of the composite PK is also a foreign key to the
-- corresponding parent table. The composite PK itself is NOT a
-- foreign key — the two columns are FKs independently.
FOREIGN KEY (student_id) REFERENCES students(student_id)
ON DELETE CASCADE,
FOREIGN KEY (course_id) REFERENCES courses(course_id)
ON DELETE CASCADE
);
A few things to notice. First, the composite primary key does real work: it prevents a student from enrolling in the same course twice. Without it, you would need a separate UNIQUE (student_id, course_id) constraint (which is valid but redundant), or worse, an application-level check that breaks under concurrent inserts. Second, the order of columns inside PRIMARY KEY (student_id, course_id) matters — not for uniqueness, but for the index the database silently builds to enforce the constraint.
Composite Index Column Order — The Left-Prefix Rule
When you declare PRIMARY KEY (student_id, course_id), PostgreSQL, MySQL, and most other relational databases build a B-tree index on the ordered pair. That index supports fast lookups for:
WHERE student_id = 42 AND course_id = 7— full key matchWHERE student_id = 42— leading column only (a "left prefix")
But it does not efficiently support:
WHERE course_id = 7— skips the leading column
This is the famous left-prefix rule: a composite B-tree index can only be used for queries that filter on a contiguous prefix of the index columns, starting from the leftmost. If your most common query is "list all courses a given student is enrolled in," put student_id first. If the most common query is "list all students in a given course," put course_id first — or keep student_id first and add a separate index on course_id alone.
-- Query A: "What courses is student 42 taking?"
-- Uses the PK index efficiently (left prefix on student_id).
SELECT course_id FROM enrollments WHERE student_id = 42;
-- Query B: "Who is enrolled in course 7?"
-- With PK (student_id, course_id), the PK index does NOT help.
-- Add a secondary index to support this access pattern:
CREATE INDEX idx_enrollments_course ON enrollments (course_id);
SELECT student_id FROM enrollments WHERE course_id = 7;
The rule of thumb: put the most-filtered column first in the composite PK, and add secondary indexes for the other access patterns. Choosing column order is a design decision, not a cosmetic one.
Composite Keys as Foreign Key Targets
A composite primary key can be the target of a foreign key — but the child table must reference all the columns of the composite PK together. This is where composite keys start to hurt deeply-nested schemas.
-- Suppose we want to record assignments submitted for each enrollment.
-- The parent table (enrollments) has a composite PK (student_id, course_id),
-- so the child must carry BOTH columns as the foreign key.
CREATE TABLE assignment_submissions (
student_id BIGINT NOT NULL,
course_id BIGINT NOT NULL,
assignment_id BIGINT NOT NULL,
submitted_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
score NUMERIC(5,2),
PRIMARY KEY (student_id, course_id, assignment_id),
-- The FK references the PAIR in the parent, not just one column.
FOREIGN KEY (student_id, course_id)
REFERENCES enrollments (student_id, course_id)
ON DELETE CASCADE
);
Already this table's primary key has three columns. Add one more level of nesting (say, submission_feedback referencing assignment_submissions) and you carry four columns in the child PK. Every join must match on all of them, every foreign key must list all of them, and every index must accommodate their combined width. This is the price of composite natural keys in deep hierarchies, and it is exactly why surrogate keys exist.
When Composite Is Better, and When Surrogate Wins
Composite primary keys are the right choice when:
- Junction tables modelling many-to-many relationships. The pair
(student_id, course_id)is the natural identity of an enrollment, nothing else is needed, and theUNIQUEconstraint is free. - Natural compound identifiers that the real world already treats as keys. An
(order_id, line_number)pair on order lines. A(country_code, postal_code)pair on address zones. A(tenant_id, username)pair in multi-tenant systems. - Tables that will never be the parent of a deeply-nested hierarchy. If nothing will ever foreign-key to this table, carrying two columns in the PK costs nothing downstream.
Surrogate keys (a single BIGSERIAL id) win when:
- The table is the parent of a deep FK chain. Every level below pays the cost of carrying the composite columns. A single
idkeeps child tables narrow. - The "natural" key can change. Email addresses, usernames, and ISBNs all look stable until they are not. A surrogate
idnever changes, so child FKs never have to cascade-update. - ORMs or frameworks assume single-column PKs. Many ORMs (older Hibernate, ActiveRecord, Sequelize) have first-class support for
idcolumns and awkward support for composite PKs — strings of boilerplate, generated class annotations, broken auto-relations. - You need a stable, opaque external identifier. UUIDs and auto-increment ids are safe to expose in URLs; real-world compound keys may leak information.
A common compromise on junction tables is to use both: declare a surrogate id as the primary key for ORM friendliness, and add a UNIQUE (student_id, course_id) constraint to preserve the real-world uniqueness rule. You get narrow child references and the integrity guarantee of a composite key at the cost of one extra column and one extra index.
+---------------------------------------------------------------+
| COMPOSITE vs SURROGATE — DECISION TABLE |
+---------------------------------------------------------------+
| |
| Situation | Pick |
| ----------------------------------|---------------------------|
| Pure junction table, no children | Composite PK |
| Junction + ORM that hates it | Surrogate + UNIQUE(a,b) |
| Deep FK chain expected | Surrogate PK |
| Natural key might change | Surrogate PK |
| Need external/public identifier | Surrogate (UUID) |
| Compound real-world identity | Composite PK |
| Multi-tenant row scoping | Composite (tenant_id,...)|
| |
+---------------------------------------------------------------+
Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). LEFT side: three concentric rounded rectangles labelled 'SUPER KEY' (outer, rose #ff5c8a outline), 'CANDIDATE KEY' (middle, sky blue #4fc3f7 outline), 'PRIMARY KEY' (inner, filled sky blue). Arrow labelled 'minimal' pointing from SUPER to CANDIDATE; arrow labelled 'chosen' pointing from CANDIDATE to PRIMARY. RIGHT side: a junction table diagram with a 'students' table and a 'courses' table linked through an 'enrollments' table whose header highlights 'student_id + course_id' as a single composite PK in rose. Bottom caption in white monospace: 'Super key ⊇ Candidate key ⊇ Primary key. Composite = multi-column.'"
Common Mistakes
1. Confusing a composite key with a compound index. A composite key is a constraint (uniqueness + identity) declared on a set of columns. A compound index is a data structure (typically B-tree) built on a set of columns to speed up queries. Every composite primary key automatically creates a compound index to enforce itself, but the reverse is not true: you can create compound indexes on columns that are not unique, have nothing to do with identity, and have no constraint attached. Saying "I added a composite key" when you really added a non-unique compound index confuses teammates and muddies design reviews. Use the right vocabulary: key for constraints, index for the structure that supports them.
2. Declaring a composite PK and then joining five tables through it.
The enrollment example above already carries two columns through every child table. By the time you reach submission_feedback -> submission -> assignment -> enrollment, you are joining on four columns and your queries look like ON a.student_id = b.student_id AND a.course_id = b.course_id AND a.assignment_id = b.assignment_id. Every line of every query is polluted with join fields that exist only because the root table chose a composite natural key. If you can foresee this depth during design, use a surrogate id on the junction table and keep the UNIQUE (student_id, course_id) constraint to preserve semantics. Your future self will thank you.
3. Wrong column order in the composite PK or index.
Because composite indexes obey the left-prefix rule, the column order of a composite primary key is a query-plan decision, not a stylistic one. If you declare PRIMARY KEY (course_id, student_id) but your hottest query is WHERE student_id = ?, you are sending every query to a full index scan. Pick the column order based on the queries you actually run: put the most-filtered, most-selective column first. If you genuinely need both directions, declare the PK with one order and add a secondary index CREATE INDEX ... ON enrollments (course_id, student_id) for the other. Do not leave column order to alphabetical habit.
4. Treating "composite" and "super key" as synonyms for "weird". Juniors often think of composite keys and super keys as exotic animals that only appear in textbooks. In reality every junction table has a composite key, every row in a normalized schema has multiple super keys (any column set containing a candidate key is one), and the vocabulary shows up in every competent schema review. Not knowing the difference between "super", "candidate", "primary", and "composite" is a red flag in senior interviews — not because the terms are clever, but because they are the minimum shared language for discussing identity and uniqueness.
Interview Questions
1. "Explain the difference between a super key, a candidate key, and a primary key. How does a composite key fit into that hierarchy?"
A super key is any set of columns whose combined values are unique across every row of a table. That set is allowed to contain redundant columns — if {a} is already unique, then {a, b}, {a, b, c}, and {a, b, c, d} are all also super keys. A candidate key is a minimal super key: removing any column from it destroys uniqueness. Every table typically has one or more candidate keys. A primary key is the one candidate key that the schema designer has chosen to be the canonical identifier and the default target for foreign keys; a table has at most one primary key even when several candidates exist. The relationship is a strict nesting: every primary key is a candidate key, every candidate key is a super key, but not every super key is a candidate key and not every candidate key is the primary key. "Composite" is an orthogonal adjective that simply means "made of more than one column." A composite key can live in any of those three rings — a composite super key, a composite candidate key, or a composite primary key — and the same table can have all three at once.
2. "Why is a composite primary key a good choice for a junction table, and when would you avoid it?"
A junction table models a many-to-many relationship by storing one row per associated pair, such as (student_id, course_id) in an enrollments table. The pair itself is the natural identity of the row, so declaring PRIMARY KEY (student_id, course_id) does two useful things in one line: it establishes the row identifier, and it enforces the rule that the same pair cannot appear twice — a student cannot enroll in the same course twice — at the database level, immune to application race conditions. You get the semantics and the uniqueness constraint for free, without an extra column or extra index. The time to avoid a composite PK on a junction table is when the junction will itself be the parent of child tables: every level beneath will have to carry both composite columns through its own foreign key, and deep hierarchies end up joining on four or five columns per query. In that case declare a surrogate BIGSERIAL id as the primary key to keep child FKs narrow, and add UNIQUE (student_id, course_id) to preserve the real-world uniqueness rule. ORM friendliness is another reason to prefer the surrogate: many ORMs assume single-column PKs and handle composite keys awkwardly.
3. "Why does the column order of a composite primary key matter, and how does it interact with indexing?"
A composite primary key is enforced by a compound B-tree index on the columns in the order they are declared, and B-tree indexes obey the left-prefix rule: the index can efficiently serve queries that filter on a contiguous prefix of the indexed columns, starting from the leftmost. If your PK is (student_id, course_id), the index helps WHERE student_id = ? and WHERE student_id = ? AND course_id = ?, but it does not efficiently help WHERE course_id = ? — the leading column is missing, so the database has to scan. Uniqueness is unaffected by order: (a, b) and (b, a) enforce the same constraint. But query performance is not, and query performance is usually what matters. The correct way to choose column order is to look at the actual access patterns: put the most commonly filtered column first. If the table is filtered on both columns independently, declare the PK in one order and add a secondary index in the other, or consider whether a surrogate PK plus two separate single-column indexes is actually simpler.
4. "What does it mean to say that {first_name, last_name, date_of_birth, home_city} is a super key but not a candidate key for a students table?"
It means the combined values of those four columns are unique across every student — no two students share the same first name, last name, date of birth, and home city — so that set satisfies the super-key definition. But it fails the minimality test for candidate keys: you can remove home_city from the set and the remaining three columns are still unique on their own (assuming no two students with identical names happen to share a birthday, which is the real-world assumption being made here). Because a proper subset of the set is itself a super key, the larger set is not minimal, therefore not a candidate key. The candidate key is the three-column subset {first_name, last_name, date_of_birth}. The four-column set is still a valid super key — the database would happily enforce uniqueness on it if you wrote the constraint — but it carries an unnecessary column whose only effect is to bloat any index and make child foreign keys wider. That is exactly why the formal definition distinguishes the two concepts: the superset is valid but wasteful, the minimal set is the one worth naming.
5. "A coworker says 'I added a composite key to speed up this query.' What do you ask them to clarify?"
The phrasing conflates two different things. A composite key is an integrity constraint — a primary key, unique constraint, or candidate key — built from more than one column, whose job is to enforce row identity or uniqueness. A composite (or compound) index is a data structure built on more than one column, whose job is to accelerate queries that filter or sort on those columns. Every composite key is backed by a compound index, but you can create compound indexes that are not unique, are not primary keys, and have no constraint attached at all — these are pure performance features with no semantic meaning. So the clarifying question is: "Did you add a uniqueness or identity constraint, or did you just add a non-unique index for performance?" If they mean the former, you need to review whether the new constraint matches real-world business rules and whether existing data satisfies it. If they mean the latter, you need to review whether the column order matches the query's filter predicate (left-prefix rule) and whether the index is actually being chosen by the planner. The right vocabulary changes the entire review.
Quick Reference — Composite and Super Key Cheat Sheet
+---------------------------------------------------------------+
| KEY HIERARCHY (VENN) |
+---------------------------------------------------------------+
| |
| +--------------------------------------------------+ |
| | SUPER KEYS | |
| | +------------------------------------+ | |
| | | CANDIDATE KEYS | | |
| | | +------------------------+ | | |
| | | | PRIMARY KEY | | | |
| | | +------------------------+ | | |
| | | (others = alternate keys) | | |
| | +------------------------------------+ | |
| | (non-minimal super keys live out here) | |
| +--------------------------------------------------+ |
| |
| SUPER ⊇ CANDIDATE ⊇ PRIMARY |
| "Composite" = arity > 1 (any ring) |
| |
+---------------------------------------------------------------+
+---------------------------------------------------------------+
| COMPOSITE PK RULES OF THUMB |
+---------------------------------------------------------------+
| |
| 1. Junction tables want composite PKs by default |
| 2. Column order follows the hottest query's filter |
| 3. Secondary index for the other access pattern |
| 4. Deep FK chain -> switch to surrogate + UNIQUE(a,b) |
| 5. Every composite PK = one compound index for free |
| 6. FK to a composite PK must list all columns |
| 7. Composite key != compound index (constraint vs structure) |
| 8. "Super key" is not an insult — it is a valid superset |
| |
+---------------------------------------------------------------+
| Key Type | Minimal? | Multi-column allowed? | How many per table? | Purpose |
|---|---|---|---|---|
| Super key | No (may have extras) | Yes | Many | Formal definition of "uniquely identifies a row" |
| Candidate key | Yes | Yes | One or more | Minimal uniqueness — eligible to become PK |
| Primary key | Yes (it is a candidate) | Yes (then composite) | Exactly one | Chosen canonical identifier, default FK target |
| Alternate key | Yes (other candidates) | Yes | Zero or more | Candidates not chosen as PK, often UNIQUE |
| Composite key | Depends on ring | Yes (by definition > 1) | Any | Multi-column variant of any of the above |
| Concern | Composite Natural PK | Surrogate PK + UNIQUE(a,b) |
|---|---|---|
| Junction table, no children | Preferred | Overkill |
| Deep FK hierarchy | Painful (wide children) | Preferred |
| ORM compatibility | Often awkward | Native support |
| Enforce real-world uniqueness | Built into PK | Needs explicit UNIQUE |
| Index cost | One compound index | One PK index + one UNIQUE |
| Changing identifier values | Cascades everywhere | PK never changes |
| Public/external identifier | Leaks real data | Opaque id is safe |
Prev: Lesson 2.3 -- Unique and Candidate Key Next: Lesson 2.5 -- Surrogate, Natural, Alternate Key
This is Lesson 2.4 of the Database Interview Prep Course -- 12 chapters, 58 lessons.