LinkedIn Hook

"Your startup stored user accounts in a CSV file. It worked perfectly -- until the second person signed up at the same time and one of them vanished."

Every developer eventually meets the flat-file graveyard. A JSON file for users, a text file for orders, a spreadsheet for inventory. It feels simple and fast, right up until two processes write at once, a line gets corrupted mid-flush, and nobody can answer the question "how many orders did we ship last Tuesday?" without a 40-line script that takes four minutes.

A database is not a fancier file. It is a completely different category of tool -- one that solves concurrency, durability, integrity, and querying as first-class concerns. Relational databases like PostgreSQL and MySQL organise data into tables with enforced schemas, run thousands of transactions per second without losing a byte, and answer complex questions in milliseconds through an optimiser you never have to write.

If you cannot explain what a database gives you that a file cannot, you cannot explain why your architecture looks the way it does. Every backend interview eventually lands here, and the candidates who handwave it lose the offer to the ones who can say "ACID, concurrency control, declarative queries, and enforced schemas" without blinking.

In Lesson 1.1, I break down what a database actually is: relational vs non-relational, the RDBMS model, tables/rows/columns/schema, and the concrete reasons flat files break at the first sign of real load.

Read the full lesson -> [link]

#Database #PostgreSQL #SQL #BackendDevelopment #InterviewPrep #SoftwareEngineering

What Is a Database? thumbnail

What You'll Learn

What a database actually is, and how it differs from a DBMS and an RDBMS
The relational model: tables, rows, columns, and the role of a schema
Why flat files collapse under concurrency, integrity, and query pressure
A runnable PostgreSQL example: CREATE TABLE, INSERT, and SELECT with sample output
A preview of ACID and why transactions are the quiet superpower of databases
How SQL and NoSQL compare at a high level, and when each one earns its keep
The vocabulary you need to survive every follow-up question in a backend interview

The Library Analogy -- Why a Database Is Not Just a Bigger File

Picture a small neighbourhood library in its first month. The librarian keeps one notebook on the front desk. Every time a book is borrowed, they scribble a line: "The Hobbit - John - 2026-04-13". It works for a week. Then two patrons arrive at once, the librarian writes half of John's name, gets interrupted, and finishes it for Jane instead. Later, someone asks "which books has John borrowed in the last year?" and the librarian has to flip through every page of the notebook by hand.

Now picture the same library a year later. There is a card catalogue with one drawer per category, a borrowing register with strict columns (book ID, patron ID, date out, date due), a rule that every card must reference a real book and a real patron, and a procedure that ensures two clerks cannot scribble on the same card at the same time. Answering "which books has John borrowed?" takes ten seconds because the register is sorted by patron. Nothing is lost, nothing is corrupted, nothing contradicts itself.

That is exactly the difference between a flat file and a database. The notebook is a file: simple, linear, and utterly defenceless against concurrency, corruption, and queries. The card catalogue plus register plus rules is a database: a disciplined structure, enforced constraints, concurrent access, and a query language that turns "find all X where Y" from a chore into a one-liner. Your code is the librarian -- and a librarian with a proper catalogue will always beat one with a notebook, no matter how fast they write.

+---------------------------------------------------------------+
|           FLAT FILE (The Notebook)                             |
+---------------------------------------------------------------+
|                                                                |
|  users.csv                                                     |
|  id,name,email,signup                                          |
|  1,John,john@x.com,2026-01-02                                  |
|  2,Jane,jane@x.com,2026-01-03                                  |
|  3,Bob ,bob@x.com ,2026-                    <- half-written   |
|                                                                |
|  Problems:                                                     |
|   - Two writers corrupt each other mid-append                  |
|   - No type checking: "2026-" is accepted as a date            |
|   - "find users who signed up in Q1" -> scan the whole file    |
|   - Rename a user -> rewrite the entire file                   |
|   - Crash mid-write -> file is torn, data is lost              |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           DATABASE (The Card Catalogue)                        |
+---------------------------------------------------------------+
|                                                                |
|  users TABLE (enforced schema)                                 |
|  +----+-------+--------------+------------+                    |
|  | id | name  | email        | signup     |                    |
|  +----+-------+--------------+------------+                    |
|  | 1  | John  | john@x.com   | 2026-01-02 |                    |
|  | 2  | Jane  | jane@x.com   | 2026-01-03 |                    |
|  +----+-------+--------------+------------+                    |
|                                                                |
|  Guarantees:                                                   |
|   - Types enforced: signup must be a valid DATE                |
|   - Concurrent writes serialised by the engine                 |
|   - Indexes make Q1 lookups a millisecond scan                 |
|   - Transactions: crash recovery is automatic                  |
|   - Declarative queries: "SELECT * WHERE signup >= '2026-01'"  |
|                                                                |
+---------------------------------------------------------------+

DBMS vs RDBMS -- Getting the Vocabulary Right

Three words get thrown around interchangeably in interviews, and confusing them is an instant credibility hit.

A database is the organised collection of data itself -- the tables, the rows, the indexes, the on-disk bytes. It is the "thing" being managed.

A DBMS (Database Management System) is the software that manages that data: it handles storage, concurrency, queries, backups, and access control. MongoDB, Redis, SQLite, and PostgreSQL are all DBMSs, even though the shape of the data they manage is very different.

An RDBMS (Relational Database Management System) is a DBMS that organises data specifically into relations -- the mathematical term for tables -- and speaks SQL. PostgreSQL, MySQL, SQL Server, Oracle, and SQLite are all RDBMSs. The "relational" part is not about foreign keys between tables (that is a common misconception); it refers to the relational model from E. F. Codd's 1970 paper, where a relation is a set of tuples conforming to a fixed schema.

+---------------------------------------------------------------+
|           THE HIERARCHY                                        |
+---------------------------------------------------------------+
|                                                                |
|   DATABASE                                                     |
|    \__ the data itself (tables, rows, indexes on disk)         |
|                                                                |
|   DBMS                                                         |
|    \__ software managing one or more databases                 |
|    \__ examples: MongoDB, Redis, PostgreSQL, SQLite            |
|                                                                |
|   RDBMS                                                        |
|    \__ a DBMS that follows the relational model + SQL          |
|    \__ examples: PostgreSQL, MySQL, SQL Server, Oracle         |
|                                                                |
|   A PostgreSQL server (DBMS) hosts many databases, each of     |
|   which contains many tables, each of which holds many rows.   |
|                                                                |
+---------------------------------------------------------------+

When a recruiter asks "what database do you use?" the honest answer is usually "PostgreSQL, which is an RDBMS, and inside it we have a database called app_prod with about forty tables." Precision here signals seniority.

Tables, Rows, Columns, and the Schema

The relational model is built on four interlocking ideas, and every SQL concept you will ever learn is just a variation on them.

A table (formally, a relation) is a two-dimensional grid of data about one kind of thing -- users, orders, products, events. Every table has a name and a fixed set of columns.

A column (formally, an attribute) is a named, typed slot. email TEXT, age INTEGER, created_at TIMESTAMP. Every value in that column must match the declared type; the database refuses anything that does not fit.

A row (formally, a tuple) is one record -- one user, one order, one event. It has exactly one value for every column in the table, and rows inside a table are unordered by definition (you get them back in the order you asked for, via ORDER BY, not the order they were inserted).

A schema is the formal declaration of the table: its name, its columns, their types, and the constraints (NOT NULL, UNIQUE, PRIMARY KEY, CHECK, FOREIGN KEY). The schema is a contract. Once it is in place, the database physically refuses to store data that violates it. This is a massive win over flat files, where the only "schema" is whatever your application code happens to remember to validate.

+---------------------------------------------------------------+
|           ANATOMY OF A TABLE                                   |
+---------------------------------------------------------------+
|                                                                |
|   table name: users                                            |
|                                                                |
|   +----+--------+------------------+---------------------+     |
|   | id | name   | email            | created_at          |     |
|   +----+--------+------------------+---------------------+     |
|   | 1  | John   | john@example.com | 2026-01-02 09:14:00 | <- row
|   | 2  | Jane   | jane@example.com | 2026-01-03 17:02:11 |     |
|   | 3  | Bob    | bob@example.com  | 2026-02-14 08:00:00 |     |
|   +----+--------+------------------+---------------------+     |
|      ^      ^            ^                    ^               |
|      |      |            |                    |               |
|    column column       column              column             |
|                                                                |
|   schema: (id INT PK, name TEXT NOT NULL,                      |
|            email TEXT UNIQUE, created_at TIMESTAMP)            |
|                                                                |
+---------------------------------------------------------------+

The schema is why "twenty-nine" can never accidentally land in an age INTEGER column, why duplicate emails are rejected before they corrupt your login flow, and why a NOT NULL column guarantees downstream code never has to handle a surprise missing value.

A Runnable Example -- PostgreSQL in Action

Enough theory. Here is a complete, runnable PostgreSQL script that creates a table, inserts data, and queries it.

-- schema.sql
-- Create a users table with a primary key, a unique email,
-- and a timestamp column with a server-side default.
CREATE TABLE users (
    id          SERIAL PRIMARY KEY,           -- auto-incrementing ID
    name        TEXT        NOT NULL,         -- required text column
    email       TEXT        NOT NULL UNIQUE,  -- required and unique
    age         INTEGER     CHECK (age >= 0), -- must be non-negative
    created_at  TIMESTAMP   NOT NULL DEFAULT NOW()
);
-- In MySQL the equivalent would use AUTO_INCREMENT instead of SERIAL
-- and DATETIME instead of TIMESTAMP:
--   id INT AUTO_INCREMENT PRIMARY KEY,
--   created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP

-- Insert three rows. The database will assign ids 1, 2, 3
-- and fill created_at with the current server time.
INSERT INTO users (name, email, age) VALUES
    ('John',  'john@example.com',  29),
    ('Jane',  'jane@example.com',  34),
    ('Bob',   'bob@example.com',   41);

-- Ask a question declaratively: who is under 35, sorted by age?
SELECT id, name, email, age
FROM users
WHERE age < 35
ORDER BY age ASC;

-- Sample output from psql:
--  id | name | email             | age
-- ----+------+-------------------+-----
--   1 | John | john@example.com  |  29
--   2 | Jane | jane@example.com  |  34
-- (2 rows)

-- Try to violate the schema -- the database stops you cold.
INSERT INTO users (name, email, age) VALUES ('Eve', 'jane@example.com', 22);
-- ERROR:  duplicate key value violates unique constraint "users_email_key"
-- DETAIL: Key (email)=(jane@example.com) already exists.

INSERT INTO users (name, email, age) VALUES ('Mallory', 'm@example.com', -1);
-- ERROR:  new row for relation "users" violates check constraint "users_age_check"
-- DETAIL: Failing row contains (4, Mallory, m@example.com, -1, 2026-04-13 ...).

Notice what you did not have to write: no loop over file lines, no string parsing, no locking code, no sort algorithm, no validation logic. You declared what you wanted and the database figured out how. That is the heart of the relational model, and it is why a one-line SELECT replaces pages of imperative file-munging.

Why Not Just Use Flat Files?

Every new developer asks this eventually. The answer is four concrete failures, each of which individually justifies the existence of databases.

1. Concurrency. Two processes writing to the same file at the same time will corrupt it -- interleaved bytes, half-written rows, truncated lines. Operating system file locks help a little, but they turn the whole file into a bottleneck: only one writer at a time, readers blocked, throughput pitiful. A database handles thousands of concurrent writers through row-level locks, multiversion concurrency control (MVCC in PostgreSQL), and transaction isolation. You get parallelism without corruption.

2. Integrity. A flat file accepts any garbage you write to it. Negative ages, duplicate primary keys, dates that are actually phone numbers, rows missing half their fields -- all fine as far as the file system is concerned. A database enforces types, uniqueness, foreign keys, check constraints, and NOT NULL at the storage layer. Bad data is rejected before it lands, so downstream code can trust what it reads.

3. Queries. "Find all users who signed up in January and have spent more than $100" is a one-line SQL query over two joined tables. In a flat file, it is a script: open file, parse each line, filter, open the second file, join manually, accumulate totals, sort, return. It takes minutes on a million rows. The database, with the right indexes, answers in milliseconds, and the query optimiser picks the fastest strategy for you.

4. Durability and recovery. A flat file written with a partial flush and then a power loss is torn -- half old, half new, unrecoverable. A database uses a write-ahead log (WAL): every change is appended to the log and fsynced before the data file is touched. On crash recovery, the engine replays the log to bring the database back to a consistent state. You cannot easily build this yourself without reinventing half a DBMS.

+---------------------------------------------------------------+
|           FLAT FILE vs DATABASE -- THE FOUR FAILURES           |
+---------------------------------------------------------------+
|                                                                |
|  CONCERN      | FLAT FILE            | DATABASE                |
|  -------------+----------------------+-------------------------|
|  Concurrency  | One writer, corrupts | MVCC, row locks, 1000s  |
|  Integrity    | Accepts any bytes    | Types, constraints, FK  |
|  Queries      | Handwritten scripts  | SQL + query optimiser   |
|  Durability   | Torn writes on crash | WAL + crash recovery    |
|  Indexing     | Linear scan only     | B-trees, hash, GIN      |
|  Access ctrl  | File permissions     | Per-table, per-column   |
|                                                                |
+---------------------------------------------------------------+

ACID -- The Quiet Superpower (Preview)

Everything above is held together by a single acronym that will show up in every database interview you ever take: ACID. We devote a whole lesson to it later, but the one-line version is worth planting here.

Atomicity -- a transaction is all-or-nothing. Transfer $100 from Alice to Bob? Either both sides update, or neither does. No money is ever half-transferred.
Consistency -- a transaction moves the database from one valid state to another. All constraints hold before and after.
Isolation -- concurrent transactions do not see each other's partial work. It looks as if they ran one after another.
Durability -- once a transaction commits, its changes survive crashes, power loss, and restarts, because they were written to the WAL and fsynced to disk.

ACID is the property that makes a database trustworthy. A bank ledger, an inventory counter, a login session -- none of these tolerate "most of the write happened." Databases deliver this guarantee at the engine level so your application code does not have to.

SQL vs NoSQL -- A Brief Honest Comparison

The modern database landscape is split into two broad camps, and you should be able to talk about both without tribalism.

SQL (relational) databases -- PostgreSQL, MySQL, SQL Server, Oracle, SQLite -- use tables with fixed schemas, speak SQL, and prioritise ACID guarantees. They are the default choice for any system where data has clear structure, relationships between entities matter, and integrity is non-negotiable. That covers roughly 90% of backend systems: user accounts, orders, payments, bookings, inventory.

NoSQL databases are a grab-bag of non-relational stores, each shaped for a specific access pattern:

Document stores (MongoDB, CouchDB) -- JSON-like documents, flexible schemas, good for content with varying shape.
Key-value stores (Redis, DynamoDB) -- ultra-fast lookups by key, great for caches, sessions, rate limiters.
Wide-column stores (Cassandra, HBase) -- massive horizontal scale for append-heavy workloads like time series and logs.
Graph databases (Neo4j, ArangoDB) -- relationships as first-class citizens, ideal for social graphs and recommendation engines.

+---------------------------------------------------------------+
|           SQL vs NoSQL AT A GLANCE                             |
+---------------------------------------------------------------+
|                                                                |
|                | SQL (Relational) | NoSQL                      |
|  --------------+------------------+----------------------------|
|  Schema        | Fixed, enforced  | Flexible or schemaless     |
|  Query lang    | SQL (standard)   | Varies per engine          |
|  Transactions  | Full ACID        | Often limited / eventual   |
|  Joins         | First class      | Usually avoided            |
|  Scale model   | Vertical + read  | Horizontal sharding        |
|  Best for      | Structured data, | Caches, docs, time series, |
|                | integrity, joins | massive scale              |
|  Examples      | PostgreSQL,MySQL | MongoDB, Redis, Cassandra  |
|                                                                |
+---------------------------------------------------------------+

The honest interview answer is "start with PostgreSQL, add a NoSQL store when a specific access pattern demands it." A Redis cache in front of Postgres, a MongoDB for unstructured event blobs, a Neo4j for the social graph -- polyglot persistence is normal. But pretending NoSQL replaces SQL is how teams end up reimplementing transactions badly.

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Center: a large sky-blue (#4fc3f7) table grid showing columns 'id | name | email | age' with three rows of sample data in white monospace. Above the table: a rose (#ff5c8a) banner reading 'SCHEMA' with arrows pointing to each column labelled with its type (INT, TEXT, TEXT UNIQUE, INT CHECK). Below the table: three small sky-blue cylinder icons labelled 'WAL', 'Index', 'MVCC' with short captions 'crash safe', 'fast lookup', 'concurrent writes'. To the left of the table: a greyed-out crumpled flat file with a rose X through it labelled 'users.csv'. Title at top: 'Table + Schema + Engine'. Subtitle: 'What a Database Actually Gives You'. Subtle dotted grid overlay."

Second Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). A horizontal timeline showing a transaction labelled 'BEGIN -> UPDATE alice -> UPDATE bob -> COMMIT' in sky blue (#4fc3f7) monospace. Above the timeline, four rose (#ff5c8a) badges labelled A, C, I, D with one-line captions: 'all or nothing', 'constraints hold', 'no dirty reads', 'survives crash'. Below the timeline, a small disk icon labelled 'WAL fsync' in white. Title at top: 'ACID in One Picture'."

Common Mistakes

1. Confusing "database" with "DBMS" with "RDBMS". Interviewers notice immediately when a candidate says "I use a MySQL database" and means "I use MySQL, which runs a database called app_prod." The database is the data, the DBMS is the software, and the RDBMS is a DBMS that follows the relational model. Get the vocabulary right the first time you say it in an interview and everything downstream sounds more credible.

2. Treating schemaless as "no rules". A common NoSQL trap: developers pick MongoDB because "we don't need a schema" and then discover six months later that their code is riddled with defensive checks for missing fields, type mismatches, and half-migrated documents. Schemaless means the database does not enforce shape -- it does not mean shape stops mattering. In a SQL database, the engine rejects bad data at write time; in a schemaless store, your application code has to do it every read.

3. Assuming flat files are "faster because they are simpler". For a single writer, a single reader, and tiny data, a flat file can look faster in a microbenchmark. Add a second concurrent writer, a query that is not "read the whole file," or a requirement that data survives a crash, and the flat file collapses. Databases are not slower than files; they are doing vastly more work per operation, and they are highly optimised for the things that matter in real systems.

4. Thinking relational means "has relationships between tables". The "relational" in RDBMS comes from Codd's relational algebra, where a relation is a set of tuples with a fixed schema -- basically, a table. Foreign keys between tables are a useful feature, but they are not why the model is called relational. A single table with no foreign keys is still a relation, and a database full of isolated tables is still relational.

5. Picking NoSQL because "it scales better". NoSQL stores often scale horizontally more easily than traditional SQL databases, but modern Postgres and MySQL scale to tens of terabytes and millions of requests per second with read replicas, partitioning, and connection pooling. Most teams hit their own architectural limits long before they hit the database's. Pick NoSQL for a specific access pattern -- document shape, key-value latency, graph traversal -- not as a generic "scalability" insurance policy.

Interview Questions

1. "What is the difference between a database, a DBMS, and an RDBMS?"

A database is the organised collection of data itself -- the tables, rows, and indexes that live on disk. A DBMS (Database Management System) is the software that manages one or more databases: it handles storage, queries, concurrency, backups, and access control. An RDBMS is a specific kind of DBMS that follows the relational model -- data organised into tables with fixed schemas -- and speaks SQL as its query language. PostgreSQL and MySQL are RDBMSs; MongoDB and Redis are DBMSs but not RDBMSs because they do not use the relational model. In everyday speech the terms get blurred, but in an interview it is worth being precise because it signals you actually understand the layers.

2. "Why would you use a database instead of just writing to a file?"

Four concrete reasons, and any one of them is enough to justify a database. First, concurrency: a database handles thousands of simultaneous readers and writers without corruption via row-level locks and MVCC, while a file tears under a second concurrent writer. Second, integrity: the schema enforces types, uniqueness, foreign keys, and constraints at write time, so bad data is impossible to store, whereas a file accepts any bytes. Third, queries: SQL lets you ask declarative questions like "all users who signed up in January and spent more than $100" and the query optimiser finds the fastest plan, where in a file you have to hand-write the scan, filter, join, and sort. Fourth, durability: the write-ahead log guarantees that committed transactions survive crashes and power loss, where a file write interrupted by a crash is torn and often unrecoverable.

3. "Explain what a schema is and why it matters."

A schema is the formal declaration of a table's shape: its name, columns, their types, and the constraints that every row must satisfy. For example, CREATE TABLE users (id SERIAL PRIMARY KEY, email TEXT UNIQUE NOT NULL, age INTEGER CHECK (age >= 0)) declares four facts the database will enforce on every write: id is unique and auto-generated, email is unique and required, age is an integer that cannot be negative. It matters because the schema is a contract the database enforces at the storage layer, which means downstream code -- your API, your analytics, your migration scripts -- can trust the data without having to re-validate it on every read. Without a schema, validation has to live in every caller, which is how half-valid data ends up corrupting production.

4. "What is the difference between SQL and NoSQL, and when would you pick each?"

SQL databases (PostgreSQL, MySQL, SQL Server) organise data into tables with fixed, enforced schemas, use SQL as a standard query language, support joins between tables as a first-class operation, and provide full ACID transactions. NoSQL is an umbrella over several non-relational models: document stores like MongoDB, key-value stores like Redis, wide-column stores like Cassandra, and graph databases like Neo4j. Each NoSQL type is optimised for a specific access pattern -- flexible documents, low-latency key lookups, massive append-heavy workloads, or graph traversals. The sensible default for a new backend is a relational database because most business data is structured and integrity matters, and then you add a NoSQL store for a specific need: Redis for caching and sessions, MongoDB for unstructured event blobs, Cassandra for time-series at scale. "Polyglot persistence" -- using the right store for each workload -- is how mature systems are built.

5. "What does ACID stand for and why does it matter?"

ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity means a transaction is all-or-nothing: if you transfer money from Alice to Bob, either both updates happen or neither does, so you never lose half a transfer. Consistency means every transaction moves the database from one valid state to another, with all constraints (types, uniqueness, foreign keys) holding before and after. Isolation means concurrent transactions do not see each other's partial work -- two transfers running at the same time behave as if they ran one after the other. Durability means once a transaction commits, its changes survive crashes and power loss because they were written to the write-ahead log and fsynced to disk. Together these four properties are what make a database trustworthy for anything that looks like money, inventory, or identity, and they are the single biggest reason you cannot replace a database with a clever file format.

Quick Reference -- Cheat Sheet

+---------------------------------------------------------------+
|           DATABASE FUNDAMENTALS CHEAT SHEET                    |
+---------------------------------------------------------------+
|                                                                |
|  VOCABULARY:                                                   |
|   Database -> the data on disk                                 |
|   DBMS     -> software managing the data                       |
|   RDBMS    -> a DBMS using the relational model + SQL          |
|   Table    -> a relation (rows + columns with a schema)        |
|   Row      -> one record (a tuple)                             |
|   Column   -> a typed attribute                                |
|   Schema   -> the formal contract the DB enforces              |
|                                                                |
|  WHY NOT FLAT FILES:                                           |
|   1. Concurrency -> MVCC + row locks                           |
|   2. Integrity   -> types, constraints, foreign keys           |
|   3. Queries     -> SQL + optimiser + indexes                  |
|   4. Durability  -> write-ahead log, crash recovery            |
|                                                                |
|  ACID:                                                         |
|   Atomicity   -> all-or-nothing                                |
|   Consistency -> constraints hold before and after             |
|   Isolation   -> no dirty reads from concurrent txns           |
|   Durability  -> committed changes survive crashes             |
|                                                                |
|  POSTGRES QUICKSTART:                                          |
|   CREATE TABLE users (                                         |
|     id SERIAL PRIMARY KEY,                                     |
|     email TEXT UNIQUE NOT NULL,                                |
|     age INTEGER CHECK (age >= 0)                               |
|   );                                                           |
|   INSERT INTO users (email, age) VALUES ('a@b.com', 29);       |
|   SELECT * FROM users WHERE age < 35 ORDER BY age;             |
|                                                                |
|  MYSQL SYNTAX DIFFERENCES:                                     |
|   SERIAL         -> INT AUTO_INCREMENT                         |
|   TIMESTAMP      -> DATETIME                                   |
|   NOW()          -> CURRENT_TIMESTAMP                          |
|   TEXT           -> VARCHAR(n) or TEXT (both work)             |
|                                                                |
+---------------------------------------------------------------+

Concept	Flat File	Database
Storage unit	Line in a file	Row in a table
Schema	Implicit in code	Explicit, engine-enforced
Concurrent writes	Corruption likely	MVCC / row locks
Query language	Custom script	SQL
Crash recovery	Often torn data	WAL replay
Find by key	Linear scan	Index lookup (O(log n))
Rename a field	Rewrite every line	`ALTER TABLE`
Transactions	None	Full ACID
Access control	File permissions	Per-table, per-column roles
Typical scale	Thousands of rows	Billions of rows

Previous: Course start Next: Lesson 1.2 -- Data Types

This is Lesson 1.1 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

What Is a Database?