Database Interview Prep
Filtering, Sorting and Aggregation

HAVING vs WHERE

Filtering Rows Before and After Aggregation

LinkedIn Hook

"Your query returned zero rows and the bug was one keyword."

A junior engineer wrote WHERE COUNT(*) > 5 and spent an hour staring at a "column does not exist" error. A senior engineer wrote HAVING region = 'EU' on a million-row table and watched the query scan every single row before filtering. Both made the same category mistake: they confused WHERE and HAVING.

The rule is brutally simple. WHERE filters rows before grouping. HAVING filters groups after aggregation. WHERE runs early, cuts the dataset down, and lets indexes do their job. HAVING runs late, operates on aggregate results, and is the only place you can legally write COUNT(*) > 5 or SUM(amount) > 1000.

Get this backwards and one of two things happens: the database throws a syntax error because you tried to aggregate in WHERE, or the database silently runs a slow query because you filtered non-aggregate columns in HAVING instead of WHERE. One crashes the query, the other crashes the p95 latency.

In Lesson 4.5, I break down WHERE vs HAVING — execution order, side-by-side examples, performance, and the three mistakes that show up in every interview.

Read the full lesson -> [link]

#SQL #Database #BackendEngineering #InterviewPrep #PostgreSQL #DataEngineering


HAVING vs WHERE thumbnail


What You'll Learn

  • The exact execution order of a SELECT statement and where WHERE and HAVING fit in
  • Why WHERE cannot reference aggregate functions and why HAVING can
  • When a filter condition belongs in WHERE (non-aggregate) vs HAVING (aggregate)
  • How to combine WHERE and HAVING in the same query for maximum efficiency
  • Why WHERE is almost always cheaper than HAVING when both are valid
  • The three common mistakes interviewers love to ask about
  • Whether HAVING can reference SELECT aliases (Postgres says yes, the SQL standard says no)

The Bouncer Analogy — Two Doors, Two Filters

Picture a nightclub on a Saturday night. There are two bouncers. The first bouncer stands at the front door. His job is to check each person's ID as they arrive and decide whether they get into the club at all. He is fast, decisive, and filters one person at a time based on visible attributes — age, dress code, whether they are on the guest list. That is WHERE. It runs on individual rows, before anything is grouped, and it uses properties each row already has.

The second bouncer stands at the velvet rope in front of the VIP section. By the time someone reaches him, they are already inside the club and they are no longer an individual — they are part of a table, a party, a group. His job is to look at the group as a whole and decide whether the group qualifies for VIP access. "Does this table have more than four people? Has this group spent over $500?" He cannot ask those questions at the front door, because at the front door nobody has ordered anything yet and nobody is grouped with anyone. That is HAVING. It runs on aggregated groups, after GROUP BY has assembled them, and its filter conditions can reference sums, counts, averages, and other aggregates.

This is why the two bouncers exist. WHERE gets rid of the people who should never have been in the club in the first place — cheap, fast, and before the crowd forms. HAVING deals with questions that only make sense once people are grouped. Try to ask the front-door bouncer "is this group of four spending over $500?" and he will look at you like you are insane, because neither "this group" nor "spending" exists yet. Try to ask the VIP-rope bouncer "is this person over 21?" and you are wasting his time — that should have been handled at the door.

+---------------------------------------------------------------+
|           WHERE = FRONT DOOR BOUNCER                          |
+---------------------------------------------------------------+
|                                                                |
|  Input:   individual rows (people walking up)                  |
|  Checks:  row-level columns (age, country, status)             |
|  Cannot:  reference SUM, COUNT, AVG, MAX, MIN                  |
|  Goal:    shrink the dataset BEFORE grouping                   |
|  Uses:    indexes, cheap to evaluate                           |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           HAVING = VIP ROPE BOUNCER                           |
+---------------------------------------------------------------+
|                                                                |
|  Input:   groups (tables of people already inside)             |
|  Checks:  aggregate expressions (SUM, COUNT, AVG)              |
|  Can:     reference both aggregates AND grouping columns       |
|  Goal:    filter groups AFTER aggregation                      |
|  Uses:    runs after GROUP BY, cannot use indexes on aggs      |
|                                                                |
+---------------------------------------------------------------+

SELECT Execution Order — The Map That Explains Everything

SQL is written in one order and executed in another. Every confusion about WHERE vs HAVING dissolves the moment you memorize the real execution order. The written order is SELECT -> FROM -> WHERE -> GROUP BY -> HAVING -> ORDER BY -> LIMIT, but the database does not run it that way. It runs it like this:

+---------------------------------------------------------------+
|           LOGICAL EXECUTION ORDER                             |
+---------------------------------------------------------------+
|                                                                |
|    1. FROM       -> read rows from tables, apply joins        |
|           |                                                    |
|           v                                                    |
|    2. WHERE      -> filter individual rows (no aggregates)    |
|           |                                                    |
|           v                                                    |
|    3. GROUP BY   -> collapse rows into groups                 |
|           |                                                    |
|           v                                                    |
|    4. HAVING     -> filter groups (aggregates allowed)        |
|           |                                                    |
|           v                                                    |
|    5. SELECT     -> compute output columns, apply DISTINCT    |
|           |                                                    |
|           v                                                    |
|    6. ORDER BY   -> sort the final result                     |
|           |                                                    |
|           v                                                    |
|    7. LIMIT      -> take the first N rows                     |
|                                                                |
+---------------------------------------------------------------+

Everything makes sense once you see this. WHERE runs at step 2, before any grouping has happened — so it literally cannot know what SUM(amount) is, because no group exists yet. HAVING runs at step 4, after GROUP BY has already bundled rows together and computed aggregates — so SUM(amount) > 1000 is a perfectly valid question. And SELECT runs at step 5, which is why in the strict SQL standard you cannot reference a SELECT alias from HAVING: the alias does not exist yet at the moment HAVING runs.


Side-by-Side — Same Data, Different Filters

Let us work through a concrete example. Here is an orders table and two queries that look almost identical but return different results.

-- Schema and sample data
CREATE TABLE orders (
  id           SERIAL PRIMARY KEY,
  customer_id  INT NOT NULL,
  country      TEXT NOT NULL,
  amount       NUMERIC(10,2) NOT NULL,
  created_at   DATE NOT NULL
);

INSERT INTO orders (customer_id, country, amount, created_at) VALUES
  (1, 'US', 120.00, '2026-01-05'),
  (1, 'US',  45.00, '2026-01-12'),
  (2, 'US', 900.00, '2026-02-01'),
  (3, 'UK',  60.00, '2026-02-03'),
  (3, 'UK',  80.00, '2026-02-18'),
  (4, 'UK',  15.00, '2026-03-02'),
  (5, 'DE', 500.00, '2026-03-10'),
  (5, 'DE', 300.00, '2026-03-15');

Query A — WHERE filters rows before grouping

-- "Per country, total revenue from orders over $50"
SELECT country, SUM(amount) AS total
FROM   orders
WHERE  amount > 50           -- filters individual rows FIRST
GROUP BY country
ORDER BY total DESC;
 country | total
---------+--------
 DE      | 800.00
 US      | 1020.00   -> wait, let us recount
 UK      | 140.00

Recomputing by hand: rows surviving amount > 50 are (US 120), (US 900), (UK 60), (UK 80), (DE 500), (DE 300). Grouped: US -> 1020, UK -> 140, DE -> 800.

 country | total
---------+---------
 US      | 1020.00
 DE      |  800.00
 UK      |  140.00

Query B — HAVING filters groups after aggregation

-- "Per country, total revenue, but only countries whose TOTAL is over $500"
SELECT country, SUM(amount) AS total
FROM   orders
GROUP BY country
HAVING SUM(amount) > 500     -- filters GROUPS after aggregation
ORDER BY total DESC;

Here no rows are dropped before grouping. Every row participates in the sum. US -> 1065, UK -> 155, DE -> 800. Then HAVING keeps only groups whose total is over 500.

 country | total
---------+---------
 US      | 1065.00
 DE      |  800.00

UK disappears, not because any row was too small, but because the entire group's sum did not clear the bar. Note that US now sums to 1065 instead of 1020 — the $45 order is included because no WHERE pruned it. Same data, same columns, completely different meaning.


When Does a Condition Belong in WHERE vs HAVING?

The rule is mechanical and has no edge cases:

  • If the condition references only non-aggregate columns, it belongs in WHERE. It can run earlier, cut the dataset before grouping, and often use an index. Putting it in HAVING is legal in some dialects but wasteful.
  • If the condition references an aggregate function (SUM, COUNT, AVG, MIN, MAX), it must go in HAVING. Putting it in WHERE is a syntax error — WHERE runs before any aggregation has happened, so there is no sum to compare against.
-- VALID: non-aggregate in WHERE, aggregate in HAVING
SELECT country, COUNT(*) AS order_count, SUM(amount) AS total
FROM   orders
WHERE  created_at >= '2026-02-01'     -- row-level, goes in WHERE
GROUP BY country
HAVING COUNT(*) >= 2                  -- aggregate, must go in HAVING
   AND SUM(amount) > 200;             -- aggregate, must go in HAVING
-- INVALID: aggregate in WHERE
SELECT country, SUM(amount) AS total
FROM   orders
WHERE  SUM(amount) > 500              -- ERROR: aggregate functions
GROUP BY country;                     --  are not allowed in WHERE

Postgres error: aggregate functions are not allowed in WHERE. MySQL, SQL Server, Oracle, and SQLite all produce equivalent errors. This is not a quirk — it is required by the execution order, because at the moment WHERE runs there is no group yet and therefore no sum.


Combining WHERE and HAVING — The Common Pattern

Real queries almost always use both. Filter at the row level with WHERE to shrink the dataset, then filter at the group level with HAVING to cut down the aggregated result. This is the standard shape of any "top customers", "top regions", "top products" query.

-- "Countries where, among orders placed in Q1 2026,
--  at least 2 orders totaled more than $200"
SELECT country,
       COUNT(*)        AS order_count,
       SUM(amount)     AS total_revenue,
       AVG(amount)     AS avg_order
FROM   orders
WHERE  created_at BETWEEN '2026-01-01' AND '2026-03-31'  -- row filter
   AND amount > 0                                         -- row filter
GROUP BY country
HAVING COUNT(*) >= 2                                      -- group filter
   AND SUM(amount) > 200                                  -- group filter
ORDER BY total_revenue DESC;

The pipeline for this query looks like this:

+---------------------------------------------------------------+
|           QUERY PIPELINE — WHERE + HAVING                     |
+---------------------------------------------------------------+
|                                                                |
|  [orders table: 8 rows]                                        |
|           |                                                    |
|           v                                                    |
|  FROM + WHERE (created_at in Q1 AND amount > 0)                |
|           |                                                    |
|           v                                                    |
|  [6 rows survive row filter]                                   |
|           |                                                    |
|           v                                                    |
|  GROUP BY country                                              |
|           |                                                    |
|           v                                                    |
|  [3 groups: US, UK, DE with counts and sums]                   |
|           |                                                    |
|           v                                                    |
|  HAVING COUNT(*) >= 2 AND SUM(amount) > 200                    |
|           |                                                    |
|           v                                                    |
|  [2 groups survive group filter: US, DE]                       |
|           |                                                    |
|           v                                                    |
|  SELECT -> ORDER BY -> final result                            |
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Horizontal pipeline: a sky blue (#4fc3f7) funnel labeled WHERE receiving individual row icons on the left, passing surviving rows into a rose (#ff5c8a) GROUP BY bundler that collapses them into group cards, then flowing into a second rose funnel labeled HAVING that drops some group cards. White monospace labels at each stage: FROM, WHERE, GROUP BY, HAVING, SELECT. Arrows use -> ASCII style. Subtle grid."


Performance — Prefer WHERE Whenever You Can

Both of these queries return the same rows:

-- Version 1: condition in HAVING (works but wasteful)
SELECT country, SUM(amount) AS total
FROM   orders
GROUP BY country
HAVING country = 'US';

-- Version 2: condition in WHERE (much faster)
SELECT country, SUM(amount) AS total
FROM   orders
WHERE  country = 'US'
GROUP BY country;

Version 1 reads every row in the table, groups by country, computes sums for every country, and only then throws away all the non-US groups. Version 2 uses the index on country (if one exists), reads only US rows, groups only those, and computes only the sum it needs. On a table of ten million rows, the difference can be three orders of magnitude.

The rule: if a condition is expressible in WHERE, put it in WHERE. HAVING is for conditions that depend on aggregates and therefore cannot physically run earlier. Using HAVING for a non-aggregate filter is not wrong, but it is an unforced performance hit. EXPLAIN plans will usually show the WHERE version using an index scan and the HAVING version doing a sequential scan followed by a hash aggregate — same result, very different cost.


Common Mistakes

1. Using HAVING when WHERE would work — the performance tax. Writing HAVING country = 'US' instead of WHERE country = 'US' produces the correct answer but forces the database to aggregate the entire table before filtering. This is one of the most common mistakes in interview code reviews, and it shows up in production too. The fix is mechanical: if the condition does not reference an aggregate, move it to WHERE. Indexes on country will not be used from HAVING, but they will be used from WHERE.

2. Using WHERE with an aggregate — the syntax error. Writing WHERE SUM(amount) > 1000 produces an immediate error in every major database because WHERE runs before GROUP BY — there is no sum yet. The fix is to move the condition to HAVING. If you need to filter both row-level and group-level conditions, use both WHERE and HAVING together. Candidates who forget this in an interview usually recover by saying "right, aggregates need HAVING because they do not exist at WHERE time" — which is exactly the correct mental model.

3. Filtering on SELECT aliases in HAVING — dialect landmine. Postgres lets you write HAVING total > 500 where total is an alias defined in SELECT as SUM(amount) AS total. That is a Postgres extension. The SQL standard and several other databases (including older MySQL, DB2, and Oracle in strict mode) require you to repeat the full aggregate expression: HAVING SUM(amount) > 500. If you are writing portable SQL, always repeat the aggregate in HAVING. The same caveat applies to ORDER BY, where alias support is better but still not universal. The reason this even works in Postgres is an intentional convenience — by the formal execution order, HAVING runs before SELECT, so the alias should not be visible at that point.

4. Putting WHERE amount > 50 AND SUM(amount) > 500 all in WHERE. A compound variation of mistake 2. The row-level part is legal in WHERE and the aggregate part is not, so the query fails. Split the condition: WHERE amount > 50 ... HAVING SUM(amount) > 500. This gives the database the best of both worlds — it can filter rows early using WHERE, then filter groups later using HAVING.

5. Forgetting that HAVING without GROUP BY operates on the whole table as one group. SELECT SUM(amount) FROM orders HAVING SUM(amount) > 1000 is valid — with no GROUP BY, the entire table is treated as a single group, and HAVING decides whether that single group survives. This is rarely useful but shows up in quiz questions.


Interview Questions

1. "Explain the execution order of a SELECT statement and use it to justify why WHERE cannot contain aggregate functions."

A SELECT runs logically in the order FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT. FROM reads and joins rows, WHERE filters individual rows, GROUP BY collapses rows into groups, HAVING filters groups using aggregates, SELECT computes output columns and DISTINCT, ORDER BY sorts, and LIMIT takes the first N. Because WHERE runs at step 2 and GROUP BY runs at step 3, no group exists yet when WHERE is evaluated — which means aggregate functions like SUM, COUNT, and AVG have nothing to operate on. This is why every major database rejects WHERE SUM(amount) > 100 with a syntax error. HAVING exists specifically to fill that gap: it runs at step 4, after aggregation, so it is the only legal place to compare aggregate expressions. The rule "WHERE filters rows, HAVING filters groups" is not arbitrary — it falls directly out of the execution order.

2. "Given a query that uses HAVING to filter on a non-aggregate column, is it correct? Would you rewrite it?"

It is usually correct in the sense that it returns the right rows — Postgres, MySQL, SQL Server, and Oracle all allow non-aggregate conditions in HAVING. But it is almost always a performance mistake. HAVING runs after GROUP BY, which means the database performs aggregation over the entire dataset before applying the filter, and it cannot use any indexes on the filtered column for that step. Moving the condition to WHERE lets the database prune rows before grouping, which is cheaper both in CPU and in I/O, and it lets the optimizer use indexes on that column. The rewrite is mechanical: if the condition does not contain an aggregate function, move it to WHERE. I would only leave it in HAVING in the rare case where mixing it with WHERE would be strictly less clear — for example, if the condition naturally reads as a property of the group rather than the row.

3. "Write a query that finds customers who placed at least 3 orders in 2026 with an average order value over $100."

SELECT customer_id,
       COUNT(*)    AS order_count,
       AVG(amount) AS avg_amount
FROM   orders
WHERE  created_at >= '2026-01-01'
   AND created_at <  '2027-01-01'
GROUP BY customer_id
HAVING COUNT(*) >= 3
   AND AVG(amount) > 100
ORDER BY avg_amount DESC;

The date range belongs in WHERE because it is a row-level condition and it lets the database use the index on created_at to prune rows before grouping. The count and average conditions belong in HAVING because they are aggregates that can only be computed after grouping. Combining both is the canonical pattern: filter rows early with WHERE, then filter groups with HAVING. A weaker answer would put the date filter in HAVING with HAVING MIN(created_at) >= ... — it would return the correct customers but force the database to aggregate over the whole table.

4. "Can HAVING be used without GROUP BY? What does that mean semantically?"

Yes. When HAVING appears without GROUP BY, the entire result of the WHERE step is treated as one implicit group, and HAVING decides whether that single group survives. For example, SELECT COUNT(*) FROM orders HAVING COUNT(*) > 100 returns the count only if it exceeds 100, otherwise it returns zero rows. This is rarely useful in practice — the same logic is usually clearer with a subquery or a CASE expression — but it is valid SQL and it shows up in quiz questions as a way to check whether candidates really understand that HAVING operates on groups, not rows, and that "no GROUP BY" means "one group that is the whole table."

5. "Postgres lets me write HAVING total > 500 when total is a SELECT alias. Is that standard SQL?"

No. The SQL standard says HAVING runs before SELECT in the logical execution order, so SELECT aliases should not yet be defined when HAVING is evaluated. Postgres extends the standard for convenience and allows aliases in HAVING, as does SQLite in most cases. MySQL has historically allowed it too. But Oracle and SQL Server, and older or stricter MySQL modes, will reject it and require you to repeat the full aggregate expression: HAVING SUM(amount) > 500. If portability matters, always repeat the expression. If you are writing Postgres-only code, the alias form is fine but slightly less explicit and can confuse readers who are thinking about execution order. I generally repeat the expression in HAVING regardless, for the same reason I repeat it in ORDER BY when mixing dialects — the cost is two extra words and the benefit is zero dialect surprises.


Quick Reference — WHERE vs HAVING Cheat Sheet

+---------------------------------------------------------------+
|           WHERE vs HAVING CHEAT SHEET                         |
+---------------------------------------------------------------+
|                                                                |
|  WHERE                                                         |
|   - Filters INDIVIDUAL ROWS                                    |
|   - Runs BEFORE GROUP BY                                       |
|   - CANNOT contain aggregates (SUM, COUNT, AVG, MAX, MIN)      |
|   - CAN use indexes on filtered columns                        |
|   - Use for: date ranges, status filters, id matches           |
|                                                                |
|  HAVING                                                        |
|   - Filters GROUPS                                             |
|   - Runs AFTER GROUP BY                                        |
|   - CAN contain aggregates                                     |
|   - Cannot use indexes on aggregate expressions                |
|   - Use for: COUNT(*) >= N, SUM(x) > N, AVG(x) > N             |
|                                                                |
|  EXECUTION ORDER:                                              |
|   FROM -> WHERE -> GROUP BY -> HAVING -> SELECT ->             |
|   ORDER BY -> LIMIT                                            |
|                                                                |
|  DECISION RULE:                                                |
|   Does the condition reference an aggregate?                   |
|     yes -> HAVING (mandatory)                                  |
|     no  -> WHERE  (for performance)                            |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           KEY RULES                                            |
+---------------------------------------------------------------+
|                                                                |
|  1. WHERE filters rows, HAVING filters groups                  |
|  2. Aggregates in WHERE = syntax error                         |
|  3. Non-aggregates in HAVING = legal but slow                  |
|  4. Combine WHERE + HAVING in most real queries                |
|  5. Repeat aggregate expressions in HAVING for portability     |
|  6. HAVING without GROUP BY = one implicit group               |
|  7. Prefer WHERE whenever the condition allows it              |
|  8. Use EXPLAIN to confirm index usage on WHERE predicates     |
|                                                                |
+---------------------------------------------------------------+
ConcernWHEREHAVING
Operates onRowsGroups
Runs at step2 (before GROUP BY)4 (after GROUP BY)
Aggregates allowedNoYes
Non-aggregates allowedYesYes (but wasteful)
Can use indexesYesNot on aggregates
Typical usestatus = 'paid'SUM(amount) > 1000
Cost when misusedSyntax errorFull scan + aggregate
Alias referenceNoPostgres yes, standard no

Prev: Lesson 4.4 -- GROUP BY Next: Lesson 4.6 -- ORDER BY


This is Lesson 4.5 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

On this page