Database Interview Prep
Filtering, Sorting and Aggregation

Pattern Matching

LIKE, ILIKE, Wildcards, Regex, and the Index Trap

LinkedIn Hook

"We added WHERE email LIKE '%@gmail.com' to a 40-million-row users table. The query went from 8 ms to 14 seconds. The index was still there — Postgres just refused to use it."

Pattern matching is one of those features every SQL developer learns in their first week and then misuses for the rest of their career. LIKE 'foo%' is fast. LIKE '%foo' is a full table scan. LIKE '%foo%' is a full table scan that also makes the DBA cry. And nobody tells you why until production is on fire.

The rules are not arbitrary. A B-tree index stores strings sorted left-to-right, which means it can seek to every row starting with foo in logarithmic time — but it has no way to find rows ending with foo without reading every single one. Worse, if your database column uses a non-C collation, even LIKE 'foo%' will not use the index unless you explicitly built it with text_pattern_ops. And if you really need both-sided wildcards at speed, you need a completely different index type: a trigram GIN index from the pg_trgm extension.

Then there is the case-sensitivity trap. LIKE is case-sensitive in Postgres. ILIKE is the case-insensitive twin. SIMILAR TO is a weird SQL-99 hybrid nobody uses. And ~ / ~* drop you into full POSIX regex territory, where you can match anything — at a cost.

In Lesson 4.2, I break down pattern matching in SQL end to end: wildcards, escapes, regex operators, when LIKE can use an index, when it cannot, and when you should stop using LIKE entirely and reach for full-text search.

Read the full lesson -> [link]

#SQL #PostgreSQL #Database #Performance #Indexing #InterviewPrep


Pattern Matching thumbnail


What You'll Learn

  • The difference between LIKE and ILIKE and why case sensitivity matters
  • How % and _ wildcards differ and when each is appropriate
  • Using the ESCAPE clause to match literal % and _ characters
  • POSIX regex operators ~, ~*, !~, !~* and the SIMILAR TO hybrid
  • Why LIKE 'foo%' can use a B-tree index but LIKE '%foo' cannot
  • The text_pattern_ops operator class and when you need it
  • Trigram GIN indexes via pg_trgm for fast substring and fuzzy search
  • When to abandon LIKE entirely and reach for full-text search (tsvector)

The Ctrl+F With Fuzzy Matching Analogy

Imagine you are reading a 5,000-page legal document and you need to find every paragraph mentioning "Johnson." You hit Ctrl+F and type Johnson. Your reader jumps straight to the first match, then the next, then the next — almost instantly. Behind the scenes, the reader is not scanning every page character by character. It built an index when the document loaded: for every word, a sorted list of where it appears. A search for Johnson is a binary lookup in that list. That is a B-tree index in action, and LIKE 'John%' works exactly the same way — the database seeks to John in the sorted index and walks forward.

Now imagine you need every paragraph containing a name that ends in son. Johnson, Peterson, Jackson, Anderson. Ctrl+F cannot help you. The index is sorted by the first letter of each word, not the last. You have no choice but to read every single page and check every single word. That is LIKE '%son' — a full table scan, no matter how many indexes you have, because the structure of a B-tree cannot answer "which strings end with X" any more than an alphabetized phone book can answer "whose last name ends in -son."

Now imagine you need paragraphs containing any word that has ohn in it anywhere — Johnson, Mahoney, Cohn, Johnny. Still no help from the normal index. But there is another kind of index: one that, instead of sorting whole words, chops every word into overlapping three-letter chunks (Joh, ohn, hns, nso, son for "Johnson") and builds a lookup from each chunk to the words that contain it. Now %ohn% is a direct lookup: find me every word containing the chunk ohn. That is a trigram index, and it is what Postgres gives you via pg_trgm. Different problem, different index, wildly different performance.

The lesson: pattern matching performance depends entirely on what the index can physically answer, not what the query looks like. LIKE is a tool with sharp edges, and knowing which edge cuts which way is the difference between an 8 ms query and a 14-second one.


LIKE vs ILIKE — Case Sensitivity

LIKE is the SQL standard operator for pattern matching. It is case-sensitive in Postgres, MySQL's default on Linux, and most modern databases. ILIKE is a Postgres extension that does the same job case-insensitively. MySQL has no ILIKE — case insensitivity there is controlled by the column's collation.

-- Sample data
CREATE TABLE users (id serial PRIMARY KEY, email text, name text);
INSERT INTO users (email, name) VALUES
  ('alice@Gmail.com',  'Alice Johnson'),
  ('bob@gmail.com',    'Bob Peterson'),
  ('carol@yahoo.com',  'Carol Jackson'),
  ('dave@GMAIL.COM',   'Dave Anderson');

-- LIKE is case-sensitive: only the exact lowercase '@gmail.com' matches
SELECT email FROM users WHERE email LIKE '%@gmail.com';
-- email
-- ------------------
-- bob@gmail.com
-- (1 row)

-- ILIKE matches regardless of case
SELECT email FROM users WHERE email ILIKE '%@gmail.com';
-- email
-- ------------------
-- alice@Gmail.com
-- bob@gmail.com
-- dave@GMAIL.COM
-- (3 rows)

-- Portable alternative: lowercase both sides
SELECT email FROM users WHERE lower(email) LIKE '%@gmail.com';
-- Same result as ILIKE, but works in MySQL, SQLite, SQL Server too.
-- Note: this disables any plain index on `email` — you need a
-- functional index on lower(email) to make it fast.

The portable lower() approach is what you reach for when your app has to run on multiple databases. The trade-off is that you must create a functional index: CREATE INDEX ON users (lower(email));. Without it, the database has to call lower() on every row, which defeats any existing index.


Wildcards — % and _

SQL's pattern language has exactly two wildcards, and confusing them is a classic beginner mistake.

  • % matches zero or more characters of any kind (including none).
  • _ matches exactly one character of any kind.
-- % wildcard: zero or more characters
SELECT name FROM users WHERE name LIKE 'J%';
-- name
-- -----
-- (no rows — names start with 'A', 'B', 'C', 'D')

SELECT name FROM users WHERE name LIKE '%son%';
-- name
-- ----------------
-- Alice Johnson
-- Bob Peterson
-- Carol Jackson
-- Dave Anderson
-- (4 rows)

-- _ wildcard: exactly one character
-- Find 4-letter first names
SELECT name FROM users WHERE name LIKE '____ %';
-- name
-- ----------------
-- Dave Anderson   -- "Dave" is 4 letters, followed by space
-- (1 row)

-- Mix them: starts with 'A', second char is anything, third is 'i'
SELECT name FROM users WHERE name LIKE 'A_i%';
-- name
-- ----------------
-- Alice Johnson
-- (1 row)

Every other character in a LIKE pattern is a literal. LIKE 'abc' matches only the string abc, not "anything starting with abc" — without wildcards, LIKE is just an expensive =.


ESCAPE Clause — Matching Literal % and _

What if you need to find rows where the text actually contains a % or _ character? By default, those are wildcards. The fix is the ESCAPE clause, which tells LIKE "this character is my escape marker — the next character after it is a literal."

CREATE TABLE products (id serial PRIMARY KEY, sku text);
INSERT INTO products (sku) VALUES
  ('A_100'),
  ('A1100'),
  ('50%_OFF'),
  ('50X_OFF');

-- Without escape: _ is a wildcard, so A_100 and A1100 BOTH match
SELECT sku FROM products WHERE sku LIKE 'A_100';
-- sku
-- -------
-- A_100
-- A1100
-- (2 rows)  -- probably not what you wanted

-- With escape: '\' marks the next char as literal
SELECT sku FROM products WHERE sku LIKE 'A\_100' ESCAPE '\';
-- sku
-- -------
-- A_100
-- (1 row)

-- Find SKUs that literally contain '50%'
SELECT sku FROM products WHERE sku LIKE '50\%%' ESCAPE '\';
-- sku
-- ---------
-- 50%_OFF
-- (1 row)
-- Pattern breakdown: 50\% = literal "50%", then % = any trailing chars

You can pick any escape character you want. Some codebases use ! or # to avoid clashing with backslash conventions in the host language. The rule is: whichever character you declare in ESCAPE, placing it before a %, _, or itself turns that next character into a literal.


POSIX Regex — ~, ~*, !~, !~*

When LIKE is not expressive enough, Postgres gives you full POSIX regular expressions via four operators:

  • ~ — matches regex, case-sensitive
  • ~* — matches regex, case-insensitive
  • !~ — does not match regex, case-sensitive
  • !~* — does not match regex, case-insensitive
-- Find emails with a digit in the local part
SELECT email FROM users WHERE email ~ '^[^@]*[0-9]';
-- (no rows in our sample)

-- Case-insensitive match: names starting with 'a' or 'd'
SELECT name FROM users WHERE name ~* '^(a|d)';
-- name
-- ----------------
-- Alice Johnson
-- Dave Anderson
-- (2 rows)

-- Anchors: ^ and $ for start/end of string
SELECT name FROM users WHERE name ~ 'son$';
-- name
-- ----------------
-- Alice Johnson
-- Bob Peterson
-- Carol Jackson
-- Dave Anderson
-- (4 rows)

-- Negation: emails NOT ending with a common provider
SELECT email FROM users WHERE email !~* '@(gmail|yahoo|hotmail)\.com$';
-- (no rows)

Regex is powerful but never uses a normal B-tree index. Every row is evaluated against the pattern. For pattern matching that must scale, regex is almost always the wrong tool — use LIKE with a leading prefix, or a trigram index, or full-text search.


SIMILAR TO — The SQL-99 Hybrid Nobody Uses

SIMILAR TO is a SQL-99 operator that sits awkwardly between LIKE and POSIX regex. It supports % and _ wildcards like LIKE, plus regex-style |, *, +, ?, (), and []. It does not support anchors — the whole string must match.

-- Emails ending in gmail.com or yahoo.com
SELECT email FROM users WHERE email SIMILAR TO '%@(gmail|yahoo)\.com';
-- email
-- ------------------
-- bob@gmail.com
-- carol@yahoo.com
-- (2 rows — case-sensitive)

In practice, almost nobody uses SIMILAR TO. If you need LIKE, use LIKE. If you need regex, use ~. SIMILAR TO exists mostly for SQL standard compliance — know it exists, mention it in interviews, but do not put it in production code.


Performance — The Index Trap

This is where most developers get burned. LIKE looks like a simple operator, but its performance depends on whether the database can use an index, and that depends on where the wildcards are.

The B-tree Rule

A B-tree index stores values sorted left-to-right. It can answer:

  • "Exact match" — email = 'x'
  • "Range match" — email > 'a' AND email < 'b'
  • "Prefix match" — email LIKE 'foo%' (which is really a range scan)

It cannot answer "suffix match" or "substring match" because those are not contiguous ranges in the sorted index.

-- FAST: prefix-only pattern, can use index
EXPLAIN SELECT * FROM users WHERE email LIKE 'bob%';
--  Index Scan using users_email_idx on users
--    Index Cond: ((email >= 'bob') AND (email < 'boc'))
--    Filter: (email ~~ 'bob%'::text)

-- SLOW: leading wildcard, full table scan
EXPLAIN SELECT * FROM users WHERE email LIKE '%gmail.com';
--  Seq Scan on users
--    Filter: (email ~~ '%gmail.com'::text)
--    -- Every row read, pattern tested on every row

The Collation Trap — text_pattern_ops

Even a prefix search may not use your index if the column uses a non-C collation (the default on most modern Postgres installs). A collation like en_US.UTF-8 has locale-specific sort rules that do not match the byte-level comparisons LIKE needs. The fix is to create the index with an explicit pattern operator class:

-- Regular index — great for equality, useless for LIKE under non-C collation
CREATE INDEX users_email_idx ON users (email);

-- Pattern-ops index — forces byte-level comparison, usable by LIKE 'foo%'
CREATE INDEX users_email_pattern_idx ON users (email text_pattern_ops);

-- Now this is an index scan even under en_US.UTF-8
SELECT * FROM users WHERE email LIKE 'bob%';

Use text_pattern_ops for text, varchar_pattern_ops for varchar, and bpchar_pattern_ops for fixed-char columns. You can keep both indexes — the regular one for = queries and the pattern-ops one for LIKE.

Trigram GIN Indexes — pg_trgm

For fast substring and fuzzy matching (including both-sided wildcards and ILIKE '%foo%'), Postgres has the pg_trgm extension. It breaks strings into overlapping three-character sequences (trigrams) and indexes them with a GIN index. Queries like LIKE '%foo%', ILIKE '%foo%', and even similarity-based searches become index lookups.

-- Install the extension once per database
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Create a trigram GIN index on the column
CREATE INDEX users_email_trgm_idx ON users USING gin (email gin_trgm_ops);

-- Now both-sided and leading-wildcard patterns use the index
EXPLAIN SELECT * FROM users WHERE email ILIKE '%gmail%';
--  Bitmap Heap Scan on users
--    Recheck Cond: (email ~~* '%gmail%'::text)
--    -> Bitmap Index Scan on users_email_trgm_idx
--         Index Cond: (email ~~* '%gmail%'::text)

-- Bonus: similarity search
SELECT email, similarity(email, 'bob@gmial.com') AS score
FROM users
WHERE email % 'bob@gmial.com'       -- % = similarity operator
ORDER BY score DESC;
-- email           | score
-- ----------------+-------
-- bob@gmail.com   | 0.62

Trigram indexes are larger and slower to update than B-tree indexes, so reserve them for columns that are genuinely searched by substring. For a username or product name column, the trade-off is worth it. For a log message column with billions of rows, you probably want full-text search instead.


ASCII Diagram — Where Pattern Matching Lives

+---------------------------------------------------------------+
|           PATTERN MATCHING DECISION TREE                      |
+---------------------------------------------------------------+
|                                                                |
|   Need pattern matching on a column. Which tool?              |
|                                                                |
|     |                                                          |
|     v                                                          |
|   [Exact match?] ---yes---> use =  (B-tree index, fastest)    |
|     |                                                          |
|     no                                                         |
|     |                                                          |
|     v                                                          |
|   [Prefix only? 'foo%']                                        |
|     |                                                          |
|     yes --> LIKE 'foo%'                                        |
|     |        + B-tree with text_pattern_ops                    |
|     |        -> Index range scan, fast                         |
|     |                                                          |
|     no                                                         |
|     |                                                          |
|     v                                                          |
|   [Substring / leading wildcard? '%foo%' ]                     |
|     |                                                          |
|     yes --> LIKE/ILIKE '%foo%'                                 |
|     |        + GIN index with gin_trgm_ops (pg_trgm)           |
|     |        -> Bitmap index scan, fast                        |
|     |                                                          |
|     no                                                         |
|     |                                                          |
|     v                                                          |
|   [Natural language / ranked relevance?]                       |
|     |                                                          |
|     yes --> tsvector + tsquery + GIN                           |
|              (full-text search, stemming, ranking)             |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           LIKE INDEX USAGE CHEAT TABLE                        |
+---------------------------------------------------------------+
|                                                                |
|   Pattern          B-tree    B-tree+pattern_ops   GIN+trgm    |
|   ---------------  -------   -------------------  ---------   |
|   = 'foo'          YES       YES                  n/a         |
|   LIKE 'foo'       YES*      YES                  YES         |
|   LIKE 'foo%'      maybe     YES                  YES         |
|   LIKE '%foo'      NO        NO                   YES         |
|   LIKE '%foo%'     NO        NO                   YES         |
|   ILIKE 'foo%'     NO        NO                   YES         |
|   ~ 'regex'        NO        NO                   sometimes   |
|                                                                |
|   * depends on collation                                       |
|                                                                |
+---------------------------------------------------------------+

Napkin AI Visual Prompt: "Dark gradient (#0a0f1f -> #111a2e). Split comparison diagram. LEFT panel labeled 'B-tree Index' shows an alphabetized sorted list of strings with a sky blue (#4fc3f7) arrow pointing at LIKE 'foo%' lighting up a contiguous range, and a rose (#ff5c8a) X over LIKE '%foo'. RIGHT panel labeled 'Trigram GIN (pg_trgm)' shows the word Johnson shattered into overlapping chunks Joh, ohn, hns, nso, son connected by sky blue lines to a hash-table icon, with a green checkmark next to LIKE '%ohn%'. White monospace labels. Title 'Why LIKE Cares Where the Wildcard Is'. Subtle grid overlay."


Common Mistakes

1. Leading wildcards destroying query performance. Writing WHERE email LIKE '%@gmail.com' on a large table is the most common LIKE mistake. The query is fine at 1,000 rows, acceptable at 100,000 rows, and a production incident at 40 million. The fix is either to restructure the data (store the email domain in a separate column with its own index), add a trigram GIN index if substring search is genuinely needed, or switch to full-text search. Always check EXPLAIN for Seq Scan on LIKE queries before shipping them.

2. Forgetting ILIKE (or equivalent) for case-insensitive search. LIKE 'Gmail%' does not match gmail. Developers often notice this in staging, but only after a user complains. Use ILIKE in Postgres, or LOWER(col) LIKE LOWER(pattern) for portability — and remember that the lower-case approach needs a functional index on lower(col) to stay fast.

3. Using LIKE where full-text search is the right tool. LIKE treats strings as opaque byte sequences. It does not understand stemming (run vs running), stop words (the, a), word boundaries, or ranking by relevance. Searching article bodies or product descriptions with LIKE '%running shoes%' misses ran shoes, runs shoes, and cannot rank the 10 best matches. That is what tsvector and tsquery are for. Reach for full-text search the moment you are searching natural-language text.

4. Not escaping literal % and _ in user-provided search strings. If a user types 50% into a search box and your code does WHERE name LIKE '%' || :input || '%', the % inside their input becomes a wildcard and silently matches far more than intended. Always sanitize: escape %, _, and your escape character itself before concatenating user input into a LIKE pattern, and pass ESCAPE '\' (or your chosen character) to the query.

5. Assuming LIKE 'foo%' uses an index without checking collation. On Postgres with a non-C collation (the default on most installs), a plain B-tree index on a text column will not be used for LIKE queries. You need a second index with text_pattern_ops, or you need to run EXPLAIN and discover this the hard way. Many developers have spent hours debugging "why is this slow, I have an index" only to learn about operator classes.

6. Using ~ regex for simple prefix checks. WHERE email ~ '^bob' is equivalent to WHERE email LIKE 'bob%', but the regex version cannot use a B-tree index even with text_pattern_ops. Prefer LIKE when LIKE is expressive enough. Save regex for patterns that LIKE genuinely cannot express.


Interview Questions

1. "What is the difference between LIKE and ILIKE in Postgres, and how would you achieve case-insensitive matching in a database that does not have ILIKE?"

LIKE is the SQL standard operator for pattern matching and is case-sensitive in Postgres — LIKE 'Gmail%' will not match gmail.com. ILIKE is a Postgres extension that does the same job case-insensitively. In databases without ILIKE (MySQL, SQL Server, Oracle, SQLite), the portable approach is to lowercase both sides: WHERE LOWER(email) LIKE LOWER(:pattern). The trade-off is that calling LOWER() on the column disables any plain index on that column — the database has to compute lower(email) for every row. The fix is a functional index: CREATE INDEX ON users (lower(email));. With that index in place, WHERE LOWER(email) LIKE 'gmail%' uses the index and runs in logarithmic time. MySQL's case sensitivity also depends on the column's collation — a utf8mb4_general_ci collation makes = and LIKE case-insensitive by default, while utf8mb4_bin makes them case-sensitive. Always know your collation before writing pattern-matching code.

2. "Why is LIKE 'foo%' fast but LIKE '%foo' slow, even when there is an index on the column?"

Because a B-tree index stores values sorted from the left. The database can seek to the first value starting with foo in logarithmic time, then walk forward in the sorted leaf pages collecting matches until it hits the first value that no longer starts with foo. Internally, the planner rewrites LIKE 'foo%' as a range scan: email >= 'foo' AND email < 'fop'. That is a contiguous slice of the index. LIKE '%foo' is a suffix search, and the index has no concept of suffixes — strings ending in foo are scattered all across the sorted tree (aafoo, bfoo, zfoo), with no contiguous range to seek. The only way to answer the query is to read every row in the table and test each one against the pattern. The same applies to LIKE '%foo%'. There is an additional subtlety in Postgres: even prefix matching only uses the index if the column's collation supports byte-level ordering. Under a non-C collation like en_US.UTF-8, you need to create the index with text_pattern_ops (or varchar_pattern_ops) to force byte-level comparisons, otherwise even LIKE 'foo%' will fall back to a sequential scan.

3. "How would you make LIKE '%foo%' fast on a large table?"

Plain B-tree indexes cannot help — by definition, a leading wildcard defeats left-to-right sorting. There are three main approaches, each with trade-offs. First, install the pg_trgm extension and create a GIN index with gin_trgm_ops: CREATE INDEX ON users USING gin (email gin_trgm_ops);. This breaks every string into overlapping three-character trigrams and indexes each one, so LIKE '%foo%' becomes a lookup for all rows containing the trigrams foo. It works for both LIKE and ILIKE, supports similarity search via the % operator and similarity() function, and handles fuzzy matching. The costs are larger index size (roughly 2-5x a B-tree) and slower writes. Second, for natural-language text, switch from LIKE to full-text search: create a tsvector column or expression index, query with to_tsquery, and you get stemming, stop-word removal, and relevance ranking on top of fast index lookups. Third, if the pattern is actually structural (email domain, URL hostname, file extension), reshape the data: store the domain in its own column and put a regular B-tree index on it, so the query becomes WHERE email_domain = 'gmail.com' — an exact match, which is always the fastest thing a database can do. Choose based on whether the search is structural (reshape), substring (trigram), or linguistic (full-text).

4. "When should you stop using LIKE and reach for full-text search instead?"

The moment you are searching natural-language text and users expect linguistic matching. LIKE treats strings as opaque byte sequences: it has no concept of word boundaries, no stemming (so run does not match running, runs, or ran), no stop words (so the cat and cat are different queries), no relevance ranking, no synonyms, and no language awareness. A user searching for running shoes on a product catalogue with LIKE '%running shoes%' will miss listings titled shoes for running, runner's shoes, or athletic footwear. Full-text search in Postgres uses tsvector (a preprocessed lexeme representation of a document) and tsquery (a parsed search expression). You index the tsvector with a GIN index, and queries become fast, stemming-aware, language-aware, and rankable with ts_rank. The rule of thumb: if the column holds natural language (product descriptions, article bodies, comments, support tickets) and users care about relevance rather than exact substring presence, use full-text search. If the column holds structured identifiers (email addresses, SKUs, URLs, filenames) and users care about substring presence, LIKE with a trigram index is the right tool. If the column holds exact values and users look them up by key, plain = with a B-tree index is the right tool. Pick the index type that matches the query pattern, not the other way around.

5. "What does the ESCAPE clause do, and when would you need it?"

ESCAPE declares a character in the pattern that marks the next character as a literal rather than a wildcard. By default, % and _ are wildcards in LIKE patterns, so if you need to match a string that genuinely contains % or _, you have to escape them. The clause LIKE 'A\_100' ESCAPE '\' says "use \ as my escape character," turning the _ after the backslash into a literal underscore. You can pick any character you want as the escape marker — it does not have to be backslash. This matters in practice when users type search input that may contain these characters: if your code builds WHERE name LIKE :input || '%' and a user types 50%, the user's % becomes a wildcard and the query silently matches far more than intended. The fix is to sanitize the user input before concatenation, escaping every %, every _, and the escape character itself, then pass ESCAPE '\' to the query. Many ORMs have helper functions for this (pg-escape, like-escape, etc.), and you should always use them for user-provided LIKE input rather than building patterns by hand.


Quick Reference — Pattern Matching Cheat Sheet

+---------------------------------------------------------------+
|           PATTERN MATCHING CHEAT SHEET                        |
+---------------------------------------------------------------+
|                                                                |
|  WILDCARDS:                                                    |
|   %   -> zero or more characters                               |
|   _   -> exactly one character                                 |
|                                                                |
|  OPERATORS:                                                    |
|   LIKE       -> case-sensitive (SQL standard)                  |
|   ILIKE      -> case-insensitive (Postgres extension)          |
|   NOT LIKE   -> negation                                       |
|   ~          -> POSIX regex, case-sensitive                    |
|   ~*         -> POSIX regex, case-insensitive                  |
|   !~  / !~*  -> negated regex                                  |
|   SIMILAR TO -> SQL-99 hybrid (rarely used)                    |
|                                                                |
|  ESCAPE:                                                       |
|   LIKE 'A\_B' ESCAPE '\'   -> matches literal "A_B"            |
|   LIKE '50\%' ESCAPE '\'   -> matches literal "50%"            |
|                                                                |
|  INDEX USAGE (B-tree):                                         |
|   LIKE 'foo%'  -> YES (with text_pattern_ops under non-C)      |
|   LIKE '%foo'  -> NO  (leading wildcard)                       |
|   LIKE '%foo%' -> NO                                           |
|   ILIKE 'foo%' -> NO  (case transform breaks index)            |
|                                                                |
|  INDEX USAGE (GIN with pg_trgm):                               |
|   LIKE 'foo%'  -> YES                                          |
|   LIKE '%foo'  -> YES                                          |
|   LIKE '%foo%' -> YES                                          |
|   ILIKE ...    -> YES                                          |
|                                                                |
+---------------------------------------------------------------+

+---------------------------------------------------------------+
|           KEY RULES                                            |
+---------------------------------------------------------------+
|                                                                |
|  1. LIKE is case-sensitive, ILIKE is not                       |
|  2. % = many chars, _ = exactly one char                       |
|  3. LIKE 'foo%' can use a B-tree index                         |
|  4. LIKE '%foo' or '%foo%' cannot use a B-tree index           |
|  5. Non-C collation -> needs text_pattern_ops for LIKE         |
|  6. Trigram GIN (pg_trgm) makes '%foo%' fast                   |
|  7. For natural-language text, use full-text search            |
|  8. Escape user-provided %, _ before building LIKE patterns    |
|  9. Always EXPLAIN pattern-matching queries before shipping    |
|                                                                |
+---------------------------------------------------------------+
TaskWrong WayRight Way
Case-insensitive matchLIKE 'gmail%' (misses Gmail)ILIKE 'gmail%' or LOWER(col) LIKE ...
Substring search on big tableLIKE '%foo%' with B-treeGIN index with pg_trgm
Prefix search under en_US.UTF-8Plain B-tree indexB-tree with text_pattern_ops
Searching article bodiesLIKE '%running shoes%'tsvector + tsquery full-text search
Matching literal %LIKE '%50%%' (broken)LIKE '50\%' ESCAPE '\'
Simple prefix check~ '^foo' (no index)LIKE 'foo%'
Domain search on emailsLIKE '%@gmail.com'Separate email_domain column
Fuzzy / typo-tolerant searchLIKE '%jonson%'pg_trgm + similarity()

Prev: Lesson 4.1 -- WHERE Clause Next: Lesson 4.3 -- Aggregate Functions


This is Lesson 4.2 of the Database Interview Prep Course -- 12 chapters, 58 lessons.

On this page