Every application eventually hits a database wall. What ran fine in development starts timing out in production. Pages that loaded in milliseconds now spin for seconds. The database team points fingers at the application team. Everyone opens a dashboard full of red.
The frustrating part? Most of these performance problems are avoidable. They stem from a relatively small set of patterns that get repeated across codebases — patterns that seem harmless at low data volumes but become catastrophic at scale.
This article is about those patterns. We'll look at the mistakes in real queries, understand why they are slow, and replace them with faster alternatives. The examples use PostgreSQL syntax, but the principles apply equally to MySQL, SQL Server, and most other relational databases.
1. SELECT * — The Silent Bandwidth Killer
This is the most common SQL anti-pattern in the wild. It feels convenient and looks harmless, but it is anything but.
The Problem
-- You only need the user's name and email on a profile card
SELECT * FROM users WHERE id = 42;
If the users table has 30 columns — including a profile_picture BYTEA column, a settings JSONB blob, and audit fields — you are pulling all of that data from disk, across the network, and into your application's memory for every single row, even though you need two fields.
At scale, this compounds across:
- Disk I/O — wider rows mean fewer rows fit in a single page read
- Network overhead — unnecessary data travels from DB to app server
- Memory pressure — your connection pool and ORM need to buffer all of it
- Query planner — SELECT * prevents index-only scans (more on this shortly)
The Fix
Be explicit. Fetch only what you need.
-- Fast: only the required columns
SELECT id, name, email FROM users WHERE id = 42;
When you specify columns, the query planner can also exploit covering indexes — indexes that contain all queried columns, meaning the engine never has to touch the actual table heap at all.
-- With a covering index: (id, name, email)
-- This query is served entirely from the index — zero heap access
SELECT name, email FROM users WHERE id = 42;
Rule of thumb: SELECT * is fine in exploratory queries at a psql/mysql prompt. It has no place in application code.
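You can watch the covering-index effect directly. Here is a minimal sketch using Python's built-in sqlite3 module (the table, columns, and index name are invented for the demo): with an index on (name, email), the explicit-column query is answered from the index alone, while SELECT * forces a visit to the table to fetch the columns the index doesn't carry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        name TEXT,
        email TEXT,
        bio TEXT  -- the "wide" column we don't actually need
    )
""")
conn.execute("CREATE INDEX idx_users_name_email ON users (name, email)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail)
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Explicit columns: SQLite reports a COVERING INDEX — no table access at all
print(plan("SELECT name, email FROM users WHERE name = 'alice'"))

# SELECT *: the index still locates the row, but the table must be read for bio
print(plan("SELECT * FROM users WHERE name = 'alice'"))
```

The same experiment works in PostgreSQL with EXPLAIN — look for "Index Only Scan" versus "Index Scan".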
2. Wrapping Indexed Columns in Functions
This is one of the sneakiest performance killers. You have an index on a column. The query looks like it should use that index. It doesn't.
The Problem
-- Find all users who registered in 2024 (MySQL syntax)
-- There's an index on created_at — but the query is doing a full table scan
SELECT id, email FROM users
WHERE YEAR(created_at) = 2024;
Or in PostgreSQL:
SELECT id, email FROM users
WHERE DATE_TRUNC('year', created_at) = '2024-01-01';
When you wrap a column in a function, the database cannot use a standard B-tree index on that column. The index is built on the raw column values, not on YEAR(created_at). The engine has no choice but to evaluate the function for every row — a full scan of potentially millions of rows.
The same trap shows up with:
-- Casting
WHERE CAST(user_id AS TEXT) = '1001'
-- String manipulation
WHERE LOWER(email) = 'alice@example.com'
-- Arithmetic
WHERE price * 1.2 > 100
The Fix
Rewrite the predicate to keep the column bare on one side:
-- Instead of YEAR(created_at) = 2024
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
For case-insensitive lookups, use a functional index instead of transforming the column at query time:
-- Create the index once
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
-- Now this query uses the index
SELECT id FROM users WHERE LOWER(email) = 'alice@example.com';
For price * 1.2 > 100, flip the arithmetic to the constant side:
-- Move the math to the right-hand side — column stays bare
WHERE price > 100 / 1.2;
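The difference is visible in any database's plan output. A small sketch with Python's sqlite3 module (schema invented for the demo): the function-wrapped predicate forces a scan of every row, while the bare-column range lets the index do the work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, created_at TEXT)")
conn.execute("CREATE INDEX idx_users_created_at ON users (created_at)")

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Wrapping the column in a function hides it from the index: full scan
wrapped = plan("SELECT id FROM users WHERE substr(created_at, 1, 4) = '2024'")

# Bare column with a half-open range: index search
sargable = plan(
    "SELECT id FROM users "
    "WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'"
)

print(wrapped)   # SCAN — every row is visited
print(sargable)  # SEARCH — the index narrows the range
```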
3. The N+1 Query Problem
This one lives primarily in ORM-heavy codebases and is arguably the most impactful mistake on this list. A page that makes 1 query in development makes 501 queries in production once the dataset grows.
The Problem
Imagine rendering a page that lists 500 orders with the customer name for each one. Using an ORM naively:
# Pseudocode — but you'll recognise this pattern
orders = Order.find_all(limit=500)
for order in orders:
    # This fires a new query for EVERY order
    customer_name = order.customer.name  # SELECT * FROM customers WHERE id = ?
This generates:
- 1 query to fetch 500 orders
- 500 queries to fetch each customer
501 queries total — each with its own round-trip to the database, parsing overhead, and connection acquisition cost. Triple-digit milliseconds quietly stack up.
The Fix: Use JOINs or eager loading
At the SQL level, this should always be a single query:
SELECT
o.id AS order_id,
o.total,
o.created_at,
c.name AS customer_name
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= NOW() - INTERVAL '30 days'
ORDER BY o.created_at DESC
LIMIT 500;
With an ORM, use the eager loading mechanism it provides:
# SQLAlchemy — joinedload fetches customers in the same query
orders = (
    session.query(Order)
    .options(joinedload(Order.customer))
    .filter(Order.created_at >= thirty_days_ago)
    .limit(500)
    .all()
)
// Prisma — include fetches related data in one trip
const orders = await prisma.order.findMany({
  take: 500,
  where: { createdAt: { gte: thirtyDaysAgo } },
  include: { customer: { select: { name: true } } },
});
If your application logs show query counts in the hundreds for a single page load, you almost certainly have an N+1 problem.
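The query blow-up is easy to measure. A self-contained sketch with Python's sqlite3 module, using a trace callback to count every statement actually sent to the database (schema and data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 10.0), (2, 2, 20.0), (3, 1, 30.0);
""")

queries = []
conn.set_trace_callback(queries.append)  # record every executed statement

# N+1 style: one query for the orders, then one per order for its customer
for _, customer_id in conn.execute("SELECT id, customer_id FROM orders").fetchall():
    conn.execute("SELECT name FROM customers WHERE id = ?", (customer_id,)).fetchone()
n_plus_1 = len(queries)

queries.clear()
# JOIN: the same data in a single statement
rows = conn.execute("""
    SELECT o.id, c.name
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
""").fetchall()
joined = len(queries)

print(n_plus_1, joined)  # 4 1
```

With 3 orders the naive loop already issues 4 statements; with 500 orders it would issue 501, exactly as described above.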
4. OFFSET Pagination at Scale
OFFSET-based pagination is in virtually every tutorial. At page 1, it is fine. At page 5000, it becomes a catastrophe.
The Problem
-- "Give me page 500 of results (10 per page)"
SELECT id, title, created_at
FROM articles
ORDER BY created_at DESC
LIMIT 10 OFFSET 4990;
What happens internally: the database scans through 5,000 rows, discards the first 4,990, and returns 10. The higher the page number, the more rows are scanned and thrown away. There is no shortcut.
On a table with 10 million rows, a request for page 50,000 forces the engine to process 500,000 rows and keep 10. This is why infinite-scroll feeds and deep pagination become unbearably slow.
The Fix: Keyset (Cursor) Pagination
Instead of skipping rows with OFFSET, use the last seen value as an anchor:
-- First page
SELECT id, title, created_at
FROM articles
ORDER BY created_at DESC, id DESC
LIMIT 10;
-- Next page — use the last row's values as the cursor
SELECT id, title, created_at
FROM articles
WHERE (created_at, id) < ('2026-03-15 10:22:00', 8453)
ORDER BY created_at DESC, id DESC
LIMIT 10;
The WHERE clause jumps directly to the right place using the index on (created_at DESC, id DESC). The database never touches rows you've already seen.
| Approach | Page 1 | Page 100 | Page 10,000 |
|---|---|---|---|
| OFFSET | ~1 ms | ~15 ms | ~1,200 ms |
| Keyset cursor | ~1 ms | ~1 ms | ~1 ms |
The trade-off is that keyset pagination does not support jumping to arbitrary page numbers — only "next" / "previous". For most real-world use cases (feeds, dashboards, infinite scroll), this is perfectly acceptable.
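Here is a runnable sketch of the cursor mechanics using Python's sqlite3 module (100 fake articles; names and dates are illustrative). Each call resumes from the (created_at, id) pair of the last row the client saw, and the walk visits every row exactly once:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO articles (title, created_at) VALUES (?, ?)",
    [(f"post {i}", f"2026-01-{i % 28 + 1:02d}") for i in range(100)],
)

def fetch_page(cursor=None, size=10):
    """cursor is the (created_at, id) of the last row already delivered."""
    if cursor is None:
        return conn.execute(
            "SELECT id, title, created_at FROM articles "
            "ORDER BY created_at DESC, id DESC LIMIT ?", (size,)
        ).fetchall()
    return conn.execute(
        "SELECT id, title, created_at FROM articles "
        "WHERE (created_at, id) < (?, ?) "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        (cursor[0], cursor[1], size),
    ).fetchall()

# Walk the whole table page by page: no row is ever skipped or repeated
seen, cursor = [], None
while True:
    rows = fetch_page(cursor)
    if not rows:
        break
    seen.extend(r[0] for r in rows)
    cursor = (rows[-1][2], rows[-1][0])

print(len(seen), len(set(seen)))  # 100 100
```

The row-value comparison `(created_at, id) < (?, ?)` handles timestamp ties correctly, which is why the id tiebreaker belongs in both the ORDER BY and the cursor.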
5. The Leading Wildcard Trap
Full-text search built with LIKE '%keyword%' is one of those things that works just fine until it utterly doesn't.
The Problem
-- Find all products whose name contains "wireless"
SELECT id, name, price
FROM products
WHERE name LIKE '%wireless%';
A leading wildcard ('%wireless%') tells the engine it cannot know where in the string the match will occur. There is no way to use a B-tree index here — it has to read every row and scan every character. On a products table with millions of rows, this is a full sequential scan.
The Fix: Appropriate Search Tooling
Option 1 — If only a prefix search is needed, remove the leading wildcard:
-- "Starts with wireless" — this CAN use an index
WHERE name LIKE 'wireless%';
Option 2 — For full-text search, use the database's native full-text search capabilities:
-- PostgreSQL full-text search
-- Create a GIN index on the tsvector
CREATE INDEX idx_products_name_fts ON products USING GIN (to_tsvector('english', name));
-- The query
SELECT id, name, price
FROM products
WHERE to_tsvector('english', name) @@ to_tsquery('english', 'wireless');
Option 3 — At significant scale, consider a dedicated search engine (Elasticsearch, Typesense, Meilisearch). They are purpose-built for this and return results in milliseconds on dataset sizes that would bring a relational database to its knees.
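Option 2 can be sketched without a PostgreSQL instance: SQLite's FTS5 extension (bundled with most CPython builds, though availability depends on how SQLite was compiled) is the same idea as tsvector — an inverted index over tokens, so matching never scans strings row by row. Table name and data are invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE products_fts USING fts5(name)")
conn.executemany(
    "INSERT INTO products_fts (name) VALUES (?)",
    [("wireless mouse",), ("usb-c cable",), ("wireless keyboard",), ("desk lamp",)],
)

# Token match via the inverted index — no per-row string scanning
matches = conn.execute(
    "SELECT name FROM products_fts WHERE products_fts MATCH 'wireless'"
).fetchall()
print(matches)
```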
6. Ignoring EXPLAIN ANALYZE — Fixing Blind
This is less a query mistake and more a debugging mistake. Most developers write a slow query, tweak it by feel, and hope it gets faster. Reading the query plan takes the guesswork out entirely.
How to Read a Basic Plan
EXPLAIN ANALYZE
SELECT u.id, u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id, u.name;
Example output (condensed):
HashAggregate (cost=4821.33..5021.33 rows=20000 width=40) (actual time=89.4..102.3 rows=18450)
  ->  Hash Left Join (cost=912.00..4421.33 rows=80000 width=32) (actual time=11.2..72.1 rows=80000)
        Hash Cond: (o.user_id = u.id)
        ->  Seq Scan on orders (cost=0.00..2450.00 rows=150000 width=8) (actual time=0.1..24.6 rows=150000)
        ->  Hash (cost=662.00..662.00 rows=20000 width=28) (actual time=10.9..10.9 rows=20000)
              ->  Index Scan using idx_users_created_at on users ...
What to look for
| Signal | What it means |
|---|---|
| Seq Scan on a large table | Missing or unused index — investigate |
| Very high rows= estimate vs actual | Stale statistics — run ANALYZE |
| Nested Loop with large inner rows | N+1-style join — consider a Hash Join |
| Hash node building a large hash | High memory usage — may spill to disk |
| High actual time vs cost | Possible I/O bottleneck or lock contention |
Run EXPLAIN (ANALYZE, BUFFERS) to also see buffer hits vs disk reads — this alone often pinpoints whether the query is I/O-bound or CPU-bound.
Tip: Tools like explain.depesz.com (PG) or explain.tensor.ru let you paste a plan and get it colour-coded and annotated in a browser.
7. Correlated Subqueries Instead of JOINs or Window Functions
Correlated subqueries execute once per row in the outer query. They are intuitive to write but scale terribly.
The Problem
-- For each employee, find their department's average salary
SELECT
    e.id,
    e.name,
    e.salary,
    (
        SELECT AVG(salary)
        FROM employees e2
        WHERE e2.department_id = e.department_id
    ) AS dept_avg_salary
FROM employees e;
For 10,000 employees across 50 departments, the subquery runs 10,000 times — even though there are only 50 distinct results to compute. The database often can't cache the intermediate results, so you get 10,000 full aggregation scans.
The Fix: Window Functions or JOIN + CTE
Window function approach (cleanest):
SELECT
id,
name,
salary,
AVG(salary) OVER (PARTITION BY department_id) AS dept_avg_salary
FROM employees;
One pass over the data. The window function computes the average per partition as it scans the table.
CTE + JOIN approach (when you need the aggregate separately):
WITH dept_averages AS (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
)
SELECT
e.id,
e.name,
e.salary,
da.avg_salary AS dept_avg_salary
FROM employees e
INNER JOIN dept_averages da ON da.department_id = e.department_id;
The CTE computes 50 averages once. The join is then trivial.
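The equivalence of the two forms is easy to verify on a toy dataset. A sketch with Python's sqlite3 module (SQLite has supported window functions since 3.25; names and salaries invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT,
                            department_id INTEGER, salary REAL);
    INSERT INTO employees VALUES
        (1, 'ann', 1, 100.0), (2, 'bob', 1, 200.0),
        (3, 'cat', 2, 300.0), (4, 'dan', 2, 500.0);
""")

# Correlated subquery: conceptually one aggregate per outer row
correlated = conn.execute("""
    SELECT e.id,
           (SELECT AVG(salary) FROM employees e2
             WHERE e2.department_id = e.department_id) AS dept_avg
    FROM employees e ORDER BY e.id
""").fetchall()

# Window function: one pass over the table, identical result
windowed = conn.execute("""
    SELECT id, AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees ORDER BY id
""").fetchall()

print(windowed)  # dept 1 averages 150.0, dept 2 averages 400.0
```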
8. Implicit Type Conversions
Type mismatches are a subtle index killer. The column type and the comparison value type need to match — otherwise the database silently casts the entire column to make the comparison work.
The Problem
-- order_id is an INTEGER column, but we're comparing it to a string
SELECT * FROM orders WHERE order_id = '1001';
PostgreSQL will cast the string '1001' to an integer and still use the index here. But MySQL may behave differently. More dangerous is the reverse — casting the column:
-- phone column is VARCHAR, but we're comparing to a number (no quotes)
SELECT * FROM users WHERE phone = 5551234567;
MySQL will cast phone to a number for every row — making index usage impossible and causing a full table scan. Worse, numbers like 0552... become 552..., so you may also get incorrect results.
Or in ORM code that generates:
-- The ORM passed a string '42' for a BIGINT column
WHERE user_id = '42'
Depending on the database and driver, this can silently coerce the comparison, prevent index usage, and skew the planner's estimates.
The Fix
Always match your literal types to the column type:
-- Integer column → unquoted integer literal
WHERE order_id = 1001;
-- VARCHAR column → quoted string literal
WHERE phone = '5551234567';
-- Timestamp column → properly typed literal
WHERE created_at > '2025-01-01 00:00:00'::timestamptz;
In application code, use parameterised queries with proper type binding — never concatenate values into query strings (which is also a SQL injection risk):
// Correct — parameterised with proper type
const result = await pool.query(
"SELECT * FROM orders WHERE id = $1",
[parseInt(orderId, 10)], // ensure it's a number, not a string
);
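The leading-zero hazard mentioned above can be reproduced with Python's sqlite3 module. SQLite's coercion rules differ from MySQL's (here the literal is coerced toward the column's text affinity rather than the reverse), but the lesson is identical: an untyped numeric literal against a text column silently misses rows. Schema invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT)")
conn.execute("INSERT INTO users VALUES (1, '0551234567')")

# Number literal: coerced to the text '551234567' — the leading zero is
# gone, so the lookup silently returns nothing
wrong = conn.execute("SELECT id FROM users WHERE phone = ?", (551234567,)).fetchall()

# Properly typed string literal: exact match
right = conn.execute("SELECT id FROM users WHERE phone = ?", ("0551234567",)).fetchall()

print(len(wrong), len(right))  # 0 1
```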
9. Using OR on Indexed Columns
ORs that span different columns can defeat index usage in surprising ways.
The Problem
-- Find users by email OR phone
-- Separate indexes exist on both columns
SELECT id, name FROM users
WHERE email = 'alice@example.com'
OR phone = '555-1234';
Many database planners struggle to efficiently use two separate indexes with an OR. The planner may fall back to a sequential scan or use a costly bitmap OR operation that hurts performance on large tables.
The Fix: UNION ALL
Split the OR into two separate queries and combine with UNION ALL:
SELECT id, name FROM users WHERE email = 'alice@example.com'
UNION ALL
SELECT id, name FROM users WHERE phone = '555-1234';
Each branch now gets its own efficient index lookup. UNION ALL is preferred over UNION here because we are not trying to eliminate duplicates at the database level (and in practice, the same user would rarely match both).
If you need deduplication:
SELECT DISTINCT id, name FROM (
SELECT id, name FROM users WHERE email = 'alice@example.com'
UNION ALL
SELECT id, name FROM users WHERE phone = '555-1234'
) combined;
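A quick check with Python's sqlite3 module that each UNION ALL branch really does get its own index lookup (schema and data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT);
    CREATE INDEX idx_users_email ON users (email);
    CREATE INDEX idx_users_phone ON users (phone);
    INSERT INTO users VALUES
        (1, 'alice', 'alice@example.com', '555-9999'),
        (2, 'bob',   'bob@example.com',   '555-1234');
""")

union_sql = """
    SELECT id, name FROM users WHERE email = 'alice@example.com'
    UNION ALL
    SELECT id, name FROM users WHERE phone = '555-1234'
"""

rows = conn.execute(union_sql).fetchall()
plan = " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + union_sql))

print(rows)
print(plan)  # two SEARCH steps, one per branch, each on its own index
```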
10. Long-Running Transactions and Lock Contention
Slow queries aren't always the root cause of database performance problems. Sometimes the culprit is a valid fast query being blocked by a long-running transaction holding locks somewhere else.
The Problem
Consider this sequence in two concurrent connections:
-- Connection A opens a transaction and does a large batch update
BEGIN;
UPDATE orders SET status = 'archived' WHERE created_at < '2020-01-01';
-- ... application pauses here, waiting on some external API call ...
-- Transaction is still open, holding row locks on ~500k rows
-- Connection B tries to read one of those rows
SELECT * FROM orders WHERE id = 12345;
-- Blocked. Waiting for Connection A's lock to be released.
This cascades across connection pools. The pool exhausts its connections waiting for locks. New requests queue up. The application appears to hang — even though the individual queries are not slow.
The Fix
1. Keep transactions as short as possible. Do your business logic before opening the transaction. Only the actual database mutations should be inside the transaction boundary.
-- Bad: external API call inside a transaction
BEGIN;
-- long running API call happens here in application code...
INSERT INTO audit_log ...;
UPDATE orders SET ...;
COMMIT;
-- Good: do the external work first, then open a short transaction
-- [application makes the API call outside any DB transaction]
BEGIN;
INSERT INTO audit_log ...;
UPDATE orders SET ...;
COMMIT;
2. Use NOWAIT or SKIP LOCKED for non-critical paths:
-- Skip locked rows instead of waiting — useful for job queues
SELECT id FROM job_queue
WHERE status = 'pending'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
3. Monitor for lock contention:
-- PostgreSQL: find long-running queries and see whether they are waiting on locks
SELECT
    pid,
    now() - query_start AS duration,
    wait_event_type,
    state,
    query
FROM pg_stat_activity
WHERE state != 'idle'
  AND now() - query_start > INTERVAL '5 seconds'
ORDER BY duration DESC;
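The blocking scenario at the top of this section can be reproduced in miniature with Python's sqlite3 module: one connection opens a write transaction and "pauses", and a second writer is stuck until the first commits. The file path and 100 ms timeout are arbitrary demo values.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None gives us manual transaction control
a = sqlite3.connect(path, isolation_level=None)
a.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
a.execute("INSERT INTO orders VALUES (1, 'new')")

a.execute("BEGIN IMMEDIATE")                   # connection A takes the write lock...
a.execute("UPDATE orders SET status = 'archived'")
# ...and "pauses" here without committing, as if waiting on an external API

b = sqlite3.connect(path, timeout=0.1, isolation_level=None)
try:
    b.execute("BEGIN IMMEDIATE")               # a second writer is blocked
    blocked = False
except sqlite3.OperationalError:               # "database is locked" after 100 ms
    blocked = True

a.execute("COMMIT")                            # releasing the lock unblocks writers
b.execute("BEGIN IMMEDIATE")
b.execute("ROLLBACK")

print(blocked)  # True
```

Scale the 100 ms wait up to a whole connection pool and you get the application-wide hang described above.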
11. Over-Indexing and Wrong Index Types
Indexes are not free. Every index you add must be updated on every INSERT, UPDATE, and DELETE. A table with 15 indexes can have slower writes than one with 3 well-chosen indexes delivering the same read coverage.
The Problem Areas
Indexing low-cardinality columns — a column like status with 3 possible values (active, inactive, deleted) is often a poor candidate for a standalone B-tree index. The planner may skip it entirely because a sequential scan is cheaper when a third of the table matches.
-- Likely ignored by the planner for status alone
CREATE INDEX idx_users_status ON users (status);
-- Much more useful: composite with a high-cardinality column
CREATE INDEX idx_users_status_created ON users (status, created_at DESC);
Using B-tree for full-text or JSON — B-tree indexes cannot index inside JSON documents or support full-text matching. Use GIN or GiST:
-- GIN index for JSONB containment queries
CREATE INDEX idx_events_metadata ON events USING GIN (metadata);
-- Query that uses it
SELECT * FROM events WHERE metadata @> '{"type": "purchase"}';
Indexing every column individually instead of covering indexes — If a query always fetches the same 3 columns with the same filter, a single covering index beats three separate indexes:
-- Instead of separate indexes on user_id, status, created_at
CREATE INDEX idx_orders_covering
ON orders (user_id, status, created_at DESC)
INCLUDE (total, currency);
-- INCLUDE columns are stored in the index's leaf pages but not in its
-- internal pages, so the index stays compact for navigation while still
-- serving these queries as index-only scans
12. Missing or Stale Table Statistics
The query planner makes decisions based on statistics about your data — how many rows a table has, how selective a column is, what the most common values are. When those statistics are stale, the planner makes bad decisions.
The Problem
After a large data migration or a bulk insert of millions of rows, the planner may still think your table has 10,000 rows when it actually has 10,000,000. It then picks a Nested Loop join instead of a Hash Join, or chooses a sequential scan over an index scan. The queries are correct but painfully slow.
The Fix
-- Update statistics for a specific table
ANALYZE orders;
-- Update statistics for the entire database
ANALYZE;
-- In PostgreSQL, also check for table bloat after large deletes/updates
VACUUM ANALYZE orders;
Set autovacuum to run more aggressively for tables with high churn. Check current statistics age:
-- PostgreSQL: check last analyze time
SELECT schemaname, tablename, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE tablename = 'orders';
If a table consistently has outdated statistics despite autovacuum, increase the analysis target for that table:
-- Collect more samples for better statistics on this table
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 0.01);
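SQLite has a much simpler analogue of the same machinery, which makes the effect easy to poke at from Python: before ANALYZE the planner has no per-index statistics at all; afterwards they appear in the sqlite_stat1 table. Schema invented for the demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("CREATE INDEX idx_orders_status ON orders (status)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("pending" if i % 10 else "failed",) for i in range(1000)],
)

# No statistics table exists yet
before = conn.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'"
).fetchone()[0]

conn.execute("ANALYZE")  # collects row counts and selectivity per index

after = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(before, after)
```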
Pulling It All Together
SQL performance issues rarely exist in isolation. A real-world slow query might combine several of these mistakes: a SELECT * that fetches unnecessary data, an OFFSET on a large result set, and stale statistics causing the planner to pick the wrong join strategy.
Here is a quick diagnostic checklist to run through whenever you're investigating a slow query:
- Run EXPLAIN (ANALYZE, BUFFERS) — identify sequential scans on large tables and high actual vs estimated row counts
- Check column selectivity — is the WHERE clause filter actually selective enough to warrant an index?
- Look at the column in the predicate — is it wrapped in a function or type-mismatched?
- Count query volume — is the total number of queries per request unexpectedly high? (N+1)
- Check pagination strategy — is OFFSET involved on a large result set?
- Inspect transaction duration — are long-running transactions blocking others?
- Review index coverage — does the index cover all columns in SELECT, WHERE, and ORDER BY?
- Check statistics freshness — when was the last ANALYZE run?
-- A useful quick check for the slowest queries in PostgreSQL
-- (requires pg_stat_statements extension)
SELECT
query,
calls,
mean_exec_time,
total_exec_time,
rows::numeric / calls AS avg_rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
Enabling pg_stat_statements in production is one of the best investments you can make for long-term database observability. It gives you a ranked list of your slowest queries over time — no guesswork required.
Final Thoughts
The patterns in this article are not exotic edge cases. They are the everyday habits that quietly accumulate into a slow database. Some of them — like SELECT * and OFFSET pagination — are so ingrained that they feel normal until you've experienced the pain they cause at scale.
The good news is that fixing most of them is straightforward once you know what to look for. A few well-placed indexes, a query plan inspection, and a shift to cursor-based pagination can cut query times by orders of magnitude without touching your schema structure.
Build the habit of running EXPLAIN ANALYZE before shipping a new query to production. Keep your transactions tight. Let the database do what it's designed to do — and get out of its way.