There’s a ritual that happens in every engineering team eventually. Someone notices an API endpoint is slow. Someone else looks at the database and says “we need an index.” They add an index on what seems like the right column, deploy it, and… the query is still slow. Or it’s faster for that one query but now three other queries have mysteriously gotten worse.
This happens because adding an index without reading the query plan is like prescribing medicine without diagnosing the patient. Sometimes the problem is a missing index. But just as often, the problem is stale statistics, a badly written query, a join that the planner is executing in the wrong order, or an ORM that’s generating SQL you’d be embarrassed to write by hand.
PostgreSQL gives you the exact diagnostic tool to figure this out. It’s called EXPLAIN ANALYZE, and if you’re not using it every time you investigate a slow query, you’re guessing. Let’s stop guessing.
Step 1: Find the Queries That Actually Matter
Before you optimize anything, you need to know what to optimize. The slowest query isn’t necessarily the one that matters most. A query that takes 2 seconds but runs once a day isn’t as urgent as a query that takes 50ms but runs 100,000 times a day. The second one consumes way more server time overall.
Enable pg_stat_statements — this extension tracks execution statistics for every query that runs on your server.
Add it to postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
Restart PostgreSQL:
sudo systemctl restart postgresql
Create the extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Now find the queries that consume the most total time:
SELECT
    substring(query, 1, 120) AS query_preview,
    calls,
    round(total_exec_time::numeric, 2) AS total_ms,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
This shows your top 10 resource consumers. The total_exec_time column is what matters — it’s the cumulative time spent on that query across all calls. A query with 5ms average execution time and 2 million calls has consumed 10,000 seconds of total server time. That matters a lot more than the 800ms query that runs 50 times a day.
For finding queries that are slow per-execution (the ones your users actually feel):
SELECT
    substring(query, 1, 120) AS query_preview,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    round(max_exec_time::numeric, 2) AS max_ms
FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC
LIMIT 10;
This catches the individually slow queries — anything averaging over 100ms. The max_exec_time column is useful too; if a query averages 50ms but maxes at 12,000ms, it has an intermittent performance problem likely related to lock contention, resource exhaustion, or cold cache.
You can also enable slow query logging directly in PostgreSQL as a safety net:
# In postgresql.conf
log_min_duration_statement = 1000 # Log any query taking > 1 second
This writes slow queries directly to the PostgreSQL log file, which is useful if pg_stat_statements isn’t available or if you want to see the exact parameters used in slow queries.
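If you'd rather not edit postgresql.conf by hand, the same setting can be changed at runtime. A sketch, assuming superuser access (log_min_duration_statement only needs a config reload, not a restart):

```sql
-- Persist the setting without editing postgresql.conf directly
ALTER SYSTEM SET log_min_duration_statement = '1000ms';

-- Reload the configuration so the change takes effect immediately
SELECT pg_reload_conf();
```

Setting it back to -1 disables slow-query logging again.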
Step 2: Read the Execution Plan (This Is Where Understanding Begins)
You’ve identified a slow query. Now run it through EXPLAIN ANALYZE:
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.status = 'completed'
  AND o.created_at > '2025-01-01'
ORDER BY o.created_at DESC
LIMIT 20;
The BUFFERS option adds information about how many disk pages were read, which helps distinguish between I/O problems and CPU problems.
Here’s what an output might look like:
Limit  (cost=15234.52..15234.57 rows=20 width=52) (actual time=892.45..892.48 rows=20 loops=1)
  ->  Sort  (cost=15234.52..15456.23 rows=88682 width=52) (actual time=892.44..892.46 rows=20 loops=1)
        Sort Key: o.created_at DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Hash Join  (cost=12.50..13012.34 rows=88682 width=52) (actual time=0.82..845.20 rows=89542 loops=1)
              Hash Cond: (o.customer_id = c.id)
              ->  Seq Scan on orders o  (cost=0.00..11842.00 rows=88682 width=44) (actual time=0.04..780.32 rows=89542 loops=1)
                    Filter: ((status = 'completed') AND (created_at > '2025-01-01'))
                    Rows Removed by Filter: 410458
              ->  Hash  (cost=10.00..10.00 rows=200 width=12) (actual time=0.42..0.42 rows=200 loops=1)
                    ->  Seq Scan on customers c  (cost=0.00..10.00 rows=200 width=12) (actual time=0.01..0.18 rows=200 loops=1)
Planning Time: 0.38 ms
Execution Time: 892.72 ms
Now let’s read this like a professional:
The bottleneck is obvious. The Seq Scan on orders o line shows actual time=0.04..780.32. That’s 780ms spent reading the entire orders table sequentially — all 500,000 rows — to find the 89,542 that match the filter. That’s roughly 87% of the total execution time in one node.
The estimated vs actual rows match. rows=88682 estimated, rows=89542 actual. That’s close enough — the statistics are fine. The planner isn’t making a bad decision because of stale stats; it’s making the only decision it can because there’s no suitable index.
The Seq Scan on customers is fine. It takes 0.18ms because the table has only 200 rows. For tables of at most a few thousand rows, a sequential scan is usually the fastest access path. Don’t index tables with 200 rows — it’s pointless overhead.
The sort is cheap. top-N heapsort with 27kB memory means PostgreSQL used an efficient algorithm for the LIMIT 20 ORDER BY — it found the top 20 without sorting all 89,542 matching rows.
The diagnosis: this query needs a composite index on orders(status, created_at) to avoid the sequential scan.
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
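One caveat for production systems: a plain CREATE INDEX holds a lock that blocks writes to the table for the duration of the build. The CONCURRENTLY variant is slower to build, cannot run inside a transaction block, and leaves an invalid index behind if it fails, but it lets writes continue:

```sql
-- Build the index without blocking concurrent INSERT/UPDATE/DELETE
CREATE INDEX CONCURRENTLY idx_orders_status_created
    ON orders (status, created_at);
```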
Run EXPLAIN ANALYZE again after creating the index:
->  Index Scan using idx_orders_status_created on orders o
        (cost=0.42..1823.56 rows=88682 width=44)
        (actual time=0.03..45.21 rows=89542 loops=1)
      Index Cond: ((status = 'completed') AND (created_at > '2025-01-01'))
780ms → 45ms. The sequential scan is gone, replaced by an index scan that reads only the matching rows.
Step 3: When Stale Statistics Are the Real Problem
Sometimes the execution plan shows something strange: the estimated row count is wildly different from the actual row count.
-> Seq Scan on events (cost=0.00..25.00 rows=5 width=32) (actual time=0.04..312.45 rows=147823 loops=1)
The planner estimated 5 rows. The actual result was 147,823. That’s not a rounding error — the planner is working with completely wrong statistics and making terrible decisions as a result.
When estimated rows are much lower than actual rows, the planner tends to choose nested loop joins (good for small sets, terrible for large ones) and avoids using hash joins or merge joins that would be much more efficient. The entire execution plan downstream of the bad estimate is suboptimal.
Fix it with ANALYZE:
ANALYZE events;
This collects fresh statistics about the table’s data distribution — how many rows, how many distinct values per column, most common values, histogram boundaries. After running ANALYZE, the planner has accurate information and can make better decisions.
If specific columns have unusual distributions (lots of NULLs, extreme skew, or a huge number of distinct values), increase the statistics target for those columns:
ALTER TABLE events ALTER COLUMN event_type SET STATISTICS 1000;
ANALYZE events;
The default statistics target is 100, which means PostgreSQL samples 100 × 300 = 30,000 rows to build histograms. Increasing it to 1000 means 300,000 rows are sampled, giving more accurate statistics for high-cardinality or skewed columns.
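To see what the planner currently believes about a column, you can inspect the pg_stats view. A diagnostic sketch, reusing the events table and event_type column from the running example:

```sql
-- null_frac: fraction of NULLs; n_distinct: distinct-value estimate
-- (a negative n_distinct means "this fraction of the row count")
SELECT attname, null_frac, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'events'
  AND attname = 'event_type';
```

If n_distinct or most_common_vals looks nothing like what you know is in the column, the statistics target for that column is a good candidate for increasing.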
Check if autovacuum is keeping up:
SELECT
    relname,
    n_live_tup,
    n_dead_tup,
    last_autoanalyze,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
If last_autoanalyze was a long time ago and n_dead_tup is high, autovacuum isn’t keeping up. For high-churn tables, tune it:
ALTER TABLE events SET (
    autovacuum_analyze_scale_factor = 0.02,
    autovacuum_vacuum_scale_factor = 0.05
);
This triggers ANALYZE after 2% of the table changes (instead of the default 10%) and VACUUM after 5%.
Step 4: Query Patterns That No Index Can Fix
Some queries are slow not because of missing indexes but because the SQL itself prevents efficient execution. No amount of indexing fixes a fundamentally bad query pattern.
Functions on indexed columns:
-- PostgreSQL CANNOT use an index on created_at here
SELECT * FROM orders WHERE EXTRACT(YEAR FROM created_at) = 2026;
The function EXTRACT() is evaluated on every single row. The index on created_at exists but is useless because PostgreSQL would need an index on EXTRACT(YEAR FROM created_at) — which doesn’t exist.
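If you genuinely need to filter on the computed value, PostgreSQL does support expression indexes, though the range rewrite below is usually the simpler fix. A sketch with a hypothetical index name, assuming created_at is a plain timestamp (with timestamptz the expression is not immutable, because the result depends on the session time zone, and the statement fails):

```sql
-- Index the expression itself; a WHERE clause written with the
-- exact same expression can then use this index
CREATE INDEX idx_orders_created_year
    ON orders (EXTRACT(YEAR FROM created_at));
```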
Rewrite as a range:
-- This uses the index on created_at
SELECT * FROM orders
WHERE created_at >= '2026-01-01'
  AND created_at < '2027-01-01';
Same results. But now PostgreSQL does a quick index range scan instead of reading the entire table.
OFFSET pagination:
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
This looks efficient — “give me 20 rows starting at position 10,000.” But PostgreSQL must still produce all 10,020 rows in sorted order and throw away the first 10,000. The deeper you paginate, the slower it gets: at OFFSET 100,000, it materializes 100,020 rows to return 20.
Use keyset pagination instead:
SELECT * FROM orders
WHERE created_at < '2026-03-15T10:30:00Z'
ORDER BY created_at DESC
LIMIT 20;
The WHERE clause on created_at replaces the OFFSET. The query starts reading from the right position in the index and returns 20 rows immediately, regardless of which “page” you’re on. Page 1 and page 5,000 take the same amount of time.
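One caveat: if created_at is not unique, rows that share a timestamp can be skipped or repeated across page boundaries. A common fix is to add the primary key as a tie-breaker. A sketch, assuming an id primary key column (the row-value comparison is standard PostgreSQL and can use a b-tree index on the same two columns):

```sql
-- (created_at, id) < (x, y) compares lexicographically, so ties on
-- created_at are broken deterministically by id
SELECT id, total, created_at
FROM orders
WHERE (created_at, id) < ('2026-03-15T10:30:00Z', 987654)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```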
SELECT * through an ORM:
SELECT * FROM orders
JOIN customers ON customers.id = orders.customer_id
JOIN order_items ON order_items.order_id = orders.id
JOIN products ON products.id = order_items.product_id;
ORMs love eager loading, and eager loading loves SELECT * with multiple JOINs. The result set explodes — if an order has 5 items, you get 5 rows per order, each containing every column from all four tables. Most of that data is duplicated and never used.
The fix depends on what you actually need. If you only need order totals and customer names:
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at > '2026-01-01';
Selecting only the columns you need means smaller result sets, less data transfer, and potentially index-only scans (where PostgreSQL can answer the query entirely from the index without touching the table at all).
Correlated subqueries:
SELECT *,
(SELECT COUNT(*) FROM order_items WHERE order_id = orders.id) AS item_count
FROM orders
WHERE status = 'completed';
That subquery runs once for every row in the outer query. If the outer query returns 50,000 rows, the subquery executes 50,000 times. Replace it with a JOIN:
SELECT o.*, COUNT(oi.id) AS item_count
FROM orders o
LEFT JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'completed'
GROUP BY o.id;
One query, one pass. The planner can use a hash join and process everything in bulk.
Step 5: Index Strategy — When, What, and How Many
After you’ve confirmed through EXPLAIN ANALYZE that a missing index is genuinely the bottleneck, be strategic about what you create.
Composite indexes — column order matters:
-- Good: status is the equality filter, created_at is the range filter
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
-- Less useful: reversed order doesn't help if you're filtering by status first
CREATE INDEX idx_orders_created_status ON orders (created_at, status);
The general rule: equality columns first, range columns second. PostgreSQL can use a composite index for a prefix — an index on (status, created_at) helps queries filtering on status alone, but an index on (created_at, status) doesn’t help queries filtering on status alone.
Partial indexes — index only what matters:
CREATE INDEX idx_orders_pending ON orders (customer_id, created_at)
WHERE status = 'pending';
This index is smaller than a full index because it only includes rows where status = 'pending'. If your query always filters for pending orders, this index is both smaller (faster to scan, less memory) and more precise than a full index.
Covering indexes (index-only scans):
CREATE INDEX idx_orders_covering ON orders (status, created_at)
INCLUDE (id, total, customer_id);
The INCLUDE columns are stored in the index but not used for searching. If your query only selects id, total, and customer_id, PostgreSQL can answer it entirely from the index without reading the table at all. This is called an index-only scan and it’s the fastest possible execution path.
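For example, a query shaped like this could be satisfied entirely from the idx_orders_covering index sketched above, since every referenced column lives in the index:

```sql
-- status and created_at are key columns; id, total, customer_id
-- are carried as INCLUDEd payload
SELECT id, total, customer_id
FROM orders
WHERE status = 'completed'
  AND created_at > '2026-01-01';
```

EXPLAIN should then report an Index Only Scan. Note that index-only scans also depend on the visibility map being current: a high "Heap Fetches" count in the plan means VACUUM hasn't run recently enough and PostgreSQL is falling back to the table for visibility checks.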
Don’t over-index. Every index on a table slows down writes: each INSERT must add an entry to every index, each UPDATE of an indexed column must update the affected indexes, and the dead index entries left behind by DELETEs are extra cleanup work for VACUUM. A table with 15 indexes performs fifteen rounds of index maintenance on every insert.
Check for unused indexes periodically:
SELECT
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%pkey%'
ORDER BY pg_relation_size(indexrelid) DESC;
These are indexes that have never been used since the last statistics reset. If they’ve been unused for months, drop them. They’re consuming disk space, memory, and write performance for zero benefit.
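When you do drop one, the CONCURRENTLY variant avoids blocking queries on the table while the index is removed. A sketch with a hypothetical index name (like its CREATE counterpart, it cannot run inside a transaction block):

```sql
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_legacy_lookup;
```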
The Complete Slow Query Diagnostic Sequence
When a query is slow, run through this:
1. Find it — use pg_stat_statements to identify the query by total time or average time.
2. Read the plan — run EXPLAIN (ANALYZE, BUFFERS) on the query.
3. Check estimated vs actual rows — if they’re wildly different, run ANALYZE on the tables involved.
4. Look for Seq Scans on large tables — if the filter is selective (returns < 15% of rows), an index is likely needed.
5. Check the query pattern — look for functions on indexed columns, OFFSET pagination, SELECT *, and correlated subqueries.
6. Add indexes strategically — composite, partial, covering — based on what the plan tells you.
7. Verify the fix — run EXPLAIN ANALYZE again and confirm the plan improved.
Don’t skip step 3. I’ve seen teams spend days tuning queries that were slow because of stale statistics. A single ANALYZE command fixed the whole thing in under a second.
And don’t skip step 7. I’ve seen people create indexes that PostgreSQL ignores because the query pattern doesn’t match. You haven’t fixed anything until EXPLAIN ANALYZE confirms it.
Database performance is a discipline, not a guessing game. The tools exist. The plans are readable. The fixes are usually straightforward once you know where to look. The hard part isn’t the fix — it’s convincing yourself to actually look at the plan instead of throwing indexes at the wall and hoping one sticks.