SQL Query Optimization: Indexes, Execution Plans & Performance (Complete Guide)
The Problem You're Solving
Your query takes 8 seconds to run on 1M rows:
-- ā Slow: Full table scan
SELECT * FROM users WHERE email = 'alice@example.com';
-- Scans all 1,000,000 rows ā
-- ā
Fast: Uses index
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'alice@example.com';
-- Scans only 1 row ā
-- Time: 8 seconds ā 2 milliseconds (4000x faster!)
That difference = timeouts and frustrated users vs instant responses.
SQL optimization appears in 25% of database interviews and directly impacts application scalability.
How Databases Work
B-Tree Index
Without index (full scan):
āāāāāāāāāāāāāāāāāāāāāāāāāāā
ā Row 1: alice@... ā ā Check
ā Row 2: bob@... ā ā Check
ā Row 3: charlie@... ā ā Check
ā ... (1M rows) ... ā ā Check each
āāāāāāāāāāāāāāāāāāāāāāāāāāā
Total: 1M comparisons
With index (B-Tree):
email index
ā
āāāāāā“āāāāā
ā ā
a-m n-z
ā ā
āāāā“āāā āāāā“āāā
a-c d-m n-s t-z
ā ā
alice@... (not here)
āā Found in 3 lookups!
Total: 3-4 lookups (vs 1M scans)
Basic Indexes
Single Column Index
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Check it exists
SHOW INDEX FROM users;
-- idx_users_email on email column
-- Query now fast
SELECT * FROM users WHERE email = 'alice@example.com';
-- Uses index: index seek (2-3ms)
Composite Index (Multiple Columns)
-- Create on multiple columns
CREATE INDEX idx_users_name_email ON users(name, email);
-- Good for:
SELECT * FROM users WHERE name = 'Alice' AND email = 'alice@example.com';
-- Uses index
-- Less good for:
SELECT * FROM users WHERE email = 'alice@example.com';
-- Index scan (worse than specific email index)
-- Bad for:
SELECT * FROM users WHERE name = 'Alice';
-- Uses index (name is first)
-- Bad for:
SELECT * FROM users WHERE email = 'alice@example.com' AND name = 'Alice';
-- Uses index but less efficient (column order matters)
Column Order Matters (Leftmost Prefix Rule)
-- Index on (category, price, date)
CREATE INDEX idx_products ON products(category, price, date);
-- ā
GOOD - Uses index
SELECT * FROM products WHERE category = 'Electronics';
SELECT * FROM products WHERE category = 'Electronics' AND price > 100;
SELECT * FROM products WHERE category = 'Electronics' AND price > 100 AND date > '2024-01-01';
-- ā ļø SLOWER - Doesn't use index efficiently
SELECT * FROM products WHERE price > 100 AND date > '2024-01-01';
-- (missing category in WHERE)
-- ā BAD - Doesn't use index
SELECT * FROM products WHERE date > '2024-01-01';
-- (missing category and price)
Execution Plans: Read What Database Does
MySQL EXPLAIN
-- See how query executes
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Output:
-- id | select_type | table | type | possible_keys | key | rows | Extra
-- 1 | SIMPLE | users | ref | idx_users_email | idx_users_email | 1 | NULL
-- Key columns to understand:
-- type: ref (good), index, range (ok), ALL (bad)
-- rows: 1 (good), 1000000 (bad)
-- Extra: NULL (good), using filesort (bad), using temporary (bad)
EXPLAIN Output Types (Best to Worst)
system - 1 row (system table)
const - 1 row (index on primary key)
eq_ref - Joins using primary key
ref - Index lookup returning multiple rows (GOOD)
range - Index range query (GOOD)
index - Full index scan
ALL - Full table scan (WORST)
Detect Bad Queries
-- ā ALL - Full table scan (slow on large tables)
EXPLAIN SELECT * FROM users;
-- type: ALL (scans all rows)
-- ā
FIX - Add WHERE clause or index
EXPLAIN SELECT * FROM users WHERE status = 'active';
-- type: ref (uses index on status)
Real-World Optimization Examples
Example 1: Slow Report Query
-- ā SLOW - 45 seconds
SELECT
users.name,
COUNT(orders.id) as order_count,
SUM(orders.total) as total_spent
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE orders.created_at > '2024-01-01'
GROUP BY users.id;
-- Uses full table scans
-- ā
OPTIMIZED - 200ms
-- Add indexes
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Better query (filter before join)
SELECT
u.name,
COUNT(o.id) as order_count,
SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id AND o.created_at > '2024-01-01'
GROUP BY u.id;
-- 45s ā 200ms (225x faster!)
Example 2: N+1 Query Problem
-- ā N+1 PROBLEM - 1001 queries
SELECT id, name FROM users LIMIT 100;
-- Returns 100 users
-- Then in application code:
for user in users:
orders = SELECT * FROM orders WHERE user_id = user.id;
# Now 100 more queries! Total: 101 queries
-- ā
FIX - Single JOIN
SELECT
u.id,
u.name,
o.order_id,
o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.id IN (1, 2, 3, ..., 100);
-- 1 query instead of 101
Example 3: Avoid Functions on Indexed Columns
-- ā SLOW - Index not used
SELECT * FROM users
WHERE YEAR(created_at) = 2024;
-- Function YEAR() prevents index use
-- ā
FAST - Index used
SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- Index can be used for range query
Example 4: Use Covering Index
-- ā Requires full row lookup
SELECT name, email FROM users WHERE email = 'alice@example.com';
-- Index idx_email lookup, then fetch full row
-- ā
Covering index (has all needed columns)
CREATE INDEX idx_users_email_name_email ON users(email, name);
SELECT name, email FROM users WHERE email = 'alice@example.com';
-- Index contains all data needed (no table lookup)
Query Optimization Techniques
1. Use Prepared Statements
-- ā Inefficient - Query compiled each time
SELECT * FROM users WHERE id = 123;
SELECT * FROM users WHERE id = 456;
SELECT * FROM users WHERE id = 789;
-- 3 separate compilations
-- ā
Efficient - Compiled once, reused
PREPARE stmt FROM 'SELECT * FROM users WHERE id = ?';
EXECUTE stmt USING 123;
EXECUTE stmt USING 456;
EXECUTE stmt USING 789;
-- Compiled once, executed 3x
2. Batch Queries
-- ā Slow - 1000 individual INSERTs
INSERT INTO logs (user_id, action) VALUES (1, 'login');
INSERT INTO logs (user_id, action) VALUES (2, 'logout');
-- ... 1000 times
-- ā
Fast - Batch INSERT
INSERT INTO logs (user_id, action) VALUES
(1, 'login'),
(2, 'logout'),
(3, 'click'),
-- ... all 1000 rows
(1000, 'logout');
-- 1000x faster
3. Denormalization (Strategic)
-- Normalized (requires join):
SELECT user.name, SUM(order.total)
FROM users
JOIN orders ON user.id = order.user_id
GROUP BY user.id;
-- Denormalized (cached calculation):
ALTER TABLE users ADD COLUMN total_spent DECIMAL(10,2);
-- Update total_spent when new order created
-- SELECT user.name, total_spent FROM users;
-- Faster (no join) but need to maintain total_spent
Common Mistakes
ā Mistake 1: Indexing Everything
-- WRONG - Index creates overhead (storage, writes slower)
CREATE INDEX idx_col1 ON table(col1);
CREATE INDEX idx_col2 ON table(col2);
Create INDEX idx_col3 ON table(col3);
-- Too many indexes slow down inserts/updates
-- CORRECT - Index only frequently searched columns
CREATE INDEX idx_email ON users(email); -- 100 searches/day
-- Don't index rarely-searched columns
ā Mistake 2: Not Using EXPLAIN
-- WRONG - Assume query is optimized
SELECT * FROM huge_table WHERE name LIKE '%alice%';
-- Might be slow (full scan) but don't know
-- CORRECT - Always check
EXPLAIN SELECT * FROM huge_table WHERE name LIKE '%alice%';
-- Output shows it's full scan ā add index
CREATE INDEX idx_name ON huge_table(name);
ā Mistake 3: Using SELECT * Instead of Specific Columns
-- WRONG - Gets all columns even if don't need them
SELECT * FROM huge_table;
-- Transfers unnecessary data, slower network
-- CORRECT - Select only needed columns
SELECT id, name FROM huge_table;
-- Faster network transfer, smaller result set
FAQ: SQL Optimization
Q1: When should I add an index?
A: When query filtering on column is slow.
-- Slow query?
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Output: type ALL, rows 1000000
-- Add index
CREATE INDEX idx_users_email ON users(email);
-- Now fast
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Output: type ref, rows 1
Q2: What's the impact of indexes on writes?
A: Inserts/updates slower (must update index too).
No index:
INSERT: 1ms
1 index:
INSERT: 1.5ms (update index too)
10 indexes:
INSERT: 5ms (update all 10 indexes)
Balance: Fast reads vs fast writes.
Q3: Interview Question: Optimize this query.
A: Here's methodology:
-- SLOW: 30 seconds
SELECT u.name, COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
-- Analysis:
-- 1. Check EXPLAIN (it shows ALL scan on orders)
-- 2. Add index on orders.user_id
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- 3. Add index on users.created_at
CREATE INDEX idx_users_created_at ON users(created_at);
-- 4. Test again: now 500ms
-- 5. If still slow, add composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Final: 500ms ā 50ms (10x improvement)
Interview insight: "I'd start with EXPLAIN, identify bottlenecks, add indexes to slow columns, test, then denormalize if needed."
Q4: What's a good query?
A: Less rows examined than returned.
-- BAD
EXPLAIN SELECT * FROM users WHERE active = 1;
-- rows examined: 1000000, rows returned: 100
-- (scanning 10,000x more rows than needed)
-- GOOD
CREATE INDEX idx_users_active ON users(active);
-- rows examined: 100, rows returned: 100
-- (scanning exactly what's needed)
Conclusion
SQL optimization fundamentals:
- Understand EXPLAIN - Read execution plans
- Index correctly - Leftmost prefix rule
- Avoid pitfalls - Functions, N+1 queries, SELECT *
- Test changes - Benchmark before/after
- Balance trade-offs - Read speed vs write speed
Master these and you'll write queries that scale to billions of rows.