SQL Query Optimization: Indexes, Execution Plans & Performance (Complete Guide)

The Problem You're Solving

Your query takes 8 seconds to run on 1M rows:

-- ❌ Slow: Full table scan
SELECT * FROM users WHERE email = 'alice@example.com';
-- Scans all 1,000,000 rows ✗

-- ✅ Fast: Uses index
CREATE INDEX idx_users_email ON users(email);
SELECT * FROM users WHERE email = 'alice@example.com';
-- Scans only 1 row ✓

-- Time: 8 seconds → 2 milliseconds (4000x faster!)

That difference = timeouts and frustrated users vs instant responses.

SQL optimization appears in 25% of database interviews and directly impacts application scalability.

How Databases Work

B-Tree Index

Without index (full scan):
┌─────────────────────────┐
│ Row 1: alice@...        │ ← Check
│ Row 2: bob@...          │ ← Check
│ Row 3: charlie@...      │ ← Check
│ ... (1M rows) ...       │ ← Check each
└─────────────────────────┘
Total: 1M comparisons

With index (B-Tree):
         email index
             │
        ┌────┴────┐
        │          │
      a-m        n-z
       │          │
    ┌──┴──┐    ┌──┴──┐
   a-c  d-m   n-s  t-z
    │         │
alice@...  (not here)
           └─ Found in 3 lookups!

Total: 3-4 lookups (vs 1M scans)

Basic Indexes

Single Column Index

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- Check it exists
SHOW INDEX FROM users;
-- idx_users_email on email column

-- Query now fast
SELECT * FROM users WHERE email = 'alice@example.com';
-- Uses index: index seek (2-3ms)

Composite Index (Multiple Columns)

-- Create on multiple columns
CREATE INDEX idx_users_name_email ON users(name, email);

-- Good for:
SELECT * FROM users WHERE name = 'Alice' AND email = 'alice@example.com';
-- Uses index

-- Less good for:
SELECT * FROM users WHERE email = 'alice@example.com';
-- Index scan (worse than specific email index)

-- Bad for:
SELECT * FROM users WHERE name = 'Alice';
-- Uses index (name is first)

-- Bad for:
SELECT * FROM users WHERE email = 'alice@example.com' AND name = 'Alice';
-- Uses index but less efficient (column order matters)

Column Order Matters (Leftmost Prefix Rule)

-- Index on (category, price, date)
CREATE INDEX idx_products ON products(category, price, date);

-- ✅ GOOD - Uses index
SELECT * FROM products WHERE category = 'Electronics';
SELECT * FROM products WHERE category = 'Electronics' AND price > 100;
SELECT * FROM products WHERE category = 'Electronics' AND price > 100 AND date > '2024-01-01';

-- ⚠️ SLOWER - Doesn't use index efficiently
SELECT * FROM products WHERE price > 100 AND date > '2024-01-01';
-- (missing category in WHERE)

-- ❌ BAD - Doesn't use index
SELECT * FROM products WHERE date > '2024-01-01';
-- (missing category and price)

Execution Plans: Read What Database Does

MySQL EXPLAIN

-- See how query executes
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

-- Output:
-- id | select_type | table | type   | possible_keys      | key                | rows | Extra
-- 1  | SIMPLE      | users | ref    | idx_users_email    | idx_users_email    | 1    | NULL

-- Key columns to understand:
-- type: ref (good), index, range (ok), ALL (bad)
-- rows: 1 (good), 1000000 (bad)
-- Extra: NULL (good), using filesort (bad), using temporary (bad)

EXPLAIN Output Types (Best to Worst)

system    - 1 row (system table)
const     - 1 row (index on primary key)
eq_ref    - Joins using primary key
ref       - Index lookup returning multiple rows (GOOD)
range     - Index range query (GOOD)
index     - Full index scan
ALL       - Full table scan (WORST)

Detect Bad Queries

-- ❌ ALL - Full table scan (slow on large tables)
EXPLAIN SELECT * FROM users;
-- type: ALL (scans all rows)

-- ✅ FIX - Add WHERE clause or index
EXPLAIN SELECT * FROM users WHERE status = 'active';
-- type: ref (uses index on status)

Real-World Optimization Examples

Example 1: Slow Report Query

-- ❌ SLOW - 45 seconds
SELECT
  users.name,
  COUNT(orders.id) as order_count,
  SUM(orders.total) as total_spent
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE orders.created_at > '2024-01-01'
GROUP BY users.id;
-- Uses full table scans

-- ✅ OPTIMIZED - 200ms
-- Add indexes
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Better query (filter before join)
SELECT
  u.name,
  COUNT(o.id) as order_count,
  SUM(o.total) as total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id AND o.created_at > '2024-01-01'
GROUP BY u.id;
-- 45s → 200ms (225x faster!)

Example 2: N+1 Query Problem

-- ❌ N+1 PROBLEM - 1001 queries
SELECT id, name FROM users LIMIT 100;
-- Returns 100 users

-- Then in application code:
for user in users:
    orders = SELECT * FROM orders WHERE user_id = user.id;
    # Now 100 more queries! Total: 101 queries

-- ✅ FIX - Single JOIN
SELECT
  u.id,
  u.name,
  o.order_id,
  o.total
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.id IN (1, 2, 3, ..., 100);
-- 1 query instead of 101

Example 3: Avoid Functions on Indexed Columns

-- ❌ SLOW - Index not used
SELECT * FROM users
WHERE YEAR(created_at) = 2024;
-- Function YEAR() prevents index use

-- ✅ FAST - Index used
SELECT * FROM users
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';
-- Index can be used for range query

Example 4: Use Covering Index

-- ❌ Requires full row lookup
SELECT name, email FROM users WHERE email = 'alice@example.com';
-- Index idx_email lookup, then fetch full row

-- ✅ Covering index (has all needed columns)
CREATE INDEX idx_users_email_name_email ON users(email, name);
SELECT name, email FROM users WHERE email = 'alice@example.com';
-- Index contains all data needed (no table lookup)

Query Optimization Techniques

1. Use Prepared Statements

-- ❌ Inefficient - Query compiled each time
SELECT * FROM users WHERE id = 123;
SELECT * FROM users WHERE id = 456;
SELECT * FROM users WHERE id = 789;
-- 3 separate compilations

-- ✅ Efficient - Compiled once, reused
PREPARE stmt FROM 'SELECT * FROM users WHERE id = ?';
EXECUTE stmt USING 123;
EXECUTE stmt USING 456;
EXECUTE stmt USING 789;
-- Compiled once, executed 3x

2. Batch Queries

-- ❌ Slow - 1000 individual INSERTs
INSERT INTO logs (user_id, action) VALUES (1, 'login');
INSERT INTO logs (user_id, action) VALUES (2, 'logout');
-- ... 1000 times

-- ✅ Fast - Batch INSERT
INSERT INTO logs (user_id, action) VALUES
  (1, 'login'),
  (2, 'logout'),
  (3, 'click'),
  -- ... all 1000 rows
  (1000, 'logout');
-- 1000x faster

3. Denormalization (Strategic)

-- Normalized (requires join):
SELECT user.name, SUM(order.total)
FROM users
JOIN orders ON user.id = order.user_id
GROUP BY user.id;

-- Denormalized (cached calculation):
ALTER TABLE users ADD COLUMN total_spent DECIMAL(10,2);
-- Update total_spent when new order created
-- SELECT user.name, total_spent FROM users;
-- Faster (no join) but need to maintain total_spent

Common Mistakes

❌ Mistake 1: Indexing Everything

-- WRONG - Index creates overhead (storage, writes slower)
CREATE INDEX idx_col1 ON table(col1);
CREATE INDEX idx_col2 ON table(col2);
Create INDEX idx_col3 ON table(col3);
-- Too many indexes slow down inserts/updates

-- CORRECT - Index only frequently searched columns
CREATE INDEX idx_email ON users(email);  -- 100 searches/day
-- Don't index rarely-searched columns

❌ Mistake 2: Not Using EXPLAIN

-- WRONG - Assume query is optimized
SELECT * FROM huge_table WHERE name LIKE '%alice%';
-- Might be slow (full scan) but don't know

-- CORRECT - Always check
EXPLAIN SELECT * FROM huge_table WHERE name LIKE '%alice%';
-- Output shows it's full scan → add index
CREATE INDEX idx_name ON huge_table(name);

❌ Mistake 3: Using SELECT * Instead of Specific Columns

-- WRONG - Gets all columns even if don't need them
SELECT * FROM huge_table;
-- Transfers unnecessary data, slower network

-- CORRECT - Select only needed columns
SELECT id, name FROM huge_table;
-- Faster network transfer, smaller result set

FAQ: SQL Optimization

Q1: When should I add an index?

A: When query filtering on column is slow.

-- Slow query?
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Output: type ALL, rows 1000000

-- Add index
CREATE INDEX idx_users_email ON users(email);

-- Now fast
EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';
-- Output: type ref, rows 1

Q2: What's the impact of indexes on writes?

A: Inserts/updates slower (must update index too).

No index:
  INSERT: 1ms

1 index:
  INSERT: 1.5ms (update index too)

10 indexes:
  INSERT: 5ms (update all 10 indexes)

Balance: Fast reads vs fast writes.

Q3: Interview Question: Optimize this query.

A: Here's methodology:

-- SLOW: 30 seconds
SELECT u.name, COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Analysis:
-- 1. Check EXPLAIN (it shows ALL scan on orders)
-- 2. Add index on orders.user_id
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- 3. Add index on users.created_at
CREATE INDEX idx_users_created_at ON users(created_at);

-- 4. Test again: now 500ms
-- 5. If still slow, add composite index
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Final: 500ms → 50ms (10x improvement)

Interview insight: "I'd start with EXPLAIN, identify bottlenecks, add indexes to slow columns, test, then denormalize if needed."

Q4: What's a good query?

A: Less rows examined than returned.

-- BAD
EXPLAIN SELECT * FROM users WHERE active = 1;
-- rows examined: 1000000, rows returned: 100
-- (scanning 10,000x more rows than needed)

-- GOOD
CREATE INDEX idx_users_active ON users(active);
-- rows examined: 100, rows returned: 100
-- (scanning exactly what's needed)

Conclusion

SQL optimization fundamentals:

Understand EXPLAIN - Read execution plans
Index correctly - Leftmost prefix rule
Avoid pitfalls - Functions, N+1 queries, SELECT *
Test changes - Benchmark before/after
Balance trade-offs - Read speed vs write speed

Master these and you'll write queries that scale to billions of rows.

SQL Query Optimization: Indexes, Execution Plans & Performance (Complete Guide)

SQL Query Optimization: Indexes, Execution Plans & Performance (Complete Guide)

The Problem You're Solving

How Databases Work

B-Tree Index

Basic Indexes

Single Column Index

Composite Index (Multiple Columns)

Column Order Matters (Leftmost Prefix Rule)

Execution Plans: Read What Database Does

MySQL EXPLAIN

EXPLAIN Output Types (Best to Worst)

Detect Bad Queries

Real-World Optimization Examples

Example 1: Slow Report Query

Example 2: N+1 Query Problem

Example 3: Avoid Functions on Indexed Columns

Example 4: Use Covering Index

Query Optimization Techniques

1. Use Prepared Statements

2. Batch Queries

3. Denormalization (Strategic)

Common Mistakes

❌ Mistake 1: Indexing Everything

❌ Mistake 2: Not Using EXPLAIN

❌ Mistake 3: Using SELECT * Instead of Specific Columns

FAQ: SQL Optimization

Q1: When should I add an index?

Q2: What's the impact of indexes on writes?

Q3: Interview Question: Optimize this query.

Q4: What's a good query?

Conclusion

Learn More

Tags

Related Articles

Git Advanced Workflows: Rebasing, Cherry-Pick & Flow (Complete Guide)

JavaScript Promises vs Async/Await: Master Async Code (Complete Guide)

Python Type Hints & Pydantic: Runtime Validation (Complete Guide)

Related Articles

💻
💻Tech Skills
Git Advanced Workflows: Rebasing, Cherry-Pick & Flow (Complete Guide)
Master Git rebase, cherry-pick, git flow, maintain clean history, prevent merge disasters in production.
March 7, 20268 min read
#Git#Version Control#DevOps
March 7, 20268 min read

💻
💻Tech Skills
JavaScript Promises vs Async/Await: Master Async Code (Complete Guide)
Master JavaScript promises and async/await, parallelize operations, speed up code 3x, handle errors correctly.
March 7, 20267 min read
#JavaScript#Async#Promises
March 7, 20267 min read

💻
💻Tech Skills
Python Type Hints & Pydantic: Runtime Validation (Complete Guide)
Master Python type hints, Pydantic validation, catch runtime errors at input, prevent 70% of production bugs.
March 7, 20267 min read
#Python#Type Safety#Pydantic
March 7, 20267 min read