Pandas pd.concat(): Merge & Combine DataFrames (Complete Guide)

pd.concat() is one of the most essential Pandas functions for combining multiple DataFrames. Whether you're concatenating rows, columns, or handling indices, this guide covers everything you need to know.

What is pd.concat()?

pd.concat() combines multiple Pandas DataFrames or Series into a single DataFrame.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Combine them
result = pd.concat([df1, df2])
print(result)
#    A  B
# 0  1  3
# 1  2  4
# 0  5  7
# 1  6  8

Notice the indices repeat (0, 1, 0, 1). You often need to reset them.

Basic Syntax

pd.concat(objs, axis=0, ignore_index=False, keys=None, sort=False)

Parameters:

objs: List of DataFrames/Series to combine
axis: 0 (rows) or 1 (columns)
ignore_index: If True, create new 0-1-2... index
keys: Add hierarchical index levels
sort: Sort column names (default False)

Combining Rows (axis=0) - Most Common

Basic Row Concatenation

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['NYC', 'LA']
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Age': [35, 28],
    'City': ['Chicago', 'Houston']
})

# Combine by rows
result = pd.concat([df1, df2], axis=0)
print(result)
#      Name  Age     City
# 0   Alice   25      NYC
# 1     Bob   30       LA
# 0 Charlie   35  Chicago
# 1   David   28  Houston

Reset Index for Clean Results

result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
#      Name  Age     City
# 0   Alice   25      NYC
# 1     Bob   30       LA
# 2 Charlie   35  Chicago
# 3   David   28  Houston

Much cleaner! ignore_index=True creates a new 0-1-2-3 index.

Combining Rows with Different Columns

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})  # Note: has C, not B

result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
#    A    B  C
# 0  1  3.0  NaN
# 1  2  4.0  NaN
# 2  5  NaN  7.0
# 3  6  NaN  8.0

Missing columns are filled with NaN.

Combining Columns (axis=1)

Basic Column Concatenation

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Salary': [50000, 60000],
    'Department': ['Sales', 'Engineering']
})

# Combine by columns (side by side)
result = pd.concat([df1, df2], axis=1)
print(result)
#    Name  Age  Salary Department
# 0 Alice   25   50000      Sales
# 1   Bob   30   60000 Engineering

This is equivalent to adding new columns to df1.

Column Concatenation with Different Indices

df1 = pd.DataFrame({'A': [1, 2]}, index=['row1', 'row2'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['row2', 'row3'])

result = pd.concat([df1, df2], axis=1)
print(result)
#       A    B
# row1  1.0  NaN
# row2  2.0  3.0
# row3  NaN  4.0

Missing index combinations are filled with NaN.

pd.concat() vs merge() vs join()

Method	Use Case	Axis	Index
concat()	Simple row/column stacking	Both	Preserves or resets
merge()	SQL-like joins (inner, outer)	1 (columns)	Match on keys
join()	Join by index	1 (columns)	Always by index

When to Use Each

Use concat() for:

Stacking multiple DataFrames (same structure)
Combining rows from different sources
Simple column addition

Use merge() for:

Joining on specific columns
Inner/outer/left/right joins
When you need SQL-like behavior

Use join() for:

Quick joins by index (no key matching)
Cleaner syntax when both DataFrames have matching indices

Example Comparison

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
}, index=[1, 2])

df2 = pd.DataFrame({
    'Salary': [50000, 60000]
}, index=[1, 2])

# All three give the same result:
print(pd.concat([df1, df2], axis=1))
print(df1.merge(df2, left_index=True, right_index=True))
print(df1.join(df2))

#    Name  Age  Salary
# 1 Alice   25   50000
# 2   Bob   30   60000

All work, but concat() is simplest for this case.

Real-World Example: Combining CSV Files

import pandas as pd
import glob

# Read all CSV files from a folder
csv_files = glob.glob('data/*.csv')
dfs = [pd.read_csv(file) for file in csv_files]

# Combine all into one DataFrame
result = pd.concat(dfs, axis=0, ignore_index=True)

# Save combined data
result.to_csv('all_data_combined.csv', index=False)

This is perfect for processing data from multiple sources.

Handling Indices: keys Parameter

df1 = pd.DataFrame({'data': [1, 2]})
df2 = pd.DataFrame({'data': [3, 4]})
df3 = pd.DataFrame({'data': [5, 6]})

# Add hierarchical index showing source
result = pd.concat(
    [df1, df2, df3],
    keys=['Jan', 'Feb', 'Mar']
)
print(result)
#         data
# Jan  0     1
#      1     2
# Feb  0     3
#      1     4
# Mar  0     5
#      1     6

Useful for tracking where each row came from.

Performance: concat() vs append()

Warning: DataFrame.append() is deprecated in modern Pandas. Use concat() instead.

# OLD (deprecated)
result = df1.append(df2)

# NEW (correct)
result = pd.concat([df1, df2])

concat() is faster and more flexible.

Handling Duplicate Columns

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Duplicate column names
result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
#    A  B
# 0  1  3
# 1  2  4
# 2  5  7
# 3  6  8

Works fine. The 'A' column from both DataFrames gets combined into one column.

Sorting Columns After Concatenation

df1 = pd.DataFrame({'B': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# Default (no sorting)
result = pd.concat([df1, df2], axis=1)
print(result)
#    B  A
# 0  1  3
# 1  2  4

# Sort columns
result = pd.concat([df1, df2], axis=1, sort=True)
print(result)
#    A  B
# 0  3  1
# 1  4  2

Use sort=True when you want alphabetical column ordering.

Ignoring Index vs Preserving It

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [3, 4]}, index=['c', 'd'])

# Preserve original indices
result = pd.concat([df1, df2], ignore_index=False)
print(result)
#    A
# a  1
# b  2
# c  3
# d  4

# Reset to 0-1-2-3
result = pd.concat([df1, df2], ignore_index=True)
print(result)
#    A
# 0  1
# 1  2
# 2  3
# 3  4

Choose based on whether you need the original indices.

Common Errors

❌ Error 1: Empty DataFrame List

# WRONG
result = pd.concat([])  # Error: no objects to concatenate

# CORRECT
result = pd.concat([df1, df2])

❌ Error 2: Forgetting ignore_index

# WRONG - duplicate indices
result = pd.concat([df1, df2])  # Indices: [0, 1, 0, 1]

# CORRECT
result = pd.concat([df1, df2], ignore_index=True)  # Indices: [0, 1, 2, 3]

❌ Error 3: Using append() (Deprecated)

# WRONG - deprecated
result = df1.append(df2)

# CORRECT
result = pd.concat([df1, df2])

Best Practices

Always use ignore_index=True when combining row-wise data (unless you need specific indices)
Use concat() for simple stacking, merge() for complex joins
Check column alignment when combining DataFrames with different columns
Watch out for NaN values when columns don't match
Use keys= to track source of rows in multi-source data
Prefer concat() over deprecated append()

Real-World Workflow

import pandas as pd

# 1. Read multiple data sources
sales_jan = pd.read_csv('sales_jan.csv')
sales_feb = pd.read_csv('sales_feb.csv')
sales_mar = pd.read_csv('sales_mar.csv')

# 2. Combine all months
all_sales = pd.concat([sales_jan, sales_feb, sales_mar], ignore_index=True)

# 3. Add computed columns from another source
regions = pd.read_csv('regions.csv')
merged = all_sales.merge(regions, on='store_id')

# 4. Save combined data
merged.to_csv('all_sales_with_regions.csv', index=False)

Conclusion

pd.concat() is essential for any data work in Python:

Simple row stacking: concat([df1, df2], ignore_index=True)
Column addition: concat([df1, df2], axis=1)
Multi-source data: Use keys= to track origins
Performance: Faster than deprecated append()

Master concat() and you've mastered one of Pandas' most powerful tools.

Pandas pd.concat(): Merge & Combine DataFrames (Complete Guide)

Pandas pd.concat(): Merge & Combine DataFrames (Complete Guide)

What is pd.concat()?

Basic Syntax

Combining Rows (axis=0) - Most Common

Basic Row Concatenation

Reset Index for Clean Results

Combining Rows with Different Columns

Combining Columns (axis=1)

Basic Column Concatenation

Column Concatenation with Different Indices

pd.concat() vs merge() vs join()

When to Use Each

Example Comparison

Real-World Example: Combining CSV Files

Handling Indices: keys Parameter

Performance: concat() vs append()

Handling Duplicate Columns

Sorting Columns After Concatenation

Ignoring Index vs Preserving It

Common Errors

❌ Error 1: Empty DataFrame List

❌ Error 2: Forgetting ignore_index

❌ Error 3: Using append() (Deprecated)

Best Practices

Real-World Workflow

Conclusion

Learn More

Tags

Related Articles

SQL Query Optimization: Indexes, Execution Plans & Performance (Complete Guide)

Git Advanced Workflows: Rebasing, Cherry-Pick & Flow (Complete Guide)

JavaScript Promises vs Async/Await: Master Async Code (Complete Guide)

Related Articles

💻
💻Tech Skills
SQL Query Optimization: Indexes, Execution Plans & Performance (Complete Guide)
Master SQL optimization, use indexes correctly, read execution plans, speed up queries 4000x on 1M rows.
March 7, 20269 min read
#SQL#Database#Performance
Master SQL optimization, use indexes correctly, read execution plans, speed up queries 4000x on 1M rows.

💻
💻Tech Skills
Git Advanced Workflows: Rebasing, Cherry-Pick & Flow (Complete Guide)
Master Git rebase, cherry-pick, git flow, maintain clean history, prevent merge disasters in production.
March 7, 20268 min read
#Git#Version Control#DevOps
March 7, 20268 min read

💻
💻Tech Skills
JavaScript Promises vs Async/Await: Master Async Code (Complete Guide)
Master JavaScript promises and async/await, parallelize operations, speed up code 3x, handle errors correctly.
March 7, 20267 min read
#JavaScript#Async#Promises
March 7, 20267 min read