Skip to main content
Back to Articles

Pandas pd.concat(): Merge & Combine DataFrames (Complete Guide)

Complete guide to Pandas pd.concat() for combining DataFrames. Learn axis stacking, index handling, and comparisons with merge() and join().

March 1, 202615 min readBy Mathematicon

Pandas pd.concat(): Merge & Combine DataFrames (Complete Guide)

pd.concat() is one of the most essential Pandas functions for combining multiple DataFrames. Whether you're concatenating rows, columns, or handling indices, this guide covers everything you need to know.

What is pd.concat()?

pd.concat() combines multiple Pandas DataFrames or Series into a single DataFrame.

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Combine them
result = pd.concat([df1, df2])
print(result)
#    A  B
# 0  1  3
# 1  2  4
# 0  5  7
# 1  6  8

Notice the indices repeat (0, 1, 0, 1). You often need to reset them.

Basic Syntax

pd.concat(objs, axis=0, ignore_index=False, keys=None, sort=False)

Parameters:

  • objs: List of DataFrames/Series to combine
  • axis: 0 (rows) or 1 (columns)
  • ignore_index: If True, create new 0-1-2... index
  • keys: Add hierarchical index levels
  • sort: Sort column names (default False)

Combining Rows (axis=0) - Most Common

Basic Row Concatenation

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30],
    'City': ['NYC', 'LA']
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Age': [35, 28],
    'City': ['Chicago', 'Houston']
})

# Combine by rows
result = pd.concat([df1, df2], axis=0)
print(result)
#      Name  Age     City
# 0   Alice   25      NYC
# 1     Bob   30       LA
# 0 Charlie   35  Chicago
# 1   David   28  Houston

Reset Index for Clean Results

result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
#      Name  Age     City
# 0   Alice   25      NYC
# 1     Bob   30       LA
# 2 Charlie   35  Chicago
# 3   David   28  Houston

Much cleaner! ignore_index=True creates a new 0-1-2-3 index.

Combining Rows with Different Columns

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'C': [7, 8]})  # Note: has C, not B

result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
#    A    B  C
# 0  1  3.0  NaN
# 1  2  4.0  NaN
# 2  5  NaN  7.0
# 3  6  NaN  8.0

Missing columns are filled with NaN.

Combining Columns (axis=1)

Basic Column Concatenation

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Salary': [50000, 60000],
    'Department': ['Sales', 'Engineering']
})

# Combine by columns (side by side)
result = pd.concat([df1, df2], axis=1)
print(result)
#    Name  Age  Salary Department
# 0 Alice   25   50000      Sales
# 1   Bob   30   60000 Engineering

This is equivalent to adding new columns to df1.

Column Concatenation with Different Indices

df1 = pd.DataFrame({'A': [1, 2]}, index=['row1', 'row2'])
df2 = pd.DataFrame({'B': [3, 4]}, index=['row2', 'row3'])

result = pd.concat([df1, df2], axis=1)
print(result)
#       A    B
# row1  1.0  NaN
# row2  2.0  3.0
# row3  NaN  4.0

Missing index combinations are filled with NaN.

pd.concat() vs merge() vs join()

Method Use Case Axis Index
concat() Simple row/column stacking Both Preserves or resets
merge() SQL-like joins (inner, outer) 1 (columns) Match on keys
join() Join by index 1 (columns) Always by index

When to Use Each

Use concat() for:

  • Stacking multiple DataFrames (same structure)
  • Combining rows from different sources
  • Simple column addition

Use merge() for:

  • Joining on specific columns
  • Inner/outer/left/right joins
  • When you need SQL-like behavior

Use join() for:

  • Quick joins by index (no key matching)
  • Cleaner syntax when both DataFrames have matching indices

Example Comparison

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
}, index=[1, 2])

df2 = pd.DataFrame({
    'Salary': [50000, 60000]
}, index=[1, 2])

# All three give the same result:
print(pd.concat([df1, df2], axis=1))
print(df1.merge(df2, left_index=True, right_index=True))
print(df1.join(df2))

#    Name  Age  Salary
# 1 Alice   25   50000
# 2   Bob   30   60000

All work, but concat() is simplest for this case.

Real-World Example: Combining CSV Files

import pandas as pd
import glob

# Read all CSV files from a folder
csv_files = glob.glob('data/*.csv')
dfs = [pd.read_csv(file) for file in csv_files]

# Combine all into one DataFrame
result = pd.concat(dfs, axis=0, ignore_index=True)

# Save combined data
result.to_csv('all_data_combined.csv', index=False)

This is perfect for processing data from multiple sources.

Handling Indices: keys Parameter

df1 = pd.DataFrame({'data': [1, 2]})
df2 = pd.DataFrame({'data': [3, 4]})
df3 = pd.DataFrame({'data': [5, 6]})

# Add hierarchical index showing source
result = pd.concat(
    [df1, df2, df3],
    keys=['Jan', 'Feb', 'Mar']
)
print(result)
#         data
# Jan  0     1
#      1     2
# Feb  0     3
#      1     4
# Mar  0     5
#      1     6

Useful for tracking where each row came from.

Performance: concat() vs append()

Warning: DataFrame.append() is deprecated in modern Pandas. Use concat() instead.

# OLD (deprecated)
result = df1.append(df2)

# NEW (correct)
result = pd.concat([df1, df2])

concat() is faster and more flexible.

Handling Duplicate Columns

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Duplicate column names
result = pd.concat([df1, df2], axis=0, ignore_index=True)
print(result)
#    A  B
# 0  1  3
# 1  2  4
# 2  5  7
# 3  6  8

Works fine. The 'A' column from both DataFrames gets combined into one column.

Sorting Columns After Concatenation

df1 = pd.DataFrame({'B': [1, 2]})
df2 = pd.DataFrame({'A': [3, 4]})

# Default (no sorting)
result = pd.concat([df1, df2], axis=1)
print(result)
#    B  A
# 0  1  3
# 1  2  4

# Sort columns
result = pd.concat([df1, df2], axis=1, sort=True)
print(result)
#    A  B
# 0  3  1
# 1  4  2

Use sort=True when you want alphabetical column ordering.

Ignoring Index vs Preserving It

df1 = pd.DataFrame({'A': [1, 2]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [3, 4]}, index=['c', 'd'])

# Preserve original indices
result = pd.concat([df1, df2], ignore_index=False)
print(result)
#    A
# a  1
# b  2
# c  3
# d  4

# Reset to 0-1-2-3
result = pd.concat([df1, df2], ignore_index=True)
print(result)
#    A
# 0  1
# 1  2
# 2  3
# 3  4

Choose based on whether you need the original indices.

Common Errors

āŒ Error 1: Empty DataFrame List

# WRONG
result = pd.concat([])  # Error: no objects to concatenate

# CORRECT
result = pd.concat([df1, df2])

āŒ Error 2: Forgetting ignore_index

# WRONG - duplicate indices
result = pd.concat([df1, df2])  # Indices: [0, 1, 0, 1]

# CORRECT
result = pd.concat([df1, df2], ignore_index=True)  # Indices: [0, 1, 2, 3]

āŒ Error 3: Using append() (Deprecated)

# WRONG - deprecated
result = df1.append(df2)

# CORRECT
result = pd.concat([df1, df2])

Best Practices

  1. Always use ignore_index=True when combining row-wise data (unless you need specific indices)
  2. Use concat() for simple stacking, merge() for complex joins
  3. Check column alignment when combining DataFrames with different columns
  4. Watch out for NaN values when columns don't match
  5. Use keys= to track source of rows in multi-source data
  6. Prefer concat() over deprecated append()

Real-World Workflow

import pandas as pd

# 1. Read multiple data sources
sales_jan = pd.read_csv('sales_jan.csv')
sales_feb = pd.read_csv('sales_feb.csv')
sales_mar = pd.read_csv('sales_mar.csv')

# 2. Combine all months
all_sales = pd.concat([sales_jan, sales_feb, sales_mar], ignore_index=True)

# 3. Add computed columns from another source
regions = pd.read_csv('regions.csv')
merged = all_sales.merge(regions, on='store_id')

# 4. Save combined data
merged.to_csv('all_sales_with_regions.csv', index=False)

Conclusion

pd.concat() is essential for any data work in Python:

  • Simple row stacking: concat([df1, df2], ignore_index=True)
  • Column addition: concat([df1, df2], axis=1)
  • Multi-source data: Use keys= to track origins
  • Performance: Faster than deprecated append()

Master concat() and you've mastered one of Pandas' most powerful tools.


Learn More

Share this article

Related Articles