Skip to main content

Python Pandas: How to Copy Columns Between DataFrames

Copying one or more columns from one Pandas DataFrame to another is a frequent operation when merging datasets, creating new features, or restructuring data. Pandas offers several ways to achieve this, but it's crucial to understand how DataFrame indexes affect the alignment of data during these operations to avoid unexpected NaN values.

This guide explains how to copy columns between DataFrames using direct assignment, the .copy() method, and techniques to handle index misalignments.

The Goal: Transferring Column Data

We want to take one or more columns from a source DataFrame (df1) and add them as new columns (or overwrite existing ones) in a target DataFrame (df2).

Example DataFrames

import pandas as pd
import numpy as np # For NaN example

# Source DataFrame
df1_data = {
'ID': [101, 102, 103, 104],
'Product_Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics'],
'Sales_Q1': [150, 2000, 750, 3000]
}
df1 = pd.DataFrame(df1_data)
# df1 uses default 0-based index: 0, 1, 2, 3

# Target DataFrame
df2_data = {
'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['Sales', 'IT', 'Sales', 'Marketing']
}
df2 = pd.DataFrame(df2_data)
# df2 also uses default 0-based index: 0, 1, 2, 3

print("Original df1:")
print(df1)
print()

print("Original df2:")
print(df2)

Output:

Original df1:
ID Product_Name Category Sales_Q1
0 101 Laptop Electronics 150
1 102 Mouse Accessory 2000
2 103 Keyboard Accessory 750
3 104 Monitor Electronics 3000

Original df2:
Employee Department
0 Alice Sales
1 Bob IT
2 Charlie Sales
3 David Marketing

Method 1: Direct Assignment (Basic Column Copying)

This is the most straightforward way when the DataFrames have aligning indexes or you intend to align based on the existing index.

Copying a Single Column

Assign the Series from df1 to a new column name in df2.

import pandas as pd

df1_example = pd.DataFrame({
'ID': [101, 102, 103, 104], 'Sales_Q1': [150, 2000, 750, 3000]
})
df2_example = pd.DataFrame({
'Employee': ['Alice', 'Bob', 'Charlie', 'David']
})

# ✅ Copy 'Sales_Q1' from df1_example to df2_example as a new column
df2_example['Q1_Sales'] = df1_example['Sales_Q1']

print("df2 after copying 'Sales_Q1' as 'Q1_Sales':")
print(df2_example)

Output:

df2 after copying 'Sales_Q1' as 'Q1_Sales':
Employee Q1_Sales
0 Alice 150
1 Bob 2000
2 Charlie 750
3 David 3000
note

Pandas aligns the data based on the index. Since both df1 and df2 have the default 0-based index here, the copy is row-by-row.

Copying Multiple Columns

Select multiple columns from df1 and assign them to df2.

import pandas as pd

df1_example = pd.DataFrame({
'ID': [101, 102, 103, 104],
'Product_Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics']
})
df2_example = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West']
})

# ✅ Copy 'Product_Name' and 'Category' from df1_example to df2_example
columns_to_copy = ['Product_Name', 'Category']
df2_example[columns_to_copy] = df1_example[columns_to_copy]
# Or more directly:
df2_example[['Product_Name', 'Category']] = df1_example[['Product_Name', 'Category']]

print("df2 after copying multiple columns:")
print(df2_example)

Output:

df2 after copying multiple columns:
Region Product_Name Category
0 North Laptop Electronics
1 South Mouse Accessory
2 East Keyboard Accessory
3 West Monitor Electronics
note

You must use double square brackets df1[['col1', 'col2']] to select multiple columns as a DataFrame for assignment.

Important: Use Bracket Notation [], Not Dot Notation . for Assignment

When creating new columns in df2, always use bracket notation df2['NewColumn'] = .... Using dot notation (df2.NewColumn = ...) can sometimes work for accessing existing columns but is not reliable for creating new ones and can lead to warnings or incorrect behavior (e.g., creating an attribute instead of a DataFrame column).

# Correct:
df2['New_Col'] = df1['Source_Col']

# Incorrect for creating new columns:
df2.New_Col = df1.Source_Col # Avoid this for assignment

Method 2: Using DataFrame.copy() (for Subsets or Independent Copies)

If you want to copy a subset of df1 into a new DataFrame df2 (or overwrite df2 completely), you can select the desired columns from df1 and then use the .copy() method.

import pandas as pd

df1_example = pd.DataFrame({
'ID': [101, 102, 103, 104],
'Product_Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics'],
'Sales_Q1': [150, 2000, 750, 3000]
})

# Select desired columns from df1
columns_to_extract = ['Product_Name', 'Sales_Q1']
df1_subset = df1_example[columns_to_extract]

# ✅ Create df2 as a copy of this subset
df2_from_copy = df1_subset.copy()
# df2_from_copy will be an independent copy; changes to df1_subset or df1 won't affect it.

print("New df2 created from a copy of df1 subset:")
print(df2_from_copy)

Output:

New df2 created from a copy of df1 subset:
Product_Name Sales_Q1
0 Laptop 150
1 Mouse 2000
2 Keyboard 750
3 Monitor 3000
note

df.copy(deep=True): Creates a deep copy. Changes to the copy won't affect the original. deep=True is the default.

Handling Index Alignment: The Cause of NaN Values

A common pitfall when copying columns is getting NaN (Not a Number) values in the target DataFrame. This happens if the indexes of df1 and df2 do not align for the rows being assigned. Pandas tries to match rows based on their index labels during assignment.

When Indexes Differ

import pandas as pd
import numpy as np

df1_custom_idx = pd.DataFrame({
'Value_A': [10, 20, 30, 40]
}, index=['a', 'b', 'c', 'd']) # Custom string index

df2_numeric_idx = pd.DataFrame({
'Property_X': [100, 200, 300, 400]
}, index=[0, 1, 2, 3]) # Default numeric index

print("df1_custom_idx:")
print(df1_custom_idx)
print()
print("df2_numeric_idx:")
print(df2_numeric_idx)
print()

# Attempt to copy 'Value_A' to df2_numeric_idx
df2_numeric_idx['Copied_Value_A'] = df1_custom_idx['Value_A']

print("df2 after attempting copy with misaligned indexes:")
print(df2_numeric_idx)

Output:

df1_custom_idx:
Value_A
a 10
b 20
c 30
d 40

df2_numeric_idx:
Property_X
0 100
1 200
2 300
3 400

df2 after attempting copy with misaligned indexes:
Property_X Copied_Value_A
0 100 NaN
1 200 NaN
2 300 NaN
3 400 NaN
note

Since there are no matching index labels, Pandas can not align the data and fills the new column in df2 with NaN.

Solution 1: Homogenize Indexes

If the rows correspond logically but have different index labels, you can make the target DataFrame's index match the source's index before the copy.

import pandas as pd
import numpy as np

df1_custom_idx = pd.DataFrame({'Value_A': [10,20,30,40]}, index=['a','b','c','d'])
df2_numeric_idx_copy = pd.DataFrame({'Property_X': [100,200,300,400]}, index=[0,1,2,3])


# ✅ Make df2's index match df1's index
# WARNING: This assumes df2 has the same number of rows and logical order as df1.
df2_numeric_idx_copy.index = df1_custom_idx.index

df2_numeric_idx_copy['Copied_Value_A'] = df1_custom_idx['Value_A']
print("df2 after homogenizing index and copying:")
print(df2_numeric_idx_copy)

Output:

df2 after homogenizing index and copying:
Property_X Copied_Value_A
a 100 10
b 200 20
c 300 30
d 400 40
note

This solution only works if the number of rows is the same and the order implies a one-to-one mapping.

If you want to copy the column data based purely on its order (position), ignoring the index labels, convert the source column(s) to a NumPy array first using .to_numpy() (preferred) or the .values attribute. Assigning a NumPy array to a DataFrame column bypasses Pandas' index alignment.

import pandas as pd

df1_custom_idx = pd.DataFrame({'Value_A': [10,20,30,40]}, index=['a','b','c','d'])
df2_numeric_idx_copy2 = pd.DataFrame({'Property_X': [100,200,300,400]}, index=[0,1,2,3])

# ✅ Copy using .to_numpy() to bypass index alignment
df2_numeric_idx_copy2['Value_A_numpy'] = df1_custom_idx['Value_A'].to_numpy()

print("df2 after copying with .to_numpy():")
print(df2_numeric_idx_copy2)
print()

# For multiple columns:
df1_multi_cols = pd.DataFrame({'A': [1,2], 'B': [3,4]}, index=['x','y'])
df2_target = pd.DataFrame({'C': [5,6]}, index=[0,1])
df2_target[['New_A', 'New_B']] = df1_multi_cols[['A', 'B']].to_numpy()
print("Multiple columns with .to_numpy():\n", df2_target)

Output:

df2 after copying with .to_numpy():
Property_X Value_A_numpy
0 100 10
1 200 20
2 300 30
3 400 40

Multiple columns with .to_numpy():
C New_A New_B
0 5 1 3
1 6 2 4
note
  • df1['Column'].to_numpy(): Converts the Series to a NumPy array, discarding index information.
  • df1['Column'].values: Similar effect, but .to_numpy() is generally preferred in modern Pandas. This assumes the source and target DataFrames have the same number of rows if you're assigning to an existing column or creating new columns for the entire length of the target.

Conclusion

Copying columns between Pandas DataFrames is a fundamental operation:

  • Direct assignment (df2['NewCol'] = df1['SourceCol'] or df2[cols_list] = df1[cols_list]) is the simplest method when indexes align or are intended to align. Always use bracket notation [] for assigning to new columns.
  • Use df_subset.copy() if you are creating a new DataFrame from a subset of another.
  • To avoid NaN values due to misaligned indexes:
    • Either homogenize the indexes (df2.index = df1.index) if the DataFrames correspond row-by-row.
    • Or, more commonly and robustly, bypass index alignment by assigning the .to_numpy() (or .values) representation of the source column(s) to the target DataFrame if you want to copy based on position/order.

Understanding Pandas' index alignment behavior is key to successfully copying columns without introducing unintended missing values.