Python Pandas: How to Copy Columns Between DataFrames
Copying one or more columns from one Pandas DataFrame to another is a frequent operation when merging datasets, creating new features, or restructuring data. Pandas offers several ways to achieve this, but it's crucial to understand how DataFrame indexes affect the alignment of data during these operations to avoid unexpected NaN
values.
This guide explains how to copy columns between DataFrames using direct assignment, the .copy()
method, and techniques to handle index misalignments.
The Goal: Transferring Column Data
We want to take one or more columns from a source DataFrame (df1
) and add them as new columns (or overwrite existing ones) in a target DataFrame (df2
).
Example DataFrames
import pandas as pd
import numpy as np # For NaN example
# Source DataFrame
df1_data = {
'ID': [101, 102, 103, 104],
'Product_Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics'],
'Sales_Q1': [150, 2000, 750, 3000]
}
df1 = pd.DataFrame(df1_data)
# df1 uses default 0-based index: 0, 1, 2, 3
# Target DataFrame
df2_data = {
'Employee': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['Sales', 'IT', 'Sales', 'Marketing']
}
df2 = pd.DataFrame(df2_data)
# df2 also uses default 0-based index: 0, 1, 2, 3
print("Original df1:")
print(df1)
print()
print("Original df2:")
print(df2)
Output:
Original df1:
ID Product_Name Category Sales_Q1
0 101 Laptop Electronics 150
1 102 Mouse Accessory 2000
2 103 Keyboard Accessory 750
3 104 Monitor Electronics 3000
Original df2:
Employee Department
0 Alice Sales
1 Bob IT
2 Charlie Sales
3 David Marketing
Method 1: Direct Assignment (Basic Column Copying)
This is the most straightforward way when the DataFrames have aligning indexes or you intend to align based on the existing index.
Copying a Single Column
Assign the Series from df1
to a new column name in df2
.
import pandas as pd
df1_example = pd.DataFrame({
'ID': [101, 102, 103, 104], 'Sales_Q1': [150, 2000, 750, 3000]
})
df2_example = pd.DataFrame({
'Employee': ['Alice', 'Bob', 'Charlie', 'David']
})
# ✅ Copy 'Sales_Q1' from df1_example to df2_example as a new column
df2_example['Q1_Sales'] = df1_example['Sales_Q1']
print("df2 after copying 'Sales_Q1' as 'Q1_Sales':")
print(df2_example)
Output:
df2 after copying 'Sales_Q1' as 'Q1_Sales':
Employee Q1_Sales
0 Alice 150
1 Bob 2000
2 Charlie 750
3 David 3000
Pandas aligns the data based on the index. Since both df1
and df2
have the default 0-based index here, the copy is row-by-row.
Copying Multiple Columns
Select multiple columns from df1
and assign them to df2
.
import pandas as pd
df1_example = pd.DataFrame({
'ID': [101, 102, 103, 104],
'Product_Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics']
})
df2_example = pd.DataFrame({
'Region': ['North', 'South', 'East', 'West']
})
# ✅ Copy 'Product_Name' and 'Category' from df1_example to df2_example
columns_to_copy = ['Product_Name', 'Category']
df2_example[columns_to_copy] = df1_example[columns_to_copy]
# Or more directly:
df2_example[['Product_Name', 'Category']] = df1_example[['Product_Name', 'Category']]
print("df2 after copying multiple columns:")
print(df2_example)
Output:
df2 after copying multiple columns:
Region Product_Name Category
0 North Laptop Electronics
1 South Mouse Accessory
2 East Keyboard Accessory
3 West Monitor Electronics
You must use double square brackets df1[['col1', 'col2']]
to select multiple columns as a DataFrame for assignment.
Important: Use Bracket Notation []
, Not Dot Notation .
for Assignment
When creating new columns in df2
, always use bracket notation df2['NewColumn'] = ...
. Using dot notation (df2.NewColumn = ...
) can sometimes work for accessing existing columns but is not reliable for creating new ones and can lead to warnings or incorrect behavior (e.g., creating an attribute instead of a DataFrame column).
# Correct:
df2['New_Col'] = df1['Source_Col']
# Incorrect for creating new columns:
df2.New_Col = df1.Source_Col # Avoid this for assignment
Method 2: Using DataFrame.copy()
(for Subsets or Independent Copies)
If you want to copy a subset of df1
into a new DataFrame df2
(or overwrite df2
completely), you can select the desired columns from df1
and then use the .copy()
method.
import pandas as pd
df1_example = pd.DataFrame({
'ID': [101, 102, 103, 104],
'Product_Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics'],
'Sales_Q1': [150, 2000, 750, 3000]
})
# Select desired columns from df1
columns_to_extract = ['Product_Name', 'Sales_Q1']
df1_subset = df1_example[columns_to_extract]
# ✅ Create df2 as a copy of this subset
df2_from_copy = df1_subset.copy()
# df2_from_copy will be an independent copy; changes to df1_subset or df1 won't affect it.
print("New df2 created from a copy of df1 subset:")
print(df2_from_copy)
Output:
New df2 created from a copy of df1 subset:
Product_Name Sales_Q1
0 Laptop 150
1 Mouse 2000
2 Keyboard 750
3 Monitor 3000
df.copy(deep=True)
: Creates a deep copy. Changes to the copy won't affect the original. deep=True
is the default.
Handling Index Alignment: The Cause of NaN
Values
A common pitfall when copying columns is getting NaN
(Not a Number) values in the target DataFrame. This happens if the indexes of df1
and df2
do not align for the rows being assigned. Pandas tries to match rows based on their index labels during assignment.
When Indexes Differ
import pandas as pd
import numpy as np
df1_custom_idx = pd.DataFrame({
'Value_A': [10, 20, 30, 40]
}, index=['a', 'b', 'c', 'd']) # Custom string index
df2_numeric_idx = pd.DataFrame({
'Property_X': [100, 200, 300, 400]
}, index=[0, 1, 2, 3]) # Default numeric index
print("df1_custom_idx:")
print(df1_custom_idx)
print()
print("df2_numeric_idx:")
print(df2_numeric_idx)
print()
# Attempt to copy 'Value_A' to df2_numeric_idx
df2_numeric_idx['Copied_Value_A'] = df1_custom_idx['Value_A']
print("df2 after attempting copy with misaligned indexes:")
print(df2_numeric_idx)
Output:
df1_custom_idx:
Value_A
a 10
b 20
c 30
d 40
df2_numeric_idx:
Property_X
0 100
1 200
2 300
3 400
df2 after attempting copy with misaligned indexes:
Property_X Copied_Value_A
0 100 NaN
1 200 NaN
2 300 NaN
3 400 NaN
Since there are no matching index labels, Pandas can not align the data and fills the new column in df2
with NaN
.
Solution 1: Homogenize Indexes
If the rows correspond logically but have different index labels, you can make the target DataFrame's index match the source's index before the copy.
import pandas as pd
import numpy as np
df1_custom_idx = pd.DataFrame({'Value_A': [10,20,30,40]}, index=['a','b','c','d'])
df2_numeric_idx_copy = pd.DataFrame({'Property_X': [100,200,300,400]}, index=[0,1,2,3])
# ✅ Make df2's index match df1's index
# WARNING: This assumes df2 has the same number of rows and logical order as df1.
df2_numeric_idx_copy.index = df1_custom_idx.index
df2_numeric_idx_copy['Copied_Value_A'] = df1_custom_idx['Value_A']
print("df2 after homogenizing index and copying:")
print(df2_numeric_idx_copy)
Output:
df2 after homogenizing index and copying:
Property_X Copied_Value_A
a 100 10
b 200 20
c 300 30
d 400 40
This solution only works if the number of rows is the same and the order implies a one-to-one mapping.
Solution 2: Bypass Index Alignment with .to_numpy()
or .values
(Recommended if Indexes Should Differ)
If you want to copy the column data based purely on its order (position), ignoring the index labels, convert the source column(s) to a NumPy array first using .to_numpy()
(preferred) or the .values
attribute. Assigning a NumPy array to a DataFrame column bypasses Pandas' index alignment.
import pandas as pd
df1_custom_idx = pd.DataFrame({'Value_A': [10,20,30,40]}, index=['a','b','c','d'])
df2_numeric_idx_copy2 = pd.DataFrame({'Property_X': [100,200,300,400]}, index=[0,1,2,3])
# ✅ Copy using .to_numpy() to bypass index alignment
df2_numeric_idx_copy2['Value_A_numpy'] = df1_custom_idx['Value_A'].to_numpy()
print("df2 after copying with .to_numpy():")
print(df2_numeric_idx_copy2)
print()
# For multiple columns:
df1_multi_cols = pd.DataFrame({'A': [1,2], 'B': [3,4]}, index=['x','y'])
df2_target = pd.DataFrame({'C': [5,6]}, index=[0,1])
df2_target[['New_A', 'New_B']] = df1_multi_cols[['A', 'B']].to_numpy()
print("Multiple columns with .to_numpy():\n", df2_target)
Output:
df2 after copying with .to_numpy():
Property_X Value_A_numpy
0 100 10
1 200 20
2 300 30
3 400 40
Multiple columns with .to_numpy():
C New_A New_B
0 5 1 3
1 6 2 4
df1['Column'].to_numpy()
: Converts the Series to a NumPy array, discarding index information.df1['Column'].values
: Similar effect, but.to_numpy()
is generally preferred in modern Pandas. This assumes the source and target DataFrames have the same number of rows if you're assigning to an existing column or creating new columns for the entire length of the target.
Conclusion
Copying columns between Pandas DataFrames is a fundamental operation:
- Direct assignment (
df2['NewCol'] = df1['SourceCol']
ordf2[cols_list] = df1[cols_list]
) is the simplest method when indexes align or are intended to align. Always use bracket notation[]
for assigning to new columns. - Use
df_subset.copy()
if you are creating a new DataFrame from a subset of another. - To avoid
NaN
values due to misaligned indexes:- Either homogenize the indexes (
df2.index = df1.index
) if the DataFrames correspond row-by-row. - Or, more commonly and robustly, bypass index alignment by assigning the
.to_numpy()
(or.values
) representation of the source column(s) to the target DataFrame if you want to copy based on position/order.
- Either homogenize the indexes (
Understanding Pandas' index alignment behavior is key to successfully copying columns without introducing unintended missing values.