Skip to main content

Python Pandas: How to Merge Only Specific DataFrame Columns

When merging two Pandas DataFrames, you often don't need to bring in all columns from both DataFrames into the final result. Pandas merge() functionality allows you to selectively choose which columns from each DataFrame participate in the merge and which ones are carried over into the resulting merged DataFrame.

This guide explains how to merge only specific columns from one or both DataFrames, ensuring your merged DataFrame contains precisely the information you need.

The Goal: Selective Column Merging

Given two Pandas DataFrames, df1 (left) and df2 (right), that share one or more common "key" columns for joining, we want to create a new merged DataFrame that includes:

  • All columns from df1.
  • Only a specific subset of columns from df2. Or, more generally, a specific subset of columns from both df1 and df2.

Example DataFrames

import pandas as pd

# DataFrame 1 (e.g., Sales Data)
data1 = {
'OrderID': [101, 102, 103, 104, 105],
'ProductID': ['P20', 'P21', 'P20', 'P22', 'P21'],
'Quantity': [2, 1, 5, 3, 2],
'SaleDate': pd.to_datetime(['2025-01-15', '2025-01-16', '2025-01-17', '2025-01-18', '2025-01-19'])
}
df1 = pd.DataFrame(data1)
print("Original df1 (Sales Data):")
print(df1)
print()

# DataFrame 2 (e.g., Product Details)
data2 = {
'ProductID': ['P20', 'P21', 'P22', 'P23', 'P24'],
'ProductName': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics', 'Accessory'],
'Supplier': ['SupplierX', 'SupplierY', 'SupplierX', 'SupplierZ', 'SupplierY'],
'Cost': [800, 15, 45, 200, 30]
}

df2 = pd.DataFrame(data2)
print("Original df2 (Product Details):")
print(df2)

Output:

Original df1 (Sales Data):
OrderID ProductID Quantity SaleDate
0 101 P20 2 2025-01-15
1 102 P21 1 2025-01-16
2 103 P20 5 2025-01-17
3 104 P22 3 2025-01-18
4 105 P21 2 2025-01-19

Original df2 (Product Details):
ProductID ProductName Category Supplier Cost
0 P20 Laptop Electronics SupplierX 800
1 P21 Mouse Accessory SupplierY 15
2 P22 Keyboard Accessory SupplierX 45
3 P23 Monitor Electronics SupplierZ 200
4 P24 Webcam Accessory SupplierY 30

We want to merge these on 'ProductID', bringing all columns from df1 but only 'ProductName' and 'Category' from df2.

The most straightforward way is to select the desired columns from the right DataFrame (df2) before passing it to the merge() method.

Selecting Specific Columns from the Right DataFrame

Create a subset of df2 containing only the merge key(s) and the other columns you want to include from df2.

import pandas as pd

df1_example = pd.DataFrame({
'OrderID': [101, 102], 'ProductID': ['P20', 'P21'], 'Quantity': [2, 1]
})
df2_example = pd.DataFrame({
'ProductID': ['P20', 'P21', 'P22'],
'ProductName': ['Laptop', 'Mouse', 'Keyboard'],
'Category': ['Electronics', 'Accessory', 'Accessory'],
'Supplier': ['X', 'Y', 'X']
})

# Define the merge key and the specific columns to bring from df2
merge_key = 'ProductID'
columns_from_df2 = ['ProductName', 'Category'] # Columns we want from df2

# ✅ Create a subset of df2: includes the merge key AND the desired columns
df2_subset = df2_example[[merge_key] + columns_from_df2]
print("Subset of df2 to be merged:")
print(df2_subset)
print()

# Merge df1 with this subset of df2
merged_df_specific_from_right = pd.merge(
df1_example,
df2_subset,
on=merge_key,
how='left' # or 'inner', etc.
)

print("Merged DataFrame with specific columns from df2:")
print(merged_df_specific_from_right)

Output:

Subset of df2 to be merged:
ProductID ProductName Category
0 P20 Laptop Electronics
1 P21 Mouse Accessory
2 P22 Keyboard Accessory

Merged DataFrame with specific columns from df2:
OrderID ProductID Quantity ProductName Category
0 101 P20 2 Laptop Electronics
1 102 P21 1 Mouse Accessory
  • df2_example[[merge_key] + columns_from_df2]: This creates a new DataFrame from df2_example containing only the 'ProductID' (the merge key) and the 'ProductName' and 'Category' columns.
  • The merge operation then proceeds, and only these selected columns from df2_subset are added to df1_example.

Selecting Specific Columns from Both DataFrames

If you also want to use only a subset of columns from the left DataFrame (df1), apply the same subsetting logic to it.

import pandas as pd

df1_example = pd.DataFrame({
'OrderID': [101, 102], 'ProductID': ['P20', 'P21'], 'Quantity': [2, 1], 'Region': ['N', 'S']
})
df2_example = pd.DataFrame({
'ProductID': ['P20', 'P21'], 'ProductName': ['Laptop', 'Mouse'], 'Supplier': ['X', 'Y']
})

merge_key = 'ProductID'
cols_from_df1_to_merge = ['OrderID', 'ProductID', 'Quantity'] # Keep these from df1
cols_from_df2_to_merge = ['ProductName'] # Keep this from df2 (plus merge key)

# Create subsets of both DataFrames
df1_subset_for_merge = df1_example[cols_from_df1_to_merge]
df2_subset_for_merge = df2_example[[merge_key] + cols_from_df2_to_merge]

# Merge the subsets
merged_df_specific_both = pd.merge(
df1_subset_for_merge,
df2_subset_for_merge,
on=merge_key,
how='left'
)

print("Merged DataFrame with specific columns from both df1 and df2:")
print(merged_df_specific_both)

Output:

Merged DataFrame with specific columns from both df1 and df2:
OrderID ProductID Quantity ProductName
0 101 P20 2 Laptop
1 102 P21 1 Mouse

Method 2: Merging All then Dropping Unwanted Columns

An alternative is to merge all (or more than needed) columns initially and then use DataFrame.drop() to remove the columns you don't want in the final result.

import pandas as pd

df1_example = pd.DataFrame({'OrderID': [101], 'ProductID': ['P20']})
df2_example = pd.DataFrame({
'ProductID': ['P20'], 'ProductName': ['Laptop'],
'Category': ['Electronics'], 'Supplier': ['X'], 'Cost': [800]
})

merge_key = 'ProductID'

# Step 1: Merge, potentially bringing in all columns from df2 initially
# For this example, let's assume df2 has many columns.
# df_merged_all = pd.merge(df1, df2, on=merge_key, how='left')
# In this case, df2_example is small, so let's merge with it directly
df_merged_temp = pd.merge(df1_example, df2_example, on=merge_key, how='left')
print("Temporarily merged DataFrame (all df2 cols):")
print(df_merged_temp)
print()

# Step 2: Define columns to drop from the merged result
# These would be columns from df2 that we didn't want (e.g., 'Supplier', 'Cost')
columns_to_remove_after_merge = ['Supplier', 'Cost']

# ✅ Drop the unwanted columns
df_final_dropped = df_merged_temp.drop(columns=columns_to_remove_after_merge)

print("Merged DataFrame after dropping unwanted columns:")
print(df_final_dropped)

Output:

Temporarily merged DataFrame (all df2 cols):
OrderID ProductID ProductName Category Supplier Cost
0 101 P20 Laptop Electronics X 800

Merged DataFrame after dropping unwanted columns:
OrderID ProductID ProductName Category
0 101 P20 Laptop Electronics

This method can be useful if the list of columns to drop is shorter or easier to define than the list of columns to keep, or if some columns from df2 are needed for intermediate steps before being dropped. However, subsetting before the merge (Method 1) is often more memory and computationally efficient if many columns from df2 are not needed at all.

Using DataFrame.loc for Subsetting Before Merging (Alternative to Direct Slicing)

Instead of df2_subset = df2[[merge_key] + columns_from_df2], you can use .loc to select columns. The result is the same.

import pandas as pd

df1_example = pd.DataFrame({'OrderID': [101], 'ProductID': ['P20']})
df2_example = pd.DataFrame({
'ProductID': ['P20'], 'ProductName': ['Laptop'], 'Category': ['Electronics']
})
merge_key = 'ProductID'
columns_from_df2 = ['ProductName', 'Category']

# ✅ Select columns from df2 using .loc
# The first ':' means all rows.
df2_subset_loc = df2_example.loc[:, [merge_key] + columns_from_df2]

merged_df_loc = pd.merge(df1_example, df2_subset_loc, on=merge_key, how='left')
print("Merged using .loc for subsetting df2:")
print(merged_df_loc)

Output:

Merged using .loc for subsetting df2:
OrderID ProductID ProductName Category
0 101 P20 Laptop Electronics

This is just a different syntax for selecting the subset of df2 prior to merging.

Key merge() Parameters (on, how)

Remember to correctly specify:

  • on: The column name(s) to join on (must be present in both DataFrames). Use left_on and right_on if key column names differ.
  • how: Type of merge to be performed:
    • 'left': Keeps all keys from df1.
    • 'right': Keeps all keys from df2.
    • 'outer': Keeps all keys from both DataFrames.
    • 'inner' (default): Keeps only keys common to both DataFrames.

Conclusion

To merge only specific columns from one Pandas DataFrame (df2) into another (df1):

  1. The recommended and most efficient approach is to create a subset of df2 containing only the merge key(s) and the desired columns before calling pd.merge():
    df2_selected_cols = df2[['merge_key_col'] + ['col_to_keep1', 'col_to_keep2']]
    merged = pd.merge(df1, df2_selected_cols, on='merge_key_col', how='left')
  2. You can apply the same subsetting logic to df1 if you only want specific columns from it as well.
  3. Alternatively, merge more columns than needed and then use merged_df.drop(columns=['unwanted_col1', ...]) to remove the excess columns from the result. This is generally less efficient if many columns are to be dropped.

By selectively choosing columns before or after the merge, you can construct a final DataFrame that precisely contains the data required for your analysis.