Python Pandas: How to Merge Only Specific DataFrame Columns

When merging two Pandas DataFrames, you often don't need to bring in all columns from both DataFrames into the final result. Pandas merge() functionality allows you to selectively choose which columns from each DataFrame participate in the merge and which ones are carried over into the resulting merged DataFrame.

This guide explains how to merge only specific columns from one or both DataFrames, ensuring your merged DataFrame contains precisely the information you need.

The Goal: Selective Column Merging

Given two Pandas DataFrames, df1 (left) and df2 (right), that share one or more common "key" columns for joining, we want to create a new merged DataFrame that includes:

All columns from df1.
Only a specific subset of columns from df2. Or, more generally, a specific subset of columns from both df1 and df2.

Example DataFrames

import pandas as pd

# DataFrame 1 (e.g., Sales Data)
data1 = {
    'OrderID': [101, 102, 103, 104, 105],
    'ProductID': ['P20', 'P21', 'P20', 'P22', 'P21'],
    'Quantity': [2, 1, 5, 3, 2],
    'SaleDate': pd.to_datetime(['2025-01-15', '2025-01-16', '2025-01-17', '2025-01-18', '2025-01-19'])
}
df1 = pd.DataFrame(data1)
print("Original df1 (Sales Data):")
print(df1)
print()

# DataFrame 2 (e.g., Product Details)
data2 = {
    'ProductID': ['P20', 'P21', 'P22', 'P23', 'P24'],
    'ProductName': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam'],
    'Category': ['Electronics', 'Accessory', 'Accessory', 'Electronics', 'Accessory'],
    'Supplier': ['SupplierX', 'SupplierY', 'SupplierX', 'SupplierZ', 'SupplierY'],
    'Cost': [800, 15, 45, 200, 30]
}

df2 = pd.DataFrame(data2)
print("Original df2 (Product Details):")
print(df2)

Output:

Original df1 (Sales Data):
   OrderID ProductID  Quantity   SaleDate
    101       P20         2 2025-01-15
    102       P21         1 2025-01-16
    103       P20         5 2025-01-17
    104       P22         3 2025-01-18
    105       P21         2 2025-01-19

Original df2 (Product Details):
  ProductID ProductName     Category   Supplier  Cost
     P20      Laptop  Electronics  SupplierX   800
     P21       Mouse    Accessory  SupplierY    15
     P22    Keyboard    Accessory  SupplierX    45
     P23     Monitor  Electronics  SupplierZ   200
     P24      Webcam    Accessory  SupplierY    30

We want to merge these on 'ProductID', bringing all columns from df1 but only 'ProductName' and 'Category' from df2.

Method 1: Subsetting the "Right" DataFrame Before Merging (Recommended)

The most straightforward way is to select the desired columns from the right DataFrame (df2) before passing it to the merge() method.

Selecting Specific Columns from the Right DataFrame

Create a subset of df2 containing only the merge key(s) and the other columns you want to include from df2.

import pandas as pd

df1_example = pd.DataFrame({
    'OrderID': [101, 102], 'ProductID': ['P20', 'P21'], 'Quantity': [2, 1]
})
df2_example = pd.DataFrame({
    'ProductID': ['P20', 'P21', 'P22'],
    'ProductName': ['Laptop', 'Mouse', 'Keyboard'],
    'Category': ['Electronics', 'Accessory', 'Accessory'],
    'Supplier': ['X', 'Y', 'X']
})

# Define the merge key and the specific columns to bring from df2
merge_key = 'ProductID'
columns_from_df2 = ['ProductName', 'Category'] # Columns we want from df2

# ✅ Create a subset of df2: includes the merge key AND the desired columns
df2_subset = df2_example[[merge_key] + columns_from_df2]
print("Subset of df2 to be merged:")
print(df2_subset)
print()

# Merge df1 with this subset of df2
merged_df_specific_from_right = pd.merge(
    df1_example,
    df2_subset,
    on=merge_key,
    how='left' # or 'inner', etc.
)

print("Merged DataFrame with specific columns from df2:")
print(merged_df_specific_from_right)

Output:

Subset of df2 to be merged:
  ProductID ProductName     Category
0       P20      Laptop  Electronics
1       P21       Mouse    Accessory
2       P22    Keyboard    Accessory

Merged DataFrame with specific columns from df2:
   OrderID ProductID  Quantity ProductName     Category
0      101       P20         2      Laptop  Electronics
1      102       P21         1       Mouse    Accessory

df2_example[[merge_key] + columns_from_df2]: This creates a new DataFrame from df2_example containing only the 'ProductID' (the merge key) and the 'ProductName' and 'Category' columns.
The merge operation then proceeds, and only these selected columns from df2_subset are added to df1_example.

Selecting Specific Columns from Both DataFrames

If you also want to use only a subset of columns from the left DataFrame (df1), apply the same subsetting logic to it.

import pandas as pd

df1_example = pd.DataFrame({
    'OrderID': [101, 102], 'ProductID': ['P20', 'P21'], 'Quantity': [2, 1], 'Region': ['N', 'S']
})
df2_example = pd.DataFrame({
    'ProductID': ['P20', 'P21'], 'ProductName': ['Laptop', 'Mouse'], 'Supplier': ['X', 'Y']
})

merge_key = 'ProductID'
cols_from_df1_to_merge = ['OrderID', 'ProductID', 'Quantity'] # Keep these from df1
cols_from_df2_to_merge = ['ProductName'] # Keep this from df2 (plus merge key)

# Create subsets of both DataFrames
df1_subset_for_merge = df1_example[cols_from_df1_to_merge]
df2_subset_for_merge = df2_example[[merge_key] + cols_from_df2_to_merge]

# Merge the subsets
merged_df_specific_both = pd.merge(
    df1_subset_for_merge,
    df2_subset_for_merge,
    on=merge_key,
    how='left'
)

print("Merged DataFrame with specific columns from both df1 and df2:")
print(merged_df_specific_both)

Output:

Merged DataFrame with specific columns from both df1 and df2:
   OrderID ProductID  Quantity ProductName
0      101       P20         2      Laptop
1      102       P21         1       Mouse

Method 2: Merging All then Dropping Unwanted Columns

An alternative is to merge all (or more than needed) columns initially and then use DataFrame.drop() to remove the columns you don't want in the final result.

import pandas as pd

df1_example = pd.DataFrame({'OrderID': [101], 'ProductID': ['P20']})
df2_example = pd.DataFrame({
    'ProductID': ['P20'], 'ProductName': ['Laptop'],
    'Category': ['Electronics'], 'Supplier': ['X'], 'Cost': [800]
})

merge_key = 'ProductID'

# Step 1: Merge, potentially bringing in all columns from df2 initially
# For this example, let's assume df2 has many columns.
# df_merged_all = pd.merge(df1, df2, on=merge_key, how='left')
# In this case, df2_example is small, so let's merge with it directly
df_merged_temp = pd.merge(df1_example, df2_example, on=merge_key, how='left')
print("Temporarily merged DataFrame (all df2 cols):")
print(df_merged_temp)
print()

# Step 2: Define columns to drop from the merged result
# These would be columns from df2 that we didn't want (e.g., 'Supplier', 'Cost')
columns_to_remove_after_merge = ['Supplier', 'Cost']

# ✅ Drop the unwanted columns
df_final_dropped = df_merged_temp.drop(columns=columns_to_remove_after_merge)

print("Merged DataFrame after dropping unwanted columns:")
print(df_final_dropped)

Output:

Temporarily merged DataFrame (all df2 cols):
   OrderID ProductID ProductName     Category Supplier  Cost
0      101       P20      Laptop  Electronics        X   800

Merged DataFrame after dropping unwanted columns:
   OrderID ProductID ProductName     Category
0      101       P20      Laptop  Electronics

This method can be useful if the list of columns to drop is shorter or easier to define than the list of columns to keep, or if some columns from df2 are needed for intermediate steps before being dropped. However, subsetting before the merge (Method 1) is often more memory and computationally efficient if many columns from df2 are not needed at all.

Using `DataFrame.loc` for Subsetting Before Merging (Alternative to Direct Slicing)

Instead of df2_subset = df2[[merge_key] + columns_from_df2], you can use .loc to select columns. The result is the same.

import pandas as pd

df1_example = pd.DataFrame({'OrderID': [101], 'ProductID': ['P20']})
df2_example = pd.DataFrame({
    'ProductID': ['P20'], 'ProductName': ['Laptop'], 'Category': ['Electronics']
})
merge_key = 'ProductID'
columns_from_df2 = ['ProductName', 'Category']

# ✅ Select columns from df2 using .loc
# The first ':' means all rows.
df2_subset_loc = df2_example.loc[:, [merge_key] + columns_from_df2]

merged_df_loc = pd.merge(df1_example, df2_subset_loc, on=merge_key, how='left')
print("Merged using .loc for subsetting df2:")
print(merged_df_loc)

Output:

Merged using .loc for subsetting df2:
   OrderID ProductID ProductName     Category
0      101       P20      Laptop  Electronics

This is just a different syntax for selecting the subset of df2 prior to merging.

Key `merge()` Parameters (`on`, `how`)

Remember to correctly specify:

on: The column name(s) to join on (must be present in both DataFrames). Use left_on and right_on if key column names differ.
how: Type of merge to be performed:
- 'left': Keeps all keys from df1.
- 'right': Keeps all keys from df2.
- 'outer': Keeps all keys from both DataFrames.
- 'inner' (default): Keeps only keys common to both DataFrames.

Conclusion

To merge only specific columns from one Pandas DataFrame (df2) into another (df1):

The recommended and most efficient approach is to create a subset of df2 containing only the merge key(s) and the desired columns before calling pd.merge():

df2_selected_cols = df2[['merge_key_col'] + ['col_to_keep1', 'col_to_keep2']]
merged = pd.merge(df1, df2_selected_cols, on='merge_key_col', how='left')

You can apply the same subsetting logic to df1 if you only want specific columns from it as well.
Alternatively, merge more columns than needed and then use merged_df.drop(columns=['unwanted_col1', ...]) to remove the excess columns from the result. This is generally less efficient if many columns are to be dropped.

By selectively choosing columns before or after the merge, you can construct a final DataFrame that precisely contains the data required for your analysis.

The Goal: Selective Column Merging​

Example DataFrames​

Method 1: Subsetting the "Right" DataFrame Before Merging (Recommended)​

Selecting Specific Columns from the Right DataFrame​

Selecting Specific Columns from Both DataFrames​

Method 2: Merging All then Dropping Unwanted Columns​

Using DataFrame.loc for Subsetting Before Merging (Alternative to Direct Slicing)​

Key merge() Parameters (on, how)​

Conclusion​

Table of Contents

The Goal: Selective Column Merging

Example DataFrames

Method 1: Subsetting the "Right" DataFrame Before Merging (Recommended)

Selecting Specific Columns from the Right DataFrame

Selecting Specific Columns from Both DataFrames

Method 2: Merging All then Dropping Unwanted Columns

Using `DataFrame.loc` for Subsetting Before Merging (Alternative to Direct Slicing)

Key `merge()` Parameters (`on`, `how`)

Conclusion