Python Pandas: How to Swap Two DataFrame Columns
Reordering columns in a Pandas DataFrame is a common data manipulation task, often needed for presentation, analysis, or preparing data for specific models. You might want to swap the positions of two specific columns or define a completely new column order.
This guide explains several methods to swap two DataFrame columns in Pandas, covering scenarios where you want to swap only their positions or swap both their names and underlying data.
Understanding Column Swapping: Position vs. Name/Content
It's crucial to distinguish between:
- Swapping column positions: changing the display order of columns while each name stays attached to its data. For example, if you have columns A, B, C, you might want A, C, B. The data in column B stays with column B; it just moves. This is the most common interpretation of "swapping columns."
- Swapping column names and content (relabeling): the data stays where it is and the labels trade places. If column B held data X and column C held data Y, then after the swap the column now named C holds X and the column now named B holds Y. Assigning df.columns = new_names_list achieves this when new_names_list is a permutation of the old names.
This article primarily focuses on swapping column positions.
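The difference is easiest to see on a tiny DataFrame. The sketch below uses hypothetical columns A, B and C, where B holds the "X" data and C holds the "Y" data. It compares selecting columns in a new order against assigning a permuted list to df.columns.

```python
import pandas as pd

# Column 'B' holds the "X" data and column 'C' holds the "Y" data
df = pd.DataFrame({"A": [1, 2], "B": ["x1", "x2"], "C": ["y1", "y2"]})

# Positional swap: select the columns in a new order; each name keeps its data
by_position = df[["A", "C", "B"]]

# Relabeling: the data stays where it is and only the labels change
relabeled = df.copy()
relabeled.columns = ["A", "C", "B"]

print(by_position["B"].tolist())  # ['x1', 'x2']  (B still holds the X data)
print(relabeled["B"].tolist())    # ['y1', 'y2']  (B now holds the former C data)
```

Both results have the same header order, A, C, B. They differ only in which data sits under each label.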
Example DataFrame:
import pandas as pd
data = {
'ID': [101, 102, 103, 104],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'StartDate': ['2020-01-15', '2019-03-01', '2021-06-10', '2020-08-20'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
Output:
Original DataFrame:
ID Name Department StartDate Salary
0 101 Alice HR 2020-01-15 60000
1 102 Bob Engineering 2019-03-01 85000
2 103 Charlie HR 2021-06-10 62000
3 104 David Sales 2020-08-20 70000
Method 1: Reordering by Providing a New Column List (Swaps Positions)
This is the standard way to change the order of columns, including swapping two specific ones. You create a new list representing the desired order of column names and then reindex or select the DataFrame columns based on this new list.
Using DataFrame.reindex(columns=...)
The reindex() method can conform the DataFrame to a new set of column labels.
import pandas as pd
data = {
'ID': [101, 102, 103, 104],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'StartDate': ['2020-01-15', '2019-03-01', '2021-06-10', '2020-08-20'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy() # Work on a copy
# Original columns: ['ID', 'Name', 'Department', 'StartDate', 'Salary']
# Let's swap 'Department' and 'StartDate'
new_column_order = ['ID', 'Name', 'StartDate', 'Department', 'Salary']
df_reindexed = df.reindex(columns=new_column_order)
print("DataFrame after reindex (Department and StartDate swapped):")
print(df_reindexed)
Output:
DataFrame after reindex (Department and StartDate swapped):
ID Name StartDate Department Salary
0 101 Alice 2020-01-15 HR 60000
1 102 Bob 2019-03-01 Engineering 85000
2 103 Charlie 2021-06-10 HR 62000
3 104 David 2020-08-20 Sales 70000
- The columns argument in reindex specifies the new order.
Using DataFrame.loc[:, new_column_list] or df[new_column_list]
You can also use standard DataFrame indexing by passing the new list of column names.
import pandas as pd
data = {
'ID': [101, 102, 103, 104],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'StartDate': ['2020-01-15', '2019-03-01', '2021-06-10', '2020-08-20'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
new_column_order = ['ID', 'Name', 'StartDate', 'Department', 'Salary']
# Using df[new_column_order]
df_bracket_selection = df[new_column_order]
print("DataFrame after bracket selection (Department and StartDate swapped):")
print(df_bracket_selection)
# Using df.loc[:, new_column_order] (more explicit)
df_loc_selection = df.loc[:, new_column_order]
print("DataFrame after .loc selection (Department and StartDate swapped):")
print(df_loc_selection)
Output:
DataFrame after bracket selection (Department and StartDate swapped):
ID Name StartDate Department Salary
0 101 Alice 2020-01-15 HR 60000
1 102 Bob 2019-03-01 Engineering 85000
2 103 Charlie 2021-06-10 HR 62000
3 104 David 2020-08-20 Sales 70000
DataFrame after .loc selection (Department and StartDate swapped):
ID Name StartDate Department Salary
0 101 Alice 2020-01-15 HR 60000
1 102 Bob 2019-03-01 Engineering 85000
2 103 Charlie 2021-06-10 HR 62000
3 104 David 2020-08-20 Sales 70000
Both df[new_column_order] and df.loc[:, new_column_order] achieve the same result of reordering columns based on the provided list.
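Where reindex() and list-based selection differ is a label that is not in the DataFrame. reindex() conforms to whatever labels you pass and creates missing ones as all-NaN columns. df[...] raises a KeyError instead, and so does .loc[...] in pandas 1.0 and later. A minimal sketch, where 'Bonus' is a hypothetical column name that does not exist:

```python
import pandas as pd

df = pd.DataFrame({"ID": [101, 102], "Name": ["Alice", "Bob"], "Salary": [60000, 85000]})

# 'Bonus' is a hypothetical label that does NOT exist in df
wanted = ["ID", "Salary", "Name", "Bonus"]

# reindex() conforms to the requested labels: 'Bonus' appears as an all-NaN column
reindexed = df.reindex(columns=wanted)
print(reindexed)

# Bracket selection is strict and raises KeyError for the unknown label
selection_error = None
try:
    df[wanted]
except KeyError as exc:
    selection_error = exc
    print("KeyError:", exc)
```

In practice, this means a misspelled name passed to reindex() silently produces an empty column, while bracket selection fails loudly.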
Creating the New Column List
To swap two specific columns, say col_A and col_B, while keeping the others in place:
- Get the current list of column names.
- Find the indices of col_A and col_B.
- Swap these names in the list.
- Select the DataFrame's columns in the new order.
import pandas as pd
data = {
'ID': [101, 102, 103, 104],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'StartDate': ['2020-01-15', '2019-03-01', '2021-06-10', '2020-08-20'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
cols_to_swap = ['Department', 'Salary']
col1_name, col2_name = cols_to_swap[0], cols_to_swap[1]
current_columns = list(df.columns) # Get list of current column names
print(f"Original column order: {current_columns}")
# Find indices of the columns to swap
idx1 = current_columns.index(col1_name)
idx2 = current_columns.index(col2_name)
# Swap the names in the list
current_columns[idx1], current_columns[idx2] = current_columns[idx2], current_columns[idx1]
swapped_column_order = current_columns
print(f"New column order after swapping '{col1_name}' and '{col2_name}': {swapped_column_order}")
# Apply the new order
df_swapped_specific = df[swapped_column_order]
print("\nDataFrame with specific columns swapped:")
print(df_swapped_specific)
Output:
Original column order: ['ID', 'Name', 'Department', 'StartDate', 'Salary']
New column order after swapping 'Department' and 'Salary': ['ID', 'Name', 'Salary', 'StartDate', 'Department']
DataFrame with specific columns swapped:
ID Name Salary StartDate Department
0 101 Alice 60000 2020-01-15 HR
1 102 Bob 85000 2019-03-01 Engineering
2 103 Charlie 62000 2021-06-10 HR
3 104 David 70000 2020-08-20 Sales
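If you know the integer positions of the two columns rather than their names, the same idea works with a list of positions and DataFrame.iloc. The sketch below is one way to do it. The variables i and j are hypothetical, and positions 2 and 4 correspond to 'Department' and 'Salary' in a trimmed copy of the example DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [101, 102],
    "Name": ["Alice", "Bob"],
    "Department": ["HR", "Engineering"],
    "StartDate": ["2020-01-15", "2019-03-01"],
    "Salary": [60000, 85000],
})

# Hypothetical positions to swap: 2 ('Department') and 4 ('Salary')
i, j = 2, 4
order = list(range(df.shape[1]))          # [0, 1, 2, 3, 4]
order[i], order[j] = order[j], order[i]   # [0, 1, 4, 3, 2]

# iloc selects columns by position, in the order given
swapped = df.iloc[:, order]
print(list(swapped.columns))  # ['ID', 'Name', 'Salary', 'StartDate', 'Department']
```

Because this version never looks columns up by label, it also avoids the ambiguity that duplicate column names would cause for list.index().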
Method 2: Swapping Column Names and Content (Rename-like behavior)
If the goal is not just to change the display order but to swap the names attached to two columns' data, you are relabeling rather than reordering: the data stays in place and the two labels trade places. A direct assignment to df.columns achieves this when the new list of names is a permutation of the old one.
Using df.columns Assignment
This method directly reassigns the columns attribute of the DataFrame. If you provide a list of column names that is a permutation of the original (e.g., two names swapped), Pandas reassigns the names. The underlying data doesn't move; only the labels pointing to each data series change.
import pandas as pd
data_for_rename = {
'Col_X': [1, 2, 3],
'Col_Y': ['A', 'B', 'C'],
'Col_Z': [True, False, True]
}
df_rename = pd.DataFrame(data_for_rename)
print("Original DataFrame for renaming:")
print(df_rename)
# Original columns: ['Col_X', 'Col_Y', 'Col_Z']
# Goal: the data currently labeled 'Col_X' should end up labeled 'Col_Y',
# and the data labeled 'Col_Y' should end up labeled 'Col_X'.
# This is NOT a positional swap: the data stays in place and only the labels change.
# Step 1: Build the new list of NAMES with 'Col_X' and 'Col_Y' swapped
cols = list(df_rename.columns)
idx_x = cols.index('Col_X')
idx_y = cols.index('Col_Y')
cols[idx_x], cols[idx_y] = cols[idx_y], cols[idx_x] # Swap names in list: ['Col_Y', 'Col_X', 'Col_Z']
# Step 2: Assign these new names to the DataFrame's columns attribute
df_rename.columns = cols
print("\nDataFrame after swapping names via df.columns = ...:")
print(df_rename)
Output:
Original DataFrame for renaming:
Col_X Col_Y Col_Z
0 1 A True
1 2 B False
2 3 C True
DataFrame after swapping names via df.columns = ...:
Col_Y Col_X Col_Z
0 1 A True
1 2 B False
2 3 C True
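pandas does not check that the new names match the old ones, because any labels are accepted. It does require exactly one label per column, and it raises a ValueError otherwise. A minimal sketch that reuses the Col_X/Col_Y/Col_Z data and deliberately passes too few names:

```python
import pandas as pd

df = pd.DataFrame({"Col_X": [1, 2, 3], "Col_Y": ["A", "B", "C"], "Col_Z": [True, False, True]})

length_error = None
try:
    # Only two labels for three columns: pandas rejects the assignment
    df.columns = ["Col_Y", "Col_X"]
except ValueError as exc:
    length_error = exc
    print("ValueError:", exc)

# The failed assignment leaves the original labels in place
print(list(df.columns))  # ['Col_X', 'Col_Y', 'Col_Z']
```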
Important Distinction: Using df.columns = new_list_of_names re-labels the existing data series in place. If new_list_of_names is just a reordering of the original names, the two columns appear to have swapped their contents: df['Col_X'] now returns the data that df['Col_Y'] returned before. If new_list_of_names contains different names, it is a plain rename. This is fundamentally different from Method 1, where df.reindex(columns=...) or df[new_order] selects existing named columns and moves each one, together with its data, to a new position.
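The same relabeling can also be written with DataFrame.rename() and a dictionary that maps each of the two names to the other. rename() looks up every existing label in the mapping independently, so both names trade places in one call and no temporary name is needed. A sketch using the Col_X/Col_Y/Col_Z data:

```python
import pandas as pd

df = pd.DataFrame({"Col_X": [1, 2, 3], "Col_Y": ["A", "B", "C"], "Col_Z": [True, False, True]})

# Each old label is mapped on its own: 'Col_X' -> 'Col_Y' and 'Col_Y' -> 'Col_X'
swapped_labels = df.rename(columns={"Col_X": "Col_Y", "Col_Y": "Col_X"})

print(list(swapped_labels.columns))     # ['Col_Y', 'Col_X', 'Col_Z']
print(swapped_labels["Col_X"].tolist())  # ['A', 'B', 'C']
```

Unlike df.columns assignment, rename() returns a new DataFrame by default. It also leaves every label you do not mention untouched, so you do not have to spell out the full column list.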
Creating a Reusable Function to Swap Two Columns (Positions)
For convenience, especially when you swap two specific columns by name, you can wrap these steps in a function. The swap_columns_by_name() function creates a list of the current column names, finds the indices of the two columns to be swapped, swaps their names in the list, and then re-selects the DataFrame columns in this new order. If either name is missing, it prints an error and returns the original DataFrame unchanged.
import pandas as pd
def swap_columns_by_name(df_input, col_name1, col_name2):
"""Swaps the positions of two columns in a DataFrame by their names."""
df = df_input.copy() # Work on a copy to avoid modifying original DataFrame
column_list = list(df.columns)
try:
idx1 = column_list.index(col_name1)
idx2 = column_list.index(col_name2)
except ValueError:
print(f"Error: One or both column names ('{col_name1}', '{col_name2}') not found in DataFrame.")
return df_input # Return original if columns not found
# Swap the names in the list
column_list[idx1], column_list[idx2] = column_list[idx2], column_list[idx1]
# Return DataFrame with reordered columns
return df[column_list]
data = {
'ID': [101, 102, 103, 104],
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'StartDate': ['2020-01-15', '2019-03-01', '2021-06-10', '2020-08-20'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
df_swapped_func = swap_columns_by_name(df, 'Department', 'Salary')
print("DataFrame after swapping 'Department' and 'Salary' using function:")
print(df_swapped_func.head()) # .head() for brevity if needed
Output:
DataFrame after swapping 'Department' and 'Salary' using function:
ID Name Salary StartDate Department
0 101 Alice 60000 2020-01-15 HR
1 102 Bob 85000 2019-03-01 Engineering
2 103 Charlie 62000 2021-06-10 HR
3 104 David 70000 2020-08-20 Sales
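Printing an error and returning the input can let a typo pass unnoticed in a longer pipeline. One alternative design is to raise a KeyError and let the caller decide how to handle it. The helper below is a hypothetical variant, swap_columns_strict(), that is not part of pandas and is shown as a sketch only.

```python
import pandas as pd


def swap_columns_strict(df, col_a, col_b):
    """Return a new DataFrame with the positions of col_a and col_b swapped.

    Hypothetical helper (not part of pandas): raises KeyError for a missing
    column instead of printing a message and returning the input.
    """
    missing = [c for c in (col_a, col_b) if c not in df.columns]
    if missing:
        raise KeyError(f"Columns not found: {missing}")
    order = list(df.columns)
    i, j = order.index(col_a), order.index(col_b)
    order[i], order[j] = order[j], order[i]
    # List-based selection returns a new DataFrame; df itself is not modified
    return df[order]


df = pd.DataFrame({"ID": [101, 102], "Department": ["HR", "Sales"], "Salary": [60000, 70000]})
result = swap_columns_strict(df, "Department", "Salary")
print(list(result.columns))  # ['ID', 'Salary', 'Department']

try:
    swap_columns_strict(df, "Department", "Bonus")  # 'Bonus' is not a column
except KeyError as exc:
    print("KeyError:", exc)
```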
Conclusion
To swap the positions of columns in a Pandas DataFrame:
- Create a list of column names in the new desired order.
- Use this list to reindex or select columns:
  - df_new = df.reindex(columns=new_order_list)
  - df_new = df[new_order_list] or df_new = df.loc[:, new_order_list] (most common)
If you need to swap columns by name programmatically, create a function that manipulates the list of column names and then applies this new order.
Assigning directly with df.columns = new_name_list re-labels the columns. If new_name_list is a permutation of the old names, it swaps the names attached to the underlying data series, which looks like a content swap when you access columns by name. For simply changing the display order of existing named columns, reindexing or selecting with a new column-order list is the standard approach.