Python Pandas: Convert Pivot Table Back to DataFrame (Flatten)
Pandas pivot tables (DataFrame.pivot_table()
or pd.pivot_table()
) are excellent for summarizing and reshaping data. However, the resulting pivot table often has a hierarchical index (MultiIndex) on its rows and/or columns, and the name of the original column used for columns
in the pivot operation becomes the name of the column index. For further analysis, plotting, or exporting, you might need to convert this structured pivot table back into a "flat" DataFrame with a simple index and clear column names.
This guide explains how to effectively convert a Pandas pivot table back into a standard DataFrame format.
Understanding Pivot Table Structure
A pivot table typically transforms your data as follows:
- Index: Unique values from the column(s) specified in the
index
argument become the new row index. If multiple columns are used forindex
, a MultiIndex is created. - Columns: Unique values from the column(s) specified in the
columns
argument become the new column headers. If multiple columns are used, a MultiIndex is created for columns. The name of the original column used forcolumns
becomes thename
of the column index (e.g.,pivot_table.columns.name
). - Values: Cells are populated by aggregating values from the column specified in the
values
argument, using the function inaggfunc
.
This often results in a DataFrame where the index is one or more of your original data columns, and the column headers are also derived from data. To "flatten" it means to move the index levels back into regular data columns and simplify the column header structure.
Example: Creating a Pivot Table
Let's create a sample DataFrame and a pivot table from it.
import pandas as pd
import numpy as np
data = {
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Jan', 'Feb'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Create a pivot table: Sum of Sales by Region (index) and Product (columns)
pivot_table_df = df.pivot_table(
index='Region',
columns='Product',
values='Sales',
aggfunc='sum',
fill_value=0 # Fill missing combinations with 0
)
print("Generated Pivot Table:")
print(pivot_table_df)
print()
print(f"Pivot table's column index name: '{pivot_table_df.columns.name}'")
Output:
Original DataFrame:
Region Product Month Sales
0 North A Jan 100
1 North B Jan 150
2 South A Jan 200
3 South B Feb 50
4 North A Feb 120
5 West A Feb 300
6 South B Jan 180
7 West C Feb 90
Generated Pivot Table:
Product A B C
Region
North 220 150 0
South 200 230 0
West 300 0 90
Pivot table's column index name: 'Product'
Our goal is to convert pivot_table_df
into a DataFrame where 'Region' is a regular column, and the 'Product' name above columns A, B, C is removed or flattened.