Skip to main content

Python Pandas: Convert Pivot Table Back to DataFrame (Flatten)

Pandas pivot tables (DataFrame.pivot_table() or pd.pivot_table()) are excellent for summarizing and reshaping data. However, the resulting pivot table often has a hierarchical index (MultiIndex) on its rows and/or columns, and the name of the original column used for columns in the pivot operation becomes the name of the column index. For further analysis, plotting, or exporting, you might need to convert this structured pivot table back into a "flat" DataFrame with a simple index and clear column names.

This guide explains how to effectively convert a Pandas pivot table back into a standard DataFrame format.

Understanding Pivot Table Structure

A pivot table typically transforms your data as follows:

  • Index: Unique values from the column(s) specified in the index argument become the new row index. If multiple columns are used for index, a MultiIndex is created.
  • Columns: Unique values from the column(s) specified in the columns argument become the new column headers. If multiple columns are used, a MultiIndex is created for columns. The name of the original column used for columns becomes the name of the column index (e.g., pivot_table.columns.name).
  • Values: Cells are populated by aggregating values from the column specified in the values argument, using the function in aggfunc.

This often results in a DataFrame where the index is one or more of your original data columns, and the column headers are also derived from data. To "flatten" it means to move the index levels back into regular data columns and simplify the column header structure.

Example: Creating a Pivot Table

Let's create a sample DataFrame and a pivot table from it.

import pandas as pd
import numpy as np

data = {
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Jan', 'Feb'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

# Create a pivot table: Sum of Sales by Region (index) and Product (columns)
pivot_table_df = df.pivot_table(
index='Region',
columns='Product',
values='Sales',
aggfunc='sum',
fill_value=0 # Fill missing combinations with 0
)
print("Generated Pivot Table:")
print(pivot_table_df)
print()

print(f"Pivot table's column index name: '{pivot_table_df.columns.name}'")

Output:

Original DataFrame:
Region Product Month Sales
0 North A Jan 100
1 North B Jan 150
2 South A Jan 200
3 South B Feb 50
4 North A Feb 120
5 West A Feb 300
6 South B Jan 180
7 West C Feb 90

Generated Pivot Table:
Product A B C
Region
North 220 150 0
South 200 230 0
West 300 0 90
Pivot table's column index name: 'Product'

Our goal is to convert pivot_table_df into a DataFrame where 'Region' is a regular column, and the 'Product' name above columns A, B, C is removed or flattened.

This is a two-step process: first, clean up the column index name, then move the row index levels to become regular columns.

Removing the Column Index Name (df.columns.name = None)

The pivot_table often sets the name attribute of the columns index (e.g., to 'Product' in our example). To get a cleaner "flat" DataFrame, it's good to remove this.

Resetting the Row Index (df.reset_index())

The reset_index() method converts the current DataFrame index (or levels of a MultiIndex) into regular data columns.

import pandas as pd

pivot_table_df = pd.DataFrame({
'A': {'North': 220, 'South': 200, 'West': 300},
'B': {'North': 150, 'South': 230, 'West': 0},
'C': {'North': 0, 'South': 0, 'West': 90}
}, dtype=int)
pivot_table_df.index.name = 'Region'
pivot_table_df.columns.name = 'Product' # Simulate pivot table output


# Make a copy to work with
df_flattened = pivot_table_df.copy()

# Step 1: Remove the name of the columns index
df_flattened.columns.name = None
print("After setting columns.name = None:")
print(df_flattened)
print()

# Step 2: Convert the row index ('Region') into a regular column
df_flattened = df_flattened.reset_index()
print("After reset_index():")
print(df_flattened)

Output:

After setting columns.name = None:
A B C
Region
North 220 150 0
South 200 230 0
West 300 0 90

After reset_index():
Region A B C
0 North 220 150 0
1 South 200 230 0
2 West 300 0 90

This is generally the most straightforward and common approach.

Method 2: Using reset_index() and rename_axis(None, axis=1)

This method achieves a similar result by first resetting the index and then removing the name from the column axis.

import pandas as pd

pivot_table_df = pd.DataFrame({
'A': {'North': 220, 'South': 200, 'West': 300},
'B': {'North': 150, 'South': 230, 'West': 0},
'C': {'North': 0, 'South': 0, 'West': 90}
}, dtype=int)
pivot_table_df.index.name = 'Region'
pivot_table_df.columns.name = 'Product'


df_flattened_rename = pivot_table_df.copy()

# Chain reset_index() and rename_axis()
df_flattened_rename = df_flattened_rename.reset_index().rename_axis(None, axis=1)

print("DataFrame flattened using reset_index() and rename_axis(None, axis=1):")
print(df_flattened_rename)

Output:

DataFrame flattened using reset_index() and rename_axis(None, axis=1):
Region A B C
0 North 220 150 0
1 South 200 230 0
2 West 300 0 90
  • reset_index(): Moves 'Region' from index to a column.
  • rename_axis(None, axis=1): Sets the name of the column index (which was 'Product') to None. axis=1 specifies that we are renaming the column axis's name.

Method 3: Using DataFrame.to_records() and pd.DataFrame() Constructor

The to_records() method can convert a DataFrame (including one with a MultiIndex from a pivot table) into a NumPy record array. This record array can then be used to construct a new, flat DataFrame.

import pandas as pd

pivot_table_df = pd.DataFrame({
'A': {'North': 220, 'South': 200, 'West': 300},
'B': {'North': 150, 'South': 230, 'West': 0},
'C': {'North': 0, 'South': 0, 'West': 90}
}, dtype=int)
pivot_table_df.index.name = 'Region'
pivot_table_df.columns.name = 'Product'

# Convert pivot table to NumPy record array
# The index levels become fields in the record array
records = pivot_table_df.to_records()
print("Records from pivot table:")
print(records)

# ✅ Create a new DataFrame from the records
df_from_records = pd.DataFrame(records)

print("DataFrame flattened using to_records():")
print(df_from_records)

Output:

Records from pivot table:
[('North', 220, 150, 0) ('South', 200, 230, 0) ('West', 300, 0, 90)]
DataFrame flattened using to_records():
Region A B C
0 North 220 150 0
1 South 200 230 0
2 West 300 0 90
note

to_records() automatically includes the index levels as fields in the resulting record array, which then become columns in the new DataFrame. The original column names from the pivot table are usually preserved as field names.

Handling MultiIndex Columns (e.g., from multiple values or aggfunc)

If your pivot table has a MultiIndex on the columns (e.g., if you used multiple values arguments or a list of aggfunc), you might need an extra step to flatten these column headers.

note

Flattening MultiIndex columns often involves re-creating the column names by joining the levels or selecting a specific level. This is a common precursor to the reset_index() step.

There are two options:

Option 1: Join levels with an underscore

import pandas as pd

df = pd.DataFrame({
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Jan', 'Feb'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 15, 12, 5, 8, 20, 18, 7] # Added Quantity for demo
})

pivot_multi_col = df.pivot_table(
index='Region',
columns='Product',
values=['Sales', 'Quantity'], # Multiple values -> MultiIndex columns
aggfunc='sum',
fill_value=0
)
print("Pivot Table with MultiIndex Columns:")
print(pivot_multi_col)

# Flattening MultiIndex Columns (common techniques)
df_flat_cols = pivot_multi_col.copy()
df_flat_cols.columns = ['_'.join(col_levels).strip() for col_levels in df_flat_cols.columns.values]
df_flat_cols = df_flat_cols.reset_index()
print("Flattened MultiIndex Columns (joined):")
print(df_flat_cols)

Output:

Pivot Table with MultiIndex Columns:
Quantity Sales
Product A B C A B C
Region
North 18 15 0 220 150 0
South 12 23 0 200 230 0
West 20 0 7 300 0 90
Flattened MultiIndex Columns (joined):
Region Quantity_A Quantity_B Quantity_C Sales_A Sales_B Sales_C
0 North 18 15 0 220 150 0
1 South 12 23 0 200 230 0
2 West 20 0 7 300 0 90

Option 2: Option 2: If you only care about one level of the column MultiIndex

import pandas as pd

df = pd.DataFrame({
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Month': ['Jan', 'Jan', 'Jan', 'Feb', 'Feb', 'Feb', 'Jan', 'Feb'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 15, 12, 5, 8, 20, 18, 7] # Added Quantity for demo
})

pivot_multi_col = df.pivot_table(
index='Region',
columns='Product',
values=['Sales', 'Quantity'], # Multiple values -> MultiIndex columns
aggfunc='sum',
fill_value=0
)
print("Pivot Table with MultiIndex Columns:")
print(pivot_multi_col)
print()

# If you only care about one level of the column MultiIndex (e.g., the inner 'Product')
df_flat_cols_droplevel = pivot_multi_col.copy()
df_flat_cols_droplevel.columns = df_flat_cols_droplevel.columns.droplevel(0) # Drop the 'Sales'/'Quantity' level
df_flat_cols_droplevel.columns.name = None # Remove the 'Product' name
df_flat_cols_droplevel = df_flat_cols_droplevel.reset_index()
print("Flattened MultiIndex Columns (droplevel):")
print(df_flat_cols_droplevel)

Output:

Pivot Table with MultiIndex Columns:
Quantity Sales
Product A B C A B C
Region
North 18 15 0 220 150 0
South 12 23 0 200 230 0
West 20 0 7 300 0 90

Flattened MultiIndex Columns (droplevel):
Region A B C A B C
0 North 18 15 0 220 150 0
1 South 12 23 0 200 230 0
2 West 20 0 7 300 0 90
note

This might lead to duplicate column names if not handled carefully.

Renaming Columns After Flattening (Optional)

After you have a flat DataFrame, you might want to rename the columns for clarity, especially if the original column names from the pivot table values (like 'A', 'B', 'C' in our example) aren't descriptive enough.

import pandas as pd

df_flattened = pd.DataFrame({
'Region': ['North', 'South', 'West'],
'A': [220, 200, 300],
'B': [150, 230, 0],
'C': [0, 0, 90]
})

df_renamed = df_flattened.rename(columns={
'A': 'Sales_Product_A',
'B': 'Sales_Product_B',
'C': 'Sales_Product_C'
})

# Or assign to df.columns directly if replacing all:
# df_flattened.columns = ['Region_Name', 'Sales_Product_A', 'Sales_Product_B', 'Sales_Product_C']
print("DataFrame with Renamed Columns:")
print(df_renamed)

Output:

DataFrame with Renamed Columns:
Region Sales_Product_A Sales_Product_B Sales_Product_C
0 North 220 150 0
1 South 200 230 0
2 West 300 0 90

Conclusion

Converting a Pandas pivot table back into a standard, "flat" DataFrame typically involves two main steps:

  1. Handling the column index: Remove the columns.name attribute (e.g., df_pivot.columns.name = None) or use rename_axis(None, axis=1) after resetting the row index. If the columns themselves form a MultiIndex, you might need to flatten them first (e.g., by joining level names or using droplevel()).
  2. Handling the row index: Use df_pivot.reset_index() to move the pivot table's row index levels into regular data columns.

The DataFrame.to_records() method followed by pd.DataFrame() also provides a concise way to achieve flattening. Choose the method that you find most readable and that best suits the structure of your pivot table, especially if dealing with MultiIndex columns.