Python Pandas: How to Fix "ValueError: Index contains duplicate entries, cannot reshape" (with pivot)

The DataFrame.pivot() method in Pandas is a powerful tool for reshaping data from a "long" to a "wide" format, similar to creating a pivot table in a spreadsheet. However, a strict requirement for pivot() is that the combination of values in the index and columns arguments must be unique for each row in the original DataFrame. If there are duplicate entries for these chosen index/column combinations, Pandas cannot unambiguously determine which value should populate the cell in the reshaped DataFrame, leading to the ValueError: Index contains duplicate entries, cannot reshape.

This guide explains why duplicate index/column value combinations cause this ValueError with pivot(), demonstrates how to reproduce it, and provides robust solutions: primarily DataFrame.pivot_table() for aggregation, along with alternatives such as groupby().unstack(), removing duplicates before pivoting, or resetting the index.

Understanding the Error: The Uniqueness Requirement of pivot()

The DataFrame.pivot(index='idx_col', columns='col_to_be_headers', values='val_col') method reshapes your DataFrame by:

  • Making unique values from idx_col the new row index.
  • Making unique values from col_to_be_headers the new column headers.
  • Filling the cells of this new grid with values from val_col.

For this to work, each pair of (idx_col value, col_to_be_headers value) in your original DataFrame must correspond to only one val_col value. If you have multiple rows where, for example, id='A' and name='X' both appear, pivot() doesn't know which salary value to put in the cell at row 'A', column 'X'. This ambiguity causes the "Index contains duplicate entries, cannot reshape" error. The "Index" in this error message refers to the internal composite index Pandas tries to build from your specified index and columns arguments for the pivot operation.
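To see the mechanics when the requirement is met, here is a minimal sketch of a successful pivot() on data where every (index, columns) pair is unique; the df_unique, store, product, and price names are purely illustrative and not part of the examples below:

import pandas as pd

# Every (store, product) pair occurs exactly once, so pivot() can reshape unambiguously
df_unique = pd.DataFrame({
    'store': ['S1', 'S1', 'S2', 'S2'],
    'product': ['pen', 'book', 'pen', 'book'],
    'price': [2, 10, 3, 12]
})

wide = df_unique.pivot(index='store', columns='product', values='price')
print(wide)
# product  book  pen
# store
# S1         10    2
# S2         12    3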

Reproducing the Error: pivot() with Duplicate Index/Column Combinations

Consider a DataFrame where combinations of id and product_category are not unique.

import pandas as pd

df_sales = pd.DataFrame({
    'order_id': [101, 101, 102, 102, 103, 103],
    'product_category': ['Electronics', 'Books', 'Electronics', 'Books', 'Apparel', 'Books'],
    'sales_amount': [200, 50, 150, 30, 100, 40]
})
print("Original DataFrame:")
print(df_sales)
print()

# If we try to pivot with order_id as index and product_category as columns:
# For order_id=101, we have two different product_category values ('Electronics', 'Books'), which is fine.
# The issue would arise if, for example, we had:
# order_id=101, product_category='Electronics', sales_amount=200
# order_id=101, product_category='Electronics', sales_amount=250 <-- Duplicate (101, 'Electronics') pair

# Let's create a more direct example of the error:
df_error_example = pd.DataFrame({
'id': [1, 1, 2, 2, 3, 3], # 'id' will be index
'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'], # 'name' will be columns
'salary': [100, 200, 300, 400, 500, 600] # 'salary' will be values
})
# Here, (id=1, name='Alice') appears twice with different salaries (100 and 200).
print("DataFrame that will cause error with pivot():")
print(df_error_example)
print()

try:
    # ⛔️ ValueError: Index contains duplicate entries, cannot reshape
    # Because for id=1 and name='Alice', there are two salary values (100, 200),
    # pivot() doesn't know which one to pick.
    df_pivoted_error = df_error_example.pivot(index='id', columns='name', values='salary')
    print(df_pivoted_error)
except ValueError as e:
    print(f"Error: {e}")

Output:

Original DataFrame:
   order_id product_category  sales_amount
0       101      Electronics           200
1       101            Books            50
2       102      Electronics           150
3       102            Books            30
4       103          Apparel           100
5       103            Books            40

DataFrame that will cause error with pivot():
   id   name  salary
0   1  Alice     100
1   1  Alice     200
2   2    Bob     300
3   2    Bob     400
4   3   Carl     500
5   3   Carl     600

Error: Index contains duplicate entries, cannot reshape

Solution 1: Using DataFrame.pivot_table() with an Aggregation Function (Recommended)

If your data contains duplicate index/column combinations and you want to aggregate those duplicates (e.g., sum, mean, count, first, last), DataFrame.pivot_table() is the correct tool. Its aggfunc parameter specifies how to combine the multiple values that map to a single cell.

Aggregating Duplicates (e.g., sum, mean, count)

import pandas as pd

# df_error_example defined as before
df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],  # 'id' will be index
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],  # 'name' will be columns
    'salary': [100, 200, 300, 400, 500, 600]  # 'salary' will be values
})

# ✅ Use pivot_table with aggfunc='sum' to sum salaries for duplicate (id, name) pairs
df_pivoted_sum = df_error_example.pivot_table(
    index='id',
    columns='name',
    values='salary',
    aggfunc='sum'  # Aggregate duplicate entries by summing their 'salary'
)
print("Pivot table with aggfunc='sum':")
print(df_pivoted_sum)
print()

# Example with aggfunc='mean'
df_pivoted_mean = df_error_example.pivot_table(
    index='id', columns='name', values='salary', aggfunc='mean'
)
print("Pivot table with aggfunc='mean':")
print(df_pivoted_mean)

Output:

Pivot table with aggfunc='sum':
name  Alice    Bob    Carl
id
1     300.0    NaN     NaN
2       NaN  700.0     NaN
3       NaN    NaN  1100.0

Pivot table with aggfunc='mean':
name  Alice    Bob   Carl
id
1     150.0    NaN    NaN
2       NaN  350.0    NaN
3       NaN    NaN  550.0

Other common aggfunc values include 'first', 'last', 'count', 'min', 'max', or even custom functions.
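As a brief sketch (reusing df_error_example from above), aggfunc='count' tallies how many rows map to each cell, and aggfunc can also be a list of functions, which produces one column group per function:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# Count how many rows fall into each (id, name) cell
df_counts = df_error_example.pivot_table(
    index='id', columns='name', values='salary', aggfunc='count'
)
print(df_counts)

# A list of functions yields a MultiIndex on the columns: one block per function
df_min_max = df_error_example.pivot_table(
    index='id', columns='name', values='salary', aggfunc=['min', 'max']
)
print(df_min_max)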

Default Aggregation in pivot_table() (mean)

If you call pivot_table() without specifying aggfunc, it defaults to 'mean'.
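A minimal sketch, again reusing df_error_example, showing that omitting aggfunc gives the same result as the explicit aggfunc='mean' example above:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# No aggfunc specified, so duplicates are averaged by default
df_default = df_error_example.pivot_table(index='id', columns='name', values='salary')
print(df_default)  # Matches the aggfunc='mean' output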

Solution 2: Using DataFrame.groupby() with an Aggregation and .unstack()

This is another powerful way to achieve a similar result to pivot_table(), especially when you need to aggregate first.

  1. Group by the columns that will form your new index and columns.
  2. Apply an aggregation function (like mean(), sum(), first()) to the values column.
  3. Use .unstack() to pivot one of the group levels to become column headers.

import pandas as pd

# df_error_example defined as before
df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],  # 'id' will be index
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],  # 'name' will be columns
    'salary': [100, 200, 300, 400, 500, 600]  # 'salary' will be values
})

# ✅ Group by 'id' and 'name', calculate mean of 'salary', then unstack 'name' to columns
df_grouped_unstacked = df_error_example.groupby(
    ['id', 'name']  # These will form the MultiIndex before unstacking
)['salary'].mean().unstack(level='name')  # Unstack the 'name' level

print("Reshaped DataFrame using groupby().mean().unstack():")
print(df_grouped_unstacked)

Output:

Reshaped DataFrame using groupby().mean().unstack():
name  Alice    Bob   Carl
id
1     150.0    NaN    NaN
2       NaN  350.0    NaN
3       NaN    NaN  550.0

Note: If you used .sum().unstack() or .first().unstack(), you'd get results equivalent to pivot_table() with aggfunc='sum' or aggfunc='first'.
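For completeness, a short sketch of the sum variant, which should mirror the aggfunc='sum' pivot table shown earlier:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# Sum within each (id, name) group, then unstack 'name' into column headers
df_sum_unstacked = df_error_example.groupby(['id', 'name'])['salary'].sum().unstack('name')
print(df_sum_unstacked)  # Equivalent to pivot_table(..., aggfunc='sum')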

Solution 3: Removing Duplicate Entries Before Pivoting (Data-Altering)

If the duplicate entries are erroneous or you specifically want to keep only one instance (e.g., the first or last occurrence) and discard others before pivoting, you can use DataFrame.drop_duplicates(). This permanently alters your data by removing rows.

import pandas as pd

# df_error_example defined as before
df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],  # 'id' will be index
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],  # 'name' will be columns
    'salary': [100, 200, 300, 400, 500, 600]  # 'salary' will be values
})

# ✅ Drop duplicates based on 'id' and 'name', keeping the first occurrence
df_no_duplicates = df_error_example.drop_duplicates(subset=['id', 'name'], keep='first')
print("DataFrame after drop_duplicates(keep='first'):")
print(df_no_duplicates)
print()

# Now pivot() will work because the (id, name) combinations are unique
df_pivoted_after_drop = df_no_duplicates.pivot(index='id', columns='name', values='salary')
print("Pivoted DataFrame after dropping duplicates:")
print(df_pivoted_after_drop)

Output:

DataFrame after drop_duplicates(keep='first'):
   id   name  salary
0   1  Alice     100
2   2    Bob     300
4   3   Carl     500

Pivoted DataFrame after dropping duplicates:
name  Alice    Bob   Carl
id
1     100.0    NaN    NaN
2       NaN  300.0    NaN
3       NaN    NaN  500.0
Note: The keep parameter in drop_duplicates() can be 'first', 'last', or False (which drops all duplicated rows).
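For illustration, a small sketch of the other keep options on df_error_example; note that keep=False removes every row here because each (id, name) pair is duplicated:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# keep='last' retains the final row of each (id, name) group (salaries 200, 400, 600)
print(df_error_example.drop_duplicates(subset=['id', 'name'], keep='last'))

# keep=False drops every row that has a duplicate; here that leaves an empty DataFrame
print(df_error_example.drop_duplicates(subset=['id', 'name'], keep=False))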

Solution 4: Resetting the Index (If Duplicates are in Current Index, Not Data Columns)

This solution is more relevant if the ValueError: Index contains duplicate entries... error arises because the DataFrame's current index (not the columns you plan to use for index and columns parameters in pivot()) has duplicates, and an operation implicitly uses this index. For df.pivot(), the error is almost always due to non-unique combinations of values in the columns specified for index and columns parameters.

If, however, you were doing an operation like df.set_index(['id', 'name']).unstack() and the resulting MultiIndex from set_index had duplicates, then df.reset_index().groupby(...).unstack() (as in Solution 2) or df.reset_index().pivot_table(...) (as in Solution 1) would be more appropriate as they create a fresh, unique default index before the grouping/pivoting logic.
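To make this concrete, here is a sketch (using df_error_example) of the same error arising from a non-unique MultiIndex built with set_index(), and one way to recover by resetting the index and aggregating as in Solution 1:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# (1, 'Alice') appears twice in the MultiIndex, so unstack() hits the same ValueError
indexed = df_error_example.set_index(['id', 'name'])
try:
    indexed['salary'].unstack('name')
except ValueError as e:
    print(f"Error: {e}")

# Resetting to a fresh default index and aggregating resolves the ambiguity
df_fixed = indexed.reset_index().pivot_table(
    index='id', columns='name', values='salary', aggfunc='mean'
)
print(df_fixed)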

Key Difference: pivot() vs. pivot_table()

  • pivot(): Purely a reshaping operation. It requires unique combinations of index and columns values. It does not perform aggregation.
  • pivot_table(): More flexible. It can handle duplicate index/columns combinations by aggregating the corresponding values using a specified aggfunc.

Conclusion

The ValueError: Index contains duplicate entries, cannot reshape when using DataFrame.pivot() is a clear indication that your data, in its current form, is not suitable for direct reshaping with pivot() because of non-unique combinations in the columns you've chosen for the new index and new column headers. The primary solutions are:

  1. Use DataFrame.pivot_table(index=..., columns=..., values=..., aggfunc=...): This is the recommended approach as it allows you to explicitly define how to aggregate the multiple values that correspond to a single cell in the pivoted output.
  2. Use DataFrame.groupby([index_cols, column_cols])[value_col].agg_func().unstack(): A more manual but equally powerful way to aggregate and then reshape.
  3. Pre-process by drop_duplicates(subset=[index_cols, column_cols]): If you intend to discard duplicate entries and keep only one instance before pivoting. This changes your underlying data.

By understanding the uniqueness requirement of pivot() and leveraging pivot_table() or groupby().unstack() for aggregation, you can effectively reshape your data even when duplicates are present.