Python Pandas: How to Fix "ValueError: Index contains duplicate entries, cannot reshape" (with pivot)

The DataFrame.pivot() method in Pandas is a powerful tool for reshaping data from a "long" to a "wide" format, similar to creating a pivot table in a spreadsheet. However, a strict requirement for pivot() is that the combination of values in the index and columns arguments must be unique for each row in the original DataFrame. If there are duplicate entries for these chosen index/column combinations, Pandas cannot unambiguously determine which value should populate the cell in the reshaped DataFrame, leading to the ValueError: Index contains duplicate entries, cannot reshape.

This guide explains why duplicate index/column value combinations cause this ValueError with pivot(), demonstrates how to reproduce it, and provides robust solutions: primarily DataFrame.pivot_table() for aggregation, along with alternatives such as groupby().unstack(), removing duplicates before pivoting, or resetting the index.

Understanding the Error: The Uniqueness Requirement of pivot()

The DataFrame.pivot(index='idx_col', columns='col_to_be_headers', values='val_col') method reshapes your DataFrame by:

  • Making unique values from idx_col the new row index.
  • Making unique values from col_to_be_headers the new column headers.
  • Filling the cells of this new grid with values from val_col.

For this to work, each pair of (idx_col value, col_to_be_headers value) in your original DataFrame must correspond to only one val_col value. If you have multiple rows where, for example, id='A' and name='X' both appear, pivot() doesn't know which salary value to put in the cell at row 'A', column 'X'. This ambiguity causes the "Index contains duplicate entries, cannot reshape" error. The "Index" in this error message refers to the internal composite index Pandas tries to build from your specified index and columns arguments for the pivot operation.
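To see the mechanics when the requirement is met, here is a minimal sketch of a successful pivot() on data where every (index, columns) pair is unique; the df_unique, store, product, and price names are purely illustrative and not part of the examples below:

import pandas as pd

# Every (store, product) pair occurs exactly once, so pivot() can reshape unambiguously
df_unique = pd.DataFrame({
    'store': ['S1', 'S1', 'S2', 'S2'],
    'product': ['pen', 'book', 'pen', 'book'],
    'price': [2, 10, 3, 12]
})

wide = df_unique.pivot(index='store', columns='product', values='price')
print(wide)
# product  book  pen
# store
# S1         10    2
# S2         12    3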

Reproducing the Error: pivot() with Duplicate Index/Column Combinations

Consider a DataFrame where combinations of id and product_category are not unique.

import pandas as pd

df_sales = pd.DataFrame({
    'order_id': [101, 101, 102, 102, 103, 103],
    'product_category': ['Electronics', 'Books', 'Electronics', 'Books', 'Apparel', 'Books'],
    'sales_amount': [200, 50, 150, 30, 100, 40]
})
print("Original DataFrame:")
print(df_sales)
print()

# If we try to pivot with order_id as index and product_category as columns:
# For order_id=101, we have two different product_category values ('Electronics', 'Books'), which is fine.
# The issue would arise if, for example, we had:
# order_id=101, product_category='Electronics', sales_amount=200
# order_id=101, product_category='Electronics', sales_amount=250 <-- Duplicate (101, 'Electronics') pair

# Let's create a more direct example of the error:
df_error_example = pd.DataFrame({
'id': [1, 1, 2, 2, 3, 3], # 'id' will be index
'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'], # 'name' will be columns
'salary': [100, 200, 300, 400, 500, 600] # 'salary' will be values
})
# Here, (id=1, name='Alice') appears twice with different salaries (100 and 200).
print("DataFrame that will cause error with pivot():")
print(df_error_example)
print()

try:
    # ⛔️ ValueError: Index contains duplicate entries, cannot reshape
    # Because for id=1 and name='Alice', there are two salary values (100, 200),
    # pivot() doesn't know which one to pick.
    df_pivoted_error = df_error_example.pivot(index='id', columns='name', values='salary')
    print(df_pivoted_error)
except ValueError as e:
    print(f"Error: {e}")

Output:

Original DataFrame:
   order_id product_category  sales_amount
0       101      Electronics           200
1       101            Books            50
2       102      Electronics           150
3       102            Books            30
4       103          Apparel           100
5       103            Books            40

DataFrame that will cause error with pivot():
   id   name  salary
0   1  Alice     100
1   1  Alice     200
2   2    Bob     300
3   2    Bob     400
4   3   Carl     500
5   3   Carl     600

Error: Index contains duplicate entries, cannot reshape

Solution 1: Using DataFrame.pivot_table() with an Aggregation Function (Recommended)

If your data contains duplicate index/column combinations and you want to aggregate those duplicates (e.g., sum, mean, count, first, last), DataFrame.pivot_table() is the correct tool. Its aggfunc parameter specifies how to combine the multiple values that map to a single cell.

Aggregating Duplicates (e.g., sum, mean, count)

import pandas as pd

# df_error_example defined as before
df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],  # 'id' will be index
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],  # 'name' will be columns
    'salary': [100, 200, 300, 400, 500, 600]  # 'salary' will be values
})

# ✅ Use pivot_table with aggfunc='sum' to sum salaries for duplicate (id, name) pairs
df_pivoted_sum = df_error_example.pivot_table(
    index='id',
    columns='name',
    values='salary',
    aggfunc='sum'  # Aggregate duplicate entries by summing their 'salary'
)
print("Pivot table with aggfunc='sum':")
print(df_pivoted_sum)
print()

# Example with aggfunc='mean'
df_pivoted_mean = df_error_example.pivot_table(
    index='id', columns='name', values='salary', aggfunc='mean'
)
print("Pivot table with aggfunc='mean':")
print(df_pivoted_mean)

Output:

Pivot table with aggfunc='sum':
name  Alice    Bob    Carl
id
1     300.0    NaN     NaN
2       NaN  700.0     NaN
3       NaN    NaN  1100.0

Pivot table with aggfunc='mean':
name  Alice    Bob   Carl
id
1     150.0    NaN    NaN
2       NaN  350.0    NaN
3       NaN    NaN  550.0

Other common aggfunc values include 'first', 'last', 'count', 'min', 'max', or even custom functions.
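As a brief sketch (reusing df_error_example from above), aggfunc='count' tallies how many rows map to each cell, and aggfunc can also be a list of functions, which produces one column group per function:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# Count how many rows fall into each (id, name) cell
df_counts = df_error_example.pivot_table(
    index='id', columns='name', values='salary', aggfunc='count'
)
print(df_counts)

# A list of functions yields a MultiIndex on the columns: one block per function
df_min_max = df_error_example.pivot_table(
    index='id', columns='name', values='salary', aggfunc=['min', 'max']
)
print(df_min_max)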

Default Aggregation in pivot_table() (mean)

If you call pivot_table() without specifying aggfunc, it defaults to 'mean'.
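A minimal sketch, again reusing df_error_example, showing that omitting aggfunc gives the same result as the explicit aggfunc='mean' example above:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# No aggfunc specified, so duplicates are averaged by default
df_default = df_error_example.pivot_table(index='id', columns='name', values='salary')
print(df_default)  # Matches the aggfunc='mean' output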

Solution 2: Using DataFrame.groupby() with an Aggregation and .unstack()

This is another powerful way to achieve a similar result to pivot_table(), especially when you need to aggregate first.

  1. Group by the columns that will form your new index and columns.
  2. Apply an aggregation function (like mean(), sum(), first()) to the values column.
  3. Use .unstack() to pivot one of the group levels to become column headers.

import pandas as pd

# df_error_example defined as before
df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],  # 'id' will be index
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],  # 'name' will be columns
    'salary': [100, 200, 300, 400, 500, 600]  # 'salary' will be values
})

# ✅ Group by 'id' and 'name', calculate mean of 'salary', then unstack 'name' to columns
df_grouped_unstacked = df_error_example.groupby(
    ['id', 'name']  # These will form the MultiIndex before unstacking
)['salary'].mean().unstack(level='name')  # Unstack the 'name' level

print("Reshaped DataFrame using groupby().mean().unstack():")
print(df_grouped_unstacked)

Output:

Reshaped DataFrame using groupby().mean().unstack():
name  Alice    Bob   Carl
id
1     150.0    NaN    NaN
2       NaN  350.0    NaN
3       NaN    NaN  550.0

Note: If you used .sum().unstack() or .first().unstack(), you'd get results equivalent to pivot_table() with aggfunc='sum' or aggfunc='first'.
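For completeness, a short sketch of the sum variant, which should mirror the aggfunc='sum' pivot table shown earlier:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# Sum within each (id, name) group, then unstack 'name' into column headers
df_sum_unstacked = df_error_example.groupby(['id', 'name'])['salary'].sum().unstack('name')
print(df_sum_unstacked)  # Equivalent to pivot_table(..., aggfunc='sum')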

Solution 3: Removing Duplicate Entries Before Pivoting (Data-Altering)

If the duplicate entries are erroneous or you specifically want to keep only one instance (e.g., the first or last occurrence) and discard others before pivoting, you can use DataFrame.drop_duplicates(). This permanently alters your data by removing rows.

import pandas as pd

# df_error_example defined as before
df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],  # 'id' will be index
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],  # 'name' will be columns
    'salary': [100, 200, 300, 400, 500, 600]  # 'salary' will be values
})

# ✅ Drop duplicates based on 'id' and 'name', keeping the first occurrence
df_no_duplicates = df_error_example.drop_duplicates(subset=['id', 'name'], keep='first')
print("DataFrame after drop_duplicates(keep='first'):")
print(df_no_duplicates)
print()

# Now pivot() will work because the (id, name) combinations are unique
df_pivoted_after_drop = df_no_duplicates.pivot(index='id', columns='name', values='salary')
print("Pivoted DataFrame after dropping duplicates:")
print(df_pivoted_after_drop)

Output:

DataFrame after drop_duplicates(keep='first'):
   id   name  salary
0   1  Alice     100
2   2    Bob     300
4   3   Carl     500

Pivoted DataFrame after dropping duplicates:
name  Alice    Bob   Carl
id
1     100.0    NaN    NaN
2       NaN  300.0    NaN
3       NaN    NaN  500.0
Note: The keep parameter in drop_duplicates() can be 'first', 'last', or False (which drops all duplicated rows).
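For illustration, a small sketch of the other keep options on df_error_example; note that keep=False removes every row here because each (id, name) pair is duplicated:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# keep='last' retains the final row of each (id, name) group (salaries 200, 400, 600)
print(df_error_example.drop_duplicates(subset=['id', 'name'], keep='last'))

# keep=False drops every row that has a duplicate; here that leaves an empty DataFrame
print(df_error_example.drop_duplicates(subset=['id', 'name'], keep=False))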

Solution 4: Resetting the Index (If Duplicates are in Current Index, Not Data Columns)

This solution is more relevant if the ValueError: Index contains duplicate entries... error arises because the DataFrame's current index (not the columns you plan to use for index and columns parameters in pivot()) has duplicates, and an operation implicitly uses this index. For df.pivot(), the error is almost always due to non-unique combinations of values in the columns specified for index and columns parameters.

If, however, you were doing an operation like df.set_index(['id', 'name']).unstack() and the resulting MultiIndex from set_index had duplicates, then df.reset_index().groupby(...).unstack() (as in Solution 2) or df.reset_index().pivot_table(...) (as in Solution 1) would be more appropriate as they create a fresh, unique default index before the grouping/pivoting logic.
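To make this concrete, here is a sketch (using df_error_example) of the same error arising from a non-unique MultiIndex built with set_index(), and one way to recover by resetting the index and aggregating as in Solution 1:

import pandas as pd

df_error_example = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3],
    'name': ['Alice', 'Alice', 'Bob', 'Bob', 'Carl', 'Carl'],
    'salary': [100, 200, 300, 400, 500, 600]
})

# (1, 'Alice') appears twice in the MultiIndex, so unstack() hits the same ValueError
indexed = df_error_example.set_index(['id', 'name'])
try:
    indexed['salary'].unstack('name')
except ValueError as e:
    print(f"Error: {e}")

# Resetting to a fresh default index and aggregating resolves the ambiguity
df_fixed = indexed.reset_index().pivot_table(
    index='id', columns='name', values='salary', aggfunc='mean'
)
print(df_fixed)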

Key Difference: pivot() vs. pivot_table()

  • pivot(): Purely a reshaping operation. It requires unique combinations of index and columns values. It does not perform aggregation.
  • pivot_table(): More flexible. It can handle duplicate index/columns combinations by aggregating the corresponding values using a specified aggfunc.

Conclusion

The ValueError: Index contains duplicate entries, cannot reshape when using DataFrame.pivot() is a clear indication that your data, in its current form, is not suitable for direct reshaping with pivot() because of non-unique combinations in the columns you've chosen for the new index and new column headers. The primary solutions are:

  1. Use DataFrame.pivot_table(index=..., columns=..., values=..., aggfunc=...): This is the recommended approach as it allows you to explicitly define how to aggregate the multiple values that correspond to a single cell in the pivoted output.
  2. Use DataFrame.groupby([index_cols, column_cols])[value_col].agg_func().unstack(): A more manual but equally powerful way to aggregate and then reshape.
  3. Pre-process by drop_duplicates(subset=[index_cols, column_cols]): If you intend to discard duplicate entries and keep only one instance before pivoting. This changes your underlying data.

By understanding the uniqueness requirement of pivot() and leveraging pivot_table() or groupby().unstack() for aggregation, you can effectively reshape your data even when duplicates are present.