
Python Pandas: How to Fix "ValueError: Shape of passed values is (X, N), indices imply (X, M)"

The ValueError: Shape of passed values is (R1, C1), indices imply (R2, C2) (where R and C represent row and column counts) is a common error encountered when creating a Pandas DataFrame using the pd.DataFrame() constructor. This error signals a fundamental misalignment: the dimensions of the data you're providing do not match the dimensions defined by the index and/or columns arguments you've specified.

This guide will clearly dissect this ValueError, explain why the mismatch between data shape and index/column label specifications triggers it, and provide straightforward solutions, including how to correctly align your data, column names, and index labels for successful DataFrame creation.

Understanding the "Shape Mismatch" Error in pd.DataFrame()

When you create a Pandas DataFrame, for example, using pd.DataFrame(data, index=my_index, columns=my_columns), Pandas needs the provided data, my_index, and my_columns to be dimensionally consistent.

  • The number of items in my_columns must match the number of columns in your data.
  • The number of items in my_index must match the number of rows in your data.

The error "Shape of passed values is (R_data, C_data), indices imply (R_implied, C_implied)" tells you:

  • (R_data, C_data): The actual shape (rows, columns) of the data you passed.
  • (R_implied, C_implied): The shape Pandas expects, namely (len(index), len(columns)). If you omit index or columns, Pandas takes that dimension from the data itself.

If these don't align, the ValueError is raised.
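The rule can be sketched as a small pre-flight check. The helper below, check_constructor_shape, is hypothetical (it is not part of pandas) and assumes 2-D input. It compares the data's shape with (len(index), len(columns)), falling back to the data's own dimensions for any argument you leave out, and formats the same message pandas reports.

```python
import numpy as np
import pandas as pd


def check_constructor_shape(values, index=None, columns=None):
    """Hypothetical helper: mirror the row/column consistency rule that
    pd.DataFrame() enforces for 2-D array input, before calling it."""
    arr = np.asarray(values)
    passed = arr.shape  # (rows, columns) actually supplied
    n_rows = passed[0] if index is None else len(index)
    n_cols = passed[1] if columns is None else len(columns)
    implied = (n_rows, n_cols)  # (len(index), len(columns))
    if passed != implied:
        raise ValueError(
            f"Shape of passed values is {passed}, indices imply {implied}"
        )
    return implied


data = np.arange(12).reshape(3, 4)  # 3 rows, 4 columns
try:
    check_constructor_shape(data, columns=['a', 'b', 'c'])
except ValueError as e:
    print(e)  # Shape of passed values is (3, 4), indices imply (3, 3)
```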

Scenario 1: Mismatch Between Data's Columns and columns Argument

This is the most common trigger for this error. The number of column names you provide in the columns list doesn't match the actual number of columns present in your input data.

Reproducing the Error

import pandas as pd
import numpy as np

# Sample data: 3 rows, 4 columns
array_data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])
print(f"Shape of array_data: {array_data.shape}")

try:
    # ⛔️ Incorrect: Data has 4 columns, but only 3 column names are provided
    df_error = pd.DataFrame(array_data, columns=['Col_A', 'Col_B', 'Col_C'])
    print(df_error)
except ValueError as e:
    print(f"Error: {e}")

Output:

Shape of array_data: (3, 4)
Error: Shape of passed values is (3, 4), indices imply (3, 3)

The data array_data has 4 columns, but we only supplied 3 names in columns=['Col_A', 'Col_B', 'Col_C']. Pandas expects data for 3 columns but received data for 4.

Solution A: Match the Number of Column Names to Data Columns

Ensure the list passed to the columns argument has the same number of elements as there are columns in your input data.

import pandas as pd
import numpy as np

# array_data defined as before (3 rows, 4 columns)
array_data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])

# ✅ Correct: Provide 4 column names for the 4 columns in array_data
df_correct_cols = pd.DataFrame(array_data, columns=['Metric1', 'Metric2', 'Metric3', 'Metric4'])

print("DataFrame with correct number of column names:")
print(df_correct_cols)

Output:

DataFrame with correct number of column names:
   Metric1  Metric2  Metric3  Metric4
0       10       20       30       40
1       50       60       70       80
2       90      100      110      120
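If the names are generic, you can also derive them from the data's width so the two lengths can't drift apart. A minimal sketch; the Metric prefix is just an illustrative choice:

```python
import numpy as np
import pandas as pd

array_data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])

# Build exactly one name per column from the data's own width (shape[1]).
n_cols = array_data.shape[1]
column_names = [f"Metric{i}" for i in range(1, n_cols + 1)]

df = pd.DataFrame(array_data, columns=column_names)
print(df.columns.tolist())  # ['Metric1', 'Metric2', 'Metric3', 'Metric4']
```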

Solution B: Transpose Input Data if Orientation is Incorrect

Sometimes your input data is "transposed" relative to the DataFrame you want: each row of the array holds one field rather than one record. If array_data actually stores 4 records of 3 fields each, laid out column by column, transpose it first so that it has 4 rows and 3 columns, matching the 3 column names.

import pandas as pd
import numpy as np

# array_data defined as before (3 rows, 4 columns)
array_data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])

print(f"Original array_data shape: {array_data.shape}")
print()

transposed_data = array_data.T # Transpose the array
print(f"Transposed_data shape: {transposed_data.shape}")
print()

# ✅ Correct: Transposed data now has 3 columns, matching the 3 column names
df_transposed = pd.DataFrame(transposed_data, columns=['FeatureX', 'FeatureY', 'FeatureZ'])

print("DataFrame from transposed data:")
print(df_transposed)

Output:

Original array_data shape: (3, 4)

Transposed_data shape: (4, 3)

DataFrame from transposed data:
   FeatureX  FeatureY  FeatureZ
0        10        50        90
1        20        60       100
2        30        70       110
3        40        80       120

Transposing changes rows to columns and columns to rows. Use this if the logical structure of your data is oriented differently than required by your desired column names.
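If the same kind of data sometimes arrives in either orientation, the decision can be made explicit. The helper below, frame_with_columns, is a hypothetical sketch (not a pandas function) that transposes only when the data's columns don't match the names but its rows do:

```python
import numpy as np
import pandas as pd


def frame_with_columns(values, columns):
    """Hypothetical helper: build a DataFrame, transposing the input only
    when its column count doesn't match `columns` but its row count does."""
    arr = np.asarray(values)
    if arr.shape[1] != len(columns) and arr.shape[0] == len(columns):
        arr = arr.T  # data was stored one field per row
    return pd.DataFrame(arr, columns=columns)


array_data = np.array([
    [10, 20, 30, 40],
    [50, 60, 70, 80],
    [90, 100, 110, 120]
])
df = frame_with_columns(array_data, ['FeatureX', 'FeatureY', 'FeatureZ'])
print(df.shape)  # (4, 3)
```

For square data both orientations fit, so the helper leaves the data as-is; only you can tell which orientation is correct there. If neither dimension matches, pd.DataFrame still raises the usual ValueError.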

Scenario 2: Mismatch Between Data's Rows and index Argument

Similarly, if you provide an explicit index argument, its length must match the number of rows in your input data.

Reproducing the Error

import pandas as pd
import numpy as np

# Sample data: 3 rows, 2 columns
row_data = np.array([
    ['Alice', 25],
    ['Bob', 30],
    ['Charlie', 22]
])
print(f"Shape of row_data: {row_data.shape}")

custom_index_labels = ['Person1', 'Person2']  # Only 2 index labels

try:
    # ⛔️ Incorrect: Data has 3 rows, but only 2 index labels are provided
    df_row_error = pd.DataFrame(row_data,
                                columns=['Name', 'Age'],
                                index=custom_index_labels)
    print(df_row_error)
except ValueError as e:
    print(f"Error: {e}")

Output:

Shape of row_data: (3, 2)
Error: Shape of passed values is (3, 2), indices imply (2, 2)

The data row_data has 3 rows, but we only provided 2 labels for the index.

Solution: Match the Number of Index Labels to Data Rows

Ensure the list passed to the index argument has the same number of elements as there are rows in your input data.

import pandas as pd
import numpy as np

# row_data defined as above (3 rows, 2 columns)
row_data = np.array([
    ['Alice', 25],
    ['Bob', 30],
    ['Charlie', 22]
])

# ✅ Correct: Provide 3 index labels for the 3 rows in row_data
correct_index_labels = ['Participant_A', 'Participant_B', 'Participant_C']
df_correct_index = pd.DataFrame(row_data,
                                columns=['Name', 'Age'],
                                index=correct_index_labels)

print("DataFrame with correct number of index labels:")
print(df_correct_index)

Output:

DataFrame with correct number of index labels:
                  Name  Age
Participant_A    Alice   25
Participant_B      Bob   30
Participant_C  Charlie   22
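As with column names, index labels can be generated from the row count, or attached after construction. A minimal sketch; the Participant_ naming is only illustrative:

```python
import numpy as np
import pandas as pd

row_data = np.array([
    ['Alice', 25],
    ['Bob', 30],
    ['Charlie', 22]
])

# Derive one label per row from the row count, so the lengths always agree.
labels = [f"Participant_{i}" for i in range(1, len(row_data) + 1)]
df = pd.DataFrame(row_data, columns=['Name', 'Age'], index=labels)

# Or build the frame first and attach labels later; set_axis raises a
# ValueError if the number of labels differs from the number of rows.
df_later = pd.DataFrame(row_data, columns=['Name', 'Age']).set_axis(labels, axis=0)

print(df.index.tolist())  # ['Participant_1', 'Participant_2', 'Participant_3']
```

One side effect of packing mixed data into a single NumPy array: NumPy stores every value as a string, so Age holds '25', '30' and '22'. df['Age'].astype(int) converts the column back to integers if you need numbers.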

Scenario 3: Combined Data, Index, and Column Mismatches

It's possible to have mismatches in both the column and index specifications at once. Because the "implied" shape is simply (len(index), len(columns)), the error message exposes both discrepancies in a single line. The fix is to make both the columns list and the index list match the dimensions of your input data.

import pandas as pd
import numpy as np

# Data: 2 rows, 3 columns
combined_data = np.array([[1,2,3], [4,5,6]])

# Columns: Specify 2 names (mismatch)
# Index: Specify 3 labels (mismatch)
try:
    df_combined_error = pd.DataFrame(combined_data,
                                     columns=['X', 'Y'],
                                     index=['r1', 'r2', 'r3'])
except ValueError as e:
    print(f"Error with combined mismatch: {e}")

Output:

Error with combined mismatch: Shape of passed values is (2, 3), indices imply (3, 2)

Solution: supply three column names for the three columns and two index labels for the two rows:

df_ok = pd.DataFrame(combined_data, columns=['X', 'Y', 'Z'], index=['r1', 'r2'])
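A runnable version of that fix is below, together with a variant that omits index entirely. When the row labels carry no meaning, leaving index out lets pandas supply a default RangeIndex, which by construction has exactly one label per row.

```python
import numpy as np
import pandas as pd

combined_data = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns

# Fixed: 3 column names for 3 columns, 2 index labels for 2 rows
df_ok = pd.DataFrame(combined_data, columns=['X', 'Y', 'Z'], index=['r1', 'r2'])

# Omitting index: pandas supplies a default RangeIndex (0, 1), one label per row
df_default = pd.DataFrame(combined_data, columns=['X', 'Y', 'Z'])

print(df_ok.shape, df_default.index.tolist())  # (2, 3) [0, 1]
```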

Debugging Tip: Check Shapes with .shape

Before creating your DataFrame, if you're unsure about dimensions, print the .shape attribute of your input data (especially if it's a NumPy array or another DataFrame). Also, check the len() of your columns and index lists.

import numpy as np

my_data = np.random.rand(5, 3) # 5 rows, 3 columns
my_columns = ['A', 'B', 'C', 'D'] # 4 column names - mismatch!
my_index = ['r1', 'r2', 'r3', 'r4', 'r5'] # 5 index labels - matches data rows

print(f"Data shape: {my_data.shape}")
print(f"Number of column names: {len(my_columns)}")
print(f"Number of index labels: {len(my_index)}")

Output:

Data shape: (5, 3)
Number of column names: 4
Number of index labels: 5
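Plain nested lists have no .shape attribute, but np.shape() reports the same (rows, columns) tuple for them. The helper below, shape_report, is a hypothetical pre-flight check (not a pandas function) that assumes 2-D, rectangular input and lists every mismatch instead of stopping at the first:

```python
import numpy as np


def shape_report(data, columns=None, index=None):
    """Hypothetical pre-flight check: describe every dimension mismatch."""
    n_rows, n_cols = np.shape(data)  # works for arrays and nested lists alike
    problems = []
    if columns is not None and len(columns) != n_cols:
        problems.append(f"{len(columns)} column names for {n_cols} columns")
    if index is not None and len(index) != n_rows:
        problems.append(f"{len(index)} index labels for {n_rows} rows")
    return problems


list_data = [[i, i + 1, i + 2] for i in range(5)]  # plain list: 5 rows, 3 columns
print(shape_report(list_data,
                   columns=['A', 'B', 'C', 'D'],
                   index=['r1', 'r2', 'r3', 'r4', 'r5']))
# ['4 column names for 3 columns']
```

An empty list means the row and column counts line up with the labels you plan to pass.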

Distinction from InvalidIndexError

It's important to distinguish this ValueError from pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects.

  • ValueError: Shape of passed values...: Occurs during pd.DataFrame() creation if the lengths of provided data, columns, and/or index do not align. It's about mismatched dimensions.
  • InvalidIndexError: Reindexing only valid...: Occurs during operations that must align data along an existing axis, such as pd.concat(..., axis=1), when that axis contains duplicate labels. It's about non-unique values within an axis, not about the shape of the data passed at construction.
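For contrast, a minimal way to trigger InvalidIndexError: concatenating side by side (axis=1) forces pandas to align the row indexes, and a duplicate label makes that alignment ambiguous. The frame contents here are arbitrary.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2]}, index=['x', 'x'])   # duplicate label 'x'
right = pd.DataFrame({'b': [3, 4]}, index=['x', 'y'])

caught = None
try:
    # axis=1 aligns rows by index label; 'x' appears twice in `left`
    pd.concat([left, right], axis=1)
except pd.errors.InvalidIndexError as e:
    caught = e
    print(f"InvalidIndexError: {e}")
```

Note that every shape here is valid; only the duplicated label causes the failure.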

Conclusion

The ValueError: Shape of passed values is X, indices imply Y in Pandas is a clear indicator that the dimensions of your input data are inconsistent with the dimensions implied by the columns and/or index labels you've provided to the pd.DataFrame() constructor. The solution invariably involves:

  1. Verifying the shape of your input data (e.g., using data.shape).
  2. Ensuring the number of elements in your columns list matches the number of columns in your data.
  3. Ensuring the number of elements in your index list (if provided) matches the number of rows in your data.
  4. Considering transposing your data (e.g., data.T) if its orientation is the source of the mismatch.

By carefully aligning these dimensions, you can successfully create your Pandas DataFrames without encountering this shape-related error.