Python Pandas: How to Check if All Values in a Column (or Row) Are Equal
When cleaning or analyzing data with Pandas, you might need to verify if all values within a specific column are identical, or if all values across columns in a particular row are the same. This can be important for data validation, identifying constant features, or simplifying datasets.
This guide demonstrates various methods to check for equality of values within Pandas DataFrame columns and rows.
The Goal: Checking for Uniformity in Values
We aim to determine:
- If every value within one particular column is the same.
- If every value within each respective column is the same (i.e., each column is constant, but different columns can have different constant values).
- If all values across specified columns for a given row are identical.
Example DataFrame
import pandas as pd
import numpy as np # For some examples
data = {
    'ID': [101, 102, 103, 104, 105],
    'Category': ['A', 'A', 'A', 'A', 'A'],  # All values are 'A'
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive'],  # Mixed values
    'Value1': [10, 10, 10, 10, 10],  # All values are 10
    'Value2': [10, 20, 10, 20, 10],  # Mixed values
    'All_Same_Row_Example': [5, 5, 5, 5, 5]  # All values in this column are 5
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
ID Category Status Value1 Value2 All_Same_Row_Example
0 101 A Active 10 10 5
1 102 A Active 10 20 5
2 103 A Inactive 10 10 5
3 104 A Active 10 20 5
4 105 A Inactive 10 10 5
Check if ALL Values in a SINGLE Column Are Equal
Comparing All Elements to the First (NumPy approach - Recommended)
Convert the column to a NumPy array and check whether every element equals the first element. The comparison runs as one vectorized NumPy operation instead of a Python-level loop, so it is usually fast.
import pandas as pd
df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Category': ['A', 'A', 'A', 'A', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive'],
})
def all_values_equal_in_column(series):
    """Checks if all values in a Pandas Series are equal."""
    if series.empty:
        return True  # Or False, depending on how you want to treat an empty Series
    # Convert to NumPy array for efficient comparison
    arr = series.to_numpy()
    return (arr == arr[0]).all()
print(f"Are all values in 'Category' equal? {all_values_equal_in_column(df['Category'])}")
print(f"Are all values in 'Status' equal? {all_values_equal_in_column(df['Status'])}")
Output:
Are all values in 'Category' equal? True
Are all values in 'Status' equal? False
- series.to_numpy(): Converts the Pandas Series to a NumPy array.
- arr == arr[0]: Creates a boolean array comparing each element to the first.
- .all(): Returns True if all elements in the boolean array are True.
By default, NaN == NaN is False, so a column that contains NaN fails the check above even when every value is NaN. If NaNs should be considered equal:
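def all_values_equal_nan_aware(series):
    """Checks if all values are equal, treating NaN as equal to NaN."""
    if series.empty:
        return True
    first_val = series.iloc[0]
    if pd.isna(first_val):
        # First value is missing: constant only if every value is missing
        return series.isna().all()
    else:
        return (series == first_val).all()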
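To see why the NaN-aware variant matters, the following sketch runs both styles of check on a column that holds only missing values. The helper names `plain_check` and `nan_aware_check` are hypothetical and exist only for this illustration; they mirror the logic of the two functions above.

```python
import numpy as np
import pandas as pd


def plain_check(series):
    """Compare every element to the first one (NaN never equals NaN)."""
    if series.empty:
        return True
    arr = series.to_numpy()
    return bool((arr == arr[0]).all())


def nan_aware_check(series):
    """Same check, but treat missing values as equal to each other."""
    if series.empty:
        return True
    first_val = series.iloc[0]
    if pd.isna(first_val):
        return bool(series.isna().all())
    return bool((series == first_val).all())


all_missing = pd.Series([np.nan, np.nan, np.nan])
mixed = pd.Series([1.0, np.nan, 1.0])

print(plain_check(all_missing))      # False
print(nan_aware_check(all_missing))  # True
print(nan_aware_check(mixed))        # False
```

Whether an all-NaN column should count as "constant" depends on your data, so treat the NaN-aware behavior as one possible convention rather than the only correct one.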
Using Series.nunique()
The nunique() method counts the number of distinct elements. If there's only one unique element (or zero for an empty Series), all values are effectively the same.
import pandas as pd
df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Category': ['A', 'A', 'A', 'A', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive'],
})
is_category_constant = df['Category'].nunique(dropna=True) <= 1 # dropna=True is default
is_status_constant = df['Status'].nunique() <= 1
print(f"Using nunique(): 'Category' constant? {is_category_constant}")
print(f"Using nunique(): 'Status' constant? {is_status_constant}")
# For empty Series, nunique() is 0, so 0 <= 1 is True.
empty_series = pd.Series([], dtype=object)
print(f"Using nunique(): Empty series constant? {empty_series.nunique() <= 1}")
Output:
Using nunique(): 'Category' constant? True
Using nunique(): 'Status' constant? False
Using nunique(): Empty series constant? True
- dropna=True (default): NaN values are not counted as a unique value.
- dropna=False: NaN values are counted as one unique value if present.
- <= 1: Accounts for empty Series (0 unique) and constant Series (1 unique).
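The effect of dropna is easiest to see on a column with a gap in it. The sketch below uses a made-up Series named `with_gap` (not part of the example DataFrame) to show that the default setting reports such a column as constant, while dropna=False does not.

```python
import numpy as np
import pandas as pd

# One real value repeated, plus one missing entry
with_gap = pd.Series([7.0, np.nan, 7.0])

ignores_nan = with_gap.nunique()             # NaN is skipped
counts_nan = with_gap.nunique(dropna=False)  # NaN counts as a distinct value

print(ignores_nan)  # 1
print(counts_nan)   # 2

# Stricter check: a missing value breaks uniformity
is_strictly_constant = with_gap.nunique(dropna=False) <= 1
print(is_strictly_constant)  # False
```

If missing values should not disqualify a column, the default is probably what you want; if they should, pass dropna=False.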
Using len(Series.unique())
Similar to nunique(), but unique() returns an array of the unique values, so you check its length.
import pandas as pd
df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Category': ['A', 'A', 'A', 'A', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive'],
})
is_category_constant_unique = len(df['Category'].unique()) <= 1
print(f"Using len(unique()): 'Category' constant? {is_category_constant_unique}")
Output:
Using len(unique()): 'Category' constant? True
Unlike nunique() with its default dropna=True, unique() keeps NaN in its result, so NaN counts as a single unique value if present.
Check if ALL Values in EACH Column Are Equal (Across Entire DataFrame)
This checks if each column individually consists of identical values (but different columns can have different constant values).
import pandas as pd
df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Category': ['A', 'A', 'A', 'A', 'A'],
    'Status': ['Active', 'Active', 'Inactive', 'Active', 'Inactive'],
    'Value1': [10, 10, 10, 10, 10],
})
# Apply the single-column check to each column
# Using nunique()
constant_columns_mask = df.apply(lambda col: col.nunique() <= 1)
# Or using the NumPy approach for each column:
# constant_columns_mask = df.apply(lambda col: (col.to_numpy() == col.to_numpy()[0]).all() if not col.empty else True)
print("Are all values in each column respectively equal?")
print(constant_columns_mask)
# To check if *all* columns are constant (i.e., the whole DataFrame is "constant" column-wise)
print(f"Are all columns constant? {constant_columns_mask.all()}")
Output:
Are all values in each column respectively equal?
ID False
Category True
Status False
Value1 True
dtype: bool
Are all columns constant? False
The df.apply() method applies a function along an axis (default axis=0, which is column-wise).
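As an alternative to apply(), DataFrame.nunique() already works column by column, and the resulting mask can be used to separate constant columns from varying ones. The following is a minimal sketch on a shortened version of the example data; the variable names `constant_mask`, `constant_cols`, and `varying` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103],
    'Category': ['A', 'A', 'A'],
    'Value1': [10, 10, 10],
})

# DataFrame.nunique() counts distinct values per column (axis=0 by default)
constant_mask = df.nunique() <= 1

# Names of the columns that hold a single value
constant_cols = constant_mask[constant_mask].index.tolist()
print(constant_cols)  # ['Category', 'Value1']

# Keep only the columns that actually vary
varying = df.loc[:, ~constant_mask]
print(varying.columns.tolist())  # ['ID']
```

Dropping constant columns this way can be a quick step when simplifying a dataset, though the default dropna=True means NaN values are ignored in the count.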
Check if ALL Values in a DataFrame Are Equal to a SPECIFIC Value
To check whether every cell in selected columns (or the whole DataFrame) equals one specific value, compare the DataFrame to that value and reduce the boolean result with .all().
import pandas as pd
df_all_ones = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 1, 1]})
df_mixed_ones = pd.DataFrame({'A': [1, 1, 1], 'B': [1, 0, 1]})
target_value = 1
# Check for the whole DataFrame
all_equal_to_target_df1 = (df_all_ones == target_value).all().all()
print(f"Is df_all_ones entirely '{target_value}'? {all_equal_to_target_df1}")
all_equal_to_target_df2 = (df_mixed_ones == target_value).all().all()
print(f"Is df_mixed_ones entirely '{target_value}'? {all_equal_to_target_df2}")
Output:
Is df_all_ones entirely '1'? True
Is df_mixed_ones entirely '1'? False
- df == target_value: Element-wise comparison, returns a DataFrame of booleans.
- .all(): Called once, checks if all values are True within each column. Returns a boolean Series.
- .all(): Called a second time on the boolean Series, checks if all those values (for all columns) are True.
You can also check specific columns. Assume 'df' is the larger example DataFrame: are all values in 'Category' equal to 'A'?
is_category_all_A = (df['Category'] == 'A').all()
print(f"Is 'Category' column all 'A'? {is_category_all_A}")
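The same comparison can be limited to several columns at once, and the intermediate per-column result can show which column breaks the rule. This sketch uses a small made-up DataFrame and converts the boolean frame to NumPy so that a single .all() call covers every cell.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1],
    'B': [1, 1, 1],
    'C': [1, 0, 1],
})
target_value = 1

# Restrict the comparison to a subset of columns
subset_ok = (df[['A', 'B']] == target_value).to_numpy().all()
print(subset_ok)  # True

# Whole DataFrame: column 'C' contains a 0
whole_ok = (df == target_value).to_numpy().all()
print(whole_ok)  # False

# The per-column result identifies the offending column(s)
per_column = (df == target_value).all()
print(per_column[~per_column].index.tolist())  # ['C']
```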
Find Rows Where ALL Column Values Are Equal
This checks if, for each row, all values across the specified columns (or all columns) are identical to each other within that row.
Comparing Each Column to the First Column
import pandas as pd
df_rows_check = pd.DataFrame({
    'Col1': ['X', 'Y', 'Z', 'W'],
    'Col2': ['X', 'Y', 'A', 'W'],
    'Col3': ['X', 'Y', 'B', 'W'],
    'Col4': ['P', 'Y', 'C', 'W']  # Deliberately different in first row
})
df_rows_check.iloc[0,3] = 'X' # Make first row all 'X' for demo
print("DataFrame for row equality check:")
print(df_rows_check)
print()
# Compare every column to the first column (.iloc[:, 0]); axis=0 aligns that Series on the row index
# Then check that all comparisons are True across the columns of each row (axis=1)
all_values_equal_in_row_mask = df_rows_check.eq(df_rows_check.iloc[:, 0], axis=0).all(axis=1)
print("Mask indicating rows where all values are equal:")
print(all_values_equal_in_row_mask)
print()
print("Rows where all values are equal:")
print(df_rows_check[all_values_equal_in_row_mask])
print()
Output:
DataFrame for row equality check:
Col1 Col2 Col3 Col4
0 X X X X
1 Y Y Y Y
2 Z A B C
3 W W W W
Mask indicating rows where all values are equal:
0 True
1 True
2 False
3 True
dtype: bool
Rows where all values are equal:
Col1 Col2 Col3 Col4
0 X X X X
1 Y Y Y Y
3 W W W W
- df.eq(df.iloc[:, 0], axis=0): Compares each column element-wise to the corresponding element in the first column (df.iloc[:, 0]).
- axis=0: Aligns the first column on the row index, so it is compared against every column row by row.
- .all(axis=1): Checks if all boolean values are True across each row (axis=1).
Comparing All Elements in a Row (Using nunique per row)
import pandas as pd
df_rows_check = pd.DataFrame({
    'Col1': ['X', 'Y', 'Z', 'W'],
    'Col2': ['X', 'Y', 'A', 'W'],
    'Col3': ['X', 'Y', 'B', 'W'],
    'Col4': ['X', 'Y', 'C', 'W']
})
# Apply nunique along rows (axis=1)
# With dropna=False, NaN counts as a value, so a row passes only if it holds exactly 1 distinct value
all_values_equal_in_row_mask_nunique = df_rows_check.nunique(axis=1, dropna=False) == 1
print("Mask indicating rows where all values are equal (nunique approach):")
print(all_values_equal_in_row_mask_nunique)
Output:
Mask indicating rows where all values are equal (nunique approach):
0 True
1 True
2 False
3 True
dtype: bool
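The dropna argument changes the row-wise result whenever a row contains missing values. The sketch below uses a small made-up DataFrame, `df_nan_rows`, to contrast the two settings.

```python
import numpy as np
import pandas as pd

df_nan_rows = pd.DataFrame({
    'Col1': [1.0, 2.0, np.nan],
    'Col2': [1.0, np.nan, np.nan],
    'Col3': [1.0, 2.0, np.nan],
})

# dropna=False: NaN counts as its own value
strict = df_nan_rows.nunique(axis=1, dropna=False) == 1
# Default dropna=True: NaN is ignored when counting
lenient = df_nan_rows.nunique(axis=1) == 1

print(strict.tolist())   # [True, False, True]
print(lenient.tolist())  # [True, True, False]
```

The second row (2.0, NaN, 2.0) passes only when NaN is ignored, while the all-NaN third row passes only when NaN is counted, so pick the setting that matches how you want gaps treated.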
Check if Specific Columns in Each Row Are Equal
To check whether the values in a subset of columns are equal for each row, select those columns first and then apply the row-wise check to the selection.
import pandas as pd
df_specific_cols = pd.DataFrame({
    'A': [1, 2, 1, 3],
    'B': [1, 5, 1, 3],
    'C': [1, 2, 0, 3],
    'D': ['x', 'y', 'z', 'w']
})
cols_to_compare = ['A', 'B', 'C']
# Apply a lambda row-wise (axis=1) to the selected columns only
# For each row, count the distinct values; exactly one means A, B, and C are equal
df_specific_cols['Are_ABC_Equal'] = df_specific_cols[cols_to_compare].apply(
    lambda row: row.nunique() == 1, axis=1
)
print("Checking if specific columns A, B, C are equal per row:")
print(df_specific_cols)
Output:
Checking if specific columns A, B, C are equal per row:
A B C D Are_ABC_Equal
0 1 1 1 x True
1 2 5 2 y False
2 1 1 0 z False
3 3 3 3 w True
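Row-wise apply() calls a Python function once per row, which can become slow on large DataFrames. One possible alternative, sketched here on the same data, reuses the compare-to-first-column technique from the previous section on just the selected columns; the names `subset` and `are_equal` are chosen for this illustration.

```python
import pandas as pd

df_specific_cols = pd.DataFrame({
    'A': [1, 2, 1, 3],
    'B': [1, 5, 1, 3],
    'C': [1, 2, 0, 3],
    'D': ['x', 'y', 'z', 'w'],
})
cols_to_compare = ['A', 'B', 'C']

subset = df_specific_cols[cols_to_compare]
# Compare every selected column to the first selected column, aligned on the row index
are_equal = subset.eq(subset.iloc[:, 0], axis=0).all(axis=1)

print(are_equal.tolist())  # [True, False, False, True]
```

Note that this comparison treats NaN as unequal to NaN, so rows with missing values in the selected columns are reported as not equal.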
Conclusion
Checking for equality of values within Pandas DataFrames is a key data validation step:
- To check if all values in a single column are equal, (col.to_numpy() == col.to_numpy()[0]).all() or col.nunique() <= 1 are effective.
- To check if each column individually has all equal values, use df.apply(lambda col: col.nunique() <= 1).
- To find rows where all values across columns are equal, df.eq(df.iloc[:, 0], axis=0).all(axis=1) or df.nunique(axis=1) == 1 are good approaches.
- To compare specific columns within each row, use df[cols_to_compare].apply(lambda row: row.nunique() == 1, axis=1).
Choose the method that best reflects the specific type of equality check you need to perform on your DataFrame.