Python Pandas: Select DataFrame Rows Where Two Columns Are Equal (or Not Equal)

Comparing values across different columns within the same row is a common data validation and filtering task in Pandas. You might need to identify rows where two specific columns have identical values, or conversely, where they differ.

This guide demonstrates how to select or filter Pandas DataFrame rows based on the equality (or inequality) of values in two columns, primarily using boolean indexing and the DataFrame.query() method.

The Goal: Comparing Values Across Two Columns Row-wise

Given a Pandas DataFrame, we want to compare the values in two specified columns for each row. Based on this comparison (equal or not equal), we want to select the subset of rows that meets our criteria.

Example DataFrame

import pandas as pd
import numpy as np # For NaN example

data = {
    'ID': [101, 102, 103, 104, 105, 106],
    'SKU_A': ['X123', 'Y456', 'Z789', 'X123', 'A001', 'B002'],
    'SKU_B': ['X123', 'Y450', 'Z789', 'X124', 'A001', np.nan], # SKU_B has different values and a NaN
    'Quantity_A': [10, 5, 20, 10, 15, 8],
    'Quantity_B': [10, 7, 15, 10, 15, 8]
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
    ID SKU_A SKU_B  Quantity_A  Quantity_B
101  X123  X123          10          10
102  Y456  Y450           5           7
103  Z789  Z789          20          15
104  X123  X124          10          10
105  A001  A001          15          15
106  B002   NaN           8           8

Select Rows Where Two Columns ARE Equal

We might want to find rows where SKU_A equals SKU_B, or where Quantity_A equals Quantity_B.

Using Boolean Indexing with `==` (Recommended)

This is the most direct and idiomatic Pandas way. Compare the two columns element-wise to create a boolean Series, then use this Series to filter the DataFrame.

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106],
    'SKU_A': ['X123', 'Y456', 'Z789', 'X123', 'A001', 'B002'],
    'SKU_B': ['X123', 'Y450', 'Z789', 'X124', 'A001', np.nan],
})

# Compare 'SKU_A' and 'SKU_B' for equality
sku_equality_mask = (df_example['SKU_A'] == df_example['SKU_B'])
print("Boolean mask for SKU_A == SKU_B:")
print(sku_equality_mask)
print()

# ✅ Use the mask to select rows where the condition is True
df_skus_equal = df_example[sku_equality_mask]
# Or directly: df_skus_equal = df_example[df_example['SKU_A'] == df_example['SKU_B']]

print("Rows where SKU_A == SKU_B:")
print(df_skus_equal)

Output:

Boolean mask for SKU_A == SKU_B:
0     True
1    False
2     True
3    False
4     True
5    False
dtype: bool

Rows where SKU_A == SKU_B:
    ID SKU_A SKU_B
0  101  X123  X123
2  103  Z789  Z789
4  105  A001  A001

df['Column1'] == df['Column2'] performs an element-wise comparison and returns a boolean Series.
df[boolean_series] uses this boolean Series to select rows where the value is True.

Using `DataFrame.query('ColA == ColB')`

The DataFrame.query() method allows you to specify the condition as a string.

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106],
    'SKU_A': ['X123', 'Y456', 'Z789', 'X123', 'A001', 'B002'],
    'SKU_B': ['X123', 'Y450', 'Z789', 'X124', 'A001', np.nan],
})

# ✅ Use query() to select rows where SKU_A equals SKU_B
df_skus_equal_query = df_example.query('SKU_A == SKU_B')

print("Rows where SKU_A == SKU_B (using query()):")
print(df_skus_equal_query)

Output:

Rows where SKU_A == SKU_B (using query()):
    ID SKU_A SKU_B
0  101  X123  X123
2  103  Z789  Z789
4  105  A001  A001

If your column names have spaces, enclose them in backticks: query("`Column A` == `Column B`").

Select Rows Where Two Columns Are NOT Equal

Using Boolean Indexing with `!=` (Recommended)

Use the "not equal" operator (!=).

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106],
    'SKU_A': ['X123', 'Y456', 'Z789', 'X123', 'A001', 'B002'],
    'SKU_B': ['X123', 'Y450', 'Z789', 'X124', 'A001', np.nan],
})

# ✅ Select rows where SKU_A is NOT equal to SKU_B
df_skus_not_equal = df_example[df_example['SKU_A'] != df_example['SKU_B']]

print("Rows where SKU_A != SKU_B:")
print(df_skus_not_equal)

Output:

Rows where SKU_A != SKU_B:
    ID SKU_A SKU_B
1  102  Y456  Y450
3  104  X123  X124
5  106  B002   NaN

Using Boolean Indexing with `~` (Tilde for Negation)

You can negate the result of an equality check using the tilde (~) operator. Remember to wrap the equality condition in parentheses.

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106],
    'SKU_A': ['X123', 'Y456', 'Z789', 'X123', 'A001', 'B002'],
    'SKU_B': ['X123', 'Y450', 'Z789', 'X124', 'A001', np.nan],
})

# ✅ Negate the equality condition
df_skus_not_equal_tilde = df_example[~(df_example['SKU_A'] == df_example['SKU_B'])]

print("Rows where SKU_A != SKU_B (using tilde ~):")
print(df_skus_not_equal_tilde)

Output: (Same as using !=)

Rows where SKU_A != SKU_B (using tilde ~):
    ID SKU_A SKU_B
1  102  Y456  Y450
3  104  X123  X124
5  106  B002   NaN

~(condition) inverts the boolean Series produced by condition.

Using `DataFrame.query('ColA != ColB')`

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105, 106],
    'SKU_A': ['X123', 'Y456', 'Z789', 'X123', 'A001', 'B002'],
    'SKU_B': ['X123', 'Y450', 'Z789', 'X124', 'A001', np.nan],
})

# ✅ Use query() with the inequality operator
df_skus_not_equal_query = df_example.query('SKU_A != SKU_B')

print("Rows where SKU_A != SKU_B (using query()):")
print(df_skus_not_equal_query)

Output: (Same as using !=)

Rows where SKU_A != SKU_B (using query()):
    ID SKU_A SKU_B
1  102  Y456  Y450
3  104  X123  X124
5  106  B002   NaN

Handling `NaN` Values in Comparisons

An important aspect of comparisons in Pandas (and NumPy) is how NaN (Not a Number) values are treated:

NaN == NaN is False.
NaN != NaN is True.
Any comparison of a non-NaN value with NaN (e.g., 'X123' == np.nan or 'X123' != np.nan) results in False for == and True for !=.

Consider our original DataFrame where row 5 has SKU_B as NaN:

# For row 5: df['SKU_A'] is 'B002', df['SKU_B'] is np.nan
df['SKU_A'] == df['SKU_B']  # -> 'B002' == np.nan  -> False
df['SKU_A'] != df['SKU_B']  # -> 'B002' != np.nan  -> True

If you need to treat rows where both columns are NaN as "equal" for the purpose of your filter, or handle NaNs differently, you would need more specific logic:

# Example: Consider rows equal if both are NaN OR both are non-NaN and equal
condition_both_nan = df['SKU_A'].isna() & df['SKU_B'].isna()
condition_both_not_nan_and_equal = df['SKU_A'].notna() & df['SKU_B'].notna() & (df['SKU_A'] == df['SKU_B'])
df_equal_nan_aware = df[condition_both_nan | condition_both_not_nan_and_equal]

This is more advanced and depends on the specific definition of "equal" in the presence of missing values. For standard == and !=, NaNs behave as described above.

Conclusion

Comparing values between two columns in a Pandas DataFrame and selecting rows based on that comparison is straightforward:

For selecting rows where values in two columns are equal:
- Use boolean indexing: df[df['ColumnA'] == df['ColumnB']] (Recommended).
- Use DataFrame.query(): df.query('ColumnA == ColumnB').
For selecting rows where values in two columns are NOT equal:
- Use boolean indexing: df[df['ColumnA'] != df['ColumnB']] (Recommended).
- Or negate equality: df[~(df['ColumnA'] == df['ColumnB'])].
- Use DataFrame.query(): df.query('ColumnA != ColumnB').

Remember how NaN values are handled in comparisons (NaN is not equal to anything, including itself) and adjust your logic if you need specific NaN handling. These methods provide powerful ways to filter and subset your data based on inter-column relationships.

The Goal: Comparing Values Across Two Columns Row-wise​

Example DataFrame​

Select Rows Where Two Columns ARE Equal​

Using Boolean Indexing with == (Recommended)​

Using DataFrame.query('ColA == ColB')​

Select Rows Where Two Columns Are NOT Equal​

Using Boolean Indexing with != (Recommended)​

Using Boolean Indexing with ~ (Tilde for Negation)​

Using DataFrame.query('ColA != ColB')​

Handling NaN Values in Comparisons​

Conclusion​

Table of Contents

The Goal: Comparing Values Across Two Columns Row-wise

Example DataFrame

Select Rows Where Two Columns ARE Equal

Using Boolean Indexing with `==` (Recommended)

Using `DataFrame.query('ColA == ColB')`

Select Rows Where Two Columns Are NOT Equal

Using Boolean Indexing with `!=` (Recommended)

Using Boolean Indexing with `~` (Tilde for Negation)

Using `DataFrame.query('ColA != ColB')`

Handling `NaN` Values in Comparisons

Conclusion