Python Pandas: How to Compare Current Row Value with Previous or Next Row in a Column
Comparing a row's value in a specific column with the value in the immediately preceding or succeeding row is a common operation in time series analysis, data validation (e.g., checking for consecutive duplicates), or when calculating differences or changes. Pandas provides the Series.shift()
method, which is ideal for this kind of row-wise comparison.
This guide explains how to compare current row values with previous or next row values in a Pandas DataFrame column, primarily using .shift()
in conjunction with comparison operators or the .eq()
method.
The Goal: Row-wise Lagged/Lead Comparisons
Given a Pandas DataFrame and a specific column, we want to create a new boolean column that indicates for each row whether its value in the target column is equal to:
- The value in the previous row of that same column.
- The value in the next row of that same column.
Example DataFrame
import pandas as pd
data = {
'Timestamp': pd.to_datetime(['2023-01-01 10:00', '2023-01-01 10:05', '2023-01-01 10:05',
'2023-01-01 10:10', '2023-01-01 10:15', '2023-01-01 10:15']),
'Sensor_ID': ['A', 'A', 'A', 'B', 'B', 'B'],
'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
Timestamp Sensor_ID Reading
0 2023-01-01 10:00:00 A 10.1
1 2023-01-01 10:05:00 A 10.5
2 2023-01-01 10:05:00 A 10.5
3 2023-01-01 10:10:00 B 22.0
4 2023-01-01 10:15:00 B 22.3
5 2023-01-01 10:15:00 B 22.3
We'll focus on comparing values in the 'Reading' column.
Understanding Series.shift(periods=1)
The Series.shift(periods=N)
method is key to these comparisons.
periods=1
(default): Shifts all values in the Series down by one position. The first element becomesNaN
, the second element gets the original first element's value, and so on. This effectively gives you the previous row's value aligned with the current row.periods=-1
: Shifts all values up by one position. The last element becomesNaN
. This gives you the next row's value aligned with the current row.
import pandas as pd
df_example = pd.DataFrame({'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]})
print("Original 'Reading' column:")
print(df_example['Reading'])
print()
shifted_down = df_example['Reading'].shift(1) # Or just .shift()
print("'Reading' column shifted down by 1 (previous values):")
print(shifted_down)
print()
shifted_up = df_example['Reading'].shift(-1)
print("'Reading' column shifted up by 1 (next values):")
print(shifted_up)
Output:
Original 'Reading' column:
0 10.1
1 10.5
2 10.5
3 22.0
4 22.3
5 22.3
Name: Reading, dtype: float64
'Reading' column shifted down by 1 (previous values):
0 NaN
1 10.1
2 10.5
3 10.5
4 22.0
5 22.3
Name: Reading, dtype: float64
'Reading' column shifted up by 1 (next values):
0 10.5
1 10.5
2 22.0
3 22.3
4 22.3
5 NaN
Name: Reading, dtype: float64
Comparing with the PREVIOUS Row's Value
Using Series.eq(Series.shift())
(Recommended)
The Series.eq()
method performs an element-wise equality comparison. This is often slightly more performant and handles NaN
s in a standard way (NaN compared to anything, including another NaN, is False).
import pandas as pd
df_example = pd.DataFrame({'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]})
df_copy = df_example.copy() # modify only the copy
column_to_compare = 'Reading'
# ✅ Compare current 'Reading' with previous 'Reading'
df_copy['Is_Reading_Same_As_Prev'] = df_copy[column_to_compare].eq(df_copy[column_to_compare].shift())
print("DataFrame with comparison to previous row (using .eq()):")
print(df_copy)
Output:
DataFrame with comparison to previous row (using .eq()):
Reading Is_Reading_Same_As_Prev
0 10.1 False
1 10.5 False
2 10.5 True
3 22.0 False
4 22.3 False
5 22.3 True
The first row in 'Is_Reading_Same_As_Prev' is False
because df['Reading'].shift()
introduces a NaN
at the first position, and 10.1 == NaN
is False
.
Using the Equality Operator (==
)
The standard equality operator ==
achieves the same result for element-wise comparison.
import pandas as pd
df_example = pd.DataFrame({'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]})
df_copy2 = df_example.copy() # modify only the copy
column_to_compare = 'Reading'
# ✅ Compare using == operator
df_copy2['Is_Reading_Same_As_Prev_Op'] = (df_copy2[column_to_compare] == df_copy2[column_to_compare].shift())
print("DataFrame with comparison to previous row (using ==):")
print(df_copy2)
Output: (Same as using .eq())
DataFrame with comparison to previous row (using ==):
Reading Is_Reading_Same_As_Prev_Op
0 10.1 False
1 10.5 False
2 10.5 True
3 22.0 False
4 22.3 False
5 22.3 True
For simple equality, ==
is very readable. .eq()
might be preferred if you need to chain other Series methods or for consistency with other Pandas comparison methods like .ne()
, .gt()
, etc.
Using NumPy (Alternative)
This approach converts the Series to a NumPy array and performs shifted comparison directly. It requires manual handling of the first element.
import pandas as pd
import numpy as np
data = {
'Timestamp': pd.to_datetime(['2023-01-01 10:00', '2023-01-01 10:05', '2023-01-01 10:05',
'2023-01-01 10:10', '2023-01-01 10:15', '2023-01-01 10:15']),
'Sensor_ID': ['A', 'A', 'A', 'B', 'B', 'B'],
'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]
}
df = pd.DataFrame(data)
df_copy_np = df.copy()
column_to_compare = 'Reading'
reading_values = df_copy_np[column_to_compare].to_numpy()
# Compare element i with element i-1
# The first element has no previous, so its comparison is False by definition here
comparison_np = np.concatenate(
([False], reading_values[1:] == reading_values[:-1])
)
df_copy_np['Is_Reading_Same_As_Prev_NP'] = comparison_np
print("DataFrame with comparison to previous row (using NumPy):")
print(df_copy_np)
Output: (Same boolean result as .eq() and ==)
DataFrame with comparison to previous row (using NumPy):
Timestamp Sensor_ID Reading Is_Reading_Same_As_Prev_NP
0 2023-01-01 10:00:00 A 10.1 False
1 2023-01-01 10:05:00 A 10.5 False
2 2023-01-01 10:05:00 A 10.5 True
3 2023-01-01 10:10:00 B 22.0 False
4 2023-01-01 10:15:00 B 22.3 False
5 2023-01-01 10:15:00 B 22.3 True
reading_values[1:]
: Array from the second element to the end.reading_values[:-1]
: Array from the first element up to (but not including) the last.np.concatenate(([False], ...))
: PrependsFalse
for the first row's comparison.
Comparing with the NEXT Row's Value
To compare the current row's value with the next row's value, shift the column upwards by using periods=-1
.
Using Series.eq(Series.shift(-1))
(Recommended)
import pandas as pd
data = {
'Timestamp': pd.to_datetime(['2023-01-01 10:00', '2023-01-01 10:05', '2023-01-01 10:05',
'2023-01-01 10:10', '2023-01-01 10:15', '2023-01-01 10:15']),
'Sensor_ID': ['A', 'A', 'A', 'B', 'B', 'B'],
'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]
}
df = pd.DataFrame(data)
df_copy_next = df.copy()
column_to_compare = 'Reading'
# ✅ Compare current 'Reading' with next 'Reading'
df_copy_next['Is_Reading_Same_As_Next'] = df_copy_next[column_to_compare].eq(
df_copy_next[column_to_compare].shift(-1)
)
print("DataFrame with comparison to next row (using .eq()):")
print(df_copy_next)
Output:
DataFrame with comparison to next row (using .eq()):
Timestamp Sensor_ID Reading Is_Reading_Same_As_Next
0 2023-01-01 10:00:00 A 10.1 False
1 2023-01-01 10:05:00 A 10.5 True
2 2023-01-01 10:05:00 A 10.5 False
3 2023-01-01 10:10:00 B 22.0 False
4 2023-01-01 10:15:00 B 22.3 True
5 2023-01-01 10:15:00 B 22.3 False
The last row in 'Is_Reading_Same_As_Next'
is False
because df['Reading'].shift(-1)
introduces a NaN
at the last position.
Using the Equality Operator (==
)
import pandas as pd
data = {
'Timestamp': pd.to_datetime(['2023-01-01 10:00', '2023-01-01 10:05', '2023-01-01 10:05',
'2023-01-01 10:10', '2023-01-01 10:15', '2023-01-01 10:15']),
'Sensor_ID': ['A', 'A', 'A', 'B', 'B', 'B'],
'Reading': [10.1, 10.5, 10.5, 22.0, 22.3, 22.3]
}
df = pd.DataFrame(data)
df_copy_next_op = df.copy()
column_to_compare = 'Reading'
df_copy_next_op['Is_Reading_Same_As_Next_Op'] = (df_copy_next_op[column_to_compare] ==
df_copy_next_op[column_to_compare].shift(-1))
print("DataFrame with comparison to next row (using ==):")
print(df_copy_next_op)
Output:
DataFrame with comparison to next row (using ==):
Timestamp Sensor_ID Reading Is_Reading_Same_As_Next_Op
0 2023-01-01 10:00:00 A 10.1 False
1 2023-01-01 10:05:00 A 10.5 True
2 2023-01-01 10:05:00 A 10.5 False
3 2023-01-01 10:10:00 B 22.0 False
4 2023-01-01 10:15:00 B 22.3 True
5 2023-01-01 10:15:00 B 22.3 False
Handling the First/Last Row (NaNs from Shift)
As seen, shift(1)
introduces a NaN
at the beginning of the shifted Series, and shift(-1)
introduces one at the end. When comparing, any comparison with NaN
results in False
. This is usually the desired behavior:
- The first row cannot be equal to a "previous" row because there isn't one.
- The last row cannot be equal to a "next" row because there isn't one.
If you need different behavior for these boundary conditions (e.g., to fill the NaN
from shift()
with a specific value before comparison), you can use .fillna()
on the shifted Series:
Example: Treat first row as 'not equal to previous' by default. For comparison with previous:
shifted_col = df['col'].shift().fillna(some_value_that_wont_match_first_element)
df['Is_Same'] = df['col'] == shifted_col
However, the default NaN
behavior is often appropriate.
Conclusion
Comparing row values with their preceding or succeeding counterparts in a Pandas DataFrame column is effectively achieved using Series.shift()
:
- To compare with the previous row:
df['column'].eq(df['column'].shift())
ordf['column'] == df['column'].shift()
. - To compare with the next row:
df['column'].eq(df['column'].shift(-1))
ordf['column'] == df['column'].shift(-1)
. - The
.eq()
method is a slightly more idiomatic Pandas approach, while==
is often very readable. - The NumPy approach offers an alternative but requires more manual handling of boundary conditions.
Remember that shift()
introduces NaN
values at the boundaries, which correctly result in False
during equality comparisons, reflecting that the first row has no predecessor and the last row has no successor for comparison.