Skip to main content

Python Pandas: How to Find First/Last Non-NaN Value and Index in Series/DataFrame Column

When working with Pandas DataFrames or Series that contain missing data (represented as NaN, Not a Number, or NaT, Not a Time), a common task is to identify the first or last valid (non-missing) entry. This is useful for data cleaning, imputation, or understanding the bounds of valid data within a sequence.

This guide explains how to find the index and the value of the first and last non-NaN entries in a Pandas Series or for each column in a DataFrame, using methods like first_valid_index(), last_valid_index(), and notna().idxmax().

The Goal: Locating First/Last Valid Data Points

Given a Pandas Series or a DataFrame, we want to:

  • Find the index label of the first row that contains a non-NaN value.
  • Find the index label of the last row that contains a non-NaN value.
  • Retrieve the actual non-NaN values at these positions. This can be done for a single Series or applied column-wise to a DataFrame.

Example DataFrame and Series with NaN Values

import pandas as pd
import numpy as np

# Example Series
s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print("Original Series (s):")
print(s)
print()

# Example DataFrame
data = {
'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan], # All NaN
'Col_D': [1, 2, 3, 4, 5] # No NaN
}

df = pd.DataFrame(data, index=['R1', 'R2', 'R3', 'R4', 'R5'])
print("Original DataFrame (df):")
print(df)

Output:

Original Series (s):
a NaN
b NaN
c Apple
d Banana
e NaN
f Cherry
g NaN
dtype: object

Original DataFrame (df):
Col_A Col_B Col_C Col_D
R1 NaN NaN NaN 1
R2 10.0 NaN NaN 2
R3 NaN X NaN 3
R4 15.5 Y NaN 4
R5 20.0 NaN NaN 5

Finding First/Last Non-NaN in a Pandas SERIES

Using Series.first_valid_index()

This method returns the index label of the first non-NaN/NaT value in the Series. It returns None if all values are NaN/NaT.

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# ✅ Find the index of the first valid (non-NaN) entry in Series s
first_valid_idx_s = s.first_valid_index()
print(f"Index of first valid value in Series s: '{first_valid_idx_s}'")

Output:

Index of first valid value in Series s: 'c'

Using Series.last_valid_index()

This method returns the index label of the last non-NaN/NaT value in the Series. It returns None if all values are NaN/NaT.

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# ✅ Find the index of the last valid (non-NaN) entry in Series s
last_valid_idx_s = s.last_valid_index()
print(f"Index of last valid value in Series s: '{last_valid_idx_s}'")

Output:

Index of last valid value in Series s: 'f'

Getting the Values at These Indices

Once you have the index labels, use standard Series indexing (s[index_label]) to get the values.

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
first_valid_idx_s = s.first_valid_index()
last_valid_idx_s = s.last_valid_index()


if first_valid_idx_s is not None:
first_valid_value_s = s[first_valid_idx_s]
print(f"First valid value in Series s: '{first_valid_value_s}'")
else:
print("Series s has no valid values.")

if last_valid_idx_s is not None:
last_valid_value_s = s[last_valid_idx_s]
print(f"Last valid value in Series s: '{last_valid_value_s}'")
else:
print("Series s has no valid values.")

Output:

First valid value in Series s: 'Apple'
Last valid value in Series s: 'Cherry'

Finding First/Last Non-NaN for EACH COLUMN in a DataFrame

Apply the Series methods to each column of the DataFrame.

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan],
'Col_D': [1, 2, 3, 4, 5]
}, index=['R1', 'R2', 'R3', 'R4', 'R5'])

# ✅ Get index of first valid value for each column
first_valid_indices_df = df_example.apply(pd.Series.first_valid_index)
print("Index of first valid value for each column:")
print(first_valid_indices_df)
print()

# ✅ Get index of last valid value for each column
last_valid_indices_df = df_example.apply(pd.Series.last_valid_index)
print("Index of last valid value for each column:")
print(last_valid_indices_df)

Output:

Index of first valid value for each column:
Col_A R2
Col_B R3
Col_C None
Col_D R1
dtype: object

Index of last valid value for each column:
Col_A R5
Col_B R4
Col_C None
Col_D R5
dtype: object

The result is a Series where the index is the column names and values are the first/last valid index labels for those columns.

Using DataFrame.notna().idxmax() and a Reverse Scan for Last

  • df.notna(): Returns a boolean DataFrame (True where not NaN).
  • .idxmax(): For each column, returns the index label of the first True value. This directly gives the index of the first non-NaN value.
  • To find the last non-NaN, you can reverse the boolean DataFrame ([::-1]), find the idxmax() of the reversed frame, which gives the reversed index of the last True. This is slightly more convoluted than last_valid_index().
import pandas as pd
import numpy as np

df_example = pd.DataFrame({
'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan],
'Col_D': [1, 2, 3, 4, 5]
}, index=['R1', 'R2', 'R3', 'R4', 'R5'])

# ✅ Index of first non-NaN using notna() and idxmax()
# idxmax will raise ValueError if a column is all NaN
first_valid_idxmax = {}
for col in df_example.columns:
if df_example[col].notna().any(): # Check if there's any non-NaN value
first_valid_idxmax[col] = df_example[col].notna().idxmax()
else:
first_valid_idxmax[col] = None
first_valid_idxmax_series = pd.Series(first_valid_idxmax)

print("Index of first valid value (notna().idxmax()):")
print(first_valid_idxmax_series)
print()

# Index of last non-NaN using a reversed scan with notna() and idxmax()
# This is less direct than .last_valid_index()
last_valid_idxmax_reversed = {}
for col in df_example.columns:
notna_series = df_example[col].notna()
if notna_series.any():
# Get boolean series, reverse it, find first True in reversed,
# then map that reversed position back to original index.
# idxmax() on a boolean Series returns the index of the first True.
# Reversing the boolean series and then finding the first True
# is equivalent to finding the last True in the original order.
last_valid_idxmax_reversed[col] = notna_series[::-1].idxmax()
else:
last_valid_idxmax_reversed[col] = None
last_valid_idxmax_reversed_series = pd.Series(last_valid_idxmax_reversed)

print("Index of last valid value (reversed notna().idxmax()):")
print(last_valid_idxmax_reversed_series)

Output:

Index of first valid value (notna().idxmax()):
Col_A R2
Col_B R3
Col_C None
Col_D R1
dtype: object

Index of last valid value (reversed notna().idxmax()):
Col_A R5
Col_B R4
Col_C None
Col_D R5
dtype: object
note

idxmax() applied to an all-False boolean Series (which happens if a column is all NaN) will raise a ValueError. The if df[col].notna().any(): check handles this. df.apply(pd.Series.first_valid_index) is generally cleaner.

Getting the Values at These Indices per Column

Once you have the Series of first/last valid indices for each column, you can retrieve the actual values.

import pandas as pd
import numpy as np

df = pd.DataFrame({
'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan],
'Col_D': [1, 2, 3, 4, 5]
}, index=['R1', 'R2', 'R3', 'R4', 'R5'])
first_valid_indices_df = df.apply(pd.Series.first_valid_index)
last_valid_indices_df = df.apply(pd.Series.last_valid_index)


first_values = pd.Series(index=df.columns, dtype=object)
for col_name, idx_label in first_valid_indices_df.items():
if pd.notna(idx_label): # Check if an index was found (not None)
first_values[col_name] = df.loc[idx_label, col_name]
else:
first_values[col_name] = np.nan # Or None

print("First valid values for each column:")
print(first_values)
print()

last_values = pd.Series(index=df.columns, dtype=object)
for col_name, idx_label in last_valid_indices_df.items():
if pd.notna(idx_label):
last_values[col_name] = df.loc[idx_label, col_name]
else:
last_values[col_name] = np.nan

print("Last valid values for each column:")
print(last_values)

Output:

First valid values for each column:
Col_A 10.0
Col_B X
Col_C NaN
Col_D 1
dtype: object

Last valid values for each column:
Col_A 20.0
Col_B Y
Col_C NaN
Col_D 5
dtype: object

Handling Columns with All NaN Values

  • Series.first_valid_index() and Series.last_valid_index() will return None if a Series (column) contains only NaN values.
  • DataFrame.notna().idxmax() will raise a ValueError if a column is all NaN (because there's no True value to find the index of). You need to handle this with a preliminary check like if df[col].notna().any():.

The df.apply(pd.Series.first_valid_index) approach gracefully returns None for all-NaN columns, which is often easier to work with.

Conclusion

Pandas provides convenient methods for finding the first and last non-NaN (valid) entries:

  • For a single Series:
    • your_series.first_valid_index() gives the index label of the first valid value.
    • your_series.last_valid_index() gives the index label of the last valid value.
    • Both return None if the Series is all NaN or empty.
  • For each column in a DataFrame:
    • df.apply(pd.Series.first_valid_index) and df.apply(pd.Series.last_valid_index) are the recommended ways to get a Series of these indices (column names as index, first/last valid index labels as values).
    • df.notna().idxmax() can find the first valid index per column but requires care for all-NaN columns. A similar reversed approach can find the last.
  • Once you have the index label(s), use df.loc[index_label, column_name] or series[index_label] to get the actual value(s).

These methods are essential for data cleaning, identifying data boundaries, and preprocessing steps in time series or sequential data analysis.