Python Pandas: How to Find First/Last Non-NaN Value and Index in Series/DataFrame Column

When working with Pandas DataFrames or Series that contain missing data (represented as NaN, Not a Number, or NaT, Not a Time), a common task is to identify the first or last valid (non-missing) entry. This is useful for data cleaning, imputation, or understanding the bounds of valid data within a sequence.

This guide explains how to find the index and the value of the first and last non-NaN entries in a Pandas Series or for each column in a DataFrame, using methods like first_valid_index(), last_valid_index(), and notna().idxmax().

The Goal: Locating First/Last Valid Data Points

Given a Pandas Series or a DataFrame, we want to:

Find the index label of the first row that contains a non-NaN value.
Find the index label of the last row that contains a non-NaN value.
Retrieve the actual non-NaN values at these positions. This can be done for a single Series or applied column-wise to a DataFrame.

Example DataFrame and Series with `NaN` Values

import pandas as pd
import numpy as np

# Example Series
s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
              index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
print("Original Series (s):")
print(s)
print()

# Example DataFrame
data = {
    'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
    'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
    'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan], # All NaN
    'Col_D': [1, 2, 3, 4, 5] # No NaN
}

df = pd.DataFrame(data, index=['R1', 'R2', 'R3', 'R4', 'R5'])
print("Original DataFrame (df):")
print(df)

Output:

Original Series (s):
a       NaN
b       NaN
c     Apple
d    Banana
e       NaN
f    Cherry
g       NaN
dtype: object

Original DataFrame (df):
    Col_A Col_B  Col_C  Col_D
R1    NaN   NaN    NaN      1
R2   10.0   NaN    NaN      2
R3    NaN     X    NaN      3
R4   15.5     Y    NaN      4
R5   20.0   NaN    NaN      5

Finding First/Last Non-NaN in a Pandas SERIES

Using `Series.first_valid_index()`

This method returns the index label of the first non-NaN/NaT value in the Series. It returns None if all values are NaN/NaT.

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
              index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# ✅ Find the index of the first valid (non-NaN) entry in Series s
first_valid_idx_s = s.first_valid_index()
print(f"Index of first valid value in Series s: '{first_valid_idx_s}'")

Output:

Index of first valid value in Series s: 'c'

Using `Series.last_valid_index()`

This method returns the index label of the last non-NaN/NaT value in the Series. It returns None if all values are NaN/NaT.

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
              index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])

# ✅ Find the index of the last valid (non-NaN) entry in Series s
last_valid_idx_s = s.last_valid_index()
print(f"Index of last valid value in Series s: '{last_valid_idx_s}'")

Output:

Index of last valid value in Series s: 'f'

Getting the Values at These Indices

Once you have the index labels, use standard Series indexing (s[index_label]) to get the values.

import pandas as pd
import numpy as np

s = pd.Series([np.nan, np.nan, 'Apple', 'Banana', np.nan, 'Cherry', np.nan],
              index=['a', 'b', 'c', 'd', 'e', 'f', 'g'])
first_valid_idx_s = s.first_valid_index()
last_valid_idx_s = s.last_valid_index()


if first_valid_idx_s is not None:
    first_valid_value_s = s[first_valid_idx_s]
    print(f"First valid value in Series s: '{first_valid_value_s}'")
else:
    print("Series s has no valid values.")

if last_valid_idx_s is not None:
    last_valid_value_s = s[last_valid_idx_s]
    print(f"Last valid value in Series s: '{last_valid_value_s}'")
else:
    print("Series s has no valid values.")

Output:

First valid value in Series s: 'Apple'
Last valid value in Series s: 'Cherry'

Finding First/Last Non-NaN for EACH COLUMN in a DataFrame

Using `DataFrame.apply()` with `Series.first_valid_index()` / `last_valid_index()` (Recommended)

Apply the Series methods to each column of the DataFrame.

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
    'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
    'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'Col_D': [1, 2, 3, 4, 5]
}, index=['R1', 'R2', 'R3', 'R4', 'R5'])

# ✅ Get index of first valid value for each column
first_valid_indices_df = df_example.apply(pd.Series.first_valid_index)
print("Index of first valid value for each column:")
print(first_valid_indices_df)
print()

# ✅ Get index of last valid value for each column
last_valid_indices_df = df_example.apply(pd.Series.last_valid_index)
print("Index of last valid value for each column:")
print(last_valid_indices_df)

Output:

Index of first valid value for each column:
Col_A      R2
Col_B      R3
Col_C    None
Col_D      R1
dtype: object

Index of last valid value for each column:
Col_A      R5
Col_B      R4
Col_C    None
Col_D      R5
dtype: object

The result is a Series where the index is the column names and values are the first/last valid index labels for those columns.

Using `DataFrame.notna().idxmax()` and a Reverse Scan for Last

df.notna(): Returns a boolean DataFrame (True where not NaN).
.idxmax(): For each column, returns the index label of the first True value. This directly gives the index of the first non-NaN value.
To find the last non-NaN, you can reverse the boolean DataFrame ([::-1]), find the idxmax() of the reversed frame, which gives the reversed index of the last True. This is slightly more convoluted than last_valid_index().

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
    'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
    'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'Col_D': [1, 2, 3, 4, 5]
}, index=['R1', 'R2', 'R3', 'R4', 'R5'])

# ✅ Index of first non-NaN using notna() and idxmax()
# idxmax will raise ValueError if a column is all NaN
first_valid_idxmax = {}
for col in df_example.columns:
    if df_example[col].notna().any(): # Check if there's any non-NaN value
        first_valid_idxmax[col] = df_example[col].notna().idxmax()
    else:
        first_valid_idxmax[col] = None
first_valid_idxmax_series = pd.Series(first_valid_idxmax)

print("Index of first valid value (notna().idxmax()):")
print(first_valid_idxmax_series)
print()

# Index of last non-NaN using a reversed scan with notna() and idxmax()
# This is less direct than .last_valid_index()
last_valid_idxmax_reversed = {}
for col in df_example.columns:
    notna_series = df_example[col].notna()
    if notna_series.any():
        # Get boolean series, reverse it, find first True in reversed,
        # then map that reversed position back to original index.
        # idxmax() on a boolean Series returns the index of the first True.
        # Reversing the boolean series and then finding the first True
        # is equivalent to finding the last True in the original order.
        last_valid_idxmax_reversed[col] = notna_series[::-1].idxmax()
    else:
        last_valid_idxmax_reversed[col] = None
last_valid_idxmax_reversed_series = pd.Series(last_valid_idxmax_reversed)

print("Index of last valid value (reversed notna().idxmax()):")
print(last_valid_idxmax_reversed_series)

Output:

Index of first valid value (notna().idxmax()):
Col_A      R2
Col_B      R3
Col_C    None
Col_D      R1
dtype: object

Index of last valid value (reversed notna().idxmax()):
Col_A      R5
Col_B      R4
Col_C    None
Col_D      R5
dtype: object

note

idxmax() applied to an all-False boolean Series (which happens if a column is all NaN) will raise a ValueError. The if df[col].notna().any(): check handles this. df.apply(pd.Series.first_valid_index) is generally cleaner.

Getting the Values at These Indices per Column

Once you have the Series of first/last valid indices for each column, you can retrieve the actual values.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Col_A': [np.nan, 10.0, np.nan, 15.5, 20.0],
    'Col_B': [np.nan, np.nan, 'X', 'Y', np.nan],
    'Col_C': [np.nan, np.nan, np.nan, np.nan, np.nan],
    'Col_D': [1, 2, 3, 4, 5]
}, index=['R1', 'R2', 'R3', 'R4', 'R5'])
first_valid_indices_df = df.apply(pd.Series.first_valid_index)
last_valid_indices_df = df.apply(pd.Series.last_valid_index)


first_values = pd.Series(index=df.columns, dtype=object)
for col_name, idx_label in first_valid_indices_df.items():
    if pd.notna(idx_label): # Check if an index was found (not None)
        first_values[col_name] = df.loc[idx_label, col_name]
    else:
        first_values[col_name] = np.nan # Or None

print("First valid values for each column:")
print(first_values)
print()

last_values = pd.Series(index=df.columns, dtype=object)
for col_name, idx_label in last_valid_indices_df.items():
    if pd.notna(idx_label):
        last_values[col_name] = df.loc[idx_label, col_name]
    else:
        last_values[col_name] = np.nan

print("Last valid values for each column:")
print(last_values)

Output:

First valid values for each column:
Col_A    10.0
Col_B       X
Col_C     NaN
Col_D       1
dtype: object

Last valid values for each column:
Col_A    20.0
Col_B       Y
Col_C     NaN
Col_D       5
dtype: object

Handling Columns with All `NaN` Values

Series.first_valid_index() and Series.last_valid_index() will return None if a Series (column) contains only NaN values.
DataFrame.notna().idxmax() will raise a ValueError if a column is all NaN (because there's no True value to find the index of). You need to handle this with a preliminary check like if df[col].notna().any():.

The df.apply(pd.Series.first_valid_index) approach gracefully returns None for all-NaN columns, which is often easier to work with.

Conclusion

Pandas provides convenient methods for finding the first and last non-NaN (valid) entries:

For a single Series:
- your_series.first_valid_index() gives the index label of the first valid value.
- your_series.last_valid_index() gives the index label of the last valid value.
- Both return None if the Series is all NaN or empty.
For each column in a DataFrame:
- df.apply(pd.Series.first_valid_index) and df.apply(pd.Series.last_valid_index) are the recommended ways to get a Series of these indices (column names as index, first/last valid index labels as values).
- df.notna().idxmax() can find the first valid index per column but requires care for all-NaN columns. A similar reversed approach can find the last.
Once you have the index label(s), use df.loc[index_label, column_name] or series[index_label] to get the actual value(s).

These methods are essential for data cleaning, identifying data boundaries, and preprocessing steps in time series or sequential data analysis.

The Goal: Locating First/Last Valid Data Points​

Example DataFrame and Series with NaN Values​

Finding First/Last Non-NaN in a Pandas SERIES​

Using Series.first_valid_index()​

Using Series.last_valid_index()​

Getting the Values at These Indices​

Finding First/Last Non-NaN for EACH COLUMN in a DataFrame​

Using DataFrame.apply() with Series.first_valid_index() / last_valid_index() (Recommended)​

Using DataFrame.notna().idxmax() and a Reverse Scan for Last​

Getting the Values at These Indices per Column​

Handling Columns with All NaN Values​

Conclusion​

Table of Contents

The Goal: Locating First/Last Valid Data Points

Example DataFrame and Series with `NaN` Values

Finding First/Last Non-NaN in a Pandas SERIES

Using `Series.first_valid_index()`

Using `Series.last_valid_index()`

Getting the Values at These Indices

Finding First/Last Non-NaN for EACH COLUMN in a DataFrame

Using `DataFrame.apply()` with `Series.first_valid_index()` / `last_valid_index()` (Recommended)

Using `DataFrame.notna().idxmax()` and a Reverse Scan for Last

Getting the Values at These Indices per Column

Handling Columns with All `NaN` Values

Conclusion