
Python Pandas: How to Get First Row of Each Group in DataFrame (groupby)

When performing grouped operations in Pandas with DataFrame.groupby(), a common requirement is to extract the first row (or the first few rows) from each group. This is useful for tasks such as selecting a representative sample from each category, finding the initial observation in time-series groups, or de-duplicating based on group priority.

This guide demonstrates several methods to get the first row (or first N rows) of each group in a Pandas DataFrame, primarily using groupby().first(), groupby().nth(0), and groupby().head(N).

The Goal: Selecting the First Row per Group

Given a Pandas DataFrame, we want to group its rows by the values in one or more columns and then, from each group, select only the very first row (as it appears in that group's row order) or the first N rows.

Example DataFrame

import pandas as pd
import numpy as np # For NaN example

data = {
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180],
    'Date': pd.to_datetime([
        '2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02',
        '2025-01-03', '2025-01-03', '2025-01-03', '2025-01-04'
    ])
}
df_original = pd.DataFrame(data)

# Sort by Date within each Team to make 'first' more meaningful for some examples
df_original = df_original.sort_values(by=['Team', 'Date']).reset_index(drop=True)

print("Original (Sorted) DataFrame:")
print(df_original)

Output:

Original (Sorted) DataFrame:
  Team Player Score       Date
0    A     P1   100 2025-01-01
1    A     P3   120 2025-01-02
2    A     P5   110 2025-01-03
3    B     P2   150 2025-01-01
4    B     P4    90 2025-01-02
5    B     P7   130 2025-01-03
6    C     P6   200 2025-01-03
7    C     P8   180 2025-01-04

Note: In the next examples, we group by the 'Team' column.

Method 1: Using DataFrameGroupBy.first() (First Non-Null)

The .first() method, when called on a DataFrameGroupBy object, computes the first non-null entry for each column within each group.

Basic Usage

The grouping column(s) become the index of the resulting DataFrame.

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180],
    'Date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02',
                            '2025-01-03', '2025-01-03', '2025-01-03', '2025-01-04'])
}).sort_values(by=['Team', 'Date']).reset_index(drop=True)

# ✅ Get the first row of each 'Team' group
first_rows_by_team = df.groupby('Team').first()

print("First row of each 'Team' group (using .first()):")
print(first_rows_by_team)

Output:

First row of each 'Team' group (using .first()):
     Player Score       Date
Team
A        P1   100 2025-01-01
B        P2   150 2025-01-01
C        P6   200 2025-01-03

Note: Because this DataFrame contains no missing values, .first() effectively returns the first row of each team in the DataFrame's current order (the original order, or the order produced by any sorting applied before grouping).
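
For instance, here is a minimal sketch (reusing the same sample data, but dropping the Date column for brevity) that sorts by 'Score' in descending order before grouping, so that .first() picks up each team's highest-scoring row:

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180]
})

# Sort so the highest score comes first within each team, then take the
# first row of each group (no NaNs here, so .first() is simply the first row)
top_score_per_team = (
    df.sort_values(by='Score', ascending=False)
      .groupby('Team')
      .first()
)

print(top_score_per_team)

With this data, the result should contain P3/120 for team A, P2/150 for team B, and P6/200 for team C.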

Resetting the Index

If you want the grouping column ('Team') back as a regular column:

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180],
    'Date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02',
                            '2025-01-03', '2025-01-03', '2025-01-03', '2025-01-04'])
}).sort_values(by=['Team', 'Date']).reset_index(drop=True)

first_rows_by_team = df.groupby('Team').first()

first_rows_reset = first_rows_by_team.reset_index()
print("First row of each group with reset index:")
print(first_rows_reset)

Output:

First row of each group with reset index:
  Team Player Score       Date
0    A     P1   100 2025-01-01
1    B     P2   150 2025-01-01
2    C     P6   200 2025-01-03

Method 2: Using DataFrameGroupBy.nth(0) (First by Position)

The .nth(n) method selects the nth row (0-indexed) from each group. So, nth(0) selects the first row by its position within each group.

Basic Usage

The original index of the selected rows is preserved.

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180],
    'Date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02',
                            '2025-01-03', '2025-01-03', '2025-01-03', '2025-01-04'])
}).sort_values(by=['Team', 'Date']).reset_index(drop=True)

# ✅ Get the first row (index 0 within each group) of each 'Team' group
first_rows_nth = df.groupby('Team').nth(0)

print("First row of each 'Team' group (using .nth(0)):")
print(first_rows_nth)

Output:

First row of each 'Team' group (using .nth(0)):
  Team Player Score       Date
0    A     P1   100 2025-01-01
3    B     P2   150 2025-01-01
6    C     P6   200 2025-01-03

To get the second row of each group, you would use nth(1), and so on.
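
As a quick sketch (rebuilding the same df as above), this selects the second row of each group with nth(1); assuming the same pandas behavior shown in the nth(0) output, the original index and the 'Team' column are preserved:

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180],
    'Date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02',
                            '2025-01-03', '2025-01-03', '2025-01-03', '2025-01-04'])
}).sort_values(by=['Team', 'Date']).reset_index(drop=True)

# Second row (position 1) within each 'Team' group; a group with fewer
# than two rows would simply not contribute a row to the result
second_rows_nth = df.groupby('Team').nth(1)

print(second_rows_nth)

With this data, the result should contain the rows for P3 (Team A), P4 (Team B), and P8 (Team C).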

Key Difference: first() vs. nth(0) with NaN Values

  • groupby().first(): For each column, it takes the first non-null (non-NaN) value within that group.
  • groupby().nth(0): Takes the first row positionally within each group, regardless of whether its values are NaN or not.

import pandas as pd
import numpy as np

df_with_nan = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'B', 'B'],
    'Value1': [np.nan, 10, 20, 30, np.nan],
    'Value2': [5, np.nan, 15, np.nan, 25]
})

print("DataFrame with NaN:")
print(df_with_nan)
print()

print("Using .first():")
print(df_with_nan.groupby('Group').first())
print()

print("Using .nth(0):")
print(df_with_nan.groupby('Group').nth(0))
print()

Output:

DataFrame with NaN:
  Group Value1 Value2
0     A    NaN    5.0
1     A   10.0    NaN
2     A   20.0   15.0
3     B   30.0    NaN
4     B    NaN   25.0

Using .first():
      Value1 Value2
Group
A       10.0    5.0
B       30.0   25.0

Using .nth(0):
  Group Value1 Value2
0     A    NaN    5.0
3     B   30.0    NaN

Note: Choose nth(0) if you strictly need the first row by position within each group; choose first() if you need the first valid (non-null) value for each column in each group.

Method 3: Getting the First N Rows of Each Group using DataFrameGroupBy.head(N)

If you need more than just the first row (e.g., the first 2 rows) from each group, use groupby().head(N).

import pandas as pd

df = pd.DataFrame({
    'Team': ['A', 'B', 'A', 'B', 'A', 'C', 'B', 'C'],
    'Player': ['P1', 'P2', 'P3', 'P4', 'P5', 'P6', 'P7', 'P8'],
    'Score': [100, 150, 120, 90, 110, 200, 130, 180],
    'Date': pd.to_datetime(['2025-01-01', '2025-01-01', '2025-01-02', '2025-01-02',
                            '2025-01-03', '2025-01-03', '2025-01-03', '2025-01-04'])
}).sort_values(by=['Team', 'Date']).reset_index(drop=True)

# ✅ Get the first 2 rows of each 'Team' group
first_2_rows_head = df.groupby('Team').head(2)

print("First 2 rows of each 'Team' group (using .head(2)):")
print(first_2_rows_head)
print()

# Optionally reset index if a continuous 0-based index is desired
first_2_rows_head_reset = df.groupby('Team').head(2).reset_index(drop=True)
print("With reset_index(drop=True):")
print(first_2_rows_head_reset)

Output:

First 2 rows of each 'Team' group (using .head(2)):
  Team Player Score       Date
0    A     P1   100 2025-01-01
1    A     P3   120 2025-01-02
3    B     P2   150 2025-01-01
4    B     P4    90 2025-01-02
6    C     P6   200 2025-01-03
7    C     P8   180 2025-01-04

With reset_index(drop=True):
  Team Player Score       Date
0    A     P1   100 2025-01-01
1    A     P3   120 2025-01-02
2    B     P2   150 2025-01-01
3    B     P4    90 2025-01-02
4    C     P6   200 2025-01-03
5    C     P8   180 2025-01-04

Method 4: Using DataFrame.drop_duplicates() (Alternative for First Occurrence)

If your goal is to get the first occurring row for each unique value (or combination of values) in the grouping column(s), drop_duplicates() is a convenient alternative: with keep='first' (the default), it keeps the first row encountered for each duplicated key in the specified subset of columns.

import pandas as pd

df_unsorted = pd.DataFrame({
    'Team': ['B', 'A', 'A', 'C', 'B', 'A', 'B', 'C'],  # Unsorted Teams
    'Player': ['P2', 'P1', 'P3', 'P6', 'P4', 'P5', 'P7', 'P8'],
    'Score': [150, 100, 120, 200, 90, 110, 130, 180]
})

print("Unsorted DataFrame for drop_duplicates demo:")
print(df_unsorted)
print()

# ✅ Keep the first row encountered for each unique 'Team'
first_occurrence_by_team = df_unsorted.drop_duplicates(subset=['Team'], keep='first')

print("First occurrence of each 'Team' (using drop_duplicates()):")
print(first_occurrence_by_team)

Output (the kept rows are the first time B, A, and C appear in df_unsorted):

Unsorted DataFrame for drop_duplicates demo:
  Team Player Score
0    B     P2   150
1    A     P1   100
2    A     P3   120
3    C     P6   200
4    B     P4    90
5    A     P5   110
6    B     P7   130
7    C     P8   180

First occurrence of each 'Team' (using drop_duplicates()):
  Team Player Score
0    B     P2   150
1    A     P1   100
3    C     P6   200

Note:

  • subset=['Team']: duplicates are identified based only on the 'Team' column.
  • keep='first': keeps the first occurrence and drops subsequent duplicates; keep='last' would keep the last occurrence instead (see the sketch below). This method is useful when "first row" means the first appearance in the existing row order rather than a position within sorted groups.
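
For completeness, here is a small sketch applying keep='last' to the same df_unsorted, which keeps the last occurrence of each team instead:

import pandas as pd

df_unsorted = pd.DataFrame({
    'Team': ['B', 'A', 'A', 'C', 'B', 'A', 'B', 'C'],
    'Player': ['P2', 'P1', 'P3', 'P6', 'P4', 'P5', 'P7', 'P8'],
    'Score': [150, 100, 120, 200, 90, 110, 130, 180]
})

# Keep the last occurrence of each 'Team' instead of the first;
# the surviving rows keep their original index and relative order
last_occurrence_by_team = df_unsorted.drop_duplicates(subset=['Team'], keep='last')

print(last_occurrence_by_team)

With this data, the rows kept should be P5 (last 'A'), P7 (last 'B'), and P8 (last 'C').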

Conclusion

Pandas offers several convenient ways to extract the first row(s) from each group within a DataFrame:

  • df.groupby('group_col').first(): Returns a DataFrame with the first non-null value from each column for each group. The grouping column becomes the index.
  • df.groupby('group_col').nth(0): Returns a DataFrame containing the positionally first row from each group, preserving the original index of those rows. Handles NaNs by simply taking the first row as is.
  • df.groupby('group_col').head(N): Returns a DataFrame containing the first N rows from each group, preserving their original indices.
  • df.drop_duplicates(subset=['group_col'], keep='first'): An alternative if "first" means the first encountered row for each unique group key in the original DataFrame order (before any sorting by group).

Choose the method that best aligns with your definition of "first row" (positional vs. non-null) and how you want NaN values to be treated. Remember to use .reset_index() if you need the grouping column(s) back as regular columns after using methods like .first().