Python Pandas: How to GroupBy Columns with NaN (Missing) Values

When performing groupby() operations in Pandas, the default behavior is to exclude rows where the grouping key(s) contain NaN (Not a Number) or missing values. However, there are scenarios where you might want to include these NaN values as a distinct group in your analysis or replace them with a placeholder before grouping.

This guide explains how Pandas groupby() handles NaNs by default and demonstrates methods to include NaNs as a group or to preprocess them using fillna() or astype(str).

Understanding `groupby()` Behavior with `NaN` Values

By default, when you use df.groupby('column_with_nan'), Pandas implicitly sets an internal parameter dropna=True. This means that any rows where 'column_with_nan' has a NaN value will be dropped from consideration before the groups are formed and aggregations are calculated. The NaN values effectively disappear from the grouping keys.

Example DataFrame with `NaN`s

We'll use the following DataFrame which contains NaN values in the 'Category' column, a common column to group by.

import pandas as pd
import numpy as np  # For np.nan

data = {
    'ProductID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Category': ['Electronics', 'Books', np.nan, 'Electronics', 'Home Goods', np.nan, 'Books', 'Home Goods'],
    'Sales': [200, 25, 50, 300, 150, 40, 30, 120],
    'Quantity': [2, 1, 3, 3, 5, 2, 1, 4]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)

Output:

Original DataFrame:
   ProductID     Category  Sales  Quantity
      101  Electronics    200         2
      102        Books     25         1
      103          NaN     50         3
      104  Electronics    300         3
      105   Home Goods    150         5
      106          NaN     40         2
      107        Books     30         1
      108   Home Goods    120         4

Method 1: Including `NaN`s as a Group using `dropna=False` (Recommended)

The DataFrame.groupby() method has a dropna parameter. Setting dropna=False instructs Pandas to treat NaN values in the grouping key(s) as a valid group label.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Category': ['Electronics', 'Books', np.nan, 'Electronics', 'Home Goods', np.nan, 'Books', 'Home Goods'],
    'Sales': [200, 25, 50, 300, 150, 40, 30, 120],
    'Quantity': [2, 1, 3, 3, 5, 2, 1, 4]
})

# Default behavior: dropna=True (NaNs in 'Category' are excluded from grouping keys)
print("GroupBy 'Category' (default, dropna=True) - sum of Sales:")
grouped_default = df.groupby('Category')['Sales'].sum()
print(grouped_default)
print()

# ✅ Include NaN as a group by setting dropna=False
print("GroupBy 'Category' (dropna=False) - sum of Sales:")
grouped_with_nan = df.groupby('Category', dropna=False)['Sales'].sum()
print(grouped_with_nan)

Output:

GroupBy 'Category' (default, dropna=True) - sum of Sales:
Category
Books           55
Electronics    500
Home Goods     270
Name: Sales, dtype: int64

GroupBy 'Category' (dropna=False) - sum of Sales:
Category
Books           55
Electronics    500
Home Goods     270
NaN             90
Name: Sales, dtype: int64

note

This is the most direct and idiomatic way to include NaNs as a separate category in your groupby operations. The NaN group will be labeled as NaN in the resulting grouped object's index.

Method 2: Replacing `NaN`s Before Grouping using `fillna()`

If you prefer to have a specific placeholder label for the NaN group instead of NaN itself, you can use DataFrame.fillna() to replace NaN values in the grouping column(s) before calling groupby().

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Category': ['Electronics', 'Books', np.nan, 'Electronics', 'Home Goods', np.nan, 'Books', 'Home Goods'],
    'Sales': [200, 25, 50, 300, 150, 40, 30, 120],
    'Quantity': [2, 1, 3, 3, 5, 2, 1, 4]
})

# Create a copy to modify for this example
df_filled = df.copy()

# ✅ Replace NaN in 'Category' with a placeholder string
placeholder = 'Uncategorized'
df_filled['Category'] = df_filled['Category'].fillna(placeholder)
# You can also do this for the whole DataFrame: df_filled = df.fillna({'Category': placeholder})

print(f"DataFrame after filling NaN in 'Category' with '{placeholder}':")
print(df_filled)
print()

# Now groupby will use the placeholder as a group label
grouped_filled = df_filled.groupby('Category')['Sales'].mean() # Example: get mean sales
print(f"GroupBy 'Category' after fillna('{placeholder}') - mean Sales:")
print(grouped_filled)

Output:

DataFrame after filling NaN in 'Category' with 'Uncategorized':
   ProductID       Category  Sales  Quantity
0        101    Electronics    200         2
1        102          Books     25         1
2        103  Uncategorized     50         3
3        104    Electronics    300         3
4        105     Home Goods    150         5
5        106  Uncategorized     40         2
6        107          Books     30         1
7        108     Home Goods    120         4

GroupBy 'Category' after fillna('Uncategorized') - mean Sales:
Category
Books             27.5
Electronics      250.0
Home Goods       135.0
Uncategorized     45.0
Name: Sales, dtype: float64

Choose a placeholder string that makes sense for your data (e.g., "Unknown", "Missing", "N/A").
This method modifies the data before grouping. If you need to preserve the original NaNs elsewhere, work on a copy of the DataFrame or the specific column.

Method 3: Converting Grouping Column to String using `astype(str)`

Another way to ensure NaN values are treated as a distinct group is to convert the entire grouping column to the string data type using astype(str). NaN values will be converted to the literal string "nan".

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Category': ['Electronics', 'Books', np.nan, 'Electronics', 'Home Goods', np.nan, 'Books', 'Home Goods'],
    'Sales': [200, 25, 50, 300, 150, 40, 30, 120],
    'Quantity': [2, 1, 3, 3, 5, 2, 1, 4]
})

df_str_type = df.copy()

# ✅ Convert the 'Category' column to string type
df_str_type['Category'] = df_str_type['Category'].astype(str)
print("'Category' column after astype(str):")
print(df_str_type['Category'])
print()

# Groupby will now treat "nan" as a regular string category
grouped_str_type = df_str_type.groupby('Category')['Quantity'].sum() # Example: sum quantity
print("GroupBy 'Category' after astype(str) - sum Quantity:")
print(grouped_str_type)

Output:

'Category' column after astype(str):
0    Electronics
1          Books
2            nan
3    Electronics
4     Home Goods
5            nan
6          Books
7     Home Goods
Name: Category, dtype: object

GroupBy 'Category' after astype(str) - sum Quantity:
Category
Books          2
Electronics    5
Home Goods     9
nan            5
Name: Quantity, dtype: int64

note

This method also changes the data. The original NaN values are now string literals "nan".

Choosing the Right Method

groupby('col', dropna=False): Most recommended if you want to include NaNs as a distinct group in your aggregation without altering the original data and are happy with the NaN label in the result. It's clean and direct.
fillna('placeholder').groupby('col'): Use this if you need a specific, human-readable label (other than NaN) for the group of missing values in your groupby output. This modifies the data before grouping.
astype(str).groupby('col'): Use this if you want to treat all values in the grouping column, including NaNs (which become "nan" strings), uniformly as strings during grouping. This also modifies the data.

Conclusion

Pandas groupby() by default excludes NaN values from the grouping keys. To include them:

Set the dropna=False parameter in your groupby() call: df.groupby('column_with_nan', dropna=False). This is the most direct way to treat NaNs as a separate group.
Alternatively, preprocess the NaN values in your grouping column before calling groupby():
- Use df['col'].fillna('SomePlaceholder') to replace NaNs with a specific string label.
- Use df['col'].astype(str) to convert all values (including NaNs, which become the string "nan") to strings.

Choosing the best approach depends on whether you want to modify your original data and how you want the NaN group to be represented in the output of your aggregation. The dropna=False option is often the cleanest for including NaNs without data modification.

Understanding groupby() Behavior with NaN Values​

Example DataFrame with NaNs​

Method 1: Including NaNs as a Group using dropna=False (Recommended)​

Method 2: Replacing NaNs Before Grouping using fillna()​

Method 3: Converting Grouping Column to String using astype(str)​

Choosing the Right Method​

Conclusion​

Table of Contents

Understanding `groupby()` Behavior with `NaN` Values

Example DataFrame with `NaN`s

Method 1: Including `NaN`s as a Group using `dropna=False` (Recommended)

Method 2: Replacing `NaN`s Before Grouping using `fillna()`

Method 3: Converting Grouping Column to String using `astype(str)`

Choosing the Right Method

Conclusion