Python Pandas: How to Use fillna() on Specific DataFrame Columns Only
Handling missing values (NaN, None, NaT) is a critical step in data cleaning and preparation with Pandas. The DataFrame.fillna() method is a versatile tool for this, but often you only want to fill missing values in specific columns, possibly with different fill values for each.
This guide explains how to use fillna() to target one or more specific columns in a Pandas DataFrame, using column selection and dictionary-based approaches.
The Goal: Targeted Missing Value Imputation
Given a Pandas DataFrame with missing values (NaN, None, or NaT) in various columns, we want to:
- Fill missing values in only one designated column with a specific value.
- Fill missing values in a list of designated columns with the same specific value.
- Fill missing values in multiple designated columns, using a different specific fill value for each of those columns.
Example DataFrame with Missing Values
import pandas as pd
import numpy as np # For np.nan
data = {
'OrderID': [101, 102, None, 104, 105, None],
'Product_Category': ['Electronics', 'Books', None, 'Electronics', None, 'Apparel'],
'Region': ['North', None, 'South', 'West', 'North', None],
'Sales': [200.0, np.nan, 150.0, 300.0, np.nan, 120.0],
'Return_Date': pd.to_datetime([None, '2023-01-10', None, '2023-02-20', None, None])
}
df_original = pd.DataFrame(data)
print("Original DataFrame with missing values:")
print(df_original)
Output:
Original DataFrame with missing values:
OrderID Product_Category Region Sales Return_Date
0 101.0 Electronics North 200.0 NaT
1 102.0 Books None NaN 2023-01-10
2 NaN None South 150.0 NaT
3 104.0 Electronics West 300.0 2023-02-20
4 105.0 None North NaN NaT
5 NaN Apparel None 120.0 NaT
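Before deciding which columns to fill, it can help to see where the missing values actually are. The following is a minimal sketch on a reduced version of the DataFrame above; the variable names missing_counts and cols_with_missing are illustrative only.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'OrderID': [101, 102, None, 104, 105, None],
    'Region': ['North', None, 'South', 'West', 'North', None],
    'Sales': [200.0, np.nan, 150.0, 300.0, np.nan, 120.0],
})

# Count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Collect the names of columns that contain at least one missing value
cols_with_missing = missing_counts[missing_counts > 0].index.tolist()
print(cols_with_missing)
```

The resulting list can then be narrowed down to the columns you actually want to target with fillna().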
Method 1: Applying fillna() to a SINGLE Specific Column
Select the target column (which returns a Series), call .fillna() on that Series, and assign the result back to the DataFrame column.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'OrderID': [101, None, 103],
'Product_Category': ['Electronics', None, 'Apparel'],
'Region': ['North', 'South', None]
})
# Make a copy to modify
df_single_col_fill = df.copy()
# ✅ Fill NaN in 'Product_Category' column with 'Unknown'
fill_value_category = 'Unknown'
df_single_col_fill['Product_Category'] = df_single_col_fill['Product_Category'].fillna(value=fill_value_category)
print("DataFrame after filling NaN in 'Product_Category' only:")
print(df_single_col_fill)
Output:
DataFrame after filling NaN in 'Product_Category' only:
OrderID Product_Category Region
0 101.0 Electronics North
1 NaN Unknown South
2 103.0 Apparel None
- df['ColumnName']: Selects the specific column.
- .fillna(value=...): Fills NaNs in that selected Series.
- df['ColumnName'] = ...: Assigns the modified Series back.
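The same select, fill, and assign-back pattern works when the fill value is computed from the column itself. As a sketch, assuming the median is a sensible replacement for missing Sales figures (that choice is an assumption for illustration, not a recommendation from this guide):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Region': ['North', 'South', None, 'West'],
    'Sales': [200.0, np.nan, 150.0, 300.0],
})

# median() skips NaN by default, so this is the median of 150, 200 and 300
sales_median = df['Sales'].median()

# Fill only the 'Sales' column; 'Region' keeps its missing value
df['Sales'] = df['Sales'].fillna(sales_median)
print(df)
```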
Method 2: Applying fillna() to MULTIPLE Specific Columns (Same Fill Value)
Use this approach when you want to fill missing values in several columns with the same replacement value.
Using a List of Column Names
Select multiple columns (which returns a DataFrame subset) and call .fillna() on this subset. Then assign the result back to those columns in the original DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'OrderID': [101, None, 103], 'Product_Category': ['Electronics', None, 'Apparel'],
'Region': ['North', 'South', None], 'Sales': [200, np.nan, 150]
})
df_multi_col_fill = df.copy()
cols_to_fill_same = ['Product_Category', 'Region']
fill_value_common = 'Not_Available'
# ✅ Select multiple columns and apply fillna
df_multi_col_fill[cols_to_fill_same] = df_multi_col_fill[cols_to_fill_same].fillna(value=fill_value_common)
print("DataFrame after filling multiple columns with the same value:")
print(df_multi_col_fill)
Output:
DataFrame after filling multiple columns with the same value:
OrderID Product_Category Region Sales
0 101.0 Electronics North 200.0
1 NaN Not_Available South NaN
2 103.0 Apparel Not_Available 150.0
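The list of columns does not have to be typed by hand. As one possible variation, the sketch below builds it with select_dtypes() so that every numeric column is filled with 0; the Quantity column is hypothetical and added only for illustration.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Product_Category': ['Electronics', None, 'Apparel'],
    'Sales': [200.0, np.nan, 150.0],
    'Quantity': [2.0, 1.0, np.nan],  # hypothetical column for this sketch
})

# Build the column list from the dtypes instead of hard-coding names
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# Fill only the numeric columns; 'Product_Category' is left untouched
df[numeric_cols] = df[numeric_cols].fillna(0)
print(df)
```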
Using DataFrame.loc (Alternative Selection)
You can use .loc to select the columns for fillna, though direct column selection as above is often simpler for this task.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'OrderID': [101, None, 103], 'Product_Category': ['Electronics', None, 'Apparel'],
'Region': ['North', 'South', None], 'Sales': [200, np.nan, 150]
})
df_loc_fill = df.copy()
cols_to_fill_loc = ['Product_Category', 'Region']
fill_value_loc = 'Missing_Data'
df_loc_fill.loc[:, cols_to_fill_loc] = df_loc_fill.loc[:, cols_to_fill_loc].fillna(value=fill_value_loc)
print("DataFrame after filling with .loc selection:")
print(df_loc_fill)
Output:
DataFrame after filling with .loc selection:
OrderID Product_Category Region Sales
0 101.0 Electronics North 200.0
1 NaN Missing_Data South NaN
2 103.0 Apparel Missing_Data 150.0
Method 3: Applying fillna() with DIFFERENT Fill Values for Specific Columns (Using a Dictionary - Recommended for Varied Fills)
This is the most flexible method when different columns need different fill values. Pass a dictionary to df.fillna(). The dictionary keys should be the column names, and the dictionary values should be the respective fill values for those columns.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'OrderID': [101, None, 103, 104],
'Product_Category': ['Electronics', None, 'Apparel', 'Electronics'],
'Region': ['North', 'South', None, 'West'],
'Sales': [200.0, np.nan, 150.0, np.nan],
'Return_Date': pd.to_datetime([None, '2023-01-10', None, None])
})
df_dict_fill = df.copy()
# ✅ Define a dictionary of {column_name: fill_value}
fill_values_dict = {
'OrderID': 0, # Fill missing OrderID with 0 (assuming it should be int after)
'Product_Category': 'Unknown_Category',
'Region': 'Global',
'Sales': df_dict_fill['Sales'].mean(), # Fill with mean of existing Sales
'Return_Date': pd.Timestamp('1900-01-01') # Fill missing dates with a placeholder date
}
# Apply fillna using the dictionary
# This returns a new DataFrame by default
df_filled_dict_mode = df_dict_fill.fillna(value=fill_values_dict)
print("DataFrame after filling specific columns using a dictionary:")
print(df_filled_dict_mode)
Output:
DataFrame after filling specific columns using a dictionary:
OrderID Product_Category Region Sales Return_Date
0 101.0 Electronics North 200.0 1900-01-01
1 0.0 Unknown_Category South 175.0 2023-01-10
2 103.0 Apparel Global 150.0 1900-01-01
3 104.0 Electronics West 175.0 1900-01-01
df.fillna(value=your_dictionary): Only the columns specified as keys in your_dictionary will have their NaNs filled using the corresponding dictionary value. Columns that are not keys in the dictionary remain unaffected.
The missing values forced OrderID to float, so after filling with 0 you can convert the column back to int if desired:
df_filled_dict_mode['OrderID'] = df_filled_dict_mode['OrderID'].astype(int)
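To make the "only listed columns are filled" behavior concrete, here is a small sketch with a dictionary that deliberately omits Sales. The variable name filled is illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'OrderID': [101, None, 103],
    'Region': ['North', None, 'West'],
    'Sales': [200.0, np.nan, 150.0],
})

# 'Sales' is not a key in the dictionary, so its NaN is left alone
filled = df.fillna(value={'OrderID': 0, 'Region': 'Global'})

# With no NaN left in 'OrderID', the float column can be cast to int
filled['OrderID'] = filled['OrderID'].astype(int)
print(filled)
print(filled.dtypes)
```

Note that df itself is unchanged here, because fillna() returned a new DataFrame.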
Modifying In-Place (inplace=True) vs. Reassignment
All fillna() examples above return a new DataFrame by default. To modify the original DataFrame directly, use inplace=True.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'OrderID': [101, None, 103, 104],
'Product_Category': ['Electronics', None, 'Apparel', 'Electronics'],
'Region': ['North', 'South', None, 'West'],
'Sales': [200.0, np.nan, 150.0, np.nan],
'Return_Date': pd.to_datetime([None, '2023-01-10', None, None])
})
df_inplace_example = df.copy()
fill_values_for_inplace = {
'Product_Category': 'N/A_Category',
'Sales': 0.0
}
print("DataFrame before inplace fillna:")
print(df_inplace_example.head(3))
print()
# ✅ Modify df_inplace_example directly
df_inplace_example.fillna(value=fill_values_for_inplace, inplace=True)
print("DataFrame AFTER inplace fillna:")
print(df_inplace_example.head(3))
# df_inplace_example is now modified
Output:
DataFrame before inplace fillna:
OrderID Product_Category Region Sales Return_Date
0 101.0 Electronics North 200.0 NaT
1 NaN None South NaN 2023-01-10
2 103.0 Apparel None 150.0 NaT
DataFrame AFTER inplace fillna:
OrderID Product_Category Region Sales Return_Date
0 101.0 Electronics North 200.0 NaT
1 NaN N/A_Category South 0.0 2023-01-10
2 103.0 Apparel None 150.0 NaT
- Reassignment (Preferred): df = df.fillna(...) or df[cols] = df[cols].fillna(...)
- In-place: df.fillna(..., inplace=True). Series.fillna() also accepts inplace, but calling it on a selection such as df[cols] or df['col'] may modify only a temporary copy and leave df unchanged, so use reassignment when filling a subset of columns.
Reassignment is often favored for better code clarity and to avoid SettingWithCopyWarning in more complex operations.
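The difference matters most when filling a subset of columns. The sketch below, with illustrative variable names, shows that an in-place fill on a list-based column selection leaves the original DataFrame untouched, while reassignment updates it. Depending on the pandas version, the in-place call on the selection may also emit a warning.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Product_Category': ['Electronics', None, 'Apparel'],
    'Region': ['North', 'South', None],
    'Sales': [200.0, np.nan, 150.0],
})
cols = ['Product_Category', 'Region']

# Selecting with a list builds a new DataFrame, so an in-place fill
# changes only that copy and not df
subset = df[cols]
subset.fillna('Unknown', inplace=True)
missing_after_inplace = int(df[cols].isna().sum().sum())
print(missing_after_inplace)

# Reassignment writes the filled values back into df
df[cols] = df[cols].fillna('Unknown')
missing_after_reassign = int(df[cols].isna().sum().sum())
print(missing_after_reassign)
```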
Conclusion
Pandas fillna() provides flexible control for imputing missing values in specific DataFrame columns:
- Single Column: Select the column and call .fillna() on it: df['ColA'] = df['ColA'].fillna(value_A).
- Multiple Columns (Same Value): Select columns and call .fillna(): df[cols_list] = df[cols_list].fillna(common_value).
- Multiple Columns (Different Values): Pass a dictionary to df.fillna(value=fill_dict). This is the most versatile method and the recommended approach when fill values differ by column.
Remember to choose your fill values appropriately for each column's data type and analytical context. Always decide whether to modify the DataFrame in-place (inplace=True) or to reassign the result to a new (or the same) variable.