Python Pandas: How to Calculate Average (Mean) for Each Row in DataFrame
Calculating the average (mean) of values across columns for each row in a Pandas DataFrame is a common operation in data analysis. This can be useful for summarizing row-wise data, creating new features, or understanding the central tendency of values for each observation. Pandas provides the DataFrame.mean() method, which, when used with the correct axis parameter, makes this straightforward.
This guide explains how to calculate the average for each row across all numeric columns or a specific subset of columns in a Pandas DataFrame.
The Goal: Row-wise Averages
Given a Pandas DataFrame with multiple numeric columns, we want to compute the average of these values for each individual row. The result will be a new Pandas Series where each element is the average of the values in the corresponding row of the original DataFrame (or a new column in the DataFrame if assigned).
Example DataFrame
import pandas as pd
import numpy as np
data = {
'StudentID': ['S101', 'S102', 'S103', 'S104'],
'Test1_Score': [85, 92, 78, np.nan], # Contains a NaN
'Test2_Score': [90, 88, 82, 95],
'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
'Subject': ['Math', 'Science', 'Math', 'History'] # Non-numeric column
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
StudentID Test1_Score Test2_Score Homework_Avg Subject
0 S101 85.0 90 92.5 Math
1 S102 92.0 88 85.0 Science
2 S103 78.0 82 88.0 Math
3 S104 NaN 95 91.0 History
Method 1: Calculating Average Across ALL Numeric Columns (df.mean(axis=1)) (Recommended)
The DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs) method calculates the mean. Its key parameters are:
- axis=1: This is crucial. It specifies that the mean should be computed row-wise (across columns for each row). The default, axis=0, computes column-wise means.
- skipna=True (default): Excludes NaN values from the calculation. If every numeric value in a row is NaN, that row's mean is NaN.
- numeric_only=False (default in Pandas 2.0 and later): If False, Pandas tries to operate on all columns and raises a TypeError when it encounters values it cannot average, such as strings. If True, it only considers numeric columns. Pandas versions before 2.0 defaulted to numeric_only=None, which silently dropped columns that could not be averaged. It's often safer to set numeric_only=True or explicitly select numeric columns if your DataFrame has mixed types.
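To make the axis and skipna parameters concrete, here is a minimal sketch on a two-column frame. The frame and the variable names (scores, col_means, row_means, row_means_strict) are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the last row is missing its Test1_Score.
scores = pd.DataFrame({
    "Test1_Score": [85, 92, 78, np.nan],
    "Test2_Score": [90, 88, 82, 95],
})

# axis=0 (the default): one mean per column.
col_means = scores.mean(axis=0)

# axis=1: one mean per row; NaN values are skipped by default.
row_means = scores.mean(axis=1)

# skipna=False: any NaN in a row makes that row's mean NaN.
row_means_strict = scores.mean(axis=1, skipna=False)

print(col_means)
print(row_means)
print(row_means_strict)
```

With skipna left at its default, the last row's mean is computed from the single value that is present; with skipna=False the same row yields NaN.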
Basic Usage
import pandas as pd
import numpy as np
df_example = pd.DataFrame({
'Test1_Score': [85, 92, 78, np.nan],
'Test2_Score': [90, 88, 82, 95],
'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
})
# ✅ Calculate the mean for each row across all (implicitly numeric) columns
row_means = df_example.mean(axis=1, numeric_only=True) # Explicitly use numeric_only=True for safety
# If all columns intended for mean are already numeric, numeric_only=True is not strictly needed
# but it's good practice if there could be non-numeric columns you don't want to average.
print("Average for each row (across numeric columns):")
print(row_means)
Output:
Average for each row (across numeric columns):
0 89.166667
1 88.333333
2 82.666667
3 93.000000
dtype: float64
When numeric_only is not specified, the behavior depends on the Pandas version: releases before 2.0 silently dropped columns that could not be averaged, while 2.0 and later raise a TypeError. Using numeric_only=True or pre-selecting numeric columns is the safest choice in either case.
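The version difference can be observed directly. The sketch below assumes a small mixed-type frame (the name mixed and the outcome strings are illustrative) and only reports which behavior the installed Pandas exhibits. It is a way to probe the behavior, not a recommended pattern for production code.

```python
import pandas as pd

# Illustrative frame with one string column and two numeric columns.
mixed = pd.DataFrame({
    "StudentID": ["S101", "S102"],
    "Test1_Score": [85, 92],
    "Test2_Score": [90, 88],
})

try:
    # Without numeric_only, the result depends on the Pandas version.
    implicit = mixed.mean(axis=1)
    outcome = "non-numeric columns were dropped"
except TypeError:
    implicit = None
    outcome = "TypeError raised"

# Passing numeric_only=True gives the same answer regardless of version.
explicit = mixed.mean(axis=1, numeric_only=True)

print(outcome)
print(explicit.tolist())
```

Whichever branch runs, the explicit call is the one to rely on, since its result does not change with the installed version.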
Assigning as a New Column (and DataFrame.assign())
You can assign this Series of row means back to the DataFrame as a new column.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'StudentID': ['S101', 'S102', 'S103', 'S104'],
'Test1_Score': [85, 92, 78, np.nan],
'Test2_Score': [90, 88, 82, 95],
'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
'Subject': ['Math', 'Science', 'Math', 'History']
})
# direct assignment
df['Overall_Average'] = df.mean(axis=1, numeric_only=True)
print("DataFrame with 'Overall_Average' column:")
print(df[['StudentID', 'Overall_Average']]) # Show relevant columns
Output:
DataFrame with 'Overall_Average' column:
StudentID Overall_Average
0 S101 89.166667
1 S102 88.333333
2 S103 82.666667
3 S104 93.000000
A second option is DataFrame.assign(), which returns a new DataFrame instead of modifying the original in place. This makes it convenient for method chaining and avoids SettingWithCopyWarning in some chained assignment scenarios.
df = df.assign(Overall_Average_Assign=df.mean(axis=1, numeric_only=True))
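As an illustration of why assign() suits chaining, the following sketch adds the row mean and then sorts by it in a single expression. The score_cols list and the ranked name are assumptions introduced for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "StudentID": ["S101", "S102", "S103", "S104"],
    "Test1_Score": [85, 92, 78, np.nan],
    "Test2_Score": [90, 88, 82, 95],
    "Homework_Avg": [92.5, 85.0, 88.0, 91.0],
})

# Hypothetical list of the columns that should feed the average.
score_cols = ["Test1_Score", "Test2_Score", "Homework_Avg"]

# assign() returns a new DataFrame, so further steps can be chained.
# The lambda receives the DataFrame as it exists at that point in the chain.
ranked = (
    df.assign(Overall_Average=lambda d: d[score_cols].mean(axis=1))
      .sort_values("Overall_Average", ascending=False)
      .reset_index(drop=True)
)

print(ranked[["StudentID", "Overall_Average"]])
```

The original df is left unchanged here; only ranked carries the new column.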
Method 2: Calculating Average for SPECIFIC Columns
If you only want to average specific numeric columns for each row, select those columns first.
Using df.iloc (Position-Based Selection)
Select columns by their integer positions.
import pandas as pd
import numpy as np
df_iloc_example = pd.DataFrame({
'StudentID': ['S101', 'S102'],
'Test1_Score': [85, 92], # Column at position 1 (0-indexed)
'Test2_Score': [90, 88], # Column at position 2
'Homework_Avg': [92.5, 85.0], # Column at position 3
'NonNumeric': ['A','B']
})
# ✅ Calculate row average for columns at position 1 and 2 ('Test1_Score', 'Test2_Score')
# df.iloc[:, 1:3] selects all rows (:) and columns from index 1 up to (not including) 3.
df_iloc_example['Test_Average_iloc'] = df_iloc_example.iloc[:, 1:3].mean(axis=1)
print("Row average for Test1_Score and Test2_Score (using .iloc):")
print(df_iloc_example[['StudentID', 'Test1_Score', 'Test2_Score', 'Test_Average_iloc']])
print()
# To select non-contiguous columns by position, e.g., 1st and 3rd data columns (positions 1 and 3):
df_iloc_example['Avg_Non_Contiguous'] = df_iloc_example.iloc[:, [1, 3]].mean(axis=1)
print("Row average for Test1_Score and Homework_Avg (using .iloc non-contiguous):")
print(df_iloc_example[['StudentID', 'Test1_Score', 'Homework_Avg', 'Avg_Non_Contiguous']])
Output:
Row average for Test1_Score and Test2_Score (using .iloc):
StudentID Test1_Score Test2_Score Test_Average_iloc
0 S101 85 90 87.5
1 S102 92 88 90.0
Row average for Test1_Score and Homework_Avg (using .iloc non-contiguous):
StudentID Test1_Score Homework_Avg Avg_Non_Contiguous
0 S101 85 92.5 88.75
1 S102 92 85.0 88.50
Using df.loc (Label-Based Selection)
Select columns by their names (labels).
import pandas as pd
import numpy as np
df_loc_example = pd.DataFrame({
'StudentID': ['S101', 'S102'],
'Test1_Score': [85, 92],
'Test2_Score': [90, 88],
'Homework_Avg': [92.5, 85.0],
'NonNumeric': ['A','B']
})
# ✅ Calculate row average for 'Test1_Score' and 'Homework_Avg' columns
columns_to_average = ['Test1_Score', 'Homework_Avg']
df_loc_example['Selected_Average_loc'] = df_loc_example.loc[:, columns_to_average].mean(axis=1)
# Or more directly: df_loc_example[columns_to_average].mean(axis=1)
print("Row average for Test1_Score and Homework_Avg (using .loc):")
print(df_loc_example[['StudentID', 'Test1_Score', 'Homework_Avg', 'Selected_Average_loc']])
Output:
Row average for Test1_Score and Homework_Avg (using .loc):
StudentID Test1_Score Homework_Avg Selected_Average_loc
0 S101 85 92.5 88.75
1 S102 92 85.0 88.50
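One practical difference between the two selection styles is how they react to a change in column order. The sketch below uses a hypothetical reordered frame to show that position-based selection follows the positions, while label-based selection follows the names.

```python
import pandas as pd

df = pd.DataFrame({
    "StudentID": ["S101", "S102"],
    "Test1_Score": [85, 92],
    "Test2_Score": [90, 88],
    "Homework_Avg": [92.5, 85.0],
})
test_cols = ["Test1_Score", "Test2_Score"]

# With the original column order, both selections pick the two test columns.
by_position = df.iloc[:, 1:3].mean(axis=1)
by_label = df.loc[:, test_cols].mean(axis=1)

# Same data, but Homework_Avg now sits at position 1.
reordered = df[["StudentID", "Homework_Avg", "Test1_Score", "Test2_Score"]]

# Positions 1 and 2 are now Homework_Avg and Test1_Score.
by_position_reordered = reordered.iloc[:, 1:3].mean(axis=1)
# Labels still resolve to the two test columns.
by_label_reordered = reordered.loc[:, test_cols].mean(axis=1)

print(by_position_reordered.tolist())
print(by_label_reordered.tolist())
```

If the column layout of your data may change, for example when it is loaded from files produced by different sources, label-based selection is usually the more robust choice.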
Handling Non-Numeric Data
- df.mean(axis=1, numeric_only=True): This is the most straightforward way to ensure only numeric columns are considered for the row-wise mean.
- Pre-selection: If you don't use numeric_only=True, df.mean(axis=1) might raise a TypeError if it encounters non-numeric columns (like 'Subject' in our main example) that it cannot average. It's good practice to either:
  - Explicitly select only the numeric columns you want to average: df[['Col1', 'Col2', 'Col3']].mean(axis=1).
  - Use df.select_dtypes(include=np.number).mean(axis=1) to automatically select all numeric columns.
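The two pre-selection options can be compared side by side. This is a minimal sketch on the guide's mixed-type example data; the variable names numeric_part, auto_means, and manual_means are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "StudentID": ["S101", "S102", "S103", "S104"],
    "Test1_Score": [85, 92, 78, np.nan],
    "Test2_Score": [90, 88, 82, 95],
    "Homework_Avg": [92.5, 85.0, 88.0, 91.0],
    "Subject": ["Math", "Science", "Math", "History"],
})

# Option 1: let Pandas pick every numeric column by dtype.
numeric_part = df.select_dtypes(include="number")
auto_means = numeric_part.mean(axis=1)

# Option 2: name the columns explicitly.
manual_means = df[["Test1_Score", "Test2_Score", "Homework_Avg"]].mean(axis=1)

print(list(numeric_part.columns))
print(auto_means.equals(manual_means))
```

Note that select_dtypes picks up every numeric column, including ones that are not meaningful to average, such as a numeric ID. When such columns exist, naming the columns explicitly is the safer option.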
Conclusion
Calculating the average for each row in a Pandas DataFrame is a simple yet powerful operation:
- Use DataFrame.mean(axis=1) to compute the mean across columns for each row.
- For safety with mixed-type DataFrames, either:
  - Specify numeric_only=True: df.mean(axis=1, numeric_only=True).
  - Pre-select numeric columns before calling .mean(axis=1):
    - df[['NumericCol1', 'NumericCol2']].mean(axis=1) (using .loc or direct selection)
    - df.iloc[:, start_idx:end_idx].mean(axis=1) (using .iloc)
- Assign the resulting Series to a new column in your DataFrame: df['Row_Average'] = .... Using df.assign(Row_Average=...) is also a good pattern.
By setting axis=1, you ensure the aggregation happens horizontally (row-wise), providing a mean value for each observation based on the selected numeric features.