
Python Pandas: How to Calculate Average (Mean) for Each Row in DataFrame

Calculating the average (mean) of values across columns for each row in a Pandas DataFrame is a common operation in data analysis. This can be useful for summarizing row-wise data, creating new features, or understanding the central tendency of values for each observation. Pandas provides the DataFrame.mean() method, which, when used with the correct axis parameter, makes this straightforward.

This guide explains how to calculate the average for each row across all numeric columns or a specific subset of columns in a Pandas DataFrame.

The Goal: Row-wise Averages

Given a Pandas DataFrame with multiple numeric columns, we want to compute the average of these values for each individual row. The result will be a new Pandas Series where each element is the average of the values in the corresponding row of the original DataFrame (or a new column in the DataFrame if assigned).
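
To make "average of the values in the corresponding row" concrete, here is a minimal sketch that computes the row mean by hand and compares it with the built-in method. The frame and its column names (a, b, c) are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical three-column frame used only to illustrate the arithmetic.
frame = pd.DataFrame({"a": [1.0, 4.0], "b": [2.0, 5.0], "c": [3.0, 9.0]})

# Row mean by hand: row total divided by the number of non-missing values.
manual = frame.sum(axis=1) / frame.count(axis=1)

# The same result from the built-in method.
builtin = frame.mean(axis=1)

print(manual.tolist())   # [2.0, 6.0]
print(builtin.tolist())  # [2.0, 6.0]
```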

Example DataFrame

import pandas as pd
import numpy as np

data = {
    'StudentID': ['S101', 'S102', 'S103', 'S104'],
    'Test1_Score': [85, 92, 78, np.nan],  # Contains a NaN
    'Test2_Score': [90, 88, 82, 95],
    'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
    'Subject': ['Math', 'Science', 'Math', 'History'],  # Non-numeric column
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  StudentID  Test1_Score  Test2_Score  Homework_Avg  Subject
0      S101         85.0           90          92.5     Math
1      S102         92.0           88          85.0  Science
2      S103         78.0           82          88.0     Math
3      S104          NaN           95          91.0  History

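Method 1: Calculating Average Across All Numeric Columns

The DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs) method calculates the mean. The parameters that matter for row-wise averages are: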

  • axis=1: This is crucial. It specifies that the mean should be computed row-wise (across columns for each row). The default axis=0 computes column-wise means.
  • skipna=True (default): Excludes NaN values from the calculation. If a row has all NaNs in the numeric columns being averaged, its mean will be NaN.
  • numeric_only=False (default since pandas 2.0): If False, the method operates on every column and raises a TypeError when it meets values it cannot average, such as strings. If True, only numeric columns are considered. Before pandas 2.0 the default was None, which silently dropped non-numeric columns. With mixed-type DataFrames, pass numeric_only=True or select the numeric columns explicitly.
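
The effect of skipna is easiest to see side by side. The following is a small sketch on a hypothetical two-row frame (the names scores, mean_skip, and mean_strict are illustrative, not part of the guide's main example).

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the second row has a missing Test1 score.
scores = pd.DataFrame({
    "Test1": [85.0, np.nan],
    "Test2": [90.0, 95.0],
    "Homework": [92.5, 91.0],
})

# Default skipna=True: the NaN is ignored, so row 1 averages its two remaining values.
mean_skip = scores.mean(axis=1)

# skipna=False: any NaN in a row makes that row's mean NaN.
mean_strict = scores.mean(axis=1, skipna=False)

print(mean_skip)
print(mean_strict)
```

Which setting is appropriate depends on whether a missing score should be ignored or should invalidate the row's average.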

Basic Usage

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'Test1_Score': [85, 92, 78, np.nan],
    'Test2_Score': [90, 88, 82, 95],
    'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
})

# ✅ Calculate the mean for each row across all columns.
# Every column here is numeric, so numeric_only=True is not strictly needed,
# but it guards against non-numeric columns you don't want to average.
row_means = df_example.mean(axis=1, numeric_only=True)

print("Average for each row (across numeric columns):")
print(row_means)

Output:

Average for each row (across numeric columns):
0    89.166667
1    88.333333
2    82.666667
3    93.000000
dtype: float64

How non-numeric columns are handled by default depends on the pandas version: before 2.0, df.mean(axis=1) silently skipped them, while from 2.0 onward it raises a TypeError unless numeric_only=True is passed. Using numeric_only=True or pre-selecting numeric columns is the safest choice in either case.
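
The version-dependent behavior can be checked with a short sketch. It assumes a hypothetical mixed-type frame named mixed; the try/except only records what your installed pandas does and is not needed in real code.

```python
import pandas as pd

# Hypothetical mixed-type frame: one string column, two numeric columns.
mixed = pd.DataFrame({
    "StudentID": ["S101", "S102"],
    "Test1_Score": [85, 92],
    "Test2_Score": [90, 88],
})

# Relying on the default: the outcome depends on the installed pandas version.
try:
    mixed.mean(axis=1)
    outcome = "computed"
except TypeError:
    outcome = "TypeError"
print("Default behavior:", outcome)

# Being explicit avoids the version dependence.
safe = mixed.mean(axis=1, numeric_only=True)
print(safe.tolist())  # [87.5, 90.0]
```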

Assigning as a New Column (and DataFrame.assign())

You can assign this Series of row means back to the DataFrame as a new column.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'StudentID': ['S101', 'S102', 'S103', 'S104'],
    'Test1_Score': [85, 92, 78, np.nan],
    'Test2_Score': [90, 88, 82, 95],
    'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
    'Subject': ['Math', 'Science', 'Math', 'History'],
})

# direct assignment
df['Overall_Average'] = df.mean(axis=1, numeric_only=True)

print("DataFrame with 'Overall_Average' column:")
print(df[['StudentID', 'Overall_Average']]) # Show relevant columns

Output:

DataFrame with 'Overall_Average' column:
  StudentID  Overall_Average
0      S101        89.166667
1      S102        88.333333
2      S103        82.666667
3      S104        93.000000

A second option is DataFrame.assign(), which returns a new DataFrame instead of modifying the original in place. This makes it a good fit for method chaining, and it avoids the SettingWithCopyWarning that direct assignment can trigger when the DataFrame is a slice of another one.

# DataFrame.assign() returns a new DataFrame that includes the extra column
df = df.assign(Overall_Average_Assign=df.mean(axis=1, numeric_only=True))
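
As a sketch of the chaining pattern, assign() can take a callable, which receives the DataFrame as it exists at that point in the chain. The column name Row_Average and the variable ranked are illustrative choices, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({
    "StudentID": ["S101", "S102", "S103"],
    "Test1_Score": [85, 92, 78],
    "Test2_Score": [90, 88, 82],
})

# The lambda is evaluated on the intermediate DataFrame, so the whole
# pipeline can be written as one expression without mutating df.
ranked = (
    df
    .assign(Row_Average=lambda d: d.mean(axis=1, numeric_only=True))
    .sort_values("Row_Average", ascending=False)
    .reset_index(drop=True)
)

print(ranked)
```

The original df is left unchanged; only ranked carries the new column.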

Method 2: Calculating Average for SPECIFIC Columns

If you only want to average specific numeric columns for each row, select those columns first.

Using df.iloc (Position-Based Selection)

Select columns by their integer positions.

import pandas as pd
import numpy as np

df_iloc_example = pd.DataFrame({
    'StudentID': ['S101', 'S102'],
    'Test1_Score': [85, 92],       # Column at position 1 (0-indexed)
    'Test2_Score': [90, 88],       # Column at position 2
    'Homework_Avg': [92.5, 85.0],  # Column at position 3
    'NonNumeric': ['A', 'B'],
})

# ✅ Calculate row average for columns at position 1 and 2 ('Test1_Score', 'Test2_Score')
# df.iloc[:, 1:3] selects all rows (:) and columns from index 1 up to (not including) 3.
df_iloc_example['Test_Average_iloc'] = df_iloc_example.iloc[:, 1:3].mean(axis=1)

print("Row average for Test1_Score and Test2_Score (using .iloc):")
print(df_iloc_example[['StudentID', 'Test1_Score', 'Test2_Score', 'Test_Average_iloc']])
print()

# To select non-contiguous columns by position, pass a list, e.g. positions 1 and 3:
df_iloc_example['Avg_Non_Contiguous'] = df_iloc_example.iloc[:, [1, 3]].mean(axis=1)
print("Row average for Test1_Score and Homework_Avg (using .iloc non-contiguous):")
print(df_iloc_example[['StudentID', 'Test1_Score', 'Homework_Avg', 'Avg_Non_Contiguous']])

Output:

Row average for Test1_Score and Test2_Score (using .iloc):
  StudentID  Test1_Score  Test2_Score  Test_Average_iloc
0      S101           85           90               87.5
1      S102           92           88               90.0

Row average for Test1_Score and Homework_Avg (using .iloc non-contiguous):
  StudentID  Test1_Score  Homework_Avg  Avg_Non_Contiguous
0      S101           85          92.5               88.75
1      S102           92          85.0               88.50

Using df.loc (Label-Based Selection)

Select columns by their names (labels).

import pandas as pd
import numpy as np

df_loc_example = pd.DataFrame({
    'StudentID': ['S101', 'S102'],
    'Test1_Score': [85, 92],
    'Test2_Score': [90, 88],
    'Homework_Avg': [92.5, 85.0],
    'NonNumeric': ['A', 'B'],
})

# ✅ Calculate row average for 'Test1_Score' and 'Homework_Avg' columns
columns_to_average = ['Test1_Score', 'Homework_Avg']
df_loc_example['Selected_Average_loc'] = df_loc_example.loc[:, columns_to_average].mean(axis=1)
# Or more directly: df_loc_example[columns_to_average].mean(axis=1)

print("Row average for Test1_Score and Homework_Avg (using .loc):")
print(df_loc_example[['StudentID', 'Test1_Score', 'Homework_Avg', 'Selected_Average_loc']])

Output:

Row average for Test1_Score and Homework_Avg (using .loc):
  StudentID  Test1_Score  Homework_Avg  Selected_Average_loc
0      S101           85          92.5                 88.75
1      S102           92          85.0                 88.50

Handling Non-Numeric Data

  • df.mean(axis=1, numeric_only=True): This is the most straightforward way to ensure only numeric columns are considered for the row-wise mean.
  • Pre-selection: Without numeric_only=True, df.mean(axis=1) raises a TypeError in pandas 2.0 and later when it encounters non-numeric columns (like 'Subject' in our main example) that it cannot average. It's good practice to either:
    • Explicitly select only the numeric columns you want to average: df[['Col1', 'Col2', 'Col3']].mean(axis=1).
    • Use df.select_dtypes(include=np.number).mean(axis=1) to automatically select all numeric columns.
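
The select_dtypes() option can be sketched on the main example as follows. The variable names numeric_part and row_means are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "StudentID": ["S101", "S102", "S103", "S104"],
    "Test1_Score": [85, 92, 78, np.nan],
    "Test2_Score": [90, 88, 82, 95],
    "Homework_Avg": [92.5, 85.0, 88.0, 91.0],
    "Subject": ["Math", "Science", "Math", "History"],
})

# Keep only the numeric columns; StudentID and Subject are dropped.
numeric_part = df.select_dtypes(include=np.number)
print(list(numeric_part.columns))

# The row-wise mean now needs no numeric_only argument.
row_means = numeric_part.mean(axis=1)
print(row_means)
```

Inspecting numeric_part.columns before averaging is a quick way to confirm that only the intended columns contribute to the mean.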

Conclusion

Calculating the average for each row in a Pandas DataFrame is a simple yet powerful operation:

  1. Use DataFrame.mean(axis=1) to compute the mean across columns for each row.
  2. For safety with mixed-type DataFrames, either:
    • Specify numeric_only=True: df.mean(axis=1, numeric_only=True).
    • Pre-select numeric columns before calling .mean(axis=1):
      • df[['NumericCol1', 'NumericCol2']].mean(axis=1) (using .loc or direct selection)
      • df.iloc[:, start_idx:end_idx].mean(axis=1) (using .iloc)
  3. Assign the resulting Series to a new column in your DataFrame: df['Row_Average'] = .... Using df.assign(Row_Average=...) is also a good pattern.

By setting axis=1, you ensure the aggregation happens horizontally (row-wise), providing a mean value for each observation based on the selected numeric features.