Python Pandas: How to Calculate Average (Mean) for Each Row in DataFrame

Calculating the average (mean) of values across columns for each row in a Pandas DataFrame is a common operation in data analysis. It is useful for summarizing row-wise data, creating new features, or understanding the central tendency of the values for each observation. Pandas provides the DataFrame.mean() method, which makes this straightforward when called with axis=1.

This guide explains how to calculate the average for each row across all numeric columns or a specific subset of columns in a Pandas DataFrame.

The Goal: Row-wise Averages

Given a Pandas DataFrame with multiple numeric columns, we want to compute the average of these values for each individual row. The result is a new Pandas Series in which each element is the average of the values in the corresponding row of the original DataFrame. That Series can be used on its own or assigned back to the DataFrame as a new column.

Example DataFrame

import pandas as pd
import numpy as np

data = {
    'StudentID': ['S101', 'S102', 'S103', 'S104'],
    'Test1_Score': [85, 92, 78, np.nan],                # Contains a NaN
    'Test2_Score': [90, 88, 82, 95],
    'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
    'Subject': ['Math', 'Science', 'Math', 'History']   # Non-numeric column
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  StudentID  Test1_Score  Test2_Score  Homework_Avg  Subject
0      S101         85.0           90          92.5     Math
1      S102         92.0           88          85.0  Science
2      S103         78.0           82          88.0     Math
3      S104          NaN           95          91.0  History

Method 1: Calculating Average Across ALL Numeric Columns (`df.mean(axis=1)`) (Recommended)

The DataFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs) method calculates the mean along the given axis. The parameters that matter here are:

axis=1: This is crucial. It specifies that the mean should be computed row-wise (across columns for each row). The default axis=0 computes column-wise means.
skipna=True (default): Excludes NaN values from the calculation. If a row has all NaNs in the numeric columns being averaged, its mean will be NaN.
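numeric_only=False (default in pandas 2.0 and later): With False, mean() operates on every column and raises a TypeError if a row contains values it cannot average, such as strings. With True, only numeric (float, int, and boolean) columns are used. Before pandas 2.0 the default was None, which silently dropped non-numeric columns and, in later 1.x releases, emitted a deprecation warning. If your DataFrame has mixed types, pass numeric_only=True or select the numeric columns yourself.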
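To make the effect of axis and skipna concrete, here is a minimal sketch. The DataFrame named scores is a hypothetical two-column cut of the example data, used only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column subset of the example data
scores = pd.DataFrame({
    "Test1_Score": [85, 92, 78, np.nan],
    "Test2_Score": [90, 88, 82, 95],
})

# axis=0 (the default): one mean per column
column_means = scores.mean()

# axis=1: one mean per row; NaN values are skipped by default
row_means_skip = scores.mean(axis=1)

# skipna=False: a single NaN in a row makes that row's mean NaN
row_means_strict = scores.mean(axis=1, skipna=False)

print(column_means)
print(row_means_skip)
print(row_means_strict)
```

With skipna=True the last row averages only the one value that is present (95.0), whereas skipna=False reports NaN for that row. Which behavior is appropriate depends on whether a missing score should be ignored or should invalidate the row's average.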

Basic Usage

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'Test1_Score': [85, 92, 78, np.nan],
    'Test2_Score': [90, 88, 82, 95],
    'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
})

# ✅ Calculate the mean for each row across all (implicitly numeric) columns
row_means = df_example.mean(axis=1, numeric_only=True) # Explicitly use numeric_only=True for safety
# If all columns intended for mean are already numeric, numeric_only=True is not strictly needed
# but it's good practice if there could be non-numeric columns you don't want to average.

print("Average for each row (across numeric columns):")
print(row_means)

Output:

Average for each row (across numeric columns):
0    89.166667
1    88.333333
2    82.666667
3    93.000000
dtype: float64

In pandas 2.0 and later, mean() does not drop non-numeric columns on its own: with the default numeric_only=False, a DataFrame that contains string columns raises a TypeError when averaged with axis=1. Earlier versions silently ignored such columns. Passing numeric_only=True or pre-selecting the numeric columns gives the same result across versions.

Assigning as a New Column (and `DataFrame.assign()`)

You can assign this Series of row means back to the DataFrame as a new column.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'StudentID': ['S101', 'S102', 'S103', 'S104'],
    'Test1_Score': [85, 92, 78, np.nan],
    'Test2_Score': [90, 88, 82, 95],
    'Homework_Avg': [92.5, 85.0, 88.0, 91.0],
    'Subject': ['Math', 'Science', 'Math', 'History']
})

# direct assignment
df['Overall_Average'] = df.mean(axis=1, numeric_only=True)

print("DataFrame with 'Overall_Average' column:")
print(df[['StudentID', 'Overall_Average']]) # Show relevant columns

Output:

DataFrame with 'Overall_Average' column:
  StudentID  Overall_Average
0      S101        89.166667
1      S102        88.333333
2      S103        82.666667
3      S104        93.000000

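note

A second option is DataFrame.assign(). It returns a new DataFrame instead of modifying the original in place, which makes it convenient for method chaining and avoids the SettingWithCopyWarning that direct assignment can trigger when df is itself a slice of another DataFrame.

df = df.assign(Overall_Average_Assign=df.mean(axis=1, numeric_only=True))

If this runs after the direct assignment above, df already contains Overall_Average, so that column is averaged as well. The result is unchanged here, because adding a set's own mean to the set does not move the mean, but in general it is cleaner to compute derived columns from the original score columns only.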
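As a sketch of how assign() fits into a method chain, the example below passes a callable so the mean is computed from the DataFrame as it exists at that point in the chain. The names df_chain and result are hypothetical and chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical copy of the example data, without the 'Subject' column
df_chain = pd.DataFrame({
    "StudentID": ["S101", "S102", "S103", "S104"],
    "Test1_Score": [85, 92, 78, np.nan],
    "Test2_Score": [90, 88, 82, 95],
    "Homework_Avg": [92.5, 85.0, 88.0, 91.0],
})

# The lambda receives the DataFrame at this step of the chain,
# so no intermediate variable is needed
result = (
    df_chain
    .assign(Overall_Average=lambda d: d.mean(axis=1, numeric_only=True))
    .sort_values("Overall_Average", ascending=False)
)

print(result[["StudentID", "Overall_Average"]])
print("Overall_Average" in df_chain.columns)  # False: the original is unchanged
```

Because assign() returns a new object, df_chain keeps its original columns, which can be helpful when the same source DataFrame feeds several different summaries.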

Method 2: Calculating Average for SPECIFIC Columns

If you only want to average specific numeric columns for each row, select those columns first.

Using `df.iloc` (Position-Based Selection)

Select columns by their integer positions.

import pandas as pd
import numpy as np

df_iloc_example = pd.DataFrame({
    'StudentID': ['S101', 'S102'],
    'Test1_Score': [85, 92],       # Column at position 1 (0-indexed)
    'Test2_Score': [90, 88],       # Column at position 2
    'Homework_Avg': [92.5, 85.0],  # Column at position 3
    'NonNumeric': ['A','B']
})

# ✅ Calculate row average for columns at position 1 and 2 ('Test1_Score', 'Test2_Score')
# df.iloc[:, 1:3] selects all rows (:) and columns from index 1 up to (not including) 3.
df_iloc_example['Test_Average_iloc'] = df_iloc_example.iloc[:, 1:3].mean(axis=1)

print("Row average for Test1_Score and Test2_Score (using .iloc):")
print(df_iloc_example[['StudentID', 'Test1_Score', 'Test2_Score', 'Test_Average_iloc']])
print()

# To select non-contiguous columns by position, e.g., 1st and 3rd data columns (positions 1 and 3):
df_iloc_example['Avg_Non_Contiguous'] = df_iloc_example.iloc[:, [1, 3]].mean(axis=1)
print("Row average for Test1_Score and Homework_Avg (using .iloc non-contiguous):")
print(df_iloc_example[['StudentID', 'Test1_Score', 'Homework_Avg', 'Avg_Non_Contiguous']])

Output:

Row average for Test1_Score and Test2_Score (using .iloc):
  StudentID  Test1_Score  Test2_Score  Test_Average_iloc
0      S101           85           90               87.5
1      S102           92           88               90.0

Row average for Test1_Score and Homework_Avg (using .iloc non-contiguous):
  StudentID  Test1_Score  Homework_Avg  Avg_Non_Contiguous
0      S101           85          92.5               88.75
1      S102           92          85.0               88.50

Using `df.loc` (Label-Based Selection)

Select columns by their names (labels).

import pandas as pd
import numpy as np

df_loc_example = pd.DataFrame({
    'StudentID': ['S101', 'S102'],
    'Test1_Score': [85, 92],
    'Test2_Score': [90, 88],
    'Homework_Avg': [92.5, 85.0],
    'NonNumeric': ['A','B']
})

# ✅ Calculate row average for 'Test1_Score' and 'Homework_Avg' columns
columns_to_average = ['Test1_Score', 'Homework_Avg']
df_loc_example['Selected_Average_loc'] = df_loc_example.loc[:, columns_to_average].mean(axis=1)
# Or more directly: df_loc_example[columns_to_average].mean(axis=1)

print("Row average for Test1_Score and Homework_Avg (using .loc):")
print(df_loc_example[['StudentID', 'Test1_Score', 'Homework_Avg', 'Selected_Average_loc']])

Output:

Row average for Test1_Score and Homework_Avg (using .loc):
  StudentID  Test1_Score  Homework_Avg  Selected_Average_loc
0      S101           85          92.5                 88.75
1      S102           92          85.0                 88.50

Handling Non-Numeric Data

df.mean(axis=1, numeric_only=True): This is the most straightforward way to ensure only numeric columns are considered for the row-wise mean.
Pre-selection: Without numeric_only=True, df.mean(axis=1) raises a TypeError in pandas 2.0 and later when it encounters non-numeric columns (like 'Subject' in our main example) that it cannot average. To avoid this, either:
- Explicitly select only the numeric columns you want to average: df[['Col1', 'Col2', 'Col3']].mean(axis=1).
- Use df.select_dtypes(include=np.number).mean(axis=1) to automatically select all numeric columns.
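The options above can be compared side by side in a short sketch. The DataFrame named mixed is hypothetical, and the try/except block is there only to show what happens on the installed pandas version when no precaution is taken.

```python
import pandas as pd

# Hypothetical mixed-type DataFrame: one string column, two numeric columns
mixed = pd.DataFrame({
    "StudentID": ["S101", "S102"],
    "Test1_Score": [85, 92],
    "Test2_Score": [90, 88],
})

# Option A: let mean() ignore the non-numeric column
means_a = mixed.mean(axis=1, numeric_only=True)

# Option B: select the numeric columns first, then average
means_b = mixed.select_dtypes(include="number").mean(axis=1)

# No precaution: pandas 2.0+ raises TypeError; older versions may
# silently drop the string column instead
try:
    mixed.mean(axis=1)
    outcome = "no error (non-numeric columns were dropped)"
except TypeError as exc:
    outcome = f"TypeError: {exc}"

print(means_a.tolist())
print(means_b.tolist())
print(outcome)
```

Options A and B give the same row means here. Option B has the side benefit of making the set of averaged columns visible, since the selected DataFrame can be inspected before calling mean().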

Conclusion

Calculating the average for each row in a Pandas DataFrame is a simple yet powerful operation:

Use DataFrame.mean(axis=1) to compute the mean across columns for each row.
For safety with mixed-type DataFrames, either:
- Specify numeric_only=True: df.mean(axis=1, numeric_only=True).
- Pre-select numeric columns before calling .mean(axis=1):
  - df[['NumericCol1', 'NumericCol2']].mean(axis=1) (using .loc or direct selection)
  - df.iloc[:, start_idx:end_idx].mean(axis=1) (using .iloc)
Assign the resulting Series to a new column in your DataFrame: df['Row_Average'] = .... Using df.assign(Row_Average=...) is also a good pattern.

By setting axis=1, you ensure the aggregation happens horizontally (row-wise), providing a mean value for each observation based on the selected numeric features.
