Python Pandas: Calculating Mean (Average) Across Multiple DataFrames

In data analysis, you often work with multiple datasets that represent similar entities or observations over different periods or conditions. A common task is to consolidate these datasets and compute aggregate statistics, such as the mean (average), across all of them. For instance, you might want to find the average sales figures from several monthly reports or the mean sensor readings from different experimental runs.

This guide will comprehensively demonstrate how to calculate the mean across multiple Pandas DataFrames, covering both an overall mean for each column and a more granular row-wise mean based on matching indices. We'll explore techniques using pd.concat(), groupby(), and the stack()/unstack() methods to give you a complete toolkit.

The Goal: Averaging Data from Separate DataFrames

Imagine you have several Pandas DataFrames, each containing similar data (i.e., same columns and data types, representing comparable measurements). You might want to compute:

Overall Mean: The average value for each column, considering all data points from all DataFrames combined.
Row-wise Mean: The average value for each row, assuming the rows in different DataFrames correspond to each other (e.g., based on a common index).

Let's define two sample DataFrames for our examples:

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

df_short = pd.DataFrame({ # For illustrating different row counts
    'sensor_A': [16, 18],
    'sensor_B': [19, 22]
})

print("df1:\n", df1)
print()
print("df2:\n", df2)

Output:

df1:
    sensor_A  sensor_B
      10        22
      12        20
      15        25
      11        23
      13        21

df2:
    sensor_A  sensor_B
      11        20
      14        21
      13        22
      10        24
      15        20

Method 1: Overall Column Mean Across All DataFrames

This approach calculates the mean for each column by first combining all rows from all DataFrames into a single, larger DataFrame.

Concatenate the DataFrames

Use pandas.concat() to stack the DataFrames vertically (along axis=0, which is the default).

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

# Concatenate df1 and df2
df_combined = pd.concat([df1, df2]) # ignore_index=True can be added for a fresh index
print("Combined DataFrame (df1 and df2):")
print(df_combined)

Output:

Combined DataFrame (df1 and df2):
   sensor_A  sensor_B
      10        22
      12        20
      15        25
      11        23
      13        21
      11        20
      14        21
      13        22
      10        24
      15        20

Calculate the Mean

Once combined, call the .mean() method on the resulting DataFrame. This will compute the mean for each numeric column.

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

# Concatenate df1 and df2
df_combined = pd.concat([df1, df2]) # ignore_index=True can be added for a fresh index

# Calculate the mean for each column in the combined DataFrame
overall_column_means = df_combined.mean()
print("Overall Column Means:")
print(overall_column_means)

Output:

Overall Column Means:
sensor_A    12.4
sensor_B    21.8
dtype: float64

This gives you the average of sensor_A across all 10 values (5 from df1, 5 from df2), and similarly for sensor_B.

Method 2: Row-wise Mean Across DataFrames (by Index)

This method calculates the mean for each row position (index) across the DataFrames. It assumes that rows with the same index in different DataFrames are comparable.

Using `pd.concat()` and `groupby(df.index)` or `groupby(level=0)`

First, concatenate the DataFrames. Then, group the combined DataFrame by its index and calculate the mean for each group (each unique index value).

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

df_combined_for_rowwise = pd.concat([df1, df2])

# Group by the index of the combined DataFrame
# For a simple RangeIndex, df_combined_for_rowwise.index is sufficient.
# If you had a MultiIndex, you might use level=0.
by_row_index = df_combined_for_rowwise.groupby(df_combined_for_rowwise.index)

# Calculate the mean for each group (each original row index)
rowwise_means = by_row_index.mean()
print("Row-wise Means (df1 and df2):")
print(rowwise_means)

Output:

Row-wise Means (df1 and df2):
   sensor_A  sensor_B
    10.5      21.0
    13.0      20.5
    14.0      23.5
    10.5      23.5
    14.0      20.5

Handling DataFrames with Different Numbers of Rows

The groupby(df.index).mean() approach gracefully handles cases where DataFrames have different numbers of rows. For indices present in only one DataFrame, the "mean" will simply be the value from that single DataFrame.

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

df_short = pd.DataFrame({ # For illustrating different row counts
    'sensor_A': [16, 18],
    'sensor_B': [19, 22]
})

df_combined_diff_len = pd.concat([df1, df_short])
print("Combined DataFrame (df1 and df_short):")
print(df_combined_diff_len)
print()

rowwise_means_diff_len = df_combined_diff_len.groupby(df_combined_diff_len.index).mean()
print("Row-wise Means (df1 and df_short):")
print(rowwise_means_diff_len)

Output:

Combined DataFrame (df1 and df_short):
   sensor_A  sensor_B
      10        22
      12        20
      15        25
      11        23
      13        21
      16        19
      18        22

Row-wise Means (df1 and df_short):
   sensor_A  sensor_B
    13.0      20.5
    15.0      21.0
    15.0      25.0
    11.0      23.0
    13.0      21.0

Method 3: Row-wise Mean Using `stack()` and `unstack()` (for two DataFrames with same shape)

If you have exactly two DataFrames with the same shape (same number of rows and columns, and matching indices), you can perform element-wise operations. For averaging, a more generalizable approach for multiple DataFrames (though shown here for two) involves stack() and mean(level=...) or by directly averaging.

A simpler direct average for two DataFrames with the same shape:

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

# Ensure indices align if they are meaningful and not just default RangeIndex
# For this example, we assume default RangeIndex or already aligned indices.
if df1.shape == df2.shape and df1.index.equals(df2.index):
    rowwise_mean_direct = (df1 + df2) / 2
    print("Row-wise Mean (df1 and df2, direct average):")
    print(rowwise_mean_direct)
else:
    print("DataFrames do not have the same shape or index for direct averaging.")

Output:

Row-wise Mean (df1 and df2, direct average):
   sensor_A  sensor_B
    10.5      21.0
    13.0      20.5
    14.0      23.5
    10.5      23.5
    14.0      20.5

For a more general approach (e.g., more than two DataFrames) that is conceptually similar to the stack()/unstack(), but using pd.concat with keys and mean(level=...):

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

# Concatenate with keys to identify original DataFrames
all_dfs = [df1, df2] # Add more DFs here if needed
panel_df = pd.concat(all_dfs, keys=['df1', 'df2']) # keys distinguish original DFs
print("Panel DataFrame (MultiIndex):")
print(panel_df)
print()

# Calculate mean across the original DataFrames (level 1 of the index is the original row index)
rowwise_mean_panel = panel_df.groupby(level=1).mean()
print("Row-wise Mean from Panel DataFrame:")
print(rowwise_mean_panel)

Output:

Panel DataFrame (MultiIndex):
       sensor_A  sensor_B
df1 0        10        22
    1        12        20
    2        15        25
    3        11        23
    4        13        21
df2 0        11        20
    1        14        21
    2        13        22
    3        10        24
    4        15        20

Row-wise Mean from Panel DataFrame:
   sensor_A  sensor_B
0      10.5      21.0
1      13.0      20.5
2      14.0      23.5
3      10.5      23.5
4      14.0      20.5

This pd.concat with keys approach is robust for multiple DataFrames of the same column structure, even if row counts differ (though groupby(level=1) assumes original indices are comparable or you want to average by original position).

The (df1.stack() + df2.stack()) / 2 followed by .unstack() is a clever way for exactly two DataFrames with aligned indices, but the pd.concat with keys or groupby(df.index) methods are generally more flexible for multiple DataFrames or varying row counts.

Extension: Calculating Median Across Multiple DataFrames

The same principles apply if you want to calculate the median (or any other aggregate function like sum, min, max, std). Simply replace .mean() with .median().

import pandas as pd

df1 = pd.DataFrame({
    'sensor_A': [10, 12, 15, 11, 13],
    'sensor_B': [22, 20, 25, 23, 21]
})

df2 = pd.DataFrame({
    'sensor_A': [11, 14, 13, 10, 15],
    'sensor_B': [20, 21, 22, 24, 20]
})

df_combined = pd.concat([df1, df2])

# Overall Column Medians
overall_column_medians = df_combined.median()
print("Overall Column Medians:")
print(overall_column_medians)
print()

# Row-wise Medians (using the result from Method 2)
by_row_index_median = df_combined.groupby(df_combined.index).median()
print("Row-wise Medians:")
print(by_row_index_median)

Output:

Overall Column Medians:
sensor_A    12.5
sensor_B    21.5
dtype: float64

Row-wise Medians:
   sensor_A  sensor_B
0      10.5      21.0
1      13.0      20.5
2      14.0      23.5
3      10.5      23.5
4      14.0      20.5

Choosing the Right Method

Overall Column Mean/Median (Method 1): Use when you want a single average/median value for each column, treating all data from all DataFrames as one large dataset.
Row-wise Mean/Median by Index (Method 2 & 4 with groupby): Use when rows at the same index position across DataFrames are comparable, and you want an average/median for each such corresponding set of rows. This is very flexible.
Direct Element-wise Average (Simplified part of Method 4): Suitable only for exactly two DataFrames of identical shape and aligned indices.

Conclusion

Pandas offers flexible and powerful ways to calculate means (and other statistics) across multiple DataFrames. By first concatenating the DataFrames using pd.concat(), you can then apply .mean() (or .median(), etc.) to either the entire combined dataset for overall column statistics or use groupby(df.index) (or groupby(level=...) with keys) for more granular row-wise calculations based on matching indices. Understanding these techniques allows you to effectively aggregate and summarize information from distributed datasets.

The Goal: Averaging Data from Separate DataFrames​

Method 1: Overall Column Mean Across All DataFrames​

Concatenate the DataFrames​

Calculate the Mean​

Method 2: Row-wise Mean Across DataFrames (by Index)​

Using pd.concat() and groupby(df.index) or groupby(level=0)​

Handling DataFrames with Different Numbers of Rows​

Method 3: Row-wise Mean Using stack() and unstack() (for two DataFrames with same shape)​

Extension: Calculating Median Across Multiple DataFrames​

Choosing the Right Method​

Conclusion​

Table of Contents

The Goal: Averaging Data from Separate DataFrames

Method 1: Overall Column Mean Across All DataFrames

Concatenate the DataFrames

Calculate the Mean

Method 2: Row-wise Mean Across DataFrames (by Index)

Using `pd.concat()` and `groupby(df.index)` or `groupby(level=0)`

Handling DataFrames with Different Numbers of Rows

Method 3: Row-wise Mean Using `stack()` and `unstack()` (for two DataFrames with same shape)

Extension: Calculating Median Across Multiple DataFrames

Choosing the Right Method

Conclusion