Skip to main content

Python Pandas: How to Filter Data Before Creating a Pivot Table

Pivot tables are a powerful tool in Pandas for summarizing and aggregating data. However, you often need to analyze a subset of your data before creating the pivot table. Applying filters to your DataFrame prior to pivoting allows you to focus your aggregated view on the specific data points relevant to your analysis.

This guide demonstrates how to filter a Pandas DataFrame based on single or multiple conditions before generating a pivot table.

Why Filter Before Pivoting?

Filtering data before creating a pivot table is useful for:

  • Focusing Analysis: Concentrating the pivot table on specific segments or categories of interest.
  • Excluding Outliers/Irrelevant Data: Removing data points that might skew your aggregated results or are not pertinent to the current view.
  • Performance: Reducing the amount of data processed by the pivot_table method, which can be beneficial for very large DataFrames.
  • Clarity: Making the resulting pivot table cleaner and easier to interpret by only including relevant data.

Example DataFrame: We'll use the following DataFrame for our examples:

import pandas as pd

data = {
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
Region Product Sales Quantity
0 North A 100 10
1 North B 150 12
2 South A 200 15
3 South B 50 5
4 North A 120 8
5 West A 300 25
6 South B 180 20
7 West C 90 7

The Basic Approach: Filter First, Then Pivot

The most common and straightforward way to achieve this is to first create a filtered DataFrame using standard Pandas boolean indexing, and then call the pivot_table() method on this filtered DataFrame.

Filtering by a Single Condition

Let's create a pivot table of average sales, but only for the 'North' region.

import pandas as pd

data = {
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
}
df = pd.DataFrame(data)

# Step 1: Filter the DataFrame
filtered_df_north = df[df['Region'] == 'North']
print("Filtered DataFrame (Region == 'North'):")
print(filtered_df_north)
print()

# Step 2: Create the pivot table on the filtered DataFrame
pivot_north_sales = filtered_df_north.pivot_table(
index='Product', # Rows of the pivot table
values='Sales', # Values to aggregate
aggfunc='mean' # Aggregation function (e.g., mean, sum, count)
)
print("Pivot Table (Average Sales for North Region):")
print(pivot_north_sales)
print()

Output:

Filtered DataFrame (Region == 'North'):
Region Product Sales Quantity
0 North A 100 10
1 North B 150 12
4 North A 120 8

Pivot Table (Average Sales for North Region):
Sales
Product
A 110.0
B 150.0
note

You can combine these steps into a single line:

pivot_north_sales_oneline = df[df['Region'] == 'North'].pivot_table(
index='Product', values='Sales', aggfunc='mean'
)
print("Pivot Table (One-liner for North Region):")
print(pivot_north_sales_oneline)

Filtering by Multiple Conditions (AND Logic)

To filter based on multiple conditions where all conditions must be true, use the logical AND operator (&) between your boolean Series. Remember to wrap individual conditions in parentheses due to operator precedence.

Let's filter for 'North' region AND 'Sales' > 100.

import pandas as pd

df = pd.DataFrame({
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
})


# Define conditions
condition1_and = (df['Region'] == 'North')
condition2_and = (df['Sales'] > 100)

# Combine conditions with &
filtered_df_and = df[condition1_and & condition2_and]
print("Filtered DataFrame (North AND Sales > 100):")
print(filtered_df_and)
print()

pivot_and_conditions = filtered_df_and.pivot_table(
index='Product', columns='Region', values='Quantity', aggfunc='sum'
)
print("Pivot Table (Sum of Quantity for North AND Sales > 100):")
print(pivot_and_conditions)

Output:

Filtered DataFrame (North AND Sales > 100):
Region Product Sales Quantity
1 North B 150 12
4 North A 120 8

Pivot Table (Sum of Quantity for North AND Sales > 100):
Region North
Product
A 8
B 12

Filtering by Multiple Conditions (OR Logic)

To filter based on multiple conditions where at least one condition must be true, use the logical OR operator (|) between your boolean Series. Again, wrap individual conditions in parentheses.

Let's filter for 'North' region OR 'Product' == 'A'.

import pandas as pd

df = pd.DataFrame({
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
})

# Define conditions
condition1_or = (df['Region'] == 'North')
condition2_or = (df['Product'] == 'A')

# Combine conditions with |
filtered_df_or = df[condition1_or | condition2_or]
print("Filtered DataFrame (North OR Product == 'A'):")
print(filtered_df_or)
print()

pivot_or_conditions = filtered_df_or.pivot_table(
index='Region', columns='Product', values='Sales', aggfunc='max'
)
print("Pivot Table (Max Sales for North OR Product A):")
print(pivot_or_conditions)

Output:

Filtered DataFrame (North OR Product == 'A'):
Region Product Sales Quantity
0 North A 100 10
1 North B 150 12
2 South A 200 15
4 North A 120 8
5 West A 300 25

Pivot Table (Max Sales for North OR Product A):
Product A B
Region
North 120.0 150.0
South 200.0 NaN
West 300.0 NaN

Using pd.pivot_table() as a Function (Alternative Syntax)

Instead of calling .pivot_table() as a DataFrame method, you can use the top-level pd.pivot_table() function and pass the (filtered) DataFrame as the first argument. The functionality is identical.

import pandas as pd

df = pd.DataFrame({
'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
})

# Filter first
filtered_df = df[df['Sales'] >= 100]

# ✅ Pass the filtered DataFrame to pd.pivot_table()
pivot_function_syntax = pd.pivot_table(
filtered_df, # First argument is the DataFrame
index='Region',
columns='Product',
values='Sales',
aggfunc='count'
)
print("Pivot Table using pd.pivot_table() function syntax:")
print(pivot_function_syntax)

Output:

Pivot Table using pd.pivot_table() function syntax:
Product A B
Region
North 2.0 1.0
South 1.0 1.0
West 1.0 NaN

This is purely a stylistic choice; both the method call (df_filtered.pivot_table(...)) and the function call (pd.pivot_table(df_filtered, ...)) achieve the same outcome.

Conclusion

Adding a filter to a Pandas pivot table operation is achieved by filtering the DataFrame before calling the pivot_table() method.

  1. Use standard boolean indexing (df[condition]) to create a new DataFrame containing only the rows that meet your criteria.
  2. Apply single conditions or combine multiple conditions using logical AND (&) or OR (|) operators (remembering to use parentheses around individual conditions).
  3. Then, call .pivot_table() on this pre-filtered DataFrame (or pass it to pd.pivot_table()).

This two-step process (filter, then pivot) allows for precise control over the data included in your pivot table summaries, leading to more focused and meaningful insights.