Python Pandas: How to Filter Data Before Creating a Pivot Table

Pivot tables are a powerful tool in Pandas for summarizing and aggregating data. However, you often need to analyze a subset of your data before creating the pivot table. Applying filters to your DataFrame prior to pivoting allows you to focus your aggregated view on the specific data points relevant to your analysis.

This guide demonstrates how to filter a Pandas DataFrame based on single or multiple conditions before generating a pivot table.

Why Filter Before Pivoting?

Filtering data before creating a pivot table is useful for:

Focusing Analysis: Concentrating the pivot table on specific segments or categories of interest.
Excluding Outliers/Irrelevant Data: Removing data points that might skew your aggregated results or are not pertinent to the current view.
Performance: Reducing the amount of data processed by the pivot_table method, which can be beneficial for very large DataFrames.
Clarity: Making the resulting pivot table cleaner and easier to interpret by only including relevant data.

Example DataFrame: We'll use the following DataFrame for our examples:

import pandas as pd

data = {
    'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
    'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
    'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  Region Product  Sales  Quantity
North       A    100        10
North       B    150        12
South       A    200        15
South       B     50         5
North       A    120         8
 West       A    300        25
South       B    180        20
 West       C     90         7

The Basic Approach: Filter First, Then Pivot

The most common and straightforward way to achieve this is to first create a filtered DataFrame using standard Pandas boolean indexing, and then call the pivot_table() method on this filtered DataFrame.

Filtering by a Single Condition

Let's create a pivot table of average sales, but only for the 'North' region.

import pandas as pd

data = {
    'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
    'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
    'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
}
df = pd.DataFrame(data)

# Step 1: Filter the DataFrame
filtered_df_north = df[df['Region'] == 'North']
print("Filtered DataFrame (Region == 'North'):")
print(filtered_df_north)
print()

# Step 2: Create the pivot table on the filtered DataFrame
pivot_north_sales = filtered_df_north.pivot_table(
    index='Product',    # Rows of the pivot table
    values='Sales',     # Values to aggregate
    aggfunc='mean'      # Aggregation function (e.g., mean, sum, count)
)
print("Pivot Table (Average Sales for North Region):")
print(pivot_north_sales)
print()

Output:

Filtered DataFrame (Region == 'North'):
  Region Product  Sales  Quantity
0  North       A    100        10
1  North       B    150        12
4  North       A    120         8

Pivot Table (Average Sales for North Region):
         Sales
Product       
A        110.0
B        150.0

note

You can combine these steps into a single line:

pivot_north_sales_oneline = df[df['Region'] == 'North'].pivot_table(
    index='Product', values='Sales', aggfunc='mean'
)
print("Pivot Table (One-liner for North Region):")
print(pivot_north_sales_oneline)

Filtering by Multiple Conditions (AND Logic)

To filter based on multiple conditions where all conditions must be true, use the logical AND operator (&) between your boolean Series. Remember to wrap individual conditions in parentheses due to operator precedence.

Let's filter for 'North' region AND 'Sales' > 100.

import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
    'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
    'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
})


# Define conditions
condition1_and = (df['Region'] == 'North')
condition2_and = (df['Sales'] > 100)

# Combine conditions with &
filtered_df_and = df[condition1_and & condition2_and]
print("Filtered DataFrame (North AND Sales > 100):")
print(filtered_df_and)
print()

pivot_and_conditions = filtered_df_and.pivot_table(
    index='Product', columns='Region', values='Quantity', aggfunc='sum'
)
print("Pivot Table (Sum of Quantity for North AND Sales > 100):")
print(pivot_and_conditions)

Output:

Filtered DataFrame (North AND Sales > 100):
  Region Product  Sales  Quantity
1  North       B    150        12
4  North       A    120         8

Pivot Table (Sum of Quantity for North AND Sales > 100):
Region   North
Product       
A            8
B           12

Filtering by Multiple Conditions (OR Logic)

To filter based on multiple conditions where at least one condition must be true, use the logical OR operator (|) between your boolean Series. Again, wrap individual conditions in parentheses.

Let's filter for 'North' region OR 'Product' == 'A'.

import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
    'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
    'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
})

# Define conditions
condition1_or = (df['Region'] == 'North')
condition2_or = (df['Product'] == 'A')

# Combine conditions with |
filtered_df_or = df[condition1_or | condition2_or]
print("Filtered DataFrame (North OR Product == 'A'):")
print(filtered_df_or)
print()

pivot_or_conditions = filtered_df_or.pivot_table(
    index='Region', columns='Product', values='Sales', aggfunc='max'
)
print("Pivot Table (Max Sales for North OR Product A):")
print(pivot_or_conditions)

Output:

Filtered DataFrame (North OR Product == 'A'):
  Region Product  Sales  Quantity
0  North       A    100        10
1  North       B    150        12
2  South       A    200        15
4  North       A    120         8
5   West       A    300        25

Pivot Table (Max Sales for North OR Product A):
Product      A      B
Region               
North    120.0  150.0
South    200.0    NaN
West     300.0    NaN

Using `pd.pivot_table()` as a Function (Alternative Syntax)

Instead of calling .pivot_table() as a DataFrame method, you can use the top-level pd.pivot_table() function and pass the (filtered) DataFrame as the first argument. The functionality is identical.

import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South', 'North', 'West', 'South', 'West'],
    'Product': ['A', 'B', 'A', 'B', 'A', 'A', 'B', 'C'],
    'Sales': [100, 150, 200, 50, 120, 300, 180, 90],
    'Quantity': [10, 12, 15, 5, 8, 25, 20, 7]
})

# Filter first
filtered_df = df[df['Sales'] >= 100]

# ✅ Pass the filtered DataFrame to pd.pivot_table()
pivot_function_syntax = pd.pivot_table(
    filtered_df, # First argument is the DataFrame
    index='Region',
    columns='Product',
    values='Sales',
    aggfunc='count'
)
print("Pivot Table using pd.pivot_table() function syntax:")
print(pivot_function_syntax)

Output:

Pivot Table using pd.pivot_table() function syntax:
Product    A    B
Region           
North    2.0  1.0
South    1.0  1.0
West     1.0  NaN

This is purely a stylistic choice; both the method call (df_filtered.pivot_table(...)) and the function call (pd.pivot_table(df_filtered, ...)) achieve the same outcome.

Conclusion

Adding a filter to a Pandas pivot table operation is achieved by filtering the DataFrame before calling the pivot_table() method.

Use standard boolean indexing (df[condition]) to create a new DataFrame containing only the rows that meet your criteria.
Apply single conditions or combine multiple conditions using logical AND (&) or OR (|) operators (remembering to use parentheses around individual conditions).
Then, call .pivot_table() on this pre-filtered DataFrame (or pass it to pd.pivot_table()).

This two-step process (filter, then pivot) allows for precise control over the data included in your pivot table summaries, leading to more focused and meaningful insights.

Why Filter Before Pivoting?​

The Basic Approach: Filter First, Then Pivot​

Filtering by a Single Condition​

Filtering by Multiple Conditions (AND Logic)​

Filtering by Multiple Conditions (OR Logic)​

Using pd.pivot_table() as a Function (Alternative Syntax)​

Conclusion​

Table of Contents

Why Filter Before Pivoting?

The Basic Approach: Filter First, Then Pivot

Filtering by a Single Condition

Filtering by Multiple Conditions (AND Logic)

Filtering by Multiple Conditions (OR Logic)

Using `pd.pivot_table()` as a Function (Alternative Syntax)

Conclusion