Python Pandas: How to Filter DataFrame Rows by Value Counts in a Column
Filtering a Pandas DataFrame based on the frequency or count of values in a specific column is a common data analysis task. For instance, you might want to select only rows where a category appears more than N times, or identify items that are unique (appear only once).
This guide explains several effective methods to filter a Pandas DataFrame based on the value counts of a column, using groupby().filter(), groupby().transform('size'), and map() with value_counts().
The Goal: Filtering Based on Frequency
We want to select rows from a DataFrame where the value in a particular column meets a certain frequency criterion. For example, keep only rows where the value in 'Category' appears at least 3 times in the entire DataFrame.
Example DataFrame
Consider this DataFrame:
import pandas as pd
data = {
'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
'Quantity': [1, 2, 1, 1, 2, 3, 1, 1]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
Let's say we want to filter rows based on the count of occurrences in the 'Product' column.
Method 1: Using groupby().filter() (Recommended for Direct Filtering)
The DataFrameGroupBy.filter(func) method allows you to filter out entire groups based on a function that returns True or False for each group.
import pandas as pd
df = pd.DataFrame({
'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
'Quantity': [1, 2, 1, 1, 2, 3, 1, 1]
})
# Filter for products that appear more than once
# The lambda receives each group as a sub-DataFrame
# len(group_df) gives the number of rows in that group (i.e., the value count)
df_filtered_by_count = df.groupby('Product').filter(lambda group_df: len(group_df) > 1)
print("DataFrame filtered for products appearing > 1 time (using groupby().filter()):")
print(df_filtered_by_count)
Output:
DataFrame filtered for products appearing > 1 time (using groupby().filter()):
OrderID Product Region Quantity
0 101 Laptop North 1
1 102 Mouse North 2
3 104 Laptop West 1
5 106 Mouse South 3
6 107 Laptop North 1
- df.groupby('Product'): Groups the DataFrame by unique values in the 'Product' column.
- .filter(lambda group_df: len(group_df) > 1): For each product group, len(group_df) calculates its size (count). If the count is greater than 1, all rows belonging to that product group are kept; otherwise, they are filtered out.
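The same pattern handles the opposite criterion just as easily. A short sketch, using the example data above, that keeps only products appearing exactly once (the "unique items" case from the introduction):

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
})

# Keep only rows whose product occurs exactly once in the whole DataFrame
df_unique = df.groupby('Product').filter(lambda g: len(g) == 1)
print(df_unique)
```

This keeps the Keyboard, Monitor, and Webcam rows, since each of those products appears only once.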
Method 2: Using groupby().transform('size')
This method first calculates the size (count) of each group and "transforms" this count back to align with the original DataFrame's index. Then, it uses this new Series of counts for boolean indexing.
import pandas as pd
df = pd.DataFrame({
'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
'Quantity': [1, 2, 1, 1, 2, 3, 1, 1]
})
# Create a Series where each element is the count of its 'Product' group
product_counts = df.groupby('Product')['Product'].transform('size')
print("Product counts transformed back to original index:")
print(product_counts)
print()
# ✅ Use this Series to create a boolean mask for filtering
df_filtered_transform = df[product_counts > 1]
print("DataFrame filtered for products appearing > 1 time (using groupby().transform('size')):")
print(df_filtered_transform)
Output:
Product counts transformed back to original index:
0 3
1 2
2 1
3 3
4 1
5 2
6 3
7 1
Name: Product, dtype: int64
DataFrame filtered for products appearing > 1 time (using groupby().transform('size')):
OrderID Product Region Quantity
0 101 Laptop North 1
1 102 Mouse North 2
3 104 Laptop West 1
5 106 Mouse South 3
6 107 Laptop North 1
- df.groupby('Product')['Product'].transform('size'): Groups by 'Product', then calculates each group's size (count) and aligns that count with every row belonging to the group in the original DataFrame's shape.
- df[product_counts > 1]: Standard boolean indexing using the count Series as a mask.
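One advantage of having the counts as an index-aligned mask is that it combines naturally with other boolean conditions. A sketch, using the example data, that keeps repeated products sold in the 'North' region:

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
    'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
})

counts = df.groupby('Product')['Product'].transform('size')
# Combine the frequency mask with a second condition using & (note the parentheses)
df_combined = df[(counts > 1) & (df['Region'] == 'North')]
print(df_combined)
```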
Method 3: Using Series.map() with value_counts()
This approach involves:
- Calculating the value counts for the target column using Series.value_counts(). This returns a Series whose index is the unique values and whose values are their counts.
- Using Series.map() to map these counts back to each row in the original DataFrame based on the value in the target column.
- Filtering based on these mapped counts.
import pandas as pd
df = pd.DataFrame({
'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
'Quantity': [1, 2, 1, 1, 2, 3, 1, 1]
})
# Get value counts for the 'Product' column
product_value_counts = df['Product'].value_counts()
print("Value counts for 'Product':")
print(product_value_counts)
print()
# Map these counts back to the original DataFrame's 'Product' column
# This creates a Series aligned with df's index, containing the count for each product
mapped_counts = df['Product'].map(product_value_counts)
print("Mapped counts:")
print(mapped_counts)
print()
# ✅ Use the mapped counts to filter
df_filtered_map = df[mapped_counts > 1]
print("DataFrame filtered for products appearing > 1 time (using map() and value_counts()):")
print(df_filtered_map)
Output:
Value counts for 'Product':
Product
Laptop 3
Mouse 2
Keyboard 1
Monitor 1
Webcam 1
Name: count, dtype: int64
Mapped counts:
0 3
1 2
2 1
3 3
4 1
5 2
6 3
7 1
Name: Product, dtype: int64
DataFrame filtered for products appearing > 1 time (using map() and value_counts()):
OrderID Product Region Quantity
0 101 Laptop North 1
1 102 Mouse North 2
3 104 Laptop West 1
5 106 Mouse South 3
6 107 Laptop North 1
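A closely related variant, offered here as an extra sketch rather than one of the methods above, first selects the qualifying values from value_counts() and then filters with Series.isin():

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
})

counts = df['Product'].value_counts()
# Index of products that appear more than once
frequent = counts[counts > 1].index
df_isin = df[df['Product'].isin(frequent)]
print(df_isin)
```

This selects the same rows as the map() approach; some readers find the explicit "list of frequent values" step easier to follow.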
Adding a Value Count Column (For Context or Later Filtering)
If you want to add the count as a new column before filtering (or just for informational purposes), groupby().transform('size') or map() are suitable.
import pandas as pd
df = pd.DataFrame({
'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
'Quantity': [1, 2, 1, 1, 2, 3, 1, 1]
})
# Add 'Product_Count' column using transform
df['Product_Count_Transform'] = df.groupby('Product')['Product'].transform('size')
# Add 'Product_Count' column using map and value_counts
# product_value_counts = df['Product'].value_counts()
# df['Product_Count_Map'] = df['Product'].map(product_value_counts)
print("DataFrame with 'Product_Count' column (using transform):")
print(df)
print()
# Now you can filter using this new column
df_filtered_from_new_col = df[df['Product_Count_Transform'] > 1]
print("Filtered based on the new count column:")
print(df_filtered_from_new_col.drop(columns=['Product_Count_Transform'])) # Drop count col for same output as others
Output:
DataFrame with 'Product_Count' column (using transform):
OrderID Product Region Quantity Product_Count_Transform
0 101 Laptop North 1 3
1 102 Mouse North 2 2
2 103 Keyboard South 1 1
3 104 Laptop West 1 3
4 105 Monitor North 2 1
5 106 Mouse South 3 2
6 107 Laptop North 1 3
7 108 Webcam West 1 1
Filtered based on the new count column:
OrderID Product Region Quantity
0 101 Laptop North 1
1 102 Mouse North 2
3 104 Laptop West 1
5 106 Mouse South 3
6 107 Laptop North 1
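The same transform trick extends to counting combinations of columns. A sketch (the 'Pair_Count' column name is illustrative) that counts each (Product, Region) pair:

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
    'Region': ['North', 'North', 'South', 'West', 'North', 'South', 'North', 'West'],
})

# Count occurrences of each (Product, Region) combination, aligned per row
df['Pair_Count'] = df.groupby(['Product', 'Region'])['Product'].transform('size')
print(df)
```

Only the (Laptop, North) pair occurs more than once in the example data, so filtering on Pair_Count > 1 would keep just those two rows.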
Choosing the Right Method
- df.groupby('col').filter(lambda group: condition_on_group_size): Most direct and often very readable for filtering out groups that don't meet a size criterion. The condition is applied to the group as a whole.
- df[df.groupby('col')['col'].transform('size') > N]: Very effective and Pythonic. It computes the group size for each row first, then uses simple boolean indexing. Often preferred for its clarity when the condition is a simple comparison against a count.
- df[df['col'].map(df['col'].value_counts()) > N]: Works well and can be intuitive, though it requires two steps (value_counts, then map). Performance may be slightly worse than transform for very large DataFrames but is generally good.

All three methods achieve the goal. groupby().transform() is often a strong contender for adding the count back or for direct boolean masking, while groupby().filter() is excellent when the logic for keeping or discarding a group is based on its properties (like size).
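As a quick sanity check on the example data, the three approaches select exactly the same rows (all of them preserve the original row order and index):

```python
import pandas as pd

df = pd.DataFrame({
    'OrderID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Product': ['Laptop', 'Mouse', 'Keyboard', 'Laptop', 'Monitor', 'Mouse', 'Laptop', 'Webcam'],
})

a = df.groupby('Product').filter(lambda g: len(g) > 1)
b = df[df.groupby('Product')['Product'].transform('size') > 1]
c = df[df['Product'].map(df['Product'].value_counts()) > 1]

# Identical rows, order, and index => the DataFrames compare equal
assert a.equals(b) and b.equals(c)
print(a['OrderID'].tolist())
```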
Conclusion
Filtering a Pandas DataFrame by the value counts of a column allows you to focus on more frequent or rare occurrences.
- groupby().filter() provides a direct way to filter groups based on their size.
- groupby().transform('size') creates a Series of group sizes aligned with the original DataFrame, enabling easy boolean indexing.
- Series.map() with value_counts() achieves similar results by mapping pre-calculated counts back onto each row.
Choose the method that you find most readable and that best fits the structure of your filtering logic. These techniques are essential tools for data cleaning, exploratory data analysis, and feature engineering in Pandas.