Python Pandas: How to Count Unique Combinations of Two (or More) Columns
Counting the occurrences of unique combinations of values across two or more columns in a Pandas DataFrame is a fundamental operation in data analysis. This helps in understanding the distribution and frequency of specific data patterns. For instance, you might want to count how many times each unique pair of 'Product Category' and 'Region' appears in a sales dataset.
This guide demonstrates common and effective methods to count unique combinations of column values in Pandas, primarily using groupby().size() and DataFrame.value_counts().
The Goal: Counting Co-occurrences of Column Values
Given a Pandas DataFrame, we want to identify all unique combinations of values present in two (or more) specified columns and then count how many times each unique combination appears in the DataFrame. The result is typically a new DataFrame or Series showing these combinations and their corresponding counts.
Example DataFrame
import pandas as pd
import numpy as np # For NaN example
data = {
    'UserID': [101, 102, 101, 103, 102, 101, 104, 101],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home', 'Books', 'Apparel', 'Home', 'Electronics'],
    'Region': ['North', 'South', 'North', 'West', 'North', 'North', 'West', 'East'],
    'Action': ['View', 'Purchase', 'View', 'Purchase', 'View', 'Purchase', 'View', 'View']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
UserID Product_Category Region Action
0 101 Electronics North View
1 102 Books South Purchase
2 101 Electronics North View
3 103 Home West Purchase
4 102 Books North View
5 101 Apparel North Purchase
6 104 Home West View
7 101 Electronics East View
Let's say we want to count the unique combinations of 'Product_Category' and 'Region'.
Method 1: Using DataFrame.groupby() and size() (Recommended)
This is a very flexible and widely used approach.
Getting Counts as a Series (MultiIndex)
Group by the columns of interest, then use .size() to get the number of rows in each group. The result is a Series with a MultiIndex formed by the unique combinations of the grouping columns.
import pandas as pd
df_example = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 104, 101],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home', 'Books', 'Apparel', 'Home', 'Electronics'],
    'Region': ['North', 'South', 'North', 'West', 'North', 'North', 'West', 'East'],
})
# Group by 'Product_Category' and 'Region', then get the size of each group
combination_counts_series = df_example.groupby(['Product_Category', 'Region']).size()
print("Counts of unique combinations (Product_Category, Region) as a Series:")
print(combination_counts_series)
Output:
Counts of unique combinations (Product_Category, Region) as a Series:
Product_Category  Region
Apparel           North     1
Books             North     1
                  South     1
Electronics       East      1
                  North     2
Home              West      2
dtype: int64
Converting to a DataFrame with reset_index() and rename()
To transform this Series into a DataFrame with regular columns:
- reset_index(): Converts the MultiIndex levels into columns.
- rename(): Renames the default column name for counts (which is 0) to something more descriptive like 'Count'.
import pandas as pd
df_example = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 104, 101],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home', 'Books', 'Apparel', 'Home', 'Electronics'],
    'Region': ['North', 'South', 'North', 'West', 'North', 'North', 'West', 'East'],
})
combination_counts_series = df_example.groupby(['Product_Category', 'Region']).size()
# Convert to DataFrame
combination_counts_df = combination_counts_series.reset_index()
print("After reset_index():")
print(combination_counts_df)  # Column for counts is named 0
print()
# Rename the count column
combination_counts_df = combination_counts_df.rename(columns={0: 'Count'})
print("Counts of unique combinations as a DataFrame:")
print(combination_counts_df)
Output:
After reset_index():
  Product_Category Region  0
0          Apparel  North  1
1            Books  North  1
2            Books  South  1
3      Electronics   East  1
4      Electronics  North  2
5             Home   West  2

Counts of unique combinations as a DataFrame:
  Product_Category Region  Count
0          Apparel  North      1
1            Books  North      1
2            Books  South      1
3      Electronics   East      1
4      Electronics  North      2
5             Home   West      2
Using reset_index(name='Count') directly is a more concise way to achieve the reset and rename in one step.
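For example, a minimal sketch combining both steps (the small df_example here is illustrative):

```python
import pandas as pd

df_example = pd.DataFrame({
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home'],
    'Region': ['North', 'South', 'North', 'West'],
})

# reset_index(name=...) converts the MultiIndex Series to a DataFrame
# and names the count column in a single step
counts = df_example.groupby(['Product_Category', 'Region']).size().reset_index(name='Count')
print(counts)
```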
Using as_index=False in groupby()
If you set as_index=False in the groupby() call, the grouping columns are kept as regular columns and size() produces a DataFrame directly, though the count column (named size) might still need renaming.
import pandas as pd
df_example = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 104, 101],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home', 'Books', 'Apparel', 'Home', 'Electronics'],
    'Region': ['North', 'South', 'North', 'West', 'North', 'North', 'West', 'East'],
})
# Use as_index=False
combination_counts_df_as_index_false = df_example.groupby(['Product_Category', 'Region'], as_index=False).size()
print("Counts using groupby(as_index=False).size():")
print(combination_counts_df_as_index_false)
Output:
Counts using groupby(as_index=False).size():
  Product_Category Region  size
0          Apparel  North     1
1            Books  North     1
2            Books  South     1
3      Electronics   East     1
4      Electronics  North     2
5             Home   West     2
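If you prefer a different column name, the size column can be renamed in the same chain; a small sketch (the frame below is illustrative):

```python
import pandas as pd

df_example = pd.DataFrame({
    'Product_Category': ['Electronics', 'Books', 'Electronics'],
    'Region': ['North', 'South', 'North'],
})

# as_index=False keeps the grouping columns as regular columns;
# rename() replaces the default 'size' column name
counts = (
    df_example
    .groupby(['Product_Category', 'Region'], as_index=False)
    .size()
    .rename(columns={'size': 'Count'})
)
print(counts)
```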
Method 2: Using DataFrame.value_counts() (Concise)
The DataFrame.value_counts(subset=None, normalize=False, sort=True, ascending=False, dropna=True) method is excellent for directly counting unique row occurrences based on a subset of columns.
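As a quick illustration of the normalize parameter (the frame below is made up for the sketch), normalize=True returns relative frequencies instead of raw counts:

```python
import pandas as pd

df_example = pd.DataFrame({
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home'],
    'Region': ['North', 'South', 'North', 'West'],
})

# normalize=True divides each combination's count by the total number
# of counted rows, yielding proportions that sum to 1
proportions = df_example[['Product_Category', 'Region']].value_counts(normalize=True)
print(proportions)
```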
Getting Counts as a Series (MultiIndex)
By default, value_counts() on a subset of columns returns a Series with a MultiIndex.
import pandas as pd
df_example = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 104, 101],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home', 'Books', 'Apparel', 'Home', 'Electronics'],
    'Region': ['North', 'South', 'North', 'West', 'North', 'North', 'West', 'East'],
})
# Select the columns for which to count unique combinations
columns_for_value_counts = ['Product_Category', 'Region']
value_counts_series = df_example[columns_for_value_counts].value_counts()
print("Counts of unique combinations using value_counts() (Series):")
print(value_counts_series)
Output (order might vary due to default sorting by count):
Counts of unique combinations using value_counts() (Series):
Product_Category  Region
Electronics       North     2
Home              West      2
Apparel           North     1
Books             North     1
                  South     1
Electronics       East      1
Name: count, dtype: int64
Converting to a DataFrame with reset_index(name='...')
Similar to the groupby().size() result, you can use reset_index() to convert this Series to a DataFrame.
import pandas as pd
df_example = pd.DataFrame({
    'UserID': [101, 102, 101, 103, 102, 101, 104, 101],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home', 'Books', 'Apparel', 'Home', 'Electronics'],
    'Region': ['North', 'South', 'North', 'West', 'North', 'North', 'West', 'East'],
})
columns_for_value_counts = ['Product_Category', 'Region']
value_counts_series = df_example[columns_for_value_counts].value_counts()
# Convert to DataFrame and name the count column
value_counts_df = value_counts_series.reset_index(name='Frequency')
print("Counts of unique combinations using value_counts() (DataFrame):")
print(value_counts_df)
Output (order might vary due to default sorting by count):
Counts of unique combinations using value_counts() (DataFrame):
  Product_Category Region  Frequency
0      Electronics  North          2
1             Home   West          2
2          Apparel  North          1
3            Books  North          1
4            Books  South          1
5      Electronics   East          1
Sorting Options
value_counts() sorts the results by count in descending order by default. Use ascending=True to sort in ascending order of counts, or sort=False to leave the results unsorted by count (the order then follows the underlying grouping and may be less predictable for combinations).
# df_example and columns_for_value_counts as defined above
# Sort by count ascending
value_counts_asc = df_example[columns_for_value_counts].value_counts(ascending=True).reset_index(name='Count')
print("Value counts (ascending):")
print(value_counts_asc)
Handling NaN Values (the dropna parameter)
By default, value_counts(dropna=True) excludes combinations where any of the involved columns contain a NaN value. Set dropna=False to include counts of combinations involving NaNs.
import pandas as pd
import numpy as np
df_with_nan = pd.DataFrame({
    'Product_Category': ['Electronics', 'Books', np.nan, 'Electronics', np.nan],
    'Region': ['North', 'South', 'North', 'North', 'West']
})
print("Value counts with NaN (dropna=True, default):")
print(df_with_nan[['Product_Category', 'Region']].value_counts(dropna=True).reset_index(name='Count'))
print()
print("Value counts with NaN (dropna=False):")
print(df_with_nan[['Product_Category', 'Region']].value_counts(dropna=False).reset_index(name='Count'))
Output:
Value counts with NaN (dropna=True, default):
  Product_Category Region  Count
0      Electronics  North      2
1            Books  South      1

Value counts with NaN (dropna=False):
  Product_Category Region  Count
0      Electronics  North      2
1            Books  South      1
2              NaN  North      1
3              NaN   West      1
Choosing the Right Method
df.groupby(cols_list).size():
- Pros: Very flexible and part of the powerful groupby framework; allows further chained aggregations if needed. as_index=False can directly return a DataFrame.
- Cons: Might require an explicit reset_index() and rename() if as_index=True (the default).
df[cols_list].value_counts():
- Pros: Very concise and often more direct for simple unique combination counting. Has convenient ascending and dropna parameters, and names the count column by default (e.g., 'count' or 'proportion').
- Cons: Primarily for counting; less direct if you immediately need other aggregations on the groups.
For simply counting unique combinations of two or more columns, df[cols_list].value_counts().reset_index(name='YourCountName') is often the most succinct and readable solution. groupby().size() is more general if you plan to do more with the groups.
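To illustrate that generality, here is a sketch using groupby with named aggregation to count rows and distinct users per combination in one pass (the Unique_Users column name and the small frame are illustrative):

```python
import pandas as pd

df_example = pd.DataFrame({
    'UserID': [101, 102, 101, 103],
    'Product_Category': ['Electronics', 'Books', 'Electronics', 'Home'],
    'Region': ['North', 'South', 'North', 'West'],
})

# Named aggregation: count the rows in each combination and, in the
# same pass, count the distinct users per combination
summary = df_example.groupby(['Product_Category', 'Region'], as_index=False).agg(
    Count=('UserID', 'size'),
    Unique_Users=('UserID', 'nunique'),
)
print(summary)
```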
Conclusion
Counting the unique combinations of values across multiple columns in a Pandas DataFrame is essential for understanding data distributions and relationships.
- The DataFrame.groupby(['ColA', 'ColB']).size() method, followed by reset_index() and rename(), provides a flexible and robust way to achieve this. Using as_index=False in groupby can simplify it.
- The df[['ColA', 'ColB']].value_counts() method offers a more concise approach specifically tailored for counting unique row occurrences, easily convertible to a DataFrame using reset_index(name='count_col_name').
Both methods are effective. Choose the one that you find clearer and that best fits into your broader data analysis workflow.
Remember to consider how NaN values should be handled, either via the dropna parameter in value_counts() or by preprocessing NaNs before groupby().
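As an alternative to preprocessing, groupby() itself also accepts a dropna parameter (available since pandas 1.1); a minimal sketch:

```python
import pandas as pd
import numpy as np

df_with_nan = pd.DataFrame({
    'Product_Category': ['Electronics', 'Books', np.nan, 'Electronics'],
    'Region': ['North', 'South', 'North', 'North'],
})

# dropna=False keeps groups whose keys contain NaN, mirroring
# value_counts(dropna=False)
counts = (
    df_with_nan
    .groupby(['Product_Category', 'Region'], dropna=False)
    .size()
    .reset_index(name='Count')
)
print(counts)
```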