Python Pandas: How to Get Categorical Columns or List of Categories

Pandas' "categorical" data type is efficient for storing columns with a limited number of unique values. Identifying which columns in your DataFrame are categorical, or extracting the unique categories present within a specific categorical column, are common tasks in data exploration and preparation.

This guide explains how to select categorical columns from a DataFrame and how to get a list of the unique categories from a specific categorical Series (column) in Pandas.

Understanding Categorical Data in Pandas

A categorical data type in Pandas is used for columns that take on a limited, and usually fixed, number of possible values (categories). Examples include gender ('Male', 'Female', 'Other'), product types, or survey responses ('Agree', 'Neutral', 'Disagree'). Using this dtype can:

  • Save memory compared to storing as object (string) type.
  • Improve performance for some operations (e.g., groupby).
  • Enable specific statistical methods and plotting suitable for categorical data.
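
The memory saving is easy to observe directly. The sketch below is illustrative only: the sample labels and the variable names (as_object, as_category) are assumptions made for this example, and the exact byte counts depend on your pandas version and platform.

```python
import pandas as pd

# Hypothetical sample: three distinct labels repeated many times
labels = ['Sales', 'HR', 'Engineering'] * 10_000

as_object = pd.Series(labels, dtype=object)   # one Python string reference per row
as_category = as_object.astype('category')    # small integer codes plus 3 stored labels

object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object dtype:   {object_bytes:,} bytes")
print(f"category dtype: {category_bytes:,} bytes")
```

The gap widens as the number of rows grows relative to the number of distinct values; for a column where nearly every value is unique, the category dtype may save little or nothing.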

Example DataFrame

Here, 'Department' and 'EmploymentType' are explicitly created as categorical columns.

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': pd.Categorical(['Sales', 'HR', 'Engineering', 'Sales', 'HR']),
    'EmploymentType': pd.Categorical(['Full-Time', 'Part-Time', 'Full-Time', 'Full-Time', 'Contractor']),
    'YearsExperience': [3, 5, 2, 7, 1],  # int64
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]  # float64
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()

print("Original dtypes:")
print(df.dtypes)
print()

Output:

Original DataFrame:
   EmployeeID   Department  EmploymentType  YearsExperience   Salary
0         101        Sales       Full-Time                3  60000.0
1         102           HR       Part-Time                5  55000.0
2         103  Engineering       Full-Time                2  90000.0
3         104        Sales       Full-Time                7  65000.0
4         105           HR      Contractor                1  52000.0

Original dtypes:
EmployeeID            int64
Department         category
EmploymentType     category
YearsExperience       int64
Salary              float64
dtype: object

Get a List of CATEGORICAL COLUMNS from a DataFrame

These methods help you identify which columns in your DataFrame have the category dtype.

Using select_dtypes(include=...)

The DataFrame.select_dtypes() method returns a subset of the DataFrame's columns based on their data types.

import pandas as pd

df_example = pd.DataFrame({
    'EmployeeID': [101],
    'Department': pd.Categorical(['Sales']),
    'EmploymentType': pd.Categorical(['Full-Time']),
    'YearsExperience': [3]
})

# ✅ Select only columns with dtype 'category'
categorical_df = df_example.select_dtypes(include=['category'])
print("DataFrame containing only categorical columns:")
print(categorical_df)
print()

# To get just the names of these columns as a list:
categorical_column_names = categorical_df.columns.tolist()
print(f"List of categorical column names: {categorical_column_names}")

Output:

DataFrame containing only categorical columns:
   Department  EmploymentType
0       Sales       Full-Time

List of categorical column names: ['Department', 'EmploymentType']
Note
  • include=['category'] keeps only the columns whose dtype is category. The argument accepts a single dtype or a list of dtypes.
  • To also capture string columns that are categorical in meaning but have not been converted yet, add 'object' to the list:
    categorical_and_object_df = df.select_dtypes(include=['category', 'object'])

Using select_dtypes(exclude=...)

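Alternatively, you can exclude numeric types and treat whatever remains as candidate categorical columns. This is an inference rather than an exact filter: the result holds every non-numeric column, including object (string) columns, which often hold categorical data before explicit conversion.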

import pandas as pd
import numpy as np

df_example = pd.DataFrame({
    'EmployeeID': [101],
    'Department': pd.Categorical(['Sales']),
    'YearsExperience': [3],
    'Salary': [60000.0]
})

# Exclude numeric types to get remaining columns (which could be category, object, etc.)
non_numeric_df = df_example.select_dtypes(exclude=[np.number]) # np.number includes int and float
# Or more specific: exclude=['int64', 'float64', 'bool']

print("DataFrame excluding numeric columns:")
print(non_numeric_df)
print()

print(f"Names of non-numeric columns: {non_numeric_df.columns.tolist()}")

Output:

DataFrame excluding numeric columns:
   Department
0       Sales

Names of non-numeric columns: ['Department']
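
Because exclusion keeps everything that is not numeric, it can return more than the categorical columns. The following sketch contrasts the two filters on a frame that also contains boolean and datetime columns; the frame df_mixed and its IsManager and HireDate columns are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical frame mixing several non-numeric dtypes
df_mixed = pd.DataFrame({
    'EmployeeID': [101, 102],
    'Department': pd.Categorical(['Sales', 'HR']),
    'IsManager': [True, False],
    'HireDate': pd.to_datetime(['2020-01-15', '2021-06-01']),
})

# Excluding numeric dtypes keeps the category column, but also bool and datetime
non_numeric = df_mixed.select_dtypes(exclude='number').columns.tolist()

# Including 'category' keeps only the category column
only_category = df_mixed.select_dtypes(include='category').columns.tolist()

print(f"exclude='number':   {non_numeric}")
print(f"include='category': {only_category}")
```

If the goal is strictly the category dtype, the include form is the safer choice; the exclude form is better suited to a first pass over unfamiliar data.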

Using _get_numeric_data() and Set Difference (Less Direct)

_get_numeric_data() is a private method (note the leading underscore) that returns only the numeric columns. Subtracting those from the full set of columns leaves the non-numeric ones.

import pandas as pd

df_example = pd.DataFrame({
    'EmployeeID': [101],
    'Department': pd.Categorical(['Sales']),
    'YearsExperience': [3],
    'Salary': [60000.0]
})

all_cols_set = set(df_example.columns)
numeric_cols_set = set(df_example._get_numeric_data().columns)

categorical_like_cols = list(all_cols_set - numeric_cols_set)
# Or using set.difference():
# categorical_like_cols = list(all_cols_set.difference(numeric_cols_set))
# Note: _get_numeric_data() is private, so it may change between pandas versions.
# The result is every non-numeric column (not only 'category' dtype), and because
# sets are unordered, the original column order is not guaranteed.

print(f"Categorical-like columns (all_cols - numeric_cols): {categorical_like_cols}")

Output:

Categorical-like columns (all_cols - numeric_cols): ['Department']
Note
  • select_dtypes() is part of the public API and names the target dtype explicitly, so it is the recommended way to select columns by a specific dtype such as category.
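
A set difference discards the original column order. If the order matters, one possible variant (a sketch, not the only option) filters the column list against the numeric columns found through the public select_dtypes() call. The Region column is a hypothetical addition used to make the ordering visible.

```python
import pandas as pd

df_example = pd.DataFrame({
    'Region': pd.Categorical(['West']),   # hypothetical extra column
    'EmployeeID': [101],
    'Department': pd.Categorical(['Sales']),
    'Salary': [60000.0],
})

# Numeric columns via the public API instead of the private _get_numeric_data()
numeric_cols = df_example.select_dtypes(include='number').columns

# Walk the columns in their original order and keep the non-numeric ones
non_numeric_cols = [col for col in df_example.columns if col not in numeric_cols]

print(f"Non-numeric columns in original order: {non_numeric_cols}")
```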

Get a List of the Unique CATEGORIES within a SINGLE Categorical Column

Once a column has the category dtype, you can read its categories directly.

For a Series of category dtype, the .cat accessor exposes categorical-specific attributes and methods. .cat.categories returns an Index holding every category defined for the column, each listed once.

import pandas as pd

df_example = pd.DataFrame({
    'Department': pd.Categorical(['Sales', 'HR', 'Engineering', 'Sales', 'HR']),
})

# Select the categorical column
department_series = df_example['Department']

# ✅ Get the unique categories
unique_departments = department_series.cat.categories

print("Unique categories in 'Department' column:")
print(unique_departments)

Output:

Unique categories in 'Department' column:
Index(['Engineering', 'HR', 'Sales'], dtype='object')
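Note

When the categories are not specified explicitly, pandas infers them from the data and sorts them where the values can be sorted, which is why they appear here in alphabetical order. If you pass categories=[...] yourself, that order is kept.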
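
The categories defined for a column and the values that actually occur in it can differ. The sketch below uses a hypothetical sizes Series with an explicit category list to show both the preserved order and an unused category ('Large'); the data is invented for illustration.

```python
import pandas as pd

# Hypothetical column with explicit, ordered categories; 'Large' never occurs
sizes = pd.Series(pd.Categorical(
    ['Medium', 'Small', 'Medium'],
    categories=['Small', 'Medium', 'Large'],
    ordered=True,
))

# All defined categories, in the order they were declared
defined = sizes.cat.categories.tolist()

# Only the values present in the data, in order of first appearance
observed = sizes.unique().tolist()

# Defined categories after dropping the ones that never occur
used = sizes.cat.remove_unused_categories().cat.categories.tolist()

print(f"Defined:  {defined}")
print(f"Observed: {observed}")
print(f"Used:     {used}")
```

Use .cat.categories when you want the full set of allowed values, and .unique() or .cat.remove_unused_categories() when you only care about values that appear in the data.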

Converting cat.categories to a List

The cat.categories attribute returns an Index object. To get a Python list, use .tolist().

import pandas as pd

df_example = pd.DataFrame({
    'Department': pd.Categorical(['Sales', 'HR', 'Engineering', 'Sales', 'HR']),
})

# Select the categorical column
department_series = df_example['Department']

# Get the unique categories
unique_departments = department_series.cat.categories
unique_departments_list = unique_departments.tolist()

print(f"Unique departments as a list: {unique_departments_list}")

Output:

Unique departments as a list: ['Engineering', 'HR', 'Sales']
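
The two tasks combine naturally: select the categorical columns, then collect the categories of each one. The following is a minimal sketch under the assumption that a dictionary keyed by column name is a convenient result shape; the name categories_by_column is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': pd.Categorical(['Sales', 'HR', 'Engineering', 'Sales', 'HR']),
    'EmploymentType': pd.Categorical(
        ['Full-Time', 'Part-Time', 'Full-Time', 'Full-Time', 'Contractor']
    ),
})

# Map each categorical column name to the list of its categories
categories_by_column = {
    col: df[col].cat.categories.tolist()
    for col in df.select_dtypes(include='category').columns
}

for col, cats in categories_by_column.items():
    print(f"{col}: {cats}")
```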

Check if a Specific Column IS Categorical

To verify if a particular column has the category data type:

import pandas as pd

df_example = pd.DataFrame({
    'Department': pd.Categorical(['Sales']),
    'YearsExperience': [3]
})

# Check 'Department'
col_to_check = 'Department'
is_dept_categorical = (df_example[col_to_check].dtype.name == 'category')
print(f"Is '{col_to_check}' column categorical? {is_dept_categorical}")

# Check 'YearsExperience'
col_to_check_2 = 'YearsExperience'
is_exp_categorical = (df_example[col_to_check_2].dtype.name == 'category')
print(f"Is '{col_to_check_2}' column categorical? {is_exp_categorical}")

Output:

Is 'Department' column categorical? True
Is 'YearsExperience' column categorical? False

  • df['YourColumn'].dtype: Returns the dtype object for the column.
  • .name: Accesses the string name of that dtype (e.g., 'category', 'int64', 'object').
  • isinstance(df['YourColumn'].dtype, pd.CategoricalDtype) is another robust way to check.

Note

pandas.api.types.is_categorical_dtype() has been deprecated since pandas 2.1. The isinstance check is the recommended replacement:

print(f"Is 'Department' categorical (isinstance)? {isinstance(df_example['Department'].dtype, pd.CategoricalDtype)}")

Conclusion

Pandas provides clear ways to work with categorical data:

  • To get a list of all categorical column names in a DataFrame, the recommended method is df.select_dtypes(include=['category']).columns.tolist().
  • To get a list of the unique categories within a specific categorical column (Series), use your_series.cat.categories.tolist().
  • To check if a specific column is categorical, compare df['col'].dtype.name == 'category' or use isinstance(df['col'].dtype, pd.CategoricalDtype). Avoid pd.api.types.is_categorical_dtype(), which is deprecated as of pandas 2.1.

These methods allow you to effectively identify and utilize categorical data within your Pandas DataFrames for more efficient storage and specialized analysis.