Python Pandas: How to Change DataFrame Column Type to Categorical

In Pandas, the "categorical" data type is highly beneficial for columns that have a limited, fixed number of possible values (categories), especially if those values are strings. Using the categorical type can lead to significant memory savings and performance improvements for certain operations compared to storing such data as standard Python objects (strings).

This guide explains how to efficiently change the data type of one or more DataFrame columns to categorical using the astype() method and related techniques.

Why Use the Categorical Data Type in Pandas?

Memory Efficiency: If a column contains repetitive string values (e.g., "Male"/"Female", country names, product categories), storing them as categoricals is much more memory-efficient. Pandas stores each unique category once and uses integer codes to represent the values.
Performance: Operations like grouping (groupby()) or sorting on categorical columns can be faster than on object (string) columns.
Semantic Meaning: It explicitly tells Pandas (and other users of your code) that the column represents a fixed set of categories.
Enables Specific Operations: Some statistical or plotting functions work better or provide more relevant results with categorical data.
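
As a rough illustration of the memory point above, the sketch below compares the same data stored as strings and as a categorical. The 30,000-row series is a made-up example, and exact byte counts depend on the Pandas version and platform, so no numbers are shown here.

```python
import pandas as pd

# Hypothetical data: three distinct labels repeated many times
departments = ['Sales', 'HR', 'Engineering'] * 10_000

as_strings = pd.Series(departments)
as_category = as_strings.astype('category')

# deep=True also counts the memory held by the string values themselves
string_bytes = as_strings.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"String column:      {string_bytes:,} bytes")
print(f"Categorical column: {category_bytes:,} bytes")
```

The saving shrinks as the share of unique values grows, so a column of mostly distinct strings is unlikely to benefit.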

Example DataFrame:

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'YearsOfService': [3, 5, 2, 7, 1],
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
print("Original dtypes:")
print(df.dtypes)

Output:

Original DataFrame:
   EmployeeID   Department  Gender  YearsOfService   Salary
0         101        Sales  Female               3  60000.0
1         102           HR    Male               5  55000.0
2         103  Engineering    Male               2  90000.0
3         104        Sales  Female               7  65000.0
4         105           HR    Male               1  52000.0

Original dtypes:
EmployeeID          int64
Department         object
Gender             object
YearsOfService      int64
Salary            float64
dtype: object

Columns like 'Department' and 'Gender' are good candidates for the categorical type.

Changing a Single Column's Type to Categorical

Using `Series.astype('category')`

The most common and recommended way to change a column's data type is by selecting the column (which returns a Series) and then calling the .astype() method on it.

import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'YearsOfService': [3, 5, 2, 7, 1],
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]
})

# ✅ Change the 'Department' column to categorical
df['Department'] = df['Department'].astype('category')

print("DataFrame after changing 'Department' to category:")
print(df.head())

Output:

DataFrame after changing 'Department' to category:
   EmployeeID   Department  Gender  YearsOfService   Salary
0         101        Sales  Female               3  60000.0
1         102           HR    Male               5  55000.0
2         103  Engineering    Male               2  90000.0
3         104        Sales  Female               7  65000.0
4         105           HR    Male               1  52000.0

df['Department']: Selects the 'Department' column as a Series.
.astype('category'): Casts the Series to the categorical data type.
df['Department'] = ...: Assigns the converted Series back to the DataFrame column.
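
To see what the conversion produces, the following minimal sketch inspects the converted column through the .cat accessor. It uses only the 'Department' values from the example; by default Pandas sorts the inferred categories, which is why the codes do not follow order of appearance.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR']
})
df['Department'] = df['Department'].astype('category')

# The unique labels, each stored once
categories = df['Department'].cat.categories.tolist()
print(categories)  # ['Engineering', 'HR', 'Sales']

# The integer code stored for each row (a position in the categories list)
codes = df['Department'].cat.codes.tolist()
print(codes)  # [2, 1, 0, 2, 1]
```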

Verifying the Change with `DataFrame.dtypes`

The DataFrame.dtypes attribute returns a Series showing the data type of each column.

import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'YearsOfService': [3, 5, 2, 7, 1],
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]
})

# ✅ Change the 'Department' column to categorical
df['Department'] = df['Department'].astype('category')

# Inspect the dtype of every column after the conversion
print("Dtypes after changing 'Department':")
print(df.dtypes)

Output:

Dtypes after changing 'Department':
EmployeeID           int64
Department        category
Gender              object
YearsOfService       int64
Salary             float64
dtype: object
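
If you would rather check the result in code than read the printed dtypes, something like the sketch below may be useful. It uses a shortened two-column version of the example DataFrame; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'YearsOfService': [3, 5, 2, 7, 1],
})
df['Department'] = df['Department'].astype('category')

# A categorical dtype compares equal to the string 'category'
is_categorical = df['Department'].dtype == 'category'
print(is_categorical)  # True

# An isinstance check against pd.CategoricalDtype is a more explicit alternative
print(isinstance(df['Department'].dtype, pd.CategoricalDtype))      # True
print(isinstance(df['YearsOfService'].dtype, pd.CategoricalDtype))  # False

# List every categorical column in the DataFrame
categorical_cols = df.select_dtypes(include='category').columns.tolist()
print(categorical_cols)  # ['Department']
```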

Changing Multiple Columns' Types to Categorical

Passing a List of Columns to `astype('category')`

In modern Pandas versions, you can select multiple columns and call .astype('category') on the resulting DataFrame subset.

import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'YearsOfService': [3, 5, 2, 7, 1],
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]
})


columns_to_categorize = ['Department', 'Gender']

# ✅ Select multiple columns and apply astype
df[columns_to_categorize] = df[columns_to_categorize].astype('category')

print("Dtypes after changing 'Department' and 'Gender':")
print(df.dtypes)

Output:

Dtypes after changing 'Department' and 'Gender':
EmployeeID           int64
Department        category
Gender            category
YearsOfService       int64
Salary             float64
dtype: object

Note: You can also provide the list of columns directly:

df[['Department', 'Gender']] = df[['Department', 'Gender']].astype('category')

This is efficient and concise for multiple columns.
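
A related option, sketched here as an alternative rather than a replacement, is to pass DataFrame.astype() a dictionary that maps column names to dtypes. It returns a new DataFrame and leaves the original untouched, which can be convenient in method chains. The name converted is just an illustrative variable.

```python
import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
})

# Map each column to its target dtype; unlisted columns keep their dtype
converted = df.astype({'Department': 'category', 'Gender': 'category'})

print(converted.dtypes)
```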

Iterating with a `for` Loop (Less Common Now)

In older Pandas versions a for loop was the common approach, and it remains an option if you prefer explicit iteration.

import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'YearsOfService': [3, 5, 2, 7, 1],
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]
})


columns_to_categorize = ['Department', 'Gender']

print("Changing multiple columns using a for loop:")
for col_name in columns_to_categorize:
    df[col_name] = df[col_name].astype('category')

print(df.dtypes)

Output:

Changing multiple columns using a for loop:
EmployeeID           int64
Department        category
Gender            category
YearsOfService       int64
Salary             float64
dtype: object

While this works, selecting the columns and calling astype('category') once (as in "Passing a List of Columns to astype('category')" above) is now generally preferred for its conciseness.

Using `DataFrame.apply()` with a Lambda (Less Common Now)

Similarly, apply() with a lambda can be used, but for simple type casting it is less direct than calling astype('category') on the selected columns.

import pandas as pd

df = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105],
    'Department': ['Sales', 'HR', 'Engineering', 'Sales', 'HR'],
    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],
    'YearsOfService': [3, 5, 2, 7, 1],
    'Salary': [60000.0, 55000.0, 90000.0, 65000.0, 52000.0]
})

columns_to_categorize = ['Department', 'Gender']

print("Changing multiple columns using apply() and lambda:")
# Apply astype to each selected column
df[columns_to_categorize] = df[columns_to_categorize].apply(lambda x: x.astype('category'))

print(df.dtypes)

Output:

Changing multiple columns using apply() and lambda:
EmployeeID           int64
Department        category
Gender            category
YearsOfService       int64
Salary             float64
dtype: object

Converting All Columns (or All Except Some) to Categorical

Selecting Columns by Data Type (`select_dtypes`)

If you want to convert all columns of a certain type (e.g., all object type columns, which usually hold strings) or exclude certain types, select_dtypes() is very useful.

import pandas as pd

data_mixed = {
    'Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'], # object
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics'], # object
    'Quantity': [10, 50, 30, 15], # int
    'Price': [1200.0, 25.0, 75.0, 300.0], # float
    'Status': ['In Stock', 'In Stock', 'Low Stock', 'In Stock'] # object
}
df_mixed = pd.DataFrame(data_mixed)
print("Original dtypes for mixed DataFrame:")
print(df_mixed.dtypes)
print()

# Convert all 'object' type columns (typically strings) to 'category'
object_cols = df_mixed.select_dtypes(include='object').columns
df_mixed[object_cols] = df_mixed[object_cols].astype('category')

print("Dtypes after converting all object columns to category:")
print(df_mixed.dtypes)
print()

# --- Example: Convert all columns *except* numeric types to category ---
df_mixed_reinit = pd.DataFrame(data_mixed) # Re-initialize for this example

# Select columns that are NOT int64 or float64
cols_to_convert = df_mixed_reinit.select_dtypes(exclude=['int64', 'float64']).columns
df_mixed_reinit[cols_to_convert] = df_mixed_reinit[cols_to_convert].astype('category')

print("Dtypes after converting non-numeric columns to category:")
print(df_mixed_reinit.dtypes) # Output: Same as above, as 'object' was the only non-numeric type

Output:

Original dtypes for mixed DataFrame:
Name         object
Category     object
Quantity      int64
Price       float64
Status       object
dtype: object

Dtypes after converting all object columns to category:
Name        category
Category    category
Quantity       int64
Price        float64
Status      category
dtype: object

Dtypes after converting non-numeric columns to category:
Name        category
Category    category
Quantity       int64
Price        float64
Status      category
dtype: object

df.select_dtypes(include=...) or df.select_dtypes(exclude=...) returns a subset of the DataFrame.
.columns gets the column names from this subset.
Then, use these column names to select and apply astype('category').
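
Not every string column is a good candidate: a column such as 'Name', where nearly every value is unique, gains little from the conversion. The sketch below shows one possible way to convert only string columns with few distinct values. The helper categorize_low_cardinality and its max_unique_ratio threshold are hypothetical names invented for this example, not part of Pandas, and 0.5 is an arbitrary cut-off. It uses pandas.api.types.is_string_dtype rather than a hard-coded 'object' so that it does not depend on how a given Pandas version stores strings.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def categorize_low_cardinality(frame, max_unique_ratio=0.5):
    """Return a copy with low-cardinality string columns cast to category.

    Hypothetical helper: a column is converted when the share of unique
    values (unique count / row count) is at most max_unique_ratio.
    """
    result = frame.copy()
    for col in result.columns:
        series = result[col]
        if not is_string_dtype(series):
            continue  # leave numeric and other non-string columns alone
        if len(series) > 0 and series.nunique() / len(series) <= max_unique_ratio:
            result[col] = series.astype('category')
    return result


df_mixed = pd.DataFrame({
    'Name': ['Laptop', 'Mouse', 'Keyboard', 'Monitor'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics'],
    'Quantity': [10, 50, 30, 15],
    'Status': ['In Stock', 'In Stock', 'Low Stock', 'In Stock'],
})

converted = categorize_low_cardinality(df_mixed)
print(converted.dtypes)
```

With this data, 'Category' (1 unique value in 4 rows) and 'Status' (2 in 4) are converted, while 'Name' (4 in 4) and the numeric 'Quantity' are left as they are.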

Conclusion

Changing DataFrame column types to category in Pandas is a valuable optimization for memory and performance when dealing with columns that have a limited number of unique string values.

For single columns: df['col_name'] = df['col_name'].astype('category') is standard.
For multiple specific columns: df[list_of_cols] = df[list_of_cols].astype('category') is concise and efficient in modern Pandas.
To convert all columns of a certain type (e.g., object) or all except certain types: Use df.select_dtypes() to get the target column names, then apply astype('category').

Always verify the conversion using df.dtypes to ensure the columns have been successfully changed to the category data type.
