Python Pandas: How to Transform List-like Column Entries into Separate Rows with `explode()`

Data often arrives in a semi-structured format where a single cell in a Pandas DataFrame column might contain multiple values, typically as a list, tuple, or a delimited string (e.g., comma-separated). To effectively analyze or normalize such data, you often need to transform each item within these list-like entries into its own individual row, while replicating the other column values for that original row.

This guide will provide a comprehensive walkthrough of the DataFrame.explode() method, your primary tool for this task in Pandas. You'll learn how to use it, manage the resulting index, handle empty lists or scalar values, and even apply it to common scenarios like expanding comma-separated string data.

The Challenge: Nested List-like Data in DataFrame Columns

Consider a DataFrame where one or more columns contain cells with list-like structures:

import pandas as pd

df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

print("Original DataFrame with list-like 'products' column:")
print(df_initial)

Output:

Original DataFrame with list-like 'products' column:
   order_id               products customer_id
0       101        [apple, banana]      cust_A
1       102               [orange]      cust_B
2       103  [apple, grape, mango]      cust_A
3       104                 banana      cust_C

For many analyses (e.g., counting individual product occurrences, joining with a product details table), this nested structure is inconvenient. We need each product on its own row.

The Solution: `DataFrame.explode()`

The DataFrame.explode(column_name_or_names) method is specifically designed for this transformation. It takes each element from a list-like entry in the specified column(s) and creates a new row for it, duplicating the values from other columns and the original index for each new row.

Basic Usage of `explode()`

import pandas as pd

df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

# Explode the 'products' column
df_exploded = df_initial.explode('products')

print("DataFrame after exploding 'products':")
print(df_exploded)

Output:

DataFrame after exploding 'products':
   order_id products customer_id
     101    apple      cust_A
     101   banana      cust_A
     102   orange      cust_B
     103    apple      cust_A
     103    grape      cust_A
     103    mango      cust_A
     104   banana      cust_C

The first argument to explode() is the name of the column (or a list of column names) to explode.

Understanding Index Replication

As seen in the output, the original index values are repeated for each new row generated from an exploded list. This maintains a link to the original row.

Resetting the Index after Exploding (`ignore_index=True` or `reset_index()`)

If you prefer a new, continuous integer index for the exploded DataFrame:

Using ignore_index=True (Pandas 1.1.0+):

import pandas as pd

df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

df_exploded_new_index = df_initial.explode('products', ignore_index=True)
print("Exploded DataFrame with ignore_index=True:")
print(df_exploded_new_index)

Output:

Exploded DataFrame with ignore_index=True:
order_id products customer_id
     101    apple      cust_A
     101   banana      cust_A
     102   orange      cust_B
     103    apple      cust_A
     103    grape      cust_A
     103    mango      cust_A
     104   banana      cust_C

Using .reset_index(drop=True) (for older Pandas versions or if ignore_index isn't suitable):

import pandas as pd

df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

df_exploded_reset = df_initial.explode('products').reset_index(drop=True)
# drop=True prevents the old index from being added as a new column
print("Exploded DataFrame with reset_index(drop=True):")
print(df_exploded_reset)

Output:

Exploded DataFrame with reset_index(drop=True):
order_id products customer_id
     101    apple      cust_A
     101   banana      cust_A
     102   orange      cust_B
     103    apple      cust_A
     103    grape      cust_A
     103    mango      cust_A
     104   banana      cust_C

Exploding a Single Column (Series)

You can also call .explode() directly on a Series if you only need to transform that specific column.

import pandas as pd

df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

products_series_exploded = df_initial['products'].explode()
print("Exploded 'products' Series:")
print(products_series_exploded)

Output:

Exploded 'products' Series:
   apple
  banana
  orange
   apple
   grape
   mango
  banana
Name: products, dtype: object

The index is still replicated here.

Key Behaviors of `explode()`

It's important to understand how explode() handles certain edge cases:

Handling Empty Lists (Produces `NaN`)

If a cell in the column being exploded contains an empty list ([]), explode() will represent this with numpy.nan (or pd.NA for newer Pandas versions with nullable dtypes) in the exploded column for that row.

import pandas as pd

df_with_empty = pd.DataFrame({
    'id': [1, 2, 3],
    'tags': [['tagA', 'tagB'], [], ['tagC']],
    'value': [100, 200, 300]
})
print("Original DataFrame with an empty list:")
print(df_with_empty)
print()

df_exploded_empty = df_with_empty.explode('tags')
print("DataFrame after exploding column with empty list:")
print(df_exploded_empty)

Output:

Original DataFrame with an empty list:
   id          tags  value
0   1  [tagA, tagB]    100
1   2            []    200
2   3        [tagC]    300

DataFrame after exploding column with empty list:
   id  tags  value
0   1  tagA    100
0   1  tagB    100
1   2   NaN    200
2   3  tagC    300

Preserving Scalar Values

If a cell in the column being exploded already contains a scalar value (not a list-like object), that value is preserved as is on a single row. This was seen in the df_initial example where order_id 104 had 'banana' (a string) in products.

Resulting Data Type

The data type of the exploded column is typically object, especially if there's a mix of data types after explosion or if NaN values are introduced.

import pandas as pd

df_with_empty = pd.DataFrame({
    'id': [1, 2, 3],
    'tags': [['tagA', 'tagB'], [], ['tagC']],
    'value': [100, 200, 300]
})

df_exploded_empty = df_with_empty.explode('tags')

print("Data types after exploding (tags column becomes object):")
print(df_exploded_empty.dtypes)

Output:

Data types after exploding (tags column becomes object):
id        int64
tags     object
value     int64
dtype: object

Practical Use Case: Expanding Comma-Separated Strings

A very common scenario is having a column with strings where multiple values are packed together, separated by a delimiter like a comma. explode() is perfect for this after an initial split.

Splitting the String into a List

First, use the .str.split() method to convert the delimited string into a list of strings.

import pandas as pd

df_csv_strings = pd.DataFrame({
    'item_id': ['X01', 'X02', 'X03'],
    'categories': ['electronics,books', 'books,home', 'electronics,clothing,home'],
    'price': [199, 25, 75]
})

print("Original DataFrame with comma-separated strings:")
print(df_csv_strings)
print()

# Split the 'categories' string into lists
df_csv_strings['categories_list'] = df_csv_strings['categories'].str.split(',')
print("DataFrame after splitting 'categories' into lists:")
print(df_csv_strings)

Output:

Original DataFrame with comma-separated strings:
  item_id                 categories  price
0     X01          electronics,books    199
1     X02                 books,home     25
2     X03  electronics,clothing,home     75

DataFrame after splitting 'categories' into lists:
  item_id                 categories  price                categories_list
0     X01          electronics,books    199           [electronics, books]
1     X02                 books,home     25                  [books, home]
2     X03  electronics,clothing,home     75  [electronics, clothing, home]

Applying `explode()`

Now, explode the new list column. You can do this in a chained operation or assign the split result first.

Method 1: Chaining with assign (that creates a new DataFrame)

import pandas as pd

df_csv_strings = pd.DataFrame({
    'item_id': ['X01', 'X02', 'X03'],
    'categories': ['electronics,books', 'books,home', 'electronics,clothing,home'],
    'price': [199, 25, 75]
})

df_expanded_categories = df_csv_strings.assign(
    categories=df_csv_strings['categories'].str.split(',')
).explode('categories')


print("DataFrame after splitting and exploding 'categories':")
print(df_expanded_categories[['item_id', 'categories', 'price']]) # Show relevant columns

Output:

DataFrame after splitting and exploding 'categories':
  item_id   categories  price
   X01  electronics    199
   X01        books    199
   X02        books     25
   X02         home     25
   X03  electronics     75
   X03     clothing     75
   X03         home     75

Method 2: Modify DataFrame in steps

import pandas as pd

df_csv_strings = pd.DataFrame({
    'item_id': ['X01', 'X02', 'X03'],
    'categories': ['electronics,books', 'books,home', 'electronics,clothing,home'],
    'price': [199, 25, 75]
})

df_csv_strings['categories'] = df_csv_strings['categories'].str.split(',')
df_expanded_categories = df_csv_strings.explode('categories')

print("DataFrame after splitting and exploding 'categories':")
print(df_expanded_categories[['item_id', 'categories', 'price']]) # Show relevant columns

Output:

DataFrame after splitting and exploding 'categories':
  item_id   categories  price
   X01  electronics    199
   X01        books    199
   X02        books     25
   X02         home     25
   X03  electronics     75
   X03     clothing     75
   X03         home     75

Conclusion

The DataFrame.explode() method is an indispensable tool in Pandas for normalizing data where individual cells contain collections of items (lists, tuples, or even sets).

By transforming each item into its own row, explode() flattens your DataFrame structure, making it suitable for more detailed analysis, aggregation, and joining operations.
Remember to consider index management (ignore_index or reset_index) and how explode() handles empty lists and scalar values to effectively use this powerful feature.
For common cases like comma-separated strings, combining .str.split() with .explode() provides a clean and efficient solution.

The Challenge: Nested List-like Data in DataFrame Columns​

The Solution: DataFrame.explode()​

Basic Usage of explode()​

Understanding Index Replication​

Resetting the Index after Exploding (ignore_index=True or reset_index())​

Exploding a Single Column (Series)​

Key Behaviors of explode()​

Handling Empty Lists (Produces NaN)​

Preserving Scalar Values​

Resulting Data Type​

Practical Use Case: Expanding Comma-Separated Strings​

Splitting the String into a List​

Applying explode()​

Conclusion​

Table of Contents

The Challenge: Nested List-like Data in DataFrame Columns

The Solution: `DataFrame.explode()`

Basic Usage of `explode()`

Understanding Index Replication

Resetting the Index after Exploding (`ignore_index=True` or `reset_index()`)

Exploding a Single Column (Series)

Key Behaviors of `explode()`

Handling Empty Lists (Produces `NaN`)

Preserving Scalar Values

Resulting Data Type

Practical Use Case: Expanding Comma-Separated Strings

Splitting the String into a List

Applying `explode()`

Conclusion