Skip to main content

Python Pandas: How to Transform List-like Column Entries into Separate Rows with explode()

Data often arrives in a semi-structured format where a single cell in a Pandas DataFrame column might contain multiple values, typically as a list, tuple, or a delimited string (e.g., comma-separated). To effectively analyze or normalize such data, you often need to transform each item within these list-like entries into its own individual row, while replicating the other column values for that original row.

This guide will provide a comprehensive walkthrough of the DataFrame.explode() method, your primary tool for this task in Pandas. You'll learn how to use it, manage the resulting index, handle empty lists or scalar values, and even apply it to common scenarios like expanding comma-separated string data.

The Challenge: Nested List-like Data in DataFrame Columns

Consider a DataFrame where one or more columns contain cells with list-like structures:

import pandas as pd

df_initial = pd.DataFrame({
'order_id': [101, 102, 103, 104],
'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

print("Original DataFrame with list-like 'products' column:")
print(df_initial)

Output:

Original DataFrame with list-like 'products' column:
order_id products customer_id
0 101 [apple, banana] cust_A
1 102 [orange] cust_B
2 103 [apple, grape, mango] cust_A
3 104 banana cust_C

For many analyses (e.g., counting individual product occurrences, joining with a product details table), this nested structure is inconvenient. We need each product on its own row.

The Solution: DataFrame.explode()

The DataFrame.explode(column_name_or_names) method is specifically designed for this transformation. It takes each element from a list-like entry in the specified column(s) and creates a new row for it, duplicating the values from other columns and the original index for each new row.

Basic Usage of explode()

import pandas as pd

df_initial = pd.DataFrame({
'order_id': [101, 102, 103, 104],
'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

# Explode the 'products' column
df_exploded = df_initial.explode('products')

print("DataFrame after exploding 'products':")
print(df_exploded)

Output:

DataFrame after exploding 'products':
order_id products customer_id
0 101 apple cust_A
0 101 banana cust_A
1 102 orange cust_B
2 103 apple cust_A
2 103 grape cust_A
2 103 mango cust_A
3 104 banana cust_C
  • The first argument to explode() is the name of the column (or a list of column names) to explode.

Understanding Index Replication

As seen in the output, the original index values are repeated for each new row generated from an exploded list. This maintains a link to the original row.

Resetting the Index after Exploding (ignore_index=True or reset_index())

If you prefer a new, continuous integer index for the exploded DataFrame:

  • Using ignore_index=True (Pandas 1.1.0+):

    import pandas as pd

    df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
    })

    df_exploded_new_index = df_initial.explode('products', ignore_index=True)
    print("Exploded DataFrame with ignore_index=True:")
    print(df_exploded_new_index)

    Output:

    Exploded DataFrame with ignore_index=True:
    order_id products customer_id
    0 101 apple cust_A
    1 101 banana cust_A
    2 102 orange cust_B
    3 103 apple cust_A
    4 103 grape cust_A
    5 103 mango cust_A
    6 104 banana cust_C
  • Using .reset_index(drop=True) (for older Pandas versions or if ignore_index isn't suitable):

    import pandas as pd

    df_initial = pd.DataFrame({
    'order_id': [101, 102, 103, 104],
    'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
    'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
    })

    df_exploded_reset = df_initial.explode('products').reset_index(drop=True)
    # drop=True prevents the old index from being added as a new column
    print("Exploded DataFrame with reset_index(drop=True):")
    print(df_exploded_reset)

    Output:

    Exploded DataFrame with reset_index(drop=True):
    order_id products customer_id
    0 101 apple cust_A
    1 101 banana cust_A
    2 102 orange cust_B
    3 103 apple cust_A
    4 103 grape cust_A
    5 103 mango cust_A
    6 104 banana cust_C

Exploding a Single Column (Series)

You can also call .explode() directly on a Series if you only need to transform that specific column.

import pandas as pd

df_initial = pd.DataFrame({
'order_id': [101, 102, 103, 104],
'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})

products_series_exploded = df_initial['products'].explode()
print("Exploded 'products' Series:")
print(products_series_exploded)

Output:

Exploded 'products' Series:
0 apple
0 banana
1 orange
2 apple
2 grape
2 mango
3 banana
Name: products, dtype: object

The index is still replicated here.

Key Behaviors of explode()

It's important to understand how explode() handles certain edge cases:

Handling Empty Lists (Produces NaN)

If a cell in the column being exploded contains an empty list ([]), explode() will represent this with numpy.nan (or pd.NA for newer Pandas versions with nullable dtypes) in the exploded column for that row.

import pandas as pd

df_with_empty = pd.DataFrame({
'id': [1, 2, 3],
'tags': [['tagA', 'tagB'], [], ['tagC']],
'value': [100, 200, 300]
})
print("Original DataFrame with an empty list:")
print(df_with_empty)
print()

df_exploded_empty = df_with_empty.explode('tags')
print("DataFrame after exploding column with empty list:")
print(df_exploded_empty)

Output:

Original DataFrame with an empty list:
id tags value
0 1 [tagA, tagB] 100
1 2 [] 200
2 3 [tagC] 300

DataFrame after exploding column with empty list:
id tags value
0 1 tagA 100
0 1 tagB 100
1 2 NaN 200
2 3 tagC 300

Preserving Scalar Values

If a cell in the column being exploded already contains a scalar value (not a list-like object), that value is preserved as is on a single row. This was seen in the df_initial example where order_id 104 had 'banana' (a string) in products.

Resulting Data Type

The data type of the exploded column is typically object, especially if there's a mix of data types after explosion or if NaN values are introduced.

import pandas as pd

df_with_empty = pd.DataFrame({
'id': [1, 2, 3],
'tags': [['tagA', 'tagB'], [], ['tagC']],
'value': [100, 200, 300]
})

df_exploded_empty = df_with_empty.explode('tags')

print("Data types after exploding (tags column becomes object):")
print(df_exploded_empty.dtypes)

Output:

Data types after exploding (tags column becomes object):
id int64
tags object
value int64
dtype: object

Practical Use Case: Expanding Comma-Separated Strings

A very common scenario is having a column with strings where multiple values are packed together, separated by a delimiter like a comma. explode() is perfect for this after an initial split.

Splitting the String into a List

First, use the .str.split() method to convert the delimited string into a list of strings.

import pandas as pd

df_csv_strings = pd.DataFrame({
'item_id': ['X01', 'X02', 'X03'],
'categories': ['electronics,books', 'books,home', 'electronics,clothing,home'],
'price': [199, 25, 75]
})

print("Original DataFrame with comma-separated strings:")
print(df_csv_strings)
print()

# Split the 'categories' string into lists
df_csv_strings['categories_list'] = df_csv_strings['categories'].str.split(',')
print("DataFrame after splitting 'categories' into lists:")
print(df_csv_strings)

Output:

Original DataFrame with comma-separated strings:
item_id categories price
0 X01 electronics,books 199
1 X02 books,home 25
2 X03 electronics,clothing,home 75

DataFrame after splitting 'categories' into lists:
item_id categories price categories_list
0 X01 electronics,books 199 [electronics, books]
1 X02 books,home 25 [books, home]
2 X03 electronics,clothing,home 75 [electronics, clothing, home]

Applying explode()

Now, explode the new list column. You can do this in a chained operation or assign the split result first.

Method 1: Chaining with assign (that creates a new DataFrame)

import pandas as pd

df_csv_strings = pd.DataFrame({
'item_id': ['X01', 'X02', 'X03'],
'categories': ['electronics,books', 'books,home', 'electronics,clothing,home'],
'price': [199, 25, 75]
})

df_expanded_categories = df_csv_strings.assign(
categories=df_csv_strings['categories'].str.split(',')
).explode('categories')


print("DataFrame after splitting and exploding 'categories':")
print(df_expanded_categories[['item_id', 'categories', 'price']]) # Show relevant columns

Output:

DataFrame after splitting and exploding 'categories':
item_id categories price
0 X01 electronics 199
0 X01 books 199
1 X02 books 25
1 X02 home 25
2 X03 electronics 75
2 X03 clothing 75
2 X03 home 75

Method 2: Modify DataFrame in steps

import pandas as pd

df_csv_strings = pd.DataFrame({
'item_id': ['X01', 'X02', 'X03'],
'categories': ['electronics,books', 'books,home', 'electronics,clothing,home'],
'price': [199, 25, 75]
})

df_csv_strings['categories'] = df_csv_strings['categories'].str.split(',')
df_expanded_categories = df_csv_strings.explode('categories')

print("DataFrame after splitting and exploding 'categories':")
print(df_expanded_categories[['item_id', 'categories', 'price']]) # Show relevant columns

Output:

DataFrame after splitting and exploding 'categories':
item_id categories price
0 X01 electronics 199
0 X01 books 199
1 X02 books 25
1 X02 home 25
2 X03 electronics 75
2 X03 clothing 75
2 X03 home 75

Conclusion

The DataFrame.explode() method is an indispensable tool in Pandas for normalizing data where individual cells contain collections of items (lists, tuples, or even sets).

  • By transforming each item into its own row, explode() flattens your DataFrame structure, making it suitable for more detailed analysis, aggregation, and joining operations.
  • Remember to consider index management (ignore_index or reset_index) and how explode() handles empty lists and scalar values to effectively use this powerful feature.
  • For common cases like comma-separated strings, combining .str.split() with .explode() provides a clean and efficient solution.