Python Pandas: How to Transform List-like Column Entries into Separate Rows with explode()
Data often arrives in a semi-structured format where a single cell in a Pandas DataFrame column might contain multiple values, typically as a list, tuple, or a delimited string (e.g., comma-separated). To effectively analyze or normalize such data, you often need to transform each item within these list-like entries into its own individual row, while replicating the other column values for that original row.
This guide will provide a comprehensive walkthrough of the DataFrame.explode()
method, your primary tool for this task in Pandas. You'll learn how to use it, manage the resulting index, handle empty lists or scalar values, and even apply it to common scenarios like expanding comma-separated string data.
The Challenge: Nested List-like Data in DataFrame Columns
Consider a DataFrame where one or more columns contain cells with list-like structures:
import pandas as pd
df_initial = pd.DataFrame({
'order_id': [101, 102, 103, 104],
'products': [['apple', 'banana'], ['orange'], ['apple', 'grape', 'mango'], 'banana'],
'customer_id': ['cust_A', 'cust_B', 'cust_A', 'cust_C']
})
print("Original DataFrame with list-like 'products' column:")
print(df_initial)
Output:
Original DataFrame with list-like 'products' column:
order_id products customer_id
0 101 [apple, banana] cust_A
1 102 [orange] cust_B
2 103 [apple, grape, mango] cust_A
3 104 banana cust_C
For many analyses (e.g., counting individual product occurrences, joining with a product details table), this nested structure is inconvenient. We need each product on its own row.