Python Pandas: How to Split Column of Lists into Multiple Columns
A common data preparation task in Pandas involves "exploding" or "unpacking" a DataFrame column where each cell contains a list (or tuple) into multiple new columns. Each element of the list in the original cell becomes a value in a new, separate column for that row.
This guide demonstrates several effective methods to split a Pandas DataFrame column containing lists of equal (and varying) lengths into multiple new columns.
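Pandas also has a DataFrame.explode() method. Despite the similar name, it moves list elements into new rows, not new columns. The minimal sketch below contrasts the two on a small hypothetical DataFrame. The 'ID' and 'Vals' names are illustrative only.

```python
import pandas as pd

# Hypothetical two-row frame with a list column
df = pd.DataFrame({'ID': ['A', 'B'], 'Vals': [[1, 2], [3, 4]]})

# DataFrame.explode: one ROW per list element (index labels repeat)
exploded = df.explode('Vals')
print(exploded)

# Splitting into columns: one COLUMN per list element, row count unchanged
split = pd.DataFrame(df['Vals'].tolist(), columns=['Val_1', 'Val_2'], index=df.index)
print(split)
```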
The Goal: Unpacking List Elements into New Columns
Given a Pandas DataFrame with a column where each entry is a list (e.g., a Coordinates column holding [10, 20], [15, 25] and [5, 30]), we want to transform it so that each element of the inner lists forms a new column. For example, if the lists have two elements, we'd create 'Coord_X' and 'Coord_Y'.
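As a concrete sketch of this goal, the Coordinates values above can be split so that each two-element list becomes one Coord_X value and one Coord_Y value on the same row. The 'Point' column is a hypothetical addition for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Point': ['P1', 'P2', 'P3'],                   # hypothetical label column
    'Coordinates': [[10, 20], [15, 25], [5, 30]],
})

# Each inner list becomes one row; its two elements become two columns
df[['Coord_X', 'Coord_Y']] = pd.DataFrame(df['Coordinates'].tolist(), index=df.index)
print(df)
```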
Example DataFrame
import pandas as pd
import numpy as np # For NaN if needed
data = {
'ID': ['ItemA', 'ItemB', 'ItemC', 'ItemD'],
'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True], [25, 'Red', False]],
'Category': ['X', 'Y', 'X', 'Z']
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
Output:
Original DataFrame:
ID Attributes Category
0 ItemA [10, Red, True] X
1 ItemB [20, Blue, False] Y
2 ItemC [15, Green, True] X
3 ItemD [25, Red, False] Z
Now, we want to split the 'Attributes' column into three new columns.
Method 1: Using Series.tolist() with the pd.DataFrame() Constructor (Recommended)
This is generally the most efficient and straightforward method when all lists in the column have the same length.
For Lists of Equal Length
- Select the column containing lists: df['YourListColumn'].
- Convert this Series of lists into a Python list of lists: .tolist().
- Pass this list of lists directly to the pd.DataFrame() constructor. Pandas interprets each inner list as one row of the new DataFrame.
- Optionally, pass columns (the new column names) and index (usually df.index) to the constructor.
import pandas as pd
df = pd.DataFrame({
'ID': ['ItemA', 'ItemB', 'ItemC'],
'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]],
})
# Step 1 & 2: Select column and convert to list of lists
list_of_attributes = df['Attributes'].tolist()
print(f"List of attribute lists:\n{list_of_attributes}\n")
# Step 3: Create a new DataFrame from this list of lists
# Ensure the index aligns if adding back to the original df
new_cols_df = pd.DataFrame(list_of_attributes, index=df.index)
print("New DataFrame from list elements:")
print(new_cols_df)
print()
# Step 4: Assign new column names (optional, but good practice)
new_column_names = ['Attr_Num', 'Attr_Color', 'Attr_Flag']
new_cols_df.columns = new_column_names
print("New DataFrame with named columns:")
print(new_cols_df)
Output:
List of attribute lists:
[[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]]
New DataFrame from list elements:
0 1 2
0 10 Red True
1 20 Blue False
2 15 Green True
New DataFrame with named columns:
Attr_Num Attr_Color Attr_Flag
0 10 Red True
1 20 Blue False
2 15 Green True
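The introduction notes that cells may hold tuples as well as lists. The sketch below assumes the same three-element records stored as tuples. It suggests that the constructor treats them the same way and that each new column gets its own inferred dtype rather than one shared object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ItemA', 'ItemB', 'ItemC'],
    # Same records as above, but stored as tuples instead of lists
    'Attributes': [(10, 'Red', True), (20, 'Blue', False), (15, 'Green', True)],
})

new_cols_df = pd.DataFrame(
    df['Attributes'].tolist(),
    columns=['Attr_Num', 'Attr_Color', 'Attr_Flag'],
    index=df.index,
)
print(new_cols_df.dtypes)  # dtypes are inferred column by column
```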
Assigning to New DataFrame vs. Existing DataFrame
- Creating a new DataFrame with only the split columns:
import pandas as pd
df = pd.DataFrame({
'ID': ['ItemA', 'ItemB', 'ItemC'],
'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]],
})
df_split_only = pd.DataFrame(
df['Attributes'].tolist(),
columns=['Attribute1', 'Attribute2', 'Attribute3'], # Name columns directly
index=df.index # Important to align if merging back later
)
print("New DataFrame containing only split columns:")
print(df_split_only)
Output:
New DataFrame containing only split columns:
Attribute1 Attribute2 Attribute3
0 10 Red True
1 20 Blue False
2 15 Green True
- Adding split columns to the existing DataFrame:
import pandas as pd
df = pd.DataFrame({
'ID': ['ItemA', 'ItemB', 'ItemC'],
'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]],
'Category': ['X', 'Y', 'X']
})
new_col_names = ['Attr_Num', 'Attr_Color', 'Attr_Flag']
# ✅ Assign the new columns to the original DataFrame
df[new_col_names] = pd.DataFrame(df['Attributes'].tolist(), index=df.index)
# Alternatively, name the columns when creating the new DataFrame and join on the index:
# df = df.join(pd.DataFrame(df['Attributes'].tolist(), columns=new_col_names, index=df.index))
print("Original DataFrame with new split columns added:")
print(df)
Output:
Original DataFrame with new split columns added:
ID Attributes Category Attr_Num Attr_Color Attr_Flag
0 ItemA [10, Red, True] X 10 Red True
1 ItemB [20, Blue, False] Y 20 Blue False
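2 ItemC [15, Green, True] X 15 Green True
Ensuring index=df.index when creating the new DataFrame from the list of lists is crucial for correct alignment when assigning back as new columns. The assignment matches rows by index label, so if df has a non-default index (for example, after filtering), a new DataFrame with the default 0, 1, 2, ... index would put values on the wrong rows or leave NaN where labels don't match.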
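To see why the index matters, here is a minimal sketch on a filtered copy whose index labels are 0 and 2 rather than 0 and 1. The filter on Category is just an illustrative way to get a non-default index. Without index=..., the new DataFrame is labelled 0, 1, and because assignment aligns on labels, the row labelled 2 receives NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ItemA', 'ItemB', 'ItemC'],
    'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]],
    'Category': ['X', 'Y', 'X'],
})
cols = ['Attr_Num', 'Attr_Color', 'Attr_Flag']

# Filtering keeps the original labels: the index is now [0, 2]
subset = df[df['Category'] == 'X'].copy()

# Without index=...: the new frame is labelled [0, 1], so label 2 gets NaN
wrong = subset.copy()
wrong[cols] = pd.DataFrame(wrong['Attributes'].tolist())

# With index=...: labels match row for row
right = subset.copy()
right[cols] = pd.DataFrame(right['Attributes'].tolist(), index=right.index)

print(wrong[cols])
print(right[cols])
```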
Method 2: Using Series.apply(pd.Series)
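The Series.apply(pd.Series) method can also expand a Series of lists into a DataFrame where each element of the lists becomes a new column. It is concise, but it builds one Series per row, so it is typically much slower than the tolist() approach on large DataFrames.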
import pandas as pd
df = pd.DataFrame({
'ID': ['ItemA', 'ItemB', 'ItemC'],
'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]],
'Category': ['X', 'Y', 'X']
})
# ✅ Apply pd.Series to the 'Attributes' column
expanded_attributes_df = df['Attributes'].apply(pd.Series)
print("DataFrame from apply(pd.Series):")
print(expanded_attributes_df)
print()
# Rename the new columns
expanded_attributes_df.columns = ['Applied_Attr1', 'Applied_Attr2', 'Applied_Attr3']
# Add these new columns to the original DataFrame (or a copy)
# df_with_applied_cols = pd.concat([df, expanded_attributes_df], axis=1)
# Or, assign directly if the index aligns (which it should here):
df[['Applied_Attr1', 'Applied_Attr2', 'Applied_Attr3']] = expanded_attributes_df
print("DataFrame with split columns using apply(pd.Series):")
print(df)
Output:
DataFrame from apply(pd.Series):
0 1 2
0 10 Red True
1 20 Blue False
2 15 Green True
DataFrame with split columns using apply(pd.Series):
ID Attributes Category Applied_Attr1 Applied_Attr2 \
0 ItemA [10, Red, True] X 10 Red
1 ItemB [20, Blue, False] Y 20 Blue
2 ItemC [15, Green, True] X 15 Green
Applied_Attr3
0 True
1 False
2 True
- df['Attributes'].apply(pd.Series): for each list in the 'Attributes' Series, pd.Series is called, creating a new Series from that list. Pandas then combines these resulting Series into a new DataFrame.
- This method also handles lists of different lengths by filling the positions missing from shorter lists with NaN (see next section).
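Because apply(pd.Series) labels the new columns 0, 1, 2, ..., one option is to rename them with a function. That way the code does not hard-code how many elements each list has. The sketch below then attaches the columns with DataFrame.join(), which aligns on the index. The Applied_Attr prefix is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['ItemA', 'ItemB', 'ItemC'],
    'Attributes': [[10, 'Red', True], [20, 'Blue', False], [15, 'Green', True]],
    'Category': ['X', 'Y', 'X'],
})

expanded = (
    df['Attributes']
    .apply(pd.Series)                                  # columns labelled 0, 1, 2
    .rename(columns=lambda i: f'Applied_Attr{i + 1}')  # -> Applied_Attr1, 2, 3
)

# join() aligns on the index, so each row stays matched to its source list
result = df.join(expanded)
print(result)
```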
Handling Lists of Different Lengths
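If the lists within your column have varying lengths, pd.DataFrame(your_series.tolist()) automatically pads shorter lists with missing values so that every row matches the length of the longest list. For object data such as strings, the padding appears as None (as in the output below). For numeric data, it appears as NaN.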
Using Series.tolist() with pd.DataFrame() (Padding with Missing Values)
import pandas as pd
import numpy as np
data_varying_lengths = {
'ID': ['P1', 'P2', 'P3', 'P4'],
'Features': [
['Fast', 'Reliable'], # Length 2
['Compact', 'Efficient', 'Quiet'], # Length 3
['Durable'], # Length 1
['Lightweight', 'Portable', 'Stylish', 'Affordable'] # Length 4
]
}
df_varying = pd.DataFrame(data_varying_lengths)
print("Original DataFrame with varying list lengths:")
print(df_varying)
print()
# ✅ Convert Series of lists to list of lists
list_of_features = df_varying['Features'].tolist()
# Create new DataFrame; Pandas pads shorter lists with missing values (None here, since the data are strings)
df_features_split = pd.DataFrame(list_of_features, index=df_varying.index)
print("Split features (NaN padded):")
print(df_features_split)
print()
# Name the new columns (up to the max length)
max_len = df_features_split.shape[1] # Number of columns created
df_features_split.columns = [f'Feature_{i+1}' for i in range(max_len)]
# Add to original DataFrame
df_final_varying = pd.concat([df_varying.drop(columns=['Features']), df_features_split], axis=1)
# Or: df_varying[[f'Feature_{i+1}' for i in range(max_len)]] = df_features_split
print("Final DataFrame with split varying-length lists:")
print(df_final_varying)
Output:
Original DataFrame with varying list lengths:
ID Features
0 P1 [Fast, Reliable]
1 P2 [Compact, Efficient, Quiet]
2 P3 [Durable]
3 P4 [Lightweight, Portable, Stylish, Affordable]
Split features (NaN padded):
0 1 2 3
0 Fast Reliable None None
1 Compact Efficient Quiet None
2 Durable None None None
3 Lightweight Portable Stylish Affordable
Final DataFrame with split varying-length lists:
ID Feature_1 Feature_2 Feature_3 Feature_4
0 P1 Fast Reliable None None
1 P2 Compact Efficient Quiet None
2 P3 Durable None None None
3 P4 Lightweight Portable Stylish Affordable
When given a list of lists whose inner lists have different lengths, the pd.DataFrame() constructor creates as many columns as the longest inner list has elements. It fills the gaps in shorter rows with missing values (None for the strings above, NaN for numeric data). df['col'].apply(pd.Series) also pads shorter lists, filling the gaps with NaN.
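Here is a quick check of that last claim on a hypothetical Series of ragged numeric lists. Both approaches should produce four columns and the same count of non-missing values per row. Counting with notna() sidesteps whether the padding shows up as None or NaN.

```python
import pandas as pd

s = pd.Series([[1, 2], [3, 4, 5], [6], [7, 8, 9, 10]])  # hypothetical ragged lists

via_constructor = pd.DataFrame(s.tolist(), index=s.index)
via_apply = s.apply(pd.Series)

print(via_constructor)
print(via_apply)

# Number of real (non-missing) values in each row
print(via_constructor.notna().sum(axis=1).tolist())
print(via_apply.notna().sum(axis=1).tolist())
```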
Conclusion
Splitting a Pandas DataFrame column containing lists (or tuples) into multiple new columns is a common data reshaping task.
- For lists of equal length, df[new_cols] = pd.DataFrame(df['list_col'].tolist(), index=df.index) is generally the most direct and efficient method.
- df['list_col'].apply(pd.Series) is another effective method that also handles lists of varying lengths by padding with NaN, although it is usually slower on large DataFrames.
- When list lengths vary, pd.DataFrame(df['list_col'].tolist(), index=df.index) also pads shorter lists with missing values.
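The recommended pieces can be combined into a small helper. This sketch assumes you want the new columns named with a prefix and the original list column dropped. The function name split_list_column and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def split_list_column(df: pd.DataFrame, col: str, prefix: str, drop: bool = True) -> pd.DataFrame:
    """Hypothetical helper: expand a column of lists/tuples into prefix_1, prefix_2, ... columns."""
    # index=df.index keeps each new row aligned with its source row;
    # ragged lists are padded with missing values by the constructor
    parts = pd.DataFrame(df[col].tolist(), index=df.index)
    parts.columns = [f'{prefix}_{i + 1}' for i in range(parts.shape[1])]
    base = df.drop(columns=[col]) if drop else df
    return base.join(parts)


df = pd.DataFrame({
    'ID': ['P1', 'P2', 'P3'],
    'Features': [['Fast', 'Reliable'], ['Compact', 'Efficient', 'Quiet'], ['Durable']],
})
out = split_list_column(df, 'Features', 'Feature')
print(out)
```

Using join() rather than positional assignment keeps the helper safe for DataFrames with non-default indexes.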
After splitting, remember to assign meaningful names to your new columns. Choose the method that best fits your data's structure and your preference for conciseness.