Python Pandas: Create New Column of Tuples (or Lists) from Two Columns
In Pandas, a common data transformation task is to combine values from two (or more) existing columns for each row into a single tuple or list, and then store this collection as a new column in the DataFrame. This can be useful for creating composite keys, feature engineering, or preparing data for functions that expect grouped inputs.
This guide explains several effective methods to create a new DataFrame column containing tuples or lists derived from two existing columns, using techniques like zip(), apply(), itertuples(), and values.tolist().
The Goal: Combining Two Columns into Tuples/Lists Row-wise
Given a Pandas DataFrame, we want to take two specific columns, say 'ColumnA' and 'ColumnB'. For each row, we want to create a tuple (value_from_A, value_from_B) or a list [value_from_A, value_from_B] and store this new collection in a new column, say 'Combined_AB'.
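Whether to store tuples or lists matters for later steps such as building composite keys. The sketch below is only an illustration on a small made-up DataFrame (the column names Combined_tuple and Combined_list are hypothetical): tuples are hashable, so they can go into a set or serve as dictionary keys, while lists cannot.

```python
import pandas as pd

# Small illustrative DataFrame; the values are made up for this sketch
df = pd.DataFrame({
    'ColumnA': ['x', 'y', 'x'],
    'ColumnB': [1, 2, 1],
})

# One tuple per row (immutable, hashable)
df['Combined_tuple'] = list(zip(df['ColumnA'], df['ColumnB']))

# One list per row (mutable, not hashable)
df['Combined_list'] = [list(pair) for pair in zip(df['ColumnA'], df['ColumnB'])]

# Tuples can be collected into a set, e.g. to find distinct composite keys
unique_keys = set(df['Combined_tuple'])
print(unique_keys)

# Lists cannot: building a set from them raises TypeError
try:
    set(df['Combined_list'])
    lists_hashable = True
except TypeError:
    lists_hashable = False
print("Lists hashable:", lists_hashable)
```

If the new column is meant to act as a key, tuples are usually the safer choice; lists fit better when the row-level collection needs to be modified later.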
Example DataFrame
We'll use the following DataFrame for our examples:
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
Output:
Original DataFrame:
EmployeeID FirstName LastName Department Salary
0 101 Alice Smith HR 60000
1 102 Bob Johnson Engineering 85000
2 103 Charlie Brown HR 62000
3 104 David Lee Sales 70000
Creating a New Column of Tuples from Two Columns
Let's say we want to create a new column FullNameTuple from FirstName and LastName, and another ContactList from FirstName and Department.
Using zip() and list() (Recommended for Tuples)
The built-in zip() function is excellent for pairing up elements from multiple iterables. When applied to two DataFrame columns (which are Pandas Series), it yields tuples of corresponding elements.
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
# Combine 'FirstName' and 'LastName' into tuples
# df['FirstName'] and df['LastName'] are Series
zipped_values = zip(df['FirstName'], df['LastName'])
# ✅ Assign the list of tuples to a new column
df['FullNameTuple'] = list(zipped_values)
print("DataFrame with 'FullNameTuple' column (using zip):")
print(df[['FirstName', 'LastName', 'FullNameTuple']])
Output:
DataFrame with 'FullNameTuple' column (using zip):
FirstName LastName FullNameTuple
0 Alice Smith (Alice, Smith)
1 Bob Johnson (Bob, Johnson)
2 Charlie Brown (Charlie, Brown)
3 David Lee (David, Lee)
- zip(df['col1'], df['col2']) creates an iterator of tuples.
- list(...) converts this iterator into a list of tuples, which can then be assigned as a new Series/column.
- This method is very Pythonic, readable, and generally efficient.
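The same pattern extends to more than two columns, since zip() accepts any number of iterables. The following is a minimal sketch; the column name NameDeptTuple and the cols list are illustrative choices, not part of the examples above.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
})

# Pass three Series to zip() to get 3-element tuples
df['NameDeptTuple'] = list(zip(df['FirstName'], df['LastName'], df['Department']))

# Equivalent form when the column names are held in a list
cols = ['FirstName', 'LastName', 'Department']
same_result = list(zip(*(df[c] for c in cols)))

print(df['NameDeptTuple'].tolist())
```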
Using DataFrame.apply(tuple, axis=1)
The DataFrame.apply() method can apply a function along an axis. By selecting the desired columns and applying the tuple constructor row-wise (axis=1), we can achieve the same result.
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
# Select the two columns to combine
columns_to_combine = ['FirstName', 'LastName']
df_subset = df[columns_to_combine]
print(df_subset) # Shows the DataFrame with just these two columns
print()
# ✅ Apply the 'tuple' function row-wise (axis=1)
df['FullNameTuple_apply'] = df_subset.apply(tuple, axis=1)
# Alternatively, in one line:
# df['FullNameTuple_apply'] = df[['FirstName', 'LastName']].apply(tuple, axis=1)
print("DataFrame with 'FullNameTuple_apply' column (using apply):")
print(df[['FirstName', 'LastName', 'FullNameTuple_apply']])
Output: (The new column matches the zip() result)
FirstName LastName
0 Alice Smith
1 Bob Johnson
2 Charlie Brown
3 David Lee
DataFrame with 'FullNameTuple_apply' column (using apply):
FirstName LastName FullNameTuple_apply
0 Alice Smith (Alice, Smith)
1 Bob Johnson (Bob, Johnson)
2 Charlie Brown (Charlie, Brown)
3 David Lee (David, Lee)
- df[['FirstName', 'LastName']]: Selects a DataFrame subset containing only these two columns.
- .apply(tuple, axis=1): For each row in this subset, it takes the values and passes them to the tuple() constructor, creating a tuple for that row. axis=1 is crucial for row-wise operation.
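Because apply() accepts any callable, the tuple can also be built by a custom function, for example to reorder the values. This is a sketch under that assumption; the column name LastFirstTuple is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
})

# The lambda receives each row as a Series and returns a (last, first) tuple
df['LastFirstTuple'] = df.apply(
    lambda row: (row['LastName'], row['FirstName']),
    axis=1,
)

print(df)
```

For a plain pairing of two columns, the zip() approach usually remains simpler; a custom function is mainly useful when the tuple needs per-row logic.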
Using DataFrame.itertuples()
The itertuples() method iterates over DataFrame rows as named tuples (or regular tuples if name=None). We can use it on a subset of columns.
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
columns_to_combine = ['FirstName', 'LastName']
# Iterate over the subset as plain tuples (index=False, name=None)
# and collect them into a list
tuples_from_itertuples = list(
df[columns_to_combine].itertuples(index=False, name=None)
)
print(tuples_from_itertuples)
print()
# ✅ Assign the list of tuples
df['FullNameTuple_itertuples'] = tuples_from_itertuples
print("DataFrame with 'FullNameTuple_itertuples' column:")
print(df[['FirstName', 'LastName', 'FullNameTuple_itertuples']])
Output: (The new column matches the zip() result)
[('Alice', 'Smith'), ('Bob', 'Johnson'), ('Charlie', 'Brown'), ('David', 'Lee')]
DataFrame with 'FullNameTuple_itertuples' column:
FirstName LastName FullNameTuple_itertuples
0 Alice Smith (Alice, Smith)
1 Bob Johnson (Bob, Johnson)
2 Charlie Brown (Charlie, Brown)
3 David Lee (David, Lee)
- index=False: Excludes the DataFrame index from being the first element of each tuple.
- name=None: Returns regular tuples instead of namedtuples.
- While functional, itertuples() is generally intended for iteration, and using zip() or apply() is often more direct for creating a new column.
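If a name is passed instead of name=None, each row comes back as a namedtuple whose fields can be read by column name. The sketch below only illustrates that behaviour; the name 'FullName' is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
})

# name='FullName' makes each row a namedtuple of type FullName
rows = list(df[['FirstName', 'LastName']].itertuples(index=False, name='FullName'))

first_row = rows[0]
print(first_row.FirstName, first_row.LastName)  # attribute access by column name
print(first_row[0], first_row[1])               # positional access still works
```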
Creating a New Column of Lists from Two Columns
Using DataFrame.values.tolist() on a Subset (Recommended for Lists)
Select the desired columns, get their NumPy array representation using .values, and then convert this array to a list of lists using .tolist().
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
columns_to_combine = ['FirstName', 'Department']
# Select columns, get .values (NumPy array), then convert to list of lists
list_of_lists = df[columns_to_combine].values.tolist()
print(list_of_lists)
print()
# ✅ Assign the list of lists to a new column
df['ContactList_values'] = list_of_lists
print("DataFrame with 'ContactList_values' column (using .values.tolist()):")
print(df[['FirstName', 'Department', 'ContactList_values']])
Output:
[['Alice', 'HR'], ['Bob', 'Engineering'], ['Charlie', 'HR'], ['David', 'Sales']]
DataFrame with 'ContactList_values' column (using .values.tolist()):
FirstName Department ContactList_values
0 Alice HR [Alice, HR]
1 Bob Engineering [Bob, Engineering]
2 Charlie HR [Charlie, HR]
3 David Sales [David, Sales]
- df[columns_to_combine].values: Returns a NumPy array of the data from the selected columns.
- .tolist(): Converts the NumPy array into a Python list of lists. This is typically efficient.
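One caveat with the array-based route: the selected columns are first combined into a single NumPy array, so columns of different numeric types are converted to a common dtype. The sketch below uses made-up Units and Price columns to show the effect, and uses to_numpy(), which the pandas documentation recommends over .values.

```python
import pandas as pd

# Illustrative data: one integer column and one float column
df = pd.DataFrame({
    'Units': [3, 5],
    'Price': [1.5, 2.25],
})

# The int column is upcast to float when both columns share one array
via_array = df[['Units', 'Price']].to_numpy().tolist()
print(via_array)  # [[3.0, 1.5], [5.0, 2.25]]

# zip() reads each column separately, so the integers stay integers
via_zip = [list(pair) for pair in zip(df['Units'], df['Price'])]
print(via_zip)
```

When the original types must be preserved row by row, the zip()-based variant is the more predictable option.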
Using DataFrame.to_records()
The DataFrame.to_records(index=False) method converts the DataFrame (or a subset) to a NumPy record array. Iterating over it, or passing it to list(), yields NumPy record objects that behave like tuples, and these can be converted to lists.
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
columns_to_combine = ['FirstName', 'Department']
# Convert selected columns to a list of NumPy records (tuple-like objects)
records_as_tuples = list(df[columns_to_combine].to_records(index=False))
print(records_as_tuples)
print()
# Convert each tuple in the list to a list
list_of_lists_from_records = [list(rec) for rec in records_as_tuples]
# ✅ Assign the list of lists
df['ContactList_records'] = list_of_lists_from_records
print("DataFrame with 'ContactList_records' column (using .to_records()):")
print(df[['FirstName', 'Department', 'ContactList_records']])
Output: (The new column is the same as with .values.tolist())
[np.record(('Alice', 'HR'), dtype=[('FirstName', 'O'), ('Department', 'O')]), np.record(('Bob', 'Engineering'), dtype=[('FirstName', 'O'), ('Department', 'O')]), np.record(('Charlie', 'HR'), dtype=[('FirstName', 'O'), ('Department', 'O')]), np.record(('David', 'Sales'), dtype=[('FirstName', 'O'), ('Department', 'O')])]
DataFrame with 'ContactList_records' column (using .to_records()):
FirstName Department ContactList_records
0 Alice HR [Alice, HR]
1 Bob Engineering [Bob, Engineering]
2 Charlie HR [Charlie, HR]
3 David Sales [David, Sales]
- index=False: Prevents the DataFrame index from being included as the first field in each record.
- This is slightly more verbose than .values.tolist() if the goal is a simple list of lists.
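If plain Python tuples are wanted rather than NumPy record objects, the record array's own tolist() method can be used. This is a small sketch of that variation, not a replacement for the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
})

records = df[['FirstName', 'Department']].to_records(index=False)

# tolist() on a record array returns a list of ordinary Python tuples
plain_tuples = records.tolist()
print(plain_tuples)

# Convert each tuple to a list if a list of lists is needed
list_of_lists = [list(t) for t in plain_tuples]
df['ContactList_records'] = list_of_lists
```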
Adapting zip() with List Comprehension
You can adapt the zip() method to create lists instead of tuples.
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104],
'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
'Department': ['HR', 'Engineering', 'HR', 'Sales'],
'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
df = df_original.copy()
# ✅ Use zip with a list comprehension to create lists
df['ContactList_zip_list'] = [list(item) for item in zip(df['FirstName'], df['Department'])]
print("DataFrame with 'ContactList_zip_list' column:")
print(df[['FirstName', 'Department', 'ContactList_zip_list']])
Output: (Same as .values.tolist())
DataFrame with 'ContactList_zip_list' column:
FirstName Department ContactList_zip_list
0 Alice HR [Alice, HR]
1 Bob Engineering [Bob, Engineering]
2 Charlie HR [Charlie, HR]
3 David Sales [David, Sales]
Choosing the Right Method
- For creating TUPLES:
  - list(zip(df['col1'], df['col2'])): Highly Pythonic, readable, and generally efficient. Often the best choice.
  - df[['col1', 'col2']].apply(tuple, axis=1): Clear and explicit, good if you prefer the apply paradigm. Typically slower than zip for this simple task, because it calls a Python function once per row.
  - list(df[['col1', 'col2']].itertuples(index=False, name=None)): Works, but less direct than zip or apply for column creation.
- For creating LISTS (of lists):
  - df[['col1', 'col2']].values.tolist(): Very direct and efficient for converting selected columns into a list of lists. Usually the best choice.
  - [list(item) for item in zip(df['col1'], df['col2'])]: Adapts zip well if you're already using it for tuples.
  - [list(rec) for rec in df[['col1', 'col2']].to_records(index=False)]: More steps involved compared to values.tolist().
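Relative speed depends on the data size, dtypes, and pandas version, so it is worth measuring on your own data. The following is a rough sketch using the standard-library timeit module on a synthetic 10,000-row DataFrame; the row count, the column contents, and the candidates dictionary are arbitrary choices for illustration.

```python
import timeit

import pandas as pd

# Synthetic data, sized only for a quick comparison
n = 10_000
df = pd.DataFrame({
    'col1': range(n),
    'col2': [f'v{i}' for i in range(n)],
})

# Each candidate builds the same list of tuples in a different way
candidates = {
    'zip': lambda: list(zip(df['col1'], df['col2'])),
    'apply': lambda: df[['col1', 'col2']].apply(tuple, axis=1).tolist(),
    'itertuples': lambda: list(
        df[['col1', 'col2']].itertuples(index=False, name=None)
    ),
}

# Time three runs of each candidate and report the total per method
timings = {name: timeit.timeit(fn, number=3) for name, fn in candidates.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<12}{seconds:.4f} s for 3 runs")
```

The absolute numbers are not meaningful on their own; only the ordering on your own machine and data is.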
Conclusion
Creating a new DataFrame column composed of tuples or lists from two existing columns is a common Pandas operation.
- For generating tuples, using list(zip(df['col1'], df['col2'])) is often the most Pythonic and efficient method. df[['col1', 'col2']].apply(tuple, axis=1) is also a clear alternative.
- For generating lists (a list of lists in the new column), df[['col1', 'col2']].values.tolist() is usually the most direct and performant approach.
Select the method that you find most readable and that best suits your specific needs and data. All methods shown achieve the goal of combining row-wise data from two columns into a new column of collections.