Python Pandas: Create New Column of Tuples (or Lists) from Two Columns

In Pandas, a common data transformation task is to combine values from two (or more) existing columns for each row into a single tuple or list, and then store this collection as a new column in the DataFrame. This can be useful for creating composite keys, feature engineering, or preparing data for functions that expect grouped inputs.

This guide explains several effective methods to create a new DataFrame column containing tuples or lists derived from two existing columns, using techniques like zip(), apply(), itertuples(), and values.tolist().

The Goal: Combining Two Columns into Tuples/Lists Row-wise

Given a Pandas DataFrame, we want to take two specific columns, say 'ColumnA' and 'ColumnB'. For each row, we want to create a tuple (value_from_A, value_from_B) or a list [value_from_A, value_from_B] and store this new collection in a new column, say 'Combined_AB'.

Example DataFrame

We'll use the following DataFrame for our examples:

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)

Output:

Original DataFrame:
   EmployeeID FirstName LastName   Department  Salary
0         101     Alice    Smith           HR   60000
1         102       Bob  Johnson  Engineering   85000
2         103   Charlie    Brown           HR   62000
3         104     David      Lee        Sales   70000

Creating a New Column of Tuples from Two Columns

Let's say we want to create a new tuple column FullNameTuple from FirstName and LastName and, in the list examples later in this guide, a list column ContactList from FirstName and Department.

Using `zip()` and `list()` (Recommended for Tuples)

The built-in zip() function is excellent for pairing up elements from multiple iterables. When applied to two DataFrame columns (which are Pandas Series), it yields tuples of corresponding elements.

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)

df = df_original.copy()

# Combine 'FirstName' and 'LastName' into tuples
# df['FirstName'] and df['LastName'] are Series
zipped_values = zip(df['FirstName'], df['LastName'])

# ✅ Assign the list of tuples to a new column
df['FullNameTuple'] = list(zipped_values)

print("DataFrame with 'FullNameTuple' column (using zip):")
print(df[['FirstName', 'LastName', 'FullNameTuple']])

Output:

DataFrame with 'FullNameTuple' column (using zip):
  FirstName LastName     FullNameTuple
0     Alice    Smith    (Alice, Smith)
1       Bob  Johnson    (Bob, Johnson)
2   Charlie    Brown  (Charlie, Brown)
3     David      Lee      (David, Lee)

note

zip(df['col1'], df['col2']) creates an iterator of tuples.
list(...) converts this iterator into a list of tuples, which can then be assigned as a new Series/column.
This method is very Pythonic, readable, and generally efficient.
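One reason to prefer tuples over lists for this kind of column is that tuples are hashable, which is what makes the composite-key use case mentioned in the introduction work. The following is a minimal sketch under that assumption; the name salary_by_name is hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Salary': [60000, 85000, 62000, 70000],
})

# Build the tuple column with zip(), as in the example above
df['FullNameTuple'] = list(zip(df['FirstName'], df['LastName']))

# Tuples are hashable, so each one can act as a composite dictionary key.
# salary_by_name is a hypothetical helper name for this illustration.
salary_by_name = dict(zip(df['FullNameTuple'], df['Salary']))

print(salary_by_name[('Bob', 'Johnson')])  # 85000
```

A column of lists could not be used this way, because lists are not hashable.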

Using `DataFrame.apply(tuple, axis=1)`

The DataFrame.apply() method can apply a function along an axis. By selecting the desired columns and applying the tuple constructor row-wise (axis=1), we can achieve the same result.

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)

df = df_original.copy()

# Select the two columns to combine
columns_to_combine = ['FirstName', 'LastName']
df_subset = df[columns_to_combine]
print(df_subset) # Shows the DataFrame with just these two columns
print()

# ✅ Apply the 'tuple' function row-wise (axis=1)
df['FullNameTuple_apply'] = df_subset.apply(tuple, axis=1)
# Alternatively, in one line:
# df['FullNameTuple_apply'] = df[['FirstName', 'LastName']].apply(tuple, axis=1)

print("DataFrame with 'FullNameTuple_apply' column (using apply):")
print(df[['FirstName', 'LastName', 'FullNameTuple_apply']])

Output: (the tuples match the zip() result)

  FirstName LastName
0     Alice    Smith
1       Bob  Johnson
2   Charlie    Brown
3     David      Lee

DataFrame with 'FullNameTuple_apply' column (using apply):
  FirstName LastName FullNameTuple_apply
0     Alice    Smith      (Alice, Smith)
1       Bob  Johnson      (Bob, Johnson)
2   Charlie    Brown    (Charlie, Brown)
3     David      Lee        (David, Lee)

note

df[['FirstName', 'LastName']]: Selects a DataFrame subset containing only these two columns.
.apply(tuple, axis=1): For each row in this subset, it takes the values and passes them to the tuple() constructor, creating a tuple for that row. axis=1 is crucial for row-wise operation.
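Where apply() tends to earn its place is when each value needs some processing before it goes into the tuple. The sketch below is a hypothetical variation (upper-casing the last name is an assumption made purely for illustration), passing a lambda instead of the bare tuple constructor.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob'],
    'LastName': ['Smith', 'Johnson'],
})

# Hypothetical variation: transform one value while building each tuple.
# With axis=1, each `row` is a Series holding that row's values.
df['NameTuple_custom'] = df[['FirstName', 'LastName']].apply(
    lambda row: (row['FirstName'], row['LastName'].upper()),
    axis=1,
)

print(df['NameTuple_custom'].tolist())
# [('Alice', 'SMITH'), ('Bob', 'JOHNSON')]
```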

Using `DataFrame.itertuples()`

The itertuples() method iterates over DataFrame rows as named tuples (or regular tuples if name=None). We can use it on a subset of columns.

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)

df = df_original.copy()

columns_to_combine = ['FirstName', 'LastName']

# Iterate over the subset as plain tuples (index=False, name=None)
# and collect them into a list
tuples_from_itertuples = list(
    df[columns_to_combine].itertuples(index=False, name=None)
)
print(tuples_from_itertuples)
print()

# ✅ Assign the list of tuples
df['FullNameTuple_itertuples'] = tuples_from_itertuples

print("DataFrame with 'FullNameTuple_itertuples' column:")
print(df[['FirstName', 'LastName', 'FullNameTuple_itertuples']])

Output: (Same as using zip)

[('Alice', 'Smith'), ('Bob', 'Johnson'), ('Charlie', 'Brown'), ('David', 'Lee')]

DataFrame with 'FullNameTuple_itertuples' column:
  FirstName LastName FullNameTuple_itertuples
0     Alice    Smith           (Alice, Smith)
1       Bob  Johnson           (Bob, Johnson)
2   Charlie    Brown         (Charlie, Brown)
3     David      Lee             (David, Lee)

note

index=False: Excludes the DataFrame index from being the first element of each tuple.
name=None: Returns regular tuples instead of namedtuples.
While functional, itertuples() is generally intended for iteration, and using zip() or apply() is often more direct for creating a new column.
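For contrast, here is a small sketch of what the name parameter does when it is not None. The type name 'FullName' is an assumption chosen for this illustration; the fields of each named tuple take their names from the selected columns.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob'],
    'LastName': ['Smith', 'Johnson'],
})

# 'FullName' is a hypothetical name for the generated namedtuple type
named_rows = list(
    df[['FirstName', 'LastName']].itertuples(index=False, name='FullName')
)

# Each row supports attribute access as well as positional access
print(named_rows[0].FirstName, named_rows[0].LastName)  # Alice Smith
print(named_rows[1][1])                                 # Johnson

# The named tuples can be assigned to a column like any other list of tuples
df['FullNameNamed'] = named_rows
```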

Creating a New Column of Lists from Two Columns

Using `DataFrame.values.tolist()` on a Subset (Recommended for Lists)

Select the desired columns, get their NumPy array representation using .values, and then convert this array to a list of lists using .tolist().

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)

df = df_original.copy()

columns_to_combine = ['FirstName', 'Department']

# Select columns, get .values (NumPy array), then convert to list of lists
list_of_lists = df[columns_to_combine].values.tolist()
print(list_of_lists)
print()

# ✅ Assign the list of lists to a new column
df['ContactList_values'] = list_of_lists

print("DataFrame with 'ContactList_values' column (using .values.tolist()):")
print(df[['FirstName', 'Department', 'ContactList_values']])

Output:

[['Alice', 'HR'], ['Bob', 'Engineering'], ['Charlie', 'HR'], ['David', 'Sales']]

DataFrame with 'ContactList_values' column (using .values.tolist()):
  FirstName   Department  ContactList_values
0     Alice           HR         [Alice, HR]
1       Bob  Engineering  [Bob, Engineering]
2   Charlie           HR       [Charlie, HR]
3     David        Sales      [David, Sales]

note

df[columns_to_combine].values: Returns a NumPy array of the data from the selected columns.
.tolist(): Converts the NumPy array into a Python list of lists. This is typically efficient.
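One caveat worth checking before relying on .values: the selected columns are packed into a single NumPy array with one common dtype. The sketch below uses a hypothetical DataFrame (df_num, with made-up Units and Price columns) to show that an integer column combined with a float column comes back as floats, whereas zip() reads each column separately and keeps the original types.

```python
import pandas as pd

# Hypothetical frame with one integer and one float column
df_num = pd.DataFrame({'Units': [1, 2], 'Price': [0.5, 1.5]})

# .values builds one float64 array, so the integers are upcast to floats
via_values = df_num[['Units', 'Price']].values.tolist()
print(via_values)  # [[1.0, 0.5], [2.0, 1.5]]

# zip() iterates each column on its own, so each value keeps its type
via_zip = [list(pair) for pair in zip(df_num['Units'], df_num['Price'])]
print(via_zip)     # [[1, 0.5], [2, 1.5]]
```

For columns that already share a dtype, such as the two string columns in the example above, this makes no difference.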

Using `DataFrame.to_records()`

The DataFrame.to_records(index=False) method converts the DataFrame (or a subset) to a NumPy record array. Iterating over it, or passing it to list(), yields tuple-like numpy.record objects rather than plain tuples, and each one can be converted to a list with list().

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)

df = df_original.copy()

columns_to_combine = ['FirstName', 'Department']

# Convert selected columns to a record array; list() yields tuple-like numpy.record objects
records_as_tuples = list(df[columns_to_combine].to_records(index=False))
print(records_as_tuples)
print()

# Convert each tuple in the list to a list
list_of_lists_from_records = [list(rec) for rec in records_as_tuples]

# ✅ Assign the list of lists
df['ContactList_records'] = list_of_lists_from_records

print("DataFrame with 'ContactList_records' column (using .to_records()):")
print(df[['FirstName', 'Department', 'ContactList_records']])

Output: (the new column matches the .values.tolist() result; the printed form of the records depends on the NumPy version)

[np.record(('Alice', 'HR'), dtype=[('FirstName', 'O'), ('Department', 'O')]), np.record(('Bob', 'Engineering'), dtype=[('FirstName', 'O'), ('Department', 'O')]), np.record(('Charlie', 'HR'), dtype=[('FirstName', 'O'), ('Department', 'O')]), np.record(('David', 'Sales'), dtype=[('FirstName', 'O'), ('Department', 'O')])]

DataFrame with 'ContactList_records' column (using .to_records()):
  FirstName   Department ContactList_records
0     Alice           HR         [Alice, HR]
1       Bob  Engineering  [Bob, Engineering]
2   Charlie           HR       [Charlie, HR]
3     David        Sales      [David, Sales]

note

index=False: Prevents the DataFrame index from being included as the first field in each record.
This is slightly more verbose than .values.tolist() if the goal is a simple list of lists.
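If plain tuples are wanted rather than numpy.record objects, the record array's own tolist() method can be used instead of list(). This is a minimal sketch on a shortened, hypothetical version of the example DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob'],
    'Department': ['HR', 'Engineering'],
})

records = df[['FirstName', 'Department']].to_records(index=False)

# tolist() on the record array returns ordinary Python tuples
plain_tuples = records.tolist()
print(plain_tuples)  # [('Alice', 'HR'), ('Bob', 'Engineering')]

# Convert each tuple to a list before assigning, as in the example above
df['ContactList_records'] = [list(t) for t in plain_tuples]
```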

Adapting `zip()` with List Comprehension

You can adapt the zip() method to create lists instead of tuples.

import pandas as pd

data = {
    'EmployeeID': [101, 102, 103, 104],
    'FirstName': ['Alice', 'Bob', 'Charlie', 'David'],
    'LastName': ['Smith', 'Johnson', 'Brown', 'Lee'],
    'Department': ['HR', 'Engineering', 'HR', 'Sales'],
    'Salary': [60000, 85000, 62000, 70000]
}
df_original = pd.DataFrame(data)

df = df_original.copy()

# ✅ Use zip with a list comprehension to create lists
df['ContactList_zip_list'] = [list(item) for item in zip(df['FirstName'], df['Department'])]

print("DataFrame with 'ContactList_zip_list' column:")
print(df[['FirstName', 'Department', 'ContactList_zip_list']])

Output: (Same as .values.tolist())

DataFrame with 'ContactList_zip_list' column:
  FirstName   Department ContactList_zip_list
0     Alice           HR          [Alice, HR]
1       Bob  Engineering   [Bob, Engineering]
2   Charlie           HR        [Charlie, HR]
3     David        Sales       [David, Sales]
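The same pattern extends beyond two columns. As a sketch, assuming the column names are held in a list (columns_to_combine here, with a hypothetical output column 'Combined'), the columns can be unpacked into zip() with the * operator:

```python
import pandas as pd

df = pd.DataFrame({
    'FirstName': ['Alice', 'Bob'],
    'LastName': ['Smith', 'Johnson'],
    'Department': ['HR', 'Engineering'],
})

columns_to_combine = ['FirstName', 'LastName', 'Department']

# Unpack one Series per column into zip(), then turn each row tuple into a list
df['Combined'] = [
    list(row) for row in zip(*(df[col] for col in columns_to_combine))
]

print(df['Combined'].tolist())
# [['Alice', 'Smith', 'HR'], ['Bob', 'Johnson', 'Engineering']]
```

Dropping the list(row) conversion gives the tuple version for any number of columns.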

Choosing the Right Method

For creating tuples:
- list(zip(df['col1'], df['col2'])): Highly Pythonic, readable, and generally efficient. Often the best choice.
- df[['col1', 'col2']].apply(tuple, axis=1): Clear and explicit, good if you prefer the apply paradigm. Typically slower than zip on large DataFrames, because it calls a Python function once per row.
- list(df[['col1', 'col2']].itertuples(index=False, name=None)): Works, but less direct than zip or apply for column creation.

For creating lists (of lists):
- df[['col1', 'col2']].values.tolist(): Very direct and efficient for converting selected columns into a list of lists. Usually the best choice.
- [list(item) for item in zip(df['col1'], df['col2'])]: Adapts zip well if you're already using it for tuples.
- [list(rec) for rec in df[['col1', 'col2']].to_records(index=False)]: More steps involved compared to values.tolist().
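Performance claims like the ones above are easiest to trust when measured on your own data. The following is a rough sketch using the standard library's timeit module; the synthetic DataFrame (df_big), its size, and the number of runs are all assumptions, and the timings will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Synthetic data; the size is an arbitrary assumption for illustration
n_rows = 2_000
df_big = pd.DataFrame({
    'col1': [f'a{i}' for i in range(n_rows)],
    'col2': [f'b{i}' for i in range(n_rows)],
})

# The three tuple-producing approaches covered in this guide
candidates = {
    'zip': lambda: list(zip(df_big['col1'], df_big['col2'])),
    'apply': lambda: df_big[['col1', 'col2']].apply(tuple, axis=1).tolist(),
    'itertuples': lambda: list(
        df_big[['col1', 'col2']].itertuples(index=False, name=None)
    ),
}

# Sanity check: every approach should produce the same list of tuples
results = {name: fn() for name, fn in candidates.items()}
assert results['zip'] == results['apply'] == results['itertuples']

# Time each approach over a handful of runs
for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=5)
    print(f'{name:<12}{seconds:.4f} s for 5 runs')
```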

Conclusion

Creating a new DataFrame column composed of tuples or lists from two existing columns is a common Pandas operation.

For generating tuples, using list(zip(df['col1'], df['col2'])) is often the most Pythonic and efficient method. df[['col1', 'col2']].apply(tuple, axis=1) is also a clear alternative.
For generating lists (a list of lists in the new column), df[['col1', 'col2']].values.tolist() is usually the most direct and performant approach.

Select the method that you find most readable and that best suits your specific needs and data. All methods shown achieve the goal of combining row-wise data from two columns into a new column of collections.
