
How to Combine Lists and Remove Duplicates in Python

This guide explains how to combine two or more lists in Python while removing duplicate elements. We'll cover the most efficient and Pythonic approach, which uses sets, and also discuss alternatives: an order-preserving approach with dict.fromkeys(), NumPy (for numerical data), and a for loop with list.extend().

Combining Lists and Removing Duplicates with Sets

Sets, by definition, only store unique elements. This makes them the ideal tool for combining lists and removing duplicates:

list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]

combined_list = list(set(list1 + list2))
print(combined_list) # Output: [1, 4, 6, 7, 9, 14] (order may vary)
  • list1 + list2: This concatenates the two lists, creating a new list with all elements (including duplicates).
  • set(...): This converts the combined list to a set, automatically removing duplicates. Sets are unordered, so the order of elements is not guaranteed.
  • list(...): This converts the set back into a list. If you need the result to be a list, this conversion is necessary. If you don't need a list specifically, you can work directly with the set.

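For most use cases, this is the simplest and most efficient way to combine lists and eliminate duplicates: building the set takes linear time on average. It does require every element to be hashable (numbers, strings, and tuples are; lists and dicts are not).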
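Because the result is built as a set anyway, you can skip the intermediate concatenated list and combine any number of lists directly with set.union() or, for two lists, the | operator. The sketch below uses a third list, list3, purely for illustration; sorted() is only there to make the printed order predictable, since sets are unordered.

```python
list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]
list3 = [9, 2, 1]  # hypothetical third list, for illustration only

# set().union() accepts any number of iterables and never builds a concatenated list
combined = list(set().union(list1, list2, list3))

# For exactly two lists, the | operator on two sets does the same job
combined_two = list(set(list1) | set(list2))

# Sets are unordered, so sort only to get a predictable printout
print(sorted(combined))      # [1, 2, 4, 6, 7, 9, 14]
print(sorted(combined_two))  # [1, 4, 6, 7, 9, 14]
```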

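Combining Lists and Removing Duplicates While Preserving Order

If you need to preserve the original order of the elements (keeping the first occurrence of each value), use OrderedDict.fromkeys() or, on Python 3.7 and later, plain dict.fromkeys(), since regular dictionaries also preserve insertion order. Dictionary keys are unique, so duplicates are dropped while the order is kept. Like the set approach, this runs in linear time on average: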

from collections import OrderedDict

list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]
combined_list = list(OrderedDict.fromkeys(list1 + list2))

print(combined_list) # Output: [1, 4, 6, 9, 14, 7]
  • OrderedDict.fromkeys(...) creates an ordered dictionary whose keys are the combined elements. Repeated keys are ignored, so each value keeps the position of its first occurrence.
  • list(...) iterates over the dictionary's keys and collects them into a new list.
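The same idea extends to any number of lists. A minimal sketch, assuming Python 3.7+ so that a plain dict keeps insertion order; merge_unique is a hypothetical helper name, and itertools.chain.from_iterable() walks the lists one after another without concatenating them first:

```python
from itertools import chain


def merge_unique(*lists):
    """Hypothetical helper: merge lists, keeping the first occurrence of each value."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(chain.from_iterable(lists)))


list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]
list3 = [9, 2, 1]  # hypothetical third list, for illustration only

print(merge_unique(list1, list2, list3))  # [1, 4, 6, 9, 14, 7, 2]
```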

Combining Lists and Removing Duplicates with NumPy (for Numerical Data)

If you're working with numerical data and have NumPy installed, np.unique() provides an efficient way to combine and deduplicate:

import numpy as np

list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]

result = np.unique(list1 + list2).tolist() # Combine and get unique values
print(result) # Output: [1, 4, 6, 7, 9, 14]
  • list1 + list2: Concatenates the lists (just like before).
  • np.unique(...): Finds the unique elements and sorts them. The result is a NumPy array.
  • .tolist(): Converts the NumPy array back to a standard Python list.
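NumPy also provides np.union1d(), which returns the sorted unique values found in either of two inputs, so it can replace the concatenate-then-unique step. For more than two lists, one common pattern is to fold it with functools.reduce(); list3 below is a hypothetical extra list used only for illustration.

```python
from functools import reduce

import numpy as np

list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]
list3 = [9, 2, 1]  # hypothetical third list, for illustration only

# union1d returns the sorted, unique values present in either input
two = np.union1d(list1, list2).tolist()
print(two)  # [1, 4, 6, 7, 9, 14]

# reduce applies union1d pairwise: union1d(union1d(list1, list2), list3)
many = reduce(np.union1d, (list1, list2, list3)).tolist()
print(many)  # [1, 2, 4, 6, 7, 9, 14]
```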
Note: np.unique() sorts the result. If you need to maintain the original order, use the fromkeys()-based approach from the earlier section instead; the set-based approach does not preserve order either.
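If you want to stay in NumPy but keep first-occurrence order, one option is np.unique() with return_index=True, which also returns the index of each unique value's first occurrence in the input. Sorting those indices and using them to select from the combined array restores the original order. A sketch:

```python
import numpy as np

list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]

combined = np.concatenate([list1, list2])

# return_index=True also returns the index of each unique value's first occurrence
_, first_idx = np.unique(combined, return_index=True)

# Sorting the indices puts the unique values back in their original order
result = combined[np.sort(first_idx)].tolist()
print(result)  # [1, 4, 6, 9, 14, 7]
```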

Combining Lists and Removing Duplicates with list.extend() (Less Efficient)

You can also use list.extend() in a loop, but this is significantly less efficient and less readable than the set-based approach, because every membership check scans the whole result list:

list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]

result = list1.copy()  # Start with a *copy* of list1

for item in list2:
    if item not in result:
        result.extend([item])  # extend() needs an iterable, so wrap item in a list

print(result)  # Output: [1, 4, 6, 9, 14, 7]
  • result = list1.copy(): We create a copy of list1. This is important because we don't want to modify list1 itself.
  • The loop checks membership (item not in result) for every element, and each check scans the list, so the total work grows roughly quadratically with the input size. This becomes very slow for large lists; avoid this approach unless you have a specific reason.
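  • extend() expects an iterable and adds each of its elements, which is why item is wrapped in a single-element list. Calling result.extend(item) with an integer raises a TypeError (and with a string it would add each character separately). result.append(item) is the simpler equivalent.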
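If you prefer an explicit loop, a common fix for the slow membership test is to track already-seen items in a set, so each check takes constant time on average while the result list still keeps first-occurrence order. A minimal sketch; merge_unique_loop is a hypothetical helper name and, like the set approach, it assumes hashable elements:

```python
def merge_unique_loop(*lists):
    """Hypothetical helper: order-preserving merge using a 'seen' set."""
    result = []
    seen = set()  # average O(1) membership checks instead of scanning result
    for lst in lists:
        for item in lst:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


list1 = [1, 4, 6, 9]
list2 = [4, 6, 14, 7]

print(merge_unique_loop(list1, list2))  # [1, 4, 6, 9, 14, 7]
```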