How to Remove Duplicate Dictionaries from a List in Python
This guide explains how to remove duplicate dictionaries from a list in Python, based on the uniqueness of a specific key (such as an `id`). We'll explore dictionary comprehensions (most efficient), `for` loops, `enumerate`, and, briefly, Pandas (for more complex scenarios).
Removing Duplicates Based on a Key (Recommended)
Most often, you'll want to remove duplicates based on a specific key within the dictionaries (e.g., an ID or a username), on the assumption that this key's value should be unique.
Using Dictionary Comprehension
This is the most concise and efficient way to achieve this. It leverages the fact that dictionary keys must be unique:
```python
list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},  # Duplicate ID
]

# Use a dictionary comprehension to remove duplicates based on 'id'
unique_dicts = list({d['id']: d for d in list_of_dictionaries}.values())
print(unique_dicts)
```
Output:

```
[{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```
- `{d['id']: d for d in list_of_dictionaries}`: a dictionary comprehension that builds a new dictionary.
- `d['id']`: the key of the new dictionary is the `'id'` value from each dictionary in the original list.
- `d`: the value of the new dictionary is the entire dictionary `d` from the original list.
- Key uniqueness: because dictionary keys must be unique, if two dictionaries have the same `'id'`, the later one in the list overwrites the earlier one in the new dictionary. This is how we achieve deduplication.
- `.values()`: extracts the values of the new dictionary (which are the unique dictionaries).
- `list(...)`: converts the result back into a list.
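Note that because later entries overwrite earlier ones, this comprehension keeps the *last* dictionary for each duplicate `'id'`. If you'd rather keep the first occurrence, one sketch uses `dict.setdefault`, which only stores a value the first time a key appears (the third dictionary here is modified from the earlier examples so the difference is visible):

```python
list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'example.com'},  # same id, later in the list
]

# setdefault() only inserts a value if the key is not already present,
# so the first occurrence of each 'id' wins.
unique = {}
for d in list_of_dictionaries:
    unique.setdefault(d['id'], d)
unique_dicts = list(unique.values())
print(unique_dicts)
# [{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```

Since Python 3.7, dictionaries preserve insertion order, so the result also keeps the original first-occurrence order.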
Using a `for` Loop
Here's how to do it with a `for` loop, which is more verbose but perhaps easier for beginners to understand:
```python
list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},
]

new_list = []
seen_ids = set()  # Using a set for efficiency
for dictionary in list_of_dictionaries:
    if dictionary['id'] not in seen_ids:
        new_list.append(dictionary)  # Add the dictionary to the result
        seen_ids.add(dictionary['id'])  # Remember its id
print(new_list)
```
Output:

```
[{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```
- `seen_ids = set()`: a set keeps track of the `id` values we've already seen. Checking for membership in a set (`in seen_ids`) is very fast (O(1) on average).
- We only append the dictionaries that have a new `id`.
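The loop generalizes naturally into a small reusable helper. This is a sketch; the `unique_by_key` name is our own, not from any library:

```python
def unique_by_key(dicts, key):
    """Return the dictionaries whose value for `key` has not been seen before."""
    seen = set()
    result = []
    for d in dicts:
        if d[key] not in seen:
            seen.add(d[key])
            result.append(d)
    return result

list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},
]
print(unique_by_key(list_of_dictionaries, 'id'))
# [{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```

The same helper then works for any key, e.g. `unique_by_key(users, 'username')`.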
Using `enumerate` (Less Efficient, More Complex)
Using `enumerate` to check for duplicates is generally not recommended because it's less efficient and more complex than the other methods:
```python
list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},
]

new_list = [
    dictionary for index, dictionary in enumerate(list_of_dictionaries)
    if dictionary not in list_of_dictionaries[index + 1:]
]
print(new_list)
```
Output:

```
[{'id': 2, 'site': 'google.com'}, {'id': 1, 'site': 'tutorialreference.com'}]
```
- This approach compares each dictionary (all of its key-value pairs, not just `'id'`) to every subsequent dictionary in the list, making it less efficient (O(n^2) in the worst case) than the set-based or dictionary comprehension methods (which are closer to O(n)). It also keeps the *last* occurrence of each duplicate, which is why the output order differs from the earlier methods.
Removing Duplicates Based on the Entire Dictionary (Less Common)
If you want to remove dictionaries that are completely identical (all key-value pairs match), you can't put them in a set directly, because dictionaries are not hashable. However, you can use the `in` operator with a list:
```python
list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},
]

new_list = []
for dictionary in list_of_dictionaries:
    if dictionary not in new_list:
        new_list.append(dictionary)
print(new_list)
```
- We create a new list and iterate over the original, appending only the dictionaries that haven't been added before. Note that `dictionary not in new_list` is a linear scan, so this approach is O(n^2) overall.
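If all the dictionary values are themselves hashable (strings, numbers, tuples, etc.), a sketch that avoids the quadratic cost converts each dictionary to a `frozenset` of its items and tracks those fingerprints in a set:

```python
list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},
]

new_list = []
seen = set()
for dictionary in list_of_dictionaries:
    # A frozenset of the items is a hashable snapshot of all key-value pairs
    fingerprint = frozenset(dictionary.items())
    if fingerprint not in seen:
        seen.add(fingerprint)
        new_list.append(dictionary)
print(new_list)
# [{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```

This keeps the first occurrence, runs in roughly O(n), but raises `TypeError` if any value is unhashable (e.g., a nested list).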
Using Pandas `drop_duplicates()` (for DataFrames)
If you're working with tabular data, Pandas DataFrames offer a very convenient `drop_duplicates()` method:
```python
import pandas as pd

list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'tutorialreference.com'},
]

new_list = pd.DataFrame(list_of_dictionaries).drop_duplicates().to_dict('records')
print(new_list)
```
Output:

```
[{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```
- `pd.DataFrame(list_of_dictionaries)`: creates a DataFrame from your list of dictionaries.
- `.drop_duplicates()`: removes duplicate rows. By default, it considers all columns for duplication; you can restrict it to specific columns with the `subset` parameter (e.g., `drop_duplicates(subset=['id'])`).
- `.to_dict('records')`: converts the DataFrame back into a list of dictionaries. The `'records'` argument is important; it gives you the desired format.
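For example, to deduplicate on the `id` column only, pass `subset=['id']`; by default `drop_duplicates` keeps the first row for each duplicate (`keep='first'`). The third dictionary here is modified from the earlier examples so the effect is visible:

```python
import pandas as pd

list_of_dictionaries = [
    {'id': 1, 'site': 'tutorialreference.com'},
    {'id': 2, 'site': 'google.com'},
    {'id': 1, 'site': 'example.com'},  # same id, different site
]

# Deduplicate on 'id' only; the first row for each id is kept
new_list = (
    pd.DataFrame(list_of_dictionaries)
    .drop_duplicates(subset=['id'])
    .to_dict('records')
)
print(new_list)
# [{'id': 1, 'site': 'tutorialreference.com'}, {'id': 2, 'site': 'google.com'}]
```

Pass `keep='last'` instead to keep the final row for each `id`.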