Python Pandas: How to Create a Set from a Series (Get Unique Values)

A Python set is an unordered collection of unique elements. Converting a Pandas Series to a set is a common operation when you need to obtain all the distinct values present in that Series, effectively removing any duplicates. This is useful for membership testing, set operations (union, intersection), or simply getting a unique list of items.

This guide explains several methods to create a Python set from a Pandas Series, primarily using the set() constructor and the Series.unique() method.
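As a quick illustration of the membership tests and set operations mentioned above, here is a minimal sketch. The Series names and their contents are made up for this example.

```python
import pandas as pd

# Hypothetical example data: tags seen in two different batches
batch_a = pd.Series(['red', 'green', 'red', 'blue'])
batch_b = pd.Series(['green', 'yellow', 'green'])

tags_a = set(batch_a)
tags_b = set(batch_b)

# Membership testing against the distinct values
print('red' in tags_a)           # True
print('yellow' in tags_a)        # False

# Set operations; sorted() is used only to get a stable display order
print(sorted(tags_a | tags_b))   # ['blue', 'green', 'red', 'yellow']
print(sorted(tags_a & tags_b))   # ['green']
print(sorted(tags_a - tags_b))   # ['blue', 'red']
```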

The Goal: Extracting Unique Values into a Set

Given a Pandas Series, which may contain duplicate values, we want to create a Python set object that contains only the unique values from that Series. Sets inherently store only distinct items, so any duplicates in the Series will be automatically handled during the conversion.
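One caveat to "automatically handled", offered as a hedged aside: in a float Series, missing values (NaN) do not compare equal to one another, so set() may keep more than one of them, whereas Series.unique() reports NaN as a single value. The sketch below uses a made-up Series and shows one way around the question, which is to drop missing values with Series.dropna() before building the set.

```python
import numpy as np
import pandas as pd

# Made-up Series containing two missing values
s_missing = pd.Series([1.0, np.nan, 2.0, np.nan])

# NaN != NaN, so set() cannot be relied on to collapse repeated NaNs
print(len(set(s_missing)))       # may be larger than 3

# Series.unique() treats NaN as a single distinct value
print(s_missing.unique())

# Dropping missing values first gives a set without any NaN
clean_set = set(s_missing.dropna())
print(clean_set == {1.0, 2.0})   # True
```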

Example Pandas Series

import pandas as pd

# Series with duplicate numeric values
s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])
print("Original Numeric Series (s_numeric):")
print(s_numeric)
print()

# Series with duplicate string values
s_string = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])
print("Original String Series (s_string):")
print(s_string)

Output:

Original Numeric Series (s_numeric):
0    10
1    20
2    10
3    30
4    20
5    20
6    40
7    50
8    10
dtype: int64

Original String Series (s_string):
0     apple
1    banana
2     apple
3    orange
4    banana
5    banana
dtype: object

Method 1: Using the set() Constructor (Recommended)

The most straightforward and Pythonic way to create a set from a Pandas Series is to pass the Series directly to the built-in set() constructor. A Pandas Series is iterable, so set() can consume its values one by one.

import pandas as pd

s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])
s_string = pd.Series(['apple', 'banana', 'apple', 'orange', 'banana', 'banana'])

# ✅ Convert numeric Series directly to a set
set_from_numeric = set(s_numeric)
print(f"Set from numeric Series: {set_from_numeric}")
print(f"Type: {type(set_from_numeric)}")

# ✅ Convert string Series directly to a set
set_from_string = set(s_string)
print(f"Set from string Series: {set_from_string}")

Output:

Set from numeric Series: {40, 10, 50, 20, 30}
Type: <class 'set'>
Set from string Series: {'orange', 'banana', 'apple'}
Note:
  • Sets are unordered collections, so the order of elements in the resulting set might not match their first appearance in the Series.
  • This method is generally efficient and clear.
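When a stable, reproducible order is needed (for display or for tests), one common option is to sort the set into a list. This is a small sketch reusing the s_numeric Series from above; sorted_values is a name chosen for this example.

```python
import pandas as pd

s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])

# A set has no defined order; sort it when a stable order is needed
sorted_values = sorted(set(s_numeric))
print(sorted_values)  # [10, 20, 30, 40, 50]
```

Note that sorted() returns a list, so the result is no longer a set.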

Method 2: Using Series.unique() then set()

The Series.unique() method returns the unique values from the Series in order of first appearance, as a NumPy array (or a pandas ExtensionArray for extension dtypes such as categorical). You can then convert this array to a set.

import pandas as pd

s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])

# Step 1: Get unique values as a NumPy array
unique_values_array = s_numeric.unique()
print(f"Unique values from Series (NumPy array): {unique_values_array}")
print(f"Type of unique_values_array: {type(unique_values_array)}")

# Step 2: Convert the NumPy array of unique values to a set
set_from_unique_array = set(unique_values_array)
print(f"Set from unique NumPy array: {set_from_unique_array}")

Output:

Unique values from Series (NumPy array): [10 20 30 40 50]
Type of unique_values_array: <class 'numpy.ndarray'>
Set from unique NumPy array: {np.int64(40), np.int64(10), np.int64(50), np.int64(20), np.int64(30)}

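This two-step process produces the same set of values. The elements here are NumPy scalars (np.int64) rather than Python ints, which is why NumPy 2.x displays them as np.int64(40); they still compare and hash equal to the corresponding Python integers. Series.unique() itself is an efficient way to get unique elements.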
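If native Python values are preferred in the resulting set (for cleaner printing or for serialization, say), one option is to call .tolist() on the array before building the set. A minimal sketch, with native_set as a name chosen for this example:

```python
import pandas as pd

s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])

# ndarray.tolist() converts NumPy scalars to native Python objects
native_set = set(s_numeric.unique().tolist())

print(all(type(v) is int for v in native_set))   # True
print(native_set == set(s_numeric.unique()))     # True: equal values either way
```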

Converting a DataFrame Column (Series) to a Set

If your Series is a column within a DataFrame, you first select that column and then apply one of the methods above.

import pandas as pd

data = {
    'Product_ID': ['A1', 'B2', 'A1', 'C3', 'B2', 'A1'],
    'Category': ['Elec', 'Book', 'Elec', 'Home', 'Book', 'Elec'],
    'Supplier': ['S_X', 'S_Y', 'S_X', 'S_Z', 'S_Y', 'S_X']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()

# ✅ Convert the 'Category' column (a Series) to a set
category_set = set(df['Category'])
print(f"Set of unique categories: {category_set}\n")

# ✅ Convert the 'Supplier' column using .unique() then set()
supplier_unique_array = df['Supplier'].unique()
supplier_set = set(supplier_unique_array)
print(f"Set of unique suppliers: {supplier_set}")

Output:

Original DataFrame:
  Product_ID Category Supplier
0         A1     Elec      S_X
1         B2     Book      S_Y
2         A1     Elec      S_X
3         C3     Home      S_Z
4         B2     Book      S_Y
5         A1     Elec      S_X

Set of unique categories: {'Elec', 'Book', 'Home'}

Set of unique suppliers: {'S_X', 'S_Y', 'S_Z'}
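The same column selection extends naturally to several columns at once. The sketch below builds one set per column with a dictionary comprehension; unique_by_column is a name chosen for this illustration, not something pandas provides.

```python
import pandas as pd

df = pd.DataFrame({
    'Product_ID': ['A1', 'B2', 'A1', 'C3', 'B2', 'A1'],
    'Category': ['Elec', 'Book', 'Elec', 'Home', 'Book', 'Elec'],
    'Supplier': ['S_X', 'S_Y', 'S_X', 'S_Z', 'S_Y', 'S_X'],
})

# One set of distinct values per column, keyed by column name
unique_by_column = {col: set(df[col]) for col in df.columns}

# sorted() is used only to print the values in a stable order
for col, values in unique_by_column.items():
    print(col, sorted(values))
```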

Method 3: Using a for Loop (Less Pythonic for this Task)

While possible, using a for loop to manually build a set is less idiomatic and generally less efficient in Python/Pandas for this specific task compared to the direct set() constructor or Series.unique().

import pandas as pd

s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])

# Manually build a set using a for loop
set_from_loop = set()
for element in s_numeric:  # Iterating directly over the Series
    set_from_loop.add(element)

print(f"Set built using a for loop: {set_from_loop}")

Output:

Set built using a for loop: {40, 10, 50, 20, 30}

Because set.add() ignores values that are already present, looping over s_numeric.unique() instead of s_numeric is redundant if the goal is just to populate a set. The following variant produces an equal set:

import pandas as pd

s_numeric = pd.Series([10, 20, 10, 30, 20, 20, 40, 50, 10])

set_from_loop_unique_iter = set()
for element in s_numeric.unique():
    set_from_loop_unique_iter.add(element)

print(f"Set built using loop over .unique(): {set_from_loop_unique_iter}")

Output:

Set built using loop over .unique(): {np.int64(40), np.int64(10), np.int64(50), np.int64(20), np.int64(30)}
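Element-wise iteration can still be reasonable when each value has to be transformed while it is collected. In that case a set comprehension is usually more compact than an explicit loop. The sketch below uses made-up string data and also shows a vectorized equivalent based on the Series.str accessor.

```python
import pandas as pd

# Made-up data with inconsistent casing and stray whitespace
s_raw = pd.Series(['Apple', 'apple ', 'BANANA', 'banana'])

# Set comprehension: normalize each value while collecting it
normalized = {value.strip().lower() for value in s_raw}
print(sorted(normalized))  # ['apple', 'banana']

# Vectorized alternative: normalize with .str methods, then build the set
normalized_vectorized = set(s_raw.str.strip().str.lower())
print(normalized == normalized_vectorized)  # True
```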

Performance Note for Large Series

  • set(my_series): This iterates over the Series element by element and hands each value to Python's set() constructor. It is clear and fast enough for small and medium-sized Series.
  • set(my_series.unique()): This involves two steps: Pandas first deduplicates the values with its optimized Series.unique() routine, then Python builds a set from only the distinct values. On large Series with many repeated values this is often faster than set(my_series), because far fewer elements have to be processed in Python. Benchmark on your own data if performance is critical; for modest sizes the difference is usually negligible.

In many practical scenarios, the direct set(my_series) is preferred for its simplicity and good performance.
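To check how the two approaches compare on your own data, a rough benchmark with the standard timeit module could look like the sketch below. The synthetic Series (200,000 integers drawn from 100 distinct values) and the repeat count are arbitrary assumptions; actual timings depend on the data, dtype, and library versions, so no expected numbers are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data: 200,000 integers drawn from only 100 distinct values
rng = np.random.default_rng(seed=0)
big_series = pd.Series(rng.integers(0, 100, size=200_000))

# Both approaches should agree on the result before they are timed
assert set(big_series) == set(big_series.unique())

# Time each approach over 5 runs
direct = timeit.timeit(lambda: set(big_series), number=5)
via_unique = timeit.timeit(lambda: set(big_series.unique()), number=5)

print(f"set(series):          {direct:.4f} s for 5 runs")
print(f"set(series.unique()): {via_unique:.4f} s for 5 runs")
```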

Conclusion

Creating a Python set from a Pandas Series is a straightforward way to get all the unique values from that Series.

  • The most direct and generally recommended method is to pass the Series directly to the set() constructor:
    my_set = set(your_pandas_series)
  • An alternative is to first get a NumPy array of unique values using your_pandas_series.unique() and then convert that array to a set:
    my_set = set(your_pandas_series.unique())
  • Using a for loop to manually add elements to a set is also possible but is less concise and often less performant than the built-in methods.

Both primary methods effectively remove duplicates and provide you with an unordered collection of the distinct elements from your Pandas Series.