How to Count Unique Words or Unique Characters in a String in Python

Counting the number of unique items (either words or individual characters) within a piece of text is a common task in text processing and data analysis. Python's built-in set data structure, which automatically stores only unique elements, provides a highly efficient way to achieve this.

This guide demonstrates how to count unique words and characters in strings and text files using sets, dict.fromkeys(), and loops.

Count Unique Words in a String

Goal: Find the number of distinct words in a given string.

Using `split()` and `set()` (Recommended)

This is the most concise and Pythonic method.

Split the string into a list of words using str.split().
Convert the list of words into a set to automatically remove duplicates.
Get the length of the set using len().

text = "apple banana apple orange banana apple"  # Example string
print(f"Original string: '{text}'")

# 1. Split into words
words_list = text.split()
print(f"List of words: {words_list}")

# 2. Convert to set to get unique words
unique_word_set = set(words_list)
print(f"Set of unique words: {unique_word_set}")

# 3. Get the length of the set
count_unique_words = len(unique_word_set)
print(f"Number of unique words: {count_unique_words}")

# --- Condensed Version ---
text = "hello world hello python world"
unique_word_count = len(set(text.split()))
print(f"String: '{text}'")
print(f"Unique word count: {unique_word_count}") 

Output:

Original string: 'apple banana apple orange banana apple'
List of words: ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']
Set of unique words: {'orange', 'apple', 'banana'}
Number of unique words: 3

String: 'hello world hello python world'
Unique word count: 3

text.split(): Splits the string by whitespace into a list of words.
set(...): Creates a set from the list, discarding duplicates.
len(...): Returns the number of elements in the set (which is the count of unique words).
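
One detail of split() affects the count: called with no argument, it treats any run of whitespace (spaces, tabs, newlines) as a single separator, while split(" ") splits on every single space and nothing else. The sketch below compares the two on a made-up sample string; the variable names are illustrative only.

```python
# Hypothetical sample containing a double space, a tab, and a newline
text = "apple  banana\tapple\norange banana"

# No argument: any run of whitespace counts as one separator
words_default = text.split()

# Explicit single space: the double space yields an empty string,
# and the tab and newline are not treated as separators at all
words_space = text.split(" ")

unique_default = len(set(words_default))
unique_space = len(set(words_space))

print(f"split():    {words_default} -> {unique_default} unique")
print(f"split(' '): {words_space} -> {unique_space} unique")
```

For counting words, the argument-free form is usually the safer choice, because it never produces empty strings or words glued together by tabs and newlines.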

Using a `for` Loop

Split the string, then iterate over the words manually and keep track of the ones seen so far.

text = "apple banana apple orange banana apple"
words_list = text.split()
unique_words_list = [] # Keep track of unique words found

print(f"Original string: '{text}'")
for word in words_list:
    if word not in unique_words_list: # Check if word is already seen
        unique_words_list.append(word)

count_unique_loop = len(unique_words_list)
print(f"Unique words (loop): {unique_words_list}")
print(f"Unique word count (loop): {count_unique_loop}")

Output:

Original string: 'apple banana apple orange banana apple'
Unique words (loop): ['apple', 'banana', 'orange']
Unique word count (loop): 3

note

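This is less efficient than using a set because the `word not in unique_words_list` check scans the list item by item, so each lookup gets slower as the list grows. A membership check on a set takes roughly constant time on average.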
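
If a loop is still preferred (for example, to keep words in order of first appearance), the slow list lookup can be avoided by pairing the list with a set that is used only for membership checks. This is a minimal sketch of that variation, not a replacement for the set-based one-liner; the names seen and unique_in_order are illustrative.

```python
text = "apple banana apple orange banana apple"

seen = set()          # used only for fast membership checks
unique_in_order = []  # keeps words in order of first appearance

for word in text.split():
    if word not in seen:
        seen.add(word)
        unique_in_order.append(word)

print(f"Unique words (loop + set): {unique_in_order}")
print(f"Unique word count: {len(unique_in_order)}")
```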

Count Unique Words in a Text File

Goal: Find the number of distinct words in an entire text file.

Example sample.txt file:

this is line one
this is line two
line three has more words

Using `read()`, `split()`, and `set()` (Recommended)

Read the whole file, split into words, and use a set.

import os

# Create dummy file for example
filename = "sample.txt"
with open(filename, "w", encoding="utf-8") as f:
    f.write("this is line one\nthis is line two\nline three has more words")

unique_word_count_file = 0
unique_words_in_file = set()

try:
    with open(filename, 'r', encoding='utf-8') as f:
        # 1. Read entire file content
        file_content = f.read()

        # 2. Split content into list of words
        all_words = file_content.split()
        print(f"Words read from '{filename}':\n{all_words}")

        # 3. Convert to set for unique words
        unique_words_in_file = set(all_words)

        # 4. Get the count
        unique_word_count_file = len(unique_words_in_file)

    print(f"\nUnique words in file: {unique_words_in_file}")
    print(f"Count of unique words in file: {unique_word_count_file}") 

except FileNotFoundError:
    print(f"Error: File '{filename}' not found.")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the dummy file
    if os.path.exists(filename):
        os.remove(filename)

Output:

Words read from 'sample.txt':
['this', 'is', 'line', 'one', 'this', 'is', 'line', 'two', 'line', 'three', 'has', 'more', 'words']

Unique words in file: {'more', 'one', 'three', 'two', 'has', 'this', 'is', 'words', 'line'}
Count of unique words in file: 9

f.read(): Reads the entire file content into a single string.
.split(): Splits that string into a list of words based on whitespace.
set() and len() work as before.

note

For very large files, reading the entire content at once might consume too much memory. You might need to process the file line by line or in chunks.
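
A line-by-line version might look like the sketch below. It assumes a UTF-8 text file; the helper name count_unique_words_by_line and the temporary file are made up for the demonstration. Only one line is held in memory at a time, although the set itself still grows with the number of distinct words.

```python
import os
import tempfile


def count_unique_words_by_line(path):
    """Hypothetical helper: count unique words without reading the whole file at once."""
    unique_words = set()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # the file object yields one line at a time
            unique_words.update(line.split())
    return len(unique_words)


# Demo on a temporary copy of the sample file shown earlier
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("this is line one\nthis is line two\nline three has more words")
    line_by_line_count = count_unique_words_by_line(sample_path)

print(f"Count of unique words (line by line): {line_by_line_count}")
```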

Using a `for` Loop

Read the file, split, and manually track unique words.

import os

filename = "sample.txt"
with open(filename, "w", encoding="utf-8") as f:
    f.write("this is line one\nthis is line two")

unique_words_file_loop = []
try:
    with open(filename, 'r', encoding='utf-8') as f:
        file_content = f.read()
        all_words = file_content.split()
        print(f"Words from file (loop): {all_words}")

        for word in all_words:
            if word not in unique_words_file_loop:
                unique_words_file_loop.append(word)

    print(f"Unique words (loop): {unique_words_file_loop}")
    print(f"Unique word count (loop): {len(unique_words_file_loop)}")
except Exception as e:
    print(f"An error occurred: {e}")
finally:
    # Clean up the dummy file
    if os.path.exists(filename):
        os.remove(filename)

Output:

Words from file (loop): ['this', 'is', 'line', 'one', 'this', 'is', 'line', 'two']
Unique words (loop): ['this', 'is', 'line', 'one', 'two']
Unique word count (loop): 5

Count Unique Characters in a String

Goal: Find the number of distinct characters (letters, numbers, symbols, whitespace) in a string.

Using `set()` (Recommended)

This is the most direct way: passing a string directly to set() treats each character as a separate element.

text = "programming"
print(f"Original string: '{text}'")

# Convert string directly to a set of characters
unique_char_set = set(text)
print(f"Set of unique characters: {unique_char_set}")

# Get the length of the set
unique_char_count = len(unique_char_set)
print(f"Number of unique characters: {unique_char_count}")

Output:

Original string: 'programming'
Set of unique characters: {'i', 'g', 'o', 'n', 'm', 'a', 'r', 'p'}
Number of unique characters: 8
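
Because set() counts every character, spaces and uppercase/lowercase variants are all counted separately. If only letters matter, the characters can be filtered and normalized before the set is built. The sketch below uses a made-up sample string and assumes that "letters" means whatever str.isalpha() accepts.

```python
text = "Hello World"  # hypothetical sample with a space and mixed case

# Every distinct character, including the space and both cases
all_unique = len(set(text))

# Letters only, ignoring case: drop non-letters, lowercase the rest
letters_unique = len({ch.lower() for ch in text if ch.isalpha()})

print(f"All unique characters: {all_unique}")
print(f"Unique letters (case-insensitive): {letters_unique}")
```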

Using `dict.fromkeys()`

Create dictionary keys from the characters (duplicates are automatically removed), then count the keys.

text = "programming"
print(f"Original string: '{text}'")

# Create a dictionary where keys are unique characters
unique_char_dict = dict.fromkeys(text)
print(f"Dict from keys: {unique_char_dict}")

# Count the number of keys
unique_char_count_dict = len(unique_char_dict)
print(f"Unique char count (dict): {unique_char_count_dict}")

Output:

Original string: 'programming'
Dict from keys: {'p': None, 'r': None, 'o': None, 'g': None, 'a': None, 'm': None, 'i': None, 'n': None}
Unique char count (dict): 8

While this works, using set() is generally considered more direct for finding unique elements.

Using a `for` Loop

Manually iterate through characters and track unique ones seen.

text = "programming"
unique_chars_list = []

print(f"Original string: '{text}'")
for char in text:
    if char not in unique_chars_list:
        unique_chars_list.append(char)

count_unique_chars_loop = len(unique_chars_list)
print(f"Unique chars (loop): {unique_chars_list}")
print(f"Unique char count (loop): {count_unique_chars_loop}")

Output:

Original string: 'programming'
Unique chars (loop): ['p', 'r', 'o', 'g', 'a', 'm', 'i', 'n']
Unique char count (loop): 8

Again, this is less efficient than using a set.

Getting the Unique Items (Not Just the Count)

All the methods above that create a set or a list of unique items (unique_word_set, unique_words_list, unique_char_set, unique_chars_list) already give you the unique items themselves.

Using set(iterable) is fastest for uniqueness but loses original order.
Using dict.fromkeys(iterable).keys() (Python 3.7+) preserves insertion order.
Using the for loop method preserves the order of first appearance.

To get a list of unique items that preserves the order of first appearance, use the for loop method or dict.fromkeys(), which works because dictionaries keep insertion order in Python 3.7+:

text = "apple banana apple orange banana apple"
words = text.split()

# Preserving order using dict keys (Python 3.7+)
unique_ordered_words = list(dict.fromkeys(words))
print(f"Unique words preserving order: {unique_ordered_words}")

Output:

Unique words preserving order: ['apple', 'banana', 'orange']

Case Sensitivity and Punctuation Considerations

Case Sensitivity: The methods shown are case-sensitive ('Apple' and 'apple' are different). To count unique words ignoring case, convert the string or words to lowercase first: len(set(text.lower().split())).
Punctuation: split() separates by whitespace. Punctuation attached to words (e.g., "word,", "word.") will be treated as part of the word. To handle punctuation separately, you might need regular expressions (re.findall(r'\b\w+\b', text)) or more advanced string cleaning before splitting and counting.
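
The two points above can be combined in one short sketch. The sample sentence is made up, and the regular expression is the one mentioned above; note that \w+ also splits contractions such as "don't" into "don" and "t", so treat this as a starting point rather than a complete tokenizer.

```python
import re

text = "The cat saw the dog. The dog saw the cat!"  # hypothetical sample

# Naive count: 'The' vs 'the' and 'dog.' vs 'dog' are all counted separately
naive_count = len(set(text.split()))

# Normalized count: lowercase first, then keep only runs of word characters
words = re.findall(r"\b\w+\b", text.lower())
normalized_count = len(set(words))

print(f"Naive unique word count: {naive_count}")
print(f"Normalized unique word count: {normalized_count}")
```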

Conclusion

Counting unique words or characters in Python is efficiently done using the set data type.

For unique words in a string: Use len(set(my_string.split())).
For unique words in a file: Use len(set(file_content.split())) after reading the file (f.read()).
For unique characters in a string: Use len(set(my_string)).

Remember that set removes duplicates automatically. Use .lower() before creating the set for case-insensitive counting. If order preservation is needed, consider using dict.fromkeys() (Python 3.7+) or a manual loop approach.
