
How to Remove Non-Alphanumeric and Non-Alphabetic Characters from Strings in Python

This guide explains how to remove unwanted characters from strings in Python, specifically:

  • Non-alphanumeric characters (everything except letters and numbers)
  • Non-alphabetic characters (everything except letters)

We'll cover removing these characters while optionally preserving whitespace. We'll primarily use regular expressions (the re module), which are usually the fastest and most flexible option. We'll also demonstrate generator expressions and filter() for comparison.

Removing Non-Alphanumeric Characters with re.sub()

The re.sub() function (from the re module for regular expressions) is the most powerful and flexible way to remove unwanted characters. It allows you to define exactly what you want to keep or remove.

import re

my_str = 'tutorial !reference@ com 123'

# Remove all non-alphanumeric characters
new_str = re.sub(r'[\W_]', '', my_str) # \W matches non-word chars, _ is also removed
print(new_str) # Output: tutorialreferencecom123
  • import re: Imports the regular expression module.
  • r'[\W_]': This is the regular expression pattern.
    • []: Defines a character set.
    • \W: Matches any character that is not a "word character" (alphanumeric + underscore).
    • _: Matches the underscore.
    • [\W_]: Together, the set matches every character that is not a letter or digit. This includes the underscore, which \W alone would keep.
  • re.sub(pattern, replacement, string): This function substitutes all occurrences of the pattern in the string with the replacement.
  • '' (empty string): We're replacing all non-alphanumeric characters with nothing, effectively deleting them.
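To see why the underscore is listed explicitly, compare \W on its own with [\W_]. The sample string below is illustrative and is not part of the original example:

```python
import re

sample = 'snake_case-name!'  # illustrative input containing an underscore

# \W alone keeps the underscore, because "_" counts as a word character
kept_underscore = re.sub(r'\W', '', sample)
print(kept_underscore)  # Output: snake_casename

# Adding _ to the set removes it as well, leaving only letters and digits
no_underscore = re.sub(r'[\W_]', '', sample)
print(no_underscore)  # Output: snakecasename
```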

Preserving Whitespace

To keep whitespace while removing other non-alphanumeric characters:

import re

my_str = 'tutorial !reference@ com 123'
new_str = re.sub(r'[^\w\s]', '', my_str) # Keep word chars (\w) and whitespace (\s)
print(new_str) # Output: tutorial reference com 123
  • r'[^\w\s]':

    • ^ inside the []: Negates the character set (matches anything not in the set).
    • \w: Matches any "word character" (letters, numbers, and underscore).
    • \s: Matches any whitespace character (space, tab, newline, etc.).
    • [^\w\s]: Therefore this matches any character that is neither a word character nor whitespace. Because \w includes the underscore, underscores are kept.
  • If the string contains runs of whitespace and you want to collapse each run to a single space:

    import re
    my_str = 'tutorial  !reference@   com 123'
    new_str = re.sub(r'[^\w\s]', '', my_str)
    result = " ".join(new_str.split()) # Collapse runs of whitespace
    print(result) # Output: tutorial reference com 123
    • First, re.sub() removes every character that is neither a word character nor whitespace.
    • Then split() with no arguments splits the string on runs of whitespace and discards leading and trailing whitespace.
    • Finally, ' '.join(...) joins the resulting words with a single space.
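The [^\w\s] pattern keeps underscores and every kind of whitespace, not just spaces. If you also want to remove underscores, one option is to add an alternation for _. Here is a sketch; the input string is illustrative:

```python
import re

sample = 'hello_world! tab\there'  # illustrative input with an underscore and a tab

# [^\w\s] leaves both the underscore and the tab in place
kept = re.sub(r'[^\w\s]', '', sample)
print(repr(kept))  # Output: 'hello_world tab\there'

# "|_" also removes underscores while still keeping whitespace
stripped = re.sub(r'[^\w\s]|_', '', sample)
print(repr(stripped))  # Output: 'helloworld tab\there'
```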

Removing Non-Alphabetic Characters with re.sub()

To remove non-alphabetic characters (keeping only letters):

import re

my_str = 'tutorial! reference@ com'

# Remove all non-alphabetic characters:
new_str = re.sub(r'[^a-zA-Z]', '', my_str)
print(new_str) # Output: tutorialreferencecom

# Remove all non-alphabetic characters, preserving whitespace:
new_str = re.sub(r'[^a-zA-Z\s]', '', my_str)
print(new_str) # Output: tutorial reference com
  • r'[^a-zA-Z]': Matches any character that is not an ASCII letter (a-z or A-Z), so accented letters such as é are removed as well.
  • r'[^a-zA-Z\s]': Matches any character that is neither an ASCII letter nor whitespace.
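Because [a-zA-Z] only covers ASCII, it also deletes accented letters. If you need to keep letters from any script, one approach is to delete non-word characters, decimal digits, and the underscore with [\W\d_]. This relies on \w and \d being Unicode-aware for str patterns in Python 3.

Treat it as a sketch. Numeric characters that are not decimal digits (for example ²) still count as word characters and would survive. The sample string is illustrative:

```python
import re

sample = 'Café! naïve 42'  # illustrative input with accented letters and digits

# ASCII-only class: é and ï are not in a-z/A-Z, so they are removed
ascii_only = re.sub(r'[^a-zA-Z]', '', sample)
print(ascii_only)  # Output: Cafnave

# Unicode-aware: drop non-word characters, decimal digits, and underscores
unicode_letters = re.sub(r'[\W\d_]', '', sample)
print(unicode_letters)  # Output: Cafénaïve
```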

Removing Non-Alphanumeric or Non-Alphabetic Characters with Generator Expressions (Less Efficient)

You can also use generator expressions with str.join(). This is usually slower than re.sub(), especially for long strings, because the test runs as Python code for every character instead of inside the regex engine:

my_str = 'tutorial !reference@ com 123'

# Remove non-alphanumeric (less efficient)
new_str = ''.join(char for char in my_str if char.isalnum())
print(new_str) # Output: tutorialreferencecom123

# Remove non-alphanumeric, preserve spaces (less efficient)
new_str = ''.join(char for char in my_str if char.isalnum() or char == ' ')
print(new_str) # Output: tutorial reference com 123
  • Each generator expression yields only the characters that pass the test, and ''.join() concatenates them into a new string.
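Two details differ from the regex versions. First, str.isalnum() is Unicode-aware, so accented letters are kept. Second, char == ' ' keeps only the space character, whereas \s also keeps tabs and newlines. To keep all whitespace as \s does, you can test char.isspace() instead. The sample string is illustrative:

```python
sample = 'tab\tseparated! café 42'  # illustrative input with a tab and an accented letter

# Only the literal space survives; the tab is dropped
spaces_only = ''.join(char for char in sample if char.isalnum() or char == ' ')
print(repr(spaces_only))  # Output: 'tabseparated café 42'

# isspace() keeps tabs and newlines as well as spaces
any_whitespace = ''.join(char for char in sample if char.isalnum() or char.isspace())
print(repr(any_whitespace))  # Output: 'tab\tseparated café 42'
```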

The same can be done for removing non-alphabetic characters:

my_str = 'tutorial! reference@ com'

# Remove non-alphabetic characters
new_str = ''.join(
    char for char in my_str
    if char.isalpha()
)
print(new_str) # Output: tutorialreferencecom

# Remove non-alphabetic characters, preserving whitespace
new_str = ''.join(
    char for char in my_str
    if char.isalpha() or char == ' '
)
print(new_str) # Output: tutorial reference com
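How much faster the regex is depends on your Python version and input, so it is worth measuring on your own data. The following is a rough timeit sketch comparing a precompiled regex with a generator expression. The repeated input string and iteration count are arbitrary choices, and no particular timings are implied:

```python
import re
import timeit

# Arbitrary benchmark input: the sample string repeated to make it longer
text = 'tutorial !reference@ com 123' * 1_000

# Compile once so each call skips the pattern cache lookup
pattern = re.compile(r'[\W_]')

def with_regex():
    return pattern.sub('', text)

def with_generator():
    return ''.join(char for char in text if char.isalnum())

# Both approaches must produce the same string before timing means anything
assert with_regex() == with_generator()

for func in (with_regex, with_generator):
    seconds = timeit.timeit(func, number=100)
    print(f'{func.__name__}: {seconds:.4f} s for 100 runs')
```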

Removing Non-Alphanumeric or Non-Alphabetic Characters with filter() (Less Efficient)

The built-in filter() function can also be used to remove non-alphanumeric or non-alphabetic characters.

my_str = 'tutorial !reference@ com 123'

new_str = ''.join(filter(str.isalnum, my_str))
print(new_str) # Output: tutorialreferencecom123
  • filter() calls str.isalnum() on each character and keeps only those for which it returns True. ''.join() then assembles them into the new string.
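The same approach works for alphabetic characters, and a lambda lets filter() combine tests, for example to keep whitespace. In Python 3, filter() returns a lazy iterator, which is why the result is passed to ''.join():

```python
my_str = 'tutorial! reference@ com'

# Keep letters only
letters = ''.join(filter(str.isalpha, my_str))
print(letters)  # Output: tutorialreferencecom

# Keep letters and any whitespace character
letters_and_spaces = ''.join(filter(lambda char: char.isalpha() or char.isspace(), my_str))
print(letters_and_spaces)  # Output: tutorial reference com
```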