
How to Remove Characters Matching a Regex from Strings in Python

This guide shows how to remove specific characters or patterns from a string in Python using regular expressions. It focuses on the re.sub() function and, for comparison, demonstrates an alternative that uses a generator expression with str.join().

The re.sub() function (from the re module) is the standard and most flexible way to remove characters that match a regular expression pattern:

import re

my_str = '!tutorial @reference #com $abc'

# Remove !, @, #, and $
result = re.sub(r'[!@#$]', '', my_str)
print(result) # Output: 'tutorial reference com abc'
  • re.sub(pattern, replacement, string) searches the string for the pattern and returns a new string in which every match is replaced with the replacement. The original string is not modified.
  • Here, r'[!@#$]' is the pattern:
    • [...]: Defines a character set. Matches any single character listed inside.
    • !@#$: The specific characters to match.
  • '' (empty string) is the replacement, effectively removing the matched characters.
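
As a small sketch of two related options: re.sub() also accepts an optional count argument that limits how many matches are replaced, and a pattern compiled with re.compile() exposes the same sub() method for reuse. The variable names first_two, symbols, and cleaned are only illustrative.

```python
import re

my_str = '!tutorial @reference #com $abc'

# count=2 stops after the first two matches; later matches are left in place.
first_two = re.sub(r'[!@#$]', '', my_str, count=2)
print(first_two)  # Output: 'tutorial reference #com $abc'

# A compiled pattern can be reused across many strings.
symbols = re.compile(r'[!@#$]')
cleaned = symbols.sub('', my_str)
print(cleaned)  # Output: 'tutorial reference com abc'
```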

Removing Specific Characters

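List the characters you want to remove inside square brackets ([]). A hyphen between two characters defines a range, so [0-9] matches any digit and [a-z] matches any lowercase ASCII letter: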

import re
my_str = '1tutorial, 2reference, 3com'

# Remove digits 0-9
result = re.sub(r'[0-9]', '', my_str)
print(result) # Output: 'tutorial, reference, com'

# Remove letters a-z and A-Z
result = re.sub(r'[a-zA-Z]', '', my_str)
print(result) # Output: '1, 2, 3'
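
Ranges can also be combined in a single set, and the shorthand \d can stand in for a digit class. The sketch below assumes plain ASCII input; for str patterns \d also matches non-ASCII Unicode digits, so it is not strictly identical to [0-9]. The variable names are illustrative.

```python
import re

my_str = '1tutorial, 2reference, 3com'

# \d is shorthand for "any digit" (equivalent to [0-9] for ASCII input).
no_digits = re.sub(r'\d', '', my_str)
print(no_digits)  # Output: 'tutorial, reference, com'

# Several ranges can share one character set.
no_alnum = re.sub(r'[0-9a-zA-Z]', '', my_str)
print(no_alnum)  # Output: ', , '
```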

Removing Characters Not Matching a Set

To remove all characters except a specific set, put a caret (^) as the first character inside the brackets. This negates the set:

import re
my_str = '!tutorial @reference #com $abc'
# Remove everything EXCEPT !, @, #, $
result = re.sub(r'[^!@#$]', '', my_str)
print(result) # Output: '!@#$'
  • [^...] means "match any character not inside the brackets".
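
A negated set is often used as a "keep only" filter. The sketch below keeps letters and spaces in one case and digits in another; the second input string is a made-up example, not from the original text.

```python
import re

my_str = '!tutorial @reference #com $abc'

# Keep only ASCII letters and spaces; everything else is removed.
letters_and_spaces = re.sub(r'[^a-zA-Z ]', '', my_str)
print(letters_and_spaces)  # Output: 'tutorial reference com abc'

# Keep only digits (hypothetical input string).
digits_only = re.sub(r'[^0-9]', '', 'Order #4521, 3 items')
print(digits_only)  # Output: '45213'
```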

Removing Specific Characters with a Generator Expression (Alternative)

For removing a fixed set of individual characters, you can use a generator expression with str.join(). This avoids regular expressions, but it only handles literal characters (not ranges or multi-character patterns) and may be slower than re.sub() on very large strings:

my_str = '!tutorial @reference #com $abc'
characters_to_remove = '!@#$'

result = ''.join(
    char for char in my_str
    if char not in characters_to_remove
)

print(result) # Output: 'tutorial reference com abc'
  • The generator expression (char for char in my_str if char not in characters_to_remove) iterates through the string.
  • It keeps only the characters that are not in characters_to_remove.
  • ''.join(...) concatenates the kept characters back into a single string.
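
To compare the two approaches side by side, here is a minimal sketch that wraps each in a function. The helper names remove_chars_regex and remove_chars_join are hypothetical. The regex version assumes re.escape() is an acceptable way to build a character set from an arbitrary string of characters, so that ^, - and ] are treated literally.

```python
import re


def remove_chars_regex(text, chars):
    """Remove every character in `chars` from `text` using re.sub()."""
    if not chars:
        # '[]' is not a valid pattern, so return the text unchanged.
        return text
    # re.escape() makes characters such as ^, - and ] literal inside the set.
    pattern = '[' + re.escape(chars) + ']'
    return re.sub(pattern, '', text)


def remove_chars_join(text, chars):
    """Remove every character in `chars` from `text` using str.join()."""
    return ''.join(ch for ch in text if ch not in chars)


my_str = '!tutorial @reference #com $abc'
print(remove_chars_regex(my_str, '!@#$'))  # Output: 'tutorial reference com abc'
print(remove_chars_join(my_str, '!@#$'))   # Output: 'tutorial reference com abc'
```

Both functions return the same result for a fixed set of characters. The regex version becomes the better fit once the set needs ranges or negation.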