
How to Remove Non-Numeric Characters from Strings in Python

Extracting only the numeric parts of a string or removing all non-numeric characters is a common task in data cleaning and processing.

This guide explores various methods to achieve this in Python, primarily using regular expressions (re.sub()) and generator expressions with str.join(), and also covers how to selectively keep characters like the decimal point.

Removing All Non-Numeric Characters

These methods remove everything except digits (0-9).

Using re.sub()

Regular expressions provide a powerful and concise way to remove all non-digit characters:

import re

my_str = 'tu_1torial_2_re_3_ference.com'

# Method 1: Using [^0-9] (match anything NOT a digit 0-9)
result = re.sub(r'[^0-9]', '', my_str)
print(result) # Output: '123'

# Method 2: Using \D (match any non-digit character)
result_alt = re.sub(r'\D', '', my_str)
print(result_alt) # Output: '123'
  • re.sub(pattern, replacement, string) replaces all occurrences of the pattern with the replacement (an empty string '' here) in the string.
  • r'[^0-9]': The pattern matches any character that is not (^) a digit between 0 and 9.
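  • r'\D': A shorthand that matches any non-digit character. For str patterns in Python 3 it is Unicode-aware, so decimal digits from other scripts (for example, Arabic-Indic digits) are not removed. It is equivalent to [^0-9] only for ASCII input or when the re.ASCII flag is set.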
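
The difference between the two patterns only shows up with non-ASCII digits. The following is a minimal sketch of that behavior; the sample string and the variable names are made up for illustration:

```python
import re

# Hypothetical sample: ASCII digits followed by two Arabic-Indic digits
# (U+0664 and U+0665), which Unicode classifies as decimal digits.
mixed = 'a1b2\u0664\u0665'

# [^0-9] removes everything except the ten ASCII digits.
ascii_only = re.sub(r'[^0-9]', '', mixed)

# \D is Unicode-aware for str patterns, so the Arabic-Indic digits survive.
unicode_aware = re.sub(r'\D', '', mixed)

# With re.ASCII, \D behaves like [^0-9] again.
ascii_flag = re.sub(r'\D', '', mixed, flags=re.ASCII)

print(ascii_only)                          # 12
print(unicode_aware == '12\u0664\u0665')   # True
print(ascii_flag)                          # 12
```

If the input is known to be plain ASCII, both patterns give the same result and either one is fine.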

Using a Generator Expression and str.join()

This approach builds a new string containing only the digits:

my_str = 'tu_1torial_2_re_3_ference.com'

result = ''.join(char for char in my_str if char.isdigit())
print(result) # Output: '123'
  • The generator expression (char for char in my_str if char.isdigit()) iterates through the string, yielding only the characters for which isdigit() returns True. Note that isdigit() also returns True for some non-ASCII characters, such as superscript digits.
  • ''.join(...) concatenates these digits into a new string.
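
If only the ASCII digits 0-9 should be kept, the condition can test membership in string.digits instead of calling isdigit(). This is a small sketch under that assumption; the sample text and variable names are illustrative:

```python
from string import digits  # the string '0123456789'

# Hypothetical sample containing SUPERSCRIPT TWO (U+00B2).
text = 'x\u00b2 = 49'

# str.isdigit() accepts the superscript character as a digit.
loose = ''.join(ch for ch in text if ch.isdigit())

# Membership in string.digits keeps only the ASCII digits 0-9.
strict = ''.join(ch for ch in text if ch in digits)

print(loose == '\u00b249')  # True
print(strict)               # 49
```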

Using a for Loop

A for loop offers a more explicit, step-by-step way:

my_str = 'tu_1torial_2_re_3_ference.com'
result = ''
for char in my_str:
    if char.isdigit():
        result += char
print(result) # Output: 123

Removing Non-Numeric Characters Except the Decimal Point (.)

These methods keep digits (0-9) and the decimal point character (.).

Using re.sub()

Modify the regular expression pattern to exclude the dot from the characters being removed:

import re

my_str = 'a3.1b4c'

# Method 1: Add '.' to the negated set [^...] so that it is kept
result = re.sub(r'[^0-9.]', '', my_str)
print(result) # Output: '3.14'

# Method 2: Using \d (digit); the dot needs no escaping inside a character set
result_alt = re.sub(r'[^\d.]', '', my_str)
print(result_alt) # Output: '3.14'
  • r'[^0-9.]' or r'[^\d.]': These patterns match any character that is not a digit (0-9 or \d) or a literal dot (.). The dot needs to be included inside the negated character set [^...] to be preserved.
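
Keeping every dot can produce a result that is not a valid number when the text contains sentence punctuation or several numbers. The sketch below contrasts that with re.findall(), which extracts each number separately; the sample sentence is invented, and the extraction pattern is one possible choice rather than the only one:

```python
import re

# Hypothetical input with two numbers, an abbreviation dot and a final period.
messy = 'Total: 3.14 kg, 2 items (ref. A7).'

# Stripping keeps every digit and every dot, so the pieces run together.
kept = re.sub(r'[^0-9.]', '', messy)

# Extract each integer or decimal number as its own match instead.
numbers = re.findall(r'[0-9]+(?:\.[0-9]+)?', messy)

print(kept)     # 3.142.7.
print(numbers)  # ['3.14', '2', '7']
```

Stripping works well when the string holds a single number with surrounding noise. Extraction is usually the safer option when the string may hold more than one.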

Using a Generator Expression and str.join()

Adjust the condition in the generator expression:

my_str = 'a3.1b4c'

# Check if character is a digit OR a dot
result = ''.join(char for char in my_str if char.isdigit() or char == '.')
print(result) # Output: '3.14'

Using a for Loop

Modify the if condition in the loop:

my_str = 'a3.1b4c'
result = ''
for char in my_str:
    if char.isdigit() or char == '.':
        result += char
print(result) # Output: 3.14
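
The cleaned string is often converted to a number next. The following sketch wraps the cleaning step and the conversion in a helper; to_float is a hypothetical name, and returning None on failure is an assumption about how the caller wants errors reported:

```python
def to_float(text):
    """Hypothetical helper: keep ASCII digits and dots, then try to parse a float."""
    cleaned = ''.join(ch for ch in text if ch in '0123456789.')
    try:
        return float(cleaned)
    except ValueError:
        # Raised for an empty string or for input with several dots, e.g. '1.2.3'.
        return None

print(to_float('a3.1b4c'))  # 3.14
print(to_float('v1.2.3'))   # None
print(to_float('abc'))      # None
```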

Conclusion

This guide demonstrated several ways to remove non-numeric characters from strings in Python.

  • For flexibility and conciseness, re.sub() with appropriate regular expressions is generally the recommended approach, especially when dealing with more complex patterns or needing to exclude specific characters like the decimal point.
  • Generator expressions offer a good alternative for simpler cases, while for loops provide the most explicit control flow.
  • Choose the method that best suits your specific requirements for clarity and efficiency.
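
As a quick sanity check, the three approaches can be wrapped in functions and compared on the same input. This is only a sketch; the function names are made up for illustration, and the functions agree here because the sample is plain ASCII:

```python
import re

def digits_regex(text):
    # Regular expression approach.
    return re.sub(r'[^0-9]', '', text)

def digits_join(text):
    # Generator expression with str.join().
    return ''.join(ch for ch in text if ch.isdigit())

def digits_loop(text):
    # Explicit for loop.
    result = ''
    for ch in text:
        if ch.isdigit():
            result += ch
    return result

sample = 'tu_1torial_2_re_3_ference.com'
results = {f.__name__: f(sample) for f in (digits_regex, digits_join, digits_loop)}
print(results)
# {'digits_regex': '123', 'digits_join': '123', 'digits_loop': '123'}
```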