
How to Remove Non-ASCII Characters from Strings in Python

This guide explains how to remove non-ASCII characters from a string in Python. Non-ASCII characters are those outside the standard ASCII range (code points 0-127). We'll cover three methods: using str.encode() with error handling, filtering against string.printable, and checking each character's code point with ord().

Removing Non-ASCII Characters using str.encode()

The most straightforward and generally reliable method is to encode the string to ASCII, ignoring any characters that can't be encoded, and then decode it back to a string:

```python
def remove_non_ascii(string):
    return string.encode('ascii', errors='ignore').decode('ascii')

print(remove_non_ascii('a€bñcá'))  # Output: abc
print(remove_non_ascii('a_b^0'))   # Output: a_b^0
```
  • string.encode('ascii', errors='ignore'): This attempts to encode the string using the ASCII encoding. The errors='ignore' part is crucial: it tells the encoder to skip any characters that can't be represented in ASCII (instead of raising an error). The result is a bytes object.
  • .decode('ascii'): This decodes the resulting bytes object back into a string, using ASCII. Since we've already removed the non-ASCII characters during encoding, this decoding step is safe.
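To make the two steps visible, the sketch below prints the intermediate bytes object and, for contrast, shows what the other common error handlers do with the same input: errors='replace' keeps a '?' placeholder for each dropped character, and the default errors='strict' raises UnicodeEncodeError at the first non-ASCII character. The variable names are illustrative only.

```python
text = 'a€bñcá'

# Step 1: encode to ASCII; errors='ignore' silently drops €, ñ and á.
encoded = text.encode('ascii', errors='ignore')
print(encoded)        # b'abc'
print(type(encoded))  # <class 'bytes'>

# Step 2: decode the ASCII-only bytes back into a str.
decoded = encoded.decode('ascii')
print(decoded)        # abc

# For comparison, errors='replace' substitutes '?' for each unencodable character.
print(text.encode('ascii', errors='replace'))  # b'a?b?c?'

# With the default errors='strict', the first non-ASCII character raises.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print(exc.reason, 'at index', exc.start)  # ordinal not in range(128) at index 1
```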

This method is efficient and clearly expresses the intent: remove anything that's not ASCII.

Removing Non-ASCII Characters using string.printable

The string.printable constant (from the built-in string module) contains all printable ASCII characters. We can use this to filter a string, keeping only those characters that are in this set:

```python
import string

def remove_non_ascii(a_str):
    # A set makes each membership test O(1) instead of scanning the 100-character string
    ascii_chars = set(string.printable)

    return ''.join(
        filter(lambda x: x in ascii_chars, a_str)
    )

print(remove_non_ascii('a€bñcá'))  # Output: abc
print(remove_non_ascii('a_b^0'))   # Output: a_b^0
```

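  • string.printable consists of digits, ASCII letters, punctuation and whitespace (space, \t, \n, \r, \x0b and \x0c). It does not include the other ASCII control characters (such as \x00 or \x07), so this approach strips those as well and is stricter than the encode() method.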
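The difference between the two filters only shows up on ASCII control characters. The sketch below uses a made-up sample string containing a bell character (\x07), a tab and an accented letter, and runs both approaches on it; the filter is written as a generator expression, which behaves the same as the filter() call above.

```python
import string

# string.printable = digits + ascii_letters + punctuation + whitespace
print(len(string.printable))    # 100
print(repr(string.whitespace))  # ' \t\n\r\x0b\x0c'

ascii_chars = set(string.printable)
text = 'bell\x07 tab\t café'  # hypothetical sample input

# Keeps only characters in string.printable: drops both \x07 and é.
by_printable = ''.join(ch for ch in text if ch in ascii_chars)

# Keeps every ASCII character: drops é but keeps the \x07 control character.
by_encode = text.encode('ascii', errors='ignore').decode('ascii')

print(repr(by_printable))  # 'bell tab\t caf'
print(repr(by_encode))     # 'bell\x07 tab\t caf'
```

If your input may contain control characters you want to keep, the encode() method is the closer match to "remove non-ASCII".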

Removing Non-ASCII Characters using ord()

You can also iterate over the string and keep only the characters whose code point, as returned by the built-in ord() function, is below 128:

```python
def remove_non_ascii(string):
    return ''.join(char for char in string if ord(char) < 128)

print(remove_non_ascii('a€bñcá'))  # Output: abc
print(remove_non_ascii('a_b^0'))   # Output: a_b^0
```
  • The generator expression yields only the characters whose Unicode code point is less than 128, and ''.join() concatenates them into a new string.
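On Python 3.7 and later, str.isascii() expresses the same per-character test as ord(char) < 128. The sketch below puts the ord() version, an isascii() variant and the encode() method side by side and checks that they agree on a few sample strings; the three function names are hypothetical, chosen only so they can coexist in one script.

```python
def remove_non_ascii_ord(text):
    # Keep characters whose code point falls in the ASCII range 0-127.
    return ''.join(ch for ch in text if ord(ch) < 128)

def remove_non_ascii_isascii(text):
    # For a single character, ch.isascii() is True exactly when ord(ch) < 128.
    return ''.join(ch for ch in text if ch.isascii())

def remove_non_ascii_encode(text):
    # Drop unencodable characters during encoding, then decode back to str.
    return text.encode('ascii', errors='ignore').decode('ascii')

samples = ['a€bñcá', 'a_b^0', 'naïve café', 'emoji 😀 ok', '']
funcs = (remove_non_ascii_ord, remove_non_ascii_isascii, remove_non_ascii_encode)

for s in samples:
    results = {f(s) for f in funcs}
    # All three approaches should produce one identical result per sample.
    assert len(results) == 1, s
    print(repr(s), '->', repr(results.pop()))
```

Calling text.isascii() on the whole string is also a cheap way to skip the filtering entirely when the input is already pure ASCII.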