How to Remove Non-ASCII Characters from Strings in Python

This guide explains how to remove non-ASCII characters from a string in Python. Non-ASCII characters are those whose code points fall outside the standard ASCII range (0-127). We'll cover three methods: encoding to ASCII with str.encode() and errors='ignore', filtering against string.printable, and filtering by code point with ord().

Removing Non-ASCII Characters with `encode()` and `decode()` (Recommended)

The most straightforward and generally reliable method is to encode the string to ASCII, ignoring any characters that can't be encoded, and then decode it back to a string:

def remove_non_ascii(string):
    return string.encode('ascii', errors='ignore').decode('ascii')

print(remove_non_ascii('a€bñcá'))  # Output: abc
print(remove_non_ascii('a_b^0'))   # Output: a_b^0

string.encode('ascii', errors='ignore'): This attempts to encode the string using the ASCII encoding. The errors='ignore' part is crucial: it tells the encoder to skip any characters that can't be represented in ASCII (instead of raising an error). The result is a bytes object.
.decode('ascii'): This decodes the resulting bytes object back into a string, using ASCII. Since we've already removed the non-ASCII characters during encoding, this decoding step is safe.

This method is efficient and clearly expresses the intent: remove anything that's not ASCII.
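
To see what errors='ignore' changes, the following sketch compares it with the default strict handler and with errors='replace'. The sample string and variable names are only illustrative.

```python
text = 'a€bñcá'

# Default (errors='strict'): encoding fails on the first non-ASCII character.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors='ignore' silently drops characters that ASCII cannot represent.
ignored = text.encode('ascii', errors='ignore')

# errors='replace' substitutes a '?' for each such character instead.
replaced = text.encode('ascii', errors='replace')

print(strict_failed)  # True
print(ignored)        # b'abc'
print(replaced)       # b'a?b?c?'
```

If you would rather mark where characters were removed than drop them without a trace, errors='replace' may be the better fit.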

Removing Non-ASCII Characters using `string.printable`

The string.printable constant (from the standard library's string module) contains all printable ASCII characters. We can use this to filter a string, keeping only those characters that are in this set:

import string

def remove_non_ascii(a_str):
    ascii_chars = set(string.printable)  # set membership tests are O(1) on average

    return ''.join(
        filter(lambda x: x in ascii_chars, a_str)
    )

print(remove_non_ascii('a€bñcá'))  # Output: abc
print(remove_non_ascii('a_b^0'))   # Output: a_b^0

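string.printable consists of digits, ASCII letters, punctuation, and whitespace (100 characters in total). Because it is a subset of ASCII, this method also removes non-printable ASCII control characters such as '\x00', which the other methods keep.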
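
A minimal sketch of how this filter differs from an encode-based one when the input contains ASCII control characters. The helper names keep_printable and keep_ascii are hypothetical and exist only for this comparison.

```python
import string

PRINTABLE = set(string.printable)

def keep_printable(text):
    # Keep only characters listed in string.printable.
    return ''.join(ch for ch in text if ch in PRINTABLE)

def keep_ascii(text):
    # Keep every ASCII character, printable or not.
    return text.encode('ascii', errors='ignore').decode('ascii')

# NUL (\x00) and ESC (\x1b) are ASCII but not printable; tab is both.
sample = 'a\x00b\x1bc\tñ'

print(repr(keep_printable(sample)))  # 'abc\t'
print(repr(keep_ascii(sample)))      # 'a\x00b\x1bc\t'
print(len(string.printable))         # 100
```

Which behavior is preferable depends on whether control characters count as unwanted input in your data.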

Removing Non-ASCII Characters using `ord()`

You can also iterate over the string, look up each character's Unicode code point with the built-in ord() function, and keep only the characters whose code point is below 128:

def remove_non_ascii(text):
    # Keep only characters whose code point is in the ASCII range (0-127).
    return ''.join(char for char in text if ord(char) < 128)

print(remove_non_ascii('a€bñcá')) # Output: abc
print(remove_non_ascii('a_b^0'))  # Output: a_b^0

This passes a generator expression to str.join(). The expression iterates over the characters and yields only those with Unicode code points below 128.
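
To make the threshold concrete, the sketch below prints the code points of the sample characters used above. Only those below 128 pass the filter. The variable names are illustrative.

```python
sample = 'a€bñcá'

# Map each character to its Unicode code point.
code_points = {char: ord(char) for char in sample}
print(code_points)
# {'a': 97, '€': 8364, 'b': 98, 'ñ': 241, 'c': 99, 'á': 225}

# Apply the same test as the filter: keep code points below 128.
kept = ''.join(char for char in sample if ord(char) < 128)
print(kept)  # abc
```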
