How to Remove `\xa0` (Non-Breaking Spaces) from Strings in Python

The \xa0 escape denotes the non-breaking space (Unicode U+00A0, written as &nbsp; in HTML). It often shows up in text extracted from web pages and other sources.

This guide covers several ways to remove \xa0 characters from strings in Python, so you can pick the one that fits your text-cleaning task.

Removing `\xa0` with `unicodedata.normalize()`

The unicodedata.normalize() function replaces compatibility characters with their compatibility equivalents. Under the NFKD and NFKC forms, this turns a non-breaking space into a regular space:

import unicodedata

my_str = 'tutorial\xa0reference'
result = unicodedata.normalize('NFKD', my_str)
print(result)  # Output: tutorial reference

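The call unicodedata.normalize('NFKD', my_str) replaces the non-breaking space (\xa0) with a regular space.
The 'NFKD' form applies compatibility decomposition to the whole string, so it also rewrites other characters: for example, the ligature U+FB01 becomes the two letters fi, and an accented letter is split into a base letter plus a combining mark.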

If you would rather keep accented letters as single composed characters, use the NFKC form instead:

import unicodedata
my_str = 'tutorial\xa0reference'
result = unicodedata.normalize('NFKC', my_str)
print(result)  # Output: tutorial reference

The NFKC form first applies compatibility decomposition, then canonical composition.
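
To make the difference between the two forms concrete, here is a small sketch. The sample string is made up for illustration: it contains an accented letter, a non-breaking space, and a ligature, so the side effects of each form are visible.

```python
import unicodedata

# Illustrative sample: 'café' + NBSP + the 'fi' ligature (U+FB01) + 'le'
sample = 'caf\u00e9\xa0\ufb01le'

nfkd = unicodedata.normalize('NFKD', sample)
nfkc = unicodedata.normalize('NFKC', sample)

# ascii() escapes non-ASCII characters so the differences are visible
print(ascii(nfkd))  # 'cafe\u0301 file'  (accent split into a combining mark)
print(ascii(nfkc))  # 'caf\xe9 file'     (accent recomposed into one character)
```

Both forms replace the non-breaking space and expand the ligature. If the rest of the text must stay byte-for-byte unchanged, a targeted replacement is probably the safer choice.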

Removing `\xa0` with `str.replace()`

The str.replace() method directly replaces all occurrences of a specified substring:

my_str = 'tutorial\xa0reference'
result = my_str.replace('\xa0', ' ')
print(result)  # Output: tutorial reference

The call replace('\xa0', ' ') swaps each non-breaking space for a regular space and leaves every other character untouched. To delete the character instead of substituting it, pass an empty string as the second argument.
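
As a quick illustration of substituting a space versus deleting the character outright, the sketch below uses a made-up sample string; which variant you want depends on whether the non-breaking space separated two words.

```python
# Illustrative sample with two non-breaking spaces
text = 'price:\xa0100\xa0USD'

spaced = text.replace('\xa0', ' ')   # substitute a regular space
removed = text.replace('\xa0', '')   # delete the character entirely

print(spaced)   # price: 100 USD
print(removed)  # price:100USD
```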

Removing `\xa0` with `split()` and `join()`

The str.split() and str.join() methods provide an alternative way to remove \xa0, by splitting the string and then joining it with a whitespace delimiter:

my_str = 'tutorial\xa0reference'
result = ' '.join(my_str.split())
print(result)  # Output: tutorial reference

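The split() method with no arguments splits a string on runs of whitespace, and Python treats \xa0 as whitespace.
Joining the parts back together with a single space therefore also collapses repeated spaces, tabs, and newlines, and drops leading and trailing whitespace.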

Alternatively, you can split explicitly on the \xa0 character. This leaves all other whitespace, such as repeated spaces or newlines, exactly as it was:

my_str = 'tutorial\xa0reference'
result = ' '.join(my_str.split('\xa0'))
print(result)  # Output: tutorial reference
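
The two variants only differ when the string contains other whitespace. The following sketch, using a made-up multi-line sample, shows both side by side:

```python
# Illustrative sample: a non-breaking space, a newline, and doubled spaces
text = 'line one\xa0here\n  line  two'

collapsed = ' '.join(text.split())        # splits on any whitespace run
preserved = ' '.join(text.split('\xa0'))  # splits only on the non-breaking space

print(repr(collapsed))  # 'line one here line two'
print(repr(preserved))  # 'line one here\n  line  two'
```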

Removing `\xa0` with `BeautifulSoup`

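If you're working with HTML or XML, the BeautifulSoup library can extract the text first. Its strip=True option only trims whitespace at the start and end of each piece of text, so a non-breaking space in the middle still has to be replaced afterwards:

from bs4 import BeautifulSoup

my_html = 'tutorial\xa0reference'
# get_text() extracts the text; strip=True trims only the edges
text = BeautifulSoup(my_html, 'lxml').get_text(strip=True)
result = text.replace('\xa0', ' ')
print(result)  # Output: tutorial reference

The get_text() method extracts the text from the HTML, and strip=True removes leading and trailing whitespace. The replace() call then handles any \xa0 that remains inside the text.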

Note: Make sure that you have beautifulsoup4 and lxml installed. Use pip install lxml beautifulsoup4 or pip3 install lxml beautifulsoup4.

Removing `\xa0` from a List of Strings

To remove \xa0 characters from a list of strings, use a list comprehension with the replace() method:

my_list = ['tutorial\xa0', '\xa0reference']
result = [string.replace('\xa0', ' ') for string in my_list]
print(result)  # Output: ['tutorial ', ' reference']

The list comprehension builds a new list in which every \xa0 is replaced with a regular space. The original list is left unchanged.
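
When the non-breaking spaces sit at the start or end of an item, the replacement leaves a stray regular space behind. One possible variation, sketched here with a made-up list, chains strip() to drop those edges as well:

```python
# Illustrative list with NBSPs at the edges and in the middle of items
my_list = ['tutorial\xa0', '\xa0reference', 'no\xa0break']

# Replace each NBSP with a space, then trim leading/trailing whitespace
cleaned = [s.replace('\xa0', ' ').strip() for s in my_list]

print(cleaned)  # ['tutorial', 'reference', 'no break']
```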
