How to Remove Spaces, Tabs, and Newlines from Strings in Python
This guide explains how to remove unwanted whitespace (spaces, tabs, newlines) from strings in Python. We'll cover:
- Removing all whitespace (including internal spaces, tabs, and newlines).
- Removing only leading and trailing whitespace.
- Splitting a string by tabs (or other whitespace).
- Using regular expressions for more complex whitespace handling.
Removing All Whitespace
To remove all whitespace characters (spaces, tabs, newlines, etc.) from a string, the most efficient and readable method is to combine split() and join():
Using split() and join() (Recommended)
my_str = ' tutorial reference '
result = ''.join(my_str.split())
print(result) # Output: tutorialreference
- my_str.split(): When called with no arguments, split() does two things:
  - It removes leading and trailing whitespace.
  - It splits the string into a list of words, using any sequence of whitespace characters (spaces, tabs, newlines) as the delimiter.
- ''.join(...): This joins the resulting list of words back into a single string, using an empty string ('') as the separator. This effectively removes all whitespace.
This approach is concise, handles all whitespace characters, and is generally faster than using regular expressions for this specific task.
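As a quick check that the idiom handles more than plain spaces, the sketch below applies it to a string mixing spaces, tabs, and newlines. The helper name remove_all_whitespace and the sample string are illustrative only.

```python
def remove_all_whitespace(text: str) -> str:
    """Return text with every whitespace character removed."""
    # split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace; join() glues the pieces together.
    return ''.join(text.split())


mixed = ' \ttutorial \n reference\r\n'
print(remove_all_whitespace(mixed))  # Output: tutorialreference
```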
Using re.sub()
For more complex whitespace removal (e.g., removing specific whitespace characters but not others), you can use the re.sub() function from the re (regular expressions) module:
import re
my_str = ' tutorial reference '
result = re.sub(r'\s+', '', my_str) # Replace all whitespace with empty string.
print(result) # Output: tutorialreference
- re.sub(pattern, replacement, string): Replaces all occurrences of the pattern in string with replacement.
- r'\s+': This regular expression matches one or more whitespace characters (\s matches any whitespace character, and + means "one or more").
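The selective case mentioned earlier (removing some whitespace characters but not others) can be sketched with a character class that lists only the characters to drop. The sample string here is made up for illustration.

```python
import re

my_str = 'tutorial\t reference\n com\r\n'

# Remove only tabs, newlines, and carriage returns; keep ordinary spaces.
result = re.sub(r'[\t\n\r]+', '', my_str)

print(repr(result))  # Output: 'tutorial reference com'
```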
Removing Leading/Trailing Whitespace with strip(), lstrip(), and rstrip()
If you only want to remove whitespace from the beginning and end of a string (but keep internal spaces), use strip():
my_str = ' tutorial reference '
result = my_str.strip()
print(result) # Output: tutorial reference
- my_str.strip(): Removes leading and trailing whitespace (spaces, tabs, newlines).
- my_str.lstrip(): Removes only leading whitespace.
- my_str.rstrip(): Removes only trailing whitespace.
my_str = '\ttutorial\treference\t' # Example with tabs
result = my_str.strip() # Strips leading/trailing whitespace (including tabs)
print(repr(result)) # Output: 'tutorial\treference' (inner tabs preserved)
my_str = ' tutorial reference '
result = my_str.lstrip() # Left strip
print(repr(result)) # Output: 'tutorial reference '
result = my_str.rstrip() # Right strip
print(repr(result)) # Output: ' tutorial reference'
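All three methods also accept an optional string of characters to strip instead of whitespace. A minimal sketch, using made-up sample strings, that removes only a trailing newline while keeping leading indentation:

```python
line = '    indented text\n'

# rstrip('\n') removes only trailing newline characters.
print(repr(line.rstrip('\n')))  # Output: '    indented text'

# The argument is a set of characters, not a prefix or suffix:
# every leading/trailing character found in the set is removed.
print(repr('--==title==--'.strip('-=')))  # Output: 'title'
```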
Splitting a String by Tabs
To split a string into a list of substrings based on tab characters (\t):
Using split()
my_str = 'tutorial\treference\tcom'
my_list = my_str.split('\t')
print(my_list) # Output: ['tutorial', 'reference', 'com']
- my_str.split('\t') splits the string on each tab character.
- To handle leading and trailing tabs, use strip() beforehand: my_str.strip().split('\t')
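To see why the strip() call matters, compare the two results on a string with a leading and a trailing tab (the sample string is illustrative):

```python
my_str = '\ttutorial\treference\tcom\t'

# Without strip(): the leading and trailing tabs each yield an empty string.
print(my_str.split('\t'))          # Output: ['', 'tutorial', 'reference', 'com', '']

# With strip(): the outer tabs are removed before splitting.
print(my_str.strip().split('\t'))  # Output: ['tutorial', 'reference', 'com']
```

Note that strip() with no arguments removes all leading and trailing whitespace, not only tabs; strip('\t') restricts it to tab characters.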
Using re.split() (for multiple tabs)
If your string contains consecutive tab characters, split('\t') returns an empty string between each adjacent pair of tabs. Using re.split() with the pattern \t+ treats each run of tabs as a single delimiter:
import re
my_str = '\ttutorial\t\treference\t\tcom\t'
my_list = re.split(r'\t+', my_str.strip())
print(my_list) # Output: ['tutorial', 'reference', 'com']
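re.split(r'\t+', my_str.strip()) splits the string on any sequence of one or more tab characters, so consecutive tabs do not produce empty strings. The strip() call removes the leading and trailing tabs, which would otherwise leave an empty string at each end of the list.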
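The difference between the splitting approaches can be sketched side by side. The sample string is made up, and deliberately includes a field containing a space:

```python
import re

my_str = 'tutorial\t\treference guide\t\tcom'

# split('\t') treats every tab as a delimiter, so adjacent tabs
# leave empty strings in the result.
print(my_str.split('\t'))        # Output: ['tutorial', '', 'reference guide', '', 'com']

# re.split() with \t+ treats each run of tabs as one delimiter.
print(re.split(r'\t+', my_str))  # Output: ['tutorial', 'reference guide', 'com']

# split() with no arguments also collapses runs, but it splits on
# spaces as well, which breaks up fields that contain spaces.
print(my_str.split())            # Output: ['tutorial', 'reference', 'guide', 'com']
```

If fields never contain spaces, the no-argument split() is the simplest choice; otherwise the \t+ pattern keeps each field intact.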