How to Split Strings and Remove Whitespace in Python
This guide explains how to split a string in Python while removing the whitespace around each resulting substring. We'll cover the most common and efficient methods: list comprehensions, map(), and regular expressions with re.split().
Splitting and Stripping with List Comprehensions (Recommended)
The most concise and Pythonic way to split a string and remove whitespace from the resulting substrings is to combine str.split() with a list comprehension and str.strip():
my_str = 'tutorial, reference, com' # Spaces after the commas
my_list = [word.strip() for word in my_str.split(',')]
print(my_list) # Output: ['tutorial', 'reference', 'com']
- my_str.split(','): Splits the string at each comma, producing a list like ['tutorial', ' reference', ' com']. Note the leading spaces.
- [word.strip() for word in ...]: This list comprehension iterates over the list produced by split().
- word.strip(): For each word (substring), strip() removes leading and trailing whitespace, including spaces, tabs, and newlines.
This approach is very readable and efficient. It handles both splitting and whitespace removal in a single, clear line of code.
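As a small sketch of the whitespace handling described above: strip() with no arguments trims tabs and newlines as well as spaces, but only at the ends of each piece. The sample string raw and the join/split idiom for collapsing internal runs of whitespace are illustrative assumptions, not part of the original example.

```python
# Hypothetical sample input with mixed whitespace around and inside the items.
raw = '  tutorial ,\treference\n, com   site '

# strip() trims leading/trailing spaces, tabs, and newlines from each piece.
stripped = [piece.strip() for piece in raw.split(',')]
print(stripped)  # ['tutorial', 'reference', 'com   site']

# Assumption: if internal runs of whitespace should also collapse to a
# single space, split each piece on whitespace and re-join it.
normalized = [' '.join(piece.split()) for piece in raw.split(',')]
print(normalized)  # ['tutorial', 'reference', 'com site']
```

Whether internal whitespace should be collapsed depends on the data; for values such as names or file paths it may be safer to leave it untouched.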
Splitting and Stripping with map()
You can achieve the same result using the map() function, although it's generally less readable than a list comprehension:
my_str = 'tutorial, reference, com'
my_list = list(map(str.strip, my_str.split(',')))
print(my_list) # Output: ['tutorial', 'reference', 'com']
- my_str.split(','): Same as before; splits on commas.
- map(str.strip, ...): Applies the str.strip function to each element of the list produced by split(). map() returns a map object (an iterator).
- list(...): Converts the map object to a list.
While functional, this is less readable than the list comprehension, so the list comprehension is generally preferred.
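One place where map() may still be convenient is when the stripped pieces are consumed one at a time, since the map object is lazy. The following is a minimal sketch of that behavior; the variable names first and rest are only illustrative.

```python
my_str = 'tutorial, reference, com'

# map() returns a lazy iterator; nothing is stripped until items are requested.
stripped = map(str.strip, my_str.split(','))

first = next(stripped)  # consumes only the first item
rest = list(stripped)   # consumes whatever is left

print(first)  # tutorial
print(rest)   # ['reference', 'com']
```

Note that a map object can be iterated only once; calling list() on it a second time would give an empty list.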
Splitting and Stripping with re.split() (for Multiple Delimiters)
If you need to split on multiple delimiters and remove whitespace, regular expressions with re.split() become useful:
import re
my_str = 'tutorial, reference-dot:com' # Comma, hyphen and colon.
pattern = re.compile(r'^\s+|\s*,\s*|\s*-\s*|\s*:\s*|\s+$') # Regular expression
my_list = [word for word in pattern.split(my_str) if word]
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
re.compile(r'^\s+|\s*,\s*|\s*-\s*|\s*:\s*|\s+$'): This regular expression is more complex:
- ^\s+: Matches one or more whitespace characters at the beginning of the string.
- \s*,\s*: Matches zero or more whitespace characters, a comma, then zero or more whitespace characters.
- \s*-\s*: Matches zero or more whitespace characters, a hyphen, then zero or more whitespace characters.
- \s*:\s*: Matches zero or more whitespace characters, a colon, then zero or more whitespace characters.
- \s+$: Matches one or more whitespace characters at the end of the string.
- |: The "OR" operator, matching any of the patterns separated by |.
The list comprehension iterates over the results of the split and builds a new list containing only the non-empty items.
The same pattern can be written more compactly by grouping the delimiters in a non-capturing group, (?:,|-|:):
import re
my_str = 'tutorial, reference-dot:com'
pattern = re.compile(r'^\s+|\s*(?:,|-|:)\s*|\s+$')
my_list = [word for word in pattern.split(my_str) if word]
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
- [word for word in pattern.split(my_str) if word]: This list comprehension filters out any empty strings that might result from leading/trailing delimiters or multiple delimiters in a row. This is a concise way to handle those edge cases.
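If the set of delimiters varies, the pattern can be built programmatically instead of being written by hand. The sketch below assumes a hypothetical helper named split_on_any (not part of the re module) that escapes each delimiter with re.escape(), trims the ends of the input with strip() rather than the ^\s+ and \s+$ alternatives, and applies the same empty-string filter.

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given delimiters, trimming whitespace.

    Hypothetical helper for illustration; not part of the re module.
    """
    # re.escape() makes characters such as '-' or '|' match literally.
    alternatives = '|'.join(re.escape(d) for d in delimiters)
    pattern = re.compile(r'\s*(?:' + alternatives + r')\s*')
    # Strip the ends first, then drop empty strings left by adjacent
    # or leading/trailing delimiters.
    return [part for part in pattern.split(text.strip()) if part]

print(split_on_any('  tutorial, reference-dot:com  ', [',', '-', ':']))
# ['tutorial', 'reference', 'dot', 'com']
print(split_on_any('a,,b - c,', [',', '-']))
# ['a', 'b', 'c']
```

The helper expects at least one delimiter; an empty delimiter list is not handled in this sketch.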