How to Split Strings by Multiple Delimiters in Python
Python's built-in str.split() method works well for splitting strings by a single delimiter. However, when a string has to be split on several different delimiters, you need more powerful tools.
This guide explains how to split strings using multiple delimiters in Python, focusing on the most efficient and readable methods: regular expressions (re.split()) and, for simpler cases, chained replace() calls followed by split().
Splitting by Multiple Delimiters with re.split() (Recommended)
The re.split() function from the re (regular expression) module is the most flexible and robust way to split a string by multiple delimiters.
import re
my_str = 'tutorial,reference-dot,com'
my_list = re.split(r',|-', my_str) # Split on comma OR hyphen
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
- re.split(pattern, string): Splits the string wherever the pattern matches.
- r',|-': This is the regular expression pattern. The | character acts as an "OR" operator, so the pattern means "split on a comma or a hyphen".
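Two optional behaviours of re.split() are worth knowing when working with several delimiters: the maxsplit argument limits how many splits are performed, and a capturing group around the pattern keeps the matched delimiters in the result. The snippet below is a small sketch that reuses the sample string from above; the variable names are illustrative.

```python
import re

text = 'tutorial,reference-dot,com'

# maxsplit limits the number of splits; the remainder is left untouched.
first_only = re.split(r',|-', text, maxsplit=1)
print(first_only)  # ['tutorial', 'reference-dot,com']

# A capturing group around the pattern keeps the delimiters in the result.
with_delims = re.split(r'(,|-)', text)
print(with_delims)  # ['tutorial', ',', 'reference', '-', 'dot', ',', 'com']
```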
Using Character Sets ([])
For simple delimiters (single characters), you can often use a character set within square brackets [] for a more concise pattern:
import re
my_str = 'tutorial,reference-dot:com'
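my_list = re.split(r'[,:-]', my_str) # Split on comma, colon, OR hyphen
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
- r'[,:-]': This pattern means "split on any character within the brackets: comma, colon, or hyphen." The hyphen is placed last so that it is treated as a literal character. Written as [,-:], it would define a range from comma to colon that also matches periods, slashes, and digits. A character set is often cleaner than the | operator for single-character delimiters.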
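One detail of character sets is easy to miss: a hyphen between two characters inside the brackets defines a range rather than a literal hyphen. The following sketch compares the two spellings on a made-up input string (sample is an assumption for illustration only).

```python
import re

sample = 'v1.2,build-7:ok'  # illustrative input

# [,-:] is a RANGE from ',' to ':' and also matches '.', '/', and the digits 0-9.
as_range = re.split(r'[,-:]', sample)
print(as_range)    # ['v', '', '', '', 'build', '', '', 'ok']

# [,:-] puts the hyphen last, so only ',', ':' and '-' act as delimiters.
as_literal = re.split(r'[,:-]', sample)
print(as_literal)  # ['v1.2', 'build', '7', 'ok']
```

Placing the hyphen first or last in the set, or escaping it as \-, keeps it literal.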
Using the OR Operator (|)
If some of your delimiters are longer than one character, use the OR operator |:
import re
my_str = 'tutorial,reference-dot;;;com'
my_list = re.split(r',|-|;;;', my_str) # Split on comma, hyphen, OR three semicolons
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
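When one delimiter is a prefix of another, the order of the alternatives matters, because the regex engine tries them from left to right and takes the first one that matches. The sketch below uses a made-up string in which both ; and ;;; act as delimiters.

```python
import re

text = 'a;;;b;c'  # illustrative input

# Longer alternative first: ';;;' is consumed as a single delimiter.
longest_first = re.split(r';;;|;', text)
print(longest_first)   # ['a', 'b', 'c']

# Shorter alternative first: each ';' matches on its own, leaving empty strings.
shortest_first = re.split(r';|;;;', text)
print(shortest_first)  # ['a', '', '', 'b', 'c']
```

Listing longer delimiters before shorter ones avoids the stray empty strings.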
Handling Leading/Trailing Delimiters and Empty Strings
If your string starts or ends with a delimiter, re.split() produces empty strings at the beginning or end of the resulting list:
import re
my_str = ',tutorial,reference-dot:com:'
my_list = re.split(r'[,:-]', my_str) # Hyphen last, so it is a literal and not a range
print(my_list) # Output: ['', 'tutorial', 'reference', 'dot', 'com', '']
To remove these empty strings, use a list comprehension with a condition or the filter function:
import re
my_str = ',tutorial,reference-dot:com:'
my_list = [item for item in re.split(r'[,:-]', my_str) if item] # List comprehension
# or: my_list = list(filter(None, re.split(r'[,:-]', my_str))) # Filter (less readable)
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
- The list comprehension is a clear and readable approach: it splits the string and keeps only the truthy (i.e. non-empty) strings in the new list.
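Empty strings also appear when two delimiters sit next to each other. As a rough sketch of two alternatives to filtering afterwards, the + quantifier treats a run of delimiters as one split point, and re.findall() with a negated character set collects the non-delimiter runs directly. The input string here is invented for the example.

```python
import re

messy = ',tutorial,,reference--dot:com:'  # illustrative input

# A run of delimiters counts as one split point, but a leading or
# trailing delimiter still yields an empty string.
collapsed = re.split(r'[,:-]+', messy)
print(collapsed)  # ['', 'tutorial', 'reference', 'dot', 'com', '']

# Matching runs of NON-delimiter characters never produces empty strings.
tokens = re.findall(r'[^,:-]+', messy)
print(tokens)     # ['tutorial', 'reference', 'dot', 'com']
```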
Splitting by Multiple Delimiters with str.replace() and split() (Limited Cases)
For very simple cases with only two delimiters, you can replace every occurrence of one delimiter with the other and then split on the remaining one by chaining replace() and split():
my_str = 'tutorial_reference!dot_com'
my_list = my_str.replace('_', '!').split('!')
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
- First, all underscores are replaced with exclamation marks.
- Then, the string is split on exclamation marks.
Limitations: This approach is not generally recommended because:
- It only works reliably if you want to treat all delimiters as equivalent.
- It becomes cumbersome and inefficient with more than two delimiters, because every additional delimiter requires another chained .replace() call.
- It can lead to unexpected results if the replacement character is already present in the string.
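To illustrate how the replace-then-split idea scales to more delimiters, here is a minimal sketch that loops over the delimiters instead of chaining calls by hand. The helper name split_by_replace is hypothetical, and it assumes a non-empty list of delimiters; each delimiter still costs one full pass over the string.

```python
def split_by_replace(text, delimiters):
    """Hypothetical helper: normalise every delimiter to the first one, then split."""
    first, *rest = delimiters  # raises ValueError if delimiters is empty
    for delimiter in rest:
        # One replace() pass per additional delimiter.
        text = text.replace(delimiter, first)
    return text.split(first)

parts = split_by_replace('tutorial_reference!dot-com', ['!', '_', '-'])
print(parts)  # ['tutorial', 'reference', 'dot', 'com']
```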
Creating a Reusable Function
If you are performing the split operation often, it is best to create a reusable function that can split strings by multiple delimiters:
import re
def split_multiple(string, delimiters):
    # Escape each delimiter and join them with | to build an "OR" pattern
    pattern = '|'.join(map(re.escape, delimiters))
    return re.split(pattern, string)
my_str = 'tutorial,reference-dot:com'
print(split_multiple(my_str, [',', '-', ':'])) # Output: ['tutorial', 'reference', 'dot', 'com']
- re.escape is used to escape special regex characters in the delimiters, so that characters such as . or | are matched literally.
- The join method combines the escaped delimiters with | to build the regex pattern.
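The effect of escaping is easiest to see with a delimiter that is also a regex metacharacter. The sketch below first contrasts an unescaped and an escaped period, then shows one possible variant of the reusable function. The name split_clean, its longest-first ordering, and the removal of empty strings are assumptions added for illustration, not part of the function above.

```python
import re

# Unescaped, '.' is a wildcard, so every character becomes a delimiter.
unescaped = re.split('.', 'a.b')
print(unescaped)  # ['', '', '', '']

# Escaped, '.' only matches a literal period.
escaped = re.split(re.escape('.'), 'a.b')
print(escaped)    # ['a', 'b']

def split_clean(text, delimiters):
    """Hypothetical helper: split on any delimiter and drop empty strings."""
    # Try longer delimiters first so that ';;;' wins over ';'.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '|'.join(re.escape(d) for d in ordered)
    return [part for part in re.split(pattern, text) if part]

cleaned = split_clean('a.b|c*d;;;e;', ['.', '|', '*', ';', ';;;'])
print(cleaned)    # ['a', 'b', 'c', 'd', 'e']
```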