
How to Split Strings by Multiple Delimiters in Python

Python's built-in str.split() method is excellent for splitting strings by a single delimiter. However, it accepts only one separator at a time, so splitting a string on several different delimiters requires a more powerful tool.

This guide explains how to split strings using multiple delimiters in Python, focusing on the most efficient and readable methods: regular expressions (re.split()) and, for simpler cases, chained replace() calls followed by split().

The re.split() function from the re (regular expression) module is the most flexible and robust way to split a string by multiple delimiters.

import re

my_str = 'tutorial,reference-dot,com'
my_list = re.split(r',|-', my_str) # Split on comma OR hyphen
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
  • re.split(pattern, string): Splits the string wherever the pattern matches.
  • r',|-': This is the regular expression pattern. The | character acts as an "OR" operator. This pattern means "split on a comma or a hyphen".
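If you apply the same pattern to many strings, you can also compile it once with re.compile() and call the compiled pattern's split() method, which behaves like re.split() with that pattern. A minimal sketch (the name delimiter_pattern is only illustrative). The re module already caches recently used patterns, so the benefit here is mostly readability rather than speed:

```python
import re

# Compile the "comma OR hyphen" pattern once; the compiled object's
# .split() method behaves like re.split() with the same pattern.
delimiter_pattern = re.compile(r',|-')

for text in ['tutorial,reference-dot,com', 'a-b,c']:
    print(delimiter_pattern.split(text))
# ['tutorial', 'reference', 'dot', 'com']
# ['a', 'b', 'c']
```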

Using Character Sets ([])

For simple delimiters (single characters), you can often use a character set within square brackets [] for a more concise pattern:

import re

my_str = 'tutorial,reference-dot:com'
my_list = re.split(r'[-,:]', my_str) # Split on hyphen, comma, OR colon
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
  • r'[-,:]': This pattern means "split on any one of the characters inside the brackets: hyphen, comma, or colon." The hyphen comes first because, placed between two other characters inside [], a hyphen defines a range instead of matching itself. For single-character delimiters, this is often cleaner than using the | operator.
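The position of the hyphen inside a character set matters. Between two characters, - defines a range, so [,-:] matches every character from , to : in code-point order, which also includes ., / and the digits 0-9. A hyphen placed first or last in the set (or escaped as \-) is matched literally. A small comparison, using an illustrative input string:

```python
import re

# ',-:' is a RANGE from ',' to ':', so '.' is also treated as a delimiter.
range_pattern = r'[,-:]'
# A leading hyphen is a literal hyphen: only '-', ',' and ':' split.
literal_pattern = r'[-,:]'

text = 'a.b,c'
print(re.split(range_pattern, text))    # ['a', 'b', 'c']
print(re.split(literal_pattern, text))  # ['a.b', 'c']
```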

Using the OR Operator (|)

If some of your delimiters are longer than one character, use the OR operator | instead, because a character set can only match single characters:

import re

my_str = 'tutorial,reference-dot;;;com'
my_list = re.split(r',|-|;;;', my_str) # Split on comma, hyphen, OR triple semicolon
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
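When one delimiter is a prefix of another, the order of the alternatives matters: at each position, the regex engine tries the alternatives from left to right and uses the first one that matches. The sketch below assumes a hypothetical input in which both ; and ;;; are delimiters:

```python
import re

text = 'a;;;b'

# ';' is tried first and always matches, so ';;;' is never used
# and the run is split three times, leaving empty strings.
print(re.split(r';|;;;', text))  # ['a', '', '', 'b']

# Listing the longer delimiter first matches the whole run at once.
print(re.split(r';;;|;', text))  # ['a', 'b']
```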

Handling Leading/Trailing Delimiters and Empty Strings

If your string starts or ends with a delimiter, re.split() returns an empty string at the beginning or end of the resulting list:

import re

my_str = ',tutorial,reference-dot:com:'
my_list = re.split(r'[-,:]', my_str)
print(my_list) # Output: ['', 'tutorial', 'reference', 'dot', 'com', '']

To remove these empty strings, filter the result with a list comprehension or with the built-in filter() function:

import re

my_str = ',tutorial,reference-dot:com:'
my_list = [item for item in re.split(r'[-,:]', my_str) if item] # List comprehension
# or: my_list = list(filter(None, re.split(r'[-,:]', my_str))) # Equivalent, using filter()
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
  • The list comprehension splits the string and keeps only the truthy (non-empty) strings. filter(None, ...) does the same by discarding every falsy item; the comprehension is usually considered easier to read.
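Empty strings also appear in the middle of the list when two delimiters are adjacent. Adding the + quantifier to a character set treats each run of consecutive delimiters as a single split point, as sketched below with an illustrative string. Note that a trailing delimiter still produces an empty string at the end, so filtering is still useful in that case:

```python
import re

text = 'a,,b--c:'

# One split point per delimiter character: adjacent delimiters
# leave empty strings between them.
print(re.split(r'[-,:]', text))   # ['a', '', 'b', '', 'c', '']

# '+' makes each run of delimiters a single split point.
print(re.split(r'[-,:]+', text))  # ['a', 'b', 'c', '']
```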

Splitting by Multiple Delimiters with str.replace() and split() (Limited Cases)

For very simple cases, where you have only two delimiters and want to treat them the same way, you can first replace one delimiter with the other and then call split():

my_str = 'tutorial_reference!dot_com'
my_list = my_str.replace('_', '!').split('!')
print(my_list) # Output: ['tutorial', 'reference', 'dot', 'com']
  • First, all underscores are replaced with exclamation marks.
  • Then, the string is split on exclamation marks.

Limitations: This approach is not generally recommended because:

  • It only works reliably if you want to treat all delimiters as equivalent.
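  • It becomes cumbersome with more than two delimiters, because every extra delimiter needs another chained .replace() call, and each call scans the whole string again.
  • If you replace delimiters with a placeholder that is not itself one of the delimiters, any occurrence of that placeholder already in the string also becomes a split point.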
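If you do want to stay with str.replace() for more than two delimiters, a loop avoids writing out each chained call. The helper below, split_by_replace(), is a hypothetical sketch that assumes a non-empty list of non-empty delimiters. It rewrites every delimiter to the first one and then splits on that, so the limitations listed above still apply:

```python
def split_by_replace(text, delimiters):
    """Hypothetical helper: normalize all delimiters to the first, then split.

    Assumes `delimiters` is a non-empty list of non-empty strings.
    """
    target = delimiters[0]
    for delimiter in delimiters[1:]:
        # Each replace() call scans the whole string once.
        text = text.replace(delimiter, target)
    return text.split(target)


print(split_by_replace('tutorial_reference!dot-com', ['!', '_', '-']))
# ['tutorial', 'reference', 'dot', 'com']
```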

Creating a Reusable Function

If you split strings by multiple delimiters often, it helps to wrap the logic in a reusable function:

import re

def split_multiple(string, delimiters):
    # Escape each delimiter and join them into one "OR" pattern
    pattern = '|'.join(map(re.escape, delimiters))
    return re.split(pattern, string)

my_str = 'tutorial,reference-dot:com'

print(split_multiple(my_str, [',', '-', ':'])) # Output: ['tutorial', 'reference', 'dot', 'com']
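  • re.escape() escapes regex metacharacters (such as ., | or *) in each delimiter, so every delimiter is matched literally.
  • '|'.join() combines the escaped delimiters into a single pattern with the | operator between them.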
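One caveat with split_multiple() is that '|'.join() keeps the delimiters in the order you pass them, so a shorter delimiter that is a prefix of a longer one (for example ; listed before ;;;) matches first and the longer one never matches as a whole. The variant below, split_multiple_sorted(), is a hypothetical sketch rather than a standard function: it sorts the delimiters longest-first, returns the string unchanged in a list when no delimiters are given instead of splitting on an empty pattern, and can optionally drop empty strings. It assumes every delimiter is a non-empty string:

```python
import re


def split_multiple_sorted(text, delimiters, drop_empty=False):
    """Hypothetical variant of split_multiple().

    Sorts delimiters longest-first so that a delimiter which is a prefix
    of another (e.g. ';' and ';;;') does not shadow the longer one.
    Assumes every delimiter is a non-empty string.
    """
    if not delimiters:
        # Nothing to split on: return the whole string as one item.
        return [text]
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = '|'.join(map(re.escape, ordered))
    parts = re.split(pattern, text)
    if drop_empty:
        # Keep only non-empty strings, like the list comprehension earlier.
        parts = [part for part in parts if part]
    return parts


print(split_multiple_sorted('a;;;b;c.d', [';', ';;;', '.']))
# ['a', 'b', 'c', 'd']
print(split_multiple_sorted(',a,b,', [','], drop_empty=True))
# ['a', 'b']
```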