
How to Split Strings Without Removing Delimiters in Python

This guide explains how to split a string in Python while keeping the delimiter(s) as part of the resulting list elements. We'll cover the recommended approach using regular expressions with re.split(), and then discuss less efficient alternatives (using a loop and str.split()) for specific situations.

The re.split() function from the re (regular expression) module is the best way to split a string and keep the delimiters. By using a capturing group in the regular expression, the delimiters are included in the result.

import re

my_str = 'tutorial_reference_com'
my_list = re.split(r'(_)', my_str) # Split on '_', but KEEP the '_'
print(my_list) # Output: ['tutorial', '_', 'reference', '_', 'com']
  • re.split(r'(_)', my_str):

    • r'(_)': This is the regular expression.
      • _: Matches the literal underscore character (our delimiter).
      • (...): Parentheses create a capturing group. This is the key to keeping the delimiter. When you use a capturing group in re.split(), the captured text (the delimiter, in this case) is included in the resulting list.
  • This pattern splits on one specific delimiter, but it can be adapted to handle several.

    If your delimiter is more complex than a single character, you can use any valid regular expression within the capturing group. For example:

    import re
    my_str = 'tutorial-reference,com'
    my_list = re.split(r'([-;,])', my_str) # Split on '-', ';', or ','
    print(my_list) # Output: ['tutorial', '-', 'reference', ',', 'com']

    my_str = "one,two;three-four"
    print(re.split(r'([,;-])', my_str)) # Output: ['one', ',', 'two', ';', 'three', '-', 'four']

    my_str = 'tutorial123reference'
    print(re.split(r'(\d+)', my_str)) # Output: ['tutorial', '123', 'reference']
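To make the role of the capturing group concrete, the sketch below runs the same pattern with and without parentheses, and then shows what happens when the string starts or ends with the delimiter. The sample strings and variable names are made up for illustration.

```python
import re

text = 'tutorial_reference_com'

# Without a capturing group, the delimiter is discarded.
without_group = re.split(r'_', text)
print(without_group)  # ['tutorial', 'reference', 'com']

# With a capturing group, the delimiter is kept as its own element.
with_group = re.split(r'(_)', text)
print(with_group)  # ['tutorial', '_', 'reference', '_', 'com']

# A leading or trailing delimiter produces empty strings at the edges.
edges = re.split(r'(_)', '_a_')
print(edges)  # ['', '_', 'a', '_', '']

# Filter the empty strings out if they are not wanted.
non_empty = [part for part in edges if part]
print(non_empty)  # ['_', 'a', '_']
```

Because every piece of the input ends up in the list, joining the unfiltered result with ''.join() gives back the original string.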

Splitting and Keeping Delimiters with a for Loop (Less Efficient)

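You can get a similar result with a for loop and str.split(), with each delimiter attached to the end of the preceding item instead of stored as a separate element. This approach takes more code and is easier to get wrong: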

my_str = 'tutorial_reference_com'
my_list = []
delimiter = '_'

for item in my_str.split(delimiter):
    if item: # Avoid empty strings
        my_list.append(item + delimiter) # Add the delimiter back

print(my_list) # Output: ['tutorial_', 'reference_', 'com_']
  • This code splits the string using the standard split() method, losing the delimiter.
  • It then iterates through the resulting parts and appends the delimiter back to each one, which means an extra pass over the data and an extra string concatenation per item.
  • This code also has a problem: it adds the delimiter to the last element, which is often undesirable. You'd need extra logic to handle this.
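One way to add that extra logic is to re-attach the delimiter to every part except the last. The helper below is a minimal sketch; the name split_keep_delimiter is hypothetical and not part of the standard library. Unlike the loop above, it keeps the parts between consecutive delimiters, so the pieces always join back into the original string.

```python
def split_keep_delimiter(text, delimiter):
    """Split text on delimiter, leaving the delimiter at the end of each piece.

    Hypothetical helper for illustration only.
    """
    parts = text.split(delimiter)
    # Every part except the last one was followed by a delimiter in the input.
    result = [part + delimiter for part in parts[:-1]]
    # The last part has no delimiter after it; skip it if it is empty.
    if parts[-1]:
        result.append(parts[-1])
    return result


print(split_keep_delimiter('tutorial_reference_com', '_'))
# ['tutorial_', 'reference_', 'com']

print(split_keep_delimiter('a__b', '_'))
# ['a_', '_', 'b']
```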

Removing Trailing Delimiter

You can remove the trailing delimiter using rstrip() or string slicing.

Using rstrip():

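The rstrip() method strips the trailing delimiter from the last element. It treats its argument as a set of characters and removes every trailing occurrence of them, not just one.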

my_str = 'tutorial_reference_com'
my_list = []
delimiter = '_'
for item in my_str.split(delimiter):
    if item:
        my_list.append(item + delimiter)

print(my_list) # Output: ['tutorial_', 'reference_', 'com_']
my_list[-1] = my_list[-1].rstrip(delimiter) # Remove delimiter from last item

print(my_list) # Output: ['tutorial_', 'reference_', 'com']
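The character-set behavior of rstrip() matters once the delimiter is longer than one character, or when the last item itself ends with a delimiter character. The sketch below uses a made-up '--' delimiter to show the difference, and compares it with str.removesuffix(), which is available from Python 3.9 and removes the exact suffix at most once.

```python
delimiter = '--'
# The item 'com-' with the delimiter re-added, as the loop would produce it.
last_item = 'com-' + delimiter

# rstrip() strips every trailing '-' because it works on a character set.
stripped = last_item.rstrip(delimiter)
print(stripped)  # com

# removesuffix() (Python 3.9+) removes only the exact trailing '--'.
exact = last_item.removesuffix(delimiter)
print(exact)  # com-
```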

Using slicing:

We take the last item in the list, slice off its final len(delimiter) characters, and assign the result back to the last index.

my_str = 'tutorial_reference_com'
my_list = []
delimiter = '_'
for item in my_str.split(delimiter):
    if item:
        my_list.append(item + delimiter)

my_list[-1] = my_list[-1][:-len(delimiter)] # Remove delimiter
print(my_list) # Output: ['tutorial_', 'reference_', 'com']
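For comparison, re.split() can produce the same list shape directly with a zero-width lookbehind. This is an alternative sketch rather than the loop-based method shown above, and it assumes Python 3.7 or later, where re.split() supports patterns that match an empty string.

```python
import re

my_str = 'tutorial_reference_com'

# (?<=_) is a zero-width lookbehind: it matches the empty position right
# after each '_', so nothing is consumed and the '_' stays with the text
# that precedes it.
my_list = re.split(r'(?<=_)', my_str)
print(my_list)  # ['tutorial_', 'reference_', 'com']
```

No trailing delimiter is added here, so there is nothing to strip afterwards.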