How to Split a String into Text and Number in Python

A frequent string processing task involves separating a string into its alphabetic and numeric components.

This guide explores various methods for splitting a string into text and number portions in Python, covering techniques using regular expressions, string methods, and character-by-character processing.

Splitting with Regular Expressions (re.split())

The re.split() function splits a string wherever a regular expression pattern matches. When the pattern contains a capturing group, the matched text (here, the digits) is also included in the resulting list.

import re

my_str = 'hello123'
my_list = re.split(r'(\d+)', my_str)
print(my_list) # Output: ['hello', '123', '']

my_str = '123hello'
my_list = re.split(r'(\d+)', my_str)
print(my_list) # Output: ['', '123', 'hello']

my_str = 'hello123.abc'
my_list = list(filter(None, re.split(r'(\d+)', my_str)))
print(my_list) # Output: ['hello', '123', '.', 'abc']
  • re.split(r'(\d+)', my_str) splits the string at each run of one or more digits (\d+). The parentheses form a capturing group, so the digits themselves are kept in the result.
  • The list(filter(None, ...)) removes any empty strings from the list, which can happen if the string starts or ends with digits.
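The same pattern also handles strings with several alternating runs of text and digits. The following is a minimal sketch; the helper name split_runs and the conversion of digit runs to int are illustrative assumptions, not part of the original examples.

```python
import re

def split_runs(text):
    # The capturing group keeps the digit runs in the result;
    # filter(None, ...) drops the empty strings at the edges.
    return list(filter(None, re.split(r'(\d+)', text)))

parts = split_runs('abc123def456')
print(parts)  # ['abc', '123', 'def', '456']

# Optionally convert the digit runs to int and leave the rest as strings.
# str.isdecimal() is True only for pieces made up entirely of decimal digits.
typed = [int(p) if p.isdecimal() else p for p in parts]
print(typed)  # ['abc', 123, 'def', 456]
```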

Splitting with Regular Expressions (re.match())

The re.match() function checks whether a string matches a given format and, through capturing groups, extracts the individual sections, which can then be stored in a list. re.match() only succeeds if the match starts at the beginning of the string.

import re

my_str = 'hello123'
match = re.match(r'([a-zA-Z]+)([0-9]+)', my_str)
my_list = []

if match:
    my_list = list(match.groups())

print(my_list) # Output: ['hello', '123']
  • re.match() attempts to match the regex from the start of the string, so a string that starts with a number does not match this pattern and my_list stays empty.
  • The [a-zA-Z] expression matches any lowercase or uppercase ASCII letter.
  • The parentheses define two capturing groups. The .groups() method returns them as a tuple, which the list() constructor converts to a list before it is stored in my_list.
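Because re.match() only anchors the start of the string, it still accepts input with trailing characters after the digits. If the whole string must have the text-then-number shape, re.fullmatch() is one option. The sketch below uses named groups for readability; the names PATTERN, split_strict, text, and number are hypothetical.

```python
import re

# Letters first, then digits, and nothing else in the string.
PATTERN = re.compile(r'(?P<text>[a-zA-Z]+)(?P<number>[0-9]+)')

def split_strict(value):
    # fullmatch() requires the pattern to cover the entire string.
    match = PATTERN.fullmatch(value)
    if match is None:
        return None
    return [match.group('text'), match.group('number')]

print(split_strict('hello123'))   # ['hello', '123']
print(split_strict('123hello'))   # None
print(split_strict('hello123!'))  # None
```

Returning None for non-matching input is a design choice made here for illustration; raising an exception or returning an empty list would work as well.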

Splitting with Regular Expressions (re.findall())

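The re.findall() function returns a list of all non-overlapping matches of a pattern. When the pattern contains more than one capturing group, each match is returned as a tuple of groups: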

import re

def split_text_number(string):
    return list(re.findall(r'(\w+?)(\d+)', string)[0])

print(split_text_number('abc123')) # Output: ['abc', '123']
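  • The pattern (\w+?)(\d+) lazily matches one or more word characters (letters, digits, or underscores), followed by one or more digits.
  • re.findall() returns a list of tuples, one per match. The function takes the first tuple with [0] and converts it to a list with the list() constructor.
  • The function returns the text and the number from the first match. If the string contains no match, indexing with [0] raises an IndexError.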
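When the order of text and digits is not known in advance, or the string may contain no match at all, a pattern without capturing groups makes re.findall() return the matched substrings directly. This is a sketch of that variant; the function name find_runs is hypothetical, and the pattern is limited to ASCII letters by assumption.

```python
import re

def find_runs(value):
    # No capturing groups, so findall() returns the matched substrings:
    # each run of ASCII letters or each run of digits, in order.
    return re.findall(r'[A-Za-z]+|\d+', value)

print(find_runs('abc123'))    # ['abc', '123']
print(find_runs('123abc'))    # ['123', 'abc']
print(find_runs('ab12cd34'))  # ['ab', '12', 'cd', '34']
print(find_runs('---'))       # []
```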

Splitting with String Method rstrip()

If the number is always at the end of the string, strip the trailing digits with rstrip() to get the text, then slice the original string to get the digits:

def split_into_str_num(string):
    letters = string.rstrip('0123456789')
    numbers = string[len(letters):]
    return [letters, numbers]

my_str = 'hello123'
print(split_into_str_num(my_str)) # Output: ['hello', '123']
  • str.rstrip('0123456789') removes all trailing digits from the string.
  • To get the digits, simply slice the string from the length of the letters string until the end: string[len(letters):]
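This approach only looks at the end of the string. The sketch below shows what it returns for inputs without trailing digits, plus a mirrored version built on lstrip() for strings that start with the number. Both function names are hypothetical.

```python
def split_trailing_digits(value):
    # Same technique as above: strip trailing digits, then slice the remainder.
    letters = value.rstrip('0123456789')
    return [letters, value[len(letters):]]

print(split_trailing_digits('hello123'))  # ['hello', '123']
print(split_trailing_digits('123hello'))  # ['123hello', '']
print(split_trailing_digits('a1b22'))     # ['a1b', '22']

def split_leading_digits(value):
    # Mirror image: strip leading digits, then slice off what was removed.
    rest = value.lstrip('0123456789')
    numbers = value[:len(value) - len(rest)]
    return [numbers, rest]

print(split_leading_digits('123hello'))   # ['123', 'hello']
```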

Splitting with a for Loop

You can also use a for loop and string methods to append text and numbers to different strings as you iterate through a string.

def split_text_number(string):
    letters = ''
    numbers = ''
    for char in string:
        if char.isalpha():
            letters += char
        elif char.isdigit():
            numbers += char
    return [letters, numbers]

print(split_text_number('abc123')) # Output: ['abc', '123']
print(split_text_number('123abc')) # Output: ['abc', '123']
print(split_text_number('abc123.<#')) # Output: ['abc', '123']
  • The result always lists the letters first, regardless of where the digits appear in the input.
  • Characters that are neither letters nor digits, such as . or <, match neither branch and are dropped.
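If characters that are neither letters nor digits should be kept with the text, a variant of the loop can test only for digits and send every other character to the text part. A minimal sketch, with the hypothetical name split_keep_other:

```python
def split_keep_other(value):
    text_chars = []
    digit_chars = []
    for char in value:
        if char.isdigit():
            digit_chars.append(char)
        else:
            # Letters, punctuation, and whitespace all land here.
            text_chars.append(char)
    return [''.join(text_chars), ''.join(digit_chars)]

print(split_keep_other('abc123.<#'))  # ['abc.<#', '123']
print(split_keep_other('a1b2'))       # ['ab', '12']
```

Collecting characters in lists and joining them once at the end avoids repeated string concatenation, which is a common idiom for longer inputs.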

Splitting Lists of Strings into Text and Number

To split a list containing strings, each consisting of some text and a number, use a list comprehension:

import re

my_list = ['123ab', '456cd', '789ef']

result = [
    list(filter(None, re.split(r'(\d+)', item)))
    for item in my_list
]

print(result) # Output: [['123', 'ab'], ['456', 'cd'], ['789', 'ef']]
  • The list comprehension iterates over the items of my_list and calls re.split(r'(\d+)', item) on each one to split it at runs of digits.
  • list(filter(None, ...)) then removes the falsy values, which here are the empty strings.
  • The result is a list of lists. Each inner list holds the text and digit parts of one string, in their original order.
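For longer lists, the pattern can be compiled once and wrapped in a small helper. The sketch below also shows that items without any digits come back as a single-element list. The names RUNS and split_item are hypothetical, and the sample list is made up for illustration.

```python
import re

# Compile once and reuse for every item.
RUNS = re.compile(r'(\d+)')

def split_item(item):
    # Keep only the non-empty pieces produced by the split.
    return [part for part in RUNS.split(item) if part]

items = ['123ab', 'cd456', 'ef']
result = [split_item(item) for item in items]
print(result)  # [['123', 'ab'], ['cd', '456'], ['ef']]
```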