How to Split a String into Text and Number in Python
A frequent string processing task involves separating a string into its alphabetic and numeric components.
This guide explores various methods for splitting a string into text and number portions in Python, covering techniques using regular expressions, string methods, and character-by-character processing.
Splitting with Regular Expressions (re.split())
The re.split() function splits a string wherever a regular expression pattern matches. If the pattern contains a capturing group, the matched delimiter is also included in the resulting list.
import re
my_str = 'hello123'
my_list = re.split(r'(\d+)', my_str)
print(my_list) # Output: ['hello', '123', '']
my_str = '123hello'
my_list = re.split(r'(\d+)', my_str)
print(my_list) # Output: ['', '123', 'hello']
my_str = 'hello123.abc'
my_list = list(filter(None, re.split(r'(\d+)', my_str)))
print(my_list) # Output: ['hello', '123', '.abc']
- re.split(r'(\d+)', my_str) splits the string at each run of one or more digits (\d+). The parentheses in the pattern form a capturing group, so the digits themselves are kept in the result.
- list(filter(None, ...)) removes any empty strings from the list, which appear when the string starts or ends with digits.
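To see why the parentheses matter, the short sketch below runs the same pattern with and without a capturing group. The variable names are illustrative only.

```python
import re

sample = 'hello123world'

# Without a capturing group, the digits act as a plain delimiter and are dropped.
without_group = re.split(r'\d+', sample)

# With a capturing group, the matched digits are kept as their own list items.
with_group = re.split(r'(\d+)', sample)

print(without_group)  # ['hello', 'world']
print(with_group)     # ['hello', '123', 'world']
```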
Splitting with Regular Expressions (re.match())
The re.match() function checks whether a string matches a given pattern and, through capturing groups, extracts its parts so they can be stored in a list. re.match() only matches at the beginning of the string.
import re
my_str = 'hello123'
match = re.match(r'([a-zA-Z]+)([0-9]+)', my_str)
my_list = []
if match:
    my_list = list(match.groups())
print(my_list) # Output: ['hello', '123']
- re.match() attempts to match the pattern from the start of the string, so a string that starts with a number, such as '123hello', does not match and my_list stays empty.
- The [a-zA-Z] expression matches any lowercase or uppercase ASCII letter.
- The parentheses around the two groups let us extract each part with the .groups() method, which returns a tuple. The tuple is converted with the list() constructor and stored in my_list.
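Note that re.match() only anchors the start of the string, so trailing characters after the digits are silently ignored. As a stricter variant, the sketch below swaps in re.fullmatch(), which requires the whole string to fit the pattern. The helper name split_prefix_number is hypothetical, and returning None for non-matching input is an assumption about what a caller might want.

```python
import re

def split_prefix_number(value):
    """Return [text, number] if value is letters followed by digits, else None."""
    # fullmatch requires the entire string to fit the pattern, not just a prefix.
    match = re.fullmatch(r'([a-zA-Z]+)([0-9]+)', value)
    return list(match.groups()) if match else None

print(split_prefix_number('hello123'))   # ['hello', '123']
print(split_prefix_number('123hello'))   # None
print(split_prefix_number('hello123!'))  # None
```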
Splitting with Regular Expressions (re.findall())
The re.findall() function finds all non-overlapping occurrences of a pattern and returns them as a list:
import re
def split_text_number(string):
    return list(re.findall(r'(\w+?)(\d+)', string)[0])
print(split_text_number('abc123')) # Output: ['abc', '123']
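- The pattern (\w+?)(\d+) has two groups: \w+? lazily matches word characters (letters, digits, or underscores) and \d+ matches the run of digits that follows.
- Because the pattern contains two groups, re.findall() returns a list of tuples. The function takes the first tuple and converts it to a list with the list() constructor.
- The function returns a list with the text and the number from the first match, and raises an IndexError if the string contains no match.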
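When a string may contain several alternating runs of text and digits, one possible variation is to let re.findall() return every run instead of indexing the first match. The sketch below uses an alternation pattern without groups, so the result is a flat list of strings. The function name split_all_segments is hypothetical, and the pattern assumes ASCII letters only.

```python
import re

def split_all_segments(value):
    # Each match is either a run of ASCII letters or a run of digits;
    # any other character is ignored.
    return re.findall(r'[A-Za-z]+|\d+', value)

print(split_all_segments('abc123def45'))  # ['abc', '123', 'def', '45']
print(split_all_segments('no digits'))    # ['no', 'digits']
print(split_all_segments(''))             # []
```

Because an empty input simply yields an empty list, this version does not raise an error when nothing matches.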
Splitting with String Method rstrip()
To split a string that ends with digits, strip the trailing digits to get the text, then slice off what was removed to get the number:
def split_into_str_num(string):
    letters = string.rstrip('0123456789')
    numbers = string[len(letters):]
    return [letters, numbers]
my_str = 'hello123'
print(split_into_str_num(my_str)) # Output: ['hello', '123']
- str.rstrip('0123456789') removes all trailing digits from the string.
- To get the digits, slice the string from the length of the letters string to the end: string[len(letters):]
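If the numeric part is needed as an actual number, the same rstrip() approach can be extended with an int() conversion. The following is a minimal sketch; the function name split_trailing_number and the choice to return None when there are no trailing digits are assumptions made for illustration.

```python
def split_trailing_number(value):
    """Split off trailing ASCII digits; return (text, int or None)."""
    text = value.rstrip('0123456789')
    digits = value[len(text):]
    # An empty digits string means there was no trailing number.
    number = int(digits) if digits else None
    return text, number

print(split_trailing_number('hello123'))  # ('hello', 123)
print(split_trailing_number('hello'))     # ('hello', None)
print(split_trailing_number('123hello'))  # ('123hello', None)
```

Keep in mind that int() discards leading zeros, so keep the digits as a string if values such as '007' must be preserved exactly.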
Splitting with a for Loop
You can also use a for loop and string methods to append text and numbers to different strings as you iterate through a string.
def split_text_number(string):
    letters = ''
    numbers = ''
    for char in string:
        if char.isalpha():
            letters += char
        elif char.isdigit():
            numbers += char
    return [letters, numbers]
print(split_text_number('abc123')) # Output: ['abc', '123']
print(split_text_number('123abc')) # Output: ['abc', '123']
print(split_text_number('abc123.<#')) # Output: ['abc', '123']
- Characters that are neither letters nor digits, such as . < and #, match neither branch and are dropped from the result.
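The loop above always returns the letters first, regardless of where they appear in the input. If the original order matters, one alternative is to group consecutive characters by kind with itertools.groupby. This is a sketch of a different technique, not the loop-based method itself; split_runs and kind are hypothetical names.

```python
from itertools import groupby

def split_runs(value):
    """Group consecutive characters by kind, preserving their original order."""
    def kind(char):
        if char.isdigit():
            return 'digit'
        if char.isalpha():
            return 'alpha'
        return 'other'
    # groupby starts a new group whenever the key changes between neighbours.
    return [''.join(group) for _, group in groupby(value, key=kind)]

print(split_runs('abc123'))     # ['abc', '123']
print(split_runs('123abc'))     # ['123', 'abc']
print(split_runs('abc123.<#'))  # ['abc', '123', '.<#']
```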
Splitting Lists of Strings into Text and Number
To split a list containing strings, each consisting of some text and a number, use a list comprehension:
import re
my_list = ['123ab', '456cd', '789ef']
result = [
    list(filter(None, re.split(r'(\d+)', item)))
    for item in my_list
]
print(result) # Output: [['123', 'ab'], ['456', 'cd'], ['789', 'ef']]
- The list comprehension goes through each item of my_list and calls re.split(r'(\d+)', item) to split it at the digits.
- list(filter(None, ...)) then removes the falsy values, that is, the empty strings.
- The result is a list of lists, where each inner list holds the digit and alphabetic parts of one string in their original order.
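When every item is expected to have the same digits-then-letters shape, it can be convenient to validate each item and convert the number at the same time. The sketch below assumes that shape and skips items that do not fit it; the names labels and pairs are illustrative.

```python
import re

labels = ['123ab', '456cd', '789ef']

pairs = []
for item in labels:
    match = re.fullmatch(r'(\d+)([A-Za-z]+)', item)
    if match:  # skip items that do not fit the digits-then-letters shape
        number, text = match.groups()
        pairs.append((int(number), text))

print(pairs)  # [(123, 'ab'), (456, 'cd'), (789, 'ef')]
```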