How to Find All Indices of a Substring in Python
Locating all occurrences of a substring within a larger string and getting their starting indices is a common task in text processing.
This guide explores several effective methods for finding all indices of a substring in Python: list comprehensions with startswith(), regular expressions with re.finditer(), and manual iteration with str.find().
Finding Indices with startswith() and List Comprehension
You can use a list comprehension with startswith() to find all starting indices of a substring:
string = 'tutorial reference tutorialreference.com'
substring = 'tutorial'
indices = [
    index for index in range(len(string))
    if string.startswith(substring, index)
]
print(indices) # Output: [0, 19]
- The list comprehension iterates through each possible starting index (0 to len(string) - 1).
- string.startswith(substring, index) checks whether the string starts with the substring at the current index.
- If it does, the index is added to the indices list.
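Because every index is tested, this approach also reports overlapping matches. The sketch below wraps the comprehension in a helper to show that. The function name find_indices_startswith is an assumption made for illustration, and the shortened range is an optional refinement that skips positions where the substring can no longer fit.

```python
def find_indices_startswith(text, substring):
    """Return every index at which substring starts in text (hypothetical helper)."""
    # Positions beyond len(text) - len(substring) cannot hold a full match.
    last_start = len(text) - len(substring)
    return [
        index for index in range(last_start + 1)
        if text.startswith(substring, index)
    ]


print(find_indices_startswith('tutorial reference tutorialreference.com', 'tutorial'))
# [0, 19]
print(find_indices_startswith('tutututut', 'tut'))
# [0, 2, 4, 6]  (overlapping matches are included)
```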
Finding Indices with re.finditer() (Recommended)
The re.finditer() function from the re (regular expression) module is often a concise and idiomatic way to find all non-overlapping occurrences of a pattern and their positions:
import re
string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut' # Interpreted as a regex; wrap literal text in re.escape() if it may contain special characters
indices = [match.start() for match in re.finditer(substring_pattern, string)]
print(indices) # Output: [0, 19]
- re.finditer(pattern, string) returns an iterator of match objects for all non-overlapping matches of the pattern in the string.
- match.start() returns the starting index of each match.
- The list comprehension efficiently collects these starting indices.
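Since re.finditer() treats its first argument as a regular expression, a substring containing characters such as . or * will not be matched literally unless it is escaped. A minimal sketch using re.escape() follows. The helper name find_indices_literal is an assumption for illustration.

```python
import re


def find_indices_literal(text, substring):
    """Find non-overlapping occurrences of substring, treated as literal text."""
    pattern = re.escape(substring)  # neutralize regex metacharacters
    return [match.start() for match in re.finditer(pattern, text)]


text = 'tutorial reference tutorialreference.com'

# Escaped: only the literal dot is found.
print(find_indices_literal(text, '.'))  # [36]

# Unescaped: '.' is a regex wildcard and matches every character.
print(len(list(re.finditer('.', text))))  # 40
```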
Finding Indices with a for Loop and re.finditer()
You can achieve the same result as the list comprehension using a more explicit for loop:
import re
string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut'
indices = []
for match in re.finditer(substring_pattern, string):
    indices.append(match.start())
print(indices) # Output: [0, 19]
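Both re.finditer() versions report only non-overlapping matches. If overlapping matches are needed with a regular expression, one common option is a zero-width lookahead, which tests for the substring at each position without consuming it. This is a sketch of that alternative technique rather than something the examples above use. The helper name find_indices_overlapping is hypothetical.

```python
import re


def find_indices_overlapping(text, substring):
    """Find all occurrences of substring, including overlapping ones."""
    # (?=...) matches the empty string wherever the escaped substring follows.
    pattern = '(?=' + re.escape(substring) + ')'
    return [match.start() for match in re.finditer(pattern, text)]


print(find_indices_overlapping('tutututut', 'tut'))  # [0, 2, 4, 6]

# For comparison, a plain pattern skips past each match it consumes.
print([m.start() for m in re.finditer('tut', 'tutututut')])  # [0, 4]
```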
Finding Indices with a while Loop and str.find()
A manual approach using a while loop and the str.find() method can also find all indices. str.find() returns the index of the first occurrence of a substring at or after a given start index, or -1 if there is none.
def find_indices_with_overlap(a_string, substring):
    start = 0
    indices = []
    while True: # Loop indefinitely until break
        start = a_string.find(substring, start) # Find next occurrence
        if start == -1: # No more occurrences
            break
        indices.append(start)
        start += 1 # Move start index by 1 to find overlaps
    return indices
string = 'tutututut'
print(find_indices_with_overlap(string, 'tut')) # Output: [0, 2, 4, 6]
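The loop relies on the optional start argument of str.find() and on its -1 return value. The short sketch below shows both in isolation, using the same sample string.

```python
text = 'tutututut'

print(text.find('tut'))     # 0   (search from the beginning)
print(text.find('tut', 1))  # 2   (search from index 1 onward)
print(text.find('tut', 7))  # -1  (no occurrence at or after index 7)
```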
Finding Overlapping vs. Non-Overlapping Substrings
The while loop approach above finds overlapping substrings because start is only incremented by 1. To find only non-overlapping substrings (as re.finditer() does), increment start by the length of the substring:
def find_indices_no_overlap(a_string, substring):
    start = 0
    indices = []
    while True:
        start = a_string.find(substring, start)
        if start == -1:
            break
        indices.append(start)
        start += len(substring) # Move start index past the current match
    return indices
string = 'tutututut'
print(find_indices_no_overlap(string, 'tut')) # Output: [0, 4]
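The two functions differ only in how far start advances, so they can be folded into one. The sketch below is one possible way to do that. The name find_indices and the overlap parameter are assumptions for illustration. It also rejects an empty substring, because advancing by len(substring) would then be a step of 0 and the non-overlapping loop would never move forward.

```python
def find_indices(text, substring, overlap=False):
    """Return start indices of substring in text (hypothetical helper)."""
    if not substring:
        # A zero-length step would never advance past the first match.
        raise ValueError('substring must not be empty')

    step = 1 if overlap else len(substring)
    indices = []
    start = text.find(substring)
    while start != -1:
        indices.append(start)
        start = text.find(substring, start + step)
    return indices


print(find_indices('tutututut', 'tut'))                # [0, 4]
print(find_indices('tutututut', 'tut', overlap=True))  # [0, 2, 4, 6]
```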