How to Find All Indices of a Substring in Python

Locating all occurrences of a substring within a larger string and getting their starting indices is a common task in text processing.

This guide explores several effective methods for finding all indices of a substring in Python, using list comprehensions with startswith(), regular expressions with re.finditer(), and manual iteration with str.find().

Finding Indices with startswith() and List Comprehension

You can use a list comprehension with startswith() to find all starting indices of a substring:

string = 'tutorial reference tutorialreference.com'
substring = 'tutorial'

indices = [
    index for index in range(len(string))
    if string.startswith(substring, index)
]

print(indices) # Output: [0, 19]
  • The list comprehension iterates through each possible starting index (0 to len(string)-1).
  • string.startswith(substring, index) checks if the string starts with the substring at the current index.
  • If it does, the index is added to the indices list.
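
Because every index is tested independently, this approach also reports overlapping matches. The following is a minimal sketch of that behavior, wrapped in a helper whose name, find_all_startswith, is hypothetical and chosen only for this illustration. It also assumes it is acceptable to stop the range at the last index where the substring can still fit, which skips checks that can never succeed:

```python
def find_all_startswith(text, substring):
    # Hypothetical helper name, used only for this illustration.
    # The last index at which substring can still fit is len(text) - len(substring).
    last_start = len(text) - len(substring)
    return [
        index for index in range(last_start + 1)
        if text.startswith(substring, index)
    ]


print(find_all_startswith('tutorial reference tutorialreference.com', 'tutorial'))  # [0, 19]
print(find_all_startswith('tutututut', 'tut'))  # [0, 2, 4, 6] (overlapping matches included)
```

If the substring is longer than the text, the range is empty and the helper returns an empty list.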

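Finding Indices with re.finditer()

The re.finditer() function from the re (regular expression) module is a concise, idiomatic way to find all non-overlapping occurrences of a pattern and their positions. The pattern is interpreted as a regular expression, so a literal substring that may contain special characters such as . or * should be passed through re.escape() first: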

import re

string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut'  # Regex pattern; a plain string works when it has no special characters

indices = [match.start() for match in re.finditer(substring_pattern, string)]

print(indices) # Output: [0, 19]
  • re.finditer(pattern, string) returns an iterator of match objects for all non-overlapping matches of the pattern in the string.
  • match.start() returns the starting index of each match.
  • The list comprehension efficiently collects these starting indices.
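
When the substring comes from user input or contains characters that have a special meaning in regular expressions, the pattern can match more than intended. The sketch below uses a made-up sample string (not from the original example) to show the difference that re.escape() makes:

```python
import re

text = 'a.b axb a.b'   # sample text invented for this illustration
substring = 'a.b'

# Unescaped, '.' matches any character, so 'axb' is also reported.
raw_indices = [match.start() for match in re.finditer(substring, text)]

# re.escape() turns the substring into a pattern that matches it literally.
escaped_indices = [
    match.start() for match in re.finditer(re.escape(substring), text)
]

print(raw_indices)      # [0, 4, 8]
print(escaped_indices)  # [0, 8]
```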

Finding Indices with a for Loop and re.finditer()

You can achieve the same result as the list comprehension using a more explicit for loop:

import re

string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut'
indices = []

for match in re.finditer(substring_pattern, string):
    indices.append(match.start())

print(indices) # Output: [0, 19]
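
An explicit loop is convenient when more than the starting index is needed from each match. As a small sketch, match.span() returns both the start and the (exclusive) end index of a match; the variable name spans is chosen here only for illustration:

```python
import re

string = 'tutorial reference tutorialreference.com'
spans = []

for match in re.finditer(r'tut', string):
    # span() returns a (start, end) tuple; end is one past the last matched character
    spans.append(match.span())

print(spans)  # [(0, 3), (19, 22)]
```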

Finding Indices with a while Loop and str.find()

A manual approach using a while loop and the str.find() method can also find all indices. str.find(substring, start) returns the index of the first occurrence of the substring at or after start, or -1 if there is none.

def find_indices_with_overlap(a_string, substring):
    start = 0
    indices = []
    while True:  # Loop indefinitely until break
        start = a_string.find(substring, start)  # Find next occurrence
        if start == -1:  # No more occurrences
            break
        indices.append(start)
        start += 1  # Move start index by 1 to find overlaps
    return indices

string = 'tutututut'
print(find_indices_with_overlap(string, 'tut')) # Output: [0, 2, 4, 6]
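
The loop relies on two properties of str.find(): the optional second argument sets where the search begins, and a return value of -1 signals that nothing was found. A quick check on the same sample string illustrates both:

```python
text = 'tutututut'

print(text.find('tut'))     # 0  (search starts at index 0 by default)
print(text.find('tut', 1))  # 2  (first occurrence at or after index 1)
print(text.find('tut', 7))  # -1 (no occurrence at or after index 7)
```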

Finding Overlapping vs. Non-Overlapping Substrings

The while loop approach above finds overlapping substrings because start is only incremented by 1. To find only non-overlapping substrings (like re.finditer does), increment start by the length of the substring:

def find_indices_no_overlap(a_string, substring):
    start = 0
    indices = []
    # Advance by at least 1 so an empty substring cannot loop forever
    step = max(len(substring), 1)
    while True:
        start = a_string.find(substring, start)
        if start == -1:
            break
        indices.append(start)
        start += step  # Move start index past the current match
    return indices

string = 'tutututut'
print(find_indices_no_overlap(string, 'tut')) # Output: [0, 4]
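
Regular expressions can report overlapping matches too. One common technique, sketched here as an alternative rather than as part of the functions above, wraps the substring in a zero-width lookahead so that each match consumes no characters:

```python
import re

string = 'tutututut'
substring = 'tut'

# (?=...) is a lookahead: it succeeds where the substring begins but consumes
# nothing, so the next search can start one character later.
pattern = f'(?={re.escape(substring)})'

indices = [match.start() for match in re.finditer(pattern, string)]
print(indices)  # [0, 2, 4, 6]
```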