How to Find All Indices of a Substring in Python

Locating all occurrences of a substring within a larger string and getting their starting indices is a common task in text processing.

This guide explores several effective methods for finding all indices of a substring in Python, using list comprehensions with startswith(), regular expressions with re.finditer(), and manual iteration with str.find().

Finding Indices with `startswith()` and List Comprehension

You can use a list comprehension with startswith() to find all starting indices of a substring:

string = 'tutorial reference tutorialreference.com'
substring = 'tutorial'

indices = [
    index for index in range(len(string))
    if string.startswith(substring, index)
]

print(indices)  # Output: [0, 19]

The list comprehension iterates through each possible starting index (0 to len(string) - 1).
string.startswith(substring, index) checks whether the substring occurs at that index.
If it does, the index is added to the indices list. Because every index is tested, overlapping occurrences are reported as well.
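
The same check can be wrapped in a reusable helper. The sketch below is illustrative only: the function name find_indices_startswith is an assumption, not something from the original example. It stops the range at the last position where the substring could still fit, and the second call shows what it returns for a string whose occurrences overlap.

```python
def find_indices_startswith(string, substring):
    """Return every index at which `substring` begins in `string`."""
    # Positions past len(string) - len(substring) cannot hold a full match.
    last_start = len(string) - len(substring)
    return [
        index for index in range(last_start + 1)
        if string.startswith(substring, index)
    ]


print(find_indices_startswith('tutorial reference tutorialreference.com', 'tutorial'))  # [0, 19]
print(find_indices_startswith('tutututut', 'tut'))  # [0, 2, 4, 6]
```

If the substring is longer than the string, the range is empty and the helper returns an empty list.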

Finding Indices with `re.finditer()` (Recommended)

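The re.finditer() function from the re (regular expression) module is a concise, idiomatic way to find all occurrences and their positions. Its first argument is treated as a regular expression, so a literal substring containing special characters such as . or * must first be passed through re.escape(). For a plain pattern, the call looks like this: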

import re

string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut'  # Regex pattern; text without metacharacters matches literally

indices = [match.start() for match in re.finditer(substring_pattern, string)]

print(indices)  # Output: [0, 19]

re.finditer(pattern, string) returns an iterator of match objects for all non-overlapping matches of the pattern in the string.
match.start() returns the starting index of each match.
The list comprehension efficiently collects these starting indices.
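
When the substring is arbitrary text rather than a hand-written pattern, escaping matters. The following is a minimal sketch, assuming a hypothetical helper named find_indices_literal; it compares an unescaped '.' with the escaped version on the same sample string.

```python
import re


def find_indices_literal(string, substring):
    """Return start indices of non-overlapping literal matches of `substring`."""
    # re.escape() neutralizes regex metacharacters such as '.' or '*'.
    pattern = re.escape(substring)
    return [match.start() for match in re.finditer(pattern, string)]


string = 'tutorial reference tutorialreference.com'

# Unescaped, '.' matches every character (except a newline), not just the dot.
print(len(list(re.finditer('.', string))))  # 40
print(find_indices_literal(string, '.'))    # [36]
```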

Finding Indices with a `for` Loop and `re.finditer()`

You can achieve the same result as the list comprehension using a more explicit for loop:

import re

string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut'
indices = []

for match in re.finditer(substring_pattern, string):
    indices.append(match.start())

print(indices)  # Output: [0, 19]
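
The explicit loop is convenient when more than the start index is needed. As a small illustrative variation (the spans list is an assumption added here, not part of the original example), match.span() yields both ends of each match:

```python
import re

string = 'tutorial reference tutorialreference.com'
substring_pattern = r'tut'
spans = []

for match in re.finditer(substring_pattern, string):
    # match.span() returns (start, end); end is one past the last matched character.
    start, end = match.span()
    spans.append((start, end))

print(spans)  # [(0, 3), (19, 22)]
```

Each pair can be used directly as a slice, so string[start:end] gives back the matched text.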

Finding Indices with a `while` Loop and `str.find()`

A manual approach using a while loop and the str.find() method can also find all indices. str.find(substring, start) returns the index of the first occurrence at or after start, or -1 if there is none.

def find_indices_with_overlap(a_string, substring):
    start = 0
    indices = []
    while True: # Loop indefinitely until break
        start = a_string.find(substring, start) # Find next occurrence
        if start == -1: # No more occurrences
            break
        indices.append(start)
        start += 1 # Move start index by 1 to find overlaps
    return indices

string = 'tutututut'
print(find_indices_with_overlap(string, 'tut')) # Output: [0, 2, 4, 6]
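
The same loop can also be written as a generator, which yields indices one at a time instead of building the whole list. This is a sketch under that assumption; iter_indices is a hypothetical name chosen for illustration.

```python
def iter_indices(a_string, substring):
    """Yield each start index of `substring`, including overlapping matches."""
    start = a_string.find(substring)
    while start != -1:
        yield start
        # Resume the search one character later so overlaps are not skipped.
        start = a_string.find(substring, start + 1)


print(list(iter_indices('tutututut', 'tut')))  # [0, 2, 4, 6]
print(next(iter_indices('tutututut', 'tut')))  # 0
```

Calling next() on the generator retrieves only the first index without scanning the rest of the string.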

Finding Overlapping vs. Non-Overlapping Substrings

The while loop approach above finds overlapping substrings because start is only incremented by 1. To find only non-overlapping substrings (like re.finditer does), increment start by the length of the substring:

def find_indices_no_overlap(a_string, substring):
    start = 0
    indices = []
    while True:
        start = a_string.find(substring, start)
        if start == -1:
            break
        indices.append(start)
        start += max(len(substring), 1) # Move past the current match; advance at least 1 so an empty substring cannot loop forever
    return indices

string = 'tutututut'
print(find_indices_no_overlap(string, 'tut')) # Output: [0, 4]
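
Regular expressions can report overlapping matches too, by wrapping the substring in a zero-width lookahead. The sketch below assumes a hypothetical helper named find_indices_overlap_regex; it is one possible way to do this, not the only one.

```python
import re


def find_indices_overlap_regex(a_string, substring):
    """Return start indices of all matches, overlapping ones included."""
    # (?=...) is a zero-width lookahead: it matches at a position without
    # consuming characters, so the scan advances one position at a time.
    pattern = f'(?={re.escape(substring)})'
    return [match.start() for match in re.finditer(pattern, a_string)]


print(find_indices_overlap_regex('tutututut', 'tut'))  # [0, 2, 4, 6]
```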
