How to Extract Strings Between Quotes in Python

Extracting text enclosed in quotation marks (single or double) is a common task in string processing, especially when dealing with data parsing or text manipulation.

This guide explains how to extract substrings between quotes in Python, using regular expressions with re.findall() and string splitting with str.split().

Extracting Strings Between Quotes with re.findall()

The re.findall() function from the re (regular expression) module is the most robust and flexible way to extract strings between quotes. It returns every quoted substring in a single call, and a trailing opening quote that is never closed simply produces no match.

Double Quotes

To extract strings enclosed in double quotes:

import re

my_str = 'tutorial "referece" com "ABC"'
my_list = re.findall(r'"([^"]*)"', my_str) # Find all quoted strings

print(my_list) # Output: ['referece', 'ABC']
print(my_list[0]) # Output: referece
print(my_list[1]) # Output: ABC

r'"([^"]*)"' (Regular Expression Breakdown):

  • ": Matches a literal double quote character. This is the opening quote.
  • (...): This is a capturing group. re.findall() will return only the contents of the capturing groups.
  • [^"]: This is a negated character set. It matches any character that is not a double quote.
  • *: This is a quantifier. It means "match the preceding character or group zero or more times." Here it applies to [^"], so it matches any run of non-quote characters, including an empty one.
  • ": Matches a literal double quote character. This is the closing quote.

In essence, this regular expression finds all occurrences of text enclosed in double quotes and extracts the text between the quotes.
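To make the role of the capturing group concrete, here is a minimal sketch that runs the same pattern with and without the parentheses. The sample string and variable names are illustrative assumptions, not part of the original example.

```python
import re

text = 'say "hello" and "bye"'

# With a capturing group, findall returns only the text inside the quotes.
inner = re.findall(r'"([^"]*)"', text)

# Without a group, findall returns each whole match, quotes included.
whole = re.findall(r'"[^"]*"', text)

# An empty pair of quotes still matches, because * allows zero characters.
empty = re.findall(r'"([^"]*)"', 'x "" y')

print(inner)  # ['hello', 'bye']
print(whole)  # ['"hello"', '"bye"']
print(empty)  # ['']
```

If empty strings are not wanted in the result, replacing * with + in the pattern requires at least one character between the quotes.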

Single Quotes

To extract strings enclosed in single quotes, change the quotes in the regular expression:

import re

my_str = "tutorial 'referece' com 'ABC'"
my_list = re.findall(r"'([^']*)'", my_str) # Find all single-quoted strings
print(my_list) # Output: ['referece', 'ABC']

The regex is identical except for the use of single quotes (') instead of double quotes (").
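If a string may contain both quote styles, one possible approach (not covered by the examples above) is to capture the opening quote and require the same character to close the match by using a backreference. The sketch below assumes a made-up input string; note that with two groups re.findall() returns tuples, so the content has to be picked out afterwards.

```python
import re

text = '''He said "it's fine" and then 'ok' twice'''

# (["']) captures the opening quote; \1 requires the same character to close.
# (.*?) lazily captures everything up to that matching closing quote.
pairs = re.findall(r'''(["'])(.*?)\1''', text)

# With two groups, findall returns (quote, content) tuples; keep the content.
quoted = [content for _quote, content in pairs]

print(quoted)  # ["it's fine", 'ok']
```

Because the closing quote must match the opening one, the apostrophe inside the double-quoted text does not end the match early.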

Extracting Strings Between Quotes with str.split()

The str.split() method can also be used, but it's less robust than using regular expressions, especially if your string has unbalanced quotes or other complexities.

my_str = 'tutorial "referece" com "ABC"'
my_list = my_str.split('"')[1::2] # Split, then select every other item

print(my_list) # Output: ['referece', 'ABC']

  • my_str.split('"') splits the string at each double quote.

my_str = 'tutorial "referece" com "ABC"'
print(my_str.split('"'))
# Output: ['tutorial ', 'referece', ' com ', 'ABC', '']

  • [1::2] selects elements starting from index 1 (the second element), and then every second element after that. Because the pieces alternate between unquoted and quoted text, the odd indices hold the quoted strings.

Warning

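Limitations of str.split(): This approach is brittle. It gives wrong results, without raising an error, if:

  • You have unbalanced quotes (e.g., only an opening quote): the text after the stray quote is returned as if it were quoted.
  • You have quotes within the quoted text (you'd need to escape them, which str.split() doesn't handle).
  • You need to handle more than one quote style (for example, both single and double quotes) in the same string.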

For these reasons, re.findall() is almost always the better choice.
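The difference can be seen on a small unbalanced input. The sketch below also shows one way a regular expression might cope with backslash-escaped quotes; both the input strings and the escape-aware pattern are illustrative assumptions rather than part of the examples above.

```python
import re

unbalanced = 'open "never closed'

# split() has no notion of pairing, so the dangling text looks quoted.
from_split = unbalanced.split('"')[1::2]

# The regex needs both an opening and a closing quote, so nothing matches.
from_regex = re.findall(r'"([^"]*)"', unbalanced)

print(from_split)  # ['never closed']
print(from_regex)  # []

# Hypothetical input that uses backslash-escaped quotes inside the quoted text.
escaped = r'say "a \"nested\" word" now'

# Each step consumes either one ordinary character or a backslash plus the
# character it escapes, so an escaped quote does not end the match.
with_escapes = re.findall(r'"((?:[^"\\]|\\.)*)"', escaped)

print(with_escapes)
```

The captured text keeps its backslashes; removing them is a separate step that depends on the escaping rules of the data being parsed.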