How to Use Wildcards for String Matching and Filtering in Python
This guide explains how to use wildcard characters in Python to filter lists of strings and check whether strings match specific patterns. We'll cover the fnmatch module (for shell-style wildcards) and regular expressions (for more complex patterns).
Filtering Lists with fnmatch
The fnmatch module provides support for Unix shell-style wildcards, which are simpler than full regular expressions.
Wildcard Characters in fnmatch
- *: Matches any sequence of characters (including zero characters).
- ?: Matches any single character.
- [seq]: Matches any character in seq.
- [!seq]: Matches any character not in seq.
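As a quick illustration of the four wildcards above, here is a minimal sketch. The file names are hypothetical and chosen only to exercise each wildcard, and fnmatch.fnmatchcase() (the case-sensitive variant of fnmatch.fnmatch()) is used so the results do not depend on the operating system.

```python
import fnmatch

# Hypothetical file names, chosen only to exercise each wildcard.
names = ['data1.csv', 'data2.csv', 'dataA.csv', 'data10.csv', 'notes.txt']

# '*' matches any run of characters, including none.
star = [n for n in names if fnmatch.fnmatchcase(n, 'data*.csv')]
# '?' matches exactly one character.
question = [n for n in names if fnmatch.fnmatchcase(n, 'data?.csv')]
# '[0-9]' matches one character from the set (here, a digit).
in_seq = [n for n in names if fnmatch.fnmatchcase(n, 'data[0-9].csv')]
# '[!0-9]' matches one character that is NOT in the set.
not_in_seq = [n for n in names if fnmatch.fnmatchcase(n, 'data[!0-9].csv')]

print(star)        # ['data1.csv', 'data2.csv', 'dataA.csv', 'data10.csv']
print(question)    # ['data1.csv', 'data2.csv', 'dataA.csv']
print(in_seq)      # ['data1.csv', 'data2.csv']
print(not_in_seq)  # ['dataA.csv']
```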
Using fnmatch.filter()
The fnmatch.filter() function filters a list, returning only the elements that match a given pattern:
import fnmatch
a_list = ['abc_tom.csv', 'nolan', '!@#', 'abc_employees.csv']
pattern = 'abc_*.csv' # Matches strings starting with 'abc_' and ending with '.csv'
filtered_list = fnmatch.filter(a_list, pattern)
print(filtered_list) # Output: ['abc_tom.csv', 'abc_employees.csv']
- fnmatch.filter(names, pattern): Filters the names list, keeping only strings that match the pattern.
Here's another example using ?:
import fnmatch
a_list = ['abc', 'abz', 'abxyz']
pattern = 'ab?' # Matches strings starting with 'ab' and followed by one character
filtered_list = fnmatch.filter(a_list, pattern)
print(filtered_list) # Output: ['abc', 'abz']
Using fnmatch.fnmatch() with a List Comprehension
You can also use fnmatch.fnmatch() within a list comprehension for more control:
import fnmatch
a_list = ['abc_tom.csv', 'nolan', '!@#', 'abc_employees.csv']
pattern = 'abc_*.csv'
# Keep only the items that match the wildcard pattern.
filtered_list = [
    item for item in a_list
    if fnmatch.fnmatch(item, pattern)
]
print(filtered_list) # Output: ['abc_tom.csv', 'abc_employees.csv']
- fnmatch.fnmatch(item, pattern): Checks if a single item (string) matches the pattern. Returns True or False.
- The list comprehension builds a new list containing only the matching items.
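One way to use the extra control a list comprehension gives is to combine several patterns in a single condition. The sketch below is only an illustration: the include and exclude patterns and the extra 'abc_notes.txt' entry are hypothetical and not part of the earlier examples.

```python
import fnmatch

a_list = ['abc_tom.csv', 'nolan', '!@#', 'abc_employees.csv', 'abc_notes.txt']
include = ['abc_*.csv', 'abc_*.txt']  # hypothetical: keep items matching any of these
exclude = 'abc_emp*'                  # hypothetical: drop items matching this

selected = [
    item for item in a_list
    if any(fnmatch.fnmatch(item, p) for p in include)
    and not fnmatch.fnmatch(item, exclude)
]
print(selected)  # ['abc_tom.csv', 'abc_notes.txt']
```

fnmatch.filter() accepts only one pattern, so a condition like this needs the comprehension form.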
Checking if a String Matches a Pattern with fnmatch
To check if a single string matches a wildcard pattern, use fnmatch.fnmatch():
import fnmatch
a_string = '2023_tom.txt'
pattern = '2023*.txt'
matches_pattern = fnmatch.fnmatch(a_string, pattern)
print(matches_pattern) # Output: True
if matches_pattern:
    print('The string matches the pattern')
else:
    print('The string does NOT match the pattern')
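One detail worth knowing here: fnmatch.fnmatch() normalizes the case of both arguments with os.path.normcase(), so whether '2023_TOM.TXT' matches '2023*.txt' depends on the operating system. The sketch below shows two platform-independent alternatives. The upper-case file name is a hypothetical example.

```python
import fnmatch

a_string = '2023_TOM.TXT'  # hypothetical upper-case variant of the earlier name
pattern = '2023*.txt'

# fnmatchcase() always compares case-sensitively, on every platform.
case_sensitive = fnmatch.fnmatchcase(a_string, pattern)

# Lower-casing both sides gives a case-insensitive check on every platform.
case_insensitive = fnmatch.fnmatchcase(a_string.lower(), pattern.lower())

print(case_sensitive)    # False
print(case_insensitive)  # True
```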
Filtering and Matching with Regular Expressions
For more complex patterns, regular expressions (the re module) offer much greater power and flexibility.
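The two approaches are related: fnmatch.translate() converts a shell-style pattern into an equivalent regular expression string. The sketch below reuses the earlier example list to suggest how a wildcard pattern could serve as a starting point for a regex. The exact string returned by translate() varies between Python versions, so it is not shown here.

```python
import fnmatch
import re

a_list = ['abc_tom.csv', 'nolan', '!@#', 'abc_employees.csv']

# Convert the shell-style pattern into a regular expression and compile it.
regex = re.compile(fnmatch.translate('abc_*.csv'))

filtered_list = [item for item in a_list if regex.match(item)]
print(filtered_list)  # ['abc_tom.csv', 'abc_employees.csv']
```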
Filtering a List with re.match()
import re
a_list = ['abc_tom.csv', 'nolan', '!@#', 'abc_employees.csv']
# Matches strings that start with "abc_" followed by any text and then ".csv".
# re.match() anchors only at the start, so trailing characters are not rejected.
regex = re.compile(r'abc_.*\.csv')
filtered_list = [
    item for item in a_list
    if re.match(regex, item) # Checks if each item matches the regex.
]
print(filtered_list) # Output: ['abc_tom.csv', 'abc_employees.csv']
- re.compile(r'abc_.*\.csv'): Compiles the regular expression. This is optional, but it is good practice when you reuse the same pattern many times.
- abc_: Matches the literal characters "abc_".
- .*: Matches any character except a newline (.) zero or more times (*). This is the "wildcard" part.
- \.csv: Matches the literal characters ".csv". The backslash (\) escapes the dot (.), which otherwise has a special meaning in regular expressions.
- re.match(regex, item): Checks if the regular expression matches at the beginning of the string item. Returns a match object if it matches, None otherwise. Because only the start is anchored, a string such as 'abc_tom.csv.bak' also matches; use re.fullmatch() when the whole string must match the pattern.
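To make the anchoring behavior concrete, here is a minimal sketch comparing match() with fullmatch() on the same compiled pattern. The names 'abc_old.csv.bak' and 'xabc_tom.csv' are hypothetical additions used only to show the difference.

```python
import re

# Hypothetical names: one with trailing text, one with leading text.
a_list = ['abc_tom.csv', 'abc_old.csv.bak', 'xabc_tom.csv']
regex = re.compile(r'abc_.*\.csv')

# match(): the pattern must match at the start; trailing text is allowed.
start_only = [item for item in a_list if regex.match(item)]

# fullmatch(): the pattern must account for the entire string.
whole_string = [item for item in a_list if regex.fullmatch(item)]

print(start_only)    # ['abc_tom.csv', 'abc_old.csv.bak']
print(whole_string)  # ['abc_tom.csv']
```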
Matching a Single String with re.match()
import re
a_string = '2023_tom.txt'
matches_pattern = bool(re.match(r'2023_.*\.txt', a_string)) # Using re.match directly.
print(matches_pattern) # Output: True
if matches_pattern:
    print('The string matches the pattern')
else:
    print('The string does NOT match the pattern')
- re.match tries to match from the start of the string.
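If the pattern may appear anywhere in the string rather than at the start, re.search() is the usual alternative. The following sketch contrasts the two functions. The 'report_' prefix is a hypothetical addition to the earlier file name.

```python
import re

a_string = 'report_2023_tom.txt'  # hypothetical name with text before '2023'
pattern = r'2023_.*\.txt'

# re.match() fails because the string does not start with '2023_'.
matched_at_start = bool(re.match(pattern, a_string))

# re.search() scans the whole string and finds the pattern further in.
found_anywhere = bool(re.search(pattern, a_string))

print(matched_at_start)  # False
print(found_anywhere)    # True
```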