How to Remove Punctuation from a List of Strings in Python

This guide explains how to efficiently remove punctuation marks from strings within a list in Python. We'll cover using str.translate(), regular expressions with re.sub(), and basic looping, highlighting the strengths and weaknesses of each approach.

Removing Punctuation with str.translate() (Recommended)

The str.translate() method, combined with a pre-built translation table, is usually the fastest way to remove punctuation and is the recommended approach:

import string

a_list = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', '', 'c:om']

# Create the translation table ONCE, outside the loop/comprehension
translator = str.maketrans('', '', string.punctuation)

new_list = [item.translate(translator) for item in a_list if item] # 'if item' skips empty input strings

print(new_list) # Output: ['tutorial', 'reference', 'com']
  • string.punctuation: This constant (from the string module) is a string containing the 32 ASCII punctuation characters: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
  • str.maketrans('', '', string.punctuation): This creates a translation table, a dictionary lookup that translate() consults for each character. The arguments mean:
    • First argument (empty string): The characters to replace. It is empty because we are not replacing anything.
    • Second argument (empty string): The replacement characters, matched 1:1 by position with the first argument. It must be the same length as the first argument, so it is also empty.
    • Third argument (string.punctuation): The characters to delete.
  • item.translate(translator): This applies the translation table to each string in the list, removing every punctuation character in a single pass.
  • if item: The condition in the list comprehension skips strings that are already empty in the original list. It runs before translate(), so a string made up entirely of punctuation (such as '...') still produces an empty string in the result.
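To make the table and the filter order concrete, here is a minimal sketch. The samples list is a hypothetical input chosen for illustration; it shows that the table is an ordinary dictionary mapping code points to None, and that filtering after translating is what drops strings that consist only of punctuation.

```python
import string

translator = str.maketrans('', '', string.punctuation)

# The table is a plain dict: each code point to delete maps to None.
print(translator[ord('.')])                        # None
print(len(translator) == len(string.punctuation))  # True

samples = ['', '...', 'a.b']  # hypothetical input, for illustration only

# Filtering BEFORE translating keeps '' for the all-punctuation string.
kept = [s.translate(translator) for s in samples if s]
print(kept)  # ['', 'ab']

# Filtering AFTER translating drops every empty result.
cleaned = [t for t in (s.translate(translator) for s in samples) if t]
print(cleaned)  # ['ab']
```

Which variant is right depends on whether empty entries carry meaning in your data, for example when list positions must line up with another list.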

Key Advantages of str.translate():

  • Efficiency: str.translate() with a pre-built table is typically faster than regular expressions or a character-by-character loop, because CPython performs the whole lookup in C.
  • Readability: Once you understand the maketrans() call, the code is very clear.
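  • Correctness: It deletes exactly the characters in the table and nothing else. Note that string.punctuation contains only ASCII punctuation, so Unicode punctuation such as curly quotes or the ellipsis character (…) is left untouched unless you add it to the table.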
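Because string.punctuation covers ASCII only, text with typographic quotes or other non-ASCII marks needs a wider table. The sketch below is one possible way to build it, assuming that "punctuation" means any code point whose Unicode general category starts with P; the name unicode_punct_table is hypothetical. That definition is not identical to string.punctuation: symbols such as $ or + belong to the S categories and are kept.

```python
import string
import sys
import unicodedata

# Assumption: "punctuation" = every code point in a Unicode P* category.
# Building the table scans all code points once, so do it a single time.
unicode_punct_table = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith('P')
)

ascii_table = str.maketrans('', '', string.punctuation)

text = '“Hello”, world…'
print(text.translate(ascii_table))          # “Hello” world…
print(text.translate(unicode_punct_table))  # Hello world

# Currency and math symbols are not in a P* category, so they survive.
print('a$b'.translate(unicode_punct_table))  # a$b
```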

Removing Punctuation with re.sub()

Regular expressions provide a flexible way to remove punctuation, but they are generally slower than str.translate(). Use re.sub() if you need to remove a specific, complex pattern of punctuation, not just all punctuation.

import re

a_list = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', 'c:om']

new_list = [re.sub(r'[^\w\s]', '', item) for item in a_list]
print(new_list) # Output: ['tutorial', 'reference', 'com']
  • re.sub(pattern, replacement, string): Substitutes all occurrences of pattern in string with replacement.

  • r'[^\w\s]': This regular expression matches any character that is not (^ inside [] means "not") a word character (\w - letters, numbers, and underscore) or whitespace (\s).

  • The result is close to, but not the same as, removing string.punctuation: underscores are kept because \w matches them, and any other character that is neither a word character nor whitespace (for example, an emoji) is removed as well.
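Since \w includes the underscore, the pattern above and string.punctuation do not remove the same set of characters. The following sketch compares the two on a hypothetical sample string. It assumes you want a regex that matches exactly the string.punctuation set, which can be built by wrapping re.escape(string.punctuation) in a character class; both patterns are compiled once so they can be reused across the list.

```python
import re
import string

# Pattern from the example above: anything that is not \w or \s.
word_pattern = re.compile(r'[^\w\s]')

# Character class containing exactly the string.punctuation characters.
ascii_punct_pattern = re.compile('[' + re.escape(string.punctuation) + ']')

sample = 'snake_case: ok!'  # hypothetical input, for illustration only
print(word_pattern.sub('', sample))         # snake_case ok
print(ascii_punct_pattern.sub('', sample))  # snakecase ok
```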

Removing Punctuation with a for Loop and string.punctuation (Least Efficient)

You can also test each character against string.punctuation in a nested comprehension (the compact form of a nested for loop). It works, but it is typically the slowest of the three approaches and the hardest to read:

import string

a_list = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', '', 'c:om']

new_list = [
    ''.join(char for char in item if char not in string.punctuation)
    for item in a_list
    if item != ''
]

print(new_list) # Output: ['tutorial', 'reference', 'com']
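  • The inner generator expression walks through each character of item and keeps only those that are not in string.punctuation.
  • ''.join(...) joins the kept characters back into a single string.
  • The outer list comprehension repeats this for every item in a_list, and if item != '' skips strings that are already empty.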
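The relative speed of the three approaches depends on the Python version and on the data, so it is worth measuring rather than assuming. Below is a rough timing sketch using timeit. The function names, the data list, and the repeat count are all hypothetical choices for illustration, and the regex variant uses a character class built from string.punctuation so that all three functions remove the same characters.

```python
import re
import string
import timeit

# Hypothetical workload: the sample strings repeated many times.
data = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', 'c:om'] * 1000

TABLE = str.maketrans('', '', string.punctuation)
PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')

def with_translate(items):
    return [s.translate(TABLE) for s in items]

def with_regex(items):
    return [PATTERN.sub('', s) for s in items]

def with_loop(items):
    return [''.join(ch for ch in s if ch not in string.punctuation)
            for s in items]

# Time each approach on the same data; results vary by machine.
for func in (with_translate, with_regex, with_loop):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f'{func.__name__}: {seconds:.4f} s')
```

Treat the printed numbers as a guide for your own environment only; short strings, long strings, and punctuation-heavy text can shift the gaps between the methods.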