How to Remove Punctuation from a List of Strings in Python

This guide explains how to efficiently remove punctuation marks from strings within a list in Python. We'll cover using str.translate(), regular expressions with re.sub(), and basic looping, highlighting the strengths and weaknesses of each approach.

Removing Punctuation with `str.translate()` (Recommended)

The str.translate() method, combined with a pre-built translation table, is usually the fastest way to remove ASCII punctuation and is the recommended approach:

import string

a_list = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', '', 'c:om']

# Create the translation table ONCE, outside the loop/comprehension
translator = str.maketrans('', '', string.punctuation)

new_list = [item.translate(translator) for item in a_list if item]  # "if item" skips empty input strings

print(new_list)  # Output: ['tutorial', 'reference', 'com']

string.punctuation: This constant (from the string module) is a string containing the 32 ASCII punctuation characters: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
str.maketrans('', '', string.punctuation): This builds the translation table that translate() uses, a dictionary that maps character code points to their replacements. The arguments mean:
- First argument (empty string): The characters to replace. It is empty because we aren't replacing any characters with other characters.
- Second argument (empty string): The replacement characters, matched position by position with the first argument. It must be the same length as the first argument, so it is empty too.
- Third argument (string.punctuation): The characters we want to delete. Each one is mapped to None in the table.
item.translate(translator): This applies the translation table to each string in the list, removing every character listed in the table.
The if item condition in the list comprehension skips empty strings that are already in the original list. It runs before translate() is called, so a string made up entirely of punctuation (for example '...') still ends up in the result as an empty string.
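
To make the table and the filtering order concrete, here is a minimal sketch. The sample list and the names small_table, samples, stripped, and non_empty are illustrative assumptions, not part of the example above.

```python
import string

# maketrans() returns a plain dict: code point -> replacement (None = delete)
small_table = str.maketrans('', '', '.,')
print(small_table)  # {46: None, 44: None}

translator = str.maketrans('', '', string.punctuation)

samples = ['a.b', '...', '', 'c:d']  # hypothetical input

# Filtering BEFORE translating only drops strings that were empty to begin with
stripped = [s.translate(translator) for s in samples if s]
print(stripped)  # ['ab', '', 'cd']

# Filtering AFTER translating also drops strings that were pure punctuation
non_empty = [t for t in (s.translate(translator) for s in samples) if t]
print(non_empty)  # ['ab', 'cd']
```

Which of the two filters is appropriate depends on whether an all-punctuation string should survive as an empty entry in the output.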

Key Advantages of str.translate():

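Efficiency: str.translate() with a pre-built table is typically faster than regular expressions or a per-character loop with in, because the whole string is processed in a single call that is implemented in C (in CPython).
Readability: Once you understand the maketrans() call, the code is very clear.
Correctness: It removes exactly the characters in the table and nothing else. Keep in mind that string.punctuation contains only ASCII punctuation, so Unicode punctuation such as curly quotes (“ ”) or the ellipsis (…) is left in place unless you add those characters to the table.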
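
Because string.punctuation is ASCII-only, text with curly quotes, ellipses, or similar characters needs a larger table. The following is a minimal sketch of one possible way to build it, assuming that "punctuation" means every code point whose Unicode general category starts with P. The names unicode_punct_table and combined, and the sample strings, are hypothetical.

```python
import string
import sys
import unicodedata

# Map every code point whose Unicode general category starts with 'P'
# (Pc, Pd, Ps, Pe, Pi, Pf, Po) to None, i.e. mark it for deletion.
# This scans the whole code point range, so build it once and reuse it.
unicode_punct_table = dict.fromkeys(
    cp for cp in range(sys.maxunicode + 1)
    if unicodedata.category(chr(cp)).startswith('P')
)

text = '“Hello”, world…'  # hypothetical input
print(text.translate(unicode_punct_table))  # Hello world

# Characters such as $ or + are in string.punctuation but are Unicode
# *symbols* (category S), so merge both tables to cover them as well.
combined = {**unicode_punct_table, **str.maketrans('', '', string.punctuation)}
print('Price: $5…'.translate(combined))  # Price 5
```

Building the table takes noticeably longer than maketrans() with string.punctuation, so this sketch is only worth it when the input really contains non-ASCII punctuation.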

Removing Punctuation with `re.sub()`

Regular expressions provide a flexible way to remove punctuation, but they are generally slower than str.translate(). Use re.sub() if you need to remove a specific, complex pattern of punctuation, not just all punctuation.

import re

a_list = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', 'c:om']

new_list = [re.sub(r'[^\w\s]', '', item) for item in a_list]
print(new_list)  # Output: ['tutorial', 'reference', 'com']

re.sub(r'[^\w\s]', '', item) removes every character that is neither a word character nor whitespace.
re.sub(pattern, replacement, string): Substitutes all occurrences of pattern in string with replacement.
r'[^\w\s]': This regular expression matches any character that is not (^ inside [] means "not") a word character (\w - letters, digits, and underscore) or whitespace (\s). This covers most punctuation, including Unicode punctuation, with one exception: the underscore counts as a word character, so it is not removed.
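
The difference between this pattern and string.punctuation is easiest to see side by side. Below is a small sketch; the explicit character class built with re.escape() is an alternative introduced here for comparison, and the names word_pattern, ascii_punct_pattern, and sample are hypothetical.

```python
import re
import string

# Pattern from the example above: keeps word characters (including "_")
# and whitespace, removes everything else
word_pattern = re.compile(r'[^\w\s]')

# Alternative: an explicit character class built from string.punctuation.
# re.escape() protects characters such as ], \, ^ and - inside the class.
ascii_punct_pattern = re.compile('[' + re.escape(string.punctuation) + ']')

sample = 'snake_case, “quoted”!'  # hypothetical input

print(word_pattern.sub('', sample))         # snake_case quoted
print(ascii_punct_pattern.sub('', sample))  # snakecase “quoted”
```

The first pattern keeps the underscore but strips the curly quotes. The second removes exactly the ASCII characters in string.punctuation, matching the behaviour of the str.translate() example.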

Removing Punctuation with a `for` Loop and `string.punctuation` (Least Efficient)

You can use a nested comprehension (the compact equivalent of a nested for loop) and check each character against string.punctuation, but this is typically the slowest of the three methods and the hardest to read:

import string

a_list = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', '', 'c:om']

new_list = [''.join(char for char in item
                    if char not in string.punctuation)
            for item in a_list if item != '']

print(new_list) # Output: ['tutorial', 'reference', 'com']

The outer list comprehension walks over the strings and skips empty ones. The inner generator expression keeps only the characters that are not in string.punctuation.
''.join(...) joins the kept characters back into a single string.
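
To check how the three approaches compare on your own data, a rough timing sketch like the following may help. It is an illustration under stated assumptions, not a rigorous benchmark: the function names, the repeated sample list, and the repetition count are hypothetical, the regex variant uses an explicit character class built from string.punctuation so that all three functions remove the same characters, and the loop variant is written as an explicit for loop with a frozenset lookup. No timings are shown because they depend on the Python version, the machine, and the input.

```python
import re
import string
import timeit

TRANSLATOR = str.maketrans('', '', string.punctuation)
PATTERN = re.compile('[' + re.escape(string.punctuation) + ']')
PUNCT_SET = frozenset(string.punctuation)  # set lookup instead of scanning a string


def with_translate(items):
    return [item.translate(TRANSLATOR) for item in items if item]


def with_regex(items):
    return [PATTERN.sub('', item) for item in items if item]


def with_loop(items):
    result = []
    for item in items:
        if not item:  # skip empty input strings
            continue
        kept = []
        for char in item:
            if char not in PUNCT_SET:
                kept.append(char)
        result.append(''.join(kept))
    return result


# Hypothetical workload: the list from this guide repeated many times
data = ['t.u.t.o.r.i.a.l', 're,fer,en,ce', '', 'c:om'] * 1000

for func in (with_translate, with_regex, with_loop):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f'{func.__name__}: {seconds:.3f}s')
```

All three functions return the same list for ASCII input, so the printed times reflect only the cost of each technique.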
