Python Pandas: How to Drop DataFrame Columns if Name Contains String (Regex/str.contains)
When working with Pandas DataFrames, especially those with many columns or dynamically generated column names, you often need to drop columns based on patterns in their names. A common requirement is to remove all columns whose names contain a specific substring or match a regular expression.
This guide explains how to effectively drop Pandas DataFrame columns if their names contain a given string or match a regex, using DataFrame.filter() with regex and Index.str.contains().
The Goal: Dropping Columns by Name Patterns
Given a Pandas DataFrame, we want to identify and remove columns where the column's name (header) includes a specific substring (e.g., "temp", "_calculated") or matches a more complex regular expression pattern.
Example DataFrame
import pandas as pd
import re # For re.IGNORECASE if needed
data = {
'Record_ID': [101, 102, 103],
'Patient_Name': ['Alice', 'Bob', 'Charlie'],
'Age_Years': [30, 45, 28],
'Vital_Temp_C': [36.5, 37.0, 36.8],
'Vital_Pressure_Systolic': [120, 130, 115],
'Notes_General': ['Stable', 'Observation', 'Stable'],
'Lab_Test_Result_A': [1.5, 2.1, 0.9],
'Lab_Test_Flag_A': [False, True, False]
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
Output:
Original DataFrame:
Record_ID Patient_Name Age_Years Vital_Temp_C Vital_Pressure_Systolic \
0 101 Alice 30 36.5 120
1 102 Bob 45 37.0 130
2 103 Charlie 28 36.8 115
Notes_General Lab_Test_Result_A Lab_Test_Flag_A
0 Stable 1.5 False
1 Observation 2.1 True
2 Stable 0.9 False
Let's say we want to drop columns related to "Vital" signs or those containing "Test".
Method 1: Using DataFrame.filter(regex=...) and DataFrame.drop() (Recommended)
The DataFrame.filter() method can subset rows or columns based on labels. When used with the regex parameter and axis=1 (for columns), it selects columns whose names match the regular expression.
Dropping Columns Containing a Single String
To drop all columns whose names contain the substring "Vital":
import pandas as pd
df = pd.DataFrame({
'Record_ID': [101, 102, 103], 'Patient_Name': ['Alice', 'Bob', 'Charlie'],
'Age_Years': [30, 45, 28], 'Vital_Temp_C': [36.5, 37.0, 36.8],
'Vital_Pressure_Systolic': [120, 130, 115], 'Notes_General': ['Stable', 'Observation', 'Stable']
})
# Step 1: Filter to get the names of columns containing 'Vital'
cols_to_drop_filter = df.filter(regex='Vital', axis=1).columns
print(f"Columns to drop (containing 'Vital'): {list(cols_to_drop_filter)}")
# Step 2: Drop these columns
# df.drop(columns=cols_to_drop_filter, inplace=True) # Modifies df directly
df_dropped_vital = df.drop(columns=cols_to_drop_filter) # Returns a new DataFrame
print("DataFrame after dropping columns containing 'Vital':")
print(df_dropped_vital)
Output:
Columns to drop (containing 'Vital'): ['Vital_Temp_C', 'Vital_Pressure_Systolic']
DataFrame after dropping columns containing 'Vital':
Record_ID Patient_Name Age_Years Notes_General
0 101 Alice 30 Stable
1 102 Bob 45 Observation
2 103 Charlie 28 Stable
- df.filter(regex='Vital', axis=1): selects a DataFrame containing only the columns whose names match the regex 'Vital' (which acts as a substring search here).
- .columns: gets the column names from this filtered DataFrame.
- df.drop(columns=...): drops the specified columns. Note that columns=... already implies the column axis; pass axis=1 only when supplying labels positionally, as in df.drop(labels, axis=1). Use inplace=True to modify the original DataFrame directly.
- You can do this in one line: df.drop(list(df.filter(regex='Vital', axis=1)), axis=1, inplace=True)
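Because filter(regex=...) takes a full regular expression, you are not limited to plain substring searches: anchors let you match only the start or end of a column name. A small sketch (using a trimmed-down version of the example DataFrame):

```python
import pandas as pd

df = pd.DataFrame({
    'Record_ID': [101], 'Vital_Temp_C': [36.5],
    'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})

# '^Vital' matches only column names that START with 'Vital'
df_no_vital = df.drop(columns=df.filter(regex='^Vital', axis=1).columns)
print(list(df_no_vital.columns))  # ['Record_ID', 'Notes_General']

# '_C$' matches only column names that END with '_C'
df_no_c = df.drop(columns=df.filter(regex='_C$', axis=1).columns)
print(list(df_no_c.columns))  # ['Record_ID', 'Vital_Pressure_Systolic', 'Notes_General']
```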
Dropping Columns Containing One of Multiple Strings (OR Logic)
Use the pipe | character in your regex to specify OR logic. To drop columns containing "Vital" OR "Lab":
import pandas as pd
df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120],
'Lab_Test_Result_A': [1.5], 'Lab_Test_Flag_A': [False], 'Notes_General': ['Stable']
})
regex_pattern_multi = 'Vital|Lab' # Match 'Vital' OR 'Lab'
cols_to_drop_multi = df.filter(regex=regex_pattern_multi, axis=1).columns
print(f"Columns to drop (containing 'Vital' or 'Lab'): {list(cols_to_drop_multi)}\n")
df_dropped_multi = df.drop(columns=cols_to_drop_multi)
print("DataFrame after dropping columns containing 'Vital' or 'Lab':")
print(df_dropped_multi)
Output:
Columns to drop (containing 'Vital' or 'Lab'): ['Vital_Temp_C', 'Vital_Pressure_Systolic', 'Lab_Test_Result_A', 'Lab_Test_Flag_A']
DataFrame after dropping columns containing 'Vital' or 'Lab':
Record_ID Patient_Name Age_Years Notes_General
0 101 Alice 30 Stable
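If the substrings to remove live in a list (for example, one built dynamically), you can assemble the OR pattern programmatically. The list name bad_substrings below is hypothetical; re.escape guards against regex metacharacters appearing in a substring:

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Record_ID': [101], 'Vital_Temp_C': [36.5],
    'Lab_Test_Result_A': [1.5], 'Notes_General': ['Stable']
})

# Hypothetical list of substrings to remove; re.escape protects against
# regex metacharacters (e.g. '(' or '+') inside a substring.
bad_substrings = ['Vital', 'Lab']
pattern = '|'.join(re.escape(s) for s in bad_substrings)  # 'Vital|Lab'

df_cleaned = df.drop(columns=df.filter(regex=pattern, axis=1).columns)
print(list(df_cleaned.columns))  # ['Record_ID', 'Notes_General']
```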
Method 2: Using Index.str.contains() and DataFrame.loc or drop()
This method involves creating a boolean mask from the DataFrame's column names.
Creating a Boolean Mask for Column Names
The df.columns.str.contains(pattern) method returns a boolean array indicating which column names contain the pattern.
import pandas as pd
df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})
substring_to_find = 'name' # Case-sensitive search for 'name'
# Create a boolean mask for column names containing the substring
cols_contain_name_mask = df.columns.str.contains(substring_to_find)
print(f"Boolean mask for columns containing '{substring_to_find}':")
print(cols_contain_name_mask)
Output:
Boolean mask for columns containing 'name':
[False False False False False False]
Selecting Columns to Keep (Inverting the Logic)
To drop columns matching the pattern, you select the columns that do not match. Use the tilde (~) operator to invert the boolean mask.
import pandas as pd
df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})
substring_to_find = 'Name' # Example: to match 'Patient_Name'
cols_contain_name_mask = df.columns.str.contains(substring_to_find)
# ✅ Select columns to KEEP (where mask is False, i.e., name does NOT contain 'Name')
df_kept_cols = df.loc[:, ~cols_contain_name_mask]
print(f"DataFrame after keeping columns NOT containing '{substring_to_find}':")
print(df_kept_cols)
Output (Patient_Name column is dropped):
DataFrame after keeping columns NOT containing 'Name':
   Record_ID  Age_Years  Vital_Temp_C  Vital_Pressure_Systolic Notes_General
0        101         30          36.5                      120        Stable
Dropping Columns Based on the Mask
Alternatively, get the names of the columns to drop and use df.drop().
import pandas as pd
df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})
substring_to_find = 'Name'
cols_contain_name_mask = df.columns.str.contains(substring_to_find)
# Get actual column names to drop
cols_to_drop_str_contains = df.columns[cols_contain_name_mask]
print(f"Columns to drop using str.contains: {cols_to_drop_str_contains.tolist()}\n")
df_dropped_str_contains = df.drop(columns=cols_to_drop_str_contains)
print(f"DataFrame after dropping columns containing '{substring_to_find}' (using str.contains then drop):")
print(df_dropped_str_contains)
Output (same result as selecting the kept columns with ~mask):
Columns to drop using str.contains: ['Patient_Name']
DataFrame after dropping columns containing 'Name' (using str.contains then drop):
Record_ID Age_Years Vital_Temp_C Vital_Pressure_Systolic Notes_General
0 101 30 36.5 120 Stable
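Note that str.contains() also interprets its pattern as a regular expression by default, so the same OR logic from Method 1 works here too. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Record_ID': [101], 'Vital_Temp_C': [36.5],
    'Lab_Test_Result_A': [1.5], 'Notes_General': ['Stable']
})

# str.contains() treats the pattern as a regex by default,
# so '|' gives OR logic just like with filter(regex=...)
mask = df.columns.str.contains('Vital|Lab')
df_kept = df.loc[:, ~mask]
print(list(df_kept.columns))  # ['Record_ID', 'Notes_General']
```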
Case-Insensitive Matching with str.contains()
Set the case=False parameter in str.contains(). You can also use flags=re.IGNORECASE.
import pandas as pd
import re # For re.IGNORECASE
df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30], 'Notes_General': ['Stable']
})
substring_lower = 'name' # Search for 'name' case-insensitively
# ✅ Case-insensitive mask
cols_contain_name_case_insensitive_mask = df.columns.str.contains(substring_lower, case=False)
# OR: cols_contain_name_case_insensitive_mask = df.columns.str.contains(substring_lower, flags=re.IGNORECASE)
print(f"Boolean mask for columns containing '{substring_lower}' (case-insensitive):")
print(cols_contain_name_case_insensitive_mask)
df_kept_case_insensitive = df.loc[:, ~cols_contain_name_case_insensitive_mask]
print("DataFrame after case-insensitive drop:")
print(df_kept_case_insensitive)
Output:
Boolean mask for columns containing 'name' (case-insensitive):
[False True False False]
DataFrame after case-insensitive drop:
Record_ID Age_Years Notes_General
0 101 30 Stable
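One caveat: because str.contains() defaults to regex matching, column names or search strings containing regex metacharacters (parentheses, brackets, dots) can behave unexpectedly. Passing regex=False makes the search a plain literal substring match. The column name 'Temp (C)' below is a hypothetical example:

```python
import pandas as pd

# Hypothetical column name containing regex metacharacters
df = pd.DataFrame({'Record_ID': [101], 'Temp (C)': [36.5], 'Notes': ['Stable']})

# regex=False treats the pattern as a literal string, so '(' needs no escaping
mask = df.columns.str.contains('(C)', regex=False)
df_kept = df.loc[:, ~mask]
print(list(df_kept.columns))  # ['Record_ID', 'Notes']
```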
Alternative: Keeping Only Columns Whose Names Contain a String
If your goal is the inverse – to keep only the columns whose names contain a certain string – DataFrame.filter(like=..., axis=1)
or DataFrame.filter(regex=..., axis=1)
directly achieves this without needing drop()
.
import pandas as pd
df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})
# ✅ Keep only columns whose names contain 'Vital'
df_only_vital_cols = df.filter(like='Vital', axis=1)
# Or using regex: df_only_vital_cols = df.filter(regex='Vital', axis=1)
print("DataFrame with ONLY columns containing 'Vital':")
print(df_only_vital_cols)
Output:
DataFrame with ONLY columns containing 'Vital':
Vital_Temp_C Vital_Pressure_Systolic
0 36.5 120
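The difference between like= and regex= mirrors the regex=False caveat above: like= performs a plain substring match, while regex= interprets the pattern. A quick sketch (the column names are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'Temp (C)': [36.5], 'Temp_F': [97.7], 'ID': [1]})

# like= is a literal substring match: '(C)' finds the column 'Temp (C)'
print(list(df.filter(like='(C)', axis=1).columns))   # ['Temp (C)']

# regex= interprets the pattern as a regular expression
print(list(df.filter(regex='Temp', axis=1).columns))  # ['Temp (C)', 'Temp_F']
```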
Conclusion
To drop Pandas DataFrame columns based on whether their names contain a specific string or match a regex:
- Using df.filter(regex=pattern, axis=1) then df.drop(columns=...):
  - Filter to get the list of column names matching your pattern (which can be a simple substring or a complex regex).
  - Pass this list of names to df.drop(columns=...). This is generally clean and leverages regex power effectively.
- Using df.columns.str.contains(pattern) then df.loc[:, ~mask] or df.drop(columns=...):
  - Create a boolean mask from the column names using str.contains().
  - Either select the columns to keep using df.loc[:, ~mask] (inverting the mask), or get the names of the columns to drop with df.columns[mask] and pass them to df.drop().
  - str.contains() has case and flags parameters for case-insensitivity.

Choose the method that feels most readable and maintainable for your use case. The filter(regex=...) approach combined with drop() is often very expressive for this task.