Skip to main content

Python Pandas: How to Drop DataFrame Columns if Name Contains String (Regex/str.contains)

When working with Pandas DataFrames, especially those with many columns or dynamically generated column names, you often need to drop columns based on patterns in their names. A common requirement is to remove all columns whose names contain a specific substring or match a regular expression.

This guide explains how to effectively drop Pandas DataFrame columns if their names contain a given string or match a regex, using DataFrame.filter() with regex and Index.str.contains().

The Goal: Dropping Columns by Name Patterns

Given a Pandas DataFrame, we want to identify and remove columns where the column's name (header) includes a specific substring (e.g., "temp", "_calculated") or matches a more complex regular expression pattern.

Example DataFrame

import pandas as pd
import re # For re.IGNORECASE if needed

data = {
'Record_ID': [101, 102, 103],
'Patient_Name': ['Alice', 'Bob', 'Charlie'],
'Age_Years': [30, 45, 28],
'Vital_Temp_C': [36.5, 37.0, 36.8],
'Vital_Pressure_Systolic': [120, 130, 115],
'Notes_General': ['Stable', 'Observation', 'Stable'],
'Lab_Test_Result_A': [1.5, 2.1, 0.9],
'Lab_Test_Flag_A': [False, True, False]
}

df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)

Output:

Original DataFrame:
Record_ID Patient_Name Age_Years Vital_Temp_C Vital_Pressure_Systolic \
0 101 Alice 30 36.5 120
1 102 Bob 45 37.0 130
2 103 Charlie 28 36.8 115

Notes_General Lab_Test_Result_A Lab_Test_Flag_A
0 Stable 1.5 False
1 Observation 2.1 True
2 Stable 0.9 False

Let's say we want to drop columns related to "Vital" signs or those containing "Test".

The DataFrame.filter() method can subset rows or columns based on labels. When used with the regex parameter and axis=1 (for columns), it selects columns whose names match the regular expression.

Dropping Columns Containing a Single String

To drop all columns whose names contain the substring "Vital":

import pandas as pd

df = pd.DataFrame({
'Record_ID': [101, 102, 103], 'Patient_Name': ['Alice', 'Bob', 'Charlie'],
'Age_Years': [30, 45, 28], 'Vital_Temp_C': [36.5, 37.0, 36.8],
'Vital_Pressure_Systolic': [120, 130, 115], 'Notes_General': ['Stable', 'Observation', 'Stable']
})

# Step 1: Filter to get the names of columns containing 'Vital'
cols_to_drop_filter = df.filter(regex='Vital', axis=1).columns
print(f"Columns to drop (containing 'Vital'): {list(cols_to_drop_filter)}")

# Step 2: Drop these columns
# df.drop(columns=cols_to_drop_filter, inplace=True) # Modifies df directly
df_dropped_vital = df.drop(columns=cols_to_drop_filter) # Returns a new DataFrame

print("DataFrame after dropping columns containing 'Vital':")
print(df_dropped_vital)

Output:

Columns to drop (containing 'Vital'): ['Vital_Temp_C', 'Vital_Pressure_Systolic']
DataFrame after dropping columns containing 'Vital':
Record_ID Patient_Name Age_Years Notes_General
0 101 Alice 30 Stable
1 102 Bob 45 Observation
2 103 Charlie 28 Stable
  • df.filter(regex='Vital', axis=1): Selects a DataFrame containing only columns whose names match the regex 'Vital' (which acts as a substring search here).
  • .columns: Gets the column names from this filtered DataFrame.
  • df.drop(columns=..., axis=1): Drops the specified columns. axis=1 is crucial for dropping columns. Use inplace=True to modify the original DataFrame directly.
  • You can do this in one line:
    df.drop(list(df.filter(regex='Vital', axis=1)), axis=1, inplace=True)

Dropping Columns Containing One of Multiple Strings (OR Logic)

Use the pipe | character in your regex to specify OR logic. To drop columns containing "Vital" OR "Lab":

import pandas as pd

df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120],
'Lab_Test_Result_A': [1.5], 'Lab_Test_Flag_A': [False], 'Notes_General': ['Stable']
})

regex_pattern_multi = 'Vital|Lab' # Match 'Vital' OR 'Lab'
cols_to_drop_multi = df.filter(regex=regex_pattern_multi, axis=1).columns
print(f"Columns to drop (containing 'Vital' or 'Lab'): {list(cols_to_drop_multi)}\n")

df_dropped_multi = df.drop(columns=cols_to_drop_multi)

print("DataFrame after dropping columns containing 'Vital' or 'Lab':")
print(df_dropped_multi)

Output:

Columns to drop (containing 'Vital' or 'Lab'): ['Vital_Temp_C', 'Vital_Pressure_Systolic', 'Lab_Test_Result_A', 'Lab_Test_Flag_A']

DataFrame after dropping columns containing 'Vital' or 'Lab':
Record_ID Patient_Name Age_Years Notes_General
0 101 Alice 30 Stable

Method 2: Using Index.str.contains() and DataFrame.loc or drop()

This method involves creating a boolean mask from the DataFrame's column names.

Creating a Boolean Mask for Column Names

The df.columns.str.contains(pattern) method returns a boolean array indicating which column names contain the pattern.

import pandas as pd

df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})

substring_to_find = 'name' # Case-sensitive search for 'name'

# Create a boolean mask for column names containing the substring
cols_contain_name_mask = df.columns.str.contains(substring_to_find)
print(f"Boolean mask for columns containing '{substring_to_find}':")
print(cols_contain_name_mask)

Output:

Boolean mask for columns containing 'name':
[False False False False False False]

Selecting Columns to Keep (Inverting the Logic)

To drop columns matching the pattern, you select the columns that do not match. Use the tilde (~) operator to invert the boolean mask.

import pandas as pd

df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})
substring_to_find = 'Name' # Example: to match 'Patient_Name'
cols_contain_name_mask = df.columns.str.contains(substring_to_find)

# ✅ Select columns to KEEP (where mask is False, i.e., name does NOT contain 'Name')
df_kept_cols = df.loc[:, ~cols_contain_name_mask]

print(f"DataFrame after keeping columns NOT containing '{substring_to_find}':")
print(df_kept_cols)

Output (Patient_Name column is dropped):

Dropping Columns Based on the Mask

Alternatively, get the names of columns to drop and use df.drop().

import pandas as pd

df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})
substring_to_find = 'Name'
cols_contain_name_mask = df.columns.str.contains(substring_to_find)

# Get actual column names to drop
cols_to_drop_str_contains = df.columns[cols_contain_name_mask]
print(f"Columns to drop using str.contains: {cols_to_drop_str_contains.tolist()}\n")

df_dropped_str_contains = df.drop(columns=cols_to_drop_str_contains)
print(f"DataFrame after dropping columns containing '{substring_to_find}' (using str.contains then drop):")
print(df_dropped_str_contains)

Output: (Same as selecting kept columns with ~mask)

Columns to drop using str.contains: ['Patient_Name']

DataFrame after dropping columns containing 'Name' (using str.contains then drop):
Record_ID Age_Years Vital_Temp_C Vital_Pressure_Systolic Notes_General
0 101 30 36.5 120 Stable

Case-Insensitive Matching with str.contains()

Set the case=False parameter in str.contains(). You can also use flags=re.IGNORECASE.

import pandas as pd
import re # For re.IGNORECASE

df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30], 'Notes_General': ['Stable']
})

substring_lower = 'name' # Search for 'name' case-insensitively

# ✅ Case-insensitive mask
cols_contain_name_case_insensitive_mask = df.columns.str.contains(substring_lower, case=False)
# OR: cols_contain_name_case_insensitive_mask = df.columns.str.contains(substring_lower, flags=re.IGNORECASE)

print(f"Boolean mask for columns containing '{substring_lower}' (case-insensitive):")
print(cols_contain_name_case_insensitive_mask)

df_kept_case_insensitive = df.loc[:, ~cols_contain_name_case_insensitive_mask]
print("DataFrame after case-insensitive drop:")
print(df_kept_case_insensitive)

Output:

Boolean mask for columns containing 'name' (case-insensitive):
[False True False False]
DataFrame after case-insensitive drop:
Record_ID Age_Years Notes_General
0 101 30 Stable

Alternative: Keeping Only Columns Whose Names Contain a String

If your goal is the inverse – to keep only the columns whose names contain a certain string – DataFrame.filter(like=..., axis=1) or DataFrame.filter(regex=..., axis=1) directly achieves this without needing drop().

import pandas as pd

df = pd.DataFrame({
'Record_ID': [101], 'Patient_Name': ['Alice'], 'Age_Years': [30],
'Vital_Temp_C': [36.5], 'Vital_Pressure_Systolic': [120], 'Notes_General': ['Stable']
})

# ✅ Keep only columns whose names contain 'Vital'
df_only_vital_cols = df.filter(like='Vital', axis=1)
# Or using regex: df_only_vital_cols = df.filter(regex='Vital', axis=1)

print("DataFrame with ONLY columns containing 'Vital':")
print(df_only_vital_cols)

Output:

DataFrame with ONLY columns containing 'Vital':
Vital_Temp_C Vital_Pressure_Systolic
0 36.5 120

Conclusion

To drop Pandas DataFrame columns based on whether their names contain a specific string or match a regex:

  1. Using df.filter(regex=pattern, axis=1) then df.drop(columns=...):
    • Filter to get the list of column names matching your pattern (which can be a simple substring or a complex regex).
    • Pass this list of names to df.drop(columns=...). This is generally clean and leverages regex power effectively.
  2. Using df.columns.str.contains(pattern) then df.loc[:, ~mask] or df.drop(columns=...):
    • Create a boolean mask from the column names using str.contains().
    • Either select columns to keep using df.loc[:, ~mask] (inverting the mask).
    • Or get the names of columns to drop from the mask df.columns[mask] and use df.drop().
    • str.contains() has case and flags parameters for case-insensitivity.

Choose the method that feels most readable and maintainable for your use case. The filter(regex=...) approach combined with drop is often very expressive for this task.