Python Pandas: How to Replace Entire String in Column if Contains Substring

When cleaning or transforming text data in Pandas, a common task is to replace the entire string value in a column if it contains a specific substring. For example, you might want to standardize job titles by replacing any title containing "Dev" or "Engineer" with a generic "Developer" or "Engineering Role".

This guide explains several effective methods to achieve this conditional whole-string replacement in Pandas, using str.contains(), str.replace() with regex, and Series.apply().

The Goal: Conditional Full String Replacement

Given a Pandas DataFrame and a specific column containing string values, we want to identify rows where the string in that column includes a certain substring. For all such identified rows, we want to replace the entire string value in that column with a new, specified string.

Example DataFrame

import pandas as pd
import re # For regex flags

data = {
    'EmployeeID': [101, 102, 103, 104, 105, 106],
    'JobTitle_Original': [
        'Software Developer',
        'Senior Web Developer',
        'Data Analyst',
        'DevOps Engineer',
        'Project Manager (Dev)',
        'Lead Analyst'
    ],
    'Department': ['Tech', 'Tech', 'Analytics', 'Tech', 'Management', 'Analytics']
}

df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)

Output:

Original DataFrame:
   EmployeeID      JobTitle_Original  Department
       101     Software Developer        Tech
       102   Senior Web Developer        Tech
       103           Data Analyst   Analytics
       104        DevOps Engineer        Tech
       105  Project Manager (Dev)  Management
       106           Lead Analyst   Analytics

Our goal is to replace any JobTitle_Original containing "Dev" or "Developer" (case-insensitive) with a standardized "Software Engineering Role".

Method 1: Using `Series.str.contains()` with `DataFrame.loc` (Recommended)

This is often the most readable and idiomatic Pandas approach.

Use Series.str.contains(substring) to create a boolean mask identifying rows where the column contains the substring.
Use DataFrame.loc[mask, 'ColumnName'] = new_value to assign the new value to the specified column for the matched rows.

Case-Sensitive Replacement

import pandas as pd

df = df_original.copy() # Work on a copy

substring_to_find = 'Dev' # Case-sensitive
new_value = 'Standardized Developer Role'

# Create a boolean mask for rows where 'JobTitle_Original' contains 'Dev'
mask = df['JobTitle_Original'].str.contains(substring_to_find, na=False) # na=False treats NaN as no match
print("Boolean mask for 'Dev':")
print(mask)
print()

# ✅ Use .loc to replace the entire string in 'JobTitle_Original' where mask is True
df.loc[mask, 'JobTitle_Original'] = new_value

print("DataFrame after case-sensitive replacement using str.contains():")
print(df)

Output:

Boolean mask for 'Dev':
   True
   True
  False
   True
   True
  False
Name: JobTitle_Original, dtype: bool

DataFrame after case-sensitive replacement using str.contains():
   EmployeeID            JobTitle_Original  Department
       101  Standardized Developer Role        Tech
       102  Standardized Developer Role        Tech
       103                 Data Analyst   Analytics
       104  Standardized Developer Role        Tech
       105  Standardized Developer Role  Management
       106                 Lead Analyst   Analytics

na=False in str.contains() ensures that if your column has NaN values, they don't cause errors and are treated as not containing the substring.

Case-Insensitive Replacement (`case=False`)

Set the case=False parameter in str.contains().

import pandas as pd

df = df_original.copy()

substring_to_find_ci = 'dev' # Search for 'dev' case-insensitively
new_value_ci = 'Software Engineering Role'

# ✅ Case-insensitive mask
mask_ci = df['JobTitle_Original'].str.contains(substring_to_find_ci, case=False, na=False)
print("Boolean mask for 'dev' (case-insensitive):")
print(mask_ci)
print()

df.loc[mask_ci, 'JobTitle_Original'] = new_value_ci

print("DataFrame after case-insensitive replacement using str.contains():")
print(df)

Output:

Boolean mask for 'dev' (case-insensitive):
   True
   True
  False
   True
   True
  False
Name: JobTitle_Original, dtype: bool

DataFrame after case-insensitive replacement using str.contains():
   EmployeeID          JobTitle_Original  Department
       101  Software Engineering Role        Tech
       102  Software Engineering Role        Tech
       103               Data Analyst   Analytics
       104  Software Engineering Role        Tech
       105  Software Engineering Role  Management
       106               Lead Analyst   Analytics

Method 2: Using `Series.str.replace()` with Regular Expressions (Regex)

The Series.str.replace(pattern, replacement_string, regex=True) method can replace occurrences of a pattern (which can be a regex) with replacement_string. To replace the entire string if a substring is found, the regex pattern needs to match the whole string while ensuring the substring is present.

Case-Sensitive Replacement

import pandas as pd

df = df_original.copy()

# Regex: '^.*Dev.*$'
# ^ : matches the start of the string
# .* : matches any character (except newline) zero or more times
# Dev : the literal substring to find
# .* : matches any character zero or more times again
# $ : matches the end of the string
# This pattern ensures the *entire string* contains 'Dev'
regex_pattern = r'^.*Dev.*$'
new_value_regex = 'Standardized Developer (Regex)'

# ✅ Replace entire string if it matches the regex pattern
df['JobTitle_Original'] = df['JobTitle_Original'].str.replace(
    regex_pattern,
    new_value_regex,
    regex=True
)

print("DataFrame after case-sensitive replacement using str.replace() with regex:")
print(df)

# A more general regex to match 'Dev' anywhere: r'.*Dev.*' and then use a conditional replacement
# or use the str.contains approach for simplicity if just replacing the whole string.
# For str.replace to replace the *whole* string, the regex must match the *whole* string.
# If the regex r'Dev' was used, only 'Dev' would be replaced, not the whole string.

Output:

DataFrame after case-sensitive replacement using str.replace() with regex:
   EmployeeID               JobTitle_Original  Department
       101  Standardized Developer (Regex)        Tech
       102  Standardized Developer (Regex)        Tech
       103                    Data Analyst   Analytics
       104  Standardized Developer (Regex)        Tech
       105  Standardized Developer (Regex)  Management
       106                    Lead Analyst   Analytics

note

The regex r'(^.*Dev.*$)' is designed to match the entire string if "Dev" is present. If you just used r'Dev', only the "Dev" part would be replaced. For whole string replacement based on a substring, Method 1 (str.contains + .loc) is often simpler.

Case-Insensitive Replacement (`flags=re.IGNORECASE`)

Use the flags parameter with re.IGNORECASE.

import pandas as pd
import re

df = df_original.copy()

regex_pattern_ci = r'^.*dev.*$' # Use lowercase in pattern for consistency with IGNORECASE
new_value_regex_ci = 'Software Engineering (Regex CI)'

df['JobTitle_Original'] = df['JobTitle_Original'].str.replace(
    regex_pattern_ci,
    new_value_regex_ci,
    regex=True,
    flags=re.IGNORECASE # ✅ Case-insensitive flag
)

print("DataFrame after case-insensitive replacement (str.replace() with regex):")
print(df)

Output:

DataFrame after case-insensitive replacement (str.replace() with regex):
   EmployeeID                JobTitle_Original  Department
       101  Software Engineering (Regex CI)        Tech
       102  Software Engineering (Regex CI)        Tech
       103                     Data Analyst   Analytics
       104  Software Engineering (Regex CI)        Tech
       105  Software Engineering (Regex CI)  Management
       106                     Lead Analyst   Analytics

Method 3: Using `Series.apply()` with a Lambda Function

The Series.apply() method can apply a custom function (or a lambda function) to each element of the Series.

Case-Sensitive Replacement

import pandas as pd

df = df_original.copy()

substring_to_check = 'Dev'
replacement_string = 'Developer via Apply'

# ✅ Apply a lambda function to check and replace
df['JobTitle_Original'] = df['JobTitle_Original'].apply(
    lambda current_string: replacement_string if isinstance(current_string, str) and substring_to_check in current_string else current_string
)

print("DataFrame after case-sensitive replacement using apply():")
print(df)

Output:

DataFrame after case-sensitive replacement using apply():
   EmployeeID    JobTitle_Original  Department
       101  Developer via Apply        Tech
       102  Developer via Apply        Tech
       103         Data Analyst   Analytics
       104  Developer via Apply        Tech
       105  Developer via Apply  Management
       106         Lead Analyst   Analytics

isinstance(current_string, str): Good practice to check if the element is a string before applying string methods, especially if the column might contain NaN or other types.

Case-Insensitive Replacement (using `.lower()`)

Convert both the current string and the substring to a consistent case (e.g., lowercase) for comparison.

import pandas as pd

df = df_original.copy()

substring_lower = 'dev'
replacement_ci_apply = 'Software Engineer (Apply CI)'

df['JobTitle_Original'] = df['JobTitle_Original'].apply(
    lambda current_string: replacement_ci_apply if isinstance(current_string, str) and substring_lower in current_string.lower() else current_string
)

print("DataFrame after case-insensitive replacement using apply():")
print(df)

Output:

DataFrame after case-insensitive replacement using apply():
   EmployeeID             JobTitle_Original  Department
       101  Software Engineer (Apply CI)        Tech
       102  Software Engineer (Apply CI)        Tech
       103                  Data Analyst   Analytics
       104  Software Engineer (Apply CI)        Tech
       105  Software Engineer (Apply CI)  Management
       106                  Lead Analyst   Analytics

note

apply() is very flexible but can be less performant than vectorized string methods (.str.contains, .str.replace) for very large DataFrames.

Choosing the Right Method

Series.str.contains() with df.loc[] assignment (Method 1): Generally the most recommended for this specific task. It's clear, idiomatic Pandas, and handles case-insensitivity easily. Performance is usually very good.
Series.str.replace() with regex (Method 2): More powerful if you need complex pattern matching to define what part of the string to look for before replacing the whole string. Constructing a regex that correctly matches the entire string when a substring is present (like r'^.*substring.*$') is key.
Series.apply() (Method 3): Most flexible for complex custom logic that goes beyond simple substring checks or regex. However, it can be slower on large datasets compared to the vectorized .str methods.

Conclusion

To replace an entire string value in a Pandas DataFrame column if it contains a specific substring:

Use df.loc[df['column'].str.contains(substring, case=..., na=False), 'column'] = new_value. This is typically the cleanest and most efficient approach.
Alternatively, df['column'].str.replace(r'^.*substring_pattern.*$', new_value, regex=True, flags=...) can be used with a carefully crafted regular expression that matches the entire string.
df['column'].apply(lambda x: ...) provides maximum flexibility for custom replacement logic but might be less performant.

Remember to handle case sensitivity (case=False or flags=re.IGNORECASE) and potential NaN values (na=False in str.contains) as needed for your specific use case.

The Goal: Conditional Full String Replacement​

Example DataFrame​

Method 1: Using Series.str.contains() with DataFrame.loc (Recommended)​

Case-Sensitive Replacement​

Case-Insensitive Replacement (case=False)​

Method 2: Using Series.str.replace() with Regular Expressions (Regex)​

Case-Sensitive Replacement​

Case-Insensitive Replacement (flags=re.IGNORECASE)​

Method 3: Using Series.apply() with a Lambda Function​

Case-Sensitive Replacement​

Case-Insensitive Replacement (using .lower())​

Choosing the Right Method​

Conclusion​

Table of Contents

The Goal: Conditional Full String Replacement

Example DataFrame

Method 1: Using `Series.str.contains()` with `DataFrame.loc` (Recommended)

Case-Sensitive Replacement

Case-Insensitive Replacement (`case=False`)

Method 2: Using `Series.str.replace()` with Regular Expressions (Regex)

Case-Sensitive Replacement

Case-Insensitive Replacement (`flags=re.IGNORECASE`)

Method 3: Using `Series.apply()` with a Lambda Function

Case-Sensitive Replacement

Case-Insensitive Replacement (using `.lower()`)

Choosing the Right Method

Conclusion