Python Pandas: How to Modify a Subset of DataFrame Rows Based on Condition

Modifying specific rows in a Pandas DataFrame based on certain conditions is a fundamental part of data manipulation and cleaning. You might need to update values in one or more columns for rows that meet specific criteria (e.g., update the 'Status' for all products in a certain 'Category', or apply a discount to items with 'Price' above a threshold).

This guide explains several effective methods to modify a subset of rows in a Pandas DataFrame, primarily using DataFrame.loc, numpy.where(), and DataFrame.apply().

The Goal: Conditional Row Modification

Given a Pandas DataFrame, we want to identify a subset of rows based on a condition (or multiple conditions) and then change the values in one or more specific columns only for those selected rows.

Example DataFrame

import pandas as pd
import numpy as np # For np.where

data = {
    'ProductID': [101, 102, 103, 104, 105, 106],
    'Category': ['Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Apparel'],
    'Price': [299.99, 19.95, 45.50, 799.00, 12.50, 30.00],
    'Stock': [10, 50, 0, 5, 120, 0],
    'Status': ['Active', 'Active', 'Discontinued', 'Active', 'Active', 'Discontinued']
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)

Output:

Original DataFrame:
   ProductID     Category   Price  Stock        Status
      101  Electronics  299.99     10        Active
      102        Books   19.95     50        Active
      103      Apparel   45.50      0  Discontinued
      104  Electronics  799.00      5        Active
      105        Books   12.50    120        Active
      106      Apparel   30.00      0  Discontinued

Method 1: Using `DataFrame.loc` (Recommended)

The DataFrame.loc indexer is primarily label-based but also excels at boolean array indexing, making it ideal for conditional selection and assignment. The syntax is df.loc[row_condition, column_to_modify] = new_value.

Modifying a Single Column

Let's update the 'Status' to 'Clearance' for all 'Apparel' items with 0 'Stock'.

import pandas as pd

df = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106],
    'Category': ['Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Apparel'],
    'Price': [299.99, 19.95, 45.50, 799.00, 12.50, 30.00],
    'Stock': [10, 50, 0, 5, 120, 0],
    'Status': ['Active', 'Active', 'Discontinued', 'Active', 'Active', 'Discontinued']
})

# Define the condition for rows to select
condition = (df['Category'] == 'Apparel') & (df['Stock'] == 0)
print("Boolean mask for selection:")
print(condition)
print()

# ✅ Select rows meeting the condition and update the 'Status' column for them
df.loc[condition, 'Status'] = 'Clearance'

print("DataFrame after modifying 'Status' for Apparel with 0 stock:")
print(df)

Output:

Boolean mask for selection:
  False
  False
   True
  False
  False
   True
dtype: bool

DataFrame after modifying 'Status' for Apparel with 0 stock:
   ProductID     Category   Price  Stock     Status
      101  Electronics  299.99     10     Active
      102        Books   19.95     50     Active
      103      Apparel   45.50      0  Clearance
      104  Electronics  799.00      5     Active
      105        Books   12.50    120     Active
      106      Apparel   30.00      0  Clearance

Modifying Based on Existing Values in the Target Column

You can use the current values of the column being modified on the right side of the assignment. For example, increase the 'Price' by 10% for 'Electronics'.

import pandas as pd

df = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106],
    'Category': ['Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Apparel'],
    'Price': [300.0, 20.0, 45.0, 800.0, 12.0, 30.0], # Simplified prices
    'Status': ['Active', 'Active', 'Clearance', 'Active', 'Active', 'Clearance']
})

condition_electronics = (df['Category'] == 'Electronics')

# ✅ Increase 'Price' by 10% for rows matching the condition
df.loc[condition_electronics, 'Price'] = df.loc[condition_electronics, 'Price'] * 1.10

print("DataFrame after increasing 'Price' for Electronics by 10%:")
print(df)

Output:

DataFrame after increasing 'Price' for Electronics by 10%:
   ProductID     Category  Price     Status
      101  Electronics  330.0     Active
      102        Books   20.0     Active
      103      Apparel   45.0  Clearance
      104  Electronics  880.0     Active
      105        Books   12.0     Active
      106      Apparel   30.0  Clearance

It's crucial that the selection on the right side df.loc[condition_electronics, 'Price'] aligns with the rows being modified on the left.

Modifying Multiple Columns

To modify multiple columns for the selected rows, pass a list of column names. The value assigned can be a single scalar (applied to all matched cells in the specified columns) or a list/array of values that aligns with the selection.

import pandas as pd

df = pd.DataFrame({
    'ProductID': [101, 102, 103],
    'Category': ['Electronics', 'Books', 'Apparel'],
    'Price': [330.0, 20.0, 45.0],
    'Stock': [10, 50, 0],
    'Status': ['Active', 'Active', 'Clearance']
})

condition_no_stock = (df['Stock'] == 0)
columns_to_update = ['Status', 'Price']

# An option is to assign different specific values to each column
# (This requires more careful construction if values differ per row)
# For simplicity, let's assign a single new status and a new price
# For complex row-specific values, np.where or apply might be better.
# Here, we'll set 'Status' to 'Sold Out' and 'Price' to 0 for 0 stock items.
df.loc[condition_no_stock, 'Status'] = 'Sold Out'
df.loc[condition_no_stock, 'Price']  = 0.00 # Update price separately or use a more complex assignment

# If assigning the *same* value across multiple columns (less common for different types):
# df.loc[condition_no_stock, ['Status', 'Note']] = "Out of Stock"

# If assigning a list of new values (must match number of selected rows * number of columns)
# Or assign a DataFrame of new values with matching index and columns

print("DataFrame after updating multiple columns for 0 stock:")
print(df)

Output:

DataFrame after updating multiple columns for 0 stock:
   ProductID     Category  Price  Stock    Status
0        101  Electronics  330.0     10    Active
1        102        Books   20.0     50    Active
2        103      Apparel    0.0      0  Sold Out

Example: Modify multiple columns based on their existing values

import pandas as pd

# Sample DataFrame
df_multi_mod = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [10, 20, 30, 40],
    'C': ['x', 'y', 'x', 'y']
})

# Condition where column 'C' is 'x'
condition_c_x = (df_multi_mod['C'] == 'x')

# Create a DataFrame from modified values
updated_values = df_multi_mod.loc[condition_c_x, ['A', 'B']].apply(
    lambda r: pd.Series([r['A'] + 1, r['B'] * 2]), axis=1
)

# Assign the updated values back to the original DataFrame
df_multi_mod.loc[condition_c_x, ['A', 'B']] = updated_values.values

print("Modifying multiple columns based on existing values:")
print(df_multi_mod)

Output:

Modifying multiple columns based on existing values:
   A   B  C
0  2  20  x
1  2  20  y
2  4  60  x
3  4  40  y

When assigning calculated values to multiple columns, using .apply() on the right-hand side selection or preparing a DataFrame/array of new values can be necessary if the new values for each column are different and depend on existing data.

Method 2: Using `numpy.where()`

numpy.where(condition, value_if_true, value_if_false) is useful for conditional assignments, especially when creating a new column or updating one based on a condition.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106],
    'Category': ['Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Apparel'],
    'Price': [299.99, 19.95, 45.50, 799.00, 12.50, 30.00],
    'Stock': [10, 50, 0, 5, 120, 0],
    'Status': ['Active', 'Active', 'Discontinued', 'Active', 'Active', 'Discontinued']
})

# Condition: Stock == 0
condition = (df['Stock'] == 0)

# ✅ Update 'Status' column using np.where()
# If condition is True, set Status to 'Sold Out', else keep original Status
df['Status'] = np.where(condition, 'Sold Out', df['Status'])

print("DataFrame after modifying 'Status' using np.where():")
print(df)

Output:

DataFrame after modifying 'Status' using np.where():
   ProductID     Category   Price  Stock    Status
      101  Electronics  299.99     10    Active
      102        Books   19.95     50    Active
      103      Apparel   45.50      0  Sold Out
      104  Electronics  799.00      5    Active
      105        Books   12.50    120    Active
      106      Apparel   30.00      0  Sold Out

note

np.where is very efficient as it leverages NumPy's vectorized operations. It's excellent for updating a single column based on a condition applied to potentially other columns.

Method 3: Using `DataFrame.apply()` (Row-wise)

For more complex conditional logic that is hard to express with direct boolean indexing or np.where, you can use DataFrame.apply(custom_function, axis=1). The custom function receives each row as a Series.

import pandas as pd

df_apply_example = pd.DataFrame({
    'ProductID': [101, 102, 103, 104, 105, 106],
    'Category': ['Electronics', 'Books', 'Apparel', 'Electronics', 'Books', 'Apparel'],
    'Price': [299.99, 19.95, 45.50, 799.00, 12.50, 30.00],
    'Stock': [10, 50, 0, 5, 120, 0],
    'Status': ['Active', 'Active', 'Discontinued', 'Active', 'Active', 'Discontinued']
})

def update_status_based_on_stock(row):
  # This function operates on a single row (passed as a Series)
  if row['Stock'] == 0 and row['Category'] == 'Apparel':
    return 'Clearance_Applied' # New value for 'Status'
  return row['Status'] # Keep original status otherwise

# ✅ Apply the function row-wise (axis=1) to update the 'Status' column
df_apply_example['Status'] = df_apply_example.apply(update_status_based_on_stock, axis=1)

print("DataFrame after modifying 'Status' using apply():")
print(df_apply_example)

Output:

DataFrame after modifying 'Status' using apply():
   ProductID     Category   Price  Stock             Status
      101  Electronics  299.99     10             Active
      102        Books   19.95     50             Active
      103      Apparel   45.50      0  Clearance_Applied
      104  Electronics  799.00      5             Active
      105        Books   12.50    120             Active
      106      Apparel   30.00      0  Clearance_Applied

axis=1: Ensures the function is applied to each row.
apply() can be less performant than vectorized methods like .loc or np.where for simple conditions, but offers maximum flexibility for complex row-wise logic.

Choosing the Right Method

DataFrame.loc[condition, column(s)] = value: Generally recommended for its clarity, conciseness, and directness for most conditional updates, especially when assigning a fixed value or values derived from the same selection.
numpy.where(condition, value_if_true, value_if_false): Excellent for updating a single column based on a condition. It's very performant.
DataFrame.apply(func, axis=1): Best suited for complex row-by-row conditional logic that is difficult to express with boolean indexing or np.where. It's typically slower than vectorized approaches.

Conclusion

Modifying a subset of rows in a Pandas DataFrame based on conditions is a core data manipulation skill.

df.loc[row_condition, column_to_change] = new_value is the most idiomatic and versatile method for most cases.
np.where() offers an efficient alternative for updating a single column conditionally.
df.apply(..., axis=1) provides maximum flexibility for complex row-wise transformations.

Choose the method that makes your code clear, readable, and performs adequately for your dataset size and the complexity of your conditions.

The Goal: Conditional Row Modification​

Example DataFrame​

Method 1: Using DataFrame.loc (Recommended)​

Modifying a Single Column​

Modifying Based on Existing Values in the Target Column​

Modifying Multiple Columns​

Method 2: Using numpy.where()​

Method 3: Using DataFrame.apply() (Row-wise)​

Choosing the Right Method​

Conclusion​

Table of Contents

The Goal: Conditional Row Modification

Example DataFrame

Method 1: Using `DataFrame.loc` (Recommended)

Modifying a Single Column

Modifying Based on Existing Values in the Target Column

Modifying Multiple Columns

Method 2: Using `numpy.where()`

Method 3: Using `DataFrame.apply()` (Row-wise)

Choosing the Right Method

Conclusion