Python Pandas: How to Update DataFrame Rows Conditionally During Iteration

While Pandas excels at vectorized operations for high performance, there are times when you might find yourself needing to iterate over DataFrame rows and update values based on specific conditions within each row. This could be due to complex row-dependent logic, for learning purposes, or when working with smaller DataFrames where performance differences are negligible.

This guide will thoroughly demonstrate several common methods for iterating through Pandas DataFrame rows—using iterrows(), df.index, and itertuples()—and how to conditionally update cell values during this iteration using df.at[]. We will also explore the apply() method as a more idiomatic (though still row-wise) alternative for certain conditional updates.

Why Iterate and Update? (And a Word of Caution)

Iterating over DataFrame rows to perform updates is generally not recommended for large DataFrames due to significantly poorer performance compared to vectorized operations (e.g., using df.loc with boolean indexing). Vectorized operations apply an action to a whole array or Series at once, leveraging underlying C implementations for speed.

However, you might choose to iterate if:

The update logic for each row is very complex and difficult to vectorize.
You are working with very small DataFrames where the performance overhead is acceptable.
You are learning Pandas and want to understand row-by-row processing (though it's crucial to learn vectorized methods soon after).

This guide shows how to do it, but always consider if a vectorized alternative exists first.

Let's start with a sample DataFrame:

import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Wonderland', 'Robert Tables', 'Charles Xavier', 'Diana Prince'],
    'department': ['HR', 'IT', 'Management', 'Marketing'],
    'salary': [70000, 85000, 120000, 75000],
    'bonus_eligible': [True, False, True, True]
})

print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
      employee_name  department  salary  bonus_eligible
0  Alice Wonderland          HR   70000            True
1     Robert Tables          IT   85000           False
2    Charles Xavier  Management  120000            True
3      Diana Prince   Marketing   75000            True

Method 1: Using `DataFrame.iterrows()`

The DataFrame.iterrows() method yields (index, Series) pairs, allowing you to access row data as a Pandas Series. For updating, it's best to use df.at[index, column_label] for fast scalar setting.

Basic Conditional Update

Let's increase the salary by 10% for HR employees.

import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Wonderland', 'Robert Tables', 'Charles Xavier', 'Diana Prince'],
    'department': ['HR', 'IT', 'Management', 'Marketing'],
    'salary': [70000, 85000, 120000, 75000],
    'bonus_eligible': [True, False, True, True]
})

df_iterrows = df.copy()

for index, row_series in df_iterrows.iterrows():
    if row_series['department'] == 'HR':
        # Increase salary by 10%
        df_iterrows.at[index, 'salary'] = row_series['salary'] * 1.10

print("DataFrame after iterrows() update (HR salary increase):")
print(df_iterrows)

Output:

DataFrame after iterrows() update (HR salary increase):
      employee_name  department  salary  bonus_eligible
0  Alice Wonderland          HR   77000            True
1     Robert Tables          IT   85000           False
2    Charles Xavier  Management  120000            True
3      Diana Prince   Marketing   75000            True

df.iterrows(): Yields index (the row label) and row_series (the row data as a Series).
df.at[index, 'salary']: Efficiently accesses and sets a single value at the specified index and salary column.

Updating Based on Multiple Conditions (`&` and `|`)

Update salary to a fixed 90000 if department is 'IT' AND bonus_eligible is False.

import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Wonderland', 'Robert Tables', 'Charles Xavier', 'Diana Prince'],
    'department': ['HR', 'IT', 'Management', 'Marketing'],
    'salary': [70000, 85000, 120000, 75000],
    'bonus_eligible': [True, False, True, True]
})

df_iterrows_multi = df.copy()

for index, row_series in df_iterrows_multi.iterrows():
    # Use parentheses for each condition with bitwise operators & (AND), | (OR)
    if (row_series['department'] == 'IT') & (row_series['bonus_eligible'] == False):
        df_iterrows_multi.at[index, 'salary'] = 90000

print("DataFrame after iterrows() (IT and not bonus_eligible salary to 90k):")
print(df_iterrows_multi)

Output:

DataFrame after iterrows() (IT and not bonus_eligible salary to 90k):
      employee_name  department  salary  bonus_eligible
0  Alice Wonderland          HR   70000            True
1     Robert Tables          IT   90000           False
2    Charles Xavier  Management  120000            True
3      Diana Prince   Marketing   75000            True

To use OR, replace & with |. For example: if (condition1) | (condition2):

Method 2: Iterating with `DataFrame.index`

You can iterate directly over the DataFrame's index labels and use df.at[index_label, column_label] to access and modify values.

import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Wonderland', 'Robert Tables', 'Charles Xavier', 'Diana Prince'],
    'department': ['HR', 'IT', 'Management', 'Marketing'],
    'salary': [70000, 85000, 120000, 75000],
    'bonus_eligible': [True, False, True, True]
})

df_index_iter = df.copy()

for idx_label in df_index_iter.index:
    if df_index_iter.at[idx_label, 'salary'] < 80000:
        df_index_iter.at[idx_label, 'salary'] = df_index_iter.at[idx_label, 'salary'] + 5000 # Give a 5k raise

print("DataFrame after iterating with df.index (salaries < 80k get 5k raise):")
print(df_index_iter)

Output:

DataFrame after iterating with df.index (salaries < 80k get 5k raise):
      employee_name  department  salary  bonus_eligible
0  Alice Wonderland          HR   75000            True
1     Robert Tables          IT   85000           False
2    Charles Xavier  Management  120000            True
3      Diana Prince   Marketing   80000            True

This method requires you to fetch the value using df.at[] for the condition check as well, which might be slightly less readable than iterrows() where row_series is directly available.

Method 3: Using `DataFrame.itertuples()` (A Faster Iteration Method)

DataFrame.itertuples() iterates over DataFrame rows as namedtuples. This is generally faster than iterrows() because it avoids creating a Series object for each row. The first element of the tuple is the Index, and subsequent elements are the column values.

import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Wonderland', 'Robert Tables', 'Charles Xavier', 'Diana Prince'],
    'department': ['HR', 'IT', 'Management', 'Marketing'],
    'salary': [70000, 85000, 120000, 75000],
    'bonus_eligible': [True, False, True, True]
})

df_itertuples = df.copy()

# By default, index=True, so row.Index gives the index label.
# name=None ensures default field names like _1, _2 for columns with invalid Python identifiers.
for row_tuple in df_itertuples.itertuples(index=True, name='EmployeeRow'):
    # Access columns by their names (if valid identifiers) or by field name (e.g., row_tuple.salary)
    # The index is available as row_tuple.Index
    if row_tuple.department == 'Marketing' and row_tuple.salary < 100000:
        df_itertuples.at[row_tuple.Index, 'salary'] = row_tuple.salary * 1.05 # 5% raise for Marketing

print("DataFrame after itertuples() update (Marketing salary increase):")
print(df_itertuples)

Output:

DataFrame after itertuples() update (Marketing salary increase):
      employee_name  department  salary  bonus_eligible
0  Alice Wonderland          HR   70000            True
1     Robert Tables          IT   85000           False
2    Charles Xavier  Management  120000            True
3      Diana Prince   Marketing   78750            True

note

row_tuple.Index: Accesses the index of the current row.
row_tuple.column_name: Accesses the value of column_name for the current row.

Method 4: A More Pandas-Idiomatic Row-wise Approach: `DataFrame.apply(axis=1)`

While df.apply(func, axis=1) also processes row by row, it's often used for more complex transformations where you return a new Series or modify an existing one based on row data. It's generally more idiomatic for conditional assignments to an entire column based on other columns in that row, rather than targeted cell updates within a loop.

Let's set bonus_eligible to False if salary is >= 100000, otherwise keep its original value. This is an assignment to 'bonus_eligible', not an in-place update of various cells.

import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Wonderland', 'Robert Tables', 'Charles Xavier', 'Diana Prince'],
    'department': ['HR', 'IT', 'Management', 'Marketing'],
    'salary': [70000, 85000, 120000, 75000],
    'bonus_eligible': [True, False, True, True]
})

df_apply = df.copy()

def determine_bonus_eligibility(row_series):
    if row_series['salary'] >= 100000:
        return False # Not eligible
    return row_series['bonus_eligible'] # Keep original eligibility

df_apply['bonus_eligible_updated'] = df_apply.apply(determine_bonus_eligibility, axis=1)

print("DataFrame after apply() to determine new bonus eligibility:")
print(df_apply)
print()

# A more common use of apply for conditional new value for a column:
# If you wanted to set salary to 0 if not bonus_eligible:
df_apply['salary_adjusted'] = df_apply.apply(
    lambda row: 0 if not row['bonus_eligible'] else row['salary'],
    axis=1
)

print("DataFrame with salary adjusted by apply():")
print(df_apply[['employee_name', 'bonus_eligible', 'salary', 'salary_adjusted']])

Output:

DataFrame after apply() to determine new bonus eligibility:
      employee_name  department  salary  bonus_eligible  \
Alice Wonderland          HR   70000            True   
   Robert Tables          IT   85000           False   
  Charles Xavier  Management  120000            True   
    Diana Prince   Marketing   75000            True   

   bonus_eligible_updated  
                  True  
                 False  
                 False  
                  True  

DataFrame with salary adjusted by apply():
      employee_name  bonus_eligible  salary  salary_adjusted
Alice Wonderland            True   70000            70000
   Robert Tables           False   85000                0
  Charles Xavier            True  120000           120000
    Diana Prince            True   75000            75000

note

Using apply() for what could be a simple df.loc assignment is less efficient. For instance, the last example is better done with:

df.loc[df['bonus_eligible'] == False, 'salary_adjusted_loc'] = 0
df.loc[df['bonus_eligible'] == True, 'salary_adjusted_loc'] = df['salary']

Important: Performance Considerations for Large DataFrames

Explicit iteration (iterrows, itertuples, for idx in df.index) is significantly slower than vectorized Pandas operations for large DataFrames. Whenever possible, try to achieve your updates using:

Boolean Indexing with .loc:

# Example: Give 10% raise to HR department
df.loc[df['department'] == 'HR', 'salary'] *= 1.10

np.select() or np.where(): For more complex conditional logic across columns.
.map() or .replace(): For value transformations based on mappings.

These methods operate on entire arrays/Series at once and are much more efficient.

Conclusion

While direct iteration and conditional updates in Pandas using methods like iterrows(), itertuples(), or looping through df.index are possible, they should be approached with caution, especially for larger datasets due to performance implications. For these iteration methods, df.at[index_label, column_label] is the preferred way to set values.

The DataFrame.apply(axis=1) method provides another row-wise processing capability, often more suited for creating new columns or transforming existing ones based on row-level logic.

However, always strive to use vectorized operations (.loc with boolean masks, np.select, etc.) whenever your update logic allows, as they offer substantial performance benefits. Understanding both iterative and vectorized approaches will make you a more effective Pandas user.

Why Iterate and Update? (And a Word of Caution)​

Method 1: Using DataFrame.iterrows()​

Basic Conditional Update​

Updating Based on Multiple Conditions (& and |)​

Method 2: Iterating with DataFrame.index​

Method 3: Using DataFrame.itertuples() (A Faster Iteration Method)​

Method 4: A More Pandas-Idiomatic Row-wise Approach: DataFrame.apply(axis=1)​

Important: Performance Considerations for Large DataFrames​

Conclusion​

Table of Contents

Why Iterate and Update? (And a Word of Caution)

Method 1: Using `DataFrame.iterrows()`

Basic Conditional Update

Updating Based on Multiple Conditions (`&` and `|`)

Method 2: Iterating with `DataFrame.index`

Method 3: Using `DataFrame.itertuples()` (A Faster Iteration Method)

Method 4: A More Pandas-Idiomatic Row-wise Approach: `DataFrame.apply(axis=1)`

Important: Performance Considerations for Large DataFrames

Conclusion