Skip to main content

Python Pandas: How to Replace Negative Numbers with Zero in DataFrame

When working with numerical data in Pandas DataFrames, you often need to handle negative values, for instance, by replacing them with zero. This could be because negative values are invalid for a particular feature, represent errors, or need to be floored at zero for certain calculations or models.

This guide explains several effective methods to replace all negative numbers in a Pandas DataFrame (or specific columns) with zero, using boolean indexing, clip(), mask(), and where().

The Goal: Setting a Floor at Zero for Negative Values

Given a Pandas DataFrame containing numeric columns, we want to find all cells that hold a negative number and replace those negative numbers with 0. Positive numbers and zero itself should remain unchanged.

Example DataFrame

import pandas as pd
import numpy as np # For Timedelta example

data = {
'SensorA_Reading': [-10.5, 0.0, 15.2, -2.1, 8.0],
'SensorB_Change': [5, -2, -8, 0, 3],
'SensorC_Ratio': [1.1, 0.9, -0.5, 1.0, -1.2],
'Category': ['X', 'Y', 'X', 'Z', 'Y'] # Non-numeric column
}

df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)

Output:

Original DataFrame:
SensorA_Reading SensorB_Change SensorC_Ratio Category
0 -10.5 5 1.1 X
1 0.0 -2 0.9 Y
2 15.2 -8 -0.5 X
3 -2.1 0 1.0 Z
4 8.0 3 -1.2 Y

This is often the most direct and Pythonic way for DataFrames containing primarily numeric data or when you want to apply the change to all applicable (numeric) cells.

import pandas as pd

df = pd.DataFrame({
'SensorA_Reading': [-10.5, 0.0, 15.2, -2.1, 8.0],
'SensorB_Change': [5, -2, -8, 0, 3],
'SensorC_Ratio': [1.1, 0.9, -0.5, 1.0, -1.2],
}) # Using only numeric columns for this direct example

# Create a boolean DataFrame: True where value < 0, False otherwise
boolean_mask = df < 0
print("Boolean mask (df < 0):")
print(boolean_mask)
print()

# ✅ Where the mask is True, assign 0
# This operation is applied element-wise only where the condition is met.
# Non-numeric columns would typically raise a TypeError if included in `df < 0` directly.
df[df < 0] = 0

print("DataFrame after replacing negatives with 0 (boolean indexing):")
print(df)

Output:

Boolean mask (df < 0):
SensorA_Reading SensorB_Change SensorC_Ratio
0 True False False
1 False True False
2 False True True
3 True False False
4 False False True

DataFrame after replacing negatives with 0 (boolean indexing):
SensorA_Reading SensorB_Change SensorC_Ratio
0 0.0 5 1.1
1 0.0 0 0.9
2 15.2 0 0.0
3 0.0 0 1.0
4 8.0 3 0.0
  • df < 0: Performs an element-wise comparison, returning a boolean DataFrame of the same shape. This works on numeric columns; non-numeric columns might error or produce unexpected boolean results if not handled.
  • df[boolean_mask] = 0: Assigns 0 to all cells in df where the corresponding cell in boolean_mask is True.

Method 2: Using DataFrame.clip(lower=0)

The DataFrame.clip(lower=None, upper=None) method is designed to trim values at specified thresholds. Setting lower=0 will replace all values less than 0 with 0.

import pandas as pd

df = pd.DataFrame({
'SensorA_Reading': [-10.5, 0.0, 15.2, -2.1, 8.0],
'SensorB_Change': [5, -2, -8, 0, 3],
'SensorC_Ratio': [1.1, 0.9, -0.5, 1.0, -1.2],
'Category': ['X', 'Y', 'X', 'Z', 'Y']
})

# Apply clipping only to numeric columns
numeric_cols = df.select_dtypes(include='number').columns
df[numeric_cols] = df[numeric_cols].clip(lower=0)

print("DataFrame after clipping negative values in numeric columns:")
print(df)

Output:

DataFrame after clipping negative values in numeric columns:
SensorA_Reading SensorB_Change SensorC_Ratio Category
0 0.0 5 1.1 X
1 0.0 0 0.9 Y
2 15.2 0 0.0 X
3 0.0 0 1.0 Z
4 8.0 3 0.0 Y
  • df.clip(lower=0): Any value in numeric columns less than 0 is replaced by 0. Values greater than 0 are untouched. NaN values are generally left as NaN. Non-numeric columns are usually ignored by clip by default or you can select numeric columns first.

Method 3: Using DataFrame.mask(df < 0, 0)

The DataFrame.mask(cond, other=nan) method replaces values where the condition cond is True.

import pandas as pd

df = pd.DataFrame({
'SensorA_Reading': [-10.5, 0.0, 15.2, -2.1, 8.0],
'SensorB_Change': [5, -2, -8, 0, 3],
'SensorC_Ratio': [1.1, 0.9, -0.5, 1.0, -1.2],
}) # Assuming numeric only for direct df < 0

# ✅ Replace values with 0 where df < 0 is True
df_masked = df.mask(df < 0, 0)

print("DataFrame after df.mask(df < 0, 0):")
print(df_masked)

Output: (Same as boolean indexing and clip)

DataFrame after df.mask(df < 0, 0):
SensorA_Reading SensorB_Change SensorC_Ratio
0 0.0 5 1.1
1 0.0 0 0.9
2 15.2 0 0.0
3 0.0 0 1.0
4 8.0 3 0.0
  • df < 0: The condition.
  • 0: The value to replace with where the condition is True.
  • mask operates element-wise. For mixed-type DataFrames, apply to numeric columns or handle errors.

Method 4: Using DataFrame.where(df >= 0, 0)

The DataFrame.where(cond, other=nan) method is the inverse of mask. It keeps values where the condition cond is True and replaces values where cond is False with other.

import pandas as pd

df = pd.DataFrame({
'SensorA_Reading': [-10.5, 0.0, 15.2, -2.1, 8.0],
'SensorB_Change': [5, -2, -8, 0, 3],
'SensorC_Ratio': [1.1, 0.9, -0.5, 1.0, -1.2],
}) # Assuming numeric only


# ✅ Keep values where df >= 0 is True, replace others (negatives) with 0
df_where = df.where(df >= 0, 0)
# Note: This also replaces 0 with 0 if condition is df > 0.
# If you want to keep 0s as they are, the condition should be df >= 0.

print("DataFrame after df.where(df >= 0, 0):")
print(df_where)

Output: (Same as boolean indexing, clip, and mask)

DataFrame after df.where(df >= 0, 0):
SensorA_Reading SensorB_Change SensorC_Ratio
0 0.0 5 1.1
1 0.0 0 0.9
2 15.2 0 0.0
3 0.0 0 1.0
4 8.0 3 0.0
  • df >= 0: The condition to keep values.
  • 0: The value to use where the condition is False (i.e., for negative numbers).

Handling Non-Numeric Columns (Using _get_numeric_data() or select_dtypes())

If your DataFrame has mixed data types (e.g., strings, dates), applying conditions like df < 0 directly to the whole DataFrame will raise a TypeError. You should select only the numeric columns first.

import pandas as pd
import numpy as np

df_mixed = pd.DataFrame({
'SensorA_Reading': [-10.5, 0.0, 15.2],
'SensorB_Change': [5, -2, -8],
'Category': ['X', 'Y', 'X']
})

# Option 1: Using the internal _get_numeric_data() (less public API)
# numeric_cols_df = df_mixed._get_numeric_data()
# df_mixed[numeric_cols_df < 0] = 0 # Apply condition only to numeric part

# Option 2: Using select_dtypes (more robust)
numeric_column_names = df_mixed.select_dtypes(include=np.number).columns
df_mixed[numeric_column_names] = df_mixed[numeric_column_names].clip(lower=0) # Using clip as an example

print("Mixed DataFrame after replacing negatives in numeric columns only:")
print(df_mixed)

Output:

Mixed DataFrame after replacing negatives in numeric columns only:
SensorA_Reading SensorB_Change Category
0 0.0 5 X
1 0.0 0 Y
2 15.2 0 X

This ensures that the comparison and replacement operations are only attempted on columns where they make sense.

Replacing Negative Timedelta Values

If you have columns of Timedelta objects, you compare against pd.Timedelta(0).

import pandas as pd

df_timedelta = pd.DataFrame({
'DurationA': pd.to_timedelta([-1, 2, -5], unit='d'), # -1 day, 2 days, -5 days
'DurationB': pd.to_timedelta([10, -3, 0], unit='h') # 10 hours, -3 hours, 0 hours
})
print("Original Timedelta DataFrame:")
print(df_timedelta)
print()

# ✅ Replace negative Timedeltas with zero Timedelta
zero_timedelta = pd.Timedelta(0)
df_timedelta[df_timedelta < zero_timedelta] = zero_timedelta

print("Timedelta DataFrame after replacing negative durations with zero:")
print(df_timedelta)

Output:

Original Timedelta DataFrame:
DurationA DurationB
0 -1 days 0 days 10:00:00
1 2 days -1 days +21:00:00
2 -5 days 0 days 00:00:00

Timedelta DataFrame after replacing negative durations with zero:
DurationA DurationB
0 0 days 0 days 10:00:00
1 2 days 0 days 00:00:00
2 0 days 0 days 00:00:00

Conclusion

Replacing negative numbers with zero in a Pandas DataFrame can be achieved effectively using several methods:

  • Boolean Indexing (df[df < 0] = 0): Concise and Pythonic, especially for DataFrames known to be entirely numeric or after selecting numeric columns.
  • DataFrame.clip(lower=0): Specifically designed for thresholding values, very clear for this task. Works on numeric columns by default.
  • DataFrame.mask(df < 0, 0): Replaces where the condition (value is negative) is True.
  • DataFrame.where(df >= 0, 0): Keeps where the condition (value is non-negative) is True, replaces others.

When dealing with mixed-type DataFrames, first select the numeric columns (e.g., using df.select_dtypes(include=np.number)) before applying these operations to avoid TypeError. For Timedelta objects, compare against pd.Timedelta(0).