Python Pandas: Remove Rows with Non-Numeric Values in a Column

Data cleaning often involves ensuring that columns expected to contain numerical data actually do. Non-numeric entries (like strings or error codes mixed with numbers) can cause issues during calculations, plotting, or model training.

This guide explains how to identify and remove rows from a Pandas DataFrame where a specific column contains non-numeric values, primarily using pd.to_numeric() and other type-checking methods.

The Goal: Ensuring Numeric Data in a Column

Given a Pandas DataFrame, we want to examine a specific column that should contain numbers. If any row in this column has a value that cannot be interpreted as a number (e.g., it's a string like "N/A", "Error", or an accidental text entry), we want to remove that entire row from the DataFrame.

Example DataFrame with Mixed Data Types

import pandas as pd
import numpy as np # For np.nan and numeric types

data = {
    'RecordID': [1, 2, 3, 4, 5, 6, 7],
    'SensorValue': [10.5, 'Error', 12.3, -5.0, '15', np.nan, 'N/A'],
    'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
    'Notes': ['OK', 'Sensor fault', 'OK', 'Low reading', 'OK', 'Missing', 'Not applicable']
}

df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
print()

print("Original dtypes:")
print(df_original.dtypes)

Output:

Original DataFrame:
   RecordID SensorValue Category           Notes
0         1        10.5        A              OK
1         2       Error        B    Sensor fault
2         3        12.3        A              OK
3         4        -5.0        C     Low reading
4         5          15        B              OK
5         6         NaN        A         Missing
6         7         N/A        C  Not applicable

Original dtypes:
RecordID        int64
SensorValue    object
Category       object
Notes          object
dtype: object

Our goal is to filter df_original to keep only rows where 'SensorValue' is (or can be converted to) a valid number.

Method 1: Using pd.to_numeric() with errors='coerce' (Recommended)

This is the most robust and idiomatic Pandas way to handle this.

  1. Attempt to convert the target column to a numeric type using pd.to_numeric().
  2. Set errors='coerce'. This replaces any value that cannot be converted to a number with NaN (Not a Number).
  3. Use Series.notnull() (alias Series.notna()) on the converted column to create a boolean mask: notnull() is True for numbers and False for NaN. (Series.isna() produces the inverse mask.)
  4. Use this mask to filter the original DataFrame.

How It Works

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'RecordID': [1, 2, 3, 4, 5, 6, 7],
    'SensorValue': [10.5, 'Error', 12.3, -5.0, '15', np.nan, 'N/A'],
})

# Step 1 & 2: Convert to numeric, coercing errors to NaN
numeric_sensor_values = pd.to_numeric(df['SensorValue'], errors='coerce')
print("'SensorValue' after pd.to_numeric(errors='coerce'):")
print(numeric_sensor_values)
print()

# Step 3: Create boolean mask (True for non-NaN/numeric values)
is_numeric_mask = numeric_sensor_values.notnull()
print("Boolean mask (is_numeric_mask):")
print(is_numeric_mask)

Output:

'SensorValue' after pd.to_numeric(errors='coerce'):
0    10.5
1     NaN
2    12.3
3    -5.0
4    15.0
5     NaN
6     NaN
Name: SensorValue, dtype: float64

Boolean mask (is_numeric_mask):
0     True
1    False
2     True
3     True
4     True
5    False
6    False
Name: SensorValue, dtype: bool

Implementation

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'RecordID': [1, 2, 3, 4, 5, 6, 7],
    'SensorValue': [10.5, 'Error', 12.3, -5.0, '15', np.nan, 'N/A'],
    'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
})

# ✅ Filter rows where 'SensorValue' can be converted to numeric
df_numeric_rows = df[pd.to_numeric(df['SensorValue'], errors='coerce').notnull()]

print("DataFrame with non-numeric rows in 'SensorValue' removed:")
print(df_numeric_rows)

Output:

DataFrame with non-numeric rows in 'SensorValue' removed:
   RecordID SensorValue Category
0         1        10.5        A
2         3        12.3        A
3         4        -5.0        C
4         5          15        B

This method keeps rows where 'SensorValue' was originally a number or a string representing a number. Strings like "Error" or "N/A" become NaN under errors='coerce' and are then filtered out by notnull(). Note that rows whose original value was already NaN are removed as well, since NaN passes through the conversion unchanged and fails the notnull() check.
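One practical follow-up: even after filtering, the 'SensorValue' column still has object dtype (it retains the original mixed values, including the string '15'). A common pattern is to coerce once, then reuse the result both for the mask and to store the converted values, so the cleaned column ends up as float64. A minimal sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'RecordID': [1, 2, 3, 4, 5, 6, 7],
    'SensorValue': [10.5, 'Error', 12.3, -5.0, '15', np.nan, 'N/A'],
})

# Coerce once; reuse for both the row filter and the cleaned column
coerced = pd.to_numeric(df['SensorValue'], errors='coerce')
mask = coerced.notnull()

# Keep valid rows and replace the column with the converted values,
# so 'SensorValue' becomes float64 instead of object
df_clean = df[mask].copy()
df_clean['SensorValue'] = coerced[mask]

print(df_clean.dtypes)
```

Computing `coerced` once also avoids running the conversion twice (once for the mask, once for the assignment).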

Method 2: Using Series.apply() with isinstance()

You can apply a custom function to each element in the column to check if its type is one of the numeric types.

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'RecordID': [1, 2, 3, 4, 5, 6, 7],
    'SensorValue': [10.5, 'Error', 12.3, -5.0, '15', np.nan, 'N/A'],  # '15' is a string
    'Category': ['A', 'B', 'A', 'C', 'B', 'A', 'C'],
})

# Define numeric types to check against (Python's and NumPy's).
# np.number alone also works: it covers all NumPy numeric scalar types
# (np.int64, np.float64, etc.).
numeric_types = (int, float, np.integer, np.floating)

# ✅ Create a boolean mask by checking the type of each element
is_instance_numeric_mask = df['SensorValue'].apply(lambda x: isinstance(x, numeric_types))
print("Boolean mask from apply(isinstance):")
print(is_instance_numeric_mask)
print()

df_numeric_apply = df[is_instance_numeric_mask]

print("DataFrame filtered using apply() and isinstance():")
print(df_numeric_apply)

Output:

Boolean mask from apply(isinstance):
0     True
1    False
2     True
3     True
4    False
5     True
6    False
Name: SensorValue, dtype: bool

DataFrame filtered using apply() and isinstance():
   RecordID SensorValue Category
0         1        10.5        A
2         3        12.3        A
3         4        -5.0        C
5         6         NaN        A
  • This method checks the current type of the elements. It won't identify strings like "15" as numeric unless they are first converted.
  • np.nan is a float, so isinstance(np.nan, float) is True. If you want to exclude NaN values as well, add pd.notna(x) to the check in the lambda. (Avoid identity tests like x is not np.nan: not every NaN value is that particular object.)
  • The pd.to_numeric approach (Method 1) is generally more robust for identifying values that can be numbers, even if stored as strings.
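To make the NaN caveat concrete, here is the same apply() check extended with pd.notna(x), so NaN rows are excluded along with non-numeric strings:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'SensorValue': [10.5, 'Error', 12.3, -5.0, '15', np.nan, 'N/A'],
})

# isinstance alone would keep np.nan (it is a float);
# pd.notna(x) filters it out as well
mask = df['SensorValue'].apply(
    lambda x: isinstance(x, (int, float, np.number)) and pd.notna(x)
)
print(mask.tolist())  # [True, False, True, True, False, False, False]
```

Note that the string '15' is still rejected, because apply() with isinstance() never attempts a conversion.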

Method 3: Using Series.str.isnumeric() (For Integer Strings Only)

If you expect your "numeric" values to be non-negative integer strings (e.g., "123", "0", "9999"), and want to filter out anything else (including floats like "3.14", negative numbers "-10", or non-digit strings), you can use Series.str.isnumeric(). This requires converting the column to string type first if it isn't already.

import pandas as pd

df_isnumeric_check = pd.DataFrame({
    'RecordID': [1, 2, 3, 4, 5],
    'SensorValue': ['10', 'Error', '12.3', '-5', '150'],  # '12.3' and '-5' are not isnumeric()
    'Category': ['A', 'B', 'A', 'C', 'B']
})

# Convert the column to string type first so the .str accessor is available.
# .astype(str) turns NaN into the literal string 'nan', which isnumeric()
# correctly rejects, so no special NaN handling is needed here.
is_strictly_numeric_string_mask = df_isnumeric_check['SensorValue'].astype(str).str.isnumeric()
print("Boolean mask from .str.isnumeric():")
print(is_strictly_numeric_string_mask)
print()

df_filtered_isnumeric = df_isnumeric_check[is_strictly_numeric_string_mask]

print("DataFrame filtered using .str.isnumeric():")
print(df_filtered_isnumeric)

Output:

Boolean mask from .str.isnumeric():
0     True
1    False
2    False
3    False
4     True
Name: SensorValue, dtype: bool

DataFrame filtered using .str.isnumeric():
  RecordID SensorValue Category
0        1          10        A
4        5         150        B
  • str.isnumeric(): Returns True if all characters in the string are numeric characters and there is at least one character. It returns False for floats (due to .), negative numbers (due to -), and empty strings.
  • This method is very restrictive and usually only suitable if you specifically want to identify non-negative integer strings.
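The three string checks (isdecimal, isdigit, isnumeric) differ mainly on exotic Unicode characters; comparing them side by side on a few sample values makes the hierarchy visible (isdecimal is strictest, isnumeric most permissive):

```python
# Compare Python's three numeric string checks on sample values
samples = ['42', '³', '½', '-5', '3.14']
for s in samples:
    print(s, s.isdecimal(), s.isdigit(), s.isnumeric())
```

For example, the superscript '³' passes isdigit() and isnumeric() but not isdecimal(), the fraction '½' passes only isnumeric(), and '-5' and '3.14' fail all three.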

Important Considerations (e.g., Floats, Negative Numbers)

  • pd.to_numeric(..., errors='coerce'): Most flexible. Handles integers, floats, and strings representing numbers. Converts non-convertible to NaN.
  • isinstance(x, (int, float, np.number)): Checks current type. Doesn't convert strings like "123" to numeric.
  • str.isnumeric(): Only for non-negative integer strings. str.isdigit() is similar. str.isdecimal() is even more restrictive (only Unicode decimal digits).
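If you want a string-based check that, unlike str.isnumeric(), accepts negatives and decimals, a regex via Series.str.match is one option. The pattern below is an assumption for simple signed decimals; it does not accept scientific notation like '1e3' or thousands separators:

```python
import pandas as pd

s = pd.Series(['10', '-5', '12.3', 'Error', '1e3'])

# Optional leading minus, digits, optional fractional part
mask = s.str.match(r'^-?\d+(?:\.\d+)?$')
print(mask.tolist())  # [True, True, True, False, False]
```

For most cleaning tasks, though, pd.to_numeric(errors='coerce') remains simpler and handles these cases automatically.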

Conclusion

To remove rows from a Pandas DataFrame where a specific column contains non-numeric values:

  1. The most robust and generally recommended method is to use pd.to_numeric(df['your_column'], errors='coerce').notnull(). This attempts to convert values to a numeric type, turns unconvertible values into NaN, and then allows you to filter out these NaNs (which now represent the original non-numeric entries).
    df_cleaned = df[pd.to_numeric(df['your_column'], errors='coerce').notnull()]
  2. Using df['your_column'].apply(lambda x: isinstance(x, (int, float, np.number))) can check existing types but won't convert numeric strings.
  3. df['your_column'].astype(str).str.isnumeric() is only suitable if you are specifically looking for non-negative integer strings and want to exclude floats, negatives, and other non-digit characters.

Always inspect your data and choose the method that correctly identifies what you consider "non-numeric" for your specific use case. pd.to_numeric with errors='coerce' is often the best starting point.