Python Pandas: How to Convert Entire DataFrame to Numeric (int or float)
When working with data in Pandas, columns that should be numeric are sometimes loaded as strings (object dtype), especially if the source data (like a CSV) contains mixed types or non-standard numeric representations. Converting these columns, or even an entire DataFrame where applicable, to appropriate numeric types (integer or float) is essential for performing calculations, statistical analysis, and plotting.
This guide explains how to convert all convertible columns in a Pandas DataFrame to numeric types using DataFrame.apply() with pd.to_numeric(), and how to handle non-convertible values.
The Goal: Numeric Conversion for the Entire DataFrame
Given a Pandas DataFrame where multiple columns contain data that should be numeric (integers or floats) but are currently stored as strings (object dtype), we want to convert all such columns to their appropriate numeric types efficiently.
Example DataFrame with String Numerics
import pandas as pd
data = {
'RecordID_Str': ['101', '102', '103', '104'],
'Quantity_Str': ['5', '12', '8', '20'],
'Price_Str': ['19.99', '5.50', '120.00', '0.99'],
'Category_NonNumeric': ['A', 'B', 'A', 'C']
}
df_original = pd.DataFrame(data)
print("Original DataFrame:")
print(df_original)
print()
print("Original dtypes:")
print(df_original.dtypes)
Output:
Original DataFrame:
RecordID_Str Quantity_Str Price_Str Category_NonNumeric
0 101 5 19.99 A
1 102 12 5.50 B
2 103 8 120.00 A
3 104 20 0.99 C
Original dtypes:
RecordID_Str object
Quantity_Str object
Price_Str object
Category_NonNumeric object
dtype: object
All columns here are initially of object dtype because their values are strings.
Method: Using DataFrame.apply(pd.to_numeric)
The DataFrame.apply(func) method applies a function func along an axis of the DataFrame. When applied column-wise (the default, axis=0), it passes each column (as a Series) to the function. pd.to_numeric() is designed to convert an array-like or Series to a numeric type.
Basic Conversion (Assumes All Columns Are Convertible)
If all columns in your DataFrame contain string representations of numbers, you can directly apply pd.to_numeric.
import pandas as pd
# Example with only convertible string columns
df_all_convertible = pd.DataFrame({
'Col_Int_Str': ['10', '20', '30'],
'Col_Float_Str': ['1.1', '2.2', '3.3']
})
print("DataFrame with all convertible string columns (before):")
print(df_all_convertible.dtypes)
print()
# ✅ Apply pd.to_numeric to each column
df_all_numeric = df_all_convertible.apply(pd.to_numeric)
print("DataFrame dtypes after apply(pd.to_numeric):")
print(df_all_numeric.dtypes)
Output:
DataFrame with all convertible string columns (before):
Col_Int_Str object
Col_Float_Str object
dtype: object
DataFrame dtypes after apply(pd.to_numeric):
Col_Int_Str int64
Col_Float_Str float64
dtype: object
pd.to_numeric infers whether to convert to int64 or float64 based on the content (e.g., presence of a decimal point).
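The inference applies to the column as a whole, not to each value. As a small sketch (the variable names here are illustrative only), a single decimal string is enough to make the entire result float64:

```python
import pandas as pd

# All values are whole numbers, so the result is an integer column.
ints = pd.to_numeric(pd.Series(['1', '2', '3']))

# One value has a decimal point, so every value is stored as a float.
mixed = pd.to_numeric(pd.Series(['1', '2.5', '3']))

print(ints.dtype)   # int64
print(mixed.dtype)  # float64
```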
Verifying Data Types (.dtypes or .info())
After conversion, check the dtypes attribute or use df.info() to confirm.
import pandas as pd
df_all_convertible = pd.DataFrame({
'Col_Int_Str': ['10', '20', '30'],
'Col_Float_Str': ['1.1', '2.2', '3.3']
})
df_all_numeric = df_all_convertible.apply(pd.to_numeric)
print("--- Using .info() to verify ---")
df_all_numeric.info()
Output:
--- Using .info() to verify ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Col_Int_Str 3 non-null int64
1 Col_Float_Str 3 non-null float64
dtypes: float64(1), int64(1)
memory usage: 120.0 bytes
Handling Non-Numeric Columns or Values (The errors Parameter)
If your DataFrame contains columns that are genuinely non-numeric (like 'Category_NonNumeric' in our main example) or individual string values within a column that cannot be converted to a number (e.g., "Unknown"), applying pd.to_numeric directly will raise a ValueError. The errors parameter in pd.to_numeric controls this behavior.
Default Behavior: errors='raise' (Raises ValueError)
By default, if pd.to_numeric encounters a value it cannot parse, it raises an exception.
import pandas as pd
data = {
'RecordID_Str': ['101', '102', '103', '104'],
'Quantity_Str': ['5', '12', '8', '20'],
'Price_Str': ['19.99', '5.50', '120.00', '0.99'],
'Category_NonNumeric': ['A', 'B', 'A', 'C']
}
df_original = pd.DataFrame(data)
try:
    # 'Category_NonNumeric' holds values such as "A" that cannot be parsed
    df_error_attempt = df_original.apply(pd.to_numeric)
except ValueError as e:
    print(f"ValueError: {e}")
# The message reads like: Unable to parse string "A" at position 0
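Solution: errors='ignore' (Keeps Non-Convertible Columns Unchanged)
With errors='ignore', pd.to_numeric converts every column whose values all parse and returns any column containing an unparseable value completely unchanged, so that column keeps its original object dtype. The check is made per column, not per value: a single bad string leaves the whole column as strings. Note that errors='ignore' is deprecated as of pandas 2.2, where it emits a FutureWarning; the recommended replacement is to catch the exception explicitly.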
import pandas as pd
# Two-row sample: three numeric-looking string columns and one text column
df_original_copy = pd.DataFrame({
'RecordID_Str': ['101', '102'], 'Quantity_Str': ['5', '12'],
'Price_Str': ['19.99', '5.50'], 'Category_NonNumeric': ['A', 'B']
})
# ✅ Apply pd.to_numeric with errors='ignore'
# The 'errors' parameter is passed to to_numeric for each column.
df_ignore_errors = df_original_copy.apply(pd.to_numeric, errors='ignore')
print("DataFrame dtypes after apply(pd.to_numeric, errors='ignore'):")
print(df_ignore_errors.dtypes)
print()
print("DataFrame content (errors='ignore'):")
print(df_ignore_errors)
Output:
DataFrame dtypes after apply(pd.to_numeric, errors='ignore'):
RecordID_Str int64
Quantity_Str int64
Price_Str float64
Category_NonNumeric object
dtype: object
DataFrame content (errors='ignore'):
RecordID_Str Quantity_Str Price_Str Category_NonNumeric
0 101 5 19.99 A
1 102 12 5.50 B
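On pandas versions where errors='ignore' warns or is unavailable, a try/except wrapper around the default errors='raise' gives the same column-level behavior. The following is a minimal sketch; to_numeric_or_keep is a hypothetical helper name introduced for this example, not part of pandas.

```python
import pandas as pd

def to_numeric_or_keep(column: pd.Series) -> pd.Series:
    """Hypothetical helper: convert a column to numeric, or return it unchanged."""
    try:
        return pd.to_numeric(column)  # default errors='raise'
    except (ValueError, TypeError):
        # At least one value could not be parsed, so keep the whole column as-is.
        return column

df = pd.DataFrame({
    'RecordID_Str': ['101', '102'],
    'Price_Str': ['19.99', '5.50'],
    'Category_NonNumeric': ['A', 'B'],
})

# apply() passes each column to the helper, one Series at a time.
df_converted = df.apply(to_numeric_or_keep)
print(df_converted.dtypes)
```

The numeric-looking columns come back as integer and float columns, while 'Category_NonNumeric' keeps its original string values.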
Solution: errors='coerce' (Converts Non-Convertible to NaN)
If errors='coerce', pd.to_numeric will convert the values it can and replace those it cannot parse with NaN (Not a Number). Columns containing NaN (alongside successfully converted numbers) will become float type.
import pandas as pd
# Sample DataFrame with some columns having non-numeric data
df_original_copy = pd.DataFrame({
'RecordID_Str': ['101', '102'],
'Quantity_Str': ['5', '12'],
'Price_Str': ['19.99', '5.50'],
'Category_NonNumeric': ['A', 'B'],
'Mixed_Numeric': ['100', 'Error'] # Column with some non-numeric strings
})
# ✅ Apply pd.to_numeric with errors='coerce'
df_coerce_errors = df_original_copy.apply(pd.to_numeric, errors='coerce')
# Display the data types after coercion
print("DataFrame dtypes after apply(pd.to_numeric, errors='coerce'):")
print(df_coerce_errors.dtypes)
print()
# Display the DataFrame content after coercion
print("DataFrame content (errors='coerce'):")
print(df_coerce_errors)
Output:
DataFrame dtypes after apply(pd.to_numeric, errors='coerce'):
RecordID_Str int64
Quantity_Str int64
Price_Str float64
Category_NonNumeric float64
Mixed_Numeric float64
dtype: object
DataFrame content (errors='coerce'):
RecordID_Str Quantity_Str Price_Str Category_NonNumeric Mixed_Numeric
0 101 5 19.99 NaN 100.0
1 102 12 5.50 NaN NaN
Notice that the 'Category_NonNumeric' column, which contained only non-numeric strings, becomes all NaN and thus float64 dtype.
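Because coercion silently discards text, it can help to measure how much each column lost before accepting the result. The sketch below counts values that were present before coercion but are NaN afterwards, then keeps the coerced version only for columns where at least one value parsed. That selection rule is an illustrative assumption, not a pandas default, and the variable names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Price_Str': ['19.99', '5.50'],
    'Category_NonNumeric': ['A', 'B'],
    'Mixed_Numeric': ['100', 'Error'],
})

coerced = df.apply(pd.to_numeric, errors='coerce')

# Count values that existed in the original but became NaN after coercion.
newly_missing = (coerced.isna() & df.notna()).sum()
print(newly_missing)  # Price_Str: 0, Category_NonNumeric: 2, Mixed_Numeric: 1

# Illustrative rule: adopt the coerced column only if something parsed.
result = df.copy()
for name in coerced.columns:
    if coerced[name].notna().any():
        result[name] = coerced[name]

print(result.dtypes)
```

With this rule, 'Category_NonNumeric' keeps its original labels instead of turning into a column of NaN.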
Using functools.partial with apply (Alternative for passing errors)
If the df.apply(pd.to_numeric, errors='...') syntax feels indirect for passing keyword arguments to pd.to_numeric, you can use functools.partial to create a new function with the errors argument pre-set.
from functools import partial
import pandas as pd
# Sample DataFrame with string data in the 'Price_Str' column
df_original_copy = pd.DataFrame({
'Category_NonNumeric': ['A', 'B'],
'Price_Str': ['19.99', '5.50']
})
# Create a partial function with errors='ignore' pre-set
to_numeric_ignore_errors = partial(pd.to_numeric, errors='ignore')
# Apply the partial function across the DataFrame
df_partial_example = df_original_copy.apply(to_numeric_ignore_errors)
# Display the data types after using functools.partial with errors='ignore'
print("DataFrame dtypes (using functools.partial with errors='ignore'):")
print(df_partial_example.dtypes)
Output:
DataFrame dtypes (using functools.partial with errors='ignore'):
Category_NonNumeric object
Price_Str float64
dtype: object
This achieves the same as passing errors='ignore' directly to apply.
Choosing the Right errors Strategy
- errors='raise' (default): Use if you expect all data to be cleanly numeric. Any non-numeric value will halt the process, forcing you to identify and fix data quality issues.
- errors='ignore': Use if you want to convert what you can and leave problematic columns as they are (often object dtype). You'll need subsequent steps if you want to further process or clean these ignored columns.
- errors='coerce': Often the most practical for data cleaning. It converts valid numerics and flags unconvertible entries as NaN, which can then be easily counted, imputed, or dropped using standard Pandas methods (.isnull().sum(), .fillna(), .dropna()).
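The follow-up steps mentioned for errors='coerce' might look like the sketch below. The sample values and the choice of 0 as a fill value are assumptions made for illustration; a real dataset may call for a different imputation.

```python
import pandas as pd

df = pd.DataFrame({
    'Quantity_Str': ['5', 'Unknown', '8'],
    'Price_Str': ['19.99', '5.50', 'n/a'],
})

numeric = df.apply(pd.to_numeric, errors='coerce')

# Count the NaN values that coercion produced in each column.
missing_per_column = numeric.isnull().sum()
print(missing_per_column)

# Option 1: impute a placeholder value (0 is an arbitrary choice here).
filled = numeric.fillna(0)

# Option 2: drop every row that contains at least one NaN.
complete_rows = numeric.dropna()
print(complete_rows)
```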
Conclusion
To convert all applicable columns in a Pandas DataFrame to numeric types (integer or float):
- Use df_numeric = df.apply(pd.to_numeric, errors=...).
- The errors parameter is crucial for handling columns or individual values that cannot be converted:
  - errors='raise' (default): Stops on error.
  - errors='ignore': Leaves non-convertible columns as is (often object type).
  - errors='coerce': Converts non-convertible data to NaN, allowing numeric operations on the rest. This is often the most useful option for data cleaning workflows.
By applying pd.to_numeric across the DataFrame with an appropriate error handling strategy, you can efficiently ensure your data is in the correct numeric format for further analysis.