Python Pandas: How to Count Non-Missing (non-NaN) Values in Your DataFrame
A fundamental step in data exploration and cleaning with Pandas is understanding the completeness of your dataset. Knowing how many actual data points (non-missing or non-NaN values) exist in each column or row, or even in the entire DataFrame, is crucial for assessing data quality and planning subsequent analytical steps.
This guide provides a comprehensive overview of how to effectively count non-NaN values in Pandas DataFrames. You'll learn to use the DataFrame.count() method for per-column and per-row counts, how to get a total count for the entire DataFrame, how to include empty strings in your definition of "missing," and an alternative approach using DataFrame.notna().sum().
Understanding Non-NaN Values in Pandas
In Pandas, "missing" or "Not a Number" (NaN) values represent data that is absent. When we talk about counting "non-NaN" values, we are essentially counting how many actual, valid data points are present. This count excludes any cells that Pandas recognizes as missing.
Let's create a sample DataFrame with some missing values (None or np.nan):
import pandas as pd
import numpy as np # For np.nan
data = {
'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
product_name category price stock_qty
0 Laptop Electronics 1200.0 10.0
1 Mouse Accessories 25.0 NaN
2 None Electronics NaN 5.0
3 Keyboard None 75.0 0.0
4 None Office NaN 15.0
5 Monitor Electronics 300.0 NaN
Method 1: Using DataFrame.count()
The DataFrame.count() method is the most direct way to count non-NA (Not Available / Not a Number) cells.
Counting Non-NaN Values per Column (Default)
By default, df.count() operates along axis=0, meaning it counts non-NA values for each column.
import pandas as pd
import numpy as np
data = {
'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# Count non-NaN values in each column
column_counts = df.count()
print("Number of non-NaN values per column:")
print(column_counts)
Output:
Number of non-NaN values per column:
product_name 4
category 5
price 4
stock_qty 4
dtype: int64
- product_name has 4 non-missing values (2 None values were excluded).
- category has 5 non-missing values (1 None value was excluded).
- price has 4 non-missing values (2 np.nan values were excluded).
- stock_qty has 4 non-missing values (2 np.nan values were excluded).
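Since each column of a DataFrame is a Series, count() also works on a single column. A minimal sketch, reusing the same df as above:
import pandas as pd
import numpy as np
data = {
'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# Series.count() counts non-NaN values in one column
print(df['product_name'].count())  # 4
print(df['category'].count())      # 5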
Counting Non-NaN Values per Row (axis=1)
To count non-NA values for each row, set the axis parameter to 1 or 'columns'.
import pandas as pd
import numpy as np
data = {
'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# Count non-NaN values in each row
row_counts = df.count(axis=1) # or axis='columns'
print("Number of non-NaN values per row:")
print(row_counts)
Output:
Number of non-NaN values per row:
0 4
1 3
2 2
3 3
4 2
5 3
dtype: int64
For example, the first row (index 0) has 4 non-missing values, while the second row (index 1) has 3 (because stock_qty is NaN).
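Row counts like these are often used to filter out sparse records. As a sketch (not part of the original example), you could keep only rows with at least 3 non-missing values, either with a boolean mask on the row counts or with the built-in dropna(thresh=...):
# Keep rows that have at least 3 non-NaN values
mask = df.count(axis=1) >= 3
filtered = df[mask]
# Equivalent shortcut: dropna with a minimum count of non-NA values
filtered_alt = df.dropna(thresh=3)
print(filtered)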
What Pandas Considers an "NA" Value
The count() method (and other NA-related functions like isnull() and notna()) considers the following as NA values by default:
- None (Python's null object)
- numpy.nan (IEEE 754 floating-point representation of Not a Number)
- pandas.NaT (Not a Time, for datetime-like missing values)
- Optionally, numpy.inf and numpy.NINF can be treated as NA if pd.options.mode.use_inf_as_na is set to True (it's False by default; note that this option is deprecated in recent pandas versions).
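You can check any scalar against these rules with pd.isna(), which mirrors what count() excludes. A quick sketch:
import pandas as pd
import numpy as np
# pd.isna() is True exactly for the values count() excludes
print(pd.isna(None))    # True
print(pd.isna(np.nan))  # True
print(pd.isna(pd.NaT))  # True
print(pd.isna(np.inf))  # False (infinity is not NA by default)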
Method 2: Getting the Total Count of Non-NaN Values in the DataFrame
If you need the grand total of all non-missing cells in the entire DataFrame, you can sum the result of df.count().
import pandas as pd
import numpy as np
data = {
'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
column_counts = df.count()
# Total number of non-NaN values in the entire DataFrame
total_non_nan_values = column_counts.sum()
# Alternatively, with NumPy imported:
# total_non_nan_values = np.sum(df.count())  # same result
print(f"Total number of non-NaN values in the DataFrame: {total_non_nan_values}")
Output:
Total number of non-NaN values in the DataFrame: 17
This sums the non-NaN counts from each column (4 + 5 + 4 + 4 = 17).
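An equivalent way to get the same grand total, shown here as a sketch, is to subtract the number of missing cells from the DataFrame's total cell count:
# df.size is the total number of cells (6 rows x 4 columns = 24);
# subtracting the number of NA cells leaves the non-NaN count
total_alt = df.size - df.isna().sum().sum()
print(total_alt)  # 17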
Method 3: Counting Empty Strings as Missing Values
By default, df.count() considers empty strings ('') as valid, non-missing values. If you want to treat empty strings as if they were NaN for counting purposes, you first need to replace them with np.nan.
import pandas as pd
import numpy as np
df_with_empty_strings = pd.DataFrame({
'col_A': ['Apple', '', 'Banana', None],
'col_B': [10, 20, 30, '']
})
print("DataFrame with empty strings:")
print(df_with_empty_strings)
print()
print("Counts (empty strings are NOT NaN by default):")
print(df_with_empty_strings.count())
print()
# ✅ Replace empty strings with np.nan, then count
counts_empty_as_nan = df_with_empty_strings.replace('', np.nan).count()
print("Counts (treating empty strings as NaN):")
print(counts_empty_as_nan)
Output:
DataFrame with empty strings:
col_A col_B
0 Apple 10
1 20
2 Banana 30
3 None
Counts (empty strings are NOT NaN by default):
col_A 3
col_B 4
dtype: int64
Counts (treating empty strings as NaN):
col_A 2
col_B 3
dtype: int64
The df.replace('', np.nan) step temporarily converts empty strings to NaN before the count is performed.
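If your data may also contain whitespace-only strings (for example '  '), a regex-based replacement catches those as well. This is a sketch; adapt the pattern to your data:
# Treat empty and whitespace-only strings as missing before counting
counts_blank_as_nan = df_with_empty_strings.replace(r'^\s*$', np.nan, regex=True).count()
print(counts_blank_as_nan)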
Method 4: Using DataFrame.notna().sum() (and notnull().sum())
An alternative way to count non-missing values is to first create a boolean DataFrame indicating non-missing values using DataFrame.notna(), and then sum these boolean values (where True counts as 1 and False as 0).
import pandas as pd
import numpy as np
data = {
'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
# df.notna() returns a boolean DataFrame (True where not NA, False where NA)
print("Output of df.notna():")
print(df.notna())
print()
# Summing the True values per column gives the count of non-NA values
column_counts_notna = df.notna().sum()
print("Number of non-NaN values per column (using .notna().sum()):")
print(column_counts_notna)
Output:
Output of df.notna():
product_name category price stock_qty
0 True True True True
1 True True True False
2 False True False True
3 True False True True
4 False True False True
5 True True True False
Number of non-NaN values per column (using .notna().sum()):
product_name 4
category 5
price 4
stock_qty 4
dtype: int64
- This produces the same result as df.count().
- The method DataFrame.notnull() is an alias for DataFrame.notna(), so df.notnull().sum() will also yield the same result.
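The same pattern extends naturally to per-row and grand-total counts. As a sketch:
# Per-row counts, mirroring df.count(axis=1)
print(df.notna().sum(axis=1))
# Grand total of non-NA cells, mirroring df.count().sum()
print(df.notna().sum().sum())  # 17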
Conclusion
Counting non-NaN values is essential for understanding data completeness in Pandas.
- df.count() is the primary method for getting per-column (default) or per-row (axis=1) counts of non-missing values.
- To get a total count of non-missing values for the entire DataFrame, sum the results of df.count().
- If empty strings ('') should be treated as missing, replace them with np.nan before counting: df.replace('', np.nan).count().
- df.notna().sum() (or df.notnull().sum()) offers an alternative, often equally clear, way to achieve the same per-column counts.
These methods provide you with the flexibility to accurately assess the presence of valid data points in your DataFrames, guiding your data cleaning and analysis efforts.