Python Pandas: How to Count Non-Missing (non-NaN) Values in Your DataFrame

A fundamental step in data exploration and cleaning with Pandas is understanding the completeness of your dataset. Knowing how many actual data points (non-missing or non-NaN values) exist in each column or row, or even in the entire DataFrame, is crucial for assessing data quality and planning subsequent analytical steps.

This guide will provide a comprehensive overview of how to effectively count non-NaN values in Pandas DataFrames. You'll learn to use the DataFrame.count() method for per-column and per-row counts, how to get a total count for the entire DataFrame, how to include empty strings in your definition of "missing," and an alternative approach using DataFrame.notna().sum().

Understanding Non-NaN Values in Pandas

In Pandas, "missing" or "Not a Number" (NaN) values represent data that is absent. When we talk about counting "non-NaN" values, we are essentially counting how many actual, valid data points are present. This count excludes any cells that Pandas recognizes as missing.

Let's create a sample DataFrame with some missing values (None or np.nan):

import pandas as pd
import numpy as np # For np.nan

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
  product_name     category   price  stock_qty
0       Laptop  Electronics  1200.0       10.0
1        Mouse  Accessories    25.0        NaN
2         None  Electronics     NaN        5.0
3     Keyboard         None    75.0        0.0
4         None       Office     NaN       15.0
5      Monitor  Electronics   300.0        NaN

Method 1: Using DataFrame.count()

The DataFrame.count() method is the most direct way to count non-NA (Not Available/Not a Number) cells.

Counting Non-NaN Values per Column (Default)

By default, df.count() operates along axis=0, meaning it counts non-NA values for each column.

import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# Count non-NaN values in each column
column_counts = df.count()

print("Number of non-NaN values per column:")
print(column_counts)

Output:

Number of non-NaN values per column:
product_name    4
category        5
price           4
stock_qty       4
dtype: int64
note
  • product_name has 4 non-missing values (2 None values were excluded).
  • category has 5 non-missing values (1 None value was excluded).
  • price has 4 non-missing values (2 np.nan values were excluded).
  • stock_qty has 4 non-missing values (2 np.nan values were excluded).

Counting Non-NaN Values per Row (axis=1)

To count non-NA values for each row, set the axis parameter to 1 or 'columns'.

import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# Count non-NaN values in each row
row_counts = df.count(axis=1) # or axis='columns'

print("Number of non-NaN values per row:")
print(row_counts)

Output:

Number of non-NaN values per row:
0    4
1    3
2    2
3    3
4    2
5    3
dtype: int64
note

For example, the first row (index 0) has 4 non-missing values, while the second row (index 1) has 3 (because stock_qty is NaN).
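As a practical aside (an extension beyond the example above), per-row counts are handy for filtering out rows that are too incomplete to use. The sketch below keeps only rows with at least 3 valid values:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
})

# Keep rows that have at least 3 non-NaN values (drops rows with index 2 and 4)
complete_enough = df[df.count(axis=1) >= 3]
print(complete_enough)
```

The same result can be obtained with the built-in df.dropna(thresh=3), which keeps rows containing at least `thresh` non-NA values.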

What Pandas Considers an "NA" Value

The count() method (and other NA-related functions like isnull(), notna()) considers the following as NA values by default:

  • None (Python's null object)
  • numpy.nan (IEEE 754 floating point representation of Not a Number)
  • pandas.NaT (Not a Time, for datetime-like missing values)
  • Optionally, numpy.inf and -numpy.inf can be treated as NA if pd.options.mode.use_inf_as_na is set to True (it is False by default; note this option is deprecated in recent pandas versions).
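A quick way to confirm this behavior is to pass each marker to pd.isna(), which returns True exactly where count() would exclude a value:

```python
import pandas as pd
import numpy as np

# Each default NA marker is flagged as missing
print(pd.isna(None))    # True
print(pd.isna(np.nan))  # True
print(pd.isna(pd.NaT))  # True

# Infinity is NOT treated as missing by default
print(pd.isna(np.inf))  # False
```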

Method 2: Getting the Total Count of Non-NaN Values in the DataFrame

If you need the grand total of all non-missing cells in the entire DataFrame, you can sum the result of df.count().

import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)
column_counts = df.count()

# Total number of non-NaN values in the entire DataFrame
total_non_nan_values = column_counts.sum()
# Alternatively, and more directly if you have NumPy imported:
# total_non_nan_values = np.sum(df.count()) # This also works

print(f"Total number of non-NaN values in the DataFrame: {total_non_nan_values}")

Output:

Total number of non-NaN values in the DataFrame: 17
note

This sums the non-NaN counts from each column (4 + 5 + 4 + 4 = 17).
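As a cross-check (not part of the original walkthrough), the same grand total can be derived by subtracting the number of missing cells from the total cell count:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
})

# df.size is the total number of cells (6 rows x 4 columns = 24);
# df.isna().sum().sum() is the total number of missing cells (7)
total = df.size - df.isna().sum().sum()
print(total)  # 17
```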

Method 3: Counting Empty Strings as Missing Values

By default, df.count() considers empty strings ('') as valid, non-missing values. If you want to treat empty strings as if they were NaN for counting purposes, you first need to replace them with np.nan.

import pandas as pd
import numpy as np

df_with_empty_strings = pd.DataFrame({
    'col_A': ['Apple', '', 'Banana', None],
    'col_B': [10, 20, 30, '']
})

print("DataFrame with empty strings:")
print(df_with_empty_strings)
print()

print("Counts (empty strings are NOT NaN by default):")
print(df_with_empty_strings.count())
print()

# ✅ Replace empty strings with np.nan, then count
counts_empty_as_nan = df_with_empty_strings.replace('', np.nan).count()
print("Counts (treating empty strings as NaN):")
print(counts_empty_as_nan)

Output:

DataFrame with empty strings:
    col_A col_B
0   Apple    10
1            20
2  Banana    30
3    None

Counts (empty strings are NOT NaN by default):
col_A    3
col_B    4
dtype: int64

Counts (treating empty strings as NaN):
col_A    2
col_B    3
dtype: int64
note

The df.replace('', np.nan) step temporarily converts empty strings to NaN before the count is performed.
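As a further extension (an assumption beyond the example above), strings containing only whitespace can be normalized the same way by using a regex replacement instead of an exact-match one:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'col_A': ['Apple', '   ', 'Banana', '']})

# Treat empty AND whitespace-only strings as missing before counting
counts = df.replace(r'^\s*$', np.nan, regex=True).count()
print(counts)  # col_A: 2
```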

Method 4: Using DataFrame.notna().sum() (and notnull().sum())

An alternative way to count non-missing values is to first create a boolean DataFrame indicating non-missing values using DataFrame.notna() and then sum these boolean values (where True counts as 1 and False as 0).

import pandas as pd
import numpy as np

data = {
    'product_name': ['Laptop', 'Mouse', None, 'Keyboard', None, 'Monitor'],
    'category': ['Electronics', 'Accessories', 'Electronics', None, 'Office', 'Electronics'],
    'price': [1200.00, 25.00, np.nan, 75.00, np.nan, 300.00],
    'stock_qty': [10, np.nan, 5, 0, 15, np.nan]
}
df = pd.DataFrame(data)

# df.notna() returns a boolean DataFrame (True where not NA, False where NA)
print("Output of df.notna():")
print(df.notna())
print()

# Summing the True values per column gives the count of non-NA values
column_counts_notna = df.notna().sum()
print("Number of non-NaN values per column (using .notna().sum()):")
print(column_counts_notna)

Output:

Output of df.notna():
   product_name  category  price  stock_qty
0          True      True   True       True
1          True      True   True      False
2         False      True  False       True
3          True     False   True       True
4         False      True  False       True
5          True      True   True      False

Number of non-NaN values per column (using .notna().sum()):
product_name    4
category        5
price           4
stock_qty       4
dtype: int64
note
  • This produces the same result as df.count().
  • The method DataFrame.notnull() is an alias for DataFrame.notna(), so df.notnull().sum() will also yield the same result.
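As a quick sanity check (a small illustrative example, not part of the original walkthrough), all three spellings can be verified to agree column by column:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'price': [1200.00, np.nan, 75.00],
    'stock_qty': [10, 5, np.nan]
})

# count(), notna().sum(), and notnull().sum() all produce identical Series
assert (df.count() == df.notna().sum()).all()
assert (df.count() == df.notnull().sum()).all()
print(df.count())
```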

Conclusion

Counting non-NaN values is essential for understanding data completeness in Pandas.

  • df.count() is the primary method for getting per-column (default) or per-row (axis=1) counts of non-missing values.
  • To get a total count of non-missing values for the entire DataFrame, sum the results of df.count().
  • If empty strings ('') should be treated as missing, replace them with np.nan before counting: df.replace('', np.nan).count().
  • df.notna().sum() (or df.notnull().sum()) offers an alternative, often equally clear, way to achieve the same per-column counts.

These methods provide you with the flexibility to accurately assess the presence of valid data points in your DataFrames, guiding your data cleaning and analysis efforts.