Python Pandas: How to Ensure DataFrame.describe() Shows All Columns
The DataFrame.describe() method in Pandas is a powerful tool for quickly generating summary statistics for your data. However, you might sometimes find that its output doesn't include all the columns from your DataFrame, or that the displayed output itself is truncated.
This guide addresses both scenarios. It explains why describe() omits some columns by default and how Pandas display settings can hide columns in the printed output. You'll learn how to use the include parameter to choose which columns are described and how to adjust display options to see the full summary, even for wide DataFrames.
The Default Behavior of DataFrame.describe()
By default, when you call df.describe() on a Pandas DataFrame, it generates descriptive statistics only for numeric columns (e.g., int64, float64). Other columns, such as strings (object), booleans, and categoricals, are excluded from this default summary. In pandas 2.0 and later, datetime columns are also summarized by default.
Let's consider a sample DataFrame:
import pandas as pd
df = pd.DataFrame({
'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
'department': ['HR', 'IT', 'HR', 'Sales'],
'years_experience': [5, 10, 3, 8],
'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})
print("Original DataFrame:")
print(df)
print()
print("Default df.describe() output:")
print(df.describe())
Output:
Original DataFrame:
   employee_name department  years_experience  annual_salary
0    Alice Smith         HR                 5        75000.0
1   Robert Jones         IT                10       120000.0
2  Charles Brown         HR                 3        65000.0
3   Diana Wilson      Sales                 8        95000.0

Default df.describe() output:
       years_experience  annual_salary
count          4.000000       4.000000
mean           6.500000   88750.000000
std            3.109126   24281.337141
min            3.000000   65000.000000
25%            4.500000   72500.000000
50%            6.500000   85000.000000
75%            8.500000  101250.000000
max           10.000000  120000.000000
Notice how employee_name and department (object/string columns) are missing from the default describe() output.
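To see which columns the default call keeps, you can compare its result with df.select_dtypes(include='number'), which selects numeric columns by dtype. The sketch below also checks a related behavior: if a DataFrame has no numeric columns at all, describe() summarizes the non-numeric ones instead. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
    'department': ['HR', 'IT', 'HR', 'Sales'],
    'years_experience': [5, 10, 3, 8],
    'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})

# Columns that describe() summarizes when called with no arguments
default_cols = list(df.describe().columns)

# Numeric columns selected explicitly by dtype
numeric_cols = list(df.select_dtypes(include='number').columns)

print(default_cols)                  # ['years_experience', 'annual_salary']
print(default_cols == numeric_cols)  # True

# Fallback: with no numeric columns, the text columns are summarized instead
text_only = df[['employee_name', 'department']]
print(list(text_only.describe().index))  # ['count', 'unique', 'top', 'freq']
```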
Solution 1: Using the include Parameter in describe()
The include parameter of df.describe() allows you to control which column types are included in the summary.
Including All Columns with include='all'
To get descriptive statistics for all columns, regardless of their data type, set include='all'.
import pandas as pd
df = pd.DataFrame({
'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
'department': ['HR', 'IT', 'HR', 'Sales'],
'years_experience': [5, 10, 3, 8],
'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})
# ✅ Include all columns in the description
all_columns_description = df.describe(include='all')
print("df.describe(include='all') output:")
print(all_columns_description)
Output:
df.describe(include='all') output:
       employee_name department  years_experience  annual_salary
count               4          4          4.000000       4.000000
unique              4          3               NaN            NaN
top       Alice Smith         HR               NaN            NaN
freq                1          2               NaN            NaN
mean              NaN        NaN          6.500000   88750.000000
std               NaN        NaN          3.109126   24281.337141
min               NaN        NaN          3.000000   65000.000000
25%               NaN        NaN          4.500000   72500.000000
50%               NaN        NaN          6.500000   85000.000000
75%               NaN        NaN          8.500000  101250.000000
max               NaN        NaN         10.000000  120000.000000
Understanding the Output with include='all'
When include='all' is used:
- For numeric columns, you get count (number of non-null values), mean, std, min, 25% (1st quartile), 50% (median), 75% (3rd quartile), and max.
- For non-numeric columns (like strings/objects), you get count, unique (number of distinct values), top (most frequent value), and freq (frequency of the top value).
- Statistics that don't apply to a column's data type are shown as NaN.
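The describe() result is an ordinary DataFrame. Its row labels are the statistic names listed above and its column labels are the original columns, so you can pick out individual values with .loc. Here is a minimal sketch, assuming the same sample data. The name summary is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
    'department': ['HR', 'IT', 'HR', 'Sales'],
    'years_experience': [5, 10, 3, 8],
    'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})

summary = df.describe(include='all')

# Look up single statistics by (statistic, column)
print(summary.loc['unique', 'department'])   # 3
print(summary.loc['mean', 'annual_salary'])  # 88750.0
print(summary.loc['top', 'department'])      # HR

# Statistics that don't apply to a column are stored as NaN
print(pd.isna(summary.loc['mean', 'department']))  # True
```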
Describing Specific Data Types (e.g., np.number, object)
You can also pass a list of data types to include to summarize only columns of those types.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
'department': ['HR', 'IT', 'HR', 'Sales'],
'years_experience': [5, 10, 3, 8],
'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})
# ✅ Describe only numeric columns (same as default, but explicit)
numeric_description = df.describe(include=[np.number])
print("df.describe(include=[np.number]):")
print(numeric_description)
print()
# ✅ Describe only object (string) columns
object_description = df.describe(include=['object'])
print("df.describe(include=['object']):")
print(object_description)
Output:
df.describe(include=[np.number]):
       years_experience  annual_salary
count          4.000000       4.000000
mean           6.500000   88750.000000
std            3.109126   24281.337141
min            3.000000   65000.000000
25%            4.500000   72500.000000
50%            6.500000   85000.000000
75%            8.500000  101250.000000
max           10.000000  120000.000000

df.describe(include=['object']):
       employee_name department
count               4          4
unique              4          3
top       Alice Smith         HR
freq                1          2
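describe() also has an exclude parameter, the counterpart of include. The string 'category' selects pandas categorical columns. The sketch below converts department to the category dtype, purely for illustration, in a copy called df_cat. It then shows both options.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
    'department': ['HR', 'IT', 'HR', 'Sales'],
    'years_experience': [5, 10, 3, 8],
    'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})

# Illustrative copy in which 'department' is a pandas categorical column
df_cat = df.astype({'department': 'category'})

# Describe every column except the numeric ones
non_numeric = df_cat.describe(exclude=[np.number])
print(list(non_numeric.columns))  # ['employee_name', 'department']

# Describe only the categorical column
categorical = df_cat.describe(include=['category'])
print(list(categorical.columns))  # ['department']
print(list(categorical.index))    # ['count', 'unique', 'top', 'freq']
```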
Describing a Single Column
You can also call .describe() directly on a single Series (column).
import pandas as pd
df = pd.DataFrame({
'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
'department': ['HR', 'IT', 'HR', 'Sales'],
'years_experience': [5, 10, 3, 8],
'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})
salary_description = df['annual_salary'].describe()
print("Description of 'annual_salary' column:")
print(salary_description)
print()
department_description = df['department'].describe()
print("Description of 'department' column:")
print(department_description)
Output:
Description of 'annual_salary' column:
count         4.000000
mean      88750.000000
std       24281.337141
min       65000.000000
25%       72500.000000
50%       85000.000000
75%      101250.000000
max      120000.000000
Name: annual_salary, dtype: float64

Description of 'department' column:
count      4
unique     3
top       HR
freq       2
Name: department, dtype: object
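You can also describe a chosen subset of columns by selecting them first. This sits between summarizing one column and summarizing the whole DataFrame. The sketch below assumes the same sample data, and the pair of columns is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
    'department': ['HR', 'IT', 'HR', 'Sales'],
    'years_experience': [5, 10, 3, 8],
    'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})

# Select a mixed-type subset, then summarize both columns
subset = df[['department', 'annual_salary']].describe(include='all')
print(subset)
print(list(subset.columns))  # ['department', 'annual_salary']
```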
Solution 2: Adjusting Pandas Display Options for Wide Output
Even if describe(include='all')
generates statistics for all columns, Pandas might still truncate the display of these columns in your console if the DataFrame is too wide. This is controlled by display.max_columns
.
Setting display.max_columns Globally
Set this option to None to tell Pandas to display all columns without truncation.
import pandas as pd
# df defined as before, assume it has many more columns for this example to be impactful
df = pd.DataFrame({
'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
'department': ['HR', 'IT', 'HR', 'Sales'],
'years_experience': [5, 10, 3, 8],
'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})
# ✅ Set option to display all columns
pd.set_option('display.max_columns', None)
# Now, printing the DataFrame or its description will show all columns
print("Original DataFrame (with display.max_columns=None):")
print(df)
print()
all_columns_description_full_display = df.describe(include='all')
print("df.describe(include='all') output (with display.max_columns=None):")
print(all_columns_description_full_display)
Output:
Original DataFrame (with display.max_columns=None):
   employee_name department  years_experience  annual_salary
0    Alice Smith         HR                 5        75000.0
1   Robert Jones         IT                10       120000.0
2  Charles Brown         HR                 3        65000.0
3   Diana Wilson      Sales                 8        95000.0

df.describe(include='all') output (with display.max_columns=None):
       employee_name department  years_experience  annual_salary
count               4          4          4.000000       4.000000
unique              4          3               NaN            NaN
top       Alice Smith         HR               NaN            NaN
freq                1          2               NaN            NaN
mean              NaN        NaN          6.500000   88750.000000
std               NaN        NaN          3.109126   24281.337141
min               NaN        NaN          3.000000   65000.000000
25%               NaN        NaN          4.500000   72500.000000
50%               NaN        NaN          6.500000   85000.000000
75%               NaN        NaN          8.500000  101250.000000
max               NaN        NaN         10.000000  120000.000000
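If the DataFrame had more columns than the display limit, this setting would print all of them instead of eliding the middle ones with '...'. Keep in mind that pd.set_option() changes the option for the rest of the session.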
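The four-column example is too narrow to trigger truncation, so the sketch below uses a hypothetical 30-column DataFrame. The names wide and col_0 to col_29 are made up for illustration. The sketch compares a limit of 10 columns with no limit, then calls pd.reset_option() to undo the global change.

```python
import numpy as np
import pandas as pd

# Hypothetical wide DataFrame: 4 rows x 30 numeric columns named col_0 ... col_29
wide = pd.DataFrame(
    np.arange(120).reshape(4, 30),
    columns=[f'col_{i}' for i in range(30)],
)
summary = wide.describe()

# A limit of 10 prints the first 5 and last 5 columns and elides the rest with '...'
pd.set_option('display.max_columns', 10)
truncated_text = repr(summary)

# None removes the limit, so every column is printed
pd.set_option('display.max_columns', None)
full_text = repr(summary)

# Restore the default value of the option
pd.reset_option('display.max_columns')

print('col_15' in truncated_text)  # False
print('col_15' in full_text)       # True
```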
Using pd.option_context for Temporary Settings
If you only want to change the display setting for a specific block of code, use pd.option_context as a context manager. The previous value is restored when the with block exits.
import pandas as pd
df = pd.DataFrame({
'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
'department': ['HR', 'IT', 'HR', 'Sales'],
'years_experience': [5, 10, 3, 8],
'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})
# ✅ Temporarily set option to display all columns
with pd.option_context('display.max_columns', None):
print("df.describe(include='all') output (within option_context):")
print(df.describe(include='all'))
print()
# Outside the 'with' block, the display option reverts to its previous setting.
print("df.describe(include='all') output (after option_context, may be truncated if wide):")
print(df.describe(include='all'))
Output:
df.describe(include='all') output (within option_context):
       employee_name department  years_experience  annual_salary
count               4          4          4.000000       4.000000
unique              4          3               NaN            NaN
top       Alice Smith         HR               NaN            NaN
freq                1          2               NaN            NaN
mean              NaN        NaN          6.500000   88750.000000
std               NaN        NaN          3.109126   24281.337141
min               NaN        NaN          3.000000   65000.000000
25%               NaN        NaN          4.500000   72500.000000
50%               NaN        NaN          6.500000   85000.000000
75%               NaN        NaN          8.500000  101250.000000
max               NaN        NaN         10.000000  120000.000000

df.describe(include='all') output (after option_context, may be truncated if wide):
       employee_name department  years_experience  annual_salary
count               4          4          4.000000       4.000000
unique              4          3               NaN            NaN
top       Alice Smith         HR               NaN            NaN
freq                1          2               NaN            NaN
mean              NaN        NaN          6.500000   88750.000000
std               NaN        NaN          3.109126   24281.337141
min               NaN        NaN          3.000000   65000.000000
25%               NaN        NaN          4.500000   72500.000000
50%               NaN        NaN          6.500000   85000.000000
75%               NaN        NaN          8.500000  101250.000000
max               NaN        NaN         10.000000  120000.000000
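To confirm that option_context restores the previous setting, you can read the option with pd.get_option before, inside, and after the with block. The sketch below also shows that option_context accepts several option/value pairs in one call. Here display.width is just an example of a second option.

```python
import pandas as pd

before = pd.get_option('display.max_columns')

# Several (option, value) pairs can be passed in one call
with pd.option_context('display.max_columns', None, 'display.width', 200):
    inside = pd.get_option('display.max_columns')
    width_inside = pd.get_option('display.width')

after = pd.get_option('display.max_columns')

print(inside)           # None
print(width_inside)     # 200
print(after == before)  # True
```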
Combining Solutions for Full Visibility
For complete visibility of all descriptive statistics for all columns:
- Use df.describe(include='all').
- Set pd.options.display.max_columns appropriately (e.g., to None) if the resulting describe() output is very wide.
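Both steps can be wrapped in a small helper. The function name describe_everything is hypothetical and not part of pandas. This sketch returns the full summary as text and leaves the global display settings unchanged. It uses repr() because, unlike DataFrame.to_string(), repr() honors display.max_columns.

```python
import pandas as pd


def describe_everything(frame: pd.DataFrame) -> str:
    """Summarize every column and render the result without column truncation."""
    summary = frame.describe(include='all')
    # Lift the column limit only while the summary is rendered
    with pd.option_context('display.max_columns', None):
        return repr(summary)


df = pd.DataFrame({
    'employee_name': ['Alice Smith', 'Robert Jones', 'Charles Brown', 'Diana Wilson'],
    'department': ['HR', 'IT', 'HR', 'Sales'],
    'years_experience': [5, 10, 3, 8],
    'annual_salary': [75000.0, 120000.0, 65000.0, 95000.0]
})

text = describe_everything(df)
print(text)
```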
Conclusion
When DataFrame.describe() doesn't show all your columns, there are usually two reasons:
- Data type filtering: by default, it only summarizes numeric columns. Use include='all' (or a list of data types) to control which columns are summarized.
- Display truncation: for wide results, Pandas' display settings may hide columns. Use pd.set_option('display.max_columns', None) or pd.option_context to make every generated column visible in the output.
By understanding and applying these solutions, you can use describe() to get a comprehensive statistical overview of all the data in your Pandas DataFrames.