Python Pandas: How to Get DataFrame Memory Usage
Understanding the memory footprint of your Pandas DataFrames is crucial, especially when working with large datasets, as it can impact performance and resource allocation. Pandas provides several methods to inspect the memory usage of a DataFrame, both for individual columns and for the entire object.
This guide explains how to use DataFrame.memory_usage(), sys.getsizeof(), and DataFrame.info() to accurately determine the memory size of your DataFrames.
Why Check DataFrame Memory Usage?
- Performance Optimization: Large DataFrames can slow down operations. Knowing memory usage helps identify bottlenecks.
- Resource Management: Essential when working in memory-constrained environments (e.g., cloud functions, smaller machines).
- Data Type Selection: Understanding memory impact can guide choices for more memory-efficient data types (e.g., int32 vs. int64, category vs. object).
- Debugging: Unexpectedly high memory usage can indicate issues like data duplication or inefficient data structures.
Example DataFrame
import pandas as pd
import numpy as np # For np.nan
import sys # For sys.getsizeof
data = {
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'StartDate': pd.to_datetime(['2020-01-15', '2019-03-01', '2021-06-10', np.nan, '2020-08-20']),
'Salary': [60000, 85000, 120000, 62000, 75000],
'IsFullTime': [True, True, False, True, True]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
print()
print("Original dtypes:")
print(df.dtypes)
Output:
Original DataFrame:
EmployeeID FullName Department StartDate Salary IsFullTime
0 101 Alice Wonderland HR 2020-01-15 60000 True
1 102 Robert "Bob" Johnson Engineering 2019-03-01 85000 True
2 103 Charles Xavier Management 2021-06-10 120000 False
3 104 Diana Prince HR NaT 62000 True
4 105 Edward Nygma Research 2020-08-20 75000 True
Original dtypes:
EmployeeID int64
FullName object
Department object
StartDate datetime64[ns]
Salary int64
IsFullTime bool
dtype: object
Method 1: DataFrame.memory_usage() (Detailed and Recommended)
The DataFrame.memory_usage(index=True, deep=False) method returns a Pandas Series whose index holds the column names (and, optionally, the DataFrame's own index) and whose values are the memory usage of each component in bytes.
Getting Memory Usage per Column
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'Salary': [60000, 85000, 120000, 62000, 75000],
})
# Get memory usage for each column, including the index by default
memory_per_column_with_index = df_example.memory_usage() # index=True by default
print("Memory usage per column (including index, deep=False by default):")
print(memory_per_column_with_index)
Output:
Memory usage per column (including index, deep=False by default):
Index 72
EmployeeID 40
FullName 20
Department 20
Salary 40
dtype: int64
By default (deep=False), columns with dtype='object' (such as strings) only report the memory taken by the pointers to the string objects, not the actual memory consumed by the string data itself.
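To see why the shallow figure misleads for strings, compare two object columns whose strings differ wildly in length; this is a minimal sketch, and the exact byte counts will vary by platform and Python version.
import pandas as pd
short_strings = pd.Series(['a', 'b', 'c'])  # 1-character strings
long_strings = pd.Series(['x' * 1000] * 3)  # 1000-character strings
# Shallow mode counts only the array of pointers, so both report the same size
print(short_strings.memory_usage(index=False))  # e.g. 24
print(long_strings.memory_usage(index=False))   # e.g. 24
# Deep mode follows the pointers to the string objects themselves
print(short_strings.memory_usage(index=False, deep=True))  # e.g. ~174
print(long_strings.memory_usage(index=False, deep=True))   # e.g. ~3171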
Including/Excluding Index Memory
The index parameter controls whether the memory usage of the DataFrame's index is included.
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'Salary': [60000, 85000, 120000, 62000, 75000],
})
# Exclude index memory
memory_per_column_no_index = df_example.memory_usage(index=False)
print("Memory usage per column (excluding index):")
print(memory_per_column_no_index)
Output:
Memory usage per column (excluding index):
EmployeeID 40
FullName 20
Department 20
Salary 40
dtype: int64
Getting Total DataFrame Memory (.sum())
To get the total memory usage of the DataFrame, call .sum() on the Series returned by memory_usage().
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'Salary': [60000, 85000, 120000, 62000, 75000],
})
# Total memory including index, but shallow for object columns
total_memory_shallow = df_example.memory_usage(index=True).sum()
print(f"Total memory (shallow, including index): {total_memory_shallow} bytes")
Output:
Total memory (shallow, including index): 192 bytes
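Because memory_usage() reports raw bytes, large totals are easier to read after converting to binary units. A small helper like the hypothetical format_bytes below (not part of pandas) does the conversion:
def format_bytes(num_bytes):
    # Step through binary units until the value drops below 1024
    for unit in ['bytes', 'KB', 'MB', 'GB']:
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} TB"
print(format_bytes(192))        # 192.0 bytes
print(format_bytes(5_250_000))  # 5.0 MB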
Accurate Memory for Object Dtypes (deep=True)
To get a more accurate memory count that includes the actual memory used by the Python objects within object-dtype columns (like strings), set deep=True.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'StartDate': pd.to_datetime(['2020-01-15', '2019-03-01', '2021-06-10', np.nan, '2020-08-20']),
'Salary': [60000, 85000, 120000, 62000, 75000],
'IsFullTime': [True, True, False, True, True]
})
# Memory usage per column with deep inspection
memory_deep_per_column = df.memory_usage(deep=True)
print("Memory usage per column (deep=True):")
print(memory_deep_per_column)
print()
# Total memory with deep inspection
total_memory_deep = df.memory_usage(index=True, deep=True).sum()
print(f"Total memory (deep=True, including index): {total_memory_deep} bytes")
Output:
Memory usage per column (deep=True):
Index 72
EmployeeID 40
FullName 199
Department 158
StartDate 40
Salary 40
IsFullTime 5
dtype: int64
Total memory (deep=True, including index): 554 bytes
Using deep=True provides a much more realistic estimate of memory usage when your DataFrame contains string columns or other Python objects.
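One convenient way to compare the two modes column by column is to put both Series side by side in a small summary frame (a sketch using the df defined above):
comparison = pd.DataFrame({
    'shallow': df.memory_usage(deep=False),
    'deep': df.memory_usage(deep=True),
})
comparison['difference'] = comparison['deep'] - comparison['shallow']
print(comparison)
# Only the object columns (FullName, Department) show a large difference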
Method 2: DataFrame.info() (Quick Summary)
The DataFrame.info() method prints a concise summary of the DataFrame, including the data types, non-null counts, and memory usage.
Basic Memory Information
By default, info() provides a shallow memory estimate; a + after the figure (as in the 237.0+ bytes below) signals that object columns were not deeply measured, so actual usage may be higher.
import pandas as pd
import numpy as np
df = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'StartDate': pd.to_datetime(['2020-01-15', '2019-03-01', '2021-06-10', np.nan, '2020-08-20']),
'Salary': [60000, 85000, 120000, 62000, 75000],
'IsFullTime': [True, True, False, True, True]
})
print("DataFrame.info() (default memory_usage):")
df.info()
Output:
DataFrame.info() (default memory_usage):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 EmployeeID 5 non-null int64
1 FullName 5 non-null object
2 Department 5 non-null object
3 StartDate 4 non-null datetime64[ns]
4 Salary 5 non-null int64
5 IsFullTime 5 non-null bool
dtypes: bool(1), datetime64[ns](1), int64(2), object(2)
memory usage: 237.0+ bytes
Deep Memory Calculation with info()
You can get the deep memory calculation by passing memory_usage='deep' to info().
import pandas as pd
import numpy as np
df = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'StartDate': pd.to_datetime(['2020-01-15', '2019-03-01', '2021-06-10', np.nan, '2020-08-20']),
'Salary': [60000, 85000, 120000, 62000, 75000],
'IsFullTime': [True, True, False, True, True]
})
print("DataFrame.info(memory_usage='deep'):")
df.info(memory_usage='deep')
Output:
DataFrame.info(memory_usage='deep'):
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 EmployeeID 5 non-null int64
1 FullName 5 non-null object
2 Department 5 non-null object
3 StartDate 4 non-null datetime64[ns]
4 Salary 5 non-null int64
5 IsFullTime 5 non-null bool
dtypes: bool(1), datetime64[ns](1), int64(2), object(2)
memory usage: 554.0 bytes
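info() writes to stdout by default. To capture the summary as a string instead (for logging, for example), pass a text buffer via the buf parameter:
import io
buffer = io.StringIO()
df.info(memory_usage='deep', buf=buffer)  # summary goes into the buffer, not stdout
summary_text = buffer.getvalue()
print(summary_text)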
Method 3: sys.getsizeof() (Whole-Object Size, Less Detail)
Python's built-in sys.getsizeof() reports the size an object claims via its __sizeof__ method, plus a small fixed garbage-collector overhead. Pandas implements __sizeof__ for DataFrames in terms of memory_usage(deep=True), so for a DataFrame the result is typically just a few bytes larger than df.memory_usage(deep=True).sum(). It is still generally less informative than Pandas' own methods, because it returns a single number with no per-column breakdown.
import sys
import pandas as pd
import numpy as np  # needed for np.nan below
df = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105],
'FullName': ['Alice Wonderland', 'Robert "Bob" Johnson', 'Charles Xavier', 'Diana Prince', 'Edward Nygma'],
'Department': ['HR', 'Engineering', 'Management', 'HR', 'Research'],
'StartDate': pd.to_datetime(['2020-01-15', '2019-03-01', '2021-06-10', np.nan, '2020-08-20']),
'Salary': [60000, 85000, 120000, 62000, 75000],
'IsFullTime': [True, True, False, True, True]
})
size_sys = sys.getsizeof(df)
size_pandas_shallow = df.memory_usage(index=True, deep=False).sum()
size_pandas_deep = df.memory_usage(index=True, deep=True).sum()
print(f"sys.getsizeof(df): {size_sys} bytes")
print(f"df.memory_usage(deep=False).sum(): {size_pandas_shallow} bytes")
print(f"df.memory_usage(deep=True).sum(): {size_pandas_deep} bytes")
Output:
sys.getsizeof(df): 570 bytes
df.memory_usage(deep=False).sum(): 237 bytes
df.memory_usage(deep=True).sum(): 554 bytes
sys.getsizeof() is generally not the preferred method for detailed DataFrame memory analysis. Note that its result is slightly larger than the pandas deep=True total, not the deep=False one: it includes the interpreter's per-object overhead on top of the data arrays and their contents.
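To confirm the relationship, subtract the deep pandas total from the sys.getsizeof() result; the leftover is the small fixed interpreter overhead (16 bytes in the run above).
overhead = sys.getsizeof(df) - df.memory_usage(index=True, deep=True).sum()
print(f"Interpreter/GC overhead: {overhead} bytes")  # 570 - 554 = 16 in the run above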
Interpreting Memory Usage Values
- Values are in bytes. Divide by 1024 for kilobytes (KB), by 1024² for megabytes (MB), and so on (the format_bytes helper sketched earlier automates this).
- For object dtype columns, deep=False is a significant underestimate; deep=True is more accurate.
- Memory usage depends heavily on data types. Optimizing dtypes (e.g., category for repetitive strings, or int32 instead of int64 when values fit) can drastically reduce memory, as the sketch below illustrates.
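As a rough illustration of the last point, converting a repetitive string column to category usually shrinks it dramatically; this sketch uses made-up data, so your exact savings will differ.
import pandas as pd
# A long column with only four distinct string values
departments = pd.Series(['HR', 'Engineering', 'HR', 'Research'] * 250_000)
as_object = departments.memory_usage(deep=True)
as_category = departments.astype('category').memory_usage(deep=True)
print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
print(f"savings:  {1 - as_category / as_object:.0%}")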
Conclusion
To effectively determine the memory size of a Pandas DataFrame:
- Use DataFrame.memory_usage(deep=True).sum() for the most accurate total memory footprint, especially with string/object columns.
- Use DataFrame.memory_usage(deep=True) (without .sum()) to see the detailed memory usage per column (and index).
- Use DataFrame.info(memory_usage='deep') for a quick, human-readable summary that includes the deep memory calculation.
sys.getsizeof() returns roughly the same deep total but offers no per-column detail, so it is generally less useful for memory analysis.
By monitoring memory usage, you can make informed decisions about data storage, processing strategies, and data type optimization in your Pandas workflows.