Python Pandas: How to Fix "NameError: name 'df' (or 'pd') is not defined"

NameError: name 'X' is not defined is a common Python error. It means you are trying to use a variable or other name (X) that Python cannot find in the current scope. When working with Pandas, it usually appears as NameError: name 'df' is not defined (a DataFrame variable) or NameError: name 'pd' is not defined (the conventional alias for the Pandas library).

This guide will clearly explain the common causes for these NameErrors, such as attempting to use a variable before it's assigned, scope issues with functions, or problems with importing the Pandas library itself. You'll learn how to diagnose and fix these errors to ensure your Pandas code runs smoothly.

Understanding Python's NameError

In Python, before you can use a variable (like df to hold a DataFrame or pd as an alias for the Pandas module), it must first be "defined." Definition happens through:

  • Assignment: my_variable = 10 or df = pd.DataFrame(...)
  • Import statements: import pandas as pd makes pd known.
  • Function definitions: def my_function(): ... defines my_function.
  • Class definitions: class MyClass: ... defines MyClass.

A NameError occurs when the Python interpreter encounters a name that hasn't been defined in the current scope or any enclosing scopes it can access.
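
As a rough illustration of the four kinds of definition listed above, the sketch below first triggers a NameError and then binds names in each way. The names total, js, double, and Box are made up for this example, and the standard-library json module stands in for Pandas so the snippet runs without installing anything.

```python
# Using a name before anything has defined it raises NameError.
try:
    print(total)
except NameError as exc:
    first_error = str(exc)

# Each statement below defines a name in the current scope.
total = 10                 # assignment

import json as js          # import (json stands in for pandas here)


def double(x):             # function definition
    return x * 2


class Box:                 # class definition
    pass


print(first_error)
print(total, js.__name__, double(total), Box.__name__)
```

Run as a script, the first print shows the message captured from the failed lookup of total; every name used in the last line resolves because it was defined before that line ran.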

NameError: name 'df' is not defined

This specific error means you're trying to use a variable, conventionally named df for a Pandas DataFrame, before it has been assigned a value (i.e., before a DataFrame object has been created and stored in the df variable).

Cause: Using df Before Assignment

The most straightforward cause is accessing df before the line of code that creates it.

import pandas as pd

# ⛔️ NameError: name 'df' is not defined
# print(df) # Attempting to print df before it's created

# Definition of df happens below this line
df = pd.DataFrame({
    'column1': ['A', 'B', 'C'],
    'column2': [1, 2, 3]
})

# print(df) # This would work if moved here

Solution: Define df Before Use

Ensure that any code that assigns a DataFrame to the variable df (e.g., df = pd.DataFrame(...) or df = pd.read_csv(...)) executes before any code that tries to use df.

import pandas as pd

# ✅ Step 1: Define/create the DataFrame and assign it to df
df = pd.DataFrame({
    'student_name': ['Alice', 'Bob', 'Charlie'],
    'score': [85, 90, 78]
})

# ✅ Step 2: Now you can use df
print("DataFrame 'df':")
print(df)

Output:

DataFrame 'df':
   student_name  score
0         Alice     85
1           Bob     90
2       Charlie     78

Cause: df Defined in a Different Scope (e.g., inside a function)

If df is created inside a function, it's local to that function's scope and cannot be accessed directly from outside unless explicitly returned or declared global.

import pandas as pd

def create_dataframe_locally():
    # df_local is local to this function
    df_local = pd.DataFrame({'data': [10, 20]})
    print("Inside function, df_local exists:", df_local.shape)

create_dataframe_locally()

try:
    # ⛔️ NameError: name 'df_local' is not defined
    print(df_local)  # df_local does not exist in the global scope
except NameError as e:
    print(f"Error accessing df_local outside function: {e}")

Output:

Inside function, df_local exists: (2, 1)
Error accessing df_local outside function: name 'df_local' is not defined

Solution: Return df from the Function

The best practice is to have the function return the DataFrame.

import pandas as pd

def create_and_return_dataframe():
    df_created = pd.DataFrame({'data': [100, 200, 300]})
    return df_created  # ✅ Return the DataFrame

# Call the function and assign its return value to df_main_scope
df_main_scope = create_and_return_dataframe()

print("DataFrame returned from function:")
print(df_main_scope)

Output:

DataFrame returned from function:
   data
0   100
1   200
2   300

Solution: Using global df (Use with Caution)

You can use the global keyword to indicate that a variable inside a function refers to a global variable. However, overuse of global can make code harder to understand and debug.

import pandas as pd

df_global_var = None # Or some initial value

def create_dataframe_globally():
    global df_global_var  # Declare intent to modify the global df_global_var
    df_global_var = pd.DataFrame({'global_data': [5, 10]})

create_dataframe_globally() # Function call modifies df_global_var

print("Accessing global DataFrame:")
print(df_global_var)

Output:

Accessing global DataFrame:
   global_data
0            5
1           10

NameError: name 'pd' is not defined (or name 'pandas' is not defined)

This error means you're trying to use the alias pd (or the full name pandas) before the Pandas library has been successfully imported and made available under that name.
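
To see this in isolation, the following sketch uses pd in a fresh interpreter where no import has run. It assumes nothing earlier in the session has defined pd; error_message is a name chosen for this example.

```python
# No "import pandas as pd" has run, so the name pd is not defined.
try:
    df = pd.DataFrame({"col1": [1, 2]})
except NameError as exc:
    error_message = str(exc)
    print(f"NameError: {error_message}")
```

Python fails while looking up pd, before it ever reaches DataFrame, which is why the message names pd rather than anything inside Pandas.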

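Cause: Pandas Library Not Imported (or Not Imported Correctly)

This happens when import pandas as pd is missing from the script, has not run yet, or failed because Pandas is not installed in the active environment.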

Solution: Install Pandas

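First, ensure Pandas is installed in your Python environment. If it is missing, import pandas as pd fails with a ModuleNotFoundError, so pd is never defined and any later code that uses it (for example, in a subsequent notebook cell) raises the NameError. To install it, open your terminal or command prompt and run: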

# For pip (common for most Python environments)
pip install pandas

# For Conda environments (e.g., Anaconda, Miniconda)
conda install pandas

Note: You might need pip3 or python -m pip depending on your system setup.
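
A common source of confusion is installing Pandas into one Python environment and running code with another. The sketch below uses only the standard library to report which interpreter is running and whether it can locate a module; is_installed is a hypothetical helper name, not part of Pandas.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


print("Interpreter:", sys.executable)
print("pandas available:", is_installed("pandas"))
```

If the second line prints False, install Pandas with that same interpreter, for example by running python -m pip install pandas.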

Solution: Import Pandas Correctly (import pandas as pd)

The standard way to import Pandas is at the top of your Python script:

# ✅ Correct import statement at the beginning of the file
import pandas as pd

# Now you can use pd to access Pandas functionalities
my_data = {'col1': [1, 2], 'col2': ['a', 'b']}
df = pd.DataFrame(my_data)
print(df)

Note: If you imported it as import pandas (without as pd), you would use pandas.DataFrame(...). The as pd convention is widely adopted.

Cause: Misspelling pd or pandas

Python is case-sensitive. Ensure you're using the correct casing.

  • Correct: pd, pandas
  • Incorrect: Pd, PD, Pandas, PANDAS
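
The effect of casing can be checked directly. In this sketch the standard-library statistics module is imported under the alias st as a stand-in for import pandas as pd (an assumption made so the snippet runs without Pandas installed). Only the exact spelling of the alias resolves.

```python
import statistics as st  # stand-in for: import pandas as pd

results = {}
for spelling in ("st", "St", "ST"):
    try:
        eval(spelling)               # look the name up exactly as spelled
        results[spelling] = "defined"
    except NameError:
        results[spelling] = "NameError"

print(results)
```

Only st is reported as defined; St and ST raise NameError, just as Pd or PD would after import pandas as pd.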

Cause: Importing pandas in a Nested Scope

Importing Pandas inside a function makes pd local to that function, so code outside the function cannot use it.

def some_function():
    import pandas as pd  # pd is local to some_function
    df_in_func = pd.DataFrame({'x': [1]})
    return df_in_func

df_created_by_func = some_function()  # This is fine

try:
    # ⛔️ NameError: name 'pd' is not defined
    df_outside = pd.DataFrame({'y': [2]})  # pd is not known here
except NameError as e:
    print(f"Error accessing pd outside its import scope: {e}")

# ✅ Always import at the top level of your script for general use!

A try block does not create a new scope, but importing inside one means pd might not be defined if an error occurs before the import line runs.

# Potentially problematic import within try-except
try:
    # some_code_that_might_fail()
    import pandas as pd  # If above line fails, this isn't reached
    # df = pd.DataFrame(...)
    pass
except Exception:
    # pd might not be defined here if the import failed or wasn't reached
    # print(pd.__version__)  # Could cause NameError
    pass

# print(pd) # Could cause NameError if import in try failed

Solution: Move import pandas as pd to the top of your file.
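
If the import genuinely needs to be guarded (for example, when Pandas is an optional dependency), one option is to keep the try block at the top of the file and bind pd in both branches, so later code can test it instead of hitting a NameError. This is a sketch of that pattern under those assumptions, not something the fix above requires; describe_pandas is a hypothetical helper.

```python
try:
    import pandas as pd
except ImportError:
    pd = None  # bind the name either way so later code can test it


def describe_pandas() -> str:
    """Report whether Pandas was imported, without risking a NameError."""
    if pd is None:
        return "pandas is not installed"
    return f"pandas {pd.__version__} is available"


print(describe_pandas())
```

Because the try block contains nothing but the import, no unrelated failure can prevent pd from being bound.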

General Debugging Tips for NameError

  • Order of Execution: Read your code from top to bottom. Is the variable assigned or module imported before its first use?
  • Scope: Where is the variable defined? Is it accessible from where you're trying to use it?
  • Typos: Double-check spelling and capitalization for both variable names and module aliases.
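  • Restart Kernel/IDE: In interactive environments like Jupyter Notebooks, a name exists only after the cell that defines it has run in the current session. Restarting the kernel and running all cells from the top clears stale state and shows whether every name is defined before it is used.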
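
For the order-of-execution and scope checks, it can help to ask Python which names currently exist. The helper below is a hypothetical sketch (report_names is not a Pandas or standard-library function) that inspects a namespace dictionary such as the one returned by globals().

```python
def report_names(names, namespace):
    """Map each name to its type name, or 'NOT DEFINED' if it is missing."""
    return {
        name: type(namespace[name]).__name__ if name in namespace else "NOT DEFINED"
        for name in names
    }


scores = [85, 90, 78]

# In a fresh script, scores exists but df and pd do not.
print(report_names(["scores", "df", "pd"], globals()))
```

In Jupyter, the %whos magic gives a similar listing of the variables defined in the current kernel.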

Conclusion

NameErrors involving df or pd in Python are usually straightforward to resolve by ensuring:

  1. Variables (like df) are assigned a value (e.g., a DataFrame) before they are used.
  2. Variables are accessed within the scope they are defined in, or properly returned/made global if needed.
  3. The Pandas library is correctly installed and imported (typically as import pandas as pd) at the beginning of your script.
  4. There are no typos in variable names or the Pandas alias (pd).

By systematically checking these points, you can effectively eliminate these common NameErrors from your Pandas workflows.