Python NumPy: How to Fix "ValueError: Object arrays cannot be loaded when allow_pickle=False"
When saving and loading NumPy arrays, particularly those with dtype=object (object arrays), you might encounter the ValueError: Object arrays cannot be loaded when allow_pickle=False. This error arises because NumPy, by default, disables the loading of pickled object arrays from .npy files as a security precaution (pickled data can potentially execute arbitrary code). Object arrays are often created when your array holds arbitrary Python objects (such as lists of different lengths, dictionaries, or a mix of types that NumPy cannot coerce to a single dtype) or when you explicitly set dtype=object.
This guide will clearly explain why this ValueError occurs due to the allow_pickle=False default in np.load(), demonstrate how to reproduce it, and provide robust solutions, primarily focusing on setting allow_pickle=True during loading or, alternatively, saving your array with a non-object numerical data type if appropriate. We'll also briefly touch upon an older workaround related to Keras and np.load.__defaults__.
Understanding the Error: Object Arrays, Pickling, and Security
- NumPy Object Arrays (dtype=object): Unlike standard NumPy arrays that hold homogeneous numerical data (e.g., all int64 or all float64), object arrays can store arbitrary Python objects in each element. This could include strings, lists, dictionaries, or custom class instances.
- Pickling: When NumPy saves an object array to a .npy file, it might use Python's pickle module to serialize these arbitrary Python objects.
- Security Concern: Loading pickled data from an untrusted source can be a security risk, as unpickling can execute arbitrary code embedded within the pickled data.
- allow_pickle=False Default: To mitigate this risk, numpy.load() has allow_pickle=False as its default setting. This prevents the loading of files that contain pickled object arrays, leading to the ValueError if such an array is encountered.
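A quick way to tell whether an array will need pickling is to inspect its dtype before saving. The snippet below is a minimal sketch of that check; the variable names are illustrative only.

```python
import numpy as np

# Homogeneous numbers get a native numeric dtype and never need pickle.
numeric = np.array([1, 2.5, 3])

# Ragged lists with dtype=object are stored as references to Python objects.
ragged = np.array([[1, 2], [3, 4, 5]], dtype=object)

for name, arr in [("numeric", numeric), ("ragged", ragged)]:
    # dtype.hasobject is True when the dtype holds Python objects
    print(f"{name}: dtype={arr.dtype}, needs pickle={arr.dtype.hasobject}")
```

An array whose dtype reports hasobject as True is the kind that np.load() refuses by default.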
Reproducing the Error: Loading an Object Array with allow_pickle=False
Let's save a NumPy array with dtype=object and then try to load it with the default allow_pickle=False.
import numpy as np
import os # For file cleanup
# Create an array of Python lists, which forces dtype=object
data_to_save = np.array([[1, 2], [3, 4, 5], ['hello']], dtype=object)
file_name = 'my_object_array.npy'
# Save the object array
np.save(file_name, data_to_save) # allow_pickle=True is default for save
print(f"Saved object array to {file_name}. Array dtype: {data_to_save.dtype}")
try:
    # ⛔️ Attempt to load with allow_pickle=False (default)
    loaded_array_error = np.load(file_name) # allow_pickle defaults to False
    print(loaded_array_error)
except ValueError as e:
    print(f"Error: {e}")
finally:
    if os.path.exists(file_name):
        os.remove(file_name) # Clean up
Output:
Saved object array to my_object_array.npy. Array dtype: object
Error: Object arrays cannot be loaded when allow_pickle=False
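If you are unsure whether a .npy file holds an object array, one option is to read only its header, which records the dtype, without unpickling anything. The sketch below assumes the helpers in numpy.lib.format (read_magic, read_array_header_1_0, read_array_header_2_0) are available in your NumPy version; npy_header_dtype is a hypothetical helper name, and an in-memory buffer stands in for a real file.

```python
import io
import numpy as np
from numpy.lib import format as npy_format

def npy_header_dtype(fp):
    """Return the dtype recorded in a .npy header without reading the data."""
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        _shape, _fortran_order, dtype = npy_format.read_array_header_1_0(fp)
    elif version == (2, 0):
        _shape, _fortran_order, dtype = npy_format.read_array_header_2_0(fp)
    else:
        # Other format versions are left out of this sketch
        raise ValueError(f"Unsupported .npy format version: {version}")
    return dtype

# Write an object array to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
np.save(buffer, np.array([[1, 2], [3, 4, 5]], dtype=object))
buffer.seek(0)

header_dtype = npy_header_dtype(buffer)
print(f"dtype in header: {header_dtype}, object array: {header_dtype.hasobject}")
```

Only the header is parsed here, so nothing is unpickled while you decide whether the file is one you trust.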
Solution 1: Set allow_pickle=True in numpy.load() (Recommended Fix)
If you trust the source of the .npy file and know it contains an object array that you need to load, the direct solution is to explicitly set allow_pickle=True when calling np.load().
Basic Usage
import numpy as np
import os
data_to_save = np.array([['apple', 'banana'], ['cherry', 'date']], dtype=object)
file_name = 'my_object_array_trusted.npy'
np.save(file_name, data_to_save)
# ✅ Correct: Load the object array with allow_pickle=True
loaded_array_correct = np.load(file_name, allow_pickle=True)
print("Successfully loaded object array with allow_pickle=True:")
print(loaded_array_correct)
print(f"dtype of loaded array: {loaded_array_correct.dtype}")
if os.path.exists(file_name): os.remove(file_name)
Output:
Successfully loaded object array with allow_pickle=True:
[['apple' 'banana']
['cherry' 'date']]
dtype of loaded array: object
Warning: Only set allow_pickle=True if you are certain the .npy file comes from a trusted source.
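One way to keep that trust decision explicit in your own code is a small wrapper that tries the safe load first and only enables pickling when the caller opts in. This is a sketch, not a NumPy API: load_npy and its trusted flag are hypothetical names introduced for illustration.

```python
import os
import tempfile
import numpy as np

def load_npy(path, trusted=False):
    """Load a .npy file, enabling pickle only if the caller marks it trusted."""
    try:
        return np.load(path)  # safe default: allow_pickle=False
    except ValueError:
        if not trusted:
            raise
        # The caller has vouched for the file, so retry with pickling enabled
        return np.load(path, allow_pickle=True)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "objects.npy")
    np.save(path, np.array([[1, 2], ["hello"]], dtype=object))

    try:
        load_npy(path)
        refused = False
    except ValueError:
        refused = True  # untrusted object arrays are still rejected

    loaded = load_npy(path, trusted=True)

print(f"refused without trust: {refused}, loaded dtype: {loaded.dtype}")
```

Files that contain plain numerical arrays load through the first branch, so pickling is never enabled for them.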
Using a with Statement (for .npz files, though the concept applies)
For a .npy file, np.load returns the array itself, so there is nothing to use as a context manager. For .npz files (archives of multiple arrays, optionally compressed), it returns an NpzFile object that supports the with statement, and this syntax is common. The allow_pickle=True argument still applies if any array within the .npz file is an object array.
For a single .npy file, the standard load is as shown under Basic Usage above.
If you were loading an .npz file that might contain object arrays:
# Conceptual example for .npz (assumes my_archive.npz exists and is trusted)
import numpy as np
with np.load('my_archive.npz', allow_pickle=True) as data:
    array_a = data['arr_a'] # If arr_a was an object array
    array_b = data['arr_b']
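A runnable variant of the .npz case is sketched below, using an in-memory buffer in place of my_archive.npz. It also illustrates that arrays in an .npz archive are read when you access them, so with the default setting the error appears at the point where the object array is requested rather than when the archive is opened.

```python
import io
import numpy as np

# Build an archive holding one object array and one plain integer array
buffer = io.BytesIO()
np.savez(
    buffer,
    arr_a=np.array([[1, 2], [3]], dtype=object),
    arr_b=np.arange(3),
)

# Default settings: the numeric array loads, the object array is refused
buffer.seek(0)
with np.load(buffer) as data:
    numeric = data["arr_b"]
    try:
        data["arr_a"]
        blocked = False
    except ValueError:
        blocked = True

# With allow_pickle=True, the object array can be read as well
buffer.seek(0)
with np.load(buffer, allow_pickle=True) as data:
    objects = data["arr_a"]

print(f"numeric: {numeric}, object array blocked by default: {blocked}")
print(f"objects: {objects}")
```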
Solution 2: Save the Array with a Non-Object dtype (If Applicable)
If your array has dtype=object only because of how it was created (for example, dtype=object was set explicitly, or the values passed through a container that stored them as generic Python objects), but its elements are all numbers that fit a single numerical type or all strings that could be represented with a fixed-length string dtype, consider saving with a more specific, non-object dtype. This avoids the pickling issue altogether. Note that NumPy does not fall back to object for a plain mix of int and float values; it upcasts them to float64.
import numpy as np
import os
# Data that might initially be inferred as object if not careful during creation
# but can be represented as a specific numerical type or fixed-string type.
numerical_data = [[1, 2, 3], [4, 5, 6]]
file_name_int = 'my_int_array.npy'
# ✅ Save with a specific numerical dtype (e.g., int)
np.save(file_name_int, np.array(numerical_data, dtype=int))
# Now loading works without allow_pickle=True because it's not an object array
loaded_int_array = np.load(file_name_int) # allow_pickle=False by default is fine
print("Loaded integer array (not an object array):")
print(loaded_int_array)
print(f"dtype: {loaded_int_array.dtype}")
if os.path.exists(file_name_int): os.remove(file_name_int)
Output:
Loaded integer array (not an object array):
[[1 2 3]
[4 5 6]]
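dtype: int64
The exact integer width depends on the platform and NumPy version (for example, int32 on Windows with NumPy versions before 2.0).
This is the safest approach if your data doesn't truly need to be an array of arbitrary Python objects.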
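If you already have an object array whose elements are really just numbers or strings, converting it with astype() before saving is one way to apply this approach. The following is a minimal sketch with made-up sample data; passing allow_pickle=False to np.save() acts as a guard that the converted array no longer needs pickling.

```python
import io
import numpy as np

# Arrays that ended up as dtype=object although their contents are uniform
numbers_as_objects = np.array([1, 2.5, 3], dtype=object)
words_as_objects = np.array(["apple", "kiwi"], dtype=object)

# Convert to native dtypes: float64 and a fixed-length Unicode string dtype
numbers = numbers_as_objects.astype(np.float64)
words = words_as_objects.astype(str)

# Saving with allow_pickle=False would raise if the array still held objects
buffer = io.BytesIO()
np.save(buffer, numbers, allow_pickle=False)

# The default np.load() now succeeds because no pickling was involved
buffer.seek(0)
restored = np.load(buffer)

print(f"numbers dtype: {numbers.dtype}, words dtype: {words.dtype}")
print(f"restored: {restored}")
```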
A Note on allow_pickle in numpy.save()
The numpy.save(file, arr, allow_pickle=True, ...) function also has an allow_pickle argument.
- It defaults to True. This means np.save() will use pickling if it needs to save an object array.
- If you set allow_pickle=False in np.save() and try to save an object array, np.save() itself will raise a ValueError because it cannot save the object array without pickling.
So, the allow_pickle in np.save() controls whether saving object arrays is permitted, while allow_pickle in np.load() controls whether loading them is permitted.
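The save-side behaviour can be sketched as follows, writing to an in-memory buffer so that no file is left behind; the sample dictionaries are illustrative only.

```python
import io
import numpy as np

# An array of dictionaries can only be stored as dtype=object
object_array = np.array([{"id": 1}, {"id": 2}], dtype=object)

try:
    # Forbidding pickle at save time makes NumPy refuse the object array
    np.save(io.BytesIO(), object_array, allow_pickle=False)
    save_refused = False
except ValueError as exc:
    save_refused = True
    print(f"Save refused: {exc}")
```

Passing allow_pickle=False when saving can therefore serve as an early check that the files you produce will load under the default np.load() settings.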
Legacy/Specific Case: Keras load_data() and Modifying np.load.__defaults__ (Use with Extreme Caution)
Older versions of some libraries (like Keras for its dataset loaders, e.g., keras.datasets.imdb.load_data()) might internally call np.load() without explicitly passing allow_pickle=True. If the data files they try to load contain object arrays (which some older dataset formats did), this could trigger the error.
A workaround seen in some older solutions involves temporarily modifying the default arguments of np.load(). This is a highly discouraged practice (monkey-patching) as it can have unintended side effects and makes code harder to understand and maintain.
# ⚠️ Highly Discouraged Monkey-Patching - For Illustration Only ⚠️
import numpy as np
from tensorflow import keras # Assuming keras is installed
# Store original defaults
original_np_load_defaults = np.load.__defaults__
# Temporarily change the default for allow_pickle. The tuple follows np.load's
# positional defaults (mmap_mode, allow_pickle, fix_imports, encoding); this
# layout may differ between NumPy versions, so check it before relying on it.
np.load.__defaults__ = (None, True, True, 'ASCII') # Setting allow_pickle default to True
# Example: Load Keras data that might use np.load internally
# This is a conceptual example; actual Keras data files might not be object arrays today.
try:
    # (x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data()
    print("Keras data loaded (conceptually, if it used object arrays and np.load internally without allow_pickle).")
except Exception as e:
    print(f"Error with Keras load_data example (this is conceptual): {e}")
finally:
    # ❗️ Crucially, restore original defaults, even if loading raised an error
    np.load.__defaults__ = original_np_load_defaults
    print("NumPy load defaults restored.")
It is much better to address the root cause (e.g., the library not passing allow_pickle=True when it should, or the data file format itself) or use the explicit np.load(..., allow_pickle=True) if you are calling np.load directly. Modifying library defaults is risky.
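If a temporary patch truly cannot be avoided, a somewhat less fragile variant is to wrap np.load in a context manager that restores the original function on exit and does not depend on the layout of the defaults tuple. This is still monkey-patching and carries the same trust caveats; np_load_allowing_pickle is a hypothetical helper name, and the sketch only affects code that looks up np.load on the numpy module at call time.

```python
import contextlib
import io
import numpy as np

@contextlib.contextmanager
def np_load_allowing_pickle():
    """Temporarily make np.load default to allow_pickle=True."""
    original_load = np.load

    def patched_load(*args, **kwargs):
        # allow_pickle is the third positional parameter of np.load, so only
        # inject the keyword when it was not already passed positionally
        if len(args) < 3:
            kwargs.setdefault("allow_pickle", True)
        return original_load(*args, **kwargs)

    np.load = patched_load
    try:
        yield
    finally:
        # Always restore the real function, even if loading raised an error
        np.load = original_load

load_before = np.load

buffer = io.BytesIO()
np.save(buffer, np.array([[1, 2], [3]], dtype=object))
buffer.seek(0)

with np_load_allowing_pickle():
    loaded = np.load(buffer)  # stands in for a library call that omits allow_pickle

print(f"loaded: {loaded}, np.load restored: {np.load is load_before}")
```

Keeping the patched region as small as possible limits the window in which untrusted files could be unpickled by accident.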
Alternative: Downgrading NumPy (Generally Not Recommended)
The allow_pickle=False default in np.load() was introduced for security reasons in NumPy 1.16.3. While downgrading NumPy to a version before this change (e.g., 1.16.2 or earlier; 1.16.1 was commonly suggested for the Keras issue) might make the error go away by changing the default, this is strongly discouraged. It exposes you to potential security risks and prevents you from using newer NumPy features and bug fixes.
Conclusion
The ValueError: Object arrays cannot be loaded when allow_pickle=False is a security feature in NumPy.
- The primary and recommended solution is to explicitly set allow_pickle=True in your numpy.load() call, but only if you trust the source of the .npy file: loaded_array = np.load('my_object_array.npy', allow_pickle=True)
- If your data does not need to be an array of arbitrary Python objects, consider saving and loading it with a specific, non-object numerical or string dtype. This avoids pickling altogether and is often more efficient.
- Avoid modifying np.load.__defaults__ unless in very specific, controlled legacy situations with no other recourse, and always restore it.
- Downgrading NumPy is not a recommended long-term solution.
By understanding why object arrays require pickling and using allow_pickle=True judiciously, you can safely and effectively work with NumPy object arrays.