Python NumPy: How to Fix "IndexError: arrays used as indices must be of integer (or boolean) type"
When performing "fancy indexing" or "boolean masking" in NumPy (that is, using one NumPy array or list to select elements from another NumPy array), you might encounter the IndexError: arrays used as indices must be of integer (or boolean) type. This error is a clear message from NumPy: the array you are using as an indexer (the one inside the square brackets []) must contain either all integer values (to specify positions) or all boolean values (to create a mask). If your indexing array contains floating-point numbers or other non-integer/non-boolean types, NumPy cannot interpret these as valid positional indices or a valid mask.
This guide explains why this IndexError occurs, demonstrates common scenarios with float-based index arrays, and provides robust solutions, primarily focusing on converting your indexing array to the correct integer or boolean dtype using astype() or by ensuring the correct dtype at initialization.
Understanding the Error: NumPy's Indexing Type Requirements
NumPy offers powerful ways to select elements from an array using another array as the indexer:
Integer Array Indexing (Fancy Indexing)
You can pass an array (or list) of integers to select elements at those specific integer positions.
import numpy as np
data_arr = np.array(['A', 'B', 'C', 'D', 'E'])
int_indices = np.array([0, 2, 4]) # Integer indices
selected_elements = data_arr[int_indices] # Selects elements at positions 0, 2, 4
print(selected_elements) # Output: ['A' 'C' 'E']
Output:
['A' 'C' 'E']
For this to work, int_indices must contain integer values.
Boolean Array Indexing (Masking)
You can pass a boolean array whose length matches the axis being indexed. Elements corresponding to True in the mask are selected.
import numpy as np
data_arr_mask = np.array([10, 20, 30, 40, 50])
bool_mask = np.array([True, False, True, False, True]) # Boolean mask
selected_by_mask = data_arr_mask[bool_mask] # Selects elements where mask is True
print(selected_by_mask) # Output: [10 30 50]
- For this, bool_mask must contain boolean values.
- The IndexError: arrays used as indices must be of integer (or boolean) type occurs if the array you use inside the [] for indexing contains values that are neither integers nor booleans (e.g., floats).
Reproducing the Error: Using a Non-Integer/Non-Boolean Array as an Index
This typically happens if your indexing array is inadvertently created with or converted to a floating-point dtype.
import numpy as np
main_data_array = np.array([
[10, 11, 12, 13], # Row 0
[20, 21, 22, 23], # Row 1
[30, 31, 32, 33], # Row 2
[40, 41, 42, 43] # Row 3
])
# Indexing array, but it contains floats (e.g., from a calculation or another source)
# Let's say we want to select rows using the first column of another array that happens to be float
potential_row_indices_float = np.array([
[0.0, 10.5, 20.3], # First column is 0.0
[1.0, 11.2, 22.8], # First column is 1.0 (could be integer if not for others)
[2.7, 12.1, 24.5] # First column is 2.7 (float)
])
# If we intended to use the first column of this as indices:
indices_to_use_raw = potential_row_indices_float[:, 0] # This will be [0.0, 1.0, 2.7]
print(f"Raw indexing array: {indices_to_use_raw}")
print(f"dtype of raw indexing array: {indices_to_use_raw.dtype}") # Output: float64
try:
    # ⛔️ Incorrect: Trying to index main_data_array using an array of floats
    selected_rows_error = main_data_array[indices_to_use_raw]
    print(selected_rows_error)
except IndexError as e:
    print(f"Error: {e}")
Output:
Raw indexing array: [0. 1. 2.7]
dtype of raw indexing array: float64
Error: arrays used as indices must be of integer (or boolean) type
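Even though 0.0 and 1.0 look like integers, they are stored as floats: indices_to_use_raw is sliced from a float64 array, so the whole array has dtype=float64 (when an array is created, a single non-integer value such as 2.7 is enough to force this). NumPy checks the dtype of the index array, not its individual values, and cannot use a float array for positional indexing.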
Verifying the dtype of the Indexing Array
Always check the dtype of your indexing array if you encounter this error:
print(your_indexing_array.dtype)
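The dtype check can also be done programmatically before indexing. The following is a minimal sketch, not part of NumPy itself: the helper name is_valid_index_dtype is hypothetical, and it relies only on np.issubdtype to test whether the indexer's dtype is an integer or boolean type.

```python
import numpy as np

def is_valid_index_dtype(indexer):
    """Return True if `indexer` has an integer or boolean dtype.

    Hypothetical helper for illustration; NumPy performs its own
    check internally when the array is used as an index.
    """
    dtype = np.asarray(indexer).dtype
    return bool(
        np.issubdtype(dtype, np.integer) or np.issubdtype(dtype, np.bool_)
    )

float_indexer = np.array([0.0, 1.0, 2.0])    # integer-looking values, float64 dtype
int_indexer = np.array([0, 1, 2])            # integer dtype
bool_indexer = np.array([True, False, True]) # boolean dtype

print(is_valid_index_dtype(float_indexer))  # False
print(is_valid_index_dtype(int_indexer))    # True
print(is_valid_index_dtype(bool_indexer))   # True
```

Note that the float array is rejected even though every value in it is a whole number: the dtype decides, not the values.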
Solution 1: Convert Indexing Array to Integer Type using astype(int) (Most Common)
If your indexing array contains numbers that are meant to be integer positions but are currently floats (perhaps due to calculations), convert them to integers using the ndarray.astype(int) method.
import numpy as np
# main_data_array and indices_to_use_raw defined as above
main_data_array = np.array([
[10, 11, 12, 13], # Row 0
[20, 21, 22, 23], # Row 1
[30, 31, 32, 33], # Row 2
[40, 41, 42, 43] # Row 3
])
potential_row_indices_float = np.array([
[0.0, 10.5, 20.3], # First column is 0.0
[1.0, 11.2, 22.8], # First column is 1.0 (could be integer if not for others)
[2.7, 12.1, 24.5] # First column is 2.7 (float)
])
indices_to_use_raw = potential_row_indices_float[:, 0] # This will be [0.0, 1.0, 2.7]
# ✅ Convert the float indexing array to integer type
integer_indices = indices_to_use_raw.astype(int)
print(f"Indexing array after astype(int): {integer_indices}") # Output: [0 1 2] (floats are truncated)
print(f"dtype of integer_indices: {integer_indices.dtype}") # Output: e.g., int64 or int32
# Now use the integer_indices for indexing
selected_rows_correct = main_data_array[integer_indices]
print("Selected rows using integer indices:")
print(selected_rows_correct)
Output:
Indexing array after astype(int): [0 1 2]
dtype of integer_indices: int64
Selected rows using integer indices:
[[10 11 12 13]
[20 21 22 23]
[30 31 32 33]]
Caution: astype(int) truncates floats toward zero (e.g., 2.7 becomes 2). Ensure this truncation is acceptable for your indexing logic. If you need rounding, use np.round(arr).astype(int).
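To make the difference between truncation and rounding concrete, here is a small sketch. The helper to_index_array is a hypothetical name introduced for illustration; it assumes you would rather get an error than silently lose a fractional part.

```python
import numpy as np

def to_index_array(values):
    """Convert whole-number floats to an integer index array.

    Hypothetical helper: raises ValueError instead of silently
    truncating values such as 2.7.
    """
    arr = np.asarray(values, dtype=float)
    if not np.all(np.isfinite(arr)):
        raise ValueError("index values must be finite")
    if not np.all(arr == np.floor(arr)):
        raise ValueError("index values have fractional parts; round or truncate explicitly")
    return arr.astype(int)

raw = np.array([0.0, 1.0, 2.7])

print(raw.astype(int))                  # truncation toward zero: [0 1 2]
print(np.round(raw).astype(int))        # rounding to nearest:    [0 1 3]
print(to_index_array([0.0, 1.0, 2.0]))  # whole numbers pass:     [0 1 2]

try:
    to_index_array(raw)                 # 2.7 has a fractional part
except ValueError as e:
    print(f"Rejected: {e}")
```

Keep in mind that np.round rounds exact halves to the nearest even value, so np.round(2.5) gives 2.0, not 3.0.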
Solution 2: Convert Indexing Array to Boolean Type using astype(bool)
If your indexing array's values are intended to act as a boolean mask (where non-zero typically means True and zero means False), you can convert it to bool type.
How Numeric Values Convert to Booleans
When converting a numeric array to boolean using astype(bool):
- 0 (of any numeric type, like 0 or 0.0) becomes False.
- All other non-zero numbers become True.
- np.nan converts to True because it is not equal to zero (this can be surprising, so be careful with NaNs if using this method).
import numpy as np
# main_data_array defined as above
main_data_array = np.array([
[10, 11, 12, 13], # Row 0
[20, 21, 22, 23], # Row 1
[30, 31, 32, 33], # Row 2
[40, 41, 42, 43] # Row 3
])
# Example indexing array where 0 means "don't select" and non-zero means "select"
numeric_mask_like = np.array([1.0, 0.0, 0.0, 5.5]) # Intend to select rows 0 and 3
# Ensure its length matches the axis being indexed (e.g., number of rows in main_data_array)
if len(numeric_mask_like) == main_data_array.shape[0]:  # Check length
    # ✅ Convert to boolean type
    boolean_mask_indices = numeric_mask_like.astype(bool)
    print(f"Indexing array after astype(bool): {boolean_mask_indices}")
    print(f"dtype of boolean_mask_indices: {boolean_mask_indices.dtype}\n")
    selected_rows_bool_mask = main_data_array[boolean_mask_indices]
    print("Selected rows using boolean mask from numeric array:")
    print(selected_rows_bool_mask)
else:
    print("Length of numeric_mask_like does not match number of rows in main_data_array for boolean masking.")
Output:
Indexing array after astype(bool): [ True False False True]
dtype of boolean_mask_indices: bool
Selected rows using boolean mask from numeric array:
[[10 11 12 13]
[40 41 42 43]]
This solution is less common for direct indexing values and more for when an array conceptually represents a mask but isn't yet boolean. Direct creation of boolean masks (e.g., main_data_array[:, 0] > 20) is usually preferred.
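As an illustration of both points, the sketch below builds a mask directly from a comparison and then shows how NaN behaves under astype(bool). The flags array and the choice to treat NaN as "not selected" are assumptions made for this example only.

```python
import numpy as np

main_data_array = np.array([
    [10, 11, 12, 13],  # Row 0
    [20, 21, 22, 23],  # Row 1
    [30, 31, 32, 33],  # Row 2
    [40, 41, 42, 43],  # Row 3
])

# A comparison produces a boolean array directly, so no conversion is needed.
row_mask = main_data_array[:, 0] > 20
print(row_mask.dtype)             # bool
print(main_data_array[row_mask])  # rows 2 and 3

# NaN is non-zero, so astype(bool) turns it into True.
flags = np.array([1.0, 0.0, np.nan, 0.0])
print(flags.astype(bool))         # [ True False  True False]

# Assumption for this example: NaN should mean "do not select".
safe_mask = (flags != 0) & ~np.isnan(flags)
print(safe_mask)                  # [ True False False False]
print(main_data_array[safe_mask]) # row 0 only
```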
Solution 3: Specify Integer dtype During Indexing Array Creation
If you are creating the indexing array from data that might be interpreted as float, explicitly set its dtype to an integer type during creation if appropriate.
import numpy as np
# main_data_array defined as above
main_data_array = np.array([
[10, 11, 12, 13], # Row 0
[20, 21, 22, 23], # Row 1
[30, 31, 32, 33], # Row 2
[40, 41, 42, 43] # Row 3
])
# Data for indices that might otherwise become float
potential_float_indices_data = [0, 1.0, 2] # Mixing int and float could lead to float array
# ✅ Create indexing array with explicit integer dtype
integer_indices_at_creation = np.array(potential_float_indices_data, dtype=int)
print(f"Indexing array created with dtype=int: {integer_indices_at_creation}") # Output: [0 1 2]
print(f"dtype: {integer_indices_at_creation.dtype}\n") # Output: e.g., int64
selected_rows_dtype_init = main_data_array[integer_indices_at_creation]
print("Selected rows using dtype=int at creation:")
print(selected_rows_dtype_init)
Output:
Indexing array created with dtype=int: [0 1 2]
dtype: int64
Selected rows using dtype=int at creation:
[[10 11 12 13]
[20 21 22 23]
[30 31 32 33]]
This preempts the issue by ensuring the indexing array is of the correct integer type from the start.
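The same idea applies to array constructors that return floats by default. The short sketch below uses np.linspace and np.zeros as examples of such constructors; the variable names are made up for illustration.

```python
import numpy as np

data = np.array(['A', 'B', 'C', 'D', 'E'])

# np.linspace returns float64 by default, even for whole-number results.
float_positions = np.linspace(0, 4, 3)
print(float_positions.dtype)  # float64

# Passing dtype=int at creation yields an array usable as an index.
int_positions = np.linspace(0, 4, 3, dtype=int)
print(data[int_positions])    # ['A' 'C' 'E']

# np.zeros also defaults to float64; request an integer dtype explicitly.
zero_positions = np.zeros(2, dtype=int)
print(data[zero_positions])   # ['A' 'A']

try:
    data[float_positions]     # float indexer: raises IndexError
except IndexError as e:
    print(f"Error: {e}")
```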
Key Takeaway: Match Indexer dtype to Indexing Method
The IndexError: arrays used as indices must be of integer (or boolean) type is NumPy's way of enforcing strict type requirements for its advanced indexing mechanisms.
- For fancy indexing (selecting specific elements/rows/columns by position), the indexing array must contain integers.
- For boolean masking, the indexing array must contain booleans.
Conclusion
The NumPy IndexError regarding non-integer or non-boolean index arrays is a common issue when the data type of your indexing array is incorrect for the type of indexing you're attempting. To resolve it:
- Verify dtype: Always check your_indexing_array.dtype.
- Convert to Integer: If you intend to select by position and your indexing array contains floats (or numbers that can be safely truncated/rounded to integers), use your_indexing_array.astype(int).
- Convert to Boolean: If your indexing array represents a condition and should be a mask, convert it using your_indexing_array.astype(bool) (being mindful of how numbers and NaN convert to booleans).
- Specify dtype at Creation: When creating an array intended for integer indexing, ensure it's created with an integer dtype from the outset if the source data might lead to float inference (e.g., np.array(data, dtype=int)).
By ensuring your indexing arrays have the appropriate integer or boolean data type, you can leverage NumPy's powerful indexing capabilities without encountering this IndexError.