Python NumPy: How to Find Indices of the N Largest Values in an Array

Identifying the largest values in a dataset and, crucially, their original positions (indices) is a common requirement in many data analysis and machine learning tasks. For instance, you might want to find the top N performing products, the indices of pixels with the highest intensity, or features with the greatest importance. NumPy, with its efficient array operations, provides several methods to achieve this.

This guide shows how to get the indices of the N largest values in a NumPy array. It covers two NumPy approaches, numpy.argpartition() for efficient partial sorting and numpy.argsort() for full sorting, plus an alternative based on the heapq.nlargest() function from Python's standard library.

The Goal: Identifying Positions of Top Values

Given a NumPy array, we want to find the indices of its N largest elements. The order of these N indices themselves might or might not matter, depending on the application.

Let's define a sample 1D NumPy array:

import numpy as np

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95]) # Contains duplicates
print(f"Original array: {data_array}")
# Corresponding indices: [ 0, 1, 2, 3, 4, 5, 6, 7]

Output:

Original array: [15 80  5 95 30 60  5 95]

In the next sections, we find the indices of the N largest values in this array. For example, if N=3, we want the indices corresponding to the values 95, 95, and 80.

Method 1: Using numpy.argpartition()

numpy.argpartition(a, kth, axis=-1, kind='introselect', order=None) performs an indirect partial sort: it returns an array of indices and leaves the data untouched. In the returned array, position kth holds the index of the element that a full sort would place at position kth. Every index before that position points to a value smaller than or equal to that element, and every index after it points to a value greater than or equal to it. Neither group is sorted internally.
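The partition property can be checked directly. The following is a minimal sketch; the variable names and the choice kth = 3 are illustrative assumptions, not anything prescribed by NumPy.

```python
import numpy as np

values = np.array([15, 80, 5, 95, 30, 60, 5, 95])
kth = 3  # illustrative choice of pivot position

# argpartition returns indices; it does not modify `values`.
idx = np.argpartition(values, kth)
partitioned = values[idx]

pivot = partitioned[kth]
# The pivot is the value a full sort would place at position kth.
assert pivot == np.sort(values)[kth]
# Everything before the pivot is <= it; everything from it onward is >= it.
assert np.all(partitioned[:kth] <= pivot)
assert np.all(partitioned[kth:] >= pivot)

print("pivot value:", pivot)  # pivot value: 30
```

The order of the elements on either side of the pivot is not specified, which is why the checks compare against the pivot rather than against a fixed index list.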

How argpartition() Helps Find Largest Values

To find the N largest values, we partition around the Nth largest element, which sits at position len(array) - N of the fully sorted order. A negative kth counts from the end, so we can pass -N directly. After indices_partitioned = np.argpartition(array, -N), the last N elements of indices_partitioned are the indices of the N largest values in array. This finds the top N without sorting the whole array. However, these N indices are in no particular order relative to the values they point to.
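To make the negative kth convention concrete, here is a small sketch showing that -N and len(array) - N select the same top values. The variable names are illustrative assumptions.

```python
import numpy as np

values = np.array([15, 80, 5, 95, 30, 60, 5, 95])
N = 3

# A negative kth counts from the end of the sorted order.
from_end = np.argpartition(values, -N)[-N:]
from_start = np.argpartition(values, len(values) - N)[-N:]

# The order inside each slice is not guaranteed, so compare sorted values.
top_from_end = np.sort(values[from_end])
top_from_start = np.sort(values[from_start])

print(top_from_end)    # [80 95 95]
print(top_from_start)  # [80 95 95]
```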

Getting Indices of N Largest Values (and Sorting Them)

Let's break it down:

  1. Use np.argpartition(array, -N) to get a partially sorted array of indices.
  2. Take the last N indices from this result – these point to the N largest values in the original array.
  3. (Optional but often desired) If you want these N indices to be ordered (e.g., index of the absolute largest first, then second largest, etc.), perform a second small sort only on these N indices based on their corresponding values in the original array.

Example:

import numpy as np

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
# Original indices: [ 0, 1, 2, 3, 4, 5, 6, 7]
N = 3 # We want the indices of the 3 largest values

# Step 1: Get indices such that the N largest values are "at the end"
# The actual indices returned will be shuffled, but the last N will point to the N largest values
partitioned_indices = np.argpartition(data_array, -N)
print(f"Indices after argpartition(data_array, -{N}): {partitioned_indices}")
# Example Output: [2 6 4 0 5 1 3 7] (exact order may vary; the last 3 indices point to 80, 95, 95)

# Step 2: Extract the indices of these N largest values
indices_of_top_N_unsorted = partitioned_indices[-N:]
print(f"Indices of the {N} largest values (unsorted among themselves): {indices_of_top_N_unsorted}")
# Example Output: [1 3 7] (These point to values [80 95 95])

# Step 3 (Optional): Sort these N indices based on their actual values in descending order
# To do this, we get the values these indices point to:
values_at_these_indices = data_array[indices_of_top_N_unsorted] # e.g., [80, 95, 95]
# Now find the order to sort these values_at_these_indices in descending order:
order_for_sorting_top_N = np.argsort(-values_at_these_indices) # e.g., [1, 2, 0] or [2, 1, 0] for [80,95,95] -> [95,95,80]
# Use this order to sort indices_of_top_N_unsorted:
indices_of_top_N_sorted = indices_of_top_N_unsorted[order_for_sorting_top_N]

print(f"The {N} largest values themselves are: {data_array[indices_of_top_N_sorted]}")
# Example Output: [95 95 80]
print(f"Indices of the {N} largest values (sorted by value, largest first): {indices_of_top_N_sorted}")
# Example Output: [3 7 1] (or [7 3 1], corresponding to values 95, 95, 80)

Output:

Indices after argpartition(data_array, -3): [2 6 4 0 5 1 3 7]
Indices of the 3 largest values (unsorted among themselves): [1 3 7]
The 3 largest values themselves are: [95 95 80]
Indices of the 3 largest values (sorted by value, largest first): [3 7 1]

Simplified Approach if Order of Top N Indices Doesn't Matter

If you only need the indices of the N largest values and their specific order (from largest to Nth largest) doesn't matter, you can stop after Step 2:

import numpy as np
data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
N = 3

indices_of_top_N = np.argpartition(data_array, -N)[-N:]
print(f"Indices of the {N} largest values (order among these N is not guaranteed): {indices_of_top_N}")
print(f"Values at these indices: {data_array[indices_of_top_N]}")

Output:

Indices of the 3 largest values (order among these N is not guaranteed): [1 3 7]
Values at these indices: [80 95 95]

Note: argpartition is typically much faster than a full sort (argsort) for finding the N largest (or smallest) elements, especially when N is much smaller than the total size of the array, because it only partitions the data instead of ordering every element.
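A rough way to see the difference on your own machine is a timeit comparison. This is only a sketch: the array size, seed, N, and helper names are arbitrary assumptions, and the timings will vary with hardware and NumPy version.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)   # fixed seed, arbitrary choice
big = rng.random(500_000)        # array size chosen for illustration
N = 10

def top_n_argpartition(arr, n):
    # Partition first, then order only the n selected indices (largest first).
    idx = np.argpartition(arr, -n)[-n:]
    return idx[np.argsort(arr[idx])[::-1]]

def top_n_argsort(arr, n):
    # Full ascending sort, take the last n, reverse to get largest first.
    return np.argsort(arr)[-n:][::-1]

# Both approaches select the same values.
assert np.array_equal(big[top_n_argpartition(big, N)], big[top_n_argsort(big, N)])

t_part = timeit.timeit(lambda: top_n_argpartition(big, N), number=5)
t_sort = timeit.timeit(lambda: top_n_argsort(big, N), number=5)
print(f"argpartition: {t_part:.4f} s, argsort: {t_sort:.4f} s (5 runs each)")
```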

Method 2: Using numpy.argsort()

numpy.argsort() returns the indices that would sort an array in ascending order. We can adapt this for descending order.

Full Sort then Slice (Less Efficient for just Top N)

Get all indices for an ascending sort, then take the last N (which correspond to the largest N values), and then reverse that slice if you want the index of the absolute largest first.

import numpy as np

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
N = 3

# Get all indices that would sort the array ascending
all_sorted_indices_asc = data_array.argsort()
# Example: [2 6 0 4 5 1 3 7] (indices for 5, 5, 15, 30, 60, 80, 95, 95)

# The last N indices correspond to the N largest values (in ascending order of value)
indices_N_largest_asc_order = all_sorted_indices_asc[-N:]
# Example: [1 3 7] (indices for 80, 95, 95)

# Reverse this slice to get indices from largest to Nth largest
indices_N_largest_desc_order_argsort = indices_N_largest_asc_order[::-1]
# Or concisely: indices_N_largest_desc_order_argsort = data_array.argsort()[-N:][::-1]

print(f"Indices of the {N} largest values (via argsort & slice, largest first): {indices_N_largest_desc_order_argsort}")
print(f"Values at these indices: {data_array[indices_N_largest_desc_order_argsort]}")

Output:

Indices of the 3 largest values (via argsort & slice, largest first): [7 3 1]
Values at these indices: [95 95 80]

This method performs a full sort, which costs O(n log n) for an array of length n. argpartition runs in O(n), so it is the more efficient choice when you only need the top N.

Argsorting a Negated Array (More Direct for Top N with argsort)

Sort the negation of the array. The first N indices of this sort will correspond to the N largest values of the original array.

import numpy as np

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
N = 3

# Argsort the negated array. The first N indices are what we want.
indices_N_largest_neg_argsort = (-data_array).argsort()[:N]
# Or: indices_N_largest_neg_argsort = np.argsort(-data_array)[:N]

print(f"Indices of the {N} largest values (via negated argsort, largest first): {indices_N_largest_neg_argsort}")
# Example Output: [3 7 1] (or [7 3 1])
print(f"Values at these indices: {data_array[indices_N_largest_neg_argsort]}")
# Example Output: [95 95 80]

Output:

Indices of the 3 largest values (via negated argsort, largest first): [3 7 1]
Values at these indices: [95 95 80]

With argsort, this is more direct than the sort, slice, and reverse sequence, but it still performs a full sort of the array.
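One caveat worth keeping in mind: negation assumes a signed numeric dtype. With unsigned integers, -arr wraps around instead of producing negative numbers, so the ordering is no longer simply reversed. The sketch below shows the wrap-around and one possible workaround (reversing an ascending argsort); the sample array is a hypothetical one chosen for illustration.

```python
import numpy as np

counts = np.array([0, 200, 50], dtype=np.uint8)  # hypothetical unsigned data
N = 1

# Negating uint8 values wraps around modulo 256, so the negated array
# does not sort in the reverse order of the original.
wrapped = -counts
print(wrapped)

# Workaround sketch: sort ascending, then reverse. No negation involved.
safe_top = np.argsort(counts, kind="stable")[::-1][:N]
print(safe_top)          # [1]
print(counts[safe_top])  # [200]
```

Note that reversing an ascending sort lists tied values in the opposite order from the negated-array approach, which only matters if tie order is significant for your application.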

Method 3: Using heapq.nlargest() with Indices

The heapq module in Python's standard library has an nlargest function. We can use it by iterating over the range of indices and using the array's values as the key for comparison.

import numpy as np
from heapq import nlargest

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
N = 3

# Get indices of N largest values using heapq.nlargest
# We iterate over range(len(data_array)) which are the indices.
# The `key=data_array.__getitem__` (or `key=lambda idx: data_array[idx]`)
# tells nlargest to use the values from data_array at those indices for comparison.
indices_N_largest_heapq = nlargest(N, range(len(data_array)), key=data_array.__getitem__)

print(f"Indices of the {N} largest values (via heapq.nlargest, largest first): {indices_N_largest_heapq}")
print(f"Values at these indices: {data_array[indices_N_largest_heapq]}")

Output:

Indices of the 3 largest values (via heapq.nlargest, largest first): [3, 7, 1]
Values at these indices: [95 95 80]
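
Note: heapq.nlargest() returns a plain Python list of indices rather than a NumPy array. It can be efficient when N is small relative to the array size, but it calls the key function once per element in Python, so on large arrays it is usually slower than the vectorized NumPy methods.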
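If you want the indices and the values together, a variation is to feed nlargest the (index, value) pairs from enumerate. This is a sketch of that idea rather than part of the approach above; the names pairs, top_indices, and top_values are illustrative.

```python
from heapq import nlargest

import numpy as np

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
N = 3

# enumerate yields (index, value) pairs; tolist() converts to plain Python ints.
pairs = nlargest(N, enumerate(data_array.tolist()), key=lambda pair: pair[1])
print(pairs)  # [(3, 95), (7, 95), (1, 80)]

# Split the pairs back into separate index and value lists if needed.
top_indices = [i for i, _ in pairs]
top_values = [v for _, v in pairs]
```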

Creating a Reusable Function

Let's make a reusable function using the efficient argpartition method, ensuring the returned N indices are also sorted by the value they point to (largest first).

import numpy as np

data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])

def get_indices_of_N_largest(arr, n):
    """
    Returns the indices of the N largest values in a NumPy array,
    sorted such that the index of the absolute largest value comes first.
    """
    if n <= 0:
        return np.array([], dtype=int)
    if n >= len(arr):  # If N is total length or more, return all indices sorted descending
        return (-arr).argsort()

    # Get indices of the N largest values (these indices are not sorted by value)
    indices_top_n_unsorted = np.argpartition(arr, -n)[-n:]

    # Get the actual values at these unsorted top N indices
    values_at_indices = arr[indices_top_n_unsorted]

    # Sort these top N values in descending order and get their *relative* indices
    order_within_top_n = np.argsort(-values_at_indices)

    # Use these relative indices to sort the absolute indices
    sorted_indices_top_n = indices_top_n_unsorted[order_within_top_n]

    return sorted_indices_top_n

# Example usage:
# data_array = np.array([15, 80, 5, 95, 30, 60, 5, 95])
print(f"--- Using Reusable Function (argpartition based) ---")
print(f"Indices of top 3: {get_indices_of_N_largest(data_array, 3)}")
print(f"Indices of top 1: {get_indices_of_N_largest(data_array, 1)}")
print(f"Indices of top 0: {get_indices_of_N_largest(data_array, 0)}")
print(f"Indices of top all (8): {get_indices_of_N_largest(data_array, 8)}")

Output:

--- Using Reusable Function (argpartition based) ---
Indices of top 3: [3 7 1]
Indices of top 1: [7]
Indices of top 0: []
Indices of top all (8): [3 7 1 5 4 0 2 6]
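A quick way to gain confidence in this kind of helper is to compare it against a plain full sort on random inputs. The sketch below uses a compact restatement of the same argpartition-then-sort logic; the name top_n_indices, the seed, and the value ranges are assumptions made for the test.

```python
import numpy as np

def top_n_indices(arr, n):
    """Compact variant of the argpartition-then-sort logic (hypothetical name)."""
    n = max(0, min(n, len(arr)))  # clamp n to the valid range
    if n == 0:
        return np.array([], dtype=int)
    idx = np.argpartition(arr, -n)[-n:]
    return idx[np.argsort(-arr[idx])]  # order the n indices, largest value first

rng = np.random.default_rng(42)  # arbitrary seed
for _ in range(100):
    arr = rng.integers(-50, 50, size=int(rng.integers(1, 30)))
    n = int(rng.integers(0, 35))
    got = arr[top_n_indices(arr, n)]
    expected = np.sort(arr)[::-1][:n]  # reference: full descending sort
    assert np.array_equal(got, expected)

print("all checks passed")
```

The check compares values rather than indices because tied values can legitimately be reported in either index order.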

Conclusion

Finding the indices of the N largest values in a NumPy array can be done efficiently using several approaches:

  1. np.argpartition(arr, -N)[-N:]: This is generally the most efficient for large arrays when you only need the top N elements/indices without full sorting. You might need an additional sort on this small subset if you need the top N indices themselves ordered by value.
  2. (-arr).argsort()[:N]: Uses full sorting of a negated array (for numeric types). Simple and direct if a full sort is acceptable or N is close to the array length.
  3. arr.argsort()[-N:][::-1]: Full ascending sort, then take last N, then reverse. Conceptually clear but less efficient than argpartition for small N.
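  4. heapq.nlargest(N, range(len(arr)), key=arr.__getitem__): A Python standard library approach that returns a plain list of indices, largest first. It is reasonable for small N, though on large arrays the NumPy methods are usually faster.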

The best choice depends on the size of your array, the value of N, and whether you need the N indices themselves to be sorted by the magnitude of the values they point to.