Python NumPy: How to Calculate the Range of Array Elements (Peak-to-Peak)

The "range" of a dataset, defined as the difference between its maximum and minimum values, is a basic statistical measure of how spread out the values are. In NumPy, you can compute this range for an entire array or along a specific axis (per row or per column).

This guide shows how to find the range of elements in a NumPy array using the dedicated numpy.ptp() (peak-to-peak) function. It also covers how to get the same result manually with numpy.max() and numpy.min(), and how to handle NaN (Not a Number) values correctly during range calculation using numpy.nanmax() and numpy.nanmin().

Understanding the "Range" in a NumPy Array

In the context of numerical data, the range is a simple measure of dispersion. It's calculated as: Range = Maximum Value - Minimum Value
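
As a quick illustration of the formula, here is a minimal sketch on a small one-dimensional array. The values and the name `temps` are made up for this example.

```python
import numpy as np

# Hypothetical sample values, used only to illustrate the formula
temps = np.array([12, 7, 19, 4, 15])

# Range = Maximum Value - Minimum Value
value_range = temps.max() - temps.min()
print(value_range)  # 19 - 4 = 15
```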

For a 2D NumPy array, you might want to find:

  • The overall range of all elements in the array.
  • The range of values within each row.
  • The range of values within each column.

Let's define a sample 2D NumPy array:

import numpy as np

data_array = np.array([
    [10, 2, 15, 8],   # Row 0
    [5, 18, 3, 12],   # Row 1
    [20, 1, 9, 16]    # Row 2
])
print("Original 2D NumPy Array:")
print(data_array)

Output:

Original 2D NumPy Array:
[[10  2 15  8]
 [ 5 18  3 12]
 [20  1  9 16]]

Method 1: Using numpy.ptp()

The numpy.ptp(a, axis=None, out=None, keepdims=<no value>) function directly calculates the range of values (maximum - minimum) along a specified axis. The name ptp stands for "peak to peak." This is the most concise and idiomatic way to find the range in NumPy.

Calculating Range for the Entire Array (Flattened)

If axis=None (the default), np.ptp() flattens the array and computes the range of all its elements.

import numpy as np

# data_array defined as above
data_array = np.array([
    [10, 2, 15, 8],   # Row 0
    [5, 18, 3, 12],   # Row 1
    [20, 1, 9, 16]    # Row 2
])

# Calculate overall range of all elements
overall_range = np.ptp(data_array)

# Max element is 20, Min element is 1. Range = 20 - 1 = 19.
print(f"Overall range of the array: {overall_range}")

Output:

Overall range of the array: 19

Calculating Range Along Rows (axis=1)

To find the range for each row, set axis=1.

import numpy as np

# data_array defined as above
data_array = np.array([
    [10, 2, 15, 8],   # Row 0
    [5, 18, 3, 12],   # Row 1
    [20, 1, 9, 16]    # Row 2
])

# Calculate the range for each row
range_per_row = np.ptp(data_array, axis=1)
print("Range for each row (axis=1):")
print(range_per_row)

Output:

Range for each row (axis=1):
[13 15 19]

Calculating Range Along Columns (axis=0)

To find the range for each column, set axis=0.

import numpy as np

# data_array defined as above
data_array = np.array([
    [10, 2, 15, 8],   # Row 0
    [5, 18, 3, 12],   # Row 1
    [20, 1, 9, 16]    # Row 2
])

# Calculate the range for each column
range_per_column = np.ptp(data_array, axis=0)
print("Range for each column (axis=0):")
print(range_per_column)

Output:

Range for each column (axis=0):
[15 17 12  8]
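
The keepdims parameter listed in the np.ptp() signature is not used in the examples above. As a hedged sketch of where it can help: passing keepdims=True keeps the reduced axis with length 1, so the result broadcasts against the original array. The min-max scaling step below is only an illustrative use and is not part of the original walkthrough.

```python
import numpy as np

data_array = np.array([
    [10, 2, 15, 8],
    [5, 18, 3, 12],
    [20, 1, 9, 16]
])

# keepdims=True keeps the reduced axis, giving shape (3, 1) instead of (3,)
row_range = np.ptp(data_array, axis=1, keepdims=True)
print(row_range.shape)  # (3, 1)

# Illustrative use: scale each row to the interval [0, 1]
row_min = data_array.min(axis=1, keepdims=True)
scaled = (data_array - row_min) / row_range
print(scaled.min(axis=1), scaled.max(axis=1))
```

Without keepdims=True, the per-row result has shape (3,) and would not line up with the rows of a (3, 4) array during broadcasting.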

Method 2: Manually Calculating Range with numpy.max() and numpy.min()

You can also calculate the range manually by finding the maximum and minimum values along the desired axis and then subtracting them. This achieves the same result as np.ptp().

import numpy as np

# data_array defined as above
data_array = np.array([
    [10, 2, 15, 8],   # Row 0
    [5, 18, 3, 12],   # Row 1
    [20, 1, 9, 16]    # Row 2
])


def calculate_range_manual(arr, axis_val=None):
    # Maximum and minimum along the requested axis (None = whole array)
    max_values = np.max(arr, axis=axis_val)
    min_values = np.min(arr, axis=axis_val)
    return max_values - min_values

# Overall range
overall_range_manual = calculate_range_manual(data_array) # axis_val defaults to None
print(f"Overall range (manual): {overall_range_manual}\n")

# Range per row
range_per_row_manual = calculate_range_manual(data_array, axis_val=1)
print(f"Range per row (manual): {range_per_row_manual}\n")

# Range per column
range_per_column_manual = calculate_range_manual(data_array, axis_val=0)
print(f"Range per column (manual): {range_per_column_manual}")

Output:

Overall range (manual): 19

Range per row (manual): [13 15 19]

Range per column (manual): [15 17 12  8]

Note: While this works, np.ptp() is more direct and concise for this specific task.
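
To confirm that the two approaches agree, a short check along each axis can be sketched as follows. This is a sanity check on the sample array only, not a general proof.

```python
import numpy as np

data_array = np.array([
    [10, 2, 15, 8],
    [5, 18, 3, 12],
    [20, 1, 9, 16]
])

# Compare np.ptp() with max - min for the whole array, columns, and rows
for axis in (None, 0, 1):
    manual = np.max(data_array, axis=axis) - np.min(data_array, axis=axis)
    assert np.array_equal(np.ptp(data_array, axis=axis), manual)

print("np.ptp() matches max - min on every axis")
```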

Handling NaN Values When Calculating Range

NaN (Not a Number) values represent missing or undefined data. The standard np.ptp(), np.max(), and np.min() functions propagate NaNs: if any value along the axis is NaN, the result for that axis is NaN.

The Issue: np.max(), np.min(), and np.ptp() with NaNs

import numpy as np

array_with_nan = np.array([
    [5.0, 1.0, 10.0, np.nan],
    [np.nan, 2.0, 6.0, 8.0],
    [8.0, np.nan, 4.0, 3.0]
])
print("Array with NaN values:")
print(array_with_nan)
print()

# ptp returns NaN for every row/column that contains at least one NaN
print(f"np.ptp(array_with_nan, axis=1): {np.ptp(array_with_nan, axis=1)}\n")
print(f"np.ptp(array_with_nan, axis=0): {np.ptp(array_with_nan, axis=0)}")

Output:

Array with NaN values:
[[ 5.  1. 10. nan]
 [nan  2.  6.  8.]
 [ 8. nan  4.  3.]]

np.ptp(array_with_nan, axis=1): [nan nan nan]

np.ptp(array_with_nan, axis=0): [nan nan  6. nan]

Solution: Using numpy.nanmax() and numpy.nanmin()

To calculate the range while ignoring NaN values, use numpy.nanmax() and numpy.nanmin(). These functions compute the maximum and minimum, respectively, as if NaNs were not present. There isn't a direct np.nanptp(), so you combine these two.

import numpy as np

# array_with_nan defined as above
array_with_nan = np.array([
    [5.0, 1.0, 10.0, np.nan],
    [np.nan, 2.0, 6.0, 8.0],
    [8.0, np.nan, 4.0, 3.0]
])

def calculate_nan_safe_range(arr, axis_val=None):
    # Calculate max ignoring NaNs
    max_val_no_nan = np.nanmax(arr, axis=axis_val)
    # Calculate min ignoring NaNs
    min_val_no_nan = np.nanmin(arr, axis=axis_val)
    return max_val_no_nan - min_val_no_nan

# Overall NaN-safe range
overall_nan_range = calculate_nan_safe_range(array_with_nan)
# For array_with_nan: nanmax is 10.0, nanmin is 1.0. Range = 9.0
print(f"Overall NaN-safe range: {overall_nan_range}\n")

# NaN-safe range per row
nan_range_per_row = calculate_nan_safe_range(array_with_nan, axis_val=1)
print(f"NaN-safe range per row: {nan_range_per_row}\n")

# NaN-safe range per column
nan_range_per_column = calculate_nan_safe_range(array_with_nan, axis_val=0)
print(f"NaN-safe range per column: {nan_range_per_column}")

Output:

Overall NaN-safe range: 9.0

NaN-safe range per row: [9. 6. 5.]

NaN-safe range per column: [3. 1. 6. 5.]

Caution: If an entire slice (row or column) consists only of NaN values, np.nanmax() and np.nanmin() emit a RuntimeWarning ("All-NaN slice encountered") and return NaN for that slice, so the range for that slice is also NaN.
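
If all-NaN slices are expected in your data and a NaN result is acceptable for them, one option is to suppress the warning locally. The sketch below assumes that behavior is what you want; the helper name `nan_safe_range_quiet` and the sample array are hypothetical.

```python
import warnings
import numpy as np

# Hypothetical array whose last column is entirely NaN
all_nan_column = np.array([
    [5.0, 1.0, np.nan],
    [2.0, 6.0, np.nan],
])

def nan_safe_range_quiet(arr, axis=None):
    """Range ignoring NaNs; all-NaN slices yield NaN without a warning."""
    with warnings.catch_warnings():
        # Suppress RuntimeWarning inside this block, which includes
        # the "All-NaN slice encountered" warning
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmax(arr, axis=axis) - np.nanmin(arr, axis=axis)

col_range = nan_safe_range_quiet(all_nan_column, axis=0)
print(col_range)  # columns 0 and 1 give 3.0 and 5.0; column 2 stays NaN
```

Because the filter ignores every RuntimeWarning raised inside the with block, keep the block as small as possible so unrelated warnings are not hidden.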

Conclusion

Calculating the range of values is a straightforward way to understand data dispersion in NumPy arrays.

  • The numpy.ptp() function is the most direct and recommended method for finding the range (max - min) along any axis of an array.
  • Alternatively, you can manually compute the range using numpy.max() and numpy.min().
  • When dealing with arrays that may contain NaN values, it's crucial to use numpy.nanmax() and numpy.nanmin() to ensure NaNs are ignored in the calculation, providing a meaningful range based on the available numeric data.

These methods provide flexible and efficient ways to determine the "peak-to-peak" range in your NumPy arrays, aiding in data exploration and preprocessing.