Python NumPy: How to Apply a Mask from One Array to Another

In data analysis with NumPy, a common task is to filter or select elements from an array based on conditions derived from another array. This often involves creating a boolean mask from one array (e.g., based on a threshold) and then applying the same mask to a second, corresponding array to select or hide its elements at the same positions. NumPy's masked array module (numpy.ma) provides convenient tools for these operations.

This guide will comprehensively demonstrate how to create a mask from one NumPy array based on a condition and then apply that exact mask to another array, effectively linking their filtering. We'll primarily use numpy.ma.masked_where() and numpy.ma.getmask() for this, covering both 1D and 2D array scenarios.

Understanding Masked Arrays in NumPy

NumPy's numpy.ma module allows for the creation of "masked arrays." A masked array is a standard NumPy array combined with a boolean mask of the same shape. Where the mask is True, the corresponding element in the data array is considered "masked" or invalid (e.g., hidden from calculations, often displayed as --). Where the mask is False, the data element is considered valid.
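As a minimal sketch of this idea, the snippet below builds a masked array directly from a data array and a hand-written boolean mask (the values simply mirror the sample arrays used later). It then checks that reductions such as sum() and mean() skip the masked entries while the underlying data stays unchanged.

```python
import numpy as np

# Plain data plus a boolean mask of the same shape (True means "masked/invalid")
data = np.array([10, 5, 25, 8, 12, 30])
mask = np.array([False, False, True, False, True, True])

readings = np.ma.masked_array(data, mask=mask)

print(readings)         # masked positions display as --
print(readings.data)    # the underlying data itself is unchanged
print(readings.sum())   # reductions skip masked entries: 10 + 5 + 8
print(readings.mean())  # mean over the three unmasked values only
```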

The Goal: Synchronized Masking Across Arrays

Often, you have two (or more) arrays where the elements at corresponding positions are related. For instance, array_A might hold sensor readings, and array_B might hold timestamps for those readings. If you want to filter out readings in array_A that are above a certain threshold, you'd also want to filter out the corresponding timestamps in array_B. This requires applying the same mask to both arrays.
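The sensor scenario can be sketched as follows. The helper mask_together(), the readings, the timestamps, and the 1.5 threshold are all hypothetical illustrations, not part of NumPy; the helper just calls np.ma.masked_where() once per array with one shared condition, which keeps any number of related arrays in sync.

```python
import numpy as np

def mask_together(condition, *arrays):
    """Hypothetical helper: mask every array at the positions where condition is True."""
    condition = np.asarray(condition, dtype=bool)
    return [np.ma.masked_where(condition, arr) for arr in arrays]

# Hypothetical sensor data: readings and their timestamps (in seconds)
readings = np.array([0.8, 1.7, 0.9, 2.3])
timestamps = np.array([0, 10, 20, 30])

too_high = readings > 1.5  # hypothetical threshold
masked_readings, masked_times = mask_together(too_high, readings, timestamps)

print(masked_readings)
print(masked_times)
print(masked_times.compressed())  # timestamps of the readings that were kept
```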

Let's define two sample 1D NumPy arrays:

import numpy as np

# Array from which the condition/mask will be derived
source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])

# Array to which the mask will be applied
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])

print("Source array (for creating mask):", source_array_for_mask)
print("Target array (to apply mask to):", target_array_to_be_masked)

Output:

Source array (for creating mask): [10  5 25  8 12 30]
Target array (to apply mask to): ['A' 'B' 'C' 'D' 'E' 'F']

Method 1: Creating a Mask with np.ma.masked_where() and Reusing It via np.ma.getmask()

This is the most explicit way to generate a mask from one array and apply it to another.

Step 1: Create a Masked Array from the First Array Based on a Condition

Use np.ma.masked_where(condition, array_to_mask) to create an initial masked array. The condition is a boolean array with the same shape as array_to_mask; wherever it is True, the corresponding element of array_to_mask is masked.

import numpy as np

# source_array_for_mask and target_array_to_be_masked defined as above
source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])

# Condition: elements in source_array_for_mask are greater than 10
condition = (source_array_for_mask > 10)
print(f"Condition (source_array_for_mask > 10): {condition}")

# Create a masked version of source_array_for_mask
masked_source_array = np.ma.masked_where(condition, source_array_for_mask)
print("Masked source array (elements > 10 are masked '--'):")
print(masked_source_array)

Output:

Condition (source_array_for_mask > 10): [False False  True False  True  True]
Masked source array (elements > 10 are masked '--'):
[10 5 -- 8 -- --]
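For simple comparisons, numpy.ma also offers shortcut functions. The sketch below uses np.ma.masked_greater(), which masks values strictly greater than a given value, and checks that it produces the same mask as the general masked_where() call from Step 1.

```python
import numpy as np

source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])

# General form: works with any boolean condition
via_where = np.ma.masked_where(source_array_for_mask > 10, source_array_for_mask)

# Shortcut for the specific "greater than" case
via_greater = np.ma.masked_greater(source_array_for_mask, 10)

print(via_greater)
print(np.array_equal(np.ma.getmaskarray(via_where), np.ma.getmaskarray(via_greater)))
```

masked_where() remains the more flexible choice, since the condition can combine several comparisons.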

Step 2: Extract the Boolean Mask using np.ma.getmask()

The np.ma.getmask(masked_array_object) function returns the actual boolean mask associated with a masked array.

import numpy as np

# masked_source_array created as above
source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])
condition = (source_array_for_mask > 10)
masked_source_array = np.ma.masked_where(condition, source_array_for_mask)


# Extract the boolean mask
the_generated_mask = np.ma.getmask(masked_source_array)
print("Extracted boolean mask from masked_source_array:")
print(the_generated_mask)

Output:

Extracted boolean mask from masked_source_array:
[False False  True False  True  True]
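Two related details may be worth knowing, sketched below: the .mask attribute exposes the same mask that np.ma.getmask() returns, and when nothing satisfies the condition, getmask() may return the scalar np.ma.nomask instead of a full array. np.ma.getmaskarray() always returns a full boolean array with the data's shape, which can be safer if later code expects one.

```python
import numpy as np

source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
masked_source_array = np.ma.masked_where(source_array_for_mask > 10, source_array_for_mask)

# The .mask attribute holds the same mask that np.ma.getmask() returns
same_mask = np.array_equal(masked_source_array.mask, np.ma.getmask(masked_source_array))
print(same_mask)

# No element is > 100, so the mask may collapse to the scalar np.ma.nomask
nothing_masked = np.ma.masked_where(source_array_for_mask > 100, source_array_for_mask)
print(np.ma.getmask(nothing_masked))

# getmaskarray() always returns a full boolean array with the data's shape
full_mask = np.ma.getmaskarray(nothing_masked)
print(full_mask)
```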

Step 3: Apply the Extracted Mask to the Second Array

Now pass the_generated_mask as the condition to np.ma.masked_where() to mask target_array_to_be_masked.

import numpy as np

# target_array_to_be_masked and the_generated_mask defined as above
source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])
condition = (source_array_for_mask > 10)
masked_source_array = np.ma.masked_where(condition, source_array_for_mask)
the_generated_mask = np.ma.getmask(masked_source_array)

# ✅ Apply the_generated_mask to target_array_to_be_masked
masked_target_array = np.ma.masked_where(the_generated_mask, target_array_to_be_masked)

print("Target array after applying the mask from source array:")
print(masked_target_array)

Output:

Target array after applying the mask from source array:
['A' 'B' -- 'D' -- --]
Note: Elements in target_array_to_be_masked at positions where the_generated_mask is True (i.e., where source_array_for_mask was > 10) are now masked.
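Because Step 3 already starts from a finished boolean mask, an alternative worth considering is to hand that mask straight to the np.ma.masked_array constructor through its mask parameter instead of calling masked_where() again. A minimal sketch, reusing the sample arrays from above:

```python
import numpy as np

source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])

masked_source_array = np.ma.masked_where(source_array_for_mask > 10, source_array_for_mask)
the_generated_mask = np.ma.getmask(masked_source_array)

# Build the masked target directly from the data and the existing mask
masked_target_array = np.ma.masked_array(target_array_to_be_masked, mask=the_generated_mask)
print(masked_target_array)
```

Both routes should give the same masked positions; masked_where() reads naturally when you think of the mask as a condition, while the constructor makes it explicit that you are attaching an existing mask.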

Accessing Non-Masked Data with np.ma.compressed()

To get a new array containing only the non-masked elements (the valid data), use np.ma.compressed().

import numpy as np

# masked_source_array and masked_target_array defined as above
source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])
condition = (source_array_for_mask > 10)
masked_source_array = np.ma.masked_where(condition, source_array_for_mask)
the_generated_mask = np.ma.getmask(masked_source_array)
masked_target_array = np.ma.masked_where(the_generated_mask, target_array_to_be_masked)

print("Non-masked elements from masked_source_array:")
print(np.ma.compressed(masked_source_array))

print("Non-masked elements from masked_target_array:")
print(np.ma.compressed(masked_target_array))

Output:

Non-masked elements from masked_source_array:
[10  5  8]
Non-masked elements from masked_target_array:
['A' 'B' 'D']
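If you only need the surviving values and not the masked arrays themselves, ordinary boolean indexing with the inverted condition (~condition) yields the same elements as np.ma.compressed(). The sketch below compares the two on the sample arrays.

```python
import numpy as np

source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])

condition = source_array_for_mask > 10

# Keep the elements where the condition is False (the ones that would stay unmasked)
kept_source = source_array_for_mask[~condition]
kept_target = target_array_to_be_masked[~condition]

# Same elements via the masked-array route
via_compressed = np.ma.compressed(np.ma.masked_where(condition, target_array_to_be_masked))

print(kept_source)
print(kept_target)
print(np.array_equal(kept_target, via_compressed))
```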

Method 2: Applying the Same Condition Directly to Both Arrays

If the mask is generated by a simple condition on one array, and you want to apply the effect of that same condition to another array (by masking elements at the same positions), you can simply reuse the boolean condition array directly. This doesn't involve np.ma.getmask().

import numpy as np

# source_array_for_mask and target_array_to_be_masked defined as before
source_array_for_mask = np.array([10, 5, 25, 8, 12, 30])
target_array_to_be_masked = np.array(['A', 'B', 'C', 'D', 'E', 'F'])

# Condition based on source_array_for_mask
condition = (source_array_for_mask > 10)

# Apply this condition to mask source_array_for_mask
masked_source_direct = np.ma.masked_where(condition, source_array_for_mask)
print("Masked source (direct condition):", masked_source_direct)

# ✅ Apply the *same condition array* to mask target_array_to_be_masked
masked_target_direct = np.ma.masked_where(condition, target_array_to_be_masked)
print("Masked target (direct condition):", masked_target_direct)

Output:

Masked source (direct condition): [10 5 -- 8 -- --]
Masked target (direct condition): ['A' 'B' -- 'D' -- --]
Note: This is more concise when the goal is simply to apply the same boolean outcome to multiple arrays. Method 1 is more useful when what you pass around is the masked array or its mask object rather than the original condition.
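With either method, the condition (or extracted mask) has to match the shape of the target array. In the NumPy versions we are aware of, masked_where() raises an IndexError rather than broadcasting when the shapes differ, as this sketch with a deliberately shorter target shows:

```python
import numpy as np

condition = np.array([10, 5, 25, 8, 12, 30]) > 10     # shape (6,)
shorter_target = np.array(['A', 'B', 'C', 'D', 'E'])  # shape (5,)

raised = False
try:
    np.ma.masked_where(condition, shorter_target)
except IndexError as exc:
    # masked_where rejects a condition whose shape differs from the array's
    raised = True
    print("Shape mismatch:", exc)
```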

Applying Masks Between 2D NumPy Arrays

The same principles apply to 2D arrays. The condition and the resulting mask will be 2D.

import numpy as np

array_2d_source = np.array([
    [10, 5, 25],
    [8, 12, 30]
])
array_2d_target = np.array([
    ['ValA', 'ValB', 'ValC'],
    ['ValD', 'ValE', 'ValF']
])

# Condition: elements in array_2d_source > 10
condition_2d = (array_2d_source > 10)
print("2D Condition (array_2d_source > 10):")
print(condition_2d)

# Using Method 1:
masked_source_2d_m1 = np.ma.masked_where(condition_2d, array_2d_source)
extracted_mask_2d = np.ma.getmask(masked_source_2d_m1)
masked_target_2d_m1 = np.ma.masked_where(extracted_mask_2d, array_2d_target)

print("Masked 2D source array (Method 1):")
print(masked_source_2d_m1)
print("Masked 2D target array (Method 1):")
print(masked_target_2d_m1)

# Using Method 2 (direct condition):
masked_target_2d_m2 = np.ma.masked_where(condition_2d, array_2d_target)
print("Masked 2D target array (Method 2 - direct condition):")
print(masked_target_2d_m2) # Same output as masked_target_2d_m1

Output:

2D Condition (array_2d_source > 10):
[[False False  True]
 [False  True  True]]
Masked 2D source array (Method 1):
[[10 5 --]
 [8 -- --]]
Masked 2D target array (Method 1):
[['ValA' 'ValB' --]
 ['ValD' -- --]]
Masked 2D target array (Method 2 - direct condition):
[['ValA' 'ValB' --]
 ['ValD' -- --]]
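One thing to keep in mind with 2D (or higher-dimensional) masked arrays: compressed() does not preserve the grid. It drops the masked cells and returns a flat 1D array in row-major order, as this sketch on the 2D sample arrays shows.

```python
import numpy as np

array_2d_source = np.array([[10, 5, 25], [8, 12, 30]])
array_2d_target = np.array([['ValA', 'ValB', 'ValC'], ['ValD', 'ValE', 'ValF']])

masked_target_2d = np.ma.masked_where(array_2d_source > 10, array_2d_target)

# compressed() drops the masked cells and returns a flat 1D array (row-major order)
flat_valid = masked_target_2d.compressed()
print(flat_valid)
print(flat_valid.shape)
```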

Conclusion

Applying a mask derived from one NumPy array to another is a powerful technique for synchronized data filtering.

  • The recommended and most explicit method involves:
    1. Creating a boolean condition based on your source array.
    2. Using np.ma.masked_where(condition, source_array) to create an initial masked array.
    3. Extracting the definitive boolean mask with np.ma.getmask(initial_masked_array).
    4. Applying this extracted mask to your target array using np.ma.masked_where(extracted_mask, target_array).
  • A simpler, direct approach is to generate the boolean condition from the source array and use this same condition array directly with np.ma.masked_where() on both the source and target arrays: np.ma.masked_where(condition, target_array).

Both approaches work for 1D and 2D arrays (and higher dimensions), allowing you to maintain consistency when filtering related datasets based on criteria from one of them. Remember to use np.ma.compressed() if you need a 1D array of only the unmasked values.