Python NumPy: How to Flatten Only Specific Dimensions of an Array (Partial Flattening)
NumPy arrays can have multiple dimensions, representing complex data structures. While fully flattening an array into a 1D sequence is common (array.flatten() or array.ravel()), there are many scenarios where you only want to flatten some of its dimensions, reducing its dimensionality while preserving the others. For example, you might want to convert a 3D array of shape (N, M, P) into a 2D array of shape (N*M, P), collapsing the first two dimensions.
This guide demonstrates how to partially flatten NumPy arrays, primarily using the versatile numpy.reshape() method with either an explicit shape or the convenient -1 dimension inference. It also briefly covers numpy.vstack() as an alternative for specific stacking-based flattening scenarios.
Understanding Partial Flattening of NumPy Arrays
Partial flattening means reducing the number of dimensions of a NumPy array by combining (collapsing) two or more existing dimensions into a single new dimension, while leaving other dimensions intact. The total number of elements in the array must remain the same.
For instance, given a 3D array of shape (num_samples, num_timesteps, num_features), you might want to flatten the first two dimensions (num_samples and num_timesteps) to get a 2D array of shape (num_samples * num_timesteps, num_features), so that every timestep of every sample becomes one row holding a feature vector.
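As a minimal sketch of this scenario, the snippet below builds a hypothetical batch with 5 samples, 10 timesteps, and 3 features (sizes chosen purely for illustration, not taken from any real dataset) and collapses the first two axes. It also checks where a given (sample, timestep) pair ends up in the flattened result.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration
num_samples, num_timesteps, num_features = 5, 10, 3

batch = np.arange(num_samples * num_timesteps * num_features).reshape(
    num_samples, num_timesteps, num_features
)

# Collapse (num_samples, num_timesteps) into a single axis
flat = batch.reshape(num_samples * num_timesteps, num_features)
print(flat.shape)  # (50, 3)

# Row i * num_timesteps + t of the result is timestep t of sample i
i, t = 2, 7
assert np.array_equal(flat[i * num_timesteps + t], batch[i, t])
```

The row index formula holds because reshape reads elements in row-major ('C') order by default, so the last collapsed axis varies fastest.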
Let's define a sample 3D NumPy array:
import numpy as np
# Create a 3D array with shape (2 planes, 3 rows per plane, 4 columns per plane)
array_3d = np.arange(2 * 3 * 4).reshape((2, 3, 4))
print("Original 3D NumPy Array (shape {}):".format(array_3d.shape))
print(array_3d)
Output:
Original 3D NumPy Array (shape (2, 3, 4)):
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
Method 1: Using numpy.reshape() (Recommended)
The numpy.reshape(a, newshape, order='C') function (or the equivalent array method array.reshape(newshape)) is the most direct and flexible way to change an array's shape without changing its data, provided the new shape is compatible with the original size.
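The short sketch below illustrates the point about the data staying unchanged. It compares the function form with the method form and uses np.shares_memory to check whether the reshaped result is a view on the original buffer. The variable names are illustrative, and the view behavior shown applies to a contiguous array like this one; NumPy may return a copy in other cases.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Function form and method form produce the same result
via_function = np.reshape(a, (6, 4))
via_method = a.reshape(6, 4)
assert np.array_equal(via_function, via_method)

# For a contiguous array like this one, the result is a view on the same data
print(np.shares_memory(a, via_method))  # True

# Writing through the view is therefore visible in the original array
via_method[0, 0] = -1
print(a[0, 0, 0])  # -1
```

If you need an independent array, calling .copy() on the reshaped result is one straightforward option.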
Reshaping with Explicit New Dimensions
If you know the exact new dimensions, you can specify them directly. For our (2, 3, 4) array (24 elements in total), let's flatten the first two dimensions (2 and 3) into one, keeping the last dimension (4) as is. The new first dimension will be 2 * 3 = 6.
import numpy as np
# array_3d defined as above (shape (2, 3, 4))
array_3d = np.arange(2 * 3 * 4).reshape((2, 3, 4))
# Flatten the first two dimensions (2*3=6) into one, keeping the last dimension (4)
# New shape will be (6, 4)
partially_flattened_explicit = array_3d.reshape(6, 4)
# Or: partially_flattened_explicit = array_3d.reshape(array_3d.shape[0] * array_3d.shape[1], array_3d.shape[2])
print("Partially flattened array (to shape (6, 4)) with explicit dimensions:")
print(partially_flattened_explicit)
print(f"New shape: {partially_flattened_explicit.shape}")
Output:
Partially flattened array (to shape (6, 4)) with explicit dimensions:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
New shape: (6, 4)
Using -1 for Automatic Dimension Inference in reshape()
You can specify one dimension in the newshape tuple as -1. NumPy will automatically calculate the size of this dimension based on the total number of elements in the array and the sizes of the other specified dimensions.
Flattening All But the Last Dimension
This is a very common use case: convert an N-D array into a 2D array where the last dimension of the original array becomes the second dimension of the new array, and all preceding dimensions are collapsed into the first.
import numpy as np
# array_3d defined as above (shape (2, 3, 4))
array_3d = np.arange(2 * 3 * 4).reshape((2, 3, 4))
# Flatten all dimensions except the last one.
# The last dimension has size array_3d.shape[-1] (which is 4).
# The -1 tells reshape to calculate the first dimension automatically.
flattened_all_but_last = array_3d.reshape(-1, array_3d.shape[-1])
print("Flattened all but last dimension (using -1):")
print(flattened_all_but_last)
print(f"New shape: {flattened_all_but_last.shape}")
Output:
Flattened all but last dimension (using -1):
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
New shape: (6, 4)
How -1 is calculated:
- Original array size: array_3d.size = 2 * 3 * 4 = 24
- Last dimension specified: array_3d.shape[-1] = 4
- Inferred first dimension: 24 / 4 = 6
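The same arithmetic can be reproduced by hand. The sketch below computes the inferred dimension explicitly and compares it with what reshape produces for -1. The names known_dims and inferred are illustrative only, and math.prod assumes Python 3.8 or later.

```python
import math
import numpy as np

array_3d = np.arange(24).reshape(2, 3, 4)

# The dimensions we spell out explicitly: (4,)
known_dims = (array_3d.shape[-1],)

# Total size divided by the product of the known dimensions: 24 // 4 = 6
inferred = array_3d.size // math.prod(known_dims)

manual = array_3d.reshape(inferred, *known_dims)
automatic = array_3d.reshape(-1, *known_dims)

print(inferred)                         # 6
print(manual.shape == automatic.shape)  # True
```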
Flattening All But the Last Two Dimensions (and so on)
You can extend this to keep more trailing dimensions. The * operator unpacks a slice of the shape tuple into separate arguments for reshape().
import numpy as np
array_4d = np.arange(2 * 2 * 3 * 2).reshape((2, 2, 3, 2)) # Shape (2, 2, 3, 2), size 24
print("Original 4D array (shape {}):".format(array_4d.shape))
# print(array_4d) # Note: Printing can be long
# Flatten all but the last TWO dimensions.
# array_4d.shape[-2:] gives a tuple of the last two dimension sizes: (3, 2)
# The * unpacks this tuple into individual arguments for reshape.
flattened_all_but_last_two = array_4d.reshape(-1, *array_4d.shape[-2:])
# Inferred first dimension: 24 / (3*2) = 24 / 6 = 4
# New shape will be (4, 3, 2)
print("Flattened all but last two dimensions:")
print(f"New shape: {flattened_all_but_last_two.shape}") # Output: New shape: (4, 3, 2)
print(flattened_all_but_last_two) # Will show the 4x3x2 array
Output:
Original 4D array (shape (2, 2, 3, 2)):
Flattened all but last two dimensions:
New shape: (4, 3, 2)
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]

 [[12 13]
  [14 15]
  [16 17]]

 [[18 19]
  [20 21]
  [22 23]]]
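The same slicing idea extends to collapsing any run of adjacent dimensions, not only the leading ones. The helper below, merge_axes, is a hypothetical function written for this illustration (it is not part of NumPy); it builds the new shape from the dimensions before the run, a single -1, and the dimensions after the run.

```python
import numpy as np

def merge_axes(arr, start, stop):
    """Collapse axes start..stop-1 of arr into one axis (hypothetical helper)."""
    if not 0 <= start < stop <= arr.ndim:
        raise ValueError("need 0 <= start < stop <= arr.ndim")
    # Keep the axes before and after the run; let NumPy infer the merged size
    new_shape = arr.shape[:start] + (-1,) + arr.shape[stop:]
    return arr.reshape(new_shape)

array_4d = np.arange(2 * 2 * 3 * 2).reshape(2, 2, 3, 2)

print(merge_axes(array_4d, 0, 2).shape)  # (4, 3, 2)  leading two axes merged
print(merge_axes(array_4d, 1, 3).shape)  # (2, 6, 2)  middle two axes merged
print(merge_axes(array_4d, 2, 4).shape)  # (2, 2, 6)  trailing two axes merged
```

This sketch only handles adjacent axes. Merging axes that are not next to each other would first require moving them together, for example with np.transpose or np.moveaxis.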
Ensuring Shape Compatibility
The newshape provided to reshape() must be compatible with the original array's size (total number of elements). The product of the dimensions in newshape (with -1 resolved) must equal array.size. If not, NumPy will raise a ValueError.
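A quick way to see this in practice is to try a few incompatible shapes and catch the exception. The shapes below are arbitrary examples chosen for illustration; the exact error messages vary between NumPy versions, so they are printed rather than quoted here.

```python
import numpy as np

array_3d = np.arange(24).reshape(2, 3, 4)

errors = []
# (5, 4):   5 * 4 = 20 elements, but the array has 24
# (-1, 5):  24 is not divisible by 5, so -1 cannot be resolved
# (-1, -1): only one dimension may be left for NumPy to infer
for shape in [(5, 4), (-1, 5), (-1, -1)]:
    try:
        array_3d.reshape(shape)
    except ValueError as exc:
        errors.append((shape, str(exc)))
        print(f"reshape{shape} failed: {exc}")
```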
Method 2: Using numpy.vstack() (for specific vertical stacking scenarios)
numpy.vstack(tup) stacks arrays in sequence vertically (row-wise). If you pass a 3D array arr of shape (D, M, N) to np.vstack(arr), it treats each 2D slice arr[i, :, :] (of shape (M, N)) as an array to be stacked. The result is a 2D array of shape (D*M, N), which effectively flattens the first two dimensions.
import numpy as np
# array_3d defined as above (shape (2, 3, 4))
array_3d = np.arange(2 * 3 * 4).reshape((2, 3, 4))
# Using np.vstack on a 3D array
# Each (3,4) slice along the first axis is stacked vertically.
flattened_with_vstack = np.vstack(array_3d)
print("Partially flattened array using np.vstack():")
print(flattened_with_vstack)
print(f"New shape: {flattened_with_vstack.shape}")
Output:
Partially flattened array using np.vstack():
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
New shape: (6, 4)
While np.vstack(arr) produces the same result as arr.reshape(-1, arr.shape[-1]) for a 3D array arr, reshape() is more versatile for arbitrary dimension changes and states the intended final shape explicitly. vstack() is specifically about vertical stacking.
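The two approaches can be compared directly. The sketch below checks that both give equal values for the sample array and uses np.shares_memory to show one practical difference: here reshape returns a view on the original data, whereas vstack concatenates the slices into a newly allocated array.

```python
import numpy as np

array_3d = np.arange(24).reshape(2, 3, 4)

via_vstack = np.vstack(array_3d)
via_reshape = array_3d.reshape(-1, array_3d.shape[-1])

print(np.array_equal(via_vstack, via_reshape))  # True: same values, same shape
print(np.shares_memory(array_3d, via_reshape))  # True: reshape returned a view here
print(np.shares_memory(array_3d, via_vstack))   # False: vstack built a new array
```

For large arrays this difference can matter, since the vstack route copies every element.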
Choosing the Right Method
- numpy.reshape():
  - Pros: Most flexible and direct method for changing array dimensions. Allows explicit control over the new shape. The -1 inference is very convenient. Generally preferred for partial flattening.
  - Cons: Requires understanding the order in which elements are read into the new shape (the default 'C' order is row-major).
- numpy.vstack() (when applied to an N-D array, N > 2):
  - Pros: Can be intuitive if you think of it as "taking each slice along the first dimension and stacking them up."
  - Cons: Less general than reshape(). It only merges the first dimension into the second (row) dimension. A 3D array of shape (D, M, N) becomes (D*M, N), and for an array arr with more than 3 dimensions the result is equivalent to arr.reshape(arr.shape[0]*arr.shape[1], *arr.shape[2:]).
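The behavior of vstack() on arrays with more than three dimensions can be checked with a small sketch. It reuses the 4D example shape from earlier and compares the stacked result against the corresponding reshape() call.

```python
import numpy as np

array_4d = np.arange(2 * 2 * 3 * 2).reshape(2, 2, 3, 2)

# vstack concatenates the slices array_4d[0], array_4d[1], ... along their first axis
stacked = np.vstack(array_4d)

# The matching reshape: merge axes 0 and 1, keep the remaining axes as they are
reshaped = array_4d.reshape(-1, *array_4d.shape[2:])

print(stacked.shape)                      # (4, 3, 2)
print(np.array_equal(stacked, reshaped))  # True
```

Only the first two axes can be merged this way; collapsing any other pair of axes still calls for reshape().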
For most partial flattening tasks, numpy.reshape() with careful use of -1 and slicing of array.shape provides the clearest and most powerful approach.
Conclusion
Partially flattening a NumPy array (reducing its dimensionality by collapsing specific dimensions) is a common and useful operation.
- The numpy.reshape() method is the primary tool, offering flexibility through explicit shape definition or automatic dimension inference using -1. Patterns like arr.reshape(-1, arr.shape[-1]) (flatten all but the last dimension) or arr.reshape(-1, *arr.shape[-N:]) (flatten all but the last N dimensions) are particularly powerful.
- numpy.vstack() can achieve a specific type of partial flattening for 3D+ arrays by vertically stacking the slices taken along the first dimension.
Always ensure that the total number of elements remains constant during reshaping. By mastering these techniques, you can efficiently transform your multi-dimensional NumPy arrays into the shapes required for your subsequent analyses or algorithm inputs.