Extract sub-arrays of consecutive numbers that meet condition

I have the following input array:

a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])

I want to extract the sub-arrays of consecutive numbers, splitting the array wherever a NaN value occurs:

res = [[10, 5, 7], [1, 2, 3]]

CodePudding user response：

A solution based on scipy.ndimage.label:

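import numpy as np
import scipy.ndimage

def find_valid_subarrays(a):
    # Label each run of consecutive non-NaN entries 1, 2, ..., num_features (0 marks the NaN gaps).
    label, num_features = scipy.ndimage.label(~np.isnan(a))
    # Labels start at 1, so the range has to include num_features itself.
    return [a[label == feature] for feature in range(1, num_features + 1)]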

What it does:

In [1]: a = np.array([np.nan, 10,  5,  7, np.nan, np.nan,  1,  2,  3, np.nan])

In [2]: find_valid_subarrays(a)

Out[2]: [array([10.,  5.,  7.]), array([1., 2., 3.])]
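To see why the labelling approach works, it may help to inspect the intermediate label array. The following is only a small sketch using the same input; the variable name labels is chosen here for illustration.

```python
import numpy as np
import scipy.ndimage

a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])

# Each run of adjacent True values in the mask gets its own integer label;
# positions where the mask is False (the NaNs) are labelled 0.
labels, num_features = scipy.ndimage.label(~np.isnan(a))

print(labels.tolist())  # [0, 1, 1, 1, 0, 0, 2, 2, 2, 0]
print(num_features)     # 2
```

Selecting a[labels == k] for k = 1, 2 then yields one sub-array per run.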

I am hoping for something more readable.

CodePudding user response：

A one-line solution that needs nothing beyond NumPy would be

res = [[int(a_elem) for a_elem in list(a[ind])] for ind in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

yielding

res
>[[10, 5, 7], [1, 2, 3]]

Note that the list comprehension explicitly converts each element back to int, because the NaN values force a to have a float dtype. If you do not need integers, run

[list(a[ind]) for ind in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

instead.
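As a rough illustration of what happens inside the one-liner, the sketch below splits it into steps. np.ma.clump_unmasked returns a list of slice objects, one per run of unmasked values; the names masked and slices are made up here for readability.

```python
import numpy as np

a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])

# Mask the NaN (and inf) entries, then find the contiguous unmasked runs.
masked = np.ma.masked_invalid(a)
slices = np.ma.clump_unmasked(masked)

# Each slice marks the half-open index range [start, stop) of one run.
print([(int(s.start), int(s.stop)) for s in slices])  # [(1, 4), (6, 9)]

# Index the original array with each slice and convert to plain Python ints.
res = [a[s].astype(int).tolist() for s in slices]
print(res)  # [[10, 5, 7], [1, 2, 3]]
```

Using astype(int).tolist() is just one way to get Python integers; the per-element int() call in the one-liner does the same job.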

CodePudding user response：

Using haggis.npy_util.mask2runs, a utility I wrote for finding runs of consecutive values in a mask, together with np.isnan, you can do

runs = haggis.npy_util.mask2runs(~np.isnan(a))
result = [a[slice(*x)] for x in runs]

The same effect can be produced with pure numpy, using a simplified version of the mask2runs function. Something like this:

runs = np.flatnonzero(np.diff(np.r_[False, ~np.isnan(a), False])).reshape(-1, 2)
result = [a[slice(*x)] for x in runs]
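For readers who want to follow the pure-numpy version step by step, here is a sketch that breaks the one-liner apart. The intermediate names (valid, padded, edges) are introduced only for this illustration.

```python
import numpy as np

a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])

valid = ~np.isnan(a)                  # True where a holds a number
padded = np.r_[False, valid, False]   # pad so runs touching either end still have two edges

# For boolean input, np.diff marks every position where the value flips,
# i.e. the start of a run and the index just past its end.
edges = np.flatnonzero(np.diff(padded))
runs = edges.reshape(-1, 2)           # one (start, stop) pair per run
print(runs.tolist())  # [[1, 4], [6, 9]]

result = [a[start:stop] for start, stop in runs]
print([r.tolist() for r in result])  # [[10.0, 5.0, 7.0], [1.0, 2.0, 3.0]]
```

Because of the padding, the number of edges is always even, so the reshape to two columns cannot fail.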