I have the following input array:
a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])
I want to extract the subarrays of consecutive numbers, splitting the array wherever there is a NaN value. The expected result is:
res = [[10, 5, 7], [1, 2, 3]]
CodePudding user response:
A solution based on scipy.ndimage.label:
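import numpy as np
import scipy.ndimage

def find_valid_subarrays(a):
    # Label each run of consecutive non-NaN entries as 1, 2, ...; NaN positions get 0
    label, num_features = scipy.ndimage.label(~np.isnan(a))
    # Select the elements belonging to each labelled run
    return [a[label == feature] for feature in range(1, num_features + 1)]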
What it does:
In [1]: a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])
In [2]: find_valid_subarrays(a)
Out[2]: [array([10., 5., 7.]), array([1., 2., 3.])]
I am hoping for something more readable.
CodePudding user response:
A one-line solution that requires no further dependencies would be
res = [[int(a_elem) for a_elem in list(a[ind])] for ind in np.ma.clump_unmasked(np.ma.masked_invalid(a))]
yielding
res
>[[10, 5, 7], [1, 2, 3]]
Note that an explicit type conversion back to int is performed within the list comprehension. If this is not relevant for your use case, run
[list(a[ind]) for ind in np.ma.clump_unmasked(np.ma.masked_invalid(a))]
instead.
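To see why the one-liner works, it can help to look at the intermediate values. The following is a minimal sketch that uses only the NumPy calls from the answer above; the variable names (masked, slices) are illustrative and not part of the original answer.

```python
import numpy as np

a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])

# masked_invalid masks every NaN (and inf) entry of the array
masked = np.ma.masked_invalid(a)

# clump_unmasked returns one slice per contiguous run of unmasked entries
slices = np.ma.clump_unmasked(masked)
print(slices)  # [slice(1, 4, None), slice(6, 9, None)]

# Indexing the original array with each slice yields the subarrays;
# tolist() converts the NumPy scalars to plain Python floats
res = [a[s].tolist() for s in slices]
print(res)  # [[10.0, 5.0, 7.0], [1.0, 2.0, 3.0]]
```

Because the slices index the original array, the elements stay floats unless they are converted explicitly, which is why the one-liner calls int on each element.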
CodePudding user response:
Using a utility I made for finding runs of consecutive values in a mask, haggis.npy_util.mask2runs, combined with np.isnan, you can do
runs = haggis.npy_util.mask2runs(~np.isnan(a))
result = [a[slice(*x)] for x in runs]
The same effect can be produced with pure numpy, using a simplified version of the mask2runs function. Something like this:
runs = np.flatnonzero(np.diff(np.r_[False, ~np.isnan(a), False])).reshape(-1, 2)
result = [a[slice(*x)] for x in runs]
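The pure-numpy expression packs several steps into one line. As a rough sketch, it can be unrolled as follows; the intermediate names (valid, padded, edges) are introduced here for illustration only.

```python
import numpy as np

a = np.array([np.nan, 10, 5, 7, np.nan, np.nan, 1, 2, 3, np.nan])

# True where the entry is a real number
valid = ~np.isnan(a)

# Pad with False on both sides so that runs touching either end
# of the array still produce both a start and a stop edge
padded = np.r_[False, valid, False]

# np.diff on a boolean array marks every position where the mask flips
edges = np.flatnonzero(np.diff(padded))
print(edges)  # [1 4 6 9]

# Edges alternate start, stop, start, stop, ... so pair them up
runs = edges.reshape(-1, 2)
result = [a[start:stop] for start, stop in runs]
print(result)
```

Each row of runs is a half-open (start, stop) pair, so it can be used directly as a slice into a.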