Say I have a NumPy array, e.g.
a = np.array([np.nan, 2., 3., 4., 5., np.nan, np.nan, np.nan, 8., 9., 10., np.nan, 14., np.nan, 16.])
I want to obtain all contiguous sub-arrays that contain no np.nan values, i.e. my desired output is:
sub_arrays_list = [array([2., 3., 4., 5.]), array([8., 9., 10.]), array([14.]), array([16.])]
I kind of managed to solve this with the following loop, but it is quite inefficient:
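sub_arrays_list = []
start, end = 0, 0
while end < len(a):
    if np.isnan(a[end]):
        # skip over NaNs; the next run can start no earlier than here
        end += 1
        start = end
    else:
        while not np.isnan(a[end]):
            if end < len(a) - 1:
                end += 1
            else:
                # reached the last element: keep the trailing run and stop
                sub_arrays_list.append(a[start:])
                end = len(a)
                break
        else:
            # inner loop ended on a NaN: a[start:end] is a complete run
            sub_arrays_list.append(a[start:end])
            start = end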
Would anyone please suggest a faster and better alternative to achieve this? Many thanks!
Answer:
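You can label each run with a cumulative sum over the NaN mask, then split the non-NaN values where the label changes: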
# identify NaN values
m = np.isnan(a)
# array([ True, False, False, False, False, True, True, True, False,
# False, False, True, False, True, False])
# compute groups
idx = np.cumsum(m)
# array([1, 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6])
# remove NaNs, get indices of first non-NaN per group and split
out = np.split(a[~m], np.unique(idx[~m], return_index=True)[1][1:])
output:
[array([2., 3., 4., 5.]), array([ 8., 9., 10.]), array([14.]), array([16.])]
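If you'd rather skip the cumsum/unique step, here is a minimal sketch of another vectorized way to get the same result: find the positions where the "is not NaN" mask flips and slice between them. The function name split_on_nan is just a placeholder for illustration, and the sketch assumes a is a 1-D float array.

```python
import numpy as np


def split_on_nan(a):
    """Return the maximal NaN-free runs of a 1-D float array as a list of slices of `a`."""
    valid = ~np.isnan(a)
    # Pad with False on both sides so every run has both a start and an end edge.
    padded = np.concatenate(([False], valid, [False]))
    # Positions where the mask flips; they alternate start, stop (exclusive), start, ...
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    return [a[start:stop] for start, stop in zip(edges[::2], edges[1::2])]


a = np.array([np.nan, 2., 3., 4., 5., np.nan, np.nan, np.nan,
              8., 9., 10., np.nan, 14., np.nan, 16.])
sub_arrays_list = split_on_nan(a)
```

Because the pieces are plain slices, they are views into a rather than copies, and an all-NaN or empty input gives an empty list.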