Get two neighboring non-nan values in numpy array

Let's say I have a numpy array

my_array = [0.2, 0.3, nan, nan, nan, 0.1, nan, 0.5, nan]

For each nan value, I want to extract the two non-nan values to the left and right of that point (or single value if appropriate). So I would like my output to be something like

output = [[0.3,0.1], [0.3,0.1], [0.3,0.1], [0.1,0.5], [0.5]]

I was thinking of looping through all the values in my_array, then finding those that are nan, but I'm not sure how to do the next part of finding the nearest non-nan values.

CodePudding user response：

Using pandas and numpy:

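import numpy as np
import pandas as pd
from numpy import nan

s = pd.Series([0.2, 0.3, nan, nan, nan, 0.1, nan, 0.5, nan])
m = s.isna()  # boolean mask of the NaN positions
a = np.vstack((s.ffill()[m], s.bfill()[m]))  # row 0: left neighbors, row 1: right neighbors
out = a[:, ~np.isnan(a).any(0)].T.tolist()  # drop the pairs that still contain a NaN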

Output:

[[0.3, 0.1], [0.3, 0.1], [0.3, 0.1], [0.1, 0.5]]
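
To see why this works, it may help to look at the two intermediate Series. This is only an illustrative sketch; the names `left` and `right` are mine and do not appear in the code above.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.2, 0.3, np.nan, np.nan, np.nan, 0.1, np.nan, 0.5, np.nan])
m = s.isna()

# ffill propagates the last valid value forward, so at a NaN position
# it holds the nearest non-NaN value to the left.
left = s.ffill()[m]
# bfill propagates the next valid value backward, so at a NaN position
# it holds the nearest non-NaN value to the right (NaN if there is none).
right = s.bfill()[m]

print(left.tolist())   # [0.3, 0.3, 0.3, 0.1, 0.5]
print(right.tolist())  # [0.1, 0.1, 0.1, 0.5, nan]
```

The trailing NaN has no valid value after it, which is why its right neighbor stays NaN and the pair is dropped by the `any(0)` filter.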

NB. You can choose to keep or drop the lists containing NaNs.

With NaNs:

out = a.T.tolist()

[[0.3, 0.1], [0.3, 0.1], [0.3, 0.1], [0.1, 0.5], [0.5, nan]]

Alternative that keeps the single-element lists:

s = pd.Series([0.2, 0.3, nan, nan, nan, 0.1, nan, 0.5, nan])
m = s.isna()

(pd
 .concat((s.ffill()[m], s.bfill()[m]), axis=1)
 .stack().dropna()  # explicit dropna in case this pandas version keeps NaN when stacking
 .groupby(level=0).agg(list)
 .to_list()
 )

Output:

[[0.3, 0.1], [0.3, 0.1], [0.3, 0.1], [0.1, 0.5], [0.5]]
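
If you would rather avoid pandas, the same forward/backward fill idea can probably be sketched in plain NumPy with index accumulation. This is a minimal sketch, not a tested library routine; the function name `nan_neighbors` is hypothetical.

```python
import numpy as np

def nan_neighbors(values):
    """For each NaN, collect the nearest non-NaN values to its left and right."""
    arr = np.asarray(values, dtype=float)
    n = arr.size
    idx = np.arange(n)
    valid = ~np.isnan(arr)

    # Index of the latest valid entry at or before each position (-1 if none).
    left_idx = np.maximum.accumulate(np.where(valid, idx, -1))
    # Index of the next valid entry at or after each position (n if none).
    right_idx = np.minimum.accumulate(np.where(valid, idx, n)[::-1])[::-1]

    result = []
    for i in np.flatnonzero(~valid):
        pair = []
        if left_idx[i] >= 0:
            pair.append(float(arr[left_idx[i]]))
        if right_idx[i] < n:
            pair.append(float(arr[right_idx[i]]))
        result.append(pair)
    return result

nan = np.nan
print(nan_neighbors([0.2, 0.3, nan, nan, nan, 0.1, nan, 0.5, nan]))
# [[0.3, 0.1], [0.3, 0.1], [0.3, 0.1], [0.1, 0.5], [0.5]]
```

A NaN with no valid value on one side gets a single-element list, and a NaN with no valid value on either side gets an empty list.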

CodePudding user response：

Less elegant than @mozway's answer, but the last list only has one element:

arr = pd.Series(my_array)  # ffill/bfill/isna below need a pandas Series

pd.DataFrame({
    'left': arr.ffill(),
    'right': arr.bfill()
}).loc[arr.isna()].apply(lambda row: row.dropna().to_list(), axis=1).to_list()
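
For comparison, the explicit loop described in the question can also be written directly. This is a rough sketch that assumes the input is a plain Python list of floats; the name `neighbors_by_loop` is made up for illustration, and it rescans the list for every NaN, so it is likely slower than the fill-based approaches on long inputs.

```python
import math

def neighbors_by_loop(values):
    """Loop over the values and, for each NaN, scan outward for valid neighbors."""
    result = []
    for i, v in enumerate(values):
        if not math.isnan(v):
            continue
        # Nearest non-NaN value to the left (None if there is none).
        left = next((x for x in reversed(values[:i]) if not math.isnan(x)), None)
        # Nearest non-NaN value to the right (None if there is none).
        right = next((x for x in values[i + 1:] if not math.isnan(x)), None)
        result.append([x for x in (left, right) if x is not None])
    return result

nan = float("nan")
print(neighbors_by_loop([0.2, 0.3, nan, nan, nan, 0.1, nan, 0.5, nan]))
# [[0.3, 0.1], [0.3, 0.1], [0.3, 0.1], [0.1, 0.5], [0.5]]
```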