Basically, I have a problem in which I have two arrays of length L: a data array (let's call it D), representing my actual data, and a validity array (called here V), with boolean values, saying which of these values are valid.
For instance, imagine I have:
D = [10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
V = [1, 1, 1, 0, 0, 0, 1, 1, 0]
In this case, my V array indicates that the values at indexes 3, 4, 5 and 8 are invalid.
For each of these indexes, I want to replace the corresponding data value D[i] with the closest valid data. So, my index-finding function would give:
f(V) = [0, 1, 2, 2, 2, 6, 6, 7, 7]
(or f(V) = [0, 1, 2, 2, 6, 6, 6, 7, 7], it doesn't really matter which way ties are resolved)
In this case, I could correct my D array with:
D[i] = D[f(V)[i]] (or, vectorized, D = D[f(V)])
And get:
D = [10, 20, 40, 40, 40, 50, 50, 20, 20]
Is something like this implemented in Python? If not, how could I implement this easily?
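The replacement described above can be sketched in plain Python; this is only a reference implementation, and the helper name nearest_valid_indices is mine:

```python
# Plain-Python sketch of the index-finding function f described above:
# for each invalid position, take the index of the nearest valid entry
# (ties resolve to the left, matching the first f(V) variant).
def nearest_valid_indices(V):
    valid = [i for i, ok in enumerate(V) if ok]
    return [i if ok else min(valid, key=lambda j: abs(j - i))
            for i, ok in enumerate(V)]

D = [10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
V = [1, 1, 1, 0, 0, 0, 1, 1, 0]
f = nearest_valid_indices(V)        # [0, 1, 2, 2, 2, 6, 6, 7, 7]
D_fixed = [D[j] for j in f]         # [10, 20, 40, 40, 40, 50, 50, 20, 20]
```

This is O(n*m) in the worst case (n positions times m valid indices), so it is only meant to pin down the expected behavior before reaching for a vectorized version.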
CodePudding user response:
You can use pandas and interpolate:
import pandas as pd

df = pd.DataFrame({'D': D, 'V': V})
D2 = (df['D']
 .mask(df['V'].eq(0))             # replace invalid values with NaN
 .interpolate(method='nearest')   # fill from the nearest valid neighbor (requires scipy)
 .ffill(downcast='infer')         # 'nearest' doesn't extrapolate, so fill trailing NaNs
 .tolist()
)
output: [10, 20, 40, 40, 40, 50, 50, 20, 20]
CodePudding user response:
If you are willing to use numpy, that can be done pretty concisely (although this involves an O(m*k) implicit loop, with m and k the number of valid and invalid indices):
import numpy as np
d = np.array(D)
v = np.array(V, dtype=bool)
valid = np.nonzero(v)[0]                         # indices of valid entries
invalid = np.nonzero(~v)[0]                      # indices of invalid entries
diff = np.abs(invalid[:, None] - valid[None, :]) # pairwise index distances
out = np.arange(len(d))
out[~v] = valid[np.argmin(diff, axis=1)]
>>> out
array([0, 1, 2, 2, 2, 6, 6, 7, 7])
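If the O(m*k) distance matrix becomes a concern for large arrays, the same nearest-index fill can be done with np.searchsorted; this is a sketch under the same setup, and the names valid, invalid, nearest are mine:

```python
import numpy as np

D = [10, 20, 40, 1000, 2000, -1000, 50, 20, 1000]
V = [1, 1, 1, 0, 0, 0, 1, 1, 0]

d = np.array(D)
v = np.array(V, dtype=bool)
valid = np.nonzero(v)[0]        # sorted indices of valid entries
invalid = np.nonzero(~v)[0]     # indices to fill

# For each invalid index, find where it would slot into the sorted valid
# indices, then compare the neighbor on each side and keep the closer one
# (ties resolve to the left neighbor).
pos = np.searchsorted(valid, invalid)
left = np.clip(pos - 1, 0, len(valid) - 1)
right = np.clip(pos, 0, len(valid) - 1)
nearest = np.where(invalid - valid[left] <= valid[right] - invalid,
                   valid[left], valid[right])

out = np.arange(len(d))
out[~v] = nearest               # array([0, 1, 2, 2, 2, 6, 6, 7, 7])
```

searchsorted on the sorted valid indices replaces the full pairwise comparison, bringing the cost down to roughly O(n log m).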