I have two arrays of x and y values (the same length):
x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8])
y = np.array([3, 4, 2, 6, 2, 3, 2, 10, 6, 4, 2, 3, 1, 8, 3, 1, 6, 4])
I have a separate dataframe:
df = pd.DataFrame({'Time': [0.3, 1.1], 'Duration': [0.2, 0.4]})
I want to zero out the values of y wherever the corresponding value of x falls inside any of the intervals given by df, i.e. where df['Time'][i] <= x < df['Time'][i] + df['Duration'][i] for any i, yielding the following:
y_out = np.array([3, 4, 0, 0, 2, 3, 2, 10, 6, 4, 0, 0, 0, 0, 3, 1, 6, 4])
Note: I have to do this on millions of points, so it has to be relatively fast...
CodePudding user response:
You can use np.greater_equal's outer function to make this vectorized:
mask = (np.greater_equal.outer(x, df['Time'].to_numpy())
        & np.less.outer(x, (df['Time'] + df['Duration']).to_numpy())).any(1)
Then simply
y[mask] = 0
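For reference, a self-contained sketch putting the pieces together with the data from the question (copying y into y_out is an assumption, to keep the original array intact):

import numpy as np
import pandas as pd

x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9,
              1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8])
y = np.array([3, 4, 2, 6, 2, 3, 2, 10, 6, 4, 2, 3, 1, 8, 3, 1, 6, 4])
df = pd.DataFrame({'Time': [0.3, 1.1], 'Duration': [0.2, 0.4]})

starts = df['Time'].to_numpy()
ends = (df['Time'] + df['Duration']).to_numpy()

# (len(x), len(df)) boolean matrix: True where x[i] lies in interval j
mask = (np.greater_equal.outer(x, starts) & np.less.outer(x, ends)).any(axis=1)

y_out = y.copy()
y_out[mask] = 0
# expected: [3 4 0 0 2 3 2 10 6 4 0 0 0 0 3 1 6 4]  (y_out from the question)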
Using the outer product means that you compare, in a vectorized way, all values of your array x with the values in every row of df. This is fast, but costly in terms of memory.
Consider partitioning the processing in chunks, in case the whole operation doesn't fit in memory.
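A minimal sketch of what such chunking could look like (zero_in_intervals and chunk_size are illustrative names, not something from the answer above):

import numpy as np

def zero_in_intervals(x, y, starts, ends, chunk_size=1_000_000):
    # Zero y where x falls in any half-open [start, end) interval,
    # building the (chunk, n_intervals) mask one chunk of x at a time.
    out = y.copy()
    for i in range(0, len(x), chunk_size):
        xc = x[i:i + chunk_size]
        mask = (np.greater_equal.outer(xc, starts)
                & np.less.outer(xc, ends)).any(axis=1)
        out[i:i + chunk_size][mask] = 0   # the slice is a view, so this writes into out
    return out

Usage, with the arrays from the question:

y_out = zero_in_intervals(x, y, df['Time'].to_numpy(),
                          (df['Time'] + df['Duration']).to_numpy())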
CodePudding user response:
I would use np.multiply with a boolean mask built from logical operations, applied for each row (record) of df, like this:
np.multiply(y, ((x < record['Time']) | (x >= record['Time'] + record['Duration'])))
Here is a working example: https://abstra.show/4qgrdKVzLP
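For completeness, a hedged sketch of how this per-row masking could be applied to the data from the question; iterating over df with iterrows to obtain each record is an assumption about how the snippet is meant to be used:

import numpy as np

y_out = y.copy()
for _, record in df.iterrows():
    # Keep values outside the half-open interval [Time, Time + Duration)
    keep = (x < record['Time']) | (x >= record['Time'] + record['Duration'])
    y_out = np.multiply(y_out, keep)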