Is it possible to vectorize a Pandas operation that applies a condition over a sliced range of data?

In this operation, an array is sliced over a range of indices. Given the array

arr = np.array([.1, .11, .21, .01, .5, .7, .91, .92, .95, .96, .1, .21, .23, .6, .7, .71, .72, .95, 0.96, 0.97])

and a range of values,

Step 1

drange = np.arange(start_, end_)

the slicing is done as follows:

Step 2

select_val = arr[drange]

Then select_val is compared against a threshold, th, marking the values that are below it.

Step 3

bool_data = select_val<th

Finally, np.argmin returns the position of the first False in bool_data, i.e. the offset (relative to start_) of the first value that is not below th.

Step 4

doutput = np.argmin(bool_data)
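As a quick check, steps 1-4 can be run on the array above. This is only a minimal sketch: the values start_ = 1, end_ = 12 and th = 0.9 are assumptions borrowed from the example data used later in the question.

```python
import numpy as np

arr = np.array([.1, .11, .21, .01, .5, .7, .91, .92, .95, .96,
                .1, .21, .23, .6, .7, .71, .72, .95, .96, .97])
start_, end_, th = 1, 12, 0.9  # assumed values, taken from the example data

drange = np.arange(start_, end_)   # Step 1: indices 1..11
select_val = arr[drange]           # Step 2: same values as arr[start_:end_]
bool_data = select_val < th        # Step 3: True where a value is below th
doutput = np.argmin(bool_data)     # Step 4: position of the first False

print(bool_data.astype(int))  # [1 1 1 1 1 0 0 0 0 1 1]
print(doutput)                # 5, i.e. arr[start_ + 5] == 0.91
```

Note that np.argmin on a boolean array also returns 0 when every entry is True, so an output of 0 can mean either "the first value already reaches th" or "no value in the range reaches th".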

In my case, the start_ and end_ values are stored in a Pandas DataFrame:

df = pd.DataFrame(dict(s=[1, 10], e=[12, 19]))

whereas arr is a NumPy array.

Currently, I use Pandas' apply with a function that combines steps 1-4:

def fx(arr, st, en, th):

    return np.argmin(arr[np.arange(st, en)] < th)

However, is it possible to employ a vectorization approach instead?

The code of the current strategy is as below:

import numpy as np
import pandas as pd

def fx(arr, st, en, th):
    # Offset, relative to st, of the first value in arr[st:en] that is not below th
    return np.argmin(arr[np.arange(st, en)] < th)

th = 0.9

arr = np.array([.1, .11, .21, .01, .5, .7, .91, .92, .95,  # row 0, range 1-12: selects index 6
                .96, .1, .21, .23, .6, .7, .71, .72, .95, 0.96, 0.97])  # row 1, range 10-19: selects index 17

df = pd.DataFrame(dict(s=[1, 10], e=[12, 19]))

df['opt'] = df.apply(lambda x: fx(arr, x['s'], x['e'], th), axis=1)

CodePudding user response：

This can be vectorized with NumPy broadcasting:

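# Complement of the question's `arr < th`: True where a value reaches th
m1 = arr[:, None] >= th
# Index of every element of arr, as a column vector for broadcasting
ix = np.arange(len(arr))[:, None]
# One column per DataFrame row: True where s <= index < e
m2 = (ix >= list(df.s)) & (ix < list(df.e))

# argmax gives the absolute index of the first True; subtract s to make it relative
df['opt'] = np.argmax(m1 & m2, axis=0) - df.s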

Result

    s   e  opt
0   1  12    5
1  10  19    7
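To confirm that the broadcasting approach matches the row-wise apply, the two can be compared side by side. The sketch below is an illustration under a few assumptions: first_reach is a hypothetical helper name, the third row (s=2, e=6) is an added test case in which no value reaches th, and the comparison uses >= so that it mirrors the question's `< th` test even for values exactly equal to th.

```python
import numpy as np
import pandas as pd

def fx(arr, st, en, th):
    # Row-wise reference implementation from the question
    return np.argmin(arr[np.arange(st, en)] < th)

def first_reach(arr, s, e, th):
    # Hypothetical helper: broadcasting version of fx for many (s, e) pairs
    s = np.asarray(s)
    e = np.asarray(e)
    ix = np.arange(len(arr))[:, None]
    # hit[i, j] is True when arr[i] >= th and s[j] <= i < e[j]
    hit = (arr[:, None] >= th) & (ix >= s) & (ix < e)
    # argmax returns 0 for an all-False column, so handle those rows explicitly
    return np.where(hit.any(axis=0), hit.argmax(axis=0) - s, 0)

th = 0.9
arr = np.array([.1, .11, .21, .01, .5, .7, .91, .92, .95, .96,
                .1, .21, .23, .6, .7, .71, .72, .95, .96, .97])

# Third row is an added test case: no value in arr[2:6] reaches th
df = pd.DataFrame(dict(s=[1, 10, 2], e=[12, 19, 6]))

expected = df.apply(lambda x: fx(arr, x['s'], x['e'], th), axis=1).to_numpy()
df['opt'] = first_reach(arr, df['s'], df['e'], th)

print(df['opt'].tolist())  # [5, 7, 0]
```

Without the np.where guard, the third row would come out as 0 - 2 = -2 instead of the 0 that fx returns. Also keep in mind that the boolean mask has len(arr) x len(df) entries, so memory use grows with both the array length and the number of rows.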