Is it possible to vectorize a Pandas operation that involves a condition on a sliced range of data?

Time:05-02

In this operation, an array is sliced over a range of indices. Given the array

arr=np.array([.1,.11,.21,.01,.5,.7,.91,.92,.95,.96,.1,.21,.23,.6,.7,.71,.72,.95,0.96,0.97])  

and a range of indices

STEP 1

drange=np.arange(start_,end_)

the slice is taken as follows

STEP 2

select_val=arr[drange]

The selected values are then compared against a threshold th, marking those that are below it.

STEP 3

bool_data=select_val<th

Finally, np.argmin returns the index of the first minimum, which for a boolean array is the first False, i.e. the first value that is not below th.

STEP 4

doutput = np.argmin(bool_data)
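
As a quick sanity check, the four steps can be traced for one concrete range. This is only a sketch: start_=1, end_=12 and th=0.9 are assumed here (they match the first DataFrame row and the threshold used in the full example), and the intermediate names follow the steps above.

```python
import numpy as np

arr = np.array([.1, .11, .21, .01, .5, .7, .91, .92, .95, .96,
                .1, .21, .23, .6, .7, .71, .72, .95, .96, .97])
start_, end_, th = 1, 12, 0.9        # assumed values for illustration

drange = np.arange(start_, end_)     # STEP 1: indices 1..11
select_val = arr[drange]             # STEP 2: same values as arr[1:12]
bool_data = select_val < th          # STEP 3: True where the value is below th
doutput = np.argmin(bool_data)       # STEP 4: position of the first False

print(bool_data.tolist())
print(doutput)                       # 5, i.e. index start_ + 5 = 6 in arr
```

Note that if no value in the slice reaches th, bool_data is all True and np.argmin returns 0, which cannot be told apart from a hit at the first position.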

In my case, the start_ and end_ values are stored in a Pandas DataFrame

df=pd.DataFrame(dict(s=[1,10],e=[12,19]))

whereas arr is a NumPy array.

Currently, I use pandas apply with a function that combines STEPs 1-4:

def fx(arr,st,en,th):
    return np.argmin(arr[np.arange(st,en)]<th)

However, I wonder whether it is possible to use a vectorized approach instead.

The code for the current approach is below

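import numpy as np
import pandas as pd

def fx(arr,st,en,th):
    # offset, relative to st, of the first value in arr[st:en] that is not below th
    return np.argmin(arr[np.arange(st,en)]<th)

th=0.9

arr=np.array([.1,.11,.21,.01,.5,.7,.91,.92,.95,  # range 1-12: first value >= th is at index 6
              .96,.1,.21,.23,.6,.7,.71,.72,.95,0.96,0.97])  # range 10-19: first value >= th is at index 17

# one row per range: s is the start index, e is the (exclusive) end index
df=pd.DataFrame(dict(s=[1,10],e=[12,19]))

df['opt']=df.apply(lambda x: fx(arr,x['s'],x['e'],th),axis=1)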

Remark: This question was originally posted on Code Review, but was flagged for migration.

CodePudding user response:

Numpy broadcasting

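# m1 is a column vector: True where the value exceeds th
m1 = arr[:, None] > th
# m2[i, j] is True where position i lies inside range j, i.e. s[j] <= i < e[j]
ix = np.arange(len(arr))[:, None]
m2 = (ix >= list(df.s)) & (ix < list(df.e))

# the first True in each column is the absolute position; subtract s to get the offset within the slice
df['opt'] = np.argmax(m1 & m2, axis=0) - df.s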

Result

    s   e  opt
0   1  12    5
1  10  19    7
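
Two details of the broadcasting answer are worth checking before relying on it. It tests `> th`, whereas the original `argmin` on `< th` stops at the first value `>= th`, and when no value in a range reaches the threshold `argmax` returns 0, so the result becomes `-s`. A minimal sketch of one way to handle both is below; the helper name first_hit_offsets, the -1 sentinel, and the third range (10 to 17) are assumptions added for illustration, not part of the answer.

```python
import numpy as np
import pandas as pd

def first_hit_offsets(arr, starts, ends, th):
    """Offset from each start of the first arr value >= th within [start, end).

    Hypothetical helper: returns -1 for ranges that contain no such value.
    """
    starts = np.asarray(starts)
    ends = np.asarray(ends)
    ix = np.arange(len(arr))[:, None]            # column of positions 0..len(arr)-1
    in_range = (ix >= starts) & (ix < ends)      # shape (len(arr), number of ranges)
    hit = (arr[:, None] >= th) & in_range        # threshold reached inside the range
    first = np.argmax(hit, axis=0)               # first True per column (0 if none)
    return np.where(hit.any(axis=0), first - starts, -1)

arr = np.array([.1, .11, .21, .01, .5, .7, .91, .92, .95, .96,
                .1, .21, .23, .6, .7, .71, .72, .95, .96, .97])
th = 0.9
# third row is an assumed extra case in which no value reaches th
df = pd.DataFrame(dict(s=[1, 10, 10], e=[12, 19, 17]))
df['opt'] = first_hit_offsets(arr, df.s, df.e, th)
print(df)   # opt column: 5, 7, -1
```

The intermediate mask has shape (len(arr), len(df)), so memory use grows with both the array length and the number of ranges; for very large inputs the apply version may still be the more practical choice.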