How to efficiently update pandas rows when the computation involves looking up values in another array

The objective is to update the rows of df by combining the values in each row with reference values from an external NumPy array.

Currently, I use a for loop to update each row, as shown below.

However, I wonder whether this can be tackled with a built-in pandas method.

import pandas as pd
import numpy as np

arr=np.array([1,2,5,100,3,6,8,3,99,12,5,6,8,11,14,11,100,1,3])
arr=arr.reshape((1,-1))
df=pd.DataFrame(zip([1,7,13],[4,11,17],['a','g','t']),columns=['start','end','o'])




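for n in range(len(df)):
    a = df.loc[n]
    # positions from start to end, inclusive
    drange = list(range(a['start'], a['end'] + 1))
    darr = arr[0, drange]
    # offset of the maximum within the window (.item() assumes a unique maximum)
    r = np.where(darr == np.amax(darr))[0].item()
    df.loc[n, 'pos_peak'] = drange[r]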

Expected output

   start  end  o  pos_peak
0      1    4  a       3.0
1      7   11  g       8.0
2     13   17  t      16.0

CodePudding user response:

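My approach would be to use the pandas apply() function, which applies a function to each row of your dataframe when called with axis=1. To find the index of the maximum element, I applied the numpy function argmax() to the relevant slice of arr. Since argmax() returns a position relative to the slice, the row's start value is added to get the position in arr. Here is the code: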

import pandas as pd
import numpy as np

arr=np.array([1,2,5,100,3,6,8,3,99,12,5,6,8,11,14,11,100,1,3])
arr=arr.reshape((1,-1))
df=pd.DataFrame(zip([1,7,13],[4,11,17],['a','g','t']),columns=['start','end','o'])

df['pos_peak'] = df.apply(lambda x: x['start'] + np.argmax(arr[0][x['start']:x['end'] + 1]), axis=1)

df

Output:

   start  end  o  pos_peak
0      1    4  a         3
1      7   11  g         8
2     13   17  t        16
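
To illustrate why the start value has to be added to the argmax() result, here is a minimal sketch. It assumes arr is kept as a 1-D array (the reshape from the question is dropped), and it swaps apply() for a plain list comprehension over the two integer columns; that is only one possible variant, not necessarily a faster one for every dataframe. Note that np.argmax returns the first maximum when a window contains ties.

```python
import numpy as np
import pandas as pd

# Assumption: arr stays 1-D here, unlike the reshaped (1, -1) array in the question
arr = np.array([1, 2, 5, 100, 3, 6, 8, 3, 99, 12, 5, 6, 8, 11, 14, 11, 100, 1, 3])
df = pd.DataFrame({'start': [1, 7, 13], 'end': [4, 11, 17], 'o': ['a', 'g', 't']})

# argmax is relative to the slice: arr[1:5] is [2, 5, 100, 3], so the offset is 2
offset = int(np.argmax(arr[1:5]))
# adding start=1 turns the offset into position 3 in arr
first_peak = 1 + offset

# Same computation for every row, iterating over the start/end columns directly
df['pos_peak'] = [
    start + int(np.argmax(arr[start:end + 1]))  # end is inclusive, hence + 1
    for start, end in zip(df['start'], df['end'])
]
print(df)
```

Because the list comprehension only touches the start and end columns, pos_peak stays an integer column, matching the output above.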