Home > Software design >  Find top 3 highest values across 3 columns row-wise pandas
Find top 3 highest values across 3 columns row-wise pandas

Time:04-29

I have four columns of values consisting of the looking times to a right/left/upper/lower positioned image. Another column shows the position of the image which was chosen (decision_resp). I created a new column showing the looking time of the chosen image. Now I want to create 3 more columns showing the looking times of the not chosen images sorted by highest looking time (top1), second highest looking time (toop2) and third highest looking time (top 3). The looking time of the chosen image has to be excluded.

These are the columns I have:

 lookRight_t  lookLeft_t  lookUp_t  lookDown_t decision_resp  chosen_img_et
0     1.291667    1.325000  3.025000    1.141667            up       3.025000
1     0.000000    0.000000  1.125000    3.150000          down       3.150000
2     0.000000    0.000000  3.508333    2.275000            up       3.508333
3     3.700000    1.950000  0.000000    0.000000         right       3.700000
4     2.633333    1.316667  1.341667    0.000000         right       2.633333
5     1.766667    1.333333  0.825000    2.208333          down       2.208333
6     0.000000    0.000000  1.108333    5.283333          down       5.283333 

My approach was:

# create new column for looking time of chosen image
trials.loc[trials['decision_resp']=='right','chosen_img_et'] = trials['lookRight_t']
trials.loc[trials['decision_resp']=='left','chosen_img_et'] = trials['lookLeft_t']
trials.loc[trials['decision_resp']=='down','chosen_img_et'] = trials['lookDown_t']
trials.loc[trials['decision_resp']=='up','chosen_img_et'] = trials['lookUp_t']

# here I got stuck        
trials.loc[trials['decision_resp']=='right', 3 new columns (top1/2/3)] = trials[['lookLeft_t', 'lookDown_t', 'lookUp_t']].find max values and put it in order
trials.loc[trials['decision_resp']=='left', 3 new columns (top1/2/3)] = trials[['lookRight_t', 'lookDown_t', 'lookUp_t']].find max values and put it in order
trials.loc[trials['decision_resp']=='down', 3 new columns (top1/2/3)] = trials[['lookLeft_t', 'lookRight_t', 'lookUp_t']].find max values and put it in order
trials.loc[trials['decision_resp']=='up', 3 new columns (top1/2/3)] = trials[['lookLeft_t', 'lookDown_t', 'lookRight_t']].find max values and put it in order

Thank you for any help!

CodePudding user response:

First you can use Looking up solution for new column chosen_img_et

You can sorting selected columns by numpy.sort and then use indexing for top3 rows without selected value, so is compared columns names by column decision_resp with broadcasting and set missing values by mask by DataFrame.mask:

cols = ['lookRight_t','lookLeft_t','lookUp_t','lookDown_t']
#replace substrings for match column decision_resp
look = [x.replace('look','').replace('_t','').lower() for x in cols]

new = [f'top{x 1}' for x in range(3)]

#lookup
idx1, cols1 = pd.factorize(trials['decision_resp'])
trials['chosen_img_et'] = (trials[cols].set_axis(look, axis=1)
                                       .reindex(cols1, axis=1)
                                       .to_numpy()[np.arange(len(trials)), idx1])

mask = np.array(look) == trials['decision_resp'].to_numpy()[:, None]

#np.sort sorting by default descending, 
#so for ascending order use -1 in indexing and 2 is for remove first only NaN column
trials[new] = np.sort(trials[cols].mask(mask), axis=1)[:, 2::-1]

print (trials)
   lookRight_t  lookLeft_t  lookUp_t  lookDown_t decision_resp  chosen_img_et  \
0     1.291667    1.325000  3.025000    1.141667            up       3.025000   
1     0.000000    0.000000  1.125000    3.150000          down       3.150000   
2     0.000000    0.000000  3.508333    2.275000            up       3.508333   
3     3.700000    1.950000  0.000000    0.000000         right       3.700000   
4     2.633333    1.316667  1.341667    0.000000         right       2.633333   
5     1.766667    1.333333  0.825000    2.208333          down       2.208333   
6     0.000000    0.000000  1.108333    5.283333          down       5.283333   

       top1      top2      top3  
0  1.325000  1.291667  1.141667  
1  1.125000  0.000000  0.000000  
2  2.275000  0.000000  0.000000  
3  1.950000  0.000000  0.000000  
4  1.341667  1.316667  0.000000  
5  1.766667  1.333333  0.825000  
6  1.108333  0.000000  0.000000  

Details:

print (trials[cols].mask(mask))
   lookRight_t  lookLeft_t  lookUp_t  lookDown_t
0     1.291667    1.325000       NaN    1.141667
1     0.000000    0.000000  1.125000         NaN
2     0.000000    0.000000       NaN    2.275000
3          NaN    1.950000  0.000000    0.000000
4          NaN    1.316667  1.341667    0.000000
5     1.766667    1.333333  0.825000         NaN
6     0.000000    0.000000  1.108333         NaN

CodePudding user response:

  • Find the max column name and extract the direction from it.
  • Find the max for each row.
  • Find the remaining 3 values and sort them as desired.
  • Concat the results together.
    • Renameing columns as I go.
decision_resp = df.idxmax(axis=1).str.extract('look(\w*)_t', expand=False)
decision_resp.rename('decision_resp', inplace=True)

chosen_img_et = df.max(axis=1, numeric_only=True)
chosen_img_et.rename('chosen_img_et', inplace=True)

top3 = df.apply(lambda x: x.nlargest(4).sort_values(ascending=False, ignore_index=True)[1:], axis=1)
top3.columns = ['top1', 'top2', 'top3']

df = pd.concat([df, decision_resp, chosen_img_et, top3], axis=1)
print(df)

Output:

   lookRight_t  lookLeft_t  lookUp_t  lookDown_t decision_resp  chosen_img_et  \
0     1.291667    1.325000  3.025000    1.141667            Up       3.025000   
1     0.000000    0.000000  1.125000    3.150000          Down       3.150000   
2     0.000000    0.000000  3.508333    2.275000            Up       3.508333   
3     3.700000    1.950000  0.000000    0.000000         Right       3.700000   
4     2.633333    1.316667  1.341667    0.000000         Right       2.633333   
5     1.766667    1.333333  0.825000    2.208333          Down       2.208333   
6     0.000000    0.000000  1.108333    5.283333          Down       5.283333   

       top1      top2      top3  
0  1.325000  1.291667  1.141667  
1  1.125000  0.000000  0.000000  
2  2.275000  0.000000  0.000000  
3  1.950000  0.000000  0.000000  
4  1.341667  1.316667  0.000000  
5  1.766667  1.333333  0.825000  
6  1.108333  0.000000  0.000000 

Other way, addressing jezrael's concerns:

col_list = ['lookRight_t', 'lookLeft_t', 'lookUp_t', 'lookDown_t']
idx, cols = pd.factorize('look'   df['decision_resp'].str.title()   '_t')
df['chosen_img_et'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
mask = np.array(col_list) == df['decision_resp'].to_numpy()[:, None]
df[[f'top{x 1}' for x in range(3)]] = np.sort(df[col_list].mask(mask), axis=1)[:, 2::-1]

Output:

   lookRight_t  lookLeft_t  lookUp_t  lookDown_t decision_resp  chosen_img_et  \
0     1.291667    1.325000  3.025000    1.141667            up       3.025000   
1     0.000000    0.000000  1.125000    3.150000          down       3.150000   
2     0.000000    0.000000  3.508333    2.275000            up       3.508333   
3     3.700000    1.950000  0.000000    0.000000         right       3.700000   
4     2.633333    1.316667  1.341667    0.000000         right       2.633333   
5     1.766667    1.333333  0.825000    2.208333          down       2.208333   
6     0.000000    0.000000  1.108333    5.283333          down       5.283333   

       top1      top2      top3  
0  1.325000  1.291667  1.141667  
1  1.125000  0.000000  0.000000  
2  2.275000  0.000000  0.000000  
3  1.950000  0.000000  0.000000  
4  1.341667  1.316667  0.000000  
5  1.766667  1.333333  0.825000  
6  1.108333  0.000000  0.000000  
  • Related