get the named index in nlargest operation in pandas-CodePudding

Given the following df:

   word1    word2  distance
   mango      ola        25
   mango  johnkoo        33
   apple      ola        25
   apple  johnkoo         0

I find the two largest values of distance per group in the following way:

res = df.groupby(['word1'])['distance'].nlargest(2)
print(res)

word1   
apple  2    25
       3     0
mango  1    33
       0    25

This is a pandas series with a multindex that contains the index of the position of word2, I would like to have word2 value instead of index , like

word1   
apple  ola    25
       johnkoo    0
mango  johnkoo   33
       ola    25

print(res,index) gives:

MultiIndex([('apple', 2),
            ('apple', 3),
            ('mango', 1),
            ('mango', 0)],
           names=['word1', None])

I have tried using set_levels, but could not figure out the solution.

CodePudding user response：

You can try with

(df.sort_values('distance',ascending=False)
   .groupby('word1').head(2).set_index(['word1','word2'])['distance'])
Out[166]: 
word1  word2  
mango  johnkoo    33
       ola        25
apple  ola        25
       johnkoo     0
Name: distance, dtype: int64

CodePudding user response：

Use the positions in the second level from res to get your desired output:

out = df.loc[res.index.get_level_values(-1)]
out 
   word1    word2  distance
2  apple      ola        25
3  apple  johnkoo         0
1  mango  johnkoo        33
0  mango      ola        25

You can set the index if you prefer:

out.set_index(['word1', 'word2'])
               distance
word1 word2            
apple ola            25
      johnkoo         0
mango johnkoo        33
      ola            25

out.set_index(['word1', 'word2']).distance 
word1  word2  
apple  ola        25
       johnkoo     0
mango  johnkoo    33
       ola        25
Name: distance, dtype: int64