Given the following df:
word1 word2 distance
mango ola 25
mango johnkoo 33
apple ola 25
apple johnkoo 0
I find the two largest values of distance per group in the following way:
res = df.groupby(['word1'])['distance'].nlargest(2)
print(res)
word1
apple 2 25
3 0
mango 1 33
0 25
This is a pandas series with a multindex that contains the index of the position of word2, I would like to have word2 value instead of index , like
word1
apple ola 25
johnkoo 0
mango johnkoo 33
ola 25
print(res,index) gives:
MultiIndex([('apple', 2),
('apple', 3),
('mango', 1),
('mango', 0)],
names=['word1', None])
I have tried using set_levels, but could not figure out the solution.
CodePudding user response:
You can try with
(df.sort_values('distance',ascending=False)
.groupby('word1').head(2).set_index(['word1','word2'])['distance'])
Out[166]:
word1 word2
mango johnkoo 33
ola 25
apple ola 25
johnkoo 0
Name: distance, dtype: int64
CodePudding user response:
Use the positions in the second level from res
to get your desired output:
out = df.loc[res.index.get_level_values(-1)]
out
word1 word2 distance
2 apple ola 25
3 apple johnkoo 0
1 mango johnkoo 33
0 mango ola 25
You can set the index if you prefer:
out.set_index(['word1', 'word2'])
distance
word1 word2
apple ola 25
johnkoo 0
mango johnkoo 33
ola 25
out.set_index(['word1', 'word2']).distance
word1 word2
apple ola 25
johnkoo 0
mango johnkoo 33
ola 25
Name: distance, dtype: int64