Changing multiindex in a pandas series?

Time:03-04

I have a dataframe like this:

mainid  pidl       pidw   score
  0     Austria    1      533
  1     Canada     2      754
  2     Canada     3      267
  3     Austria    4      852
  4     Taiwan     5      124
  5     Slovakia   6      344
  6     Spain      7      1556
  7     Taiwan     8      127

I want to select the top 5 pidw for each pidl. After grouping on column 'pidl' and taking the highest scores (in descending order) within each group, I got the following series, s:

s= df.set_index(['pidl', 'pidw']).groupby('pidl')['score'].nlargest(5)

pidl      pidl      pidw     score
Austria   Austria   49       948
                    47       859
                    48       855
                    50       807
                    46       727
Belgium   Belgium   15      2339
                    14      1861
                    45      1692
                    16      1626
                    46      1423

Name: score, dtype: float64
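A minimal sketch that reproduces the duplicated level on the sample table above. It assumes `mainid` is just the default integer index and that the table is the whole DataFrame. The printed output above evidently came from larger data (it shows Belgium and pidw values not in the sample), so here each group has at most two rows and `nlargest(5)` returns all of them.

```python
import pandas as pd

# Sample data from the question (the default RangeIndex stands in for mainid)
df = pd.DataFrame({
    "pidl": ["Austria", "Canada", "Canada", "Austria",
             "Taiwan", "Slovakia", "Spain", "Taiwan"],
    "pidw": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [533, 754, 267, 852, 124, 344, 1556, 127],
})

s = df.set_index(["pidl", "pidw"]).groupby("pidl")["score"].nlargest(5)

# groupby prepends the group key to the existing (pidl, pidw) index,
# so 'pidl' now appears twice
print(list(s.index.names))  # ['pidl', 'pidl', 'pidw']
```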

The result looks correct, but I would like to remove the second (duplicated) 'pidl' level from this series.

I have tried

s.reset_index('pidl') 

but it raises 'ValueError: The name location occurs multiple times, use a level number'.

and

s.to_frame().reset_index()

which raises 'ValueError: cannot insert pidl, already exists.'

So I am not sure how to proceed.
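The first error message already hints at a way out: refer to the level by position instead of by name. The sketch below is not from the original answer; it uses the standard `reset_index(level=..., drop=True)` call on the sample data (reconstructed from the table above, with `mainid` assumed to be the default index). `drop=True` discards the outer level instead of trying to insert it as a column, which is what triggered the second error.

```python
import pandas as pd

df = pd.DataFrame({
    "pidl": ["Austria", "Canada", "Canada", "Austria",
             "Taiwan", "Slovakia", "Spain", "Taiwan"],
    "pidw": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [533, 754, 267, 852, 124, 344, 1556, 127],
})
s = df.set_index(["pidl", "pidw"]).groupby("pidl")["score"].nlargest(5)

# Level 0 is the copy of 'pidl' added by groupby; drop it by position
# because the name 'pidl' is ambiguous here
fixed = s.reset_index(level=0, drop=True)
print(list(fixed.index.names))  # ['pidl', 'pidw']

# With only one 'pidl' level left, flattening to a DataFrame no longer clashes
flat = fixed.reset_index()
print(flat.columns.tolist())  # ['pidl', 'pidw', 'score']
```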

Answer:

Pass the group_keys=False parameter to DataFrame.groupby, so the group key is not prepended to the index:

s= df.set_index(['pidl', 'pidw']).groupby('pidl', group_keys=False)['score'].nlargest(5)
print(s)
pidl      pidw
Austria   4        852
          1        533
Canada    2        754
          3        267
Slovakia  6        344
Spain     7       1556
Taiwan    8        127
          5        124
Name: score, dtype: int64
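A small end-to-end sketch of this approach on the sample table (reconstructed from the question, with `mainid` assumed to be the default index). Because only one 'pidl' level remains, the `to_frame().reset_index()` call the asker tried now produces a flat DataFrame without the "already exists" error.

```python
import pandas as pd

df = pd.DataFrame({
    "pidl": ["Austria", "Canada", "Canada", "Austria",
             "Taiwan", "Slovakia", "Spain", "Taiwan"],
    "pidw": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [533, 754, 267, 852, 124, 344, 1556, 127],
})

# group_keys=False stops groupby from prepending the group key to the index
s = (
    df.set_index(["pidl", "pidw"])
    .groupby("pidl", group_keys=False)["score"]
    .nlargest(5)
)
print(list(s.index.names))  # ['pidl', 'pidw']

# The conversion from the question now works: columns pidl, pidw, score
top = s.to_frame().reset_index()
print(top)
```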

Alternatively, chain Series.droplevel to remove the first level (pandas numbers levels from 0, so pass 0):

s= df.set_index(['pidl', 'pidw']).groupby('pidl')['score'].nlargest(5).droplevel(0)
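A quick sketch, on the sample data reconstructed from the question, checking that the two approaches in this answer give the same series. `pd.testing.assert_series_equal` raises if values, index or names differ, so the script finishing silently means they match.

```python
import pandas as pd

df = pd.DataFrame({
    "pidl": ["Austria", "Canada", "Canada", "Austria",
             "Taiwan", "Slovakia", "Spain", "Taiwan"],
    "pidw": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [533, 754, 267, 852, 124, 344, 1556, 127],
})
indexed = df.set_index(["pidl", "pidw"])

# Default group_keys: index is (pidl, pidl, pidw); drop the outer copy at position 0
via_droplevel = indexed.groupby("pidl")["score"].nlargest(5).droplevel(0)

# group_keys=False: the group key is never added in the first place
via_group_keys = indexed.groupby("pidl", group_keys=False)["score"].nlargest(5)

pd.testing.assert_series_equal(via_droplevel, via_group_keys)
print(list(via_droplevel.index.names))  # ['pidl', 'pidw']
```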