why does sum from multiIndex dataframe drop target column?

Time:07-06

I have the following MultiIndex DataFrame:

offset A B C
0      0 0 100
         1 200
         2 300
         3 400
       1 0 10
         1 20
         2 30
         3 40
...

To group by A and sum the values of C, I run:

df.droplevel('B').sum(level = [0, 1], axis = 0)

The expected output is:

offset A C
0      0 1000
0      1 100
...

However, the output is (column C is discarded):

offset A
0      0
       1
1      0
       1
...

What am I doing wrong, and why is column C discarded?

CodePudding user response:

C is also a level of your MultiIndex, so the frame has no data columns and nothing is actually summed. If you change your code to df.droplevel('B').sum(level = [0, 1, 2], axis = 0), you will see C as an index level; it is not discarded.
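A quick way to check this is to look at the index names and the columns. The sketch below rebuilds the frame from the values shown in the question; the construction itself is an assumption, since the question does not show how df was created.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame: every field,
# including C, is an index level, so there are no data columns.
index = pd.MultiIndex.from_tuples(
    [(0, 0, b, c) for b, c in enumerate([100, 200, 300, 400])]
    + [(0, 1, b, c) for b, c in enumerate([10, 20, 30, 40])],
    names=["offset", "A", "B", "C"],
)
df = pd.DataFrame(index=index)

print(list(df.index.names))  # C appears among the index levels
print(list(df.columns))      # empty: there is nothing to sum
```

If C shows up in the index names and the column list is empty, any aggregation can only return the grouping keys.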

Running your code also raises a warning for me:

FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.sum(level=1) should use df.groupby(level=1).sum()

For example, you could do it like this:

res = df.reset_index(level=-1).groupby(level=[0,1]).sum()
print(res)

             C
offset A      
0      0  1000
       1   100

After reset_index, C is a regular column and no longer part of the index.
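For an end-to-end check, here is a minimal sketch of the same approach on a frame rebuilt from the question's values. How the original df was constructed is an assumption, and the levels are referred to by name rather than by position purely for readability.

```python
import pandas as pd

# Hypothetical reconstruction of the question's frame, with C stored
# as an index level rather than as a data column.
index = pd.MultiIndex.from_tuples(
    [(0, 0, b, c) for b, c in enumerate([100, 200, 300, 400])]
    + [(0, 1, b, c) for b, c in enumerate([10, 20, 30, 40])],
    names=["offset", "A", "B", "C"],
)
df = pd.DataFrame(index=index)

# Move C out of the index so it becomes a column that can be summed,
# then group by the remaining levels offset and A.
res = df.reset_index(level="C").groupby(level=["offset", "A"]).sum()
print(res)
```

Because B stays in the index and is not one of the grouping levels, it drops out of the result without a separate droplevel call.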
