Home > Enterprise >  Sum specific columns in dataframe with multi index
Sum specific columns in dataframe with multi index

Time:06-18

I have a dataframe (df) with a multi index

Class   A       B   
Sex    M    F   M   F
Group               
B1    81    34  55  92
B2    38     3  25  11
B3    73    71  69  79
B4    69    23  27  96
B5     5     1  33  28
B6    40    81  91  87

I am trying to create 2 sum columns (one for M and one for F) so my output would look like:

Class   A        B      Total   
Sex    M    F    M   F  Total M Total F
Group                       
B1    81    34  55  92    136       126
B2    38     3  25  11     63        14
B3    73    71  69  79    142       150
B4    69    23  27  96     96       119
B5     5     1  33  28     38        29
B6    40    81  91  87    131       168

I have tried :

df['Total M'] = df['M'].sum(axis=1)
df['Total F'] = df['F'].sum(axis=1)

without success

CodePudding user response:

You can try

df[('Total', 'Total M')] = df.xs('M', level=1, axis=1).sum(axis=1)
df[('Total', 'Total F')] = df.xs('F', level=1, axis=1).sum(axis=1)

# or in a for loop

for col in ['M', 'F']:
    df[('Total', f'Total {col}')] = df.xs(col, level=1, axis=1).sum(axis=1)
print(df)

     A       B       Total
     M   F   M   F Total M Total F
B1  81  34  55  92     136     126
B2  38   3  25  11      63      14
B3  73  71  69  79     142     150
B4  69  23  27  96      96     119
B5   5   1  33  28      38      29
B6  40  81  91  87     131     168

CodePudding user response:

use the loc accessor. Code below

df[('Total', 'Total M')] = df.loc(axis=1)[:, ['M']].sum(axis=1)
df[('Total', 'Total F')]= df.loc(axis=1)[:, ['F']].sum(axis=1)

CodePudding user response:

You can use pandas.DataFrame.sum with inputing level like below:

df[('Total', 'Total M')] = df.sum(level=1, axis=1)['M']
df[('Total', 'Total F')] = df.sum(level=1, axis=1)['F']

CodePudding user response:

Here's an alternative approach using groupby. Probably overkill for only 2 level groups, but should scale well in scenarios where there are more.

totals = df.groupby(axis=1, level=1).sum()
totals.columns = pd.MultiIndex.from_product([['Total'], totals.columns])

df = df.join(totals)

[out]

Class   A       B     Total     
Sex     M   F   M   F     F    M
Group                           
B1     81  34  55  92   126  136
B2     38   3  25  11    14   63
B3     73  71  69  79   150  142
B4     69  23  27  96   119   96
B5      5   1  33  28    29   38
B6     40  81  91  87   168  131
  • Related