How to average multiindex row in Pandas-CodePudding

The objective is to average the first level of a multi index row.

For example, the task it to average the rows (s1,s2) and (s1,s3).

Given the following df

          a  fe  gg new_text
(s1, s2)  4   0   3        t
(s1, s3)  3   3   1        t
(s2, s3)  3   2   4        t
(s2, s4)  0   0   4        t
(s3, s1)  2   1   0        t
(s3, s4)  1   1   0        t

The expected output is as below

    a  fe  gg new_text
s1  7   3   4 t      
s2  3   2   8 t      
s3  3   3   0 t

I tried using the following syntax

df.groupby(level=0).agg(['mean'])

Which produced undersired output

            a   fe   gg
         mean mean mean
(s1, s2)  4.0  0.0  3.0
(s1, s3)  3.0  3.0  1.0
(s2, s3)  3.0  2.0  4.0
(s2, s4)  0.0  0.0  4.0
(s3, s1)  2.0  1.0  0.0
(s3, s4)  1.0  1.0  0.0

May I know how to address this problem.

The output can be reproduced using the following codes

import pandas as pd
import numpy as np
np.random.seed(0)

arr=np.random.randint(5, size=(6, 3))

df = pd.DataFrame(data=arr, index=[('s1','s2'),('s1','s3'),('s2','s3'),('s2','s4'),('s3','s1'),('s3','s4')],
                  columns=['a','fe','gg'])
df['new_text']='t'
df2=df.groupby(level=0).agg(['mean'])

CodePudding user response：

I think you mean sum and not mean.

agg_dict = {'a':'sum', 'fe':'sum', 'gg':'sum', 'new_text':'first'}
out = df.groupby([m[0] for m in df.index]).agg(agg_dict)
print(out)

    a  fe  gg new_text
s                     
s1  7   3   4        t
s2  3   2   8        t
s3  3   2   0        t

CodePudding user response：

i Hope that help

a=df.index.values.tolist() #get value index 
l=[]
for i in range(len(df)):
    l.append(str(a[i]).split("'")[1])

df['new_id']=l
df.groupby("new_id")['a','fe','gg'].sum()

CodePudding user response：

You have used tuples as the entries in the index for your DataFrame, but you haven't used a MultiIndex.

If you use a MultiIndex, you can use xs() to select the subset of entries you want to calculate the mean:

df.index = pd.MultiIndex.from_arrays([[c[0] for c in df.index], [c[1] for c in df.index]])

With the index converted to a MultiIndex, the DataFrame now prints like so:

       a  fe  gg
s1 s2  4   1   4
   s3  1   1   1
s2 s3  4   2   3
   s4  2   2   4
s3 s1  1   4   3
   s4  2   3   1

Now we can pick out the rows we want to use for the calculation with xs(), in this case all rows with 's1' in the 0th level of the MultiIndex:

df.xs('s1').mean(axis=0)

Result:

a     2.5
fe    1.0
gg    2.5
dtype: float64

You can also use groupby as you had expected, now that you have a proper MultiIndex:

df.groupby(level=0).mean()

Result:

      a   fe   gg
s1  2.5  1.0  2.5
s2  3.0  2.0  3.5
s3  1.5  3.5  2.0

CodePudding user response：

df.index = pd.MultiIndex.from_tuples(df.index.tolist())
df.groupby(level=0).agg({'a': 'sum', 'fe': 'sum', 'gg': 'sum', 'new_text': 'first'})

produces

    a  fe  gg new_text
s1  7   3   4 t      
s2  3   2   8 t      
s3  3   3   0 t