Home > Mobile >  pandas aggregate multiple columns during pivot_table
pandas aggregate multiple columns during pivot_table

Time:09-01

Have a dataframe like this:

df = pd.DataFrame((['1990-01-01','A','S1','2','string2','string3'],
 ['1990-01-01','A','S2','1','string1','string4'],
 ['1990-01-01','A','S3','1','string5','string6']),columns= 
 ["date","type","status","count","s1","s2")


           date type status count       s1       s2
 0  1990-01-01    A     S1     2  string2  string3
 1  1990-01-01    A     S2     1  string1  string4
 2  1990-01-01    A     S3     1  string5  string6
 ...

I want to get bellow result (each date and each type should have single row, and get the min of s1 column, get max of s2 column)

date             type       S1    S2   S3    min_s1        max_s2
1990-01-01       A           2     1   1     string1      string6

I tried to use pivot_table

df.pivot_table(index=['date','type'],columns=['status'],values=['count','s1','s2'], aggfunc={
'count':np.sum, 
's1': np.min,
's2': np.max
})

But this would only get bellow result, which leads to multiple columns and not the final result.

                count             s1                         s2
status             S1 S2 S3       S1       S2       S3       S1       S2       S3
date       type
1990-01-01 A        2  1  1  string2  string1  string5  string3  string4  string6

Anyone idea? Thanks.

CodePudding user response:

Looks like you want to combine a pivot and groupby.agg:

(df.pivot(index=['date','type'],columns='status', values='count')
   .join(df.groupby(['date', 'type']).agg({'s1': 'min', 's2': 'max'}))
   .reset_index()
)

output:

         date type S1 S2 S3       s1       s2
0  1990-01-01    A  2  1  1  string1  string6
  • Related