Home > OS >  Pandas simplify adding aggregate columns (average, sum, count, max) in one step like with groupby.ag
Pandas simplify adding aggregate columns (average, sum, count, max) in one step like with groupby.ag

Time:10-13

I have some data similar to:

df = pd.DataFrame({'ID':[1, 2, 3, 4, 5],
                   '1' :[7, 8, 2, 3, 0],
                   '2' :[1, 0, 4, 9, 9],
                   '3' :[3, 1, 1, 6, 8]})

   ID  1  2  3
0   1  7  1  3
1   2  8  0  1
2   3  2  4  1
3   4  3  9  6
4   5  0  9  8

and I want to add an Av, Sum, Count and Max column for each row for the values in columns 1, 2 and 3. Currently I am doing this like this:

cols = ['1', '2', '3']
df['Av'] = df[cols].mean(axis=1)
df['Sum'] = df[cols].sum(axis=1).fillna(0)
df['Count'] = df[cols].count(axis=1).fillna(0)
df['Max'] = df[cols].max(axis=1).fillna(0)

   ID  1  2  3        Av  Sum  Count  Max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1  3.000000    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6  6.000000   18      3    9
4   5  0  9  8  5.666667   17      3    9

I was wondering if there is a way to do this neater/in 1 or 2 lines like with groupby().agg()?

CodePudding user response:

Use DataFrame.agg with list of functions:

df = df.join(df[cols].agg(['mean','sum','count','max'], axis=1).fillna(0).convert_dtypes())
print (df)
   ID  1  2  3      mean  sum  count  max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1       3.0    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6       6.0   18      3    9
4   5  0  9  8  5.666667   17      3    9

With rename columns:

d = {'mean':'Av', 'sum':'Sum','count':'Count', 'max':'Max'}
df = (df.join(df[cols].agg(list(d.keys()), axis=1).fillna(0)
                                  .convert_dtypes().rename(columns=d)))
print (df)
   ID  1  2  3        Av  Sum  Count  Max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1       3.0    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6       6.0   18      3    9
4   5  0  9  8  5.666667   17      3    9

Another idea:

d = {'mean':'Av', 'sum':'Sum','count':'Count', 'max':'Max'}
df = df.join(pd.concat({v: df[cols].agg(k, axis=1).fillna(0) 
                                      for k, v in d.items()}, axis=1))
print (df)
   ID  1  2  3        Av  Sum  Count  Max
0   1  7  1  3  3.666667   11      3    7
1   2  8  0  1  3.000000    9      3    8
2   3  2  4  1  2.333333    7      3    4
3   4  3  9  6  6.000000   18      3    9
4   5  0  9  8  5.666667   17      3    9
  • Related