Custom aggregation of pandas dataframe

Time:08-21

I have the following dataframe:

import pandas as pd
df1 = pd.DataFrame({'A' : [1,2,3], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : [1,1,1]})
df2 = pd.DataFrame({'A' : [1 + 1, 2 + 1, 3 + 1], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : [2,2,2]})
df3 = pd.DataFrame({'A' : [1 + 3, 2 + 3, 3 + 3], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : [3,3,3]})
df = pd.concat([df1, df2, df3], axis = 0, ignore_index = True)

Now I want to create an aggregated dataframe from df, combining the rows for the distinct values in column 'D'. The aggregation applies to the numerical column 'A', with the following rule:

For matching values of columns 'B' and 'C', the final value of column 'A' will be

0.4 * 1 + 0.5 * 2 + 0.5 * 4, 0.4 * 2 + 0.5 * 3 + 0.5 * 5, and 0.4 * 3 + 0.5 * 4 + 0.5 * 6

Here, the numbers 0.4, 0.5, and 0.5 are fixed and can be treated as weights for D = 1, 2, and 3 respectively.

My final dataframe should look like this:

>>> pd.DataFrame({'A' : [0.4 * 1 + 0.5 * 2 + 0.5 * 4, 0.4 * 2 + 0.5 * 3 + 0.5 * 5, 0.4 * 3 + 0.5 * 4 + 0.5 * 6], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : ['Aggregated', 'Aggregated', 'Aggregated']})
     A  B   C           D
0  3.4  X  XX  Aggregated
1  4.8  Y  YY  Aggregated
2  6.2  Z  ZZ  Aggregated

I have many subgroups in my actual dataframe, so I am looking for an automated way to achieve this.

Is there a method or function available for this?

CodePudding user response:

To get the desired output you could do something like this:

# A lambda is also possible, but a named function looks a bit cleaner.
# Note: this assumes each group has exactly three rows, ordered by D.
def weights(grp):
    val1, val2, val3 = grp
    return 0.4*val1 + 0.5*val2 + 0.5*val3

# The aggregations on B and D are just examples; change them to whatever you like.
df.groupby('C').agg({'A':weights, 'B':'first', 'D':lambda x: 'aggregated'}).reset_index()

Output:

    C    A  B           D
0  XX  3.4  X  aggregated
1  YY  4.8  Y  aggregated
2  ZZ  6.2  Z  aggregated
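
Since the question mentions many subgroups, a variation that does not depend on row order or on each group having exactly three rows might be useful. The sketch below is not part of the answer above. It assumes the weights belong to the values of 'D' (0.4 for D = 1, 0.5 for D = 2 and 3, as the question's arithmetic implies), and the name `weight_by_d` is made up for illustration.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': ['X', 'Y', 'Z'], 'C': ['XX', 'YY', 'ZZ'], 'D': [1, 1, 1]})
df2 = pd.DataFrame({'A': [1 + 1, 2 + 1, 3 + 1], 'B': ['X', 'Y', 'Z'], 'C': ['XX', 'YY', 'ZZ'], 'D': [2, 2, 2]})
df3 = pd.DataFrame({'A': [1 + 3, 2 + 3, 3 + 3], 'B': ['X', 'Y', 'Z'], 'C': ['XX', 'YY', 'ZZ'], 'D': [3, 3, 3]})
df = pd.concat([df1, df2, df3], axis=0, ignore_index=True)

# Assumption: one fixed weight per distinct value of D (hypothetical name).
weight_by_d = {1: 0.4, 2: 0.5, 3: 0.5}

result = (
    df.assign(A=df['A'] * df['D'].map(weight_by_d))  # weight each row by its D value
      .groupby(['B', 'C'], as_index=False)['A']      # group on the matching columns
      .sum()                                         # weighted sum per (B, C) pair
      .assign(D='Aggregated')                        # label the aggregated rows
)

# Restore the column order used in the question.
result = result[['A', 'B', 'C', 'D']]
print(result)
```

Because each row is weighted before grouping, the result does not change if the rows are shuffled. A value of 'D' missing from `weight_by_d` would map to NaN, and `sum()` skips NaN by default, so that row would be dropped from the total without any warning. It is worth checking for unmapped values first.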