I have the following dataframe:
import pandas as pd
df1 = pd.DataFrame({'A' : [1,2,3], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : [1,1,1]})
df2 = pd.DataFrame({'A' : [1 + 1, 2 + 1, 3 + 1], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : [2,2,2]})
df3 = pd.DataFrame({'A' : [1 + 3, 2 + 3, 3 + 3], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : [3,3,3]})
df = pd.concat([df1, df2, df3], axis = 0, ignore_index = True)
Now I want to create an aggregated dataframe from df, combining the rows for the distinct values in column 'D'. The aggregation applies to the numerical column 'A' with the following rule: for matching values of columns 'B' and 'C', the final value of column 'A' will be 0.4 * 1 + 0.5 * 2 + 0.5 * 4, 0.4 * 2 + 0.5 * 3 + 0.5 * 5, and 0.4 * 3 + 0.5 * 4 + 0.5 * 6. Here, the numbers 0.4, 0.5, and 0.5 are fixed and can be treated as the weights for D = 1, 2, and 3, respectively.
My final dataframe should look like this:
>>> pd.DataFrame({'A' : [0.4 * 1 + 0.5 * 2 + 0.5 * 4, 0.4 * 2 + 0.5 * 3 + 0.5 * 5, 0.4 * 3 + 0.5 * 4 + 0.5 * 6], 'B' : ['X', 'Y', 'Z'], 'C' : ['XX', 'YY', 'ZZ'], 'D' : ['Aggregated', 'Aggregated', 'Aggregated']})
     A  B   C           D
0  3.4  X  XX  Aggregated
1  4.8  Y  YY  Aggregated
2  6.2  Z  ZZ  Aggregated
My actual dataframe has many subgroups, so I am looking for an automated way to do this.
Is there a method or function available for it?
CodePudding user response:
To get the desired output you could do something like this:
# A lambda would also work, but a named function reads more cleanly.
def weights(grp):
    # Assumes exactly three rows per group, ordered D = 1, 2, 3.
    val1, val2, val3 = grp
    return 0.4*val1 + 0.5*val2 + 0.5*val3

# The aggregations on B and D are just examples; change them to whatever you like.
df.groupby('C').agg({'A': weights, 'B': 'first', 'D': lambda x: 'aggregated'}).reset_index()
Output:
    C    A  B           D
0  XX  3.4  X  aggregated
1  YY  4.8  Y  aggregated
2  ZZ  6.2  Z  aggregated
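Note that the function above unpacks exactly three values per group and relies on the rows being ordered by 'D'. If the real data has a different number of rows per subgroup or a different row order, one possible alternative is to look up the weight from 'D' on each row and then sum per group. The following is only a sketch under assumptions: the name `weight_by_d` and the mapping of D = 1, 2, 3 to 0.4, 0.5, 0.5 are inferred from the question, and grouping on both 'B' and 'C' follows the rule stated there rather than the answer's `groupby('C')`.

```python
import pandas as pd

# Rebuild the sample data from the question.
base = pd.DataFrame({'A': [1, 2, 3], 'B': ['X', 'Y', 'Z'], 'C': ['XX', 'YY', 'ZZ']})
df = pd.concat(
    [
        base.assign(D=1),
        base.assign(A=base['A'] + 1, D=2),
        base.assign(A=base['A'] + 3, D=3),
    ],
    ignore_index=True,
)

# Assumption: each distinct value of 'D' has one fixed weight (hypothetical name).
weight_by_d = {1: 0.4, 2: 0.5, 3: 0.5}

result = (
    # Multiply each row's 'A' by the weight that belongs to its 'D' value.
    df.assign(A=df['A'] * df['D'].map(weight_by_d))
      # Add up the weighted values for each (B, C) combination.
      .groupby(['B', 'C'], as_index=False)['A']
      .sum()
      # Label the combined rows.
      .assign(D='Aggregated')
)

# Put the columns in the same order as the desired output.
result = result[['A', 'B', 'C', 'D']]
print(result)
```

For the sample data this gives 3.4, 4.8, and 6.2 in column 'A', up to floating-point rounding. A 'D' value missing from `weight_by_d` maps to NaN, and `sum()` skips NaN by default, so such rows would silently drop out of the total; it may be worth checking for unmapped values first.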