Similar to a previous post
I'm looking for how can I groupby with an efficient way based on mapping
I have a df like below:
when value is <30 its low when value is between 30-70 its medium and high when >70
|col1|col2|col3|col1_cat|col2_cat|col3_cat
----------------------------------------------------
0 |29 |80 |56 |low |high |medium
1 |19 |50 |88 |low |medium |high
col1 is mapped to col1_cat and so on how can I groupby sum all columns on the groups at once provided that I have a mapping like below in another dataframe
name | group
-------------
col1 | col1_cat
col2 | col2_cat
col3 | col3_cat
required final output like below:
cat | col1_sum | col2_sum | col3_sum
_______________________________________
high | | 80 | 88
medium | | 50 | 56
low | 48 | |
Tried previous answer to a similar question
CodePudding user response:
I think what you would like to achieve can be accomplished in the folliowing way:
df = pd.DataFrame({
'col1': [10,20,50,20,30,140,50,80,20],
'col2': [40,60,10,70,110,10,340,10, 50]
})
def map_col(x):
if x<30:
return 'low'
elif x<70:
return 'medium'
else:
return 'high'
df['col1_cat'] = df['col1'].map(map_col)
df['col2_cat'] = df['col2'].map(map_col)
and then you groupby twice:
df.groupby(['col1_cat', 'col2_cat']).agg('sum').groupby('col2_cat').agg('sum')
CodePudding user response:
If you have the mapping dataframe in mapping_df
you can try:
mapping_dict = mapping_df.set_index('name')['group'].to_dict()
def fn(x):
rv = {}
for k, v in mapping_dict.items():
rv[(k, x[v])] = rv.get((k, x[v]), 0) x[k]
return pd.Series(rv)
df_out = df.apply(fn, axis=1).sum().unstack(level=0).add_suffix('_sum')
print(df_out)
Prints:
col1_sum col2_sum col3_sum
high NaN 80.0 88.0
low 48.0 NaN NaN
medium NaN 50.0 56.0