Home > Blockchain >  groupby and sum on specific colums based on mapping
groupby and sum on specific colums based on mapping

Time:01-14

Similar to a previous post

I'm looking for how can I groupby with an efficient way based on mapping

I have a df like below:

when value is <30 its low when value is between 30-70 its medium and high when >70

   |col1|col2|col3|col1_cat|col2_cat|col3_cat
----------------------------------------------------
0  |29  |80  |56  |low     |high    |medium
1  |19  |50  |88  |low     |medium  |high

col1 is mapped to col1_cat and so on how can I groupby sum all columns on the groups at once provided that I have a mapping like below in another dataframe

name | group
-------------
col1 | col1_cat
col2 | col2_cat
col3 | col3_cat

required final output like below:

cat     | col1_sum | col2_sum | col3_sum
_______________________________________
high    |          |  80      | 88
medium  |          |  50      | 56
low     | 48       |          | 

Tried previous answer to a similar question

CodePudding user response:

I think what you would like to achieve can be accomplished in the folliowing way:

df = pd.DataFrame({
    'col1': [10,20,50,20,30,140,50,80,20],
    'col2': [40,60,10,70,110,10,340,10, 50]
})

def map_col(x):
    if x<30:
        return 'low'
    elif x<70:
        return 'medium'
    else:
        return 'high'

df['col1_cat'] = df['col1'].map(map_col)
df['col2_cat'] = df['col2'].map(map_col)

and then you groupby twice:

df.groupby(['col1_cat', 'col2_cat']).agg('sum').groupby('col2_cat').agg('sum')

CodePudding user response:

If you have the mapping dataframe in mapping_df you can try:

mapping_dict = mapping_df.set_index('name')['group'].to_dict()

def fn(x):
    rv = {}
    for k, v in mapping_dict.items():
        rv[(k, x[v])] =  rv.get((k, x[v]), 0)   x[k]
    return pd.Series(rv)


df_out = df.apply(fn, axis=1).sum().unstack(level=0).add_suffix('_sum')
print(df_out)

Prints:

        col1_sum  col2_sum  col3_sum
high         NaN      80.0      88.0
low         48.0       NaN       NaN
medium       NaN      50.0      56.0
  • Related