Home > Blockchain >  Pandas SUM value by Index
Pandas SUM value by Index

Time:08-06

How to sum all values from rows by string condition?

Original DF:

schema numb
tbc_hh 1751732
tbc_hh 1084173
tbc_rr 1751732
tbc_rr 1084173
tbc_zz 1751732
tbc_zz 1084173

DF expect to get:

schema numb
tbc_hh 1751732
tbc_hh 1084173
tbc_hh_total 2835905
tbc_rr 1751733
tbc_rr 1084174
tbc_rr_sum 2835907
tbc_zz 1751734
tbc_zz 1084175
tbc_zz_sum 2835909

CodePudding user response:

You can do:

    df.merge(
        df.groupby('schema',as_index=False).sum().assign(
            schema=df['schema'].map('{}_total'.format)),
        how = 'outer'
    ).sort_values('schema')

print(df_out):

         schema     numb
0        tbc_hh  1751732
1        tbc_hh  1084173
6  tbc_hh_total  2835905
2        tbc_rr  1751732
3        tbc_rr  1084173
7  tbc_rr_total  2835905
4        tbc_zz  1751732
5        tbc_zz  1084173
8  tbc_zz_total  2835905

CodePudding user response:

You can try

out = (df.groupby('schema', as_index=False)
       .apply(lambda g: pd.concat([g,
                                   pd.DataFrame([[g.name '_total', g['numb'].sum()]],
                                                columns=g.columns)]))
       .reset_index(drop=True))
print(out)

         schema     numb
0        tbc_hh  1751732
1        tbc_hh  1084173
2  tbc_hh_total  2835905
3        tbc_rr  1751732
4        tbc_rr  1084173
5  tbc_rr_total  2835905
6        tbc_zz  1751732
7        tbc_zz  1084173
8  tbc_zz_total  2835905
  • Related