How to sum all values from rows by string condition?
Original DF:
schema | numb |
---|---|
tbc_hh | 1751732 |
tbc_hh | 1084173 |
tbc_rr | 1751732 |
tbc_rr | 1084173 |
tbc_zz | 1751732 |
tbc_zz | 1084173 |
DF expect to get:
schema | numb |
---|---|
tbc_hh | 1751732 |
tbc_hh | 1084173 |
tbc_hh_total | 2835905 |
tbc_rr | 1751733 |
tbc_rr | 1084174 |
tbc_rr_sum | 2835907 |
tbc_zz | 1751734 |
tbc_zz | 1084175 |
tbc_zz_sum | 2835909 |
CodePudding user response:
You can do:
df.merge(
df.groupby('schema',as_index=False).sum().assign(
schema=df['schema'].map('{}_total'.format)),
how = 'outer'
).sort_values('schema')
print(df_out):
schema numb
0 tbc_hh 1751732
1 tbc_hh 1084173
6 tbc_hh_total 2835905
2 tbc_rr 1751732
3 tbc_rr 1084173
7 tbc_rr_total 2835905
4 tbc_zz 1751732
5 tbc_zz 1084173
8 tbc_zz_total 2835905
CodePudding user response:
You can try
out = (df.groupby('schema', as_index=False)
.apply(lambda g: pd.concat([g,
pd.DataFrame([[g.name '_total', g['numb'].sum()]],
columns=g.columns)]))
.reset_index(drop=True))
print(out)
schema numb
0 tbc_hh 1751732
1 tbc_hh 1084173
2 tbc_hh_total 2835905
3 tbc_rr 1751732
4 tbc_rr 1084173
5 tbc_rr_total 2835905
6 tbc_zz 1751732
7 tbc_zz 1084173
8 tbc_zz_total 2835905