How to merge semi-duplicated rows in a Dataframe-CodePudding

I have a dataframe that looks like this:

ID	Name	app_A	app_B	app_C	Total
0001	John Smith	0	5	3	8
0002	Peter Piper	2	0	3	5
0003	Susan Jones	1	0	0	7
0003	Susan Jones	0	2	4	7
...	...	...	...	...	...

You can see that there are some Duplicate IDs (and Names), and in these individual rows, the Totals are not correct, but if you were to add the columns together from both rows, then the total would be correct. How can I merge the duplicate rows so that I do not have duplicate IDs, and that the totals are correct.

The desired output is like below:

ID	Name	app_A	app_B	app_C	Total
0001	John Smith	0	5	3	8
0002	Peter Piper	2	0	3	5
0003	Susan Jones	1	2	4	7
...	...	...	...	...	...

CodePudding user response：

Simple..aggregate app like columns with sum and Total with first

c = df.filter(like='app_')
df.groupby(['ID', 'Name']).agg({**dict.fromkeys(c, 'sum'), 'Total': 'first'})

Result

                  app_A  app_B  app_C  Total
ID   Name                                   
0001 John Smith       0      5      3      8
0002 Peter Piper      2      0      3      5
0003 Susan Jones      1      2      4      7