Home > OS >  How to merge semi-duplicated rows in a Dataframe
How to merge semi-duplicated rows in a Dataframe

Time:09-16

I have a dataframe that looks like this:

ID Name app_A app_B app_C Total
0001 John Smith 0 5 3 8
0002 Peter Piper 2 0 3 5
0003 Susan Jones 1 0 0 7
0003 Susan Jones 0 2 4 7
... ... ... ... ... ...

You can see that there are some Duplicate IDs (and Names), and in these individual rows, the Totals are not correct, but if you were to add the columns together from both rows, then the total would be correct. How can I merge the duplicate rows so that I do not have duplicate IDs, and that the totals are correct.

The desired output is like below:

ID Name app_A app_B app_C Total
0001 John Smith 0 5 3 8
0002 Peter Piper 2 0 3 5
0003 Susan Jones 1 2 4 7
... ... ... ... ... ...

CodePudding user response:

Simple..aggregate app like columns with sum and Total with first

c = df.filter(like='app_')
df.groupby(['ID', 'Name']).agg({**dict.fromkeys(c, 'sum'), 'Total': 'first'})

Result

                  app_A  app_B  app_C  Total
ID   Name                                   
0001 John Smith       0      5      3      8
0002 Peter Piper      2      0      3      5
0003 Susan Jones      1      2      4      7
  • Related