I have a dataframe that looks like this:
ID | Name | app_A | app_B | app_C | Total |
---|---|---|---|---|---|
0001 | John Smith | 0 | 5 | 3 | 8 |
0002 | Peter Piper | 2 | 0 | 3 | 5 |
0003 | Susan Jones | 1 | 0 | 0 | 7 |
0003 | Susan Jones | 0 | 2 | 4 | 7 |
... | ... | ... | ... | ... | ... |
You can see that there are some Duplicate IDs (and Names), and in these individual rows, the Totals are not correct, but if you were to add the columns together from both rows, then the total would be correct. How can I merge the duplicate rows so that I do not have duplicate IDs, and that the totals are correct.
The desired output is like below:
ID | Name | app_A | app_B | app_C | Total |
---|---|---|---|---|---|
0001 | John Smith | 0 | 5 | 3 | 8 |
0002 | Peter Piper | 2 | 0 | 3 | 5 |
0003 | Susan Jones | 1 | 2 | 4 | 7 |
... | ... | ... | ... | ... | ... |
CodePudding user response:
Simple..aggregate app
like columns with sum
and Total
with first
c = df.filter(like='app_')
df.groupby(['ID', 'Name']).agg({**dict.fromkeys(c, 'sum'), 'Total': 'first'})
Result
app_A app_B app_C Total
ID Name
0001 John Smith 0 5 3 8
0002 Peter Piper 2 0 3 5
0003 Susan Jones 1 2 4 7