I have DataFrame in Python Pandas like below:
ID | COL1| COL2 | COL3
----------|------|------
123 | XXX | 0 | 1
123 | XXX | 1 | 1
444 | ABC | 1 | 1
444 | ABC | 1 | 1
555 | PPP | 0 | 0
And I need to drop duplicates in above DF in that way:
- if in COL2 or COL3 is at least once '1' then should be 1 in these columns for ID (nevermind how often he had 0 in mentioned columns)
- rest of columns should still be in output
- In COL1 the is no duplicates per ID
So as a result I need output like below (I have many more columns so in output I need to have not only ID, COL2, COL3, but ID, COL1, COL2, COL3)
ID | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 1 | 1
444 | ABC | 1 | 1
555 | PPP | 0 | 0
How can I do that in Python Pandas ?
CodePudding user response:
Use a groupby.max
:
out = df.groupby(['ID', 'COL1'], as_index=False).max()
output:
ID COL1 COL2 COL3
0 123 XXX 1 1
1 444 ABC 1 1
2 555 PPP 0 0