Home > database >  How to aggregate DataFrame and drop duplicates based on values in two columns in Python Pandas?
How to aggregate DataFrame and drop duplicates based on values in two columns in Python Pandas?

Time:10-08

I have DataFrame in Python Pandas like below:

ID  | COL1| COL2 | COL3
----------|------|------
123 | XXX | 0    | 1
123 | XXX | 1    | 1
444 | ABC | 1    | 1
444 | ABC | 1    | 1 
555 | PPP | 0    | 0

And I need to drop duplicates in above DF in that way:

  • if in COL2 or COL3 is at least once '1' then should be 1 in these columns for ID (nevermind how often he had 0 in mentioned columns)
  • rest of columns should still be in output
  • In COL1 the is no duplicates per ID

So as a result I need output like below (I have many more columns so in output I need to have not only ID, COL2, COL3, but ID, COL1, COL2, COL3)

ID  | COL1| COL2 | COL3
----|-----|------|-----
123 | XXX | 1    | 1
444 | ABC | 1    | 1
555 | PPP | 0    | 0

How can I do that in Python Pandas ?

CodePudding user response:

Use a groupby.max:

out = df.groupby(['ID', 'COL1'], as_index=False).max()

output:

    ID COL1  COL2  COL3
0  123  XXX     1     1
1  444  ABC     1     1
2  555  PPP     0     0
  • Related