Home > other >  Drop group if another column has duplicate values - pandas dataframe
Drop group if another column has duplicate values - pandas dataframe

Time:05-11

I have the following df

id  value many other variables
A    5
A    5
A    8  
A    9
B    3
B    4 
B    5
B    9 
C    10
C    11
C    19 
D    6
D    6
D    10
E    0
E    0
E    0
... 

I want to drop the whole id group if there are duplicate values in the value column (except zeros) So the output should be

id  value many other variables
B    3
B    4 
B    5
B    9 
C    10
C    11
C    19 
E    0
E    0
E    0
... 

CodePudding user response:

You can use duplicated to flag the duplicates, then transform groupby.any to flag the groups with duplicates. Then to get the rows with 0s, chain this boolean mask with a boolean mask that flags 0s:

out =df[~df.duplicated(['id','value']).groupby(df['id']).transform('any') | df['value'].eq(0)]

Output:

   id  value many_other_variables
4   B      3                     
5   B      4                     
6   B      5                     
7   B      9                     
8   C     10                     
9   C     11                     
10  C     19                     
14  E      0                     
15  E      0                     
16  E      0                     

Note: groupby.any is an aggregation, transform transforms that aggregate to match the length of the original DataFrame. The goal is to create a boolean mask to filter df with; and boolean masks must have the same length as the original df, so we transform the aggregate here.

  • Related