Drop group if another column has duplicate values

I have the following df

id  value many other variables
A    5
A    5
A    8  
A    9
B    3
B    4 
B    5
B    9 
C    10
C    11
C    19 
D    6
D    6
D    10
E    0
E    0
E    0
...

I want to drop the whole id group if there are duplicate values in the value column (except zeros) So the output should be

id  value many other variables
B    3
B    4 
B    5
B    9 
C    10
C    11
C    19 
E    0
E    0
E    0
...

CodePudding user response：

You can use duplicated to flag the duplicates, then transform groupby.any to flag the groups with duplicates. Then to get the rows with 0s, chain this boolean mask with a boolean mask that flags 0s:

out =df[~df.duplicated(['id','value']).groupby(df['id']).transform('any') | df['value'].eq(0)]

Output:

   id  value many_other_variables
4   B      3                     
5   B      4                     
6   B      5                     
7   B      9                     
8   C     10                     
9   C     11                     
10  C     19                     
14  E      0                     
15  E      0                     
16  E      0

Note: groupby.any is an aggregation, transform transforms that aggregate to match the length of the original DataFrame. The goal is to create a boolean mask to filter df with; and boolean masks must have the same length as the original df, so we transform the aggregate here.