I have the following df
id value many other variables
A 5
A 5
A 8
A 9
B 3
B 4
B 5
B 9
C 10
C 11
C 19
D 6
D 6
D 10
E 0
E 0
E 0
...
I want to drop the whole id group if there are duplicate values in the value
column (except zeros) So the output should be
id value many other variables
B 3
B 4
B 5
B 9
C 10
C 11
C 19
E 0
E 0
E 0
...
CodePudding user response:
You can use duplicated
to flag the duplicates, then transform groupby.any
to flag the groups with duplicates. Then to get the rows with 0s, chain this boolean mask with a boolean mask that flags 0s:
out =df[~df.duplicated(['id','value']).groupby(df['id']).transform('any') | df['value'].eq(0)]
Output:
id value many_other_variables
4 B 3
5 B 4
6 B 5
7 B 9
8 C 10
9 C 11
10 C 19
14 E 0
15 E 0
16 E 0
Note: groupby.any
is an aggregation, transform
transforms that aggregate to match the length of the original DataFrame. The goal is to create a boolean mask to filter df
with; and boolean masks must have the same length as the original df
, so we transform the aggregate here.