My question is quite similar to this one: Drop group if another column has duplicate values - pandas dataframe
I have the following dataframe:
letter value many other variables
A 5
A 5
A 8
A 9
B 3
B 10
B 10
B 4
B 5
B 9
C 10
C 10
C 10
D 6
D 8
D 10
E 5
E 5
E 5
F 4
F 4
And when grouping it by letter I want to remove all the resulting groups that only have value that repeats, thus getting a result like this:
letter value many other variables
A 5
A 5
A 8
A 9
B 3
B 10
B 10
B 4
B 5
B 9
D 6
D 8
D 10
I am afraid that if I use the duplicate() function similarly to the question I mentioned at the beggining I would be deleting groups (or the rows in) 'A' and 'B' which should rather stay in their place.
CodePudding user response:
You have several possibilities.
Using duplicated
and groupby.transform
:
m = (df.duplicated(subset=['letter', 'value'], keep=False)
.groupby(df['letter']).transform('all')
)
out = df[~m]
NB. this won't drop groups with a single row.
Using groupby.transform
and nunique
:
out = df[df.groupby('letter')['value'].transform('nunique').gt(1)]
NB. this will drop groups with a single row.
Output:
letter value many other variables
0 A 5 NaN NaN NaN
1 A 5 NaN NaN NaN
2 A 8 NaN NaN NaN
3 A 9 NaN NaN NaN
4 B 3 NaN NaN NaN
5 B 10 NaN NaN NaN
6 B 10 NaN NaN NaN
7 B 4 NaN NaN NaN
8 B 5 NaN NaN NaN
9 B 9 NaN NaN NaN
13 D 6 NaN NaN NaN
14 D 8 NaN NaN NaN
15 D 10 NaN NaN NaN