Home > Back-end >  Drop group if another column only has duplicate values - pandas dataframe
Drop group if another column only has duplicate values - pandas dataframe

Time:01-25

My question is quite similar to this one: Drop group if another column has duplicate values - pandas dataframe

I have the following dataframe:

letter  value   many    other   variables
A          5            
A          5            
A          8            
A          9            
B          3            
B         10            
B         10            
B          4            
B          5            
B          9            
C         10            
C         10            
C         10            
D          6            
D          8            
D         10            
E          5            
E          5            
E          5            
F          4            
F          4            

And when grouping it by letter I want to remove all the resulting groups that only have value that repeats, thus getting a result like this:

letter      value   many    other   variables
A            5          
A            5          
A            8          
A            9          
B            3          
B           10          
B           10          
B            4          
B            5          
B            9          
D            6          
D            8          
D           10          

I am afraid that if I use the duplicate() function similarly to the question I mentioned at the beggining I would be deleting groups (or the rows in) 'A' and 'B' which should rather stay in their place.

CodePudding user response:

You have several possibilities.

Using duplicated and groupby.transform:

m = (df.duplicated(subset=['letter', 'value'], keep=False)
       .groupby(df['letter']).transform('all')
    )

out = df[~m]

NB. this won't drop groups with a single row.

Using groupby.transform and nunique:

out = df[df.groupby('letter')['value'].transform('nunique').gt(1)]

NB. this will drop groups with a single row.

Output:

   letter  value  many  other  variables
0       A      5   NaN    NaN        NaN
1       A      5   NaN    NaN        NaN
2       A      8   NaN    NaN        NaN
3       A      9   NaN    NaN        NaN
4       B      3   NaN    NaN        NaN
5       B     10   NaN    NaN        NaN
6       B     10   NaN    NaN        NaN
7       B      4   NaN    NaN        NaN
8       B      5   NaN    NaN        NaN
9       B      9   NaN    NaN        NaN
13      D      6   NaN    NaN        NaN
14      D      8   NaN    NaN        NaN
15      D     10   NaN    NaN        NaN
  • Related