Is there a Pandas function to separate rows based on one column value and other column diversity?

This is a DataFrame sample:

    Folder Model
0      123     A
1      123     A
2      123     A
3     4541     A
4     4541     B
5     4541     C
6     4541     A
7       11     B
8       11     C
9      222     D
10     222     D
11     222     B
12     222     A

I need to select the Folders that contain items with Model A and also at least one other Model (B, C or D). The final DataFrame should look like this:

    Folder Model
3     4541     A
4     4541     B
5     4541     C
6     4541     A
9      222     D
10     222     D
11     222     B
12     222     A

I suppose the solution involves groupby, but I couldn't work it out. Any suggestions?

CodePudding user response：

A group must contain 'A', but must not contain only 'A'.

Use groupby with filter:

(df
 .groupby('Folder')
 .filter(
     # keep a Folder only if it has at least one 'A' row and at least one non-'A' row
     lambda x: x['Model'].eq('A').any() and x['Model'].ne('A').any()
 )
)

Alternatively, use transform to build a boolean mask and index the DataFrame with it:

cond1 = (df
         .groupby('Folder')['Model']
         .transform(
             # the scalar result is broadcast to every row of the group
             lambda x: x.eq('A').any() and x.ne('A').any()
         )
         )
df[cond1]
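
As a variation on the same idea (not part of the answer above), the two conditions can also be computed without a lambda, using the built-in 'any' aggregation. This is only a minimal sketch that assumes the sample data from the question; the names is_a, has_a and has_other are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Folder": [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    "Model": list("AAAABCABCDDBA"),
})

# Row-level flag: is this row Model 'A'?
is_a = df["Model"].eq("A")

# Broadcast "does this Folder contain any 'A'?" back to every row
has_a = is_a.groupby(df["Folder"]).transform("any")

# Broadcast "does this Folder contain any non-'A' Model?" back to every row
has_other = (~is_a).groupby(df["Folder"]).transform("any")

# Keep rows whose Folder satisfies both conditions
out = df[has_a & has_other]
print(out)
```

Because both masks are row-aligned with df, the original row order and index are preserved.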

CodePudding user response：

You can use set operations: check whether the set of Models in each group is a strict superset of {'A'}, i.e. contains 'A' plus at least one other Model:

out = (df.groupby('Folder')
         .filter(lambda x: set(x['Model'])>{'A'})
      )
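
To make the comparison explicit: on Python sets, > tests for a strict superset, so it is True only when the left-hand set contains 'A' and something else as well. A small sketch in plain Python; the helper name has_a_and_other is hypothetical and only used for illustration.

```python
def has_a_and_other(models):
    """Return True if the models include 'A' plus at least one other value."""
    # set > set is a strict-superset test in Python
    return set(models) > {"A"}


print(has_a_and_other(["A", "B", "C", "A"]))  # 'A' plus others
print(has_a_and_other(["A", "A", "A"]))       # only 'A'
print(has_a_and_other(["B", "C"]))            # no 'A' at all
```

Only the first call satisfies the condition; a group with only 'A' is equal to {'A'}, not strictly greater, and a group without 'A' is not a superset at all.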

A bit longer, but potentially more efficient approach:

m = df.groupby('Folder')['Model'].agg(lambda x: set(x)>{'A'})

out = df[df['Folder'].isin(m[m].index)]

Output:

    Folder Model
3     4541     A
4     4541     B
5     4541     C
6     4541     A
9      222     D
10     222     D
11     222     B
12     222     A
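
To see what the agg-based approach does step by step, here is a self-contained sketch that rebuilds the sample data from the question and prints the intermediate per-Folder mask. The variable name keep and the astype(bool) cast are additions for illustration and safety, not part of the answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Folder": [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    "Model": list("AAAABCABCDDBA"),
})

# One boolean per Folder: True when its Models are a strict superset of {'A'}
m = (df.groupby("Folder")["Model"]
       .agg(lambda x: set(x) > {"A"})
       .astype(bool))
print(m)

# Folder labels that passed the test
keep = m[m].index

# Select the original rows belonging to those Folders
out = df[df["Folder"].isin(keep)]
print(out)
```

The lambda here runs once per Folder and produces a single value, and the final selection is a vectorized isin lookup, which may be why this form can be faster than filter on large frames; it is worth timing on real data before relying on that.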