This is a DataFrame sample:
Folder Model
0 123 A
1 123 A
2 123 A
3 4541 A
4 4541 B
5 4541 C
6 4541 A
7 11 B
8 11 C
9 222 D
10 222 D
11 222 B
12 222 A
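To make the sample reproducible, here is a sketch that rebuilds it with the pandas DataFrame constructor. It assumes pandas is installed and imported as pd. The values are copied from the table above, and the default 0..12 index matches its row labels.

```python
import pandas as pd

# Sample data copied from the table above; the default RangeIndex 0..12
# lines up with the printed row labels.
df = pd.DataFrame({
    'Folder': [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    'Model':  ['A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'D', 'B', 'A'],
})
print(df)
```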
I need to select the Folders that contain items with Model A and also at least one other Model (B, C or D). The final DataFrame should look like this:
Folder Model
3 4541 A
4 4541 B
5 4541 C
6 4541 A
9 222 D
10 222 D
11 222 B
12 222 A
I suppose it is something in the groupby universe, but I couldn't reach a conclusion. Any suggestions?
Answer:
A group must contain 'A' and must not contain only 'A'. Use groupby with filter:
# keep every row of a folder that has at least one 'A' and at least one non-'A' model
(df
 .groupby('Folder')
 .filter(
     lambda x: x['Model'].eq('A').any() and x['Model'].ne('A').any()
 )
)
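A self-contained sketch of the filter approach, assuming the sample data is rebuilt by hand as below. The condition is the one stated in this answer, written with Series.any(): keep a folder only if it has at least one 'A' and at least one model that is not 'A'.

```python
import pandas as pd

# Sample data from the question (rebuilt by hand for this sketch).
df = pd.DataFrame({
    'Folder': [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    'Model':  ['A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'D', 'B', 'A'],
})

# filter() calls the function once per folder with that folder's sub-DataFrame
# and keeps all of the folder's rows when the function returns True.
out = df.groupby('Folder').filter(
    lambda g: g['Model'].eq('A').any() and g['Model'].ne('A').any()
)
print(out)
```

Folder 123 is dropped because it contains only 'A', and folder 11 because it contains no 'A' at all.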
Or, if you want to use transform with boolean indexing:
# one boolean per row: True where the row's folder passes the same test
cond1 = (df
         .groupby('Folder')['Model']
         .transform(lambda x: x.eq('A').any() and x.ne('A').any())
         )
df[cond1]
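The transform version returns a Series aligned with df (one boolean per row), which is why it can be used directly for boolean indexing. The sketch below, assuming the same hand-built sample, also shows a lambda-free variant that is not part of the answer: it flags the rows first and then uses the built-in 'any' reduction per folder. That may be faster on large frames because it avoids calling a Python function once per group, though no timing is claimed here.

```python
import pandas as pd

df = pd.DataFrame({
    'Folder': [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    'Model':  ['A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'D', 'B', 'A'],
})

# Lambda-based mask: each folder's scalar result is broadcast to all its rows.
cond1 = df.groupby('Folder')['Model'].transform(
    lambda x: x.eq('A').any() and x.ne('A').any()
)

# Lambda-free variant (an alternative, not from the answer):
# flag rows first, then ask per folder whether any row is flagged.
is_a = df['Model'].eq('A')
has_a = is_a.groupby(df['Folder']).transform('any')
has_other = (~is_a).groupby(df['Folder']).transform('any')
cond2 = has_a & has_other

print(cond1.tolist())
print(df[cond2])
```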
Answer:
You can use set operations (is the set of Models in each group a strict superset of {'A'}?):
# keep folders whose set of models is a strict superset of {'A'}
out = (df.groupby('Folder')
       .filter(lambda x: set(x['Model']) > {'A'})
       )
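This works because Python's > operator on sets tests for a proper (strict) superset. A quick check, assuming the same hand-built sample, of how each folder's model set compares with {'A'}:

```python
import pandas as pd

df = pd.DataFrame({
    'Folder': [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    'Model':  ['A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'D', 'B', 'A'],
})

# set(models) > {'A'} is True only when the set contains 'A' and something else.
keep = {}
for folder, models in df.groupby('Folder')['Model']:
    keep[folder] = set(models) > {'A'}
    print(folder, sorted(set(models)), keep[folder])
```

Only 222 and 4541 map to True: folder 123 has only 'A' ({'A'} > {'A'} is False), and folder 11 has no 'A' at all ({'B', 'C'} > {'A'} is False).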
A slightly longer, but potentially more efficient, approach:
# one boolean per folder, then keep the rows whose folder is True
m = df.groupby('Folder')['Model'].agg(lambda x: set(x) > {'A'})
out = df[df['Folder'].isin(m[m].index)]
Output:
Folder Model
3 4541 A
4 4541 B
5 4541 C
6 4541 A
9 222 D
10 222 D
11 222 B
12 222 A
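To see why the agg + isin version works, this sketch (same hand-built sample assumed) prints the intermediate m: one boolean per folder, indexed by Folder. m[m] keeps only the True entries, so m[m].index holds the folders to keep, and isin turns that back into a row mask. The efficiency gain is plausible rather than measured here: filter hands a sub-DataFrame to the function for every folder, whereas this version computes one boolean per folder and then does a single vectorized isin.

```python
import pandas as pd

df = pd.DataFrame({
    'Folder': [123, 123, 123, 4541, 4541, 4541, 4541, 11, 11, 222, 222, 222, 222],
    'Model':  ['A', 'A', 'A', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'D', 'B', 'A'],
})

# One boolean per folder: is its set of models a strict superset of {'A'}?
m = df.groupby('Folder')['Model'].agg(lambda x: set(x) > {'A'})
print(m)

# m[m] keeps the True entries; their index labels are the folders to keep.
keep_folders = m[m].index
out = df[df['Folder'].isin(keep_folders)]
print(out)
```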