Remove group if contains record with status 300-CodePudding

I would like to group records by ID from df and delete group if any of records has STATUS = 300.

import pandas as pd


df1 = pd.DataFrame(
    {
        "ID": ["A0", "A0", "A0", "A1", "A1", "A1", "A2", "A2", "A2"],
        "STATUS": [100, 100, 300, 100, 100, 100, 300, 100, 100],
    },
    index=[0, 1, 2, 3, 4, 5, 6, 7, 8],
)

output:

   ID  STATUS
0  A0     100
1  A0     100
2  A0     300
3  A1     100
4  A1     100
5  A1     100
6  A2     300
7  A2     100
8  A2     100

I would like to get:

   ID  STATUS
0  A1     100
1  A1     100
2  A1     100

I tried: dfnew = df1.groupby('ID').filter(lambda x: x['STATUS'] != 300)

But I got the error: TypeError: filter function returned a Series, but expected a scalar bool

CodePudding user response：

df1.groupby('ID').filter(lambda x: 300 not in x['STATUS'].to_list())

CodePudding user response：

An efficient method to match any value from a list (see OP's comment) is to use isin coupled with groupby transform:

df1[~df1['STATUS'].isin([300, 500]).groupby(df1['ID']).transform('any')]

output:

   ID  STATUS
3  A1     100
4  A1     100
5  A1     100

CodePudding user response：

Another way to solve it is with transform. We create a mask of group IDs where their max is not 300. We could also use not in or whatever logic we decide

df1[df1.groupby('ID')['STATUS'].transform(lambda d: d.max() != 300)]

# or this for 300   500

df1[df1.groupby('ID')['STATUS'].transform(lambda d: d.max() not in (300, 500))]

CodePudding user response：

Try this: df1[df1.STATUS != 300]