I would like to group records by ID from df and delete group if any of records has STATUS = 300.
import pandas as pd
df1 = pd.DataFrame(
{
"ID": ["A0", "A0", "A0", "A1", "A1", "A1", "A2", "A2", "A2"],
"STATUS": [100, 100, 300, 100, 100, 100, 300, 100, 100],
},
index=[0, 1, 2, 3, 4, 5, 6, 7, 8],
)
output:
ID STATUS
0 A0 100
1 A0 100
2 A0 300
3 A1 100
4 A1 100
5 A1 100
6 A2 300
7 A2 100
8 A2 100
I would like to get:
ID STATUS
0 A1 100
1 A1 100
2 A1 100
I tried: dfnew = df1.groupby('ID').filter(lambda x: x['STATUS'] != 300)
But I got the error: TypeError: filter function returned a Series, but expected a scalar bool
CodePudding user response:
df1.groupby('ID').filter(lambda x: 300 not in x['STATUS'].to_list())
CodePudding user response:
An efficient method to match any value from a list (see OP's comment) is to use isin
coupled with groupby
transform
:
df1[~df1['STATUS'].isin([300, 500]).groupby(df1['ID']).transform('any')]
output:
ID STATUS
3 A1 100
4 A1 100
5 A1 100
CodePudding user response:
Another way to solve it is with transform
. We create a mask of group IDs where their max is not 300. We could also use not in
or whatever logic we decide
df1[df1.groupby('ID')['STATUS'].transform(lambda d: d.max() != 300)]
# or this for 300 500
df1[df1.groupby('ID')['STATUS'].transform(lambda d: d.max() not in (300, 500))]
CodePudding user response:
Try this:
df1[df1.STATUS != 300]