I am trying to figure out how to filter rows in pandas based on a criterion that changes for each group. The dummy data and the desired result are below.
Dummy data
import pandas as pd

data = {'ID': ['1111', '1111', '1111', '1111', '1112', '1112', '1112', '1112', '1112'],
        'Category': [1, 2, 2, 2, 1, 3, 2, 2, 1]}
df = pd.DataFrame(data)
print(df)
ID Category
1111 1
1111 2
1111 2
1111 2
1112 1
1112 3
1112 2
1112 2
1112 1
Now I am trying to filter the DataFrame so that, for each ID, it keeps all the rows before the first occurrence of Category 2. This is what the desired result would look like:
ID Category
1111 1
1112 1
1112 3
Answer:
Try this:
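# cumprod on a boolean Series can come back as 0/1 integers, so cast to bool before indexing
df[(df['Category'] != 2).groupby(df['ID']).cumprod().astype(bool)]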
Output:
ID Category
0 1111 1
4 1112 1
5 1112 3
Details:
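Create a boolean Series that is True wherever Category is not equal to 2, then take its cumulative product within each ID group. The product stays True (1) until the first row with Category 2 contributes a False (0); from that row on it remains False for the rest of the group, so only the rows before the first 2 are kept.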
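The cumprod approach can be unpacked into named steps to inspect the intermediate mask. This is a minimal sketch assuming pandas is installed; the names not_two, running and mask are illustrative only. It converts the condition to 0/1 integers before the grouped cumulative product and back to bool afterwards, so the final indexing always uses a boolean mask regardless of which dtype cumprod would return for a bool input.

```python
import pandas as pd

data = {'ID': ['1111', '1111', '1111', '1111', '1112', '1112', '1112', '1112', '1112'],
        'Category': [1, 2, 2, 2, 1, 3, 2, 2, 1]}
df = pd.DataFrame(data)

# Step 1: 1 wherever the row is NOT Category 2, 0 where it is
not_two = (df['Category'] != 2).astype(int)

# Step 2: running product within each ID; it drops to 0 at the first 2 and stays there
running = not_two.groupby(df['ID']).cumprod()

# Step 3: turn the running product into a boolean mask and filter
mask = running.astype(bool)
result = df[mask]

# Show the intermediate columns side by side, then the filtered rows
print(df.assign(not_two=not_two, running=running, keep=mask))
print(result)
```

Printing the assigned columns makes it easy to see that, within each ID, the running product switches to 0 exactly at the first Category 2 row and never recovers.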
Answer:
df = pd.DataFrame(data)
# Running count of Category-2 rows per ID; keep rows where no 2 has been seen yet
df.loc[df.Category.eq(2).groupby(df.ID).cumsum().eq(0)]
# ID Category
# 0 1111 1
# 4 1112 1
# 5 1112 3
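Both answers follow the same pattern: build a per-row condition, accumulate it within each ID, and filter on the accumulated value. Below is a sketch of that pattern as a reusable function, using the cumsum variant from the second answer. rows_before_first is a hypothetical helper name, not a pandas API, and the sketch assumes the rows are already in the order that defines "before".

```python
import pandas as pd


def rows_before_first(df: pd.DataFrame, group_col: str, value_col: str, stop_value) -> pd.DataFrame:
    """Hypothetical helper: within each group, keep the rows that come before the
    first row whose value_col equals stop_value (existing row order is used)."""
    hits = df[value_col].eq(stop_value)            # True on rows equal to stop_value
    seen = hits.groupby(df[group_col]).cumsum()    # running count of hits per group
    return df.loc[seen.eq(0)]                      # no hit seen yet -> keep the row


data = {'ID': ['1111', '1111', '1111', '1111', '1112', '1112', '1112', '1112', '1112'],
        'Category': [1, 2, 2, 2, 1, 3, 2, 2, 1]}
df = pd.DataFrame(data)
result = rows_before_first(df, 'ID', 'Category', 2)
print(result)
```

If an ID never contains the stop value, its running count never leaves 0, so all of that ID's rows are kept.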