I have the following data frame:
import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West",
                  "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}
df = pd.DataFrame(data)
print(df)
Day Direction Bool
0 7 North True
1 7 NorthEast False
2 7 NorthWest False
3 7 West False
4 5 East True
5 5 EastWest False
6 5 EastNorth False
7 5 West False
Within each group by Day, I would like to keep only the rows where the string in df['Direction'] contains the Direction value of the group's row where df['Bool'] is True, and drop the rest.
For example, in the first group the True row has df['Direction'] == "North". The row with df['Direction'] == "West" does not contain "North", so it is dropped; the row with df['Direction'] == "NorthWest" does contain "North", so it is kept.
Expected Output:
Day Direction Bool
0 7 North True
1 7 NorthEast False
2 7 NorthWest False
3 5 East True
4 5 EastWest False
5 5 EastNorth False
The rows are not always in the same order, so using shift()
is not an option. Is there a quick way to do this without an explicit loop?
CodePudding user response:
IIUC, you can use groupby.apply
with boolean slicing:
df2 = (df
.groupby('Day', sort=False, group_keys=False)
.apply(lambda g: g[g['Direction'].str.contains('|'.join(g.loc[g['Bool'], 'Direction']))])
)
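One caveat: `'|'.join(...)` builds a regex alternation, so this assumes the Direction strings contain no regex metacharacters. If they might, a sketch of the same approach with the pattern pieces escaped via `re.escape`:

```python
import re
import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West",
                  "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}
df = pd.DataFrame(data)

df2 = (df
 .groupby('Day', sort=False, group_keys=False)
 # escape each anchor string so it is matched literally, not as a regex
 .apply(lambda g: g[g['Direction'].str.contains(
     '|'.join(map(re.escape, g.loc[g['Bool'], 'Direction'])))])
)
```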
output:
Day Direction Bool
0 7 North True
1 7 NorthEast False
2 7 NorthWest False
4 5 East True
5 5 EastWest False
6 5 EastNorth False
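If each Day has exactly one True row (as in your sample data), you can also avoid `groupby.apply` entirely: map each Day to its True row's Direction, then test containment row-wise. A sketch under that one-True-row assumption:

```python
import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West",
                  "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}
df = pd.DataFrame(data)

# Direction of the True row for each Day (assumes one True row per Day)
anchor = df.loc[df['Bool']].set_index('Day')['Direction']

# Keep rows whose Direction contains their group's anchor as a substring
keep = [a in d for d, a in zip(df['Direction'], df['Day'].map(anchor))]
df2 = df[keep]
```

This does plain substring containment instead of regex matching, so there is no escaping concern; the list comprehension runs once over the frame rather than once per group.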