Filter rows from grouped data frames based on string & boolean columns


I have the following data frame:

import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West", "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}

df = pd.DataFrame(data)
df.groupby(["Day"])

      Day  Direction   Bool
  0    7      North   True  
  1    7  NorthEast  False
  2    7  NorthWest  False
  3    7       West  False
  4    5       East   True
  5    5   EastWest  False
  6    5  EastNorth  False
  7    5       West  False

For each group by Day, I would like to drop the rows whose df['Direction'] string does not contain the df['Direction'] value from the row where df['Bool'] is True.

So, for example, in the first group df['Direction'] = "West" does not match df['Direction'] = "North" (the row where df['Bool'] == True), so it is dropped. df['Direction'] = "NorthWest" is a match, since the string contains "North", so it is kept.

Expected Output:

      Day  Direction   Bool
  0    7      North   True  
  1    7  NorthEast  False
  2    7  NorthWest  False
  3    5       East   True
  4    5   EastWest  False
  5    5  EastNorth  False

The rows are not always in the same order, so using shift() is not an option. I'm also wondering whether there is a quick way to do this without a loop.

CodePudding user response:

IIUC, you can use groupby.apply with boolean slicing:

df2 = (df
    .groupby('Day', sort=False, group_keys=False)
    # in each Day group, keep the rows whose Direction contains any of the
    # Direction strings from that group's rows where Bool is True
    .apply(lambda g: g[g['Direction'].str.contains('|'.join(g.loc[g['Bool'], 'Direction']))])
)

output:

   Day  Direction   Bool
0    7      North   True
1    7  NorthEast  False
2    7  NorthWest  False
4    5       East   True
5    5   EastWest  False
6    5  EastNorth  False
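
If you would rather avoid groupby.apply, here is a minimal alternative sketch, assuming each Day group has exactly one row where Bool is True (as in the example): map each Day to the Direction of its True row, then keep the rows whose Direction contains that string. The substring test still runs as a plain Python comprehension, so it is not fully vectorized.

import pandas as pd

data = {
    'Day': [7, 7, 7, 7, 5, 5, 5, 5],
    'Direction': ["North", "NorthEast", "NorthWest", "West", "East", "EastWest", "EastNorth", "West"],
    'Bool': [True, False, False, False, True, False, False, False],
}
df = pd.DataFrame(data)

# Direction of the Bool==True row for each Day (assumes one True row per group)
anchor = df.loc[df['Bool']].set_index('Day')['Direction']

# for every row, check whether its Direction contains its group's anchor string
keep = [a in d for d, a in zip(df['Direction'], df['Day'].map(anchor))]
df2 = df[keep]

Both versions give the same rows; the groupby.apply version also handles groups with several True rows (via the '|' regex alternation), while this one trades that generality for a simpler, apply-free expression.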