filter df using key words


I have a df:

item_name   price   stock
red apple    2       2
green apple  4       1
green grape  4       3
yellow apple 1       2
purple grape 4       1

I have another df:

Key Word   Min_stock
red;grape   2

The result I would like to get is:

item_name   price   stock
red apple    2       2
green grape  4       3

I would like to filter the first df based on the second df: for the keyword part, I would like to select rows whose item_name contains either key word from the Key Word column.

Is there any way to achieve it?

CodePudding user response:

Try this

import pandas as pd

df = pd.DataFrame({'item_name': ['red apple','green apple','green grape','yellow apple','purple grape'],
                   'price': [2, 4, 4, 1, 4],
                   'stock': [2, 1, 3, 2, 1]})
df2 = pd.DataFrame({'Key Word': ['red;grape'], 'Min_stock': [2]})

# create a mask that matches keywords
kw_m = df['item_name'].str.contains(df2.loc[0, 'Key Word'].replace(';','|'))
# create a mask that checks if there's enough stock
st_m = df['stock'] >= df2.loc[0,'Min_stock']
# filter
df[kw_m & st_m]
     item_name  price  stock
0    red apple      2      2
2  green grape      4      3
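
Note: if a keyword could ever contain regex metacharacters (e.g. a dot or parenthesis), it's safer to escape each keyword before building the pattern. A small variant of the same idea, using re.escape and (optionally) case-insensitive matching:

import re

# split the keywords, escape each one, and join into a safe alternation pattern
keywords = df2.loc[0, 'Key Word'].split(';')
pattern = '|'.join(re.escape(k) for k in keywords)   # 'red|grape'

kw_m = df['item_name'].str.contains(pattern, case=False)  # match regardless of case
st_m = df['stock'] >= df2.loc[0, 'Min_stock']
df[kw_m & st_m]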

CodePudding user response:

Assuming df1 and df2 are the DataFrames, you can compute a regex from the split and exploded df2, then extract and map the min values and filter with boolean indexing:

s = (df2.assign(kw=df2['Key Word'].str.split(';'))
        .explode('kw')
        .set_index('kw')['Min_stock']
     )
# red       2
# grape     2
# blue     10
# apple    10

regex = '|'.join(s.index)
# 'red|grape|blue|apple'

mask = df1['item_name'].str.extract(f'({regex})', expand=False).map(s)
# 0     2
# 1    10
# 2     2
# 3    10
# 4     2

out = df1[mask.notna() & df1['stock'].ge(mask)]

output:

     item_name  price  stock
0    red apple      2      2
2  green grape      4      3

NB. for generalization, I used a different df2 as input:

Key Word   Min_stock
red;grape   2
blue;apple  10
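
Putting it together, a self-contained sketch of this answer with that generalized df2 (same data and column names as above):

import pandas as pd

df1 = pd.DataFrame({'item_name': ['red apple', 'green apple', 'green grape',
                                  'yellow apple', 'purple grape'],
                    'price': [2, 4, 4, 1, 4],
                    'stock': [2, 1, 3, 2, 1]})
df2 = pd.DataFrame({'Key Word': ['red;grape', 'blue;apple'],
                    'Min_stock': [2, 10]})

# one row per keyword, indexed by keyword -> its Min_stock
s = (df2.assign(kw=df2['Key Word'].str.split(';'))
        .explode('kw')
        .set_index('kw')['Min_stock'])

regex = '|'.join(s.index)                               # 'red|grape|blue|apple'

# map each item to the Min_stock of the first keyword it matches
mask = df1['item_name'].str.extract(f'({regex})', expand=False).map(s)

out = df1[mask.notna() & df1['stock'].ge(mask)]
print(out)
#      item_name  price  stock
# 0    red apple      2      2
# 2  green grape      4      3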