I have a df:
item_name price stock
red apple 2 2
green apple 4 1
green grape 4 3
yellow apple 1 2
purple grape 4 1
I have another df:
Key Word Min_stock
red;grape 2
The result I would like to get is:
item_name price stock
red apple 2 2
green grape 4 3
I would like to filter the first df based on the second df, for keyword, I would like to select item_name that contains either key word in Key Word cloumn.
Is there any way to acheive it?
CodePudding user response:
Try this
df = pd.DataFrame({'item_name': ['red apple','green apple','green grape','yellow apple','purple grape'],
'price': [2, 4, 4, 1, 4],
'stock': [2, 1, 3, 2, 1]})
df2 = pd.DataFrame({'Key Word': ['red;grape'], 'Min_stock': [2]})
# create a mask that matches keywords
kw_m = df['item_name'].str.contains(df2.loc[0, 'Key Word'].replace(';','|'))
# create a mask that checks if there's enough stock
st_m = df['stock'] >= df2.loc[0,'Min_stock']
# filter
df[kw_m & st_m]
item_name price stock
0 red apple 2 2
2 green grape 4 3
CodePudding user response:
Assuming df1
and df2
the DataFrames, you can compute a regex from the split
ted and explod
ed df2
, then extract
and map
the min values and filter with boolean indexing:
s = (df2.assign(kw=df2['Key Word'].str.split(';'))
.explode('kw')
.set_index('kw')['Min_stock']
)
# red 2
# grape 2
# blue 10
# apple 10
regex = '|'.join(s.index)
# 'red|grape|blue|apple'
mask = df1['item_name'].str.extract(f'({regex})', expand=False).map(s)
# 0 2
# 1 10
# 2 2
# 3 10
# 4 2
out = df1[mask.notna()&df1['stock'].ge(mask)]
output:
item_name price stock
0 red apple 2 2
2 green grape 4 3
NB. for generalization, I used a different df2
as input:
Key Word Min_stock
red;grape 2
blue;apple 10