I have a DataFrame and a list of keywords like this:
keywords = ['Chinese', 'American', 'Japanese', 'Greek']
The DataFrame is:
Resteraunt || Catagory
McDonalds || 'Burger,Fast Food,American'
Sticky Rice || 'Sushi,Japanese'
Schechuan || 'Resteraunt, Japanese, Takeout'
Gyro King || 'Greek, Gyro, Food'
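A minimal sketch that recreates the sample data above as a pandas DataFrame, so the answers can be tried directly. It assumes the column names are spelled exactly as in the table ('Resteraunt', 'Catagory') and that each category cell is a single string.

```python
import pandas as pd

# Keyword list from the question
keywords = ['Chinese', 'American', 'Japanese', 'Greek']

# Column names kept exactly as spelled in the question ('Resteraunt', 'Catagory')
df = pd.DataFrame({
    'Resteraunt': ['McDonalds', 'Sticky Rice', 'Schechuan', 'Gyro King'],
    'Catagory': [
        'Burger,Fast Food,American',
        'Sushi,Japanese',
        'Resteraunt, Japanese, Takeout',
        'Greek, Gyro, Food',
    ],
})
print(df)
```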
What I want is a new Cuisine column holding the keyword found in each row's categories:
Resteraunt || Catagory || Cuisine
McDonalds || 'Burger,Fast Food,American' || "American"
Sticky Rice || 'Sushi,Japanese' || "Japanese"
Schechuan || 'Resteraunt, Chinese, Takeout' || "Chinese"
Gyro King || 'Greek, Gyro, Food' || "Greek"
Answer:
You can use pandas.Series.str.extract to capture the cuisine:
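# Build an alternation pattern such as 'Chinese|American|Japanese|Greek'
joined_keywords = '|'.join(keywords)
# Capture the first keyword found in each row; rows with no match become 'NotFound'
df['Cuisine'] = (
    df['Catagory'].str.extract(fr'({joined_keywords})', expand=False)
    .fillna('NotFound')
)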
# Output:
print(df)
Resteraunt Catagory Cuisine
0 McDonalds Burger, FastFood, American American
1 StickyRice Sushi, Japanese Japanese
2 Schechuan Resteraunt, Japanese, Takeout Japanese
3 GyroKing Greek, Gyro, Food Greek
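A sketch of the same str.extract approach made a little more defensive, under two assumptions of my own: keywords should match only as whole words (the \b anchors), and keywords might one day contain regex metacharacters (hence re.escape). The 'Pizza Hut' row is hypothetical, added only to show the 'NotFound' fallback.

```python
import re

import pandas as pd

keywords = ['Chinese', 'American', 'Japanese', 'Greek']
df = pd.DataFrame({
    'Resteraunt': ['McDonalds', 'Sticky Rice', 'Schechuan', 'Gyro King', 'Pizza Hut'],
    'Catagory': [
        'Burger,Fast Food,American',
        'Sushi,Japanese',
        'Resteraunt, Japanese, Takeout',
        'Greek, Gyro, Food',
        'Pizza,Takeout',  # hypothetical row with no cuisine keyword
    ],
})

# re.escape guards against regex metacharacters in the keywords;
# \b restricts matches to whole words.
pattern = r'\b(' + '|'.join(map(re.escape, keywords)) + r')\b'

# With a single capture group and expand=False, str.extract returns a Series;
# rows without a match get NaN, which fillna replaces.
df['Cuisine'] = df['Catagory'].str.extract(pattern, expand=False).fillna('NotFound')
print(df)
```

Because the alternation is tried left to right at each position, the keyword that appears earliest in the string wins, not the earliest one in the keywords list.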
Answer:
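# Return the first keyword (in keywords order) that appears anywhere in the string
def findcuisine(col):
    for word in keywords:
        if word in col:
            return word
    return None  # no keyword found in this row

# The question's column is spelled 'Catagory'; use whatever name your DataFrame has
df['Cuisine'] = df['Category'].apply(findcuisine)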
Output
Restaurant Category Cuisine
0 McDonalds Burger,Fast Food,American American
1 Sticky Rice Sushi,Japanese Japanese
2 Schechuan Japanese,Takeout Japanese
3 Gyro King Greek, Gyro, Food Greek
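A variant of findcuisine, sketched under my own assumptions: the helper name find_cuisine and the 'NotFound' default are hypothetical, and it compares whole comma-separated tokens instead of substrings, so a keyword buried inside a longer word would not count. The column names follow this answer's output ('Restaurant', 'Category').

```python
import pandas as pd

keywords = ['Chinese', 'American', 'Japanese', 'Greek']


def find_cuisine(categories, keywords=keywords, default='NotFound'):
    """Return the first keyword (in keywords order) equal to one of the tokens."""
    # Split 'Greek, Gyro, Food' into ['Greek', 'Gyro', 'Food'], trimming spaces
    tokens = [t.strip() for t in categories.split(',')]
    # next() with a default replaces the implicit None of a bare for-loop
    return next((word for word in keywords if word in tokens), default)


df = pd.DataFrame({
    'Restaurant': ['McDonalds', 'Sticky Rice', 'Schechuan', 'Gyro King'],
    'Category': [
        'Burger,Fast Food,American',
        'Sushi,Japanese',
        'Japanese,Takeout',
        'Greek, Gyro, Food',
    ],
})
df['Cuisine'] = df['Category'].apply(find_cuisine)
print(df)
```

Splitting on commas first also tolerates the inconsistent spacing in the sample data ('Sushi,Japanese' vs 'Greek, Gyro, Food').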