I am trying to use a table that has headers that correspond to categorizations to descriptions for products
Table1
ID | Description |
---|---|
1 | Wow what a nice car |
2 | That automobile is fabulous |
3 | When will they serve us dessert? |
4 | When is Jim's ride showing up? |
5 | What do I have to do to get a decent cookie around here! |
6 | Bury me with a chocolate sundae. |
CategoryTable
Dessert | Cars | Coffee |
---|---|---|
Candy | Car | Cold Brew |
Cake | Automobile | Starbucks |
Chocolate | Driving | Mug |
Cookie | Ride | |
sundae | Park | |
Dessert | Wheels |
ID | Description | Result |
---|---|---|
1 | Wow what a nice car | Car |
2 | That automobile rocks | Car |
3 | When's Dessert? | Dessert |
4 | Where is Jim's ride? | Car |
5 | Gimme the Cookie! | Dessert |
6 | Starbucks is popular. | Coffee |
I am happy to use Python, SQL or Excel to figure this all out thanks in advance!
CodePudding user response:
Note that I just separated tags and description tables in different sheets.
Sheet5:
Sheet1:
Use:
data = pd.read_excel('Example Data Set.xlsx', sheet_name='Sheet5')
tags = pd.read_excel('Example Data Set.xlsx', sheet_name='Sheet1')
tags_str = '|'.join([x.lower() for x in tags.fillna('_').values.flatten()])
temp = data['Description'].str.lower().str.extract(f"( {tags_str} )")
def tag2cat(tag):
for col in tags.columns:
if tag.values[0] in tags[col].str.lower().values:
return col
temp.apply(tag2cat, axis = 1)
Output:
CodePudding user response:
You can try:
pattern = '|'.join([f"(?P<{c}>{'|'.join(df2[c].str.lower().dropna())})"
for c in df2])
df1['Result'] = (df1['Description']
.str.lower().str.extractall(pattern)
.stack().reset_index(level=2).groupby(level=0)['level_2']
.apply(lambda x: ', '.join(set(x))))
Output:
>>> df1
ID Description Result
0 1 Wow what a nice car Cars
1 2 That automobile is fabulous Cars
2 3 When will they serve us dessert? Dessert
3 4 When is Jim's ride showing up? Cars
4 5 What do I have to do to get a decent cookie ar... Dessert
5 6 Bury me with a chocolate sundae. Dessert
It works if you have multiple match. Tags will be comma separated.