Say you have a DataFrame that looks like this:
name Description
ABC 'Verdansk was the best'
EDF 'Pro Clubs with the lads'
LMN 'Being Druid was the best'
RST 'The game of FIFA23'
XYZ 'Switch on for Bio lab drop'
... ....
and a dictionary like this:
categories = {
"CALL_OF_DUTY":["Verdansk","Bio Lab"],
"WORLD_OF_WARCRAFT":["Druid","Rogue"],
"FIFA": ["Pro Clubs", "FIFA"]
}
I want to return the dictionary key as the category whenever one of that key's values is found anywhere in the Description column of the pandas DataFrame.
The output should be:
name Description category
ABC 'Verdansk was the best' 'CALL_OF_DUTY'
EDF 'Pro Clubs with the lads' 'FIFA'
LMN 'Being Druid was the best' 'WORLD_OF_WARCRAFT'
RST 'The game of FIFA23' 'FIFA'
XYZ 'Switch on for Bio lab drop' 'CALL_OF_DUTY'
... ....
The solution I have so far is below. It works if I want to return the matched values from the dict, but it doesn't work for returning the keys. I have spent over a week trying to solve this and would like some help and, if possible, an explanation so I can learn from it.
df['Category'] = (df['Description'].str.extract(fr"\b({'|'.join(categories.values())})\b", re.IGNORECASE)[0].map(categories))
CodePudding user response:
You need to reverse the categories dictionary so that each value maps to its key. Since you want to match FIFA23 with FIFA, you don't need \b.
import re
# Invert the dict so each lowercased keyword maps to its category key
categories = {v.lower():k for k, lst in categories.items() for v in lst}
df['Category'] = (df['Description'].str.extract(fr"({'|'.join(categories)})", flags=re.IGNORECASE)
                  [0].str.lower().map(categories))
print(df)
name Description Category
0 ABC 'Verdansk was the best' CALL_OF_DUTY
1 EDF 'Pro Clubs with the lads' FIFA
2 LMN 'Being Druid was the best' WORLD_OF_WARCRAFT
3 RST 'The game of FIFA23' FIFA
4 XYZ 'Switch on for Bio lab drop' CALL_OF_DUTY
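To make the mechanics explicit, here is a minimal pure-Python sketch of the same idea: invert the dictionary, build one case-insensitive pattern, and look up the matched keyword. The names keyword_to_category and categorize are hypothetical helpers introduced for illustration. Escaping the keywords with re.escape and trying longer keywords first are extra precautions that the answer above does not take, added on the assumption that keywords might contain regex metacharacters or overlap.

```python
import re

categories = {
    "CALL_OF_DUTY": ["Verdansk", "Bio Lab"],
    "WORLD_OF_WARCRAFT": ["Druid", "Rogue"],
    "FIFA": ["Pro Clubs", "FIFA"],
}

# Invert the mapping: lowercased keyword -> category key.
keyword_to_category = {
    keyword.lower(): category
    for category, keywords in categories.items()
    for keyword in keywords
}

# One alternation of all keywords. Each keyword is escaped so it is
# matched literally, and longer keywords are listed first so they are
# tried before shorter ones at the same position.
pattern = re.compile(
    "|".join(
        re.escape(keyword)
        for keyword in sorted(keyword_to_category, key=len, reverse=True)
    ),
    re.IGNORECASE,
)


def categorize(description):
    """Return the category of the first keyword found, or None."""
    match = pattern.search(description)
    if match is None:
        return None
    # Lowercase the matched text so it lines up with the inverted keys.
    return keyword_to_category[match.group(0).lower()]


descriptions = [
    "Verdansk was the best",
    "Pro Clubs with the lads",
    "Being Druid was the best",
    "The game of FIFA23",
    "Switch on for Bio lab drop",
]
result = [categorize(text) for text in descriptions]
print(result)
```

With pandas, the same helper could be applied as df['Category'] = df['Description'].map(categorize). Descriptions that contain none of the keywords get no category.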