String matching through a Python dictionary of lists and assigning a label in a new column

I have a dictionary of lists and I want to use it to label the sentences. What's the most efficient way to do this?

entertainment_dict = {
  "Food": ["McDonald", "Five Guys", "KFC"],
  "Music": ["Taylor Swift", "Jay Z", "One Direction"],
  "TV": ["Big Bang Theory", "Queen of South", "Ted Lasso"]
}

{'Food': ['McDonald', 'Five Guys', 'KFC'], 'Music': ['Taylor Swift', 'Jay Z', 'One Direction'], 'TV': ['Big Bang Theory', 'Queen of South', 'Ted Lasso']}

import pandas as pd

data = {'text': ["Kevin Lee has bought a Taylor Swift's CD.",
                 "The best burger in McDonald is cheeze buger.",
                 "Kevin McDonald is planning to watch the Big Bang Theory."]}

df = pd.DataFrame(data)

                                                text
0          Kevin Lee has bought a Taylor Swift's CD.
1       The best burger in McDonald is cheeze buger.
2  Kevin McDonald is planning to watch the Big Ba...

Expected output:

                                                text labels
0          Kevin Lee has bought a Taylor Swift's CD.  Music
1       The best burger in McDonald is cheeze buger.   Food
2  Kevin McDonald is planning to watch the Big Ba...     TV

CodePudding user response：

As in your previous questions, you can build a custom regex with one named group per label and use it with str.extract:

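# one named group per label: (?P<Food>McDonald|Five Guys|KFC)|(?P<Music>...)|(?P<TV>...)
regex = '|'.join(f'(?P<{k}>{"|".join(v)})' for k,v in entertainment_dict.items())

# extract yields one column per group; multiplying the non-null mask by the keys
# turns True into the label name and False into '', then the names are joined per row
df['labels'] = ((df['text'].str.extract(regex).notnull()*entertainment_dict.keys())
                 .apply(lambda r: ','.join([i for i in r if i]) , axis=1)
                )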

output:

                                                text labels
0          Kevin Lee has bought a Taylor Swift's CD.  Music
1       The best burger in McDonald is cheeze buger.   Food
2  Kevin McDonald is planning to watch the Big Ba...   Food
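
Note that row 2 comes out as Food rather than the TV from the expected output: str.extract returns only the first match in each string, and McDonald appears before Big Bang Theory. If every matching label is wanted, one possible alternative is a plain substring check per label. The sketch below is not part of the answer above; `find_labels` is a hypothetical helper name, and it assumes case-sensitive substring matching is good enough:

```python
import pandas as pd

entertainment_dict = {
    "Food": ["McDonald", "Five Guys", "KFC"],
    "Music": ["Taylor Swift", "Jay Z", "One Direction"],
    "TV": ["Big Bang Theory", "Queen of South", "Ted Lasso"],
}

df = pd.DataFrame({"text": [
    "Kevin Lee has bought a Taylor Swift's CD.",
    "The best burger in McDonald is cheeze buger.",
    "Kevin McDonald is planning to watch the Big Bang Theory.",
]})


def find_labels(text, mapping=entertainment_dict):
    """Hypothetical helper: join every label that has a keyword present in text."""
    return ",".join(
        label
        for label, keywords in mapping.items()
        # plain, case-sensitive substring test (an assumption, not a requirement)
        if any(keyword in text for keyword in keywords)
    )


df["labels"] = df["text"].apply(find_labels)
print(df["labels"].tolist())  # ['Music', 'Food', 'Food,TV']
```

Row 2 now carries both labels. Telling the person Kevin McDonald apart from the restaurant, as the expected output implies, would need more than keyword matching.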