I can match each row with each diciotnary key but I am wondering if there's a way I can get the related value (string) in a different column as well.
import pandas as pd
entertainment_dict = {
"Food": ["McDonald", "Five Guys", "KFC"],
"Music": ["Taylor Swift", "Jay Z", "One Direction"],
"TV": ["Big Bang Theory", "Queen of South", "Ted Lasso"]
}
data = {'text':["Kevin Lee has bought a Taylor Swift's CD and eaten at McDonald.",
"The best burger in McDonald is cheeze buger.",
"Kevin Lee is planning to watch the Big Bang Theory and eat at KFC."]}
df = pd.DataFrame(data)
regex = '|'.join(f'(?P<{k}>{"|".join(v)})' for k,v in entertainment_dict.items())
df['labels'] = ((df['text'].str.extractall(regex).notnull().groupby(level=0).max()*entertainment_dict.keys())
.apply(lambda r: ','.join([i for i in r if i]) , axis=1)
)
text labels
0 Kevin Lee has bought a Taylor Swift's CD and e... Food,Music
1 The best burger in McDonald is cheeze buger. Food
2 Kevin Lee is planning to watch the Big Bang Th... Food,TV
Expected output
text labels words
0 Kevin Lee has bought a Taylor Swift's CD and e... Food,Music Taylor Swift, McDonald
1 The best burger in McDonald is cheeze buger. Food McDonald
2 Kevin Lee is planning to watch the Big Bang Th... Food,TV Big Bang Theory, KFC
CodePudding user response:
You could use:
df['words'] = (df['text'].str.extractall(regex)
.groupby(level=0).first()
.apply(lambda x: ','.join(set(x).difference([None])),
axis=1)
)
output:
text labels words
0 Kevin Lee has bought ... McDonald. Food,Music Taylor Swift,McDonald
1 The best burger in ... cheeze buger. Food McDonald
2 Kevin Lee is planning ... eat at KFC. Food,TV KFC,Big Bang Theory
CodePudding user response:
Use DataFrame.stack
with convert first level to column by reset_index
, so possible join values in GroupBy.agg
, for unique values in order is used dict.fromkeys
trick:
uniq = lambda x: ','.join(dict.fromkeys(x).keys())
df[['label','words']] = (df['text'].str.extractall(regex)
.stack()
.reset_index(level=-1)
.groupby(level=0)
.agg(uniq))
print (df)
text label \
0 Kevin Lee has bought a Taylor Swift's CD and e... Music,Food
1 The best burger in McDonald is cheeze buger. Food
2 Kevin Lee is planning to watch the Big Bang Th... TV,Food
words
0 Taylor Swift,McDonald
1 McDonald
2 Big Bang Theory,KFC