I want to check whether a string in a Pandas column contains a word from a dictionary and, if there is a match, create a new column whose value is the corresponding dictionary key. E.g.:
dict = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'], 'MotorCycle': ['Harley', 'Yamaha', 'Triump']}
df
Person | Sentence |
---|---|
A | 'He drives a Merc' |
B | 'He rides a Harley' |
The result should be:
Person | Sentence | Vehicle |
---|---|---|
A | 'He drives a Merc' | 'Car' |
B | 'He rides a Harley' | 'MotorCycle' |
Answer:
One solution is to create an inverted dictionary from dct (mapping each brand to its vehicle type) and look up each word of the sentence, obtained with str.split:
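```python
import pandas as pd

df = pd.DataFrame({
    "Person": ["A", "B"],
    "Sentence": ["'He drives a Merc'", "'He rides a Harley'"],
})

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}
# invert the dictionary: each brand maps to its vehicle type
dct_inv = {i: k for k, v in dct.items() for i in v}

def find_word(x):
    # drop surrounding spaces/quotes, then check each word against the brands
    for w in x.strip(" '").split():
        if w in dct_inv:
            return dct_inv[w]
    return None

df["Vehicle"] = df["Sentence"].apply(find_word)
print(df)
```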
Prints:
```
  Person             Sentence     Vehicle
0      A   'He drives a Merc'         Car
1      B  'He rides a Harley'  MotorCycle
```
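Splitting on whitespace has one limitation: a brand followed by punctuation, as in a made-up sentence like "He drives a Merc.", produces the token "Merc." and is not found in the inverted dictionary. A minimal sketch, assuming a "word" can be taken as a run of word characters, tokenizes with re.findall instead. The helper name find_vehicle and the example sentences are hypothetical, chosen only for illustration.

```python
import re

import pandas as pd

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}
# brand -> vehicle type
dct_inv = {brand: kind for kind, brands in dct.items() for brand in brands}


def find_vehicle(sentence):
    # hypothetical helper: \w+ tokens exclude quotes and trailing punctuation
    for word in re.findall(r"\w+", sentence):
        if word in dct_inv:
            return dct_inv[word]
    return None  # no brand found in this sentence


# made-up sentences with punctuation, plus one with no brand at all
df = pd.DataFrame({
    "Person": ["A", "B", "C"],
    "Sentence": ["He drives a Merc.", "He rides a Harley!", "He walks"],
})
df["Vehicle"] = df["Sentence"].apply(find_vehicle)
print(df)
```

Rows without any known brand end up with a missing value in Vehicle, just like the original find_word.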
Answer:
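You can invert the dictionary, build a regex that matches any of the brands, extract the first match with str.extract and map it back to the vehicle type: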
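```python
import re
import pandas as pd

df = pd.DataFrame({"Person": ["A", "B"],
                   "Sentence": ["He drives a Merc", "He rides a Harley"]})

dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

# invert dictionary: brand -> vehicle type
d = {k: v for v, l in dic.items() for k in l}

# craft regex: one capturing group with every (escaped) brand as an alternative
regex = f'({"|".join(map(re.escape, d))})'

# extract the first matching brand, then map it to its vehicle type
df['Vehicle'] = df['Sentence'].str.extract(regex, expand=False).map(d)
```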
Output:
```
  Person           Sentence     Vehicle
0      A   He drives a Merc         Car
1      B  He rides a Harley  MotorCycle
```
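Because this pattern has no word boundaries, a brand that appears inside a longer word also matches; for instance, "Ford" would be extracted from a made-up sentence such as "He studies at Fordham". A sketch that wraps the alternation in \b anchors avoids this, under the assumption that brands always appear as whole words. The third example sentence is invented for illustration.

```python
import re

import pandas as pd

dic = {"Car": ["Merc", "BMW", "Ford", "Suzuki"],
       "MotorCycle": ["Harley", "Yamaha", "Triump"]}
# invert dictionary: brand -> vehicle type
d = {k: v for v, l in dic.items() for k in l}

# \b on both sides: only whole-word brands are captured
regex = rf'\b({"|".join(map(re.escape, d))})\b'

df = pd.DataFrame({
    "Person": ["A", "B", "C"],
    "Sentence": ["He drives a Merc", "He studies at Fordham", "He rides a Harley"],
})
# rows with no match get NaN from extract, and map leaves NaN unchanged
df["Vehicle"] = df["Sentence"].str.extract(regex, expand=False).map(d)
print(df)
```

With the anchors, "Fordham" no longer yields "Car", while whole-word brands are still found.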