Checking if a string contains a string value from a dictionary and returning the appropriate key

I want to check if a string in a Pandas column contains a word from a dictionary and, if there is a match, create a new column whose value is the matching dictionary key. E.g. dict = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'], 'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

Person	Sentence
A	'He drives a Merc'
B	'He rides a Harley'

should return

Person	Sentence	Vehicle
A	'He drives a Merc'	'Car'
B	'He rides a Harley'	'MotorCycle'

Answer:

One solution is to create a reversed dictionary from dct (mapping each brand to its vehicle type), split each sentence into words with str.split, and look each word up in the reversed dictionary:

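import pandas as pd

df = pd.DataFrame({
    "Person": ["A", "B"],
    "Sentence": ["'He drives a Merc'", "'He rides a Harley'"],
})

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}

# invert: map each brand name to its vehicle type
dct_inv = {i: k for k, v in dct.items() for i in v}


def find_word(x):
    # strip surrounding spaces/quotes, then test each whitespace-separated word
    for w in x.strip(" '").split():
        if w in dct_inv:
            return dct_inv[w]
    return None  # no known brand in the sentence


df["Vehicle"] = df["Sentence"].apply(find_word)
print(df)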

Prints:

  Person             Sentence     Vehicle
0      A   'He drives a Merc'         Car
1      B  'He rides a Harley'  MotorCycle
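Because find_word splits on whitespace only, a brand followed by punctuation (for example "Merc." or "Harley,") will not be found in dct_inv. A minimal sketch of one workaround, assuming a "word" is any run of \w characters, tokenizes with re.findall instead of str.split. The helper name find_word_tokens and the extra sample sentences are hypothetical, chosen only to illustrate the idea:

```python
import re

import pandas as pd

dct = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}
# brand -> vehicle type
dct_inv = {brand: kind for kind, brands in dct.items() for brand in brands}


def find_word_tokens(sentence):
    # hypothetical helper: \w+ picks out runs of word characters,
    # so surrounding quotes and punctuation are ignored
    for word in re.findall(r"\w+", sentence):
        if word in dct_inv:
            return dct_inv[word]
    return None  # no known brand in the sentence


# hypothetical sample data with punctuation and a row without any brand
df = pd.DataFrame({
    "Person": ["A", "B", "C"],
    "Sentence": ["'He drives a Merc.'", "'He rides a Harley, fast'", "'He walks'"],
})
df["Vehicle"] = df["Sentence"].apply(find_word_tokens)
print(df)
```

The lookup is still case-sensitive, so "merc" would only match if you normalize case on both the sentence words and the dictionary keys.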

Answer:

You can invert the dictionary, extract the first matching brand with a regex, and map it back to its key:

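import re

import pandas as pd

df = pd.DataFrame({'Person': ['A', 'B'],
                   'Sentence': ['He drives a Merc', 'He rides a Harley']})

dic = {'Car': ['Merc', 'BMW', 'Ford', 'Suzuki'],
       'MotorCycle': ['Harley', 'Yamaha', 'Triump']}

# invert dictionary: brand -> vehicle type
d = {k: v for v, l in dic.items()
     for k in l}

# craft regex: one capture group matching any brand, e.g. (Merc|BMW|...)
regex = f'({"|".join(map(re.escape, d))})'

# extract the first matching brand, then map it to its vehicle type
df['Vehicle'] = df['Sentence'].str.extract(regex, expand=False).map(d)
print(df)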

Output:

  Person           Sentence     Vehicle
0      A   He drives a Merc         Car
1      B  He rides a Harley  MotorCycle
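The pattern in this answer has no word boundaries, so "Ford" would also match inside a longer word such as "Fordham". A sketch of a stricter variant, assuming brands should only match as whole words, wraps the alternation in \b anchors; rows with no match come out as NaN after str.extract and stay NaN after map. Sorting the names longest-first is an extra precaution (an assumption, not needed for this data) for brand lists where one name is a prefix of another. The third sample sentence is hypothetical:

```python
import re

import pandas as pd

dic = {
    "Car": ["Merc", "BMW", "Ford", "Suzuki"],
    "MotorCycle": ["Harley", "Yamaha", "Triump"],
}
# brand -> vehicle type
d = {brand: kind for kind, brands in dic.items() for brand in brands}

# \b anchors make each brand match only as a whole word;
# longest names first so a longer brand is tried before a shorter prefix of it
names = sorted(d, key=len, reverse=True)
regex = r"\b(" + "|".join(map(re.escape, names)) + r")\b"

# hypothetical sample data; "Fordham" should NOT be treated as "Ford"
df = pd.DataFrame({
    "Person": ["A", "B", "C"],
    "Sentence": ["He drives a Merc", "He rides a Harley", "He studies at Fordham"],
})
# extract returns the first captured brand (NaN if none); map turns it into the key
df["Vehicle"] = df["Sentence"].str.extract(regex, expand=False).map(d)
print(df)
```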