I am trying to create a function to get certain strings of a df column, only if there is an exact match with a string in a list. Here is an example:
my_list = ['Lys', 'Lysol', 'Cla', 'Clarins']
def test(row):
for i in my_list:
if i in row['Product']:
return i
else:
return row['Product']
df['Exact_match'] = df.apply(test, axis=1)
Output:
Product | Exact_match |
---|---|
Lysol Disinfectant | Lys |
Napkins Lys | Lys |
Cream Clarins | Cream Clarins |
However, I couldn't achieve my goal. This is the expected output I wanted to get:
Product | Exact_match |
---|---|
Lysol Disinfectant | Lysol |
Napkins Lys | Lys |
Cream Clarins | Clarins |
CodePudding user response:
Re-order the searches so that it's longest string first and also use .str.extract
instead of an applied function, eg:
df['Product'].str.extract('(Clarins|Lysol|Lys|Cla)')
CodePudding user response:
This should work, no matter how long your list is and how you sort things:
def find_str(row, list_):
words = row.split(' ')
for elem in words:
for search_str in list_:
if search_str in elem:
return elem
return row
df['Exact_match'] = df['Product'].apply(find_str, list_=my_list)