Home > Blockchain >  Find exact match between strings
Find exact match between strings

Time:07-09

I am trying to create a function to get certain strings of a df column, only if there is an exact match with a string in a list. Here is an example:

my_list = ['Lys', 'Lysol', 'Cla', 'Clarins']

def test(row):
   for i in my_list:
      if i in row['Product']:
         return i
      else:
         return row['Product']

df['Exact_match'] = df.apply(test, axis=1)

Output:

Product Exact_match
Lysol Disinfectant Lys
Napkins Lys Lys
Cream Clarins Cream Clarins

However, I couldn't achieve my goal. This is the expected output I wanted to get:

Product Exact_match
Lysol Disinfectant Lysol
Napkins Lys Lys
Cream Clarins Clarins

CodePudding user response:

Re-order the searches so that it's longest string first and also use .str.extract instead of an applied function, eg:

df['Product'].str.extract('(Clarins|Lysol|Lys|Cla)')

CodePudding user response:

This should work, no matter how long your list is and how you sort things:

def find_str(row, list_):
    words = row.split(' ')
    for elem in words:
        for search_str in list_:
            if search_str in elem:
                return elem
    
    return row

df['Exact_match'] = df['Product'].apply(find_str, list_=my_list)
  • Related