Search substrings in strings and return relevant string when matched

Time:09-08

I have a dataframe of product titles. Each title contains one or more keywords that identify the product type:

df_product_titles dataframe

product_title
blue phn small           
silver totebag           
crossshldr bag          
crossshldr tote

I have another dataframe with two columns: the first holds the search keyword and the second holds the corresponding product type:

df_product_types dataframe

search_keyword    product_type
phn               phone
tote              tote bag
shldr             shoulder bag

I want to search for each keyword from the df_product_types dataframe in the titles of the df_product_titles dataframe and return the matching product type. Some product titles contain multiple keywords and therefore have multiple product types; in that case, all product types should be returned in a single string, separated by commas.

df_output

product_title       product_type
blue phn small      phone
silver totebag      tote bag
crossshldr bag      shoulder bag
crossshldr tote     shoulder bag, tote bag

I would greatly appreciate any help. Thanks!

CodePudding user response:

I came up with this solution:

import pandas as pd

df1 = pd.DataFrame({"product_title": ["blue phn small", "silver totebag",
                                      "crossshldr bag", "crossshldr tote"]})
df2 = pd.DataFrame({"search_keyword": ["phn", "tote", "shldr"],
                    "product_type": ["phone", "tote bag", "shoulder bag"]})

# For each title, join the product types of every keyword it contains.
# Series.iteritems() was removed in pandas 2.0, so zip the two columns instead.
df1["product_type"] = df1["product_title"].apply(
    lambda title: ", ".join(
        ptype
        for keyword, ptype in zip(df2["search_keyword"], df2["product_type"])
        if keyword in title))

Output:

    product_title    product_type
0   blue phn small   phone
1   silver totebag   tote bag
2   crossshldr bag   shoulder bag
3   crossshldr tote  tote bag, shoulder bag
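
Note that the last row lists the types in the order the keywords appear in the keyword dataframe ("tote bag, shoulder bag"), while the question asks for "shoulder bag, tote bag". If the order should follow where each keyword occurs in the title, one possible variation is to record each keyword's position and sort by it. This is only a sketch: the helper name types_in_title_order is made up for illustration and is not part of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({"product_title": ["blue phn small", "silver totebag",
                                      "crossshldr bag", "crossshldr tote"]})
df2 = pd.DataFrame({"search_keyword": ["phn", "tote", "shldr"],
                    "product_type": ["phone", "tote bag", "shoulder bag"]})

keywords = df2["search_keyword"].tolist()
types = df2["product_type"].tolist()

def types_in_title_order(title):
    # Hypothetical helper: pair each matching keyword's first position
    # in the title with its product type.
    hits = [(title.find(kw), ptype)
            for kw, ptype in zip(keywords, types)
            if kw in title]
    # Sorting the (position, type) pairs orders the types by where
    # their keywords occur in the title.
    return ", ".join(ptype for _, ptype in sorted(hits))

df1["product_type"] = df1["product_title"].apply(types_in_title_order)
print(df1)
```

With the sample data, "crossshldr tote" then yields "shoulder bag, tote bag", because "shldr" occurs before "tote" in the title.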

CodePudding user response:

Alternative solution:

import pandas as pd

df_product_titles = pd.DataFrame({'product_title': ['blue phn small', 'silver totebag', 'crossshldr bag', 'crossshldr tote']})
df_product_types = pd.DataFrame({'search_keyword': ['phn', 'tote', 'shldr'], 'product_type': ['phone', 'tote bag', 'shoulder bag']})

# Give every row its own empty list (a fresh list per row, not one shared list)
df_product_titles['product_type'] = [[] for _ in range(len(df_product_titles))]

# Append the product type of every keyword found in each title
for i in df_product_types.index:
    for j in df_product_titles.index:
        if df_product_types.loc[i, 'search_keyword'] in df_product_titles.loc[j, 'product_title']:
            df_product_titles.loc[j, 'product_type'].append(df_product_types.loc[i, 'product_type'])

# Collapse each list into a comma-separated string
for j in df_product_titles.index:
    df_product_titles.loc[j, 'product_type'] = ', '.join(df_product_titles.loc[j, 'product_type'])
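
The nested loops can probably be avoided. As a rough sketch (not the method from either answer), the keywords can be combined into a single regular expression and matched with Series.str.findall, then mapped to product types through a dictionary. The names keyword_to_type and pattern are introduced here for illustration only.

```python
import re
import pandas as pd

df_product_titles = pd.DataFrame({'product_title': ['blue phn small', 'silver totebag',
                                                    'crossshldr bag', 'crossshldr tote']})
df_product_types = pd.DataFrame({'search_keyword': ['phn', 'tote', 'shldr'],
                                 'product_type': ['phone', 'tote bag', 'shoulder bag']})

# Lookup table from keyword to product type (illustrative name)
keyword_to_type = dict(zip(df_product_types['search_keyword'],
                           df_product_types['product_type']))

# One alternation pattern; re.escape guards against regex metacharacters in keywords
pattern = '|'.join(re.escape(kw) for kw in keyword_to_type)

# findall returns the matched keywords in order of appearance in each title;
# dict.fromkeys drops repeated product types while keeping that order
df_product_titles['product_type'] = (
    df_product_titles['product_title']
    .str.findall(pattern)
    .apply(lambda found: ', '.join(dict.fromkeys(keyword_to_type[kw] for kw in found)))
)
print(df_product_titles)
```

One caveat with this approach: regex matches do not overlap, so if one keyword is contained in or overlaps another (say "tote" and "totebag"), only one of them is reported for a given position. For keyword sets like that, the explicit loop is the safer choice.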