I have a dataframe with product titles, which contain keywords, that can identify the product type as such:
df_product_titles dataframe
product_title
blue phn small
silver totebag
crossshldr bag
crossshldr tote
I have another dataframe with two columns, where the 1st column has the keyword and the relevant product type:
df_product_types dataframe
search_keyword product_type
phn phone
tote tote bag
shldr shoulder bag
I want to search each keyword from product_types dataframe in the product_titles dataframe and return the relevant product type. Some product titles have multiple keywords and thus, have multiple product types, in which case it would be useful to return all product types in a single string separated by a comma.
df_output
product_title product_type
blue phn small phone
silver totebag tote bag
cross-shldr bag shoulder bag
crossshldr tote shoulder bag, tote bag
I would greatly appreciate any help. Thanks!
CodePudding user response:
I could came with this solution
df1 = pd.DataFrame({"product_title": ["blue phn small","silver totebag",
"crossshldr bag", "crossshldr tote"]})
df2 = pd.DataFrame({"search_keyword":["phn", "tote", "shldr"],
"product_type": ["phone","tote bag", "shoulder bag"]})
df1["product_type"] = df1["product_title"].apply(lambda x: ", ".join([df2.loc[index, "product_type"]
for index, val in df2.search_keyword.iteritems()
if val in x]))
output
product_title product_type
0 blue phn small phone
1 silver totebag tote bag
2 crossshldr bag shoulder bag
3 crossshldr tote tote bag, shoulder bag
CodePudding user response:
Alternative solution:
import numpy as np
import pandas as pd
df_product_titles = pd.DataFrame({'product_title' : ['blue phn small', 'silver totebag', 'crossshldr bag', 'crossshldr tote']})
df_product_types = pd.DataFrame({'earch_keyword': ['phn', 'tote', 'shldr'], 'product_type': ['phone', 'tote bag', 'shoulder bag']})
product_type = np.empty((df_product_titles.shape[0],),object)
product_type.fill([])
product_type[...] = [[] for _ in range(df_product_titles.shape[0])]
df_product_titles['product_type'] = product_type
for i in df_product_types.index:
for j in df_product_titles.index:
if df_product_types.loc[i, 'earch_keyword'] in df_product_titles.loc[j, 'product_title']:
df_product_titles.loc[j, 'product_type'].append(df_product_types.loc[i, 'product_type'])
for j in df_product_titles.index:
df_product_titles.loc[j,'product_type'] = ', '.join(df_product_titles.loc[j,'product_type'])