I wrote the Python code below to filter a DataFrame down to the rows whose product description shares at least one word with an input product description. The filtering takes a long time to complete. Could you please help me speed it up?
df1["PRODSPLIT"] = df1["PRODUCTDESC"].str.split()
df1["INTERSECT"] = df1["PRODSPLIT"].apply(lambda x:list(set(x).intersection("input product description".split())))
df1["PRODSPLITLEN"] = df1["INTERSECT"].str.len()
df1 = df1[df1["PRODSPLITLEN"] > 0]
CodePudding user response:
Is the check always just supposed to test whether any word in a row of df1["PRODUCTDESC"] is input, product, or description? In that case you could use:
def intersects_inp_prod_or_desc(string_):
    # True if any whitespace-separated word is one of the target words
    return any(spl in ["input", "product", "description"] for spl in string_.split())
df1["INTERSECT"] = df1["PRODUCTDESC"].apply(intersects_inp_prod_or_desc)
df1 = df1[df1["INTERSECT"]]
This creates a boolean column that is True where a row contains one of those words and False where it doesn't, and then uses that column to filter the DataFrame.
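If the input description is not fixed, the same idea can be generalized by building the set of input words once and testing each row against it with `set.isdisjoint`, which stops at the first shared word instead of materializing the full intersection. This is only a minimal sketch: the helper name `filter_by_shared_words` and the sample data are made up for illustration, and it assumes `PRODUCTDESC` contains no missing values. Whether it is actually faster on your data is something you would need to measure.

```python
import pandas as pd


def filter_by_shared_words(df, input_desc, column="PRODUCTDESC"):
    """Keep rows whose text shares at least one whole word with input_desc.

    Hypothetical helper, not part of pandas. Assumes df[column] holds
    strings only (no NaN values).
    """
    # Build the lookup set once, rather than splitting the input per row.
    input_words = set(input_desc.split())

    # isdisjoint returns as soon as it finds a common word, so no
    # intermediate list or set of matches is created for each row.
    mask = df[column].apply(
        lambda text: not input_words.isdisjoint(text.split())
    )
    return df[mask]


# Made-up sample data for illustration.
df1 = pd.DataFrame(
    {"PRODUCTDESC": ["red product box", "blue widget",
                     "long description here", "inputs only"]}
)
filtered = filter_by_shared_words(df1, "input product description")
print(filtered)
```

Because the mask is applied directly, no helper columns such as PRODSPLIT, INTERSECT, or PRODSPLITLEN are left behind in the DataFrame. Note that matching is on whole words, so "inputs" does not match "input".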