(Vectorization) Loop through two DataFrames cell by cell and find whether one is part of the other

Time:07-25

I have a DataFrame that contains color and material parameters, and another one that contains data. I want to check, cell by cell, whether the data DataFrame contains any of the values in the parameters DataFrame. I know that I should use vectorization, but I am not sure how.

import pandas as pd

# Lookup values to search for
parameter = pd.DataFrame({'color': ['red', 'blue', 'green'],
                          'material': ['wood', 'metal', 'plastic']})

# Free-text data to search in
data = pd.DataFrame({'name': ['my blue color', 'red chair', 'green rod'],
                     'description': ['it is a great color', 'made with wood', 'made with metal']})

I want to create new columns that contain the matched parameters. This is the output that I need:

data['attribute'] = ['blue', 'red', 'green']
data['attribute2'] = ['', 'wood', 'metal']
print(data)
            name          description attribute attribute2
0  my blue color  it is a great color      blue           
1      red chair       made with wood       red       wood
2      green rod      made with metal     green      metal
    

CodePudding user response:

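The following code checks each name against the colors and each description against the materials. Every match is joined into a comma-separated string, so a row can yield more than one color or material.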

data['attribute'] = data['name'].apply(lambda name: ','.join([c for c in parameter['color'].tolist() if c in name]))
data['attribute2'] = data['description'].apply(lambda desc: ','.join([m for m in parameter['material'].tolist() if m in desc]))

Output:

            name          description attribute attribute2
0  my blue color  it is a great color      blue           
1      red chair       made with wood       red       wood
2      green rod      made with metal     green      metal
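
Since the question asks about vectorization, here is a sketch of an alternative that uses the pandas string methods instead of apply with a lambda. It assumes that plain substring matching is what you want, and the helper name find_terms is made up for this example. Note one difference from the code above: matches come back in the order they appear in the text, not in the order of the parameter list.

```python
import re
import pandas as pd

parameter = pd.DataFrame({'color': ['red', 'blue', 'green'],
                          'material': ['wood', 'metal', 'plastic']})
data = pd.DataFrame({'name': ['my blue color', 'red chair', 'green rod'],
                     'description': ['it is a great color', 'made with wood', 'made with metal']})

def find_terms(text, terms):
    """Hypothetical helper: return the terms found in each string, comma-separated."""
    # One alternation pattern such as "red|blue|green";
    # re.escape protects against regex metacharacters in the terms.
    pattern = '|'.join(re.escape(t) for t in terms)
    # findall gives a list of matches per row; join turns each list into a string
    # (an empty list becomes an empty string).
    return text.str.findall(pattern).str.join(',')

data['attribute'] = find_terms(data['name'], parameter['color'])
data['attribute2'] = find_terms(data['description'], parameter['material'])
print(data)
```

Like the apply version, this matches substrings, so 'red' would also be found inside a word such as 'bored'. If that is a problem, wrapping the pattern as r'\b(?:' + pattern + r')\b' restricts it to whole words.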