I have one dataframe that contains color and material parameters and another that contains data. I want to check, cell by cell, whether the data dataframe contains any of the values in the parameters dataframe. I know that I should use vectorization, but I am not sure how.
import pandas as pd

parameter = pd.DataFrame({'color': ['red','blue','green'],
                          'material': ['wood','metal','plastic']})
data = pd.DataFrame({'name': ['my blue color','red chair','green rod'],
                     'description': ['it is a great color','made with wood','made with metal']})
I want to create new columns that contain the matched parameters. This is the output that I need:
data['attribute'] = ['blue','red','green']
data['attribute2'] = ['','wood','metal']
print(data)
            name          description attribute attribute2
0  my blue color  it is a great color      blue           
1      red chair       made with wood       red       wood
2      green rod      made with metal     green      metal
CodePudding user response:
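The following code checks each name against the colors and each description against the materials in parameter. Because the matches are joined with commas, it can extract more than one color or material per row.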
data['attribute'] = data['name'].apply(lambda name: ','.join([c for c in parameter['color'].tolist() if c in name]))
data['attribute2'] = data['description'].apply(lambda desc: ','.join([m for m in parameter['material'].tolist() if m in desc]))
Output:
index | name | description | attribute | attribute2 |
---|---|---|---|---|
0 | my blue color | it is a great color | blue | |
1 | red chair | made with wood | red | wood |
2 | green rod | made with metal | green | metal |
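Since the question asks about vectorization, here is a minimal sketch of the same idea using pandas string methods instead of `apply` with a lambda. The helper name `extract_matches` is made up for illustration. It assumes plain substring matching is what you want, and it differs slightly from the code above: matches come back in the order they appear in the text, and a term that occurs twice is listed twice. It is not guaranteed to be faster on every dataset.

```python
import re

import pandas as pd

parameter = pd.DataFrame({'color': ['red', 'blue', 'green'],
                          'material': ['wood', 'metal', 'plastic']})
data = pd.DataFrame({'name': ['my blue color', 'red chair', 'green rod'],
                     'description': ['it is a great color', 'made with wood', 'made with metal']})


def extract_matches(text: pd.Series, terms: pd.Series) -> pd.Series:
    """Hypothetical helper: return the terms found in each string, comma-separated."""
    # Build one alternation pattern such as "red|blue|green";
    # re.escape protects terms that contain regex metacharacters.
    pattern = '|'.join(re.escape(term) for term in terms)
    # str.findall yields a list of matches per row; str.join collapses
    # each list into a single string (an empty list becomes '').
    return text.str.findall(pattern).str.join(',')


data['attribute'] = extract_matches(data['name'], parameter['color'])
data['attribute2'] = extract_matches(data['description'], parameter['material'])
print(data)
```

If whole-word matches are needed (so that "red" does not match inside "bored"), the pattern could be wrapped in word boundaries, for example `r'\b(?:' + pattern + r')\b'`.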