I have 2 dataframes: DF1
ID | Name | Category---- |
---|---|---|
1 | Apple | Fruit |
2 | Orange | Fruit |
3 | brocolli | Vegetable |
4 | Spinach | Vegetable |
DF2
UserID | Date | UserName | Description |
---|---|---|---|
111 | 01/01/2020 | AAA | Ordered 1 Box Apples |
111 | 01/02/2021 | AAA | Ordered 1KG spinach |
222 | 15/03/2021 | BBB | Ordered 3 boxes of Orange |
Can anyone help how I can match the "Description" from DF2 which contains "Name" string from DF1 and add the respective "Category" column in DF2?
Desired Output:
UserID | Date | UserName | Description | Category |
---|---|---|---|---|
111 | 01/01/2020 | AAA | Ordered 1 Box Apples | Fruit |
111 | 01/02/2021 | AAA | Ordered 1KG spinach | Vegetable |
222 | 15/03/2021 | BBB | Ordered 3 boxes of Orange | Fruit |
CodePudding user response:
You can try str.extract
then map
import re
c = '(' '|'.join(df1.Name.tolist()) ')'
df2['new'] = df2.Description.str.extract(c,flags=re.IGNORECASE)[0].str.upper().\
map(dict(zip(df1.Name.str.upper(),df1.Category)))
0 Fruit
1 Vegetable
2 Fruit
Name: 0, dtype: object
CodePudding user response:
This code uses MERGE to do the same task
import pandas as pd
# Input Data
df1 = pd.DataFrame({'Name':['Apple','Orange','Brocolli','Spinach'], 'Category':['Fruit', 'Fruit','Vegitable','Vegitable']})
df2 = pd.DataFrame({'Date':['01/01/2020','02/02/2021','03/03/2022'], 'Description':['Ordered 1 Box Apple', 'Ordered 1 KG spinach','Ordered 3 Box Orange']})
# Data Processing
pd.merge(df2, df1, left_on = df2['Description'].str.lower().str.split(' ', expand=True)[3], right_on = df1['Name'].str.lower(), how='left' ).drop('key_0', axis=1)
Output: