Home > Enterprise >  How to Compare String in a Dataframe column with a sub-string in another Dataframe and extract the v
How to Compare String in a Dataframe column with a sub-string in another Dataframe and extract the v

Time:07-01

I have 2 dataframes: DF1

ID Name Category----
1 Apple Fruit
2 Orange Fruit
3 brocolli Vegetable
4 Spinach Vegetable

DF2

UserID Date UserName Description
111 01/01/2020 AAA Ordered 1 Box Apples
111 01/02/2021 AAA Ordered 1KG spinach
222 15/03/2021 BBB Ordered 3 boxes of Orange

Can anyone help how I can match the "Description" from DF2 which contains "Name" string from DF1 and add the respective "Category" column in DF2?

Desired Output:

UserID Date UserName Description Category
111 01/01/2020 AAA Ordered 1 Box Apples Fruit
111 01/02/2021 AAA Ordered 1KG spinach Vegetable
222 15/03/2021 BBB Ordered 3 boxes of Orange Fruit

CodePudding user response:

You can try str.extract then map

import re
c = '(' '|'.join(df1.Name.tolist()) ')'

df2['new'] = df2.Description.str.extract(c,flags=re.IGNORECASE)[0].str.upper().\
                  map(dict(zip(df1.Name.str.upper(),df1.Category)))

0        Fruit
1    Vegetable
2        Fruit
Name: 0, dtype: object

CodePudding user response:

This code uses MERGE to do the same task

import pandas as pd

# Input Data
df1 = pd.DataFrame({'Name':['Apple','Orange','Brocolli','Spinach'], 'Category':['Fruit', 'Fruit','Vegitable','Vegitable']})
df2 = pd.DataFrame({'Date':['01/01/2020','02/02/2021','03/03/2022'], 'Description':['Ordered 1 Box Apple', 'Ordered 1 KG spinach','Ordered 3 Box Orange']})

# Data Processing
pd.merge(df2, df1, left_on = df2['Description'].str.lower().str.split(' ', expand=True)[3], right_on = df1['Name'].str.lower(), how='left' ).drop('key_0', axis=1)

Output:

enter image description here

  • Related