Home > Enterprise >  'category' column for another by using pandas(Python)--Interact with two dataframes
'category' column for another by using pandas(Python)--Interact with two dataframes

Time:10-01

Lets say we have two data frames(df1,df2) like this df1

df2

We need to find the values from df1.name that exist in df2(it could be at any column, we need to search the value in df2) Finally, we create a column df1.category, which represent the category from df2. Like showed below df1(After)

Update: Here is my code for define df1 and df2

df1 = pd.DataFrame({'name': ['fish','pork','beef','apple','shrimp','orange','shrimp','apple','pork']})
df2 =pd.DataFrame( {'seafood': ['fish', 'shrimp'],
    'meat' : ['pork','beef'],
    'fruit': ['apple','orange']})

After I run this code,

mel = df2.melt(var_name='Category' ,value_name='name')
df1 = df1.merge(mel, on='name')

The index of name is automatically changed, is there any way to keep the original index? Thanks

CodePudding user response:

You can use melt and merge for this.

If you care about the order of the output, you could add how='left' to the merge function.

import pandas as pd
df1 = pd.DataFrame({'name':['fish','pork','beef','apple','shrimp','orange','shrimp','apple','pork']})
df2 = pd.DataFrame({'seafood':['fish','shrimp'], 'meat':['pork','beef'], 'fruit':['apple','orange']})

output = df1.merge(df2.melt(var_name='Category', value_name='name'), on='name')

print(output)

Output

     name Category
0    fish  seafood
1    pork     meat
2    pork     meat
3    beef     meat
4   apple    fruit
5   apple    fruit
6  shrimp  seafood
7  shrimp  seafood
8  orange    fruit

CodePudding user response:

Another way you can do with apply-lambda.. with if condition... But chris's way is better one...

df1 = pd.DataFrame({"name":["fish","pork","apple"]})
df2 = pd.DataFrame({"seafood":["fish"],
                   "meat":["pork"],
                   "fruit":["apple"]})

df1["category"] = df1["name"].apply(lambda x: "seafood" if x in list(df2["seafood"]) else
                                    ("meat" if x in list(df2["meat"]) else
                                     ("fruit" if x in list(df2["fruit"]) else np.nan)))

Output of df1;

    name    category
0   fish    seafood
1   pork    meat
2   apple   fruit

CodePudding user response:

I would do the good old for:

for i in df1.index:
    for c in df2.columns:
        if df1.at[i,'name'] in df2[c].values:
            df1.at[i,'category']=c
            break
  • Related