Replace column values in one pandas DataFrame with values from another pandas DataFrame, with conditions

I have the following three pandas DataFrames. I want to replace the values in the company and division columns of the staff DataFrame with the matching IDs from the company and division DataFrames.

pd_staff:

id    name      company         division
P001  John      Sunrise         Headquarter
P002  Jane      Falcon Digital  Research & Development
P003  Joe       Ashford         Finance
P004  Adam      Falcon Digital  Sales
P004  Barbara   Sunrise         Human Resource


pd_company:

id  name
1   Sunrise
2   Falcon Digital
3   Ashford


pd_division:

id  name
1   Headquarter
2   Research & Development
3   Finance
4   Sales
5   Human Resource

This is the end result that I am trying to produce

id    name      company   division
P001  John      1         1
P002  Jane      2         2
P003  Joe       3         3
P004  Adam      2         4
P004  Barbara   1         5

I have tried to combine the staff and company DataFrames using this code:

pd_staff.loc[pd_staff['company'].isin(pd_company['name']), 'company'] = pd_company.loc[pd_company['name'].isin(pd_staff['company']), 'id']

which produces

id    name      company   
P001  John      1.0        
P002  Jane      NaN         
P003  Joe       NaN         
P004  Adam      NaN       
P004  Barbara   NaN
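
One likely reason for the NaN values, sketched here under the assumption that all frames use the default integer index: an assignment through .loc aligns the right-hand Series on row labels, not on company names. The frames below are rebuilt by hand from the tables above, and reindex is used only to visualise that alignment. The exact NaN pattern in real data depends on the index of each frame.

```python
import pandas as pd

# Frames rebuilt by hand from the question (default 0..n-1 index assumed).
pd_staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
})
pd_company = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})

# Right-hand side of the attempted assignment: it carries pd_company's row labels.
rhs = pd_company.loc[pd_company["name"].isin(pd_staff["company"]), "id"]

# A .loc assignment matches those labels against pd_staff's row labels,
# so staff rows with no same-labelled company row end up as NaN.
aligned = rhs.reindex(pd_staff.index)
print(aligned)
```

Because the match is made on row position labels, a staff row only receives an ID if a company row happens to share its label, which is unrelated to whether the names agree.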

CodePudding user response：

You can do:

pd_staff['company'] = pd_staff['company'].map(pd_company.set_index('name')['id'])
pd_staff['division'] = pd_staff['division'].map(pd_division.set_index('name')['id'])

print(pd_staff) gives:

     id     name  company  division
0  P001     John        1         1
1  P002     Jane        2         2
2  P003      Joe        3         3
3  P004     Adam        2         4
4  P004  Barbara        1         5
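
For a version that can be run on its own, here is a sketch that rebuilds the three frames from the question and adds a check for names missing from the lookup tables. The names company_ids, division_ids, result and unmatched are illustrative additions, not part of the answer above. map returns NaN for any value that has no entry in the lookup, which is what the check relies on.

```python
import pandas as pd

# Frames rebuilt by hand from the question.
pd_staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance",
                 "Sales", "Human Resource"],
})
pd_company = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})
pd_division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance",
             "Sales", "Human Resource"],
})

# Lookup Series: index is the name, value is the id.
company_ids = pd_company.set_index("name")["id"]
division_ids = pd_division.set_index("name")["id"]

# assign() returns a new frame, leaving pd_staff untouched.
result = pd_staff.assign(
    company=pd_staff["company"].map(company_ids),
    division=pd_staff["division"].map(division_ids),
)

# Rows whose company or division name was not found in the lookups.
unmatched = result[result[["company", "division"]].isna().any(axis=1)]
print(result)
print("rows without a match:", len(unmatched))
```

Note that map with a Series expects the names in each lookup table to be unique.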

CodePudding user response：

This will achieve the desired result (here df, df2 and df3 stand for pd_staff, pd_company and pd_division):

df_merge = df.merge(df2, how = 'inner', right_on = 'name', left_on = 'company', suffixes=('', '_y'))
df_merge = df_merge.merge(df3, how = 'inner', left_on = 'division', right_on = 'name', suffixes=('', '_z'))
df_merge = df_merge[['id', 'name', 'id_y', 'id_z']]
df_merge.columns = ['id', 'name', 'company', 'division']
df_merge = df_merge.sort_values('id')  # sort_values returns a new DataFrame, so keep the result

CodePudding user response：

First, let's modify the company and division DataFrames a little (df1, df2 and df3 stand for pd_staff, pd_company and pd_division):

df2.rename(columns={'name':'company'},inplace=True)
df3.rename(columns={'name':'division'},inplace=True)

Then merge and tidy up the columns:

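# left merges keep every staff row; the clashing 'id' columns become id_x / id_y
df1 = df1.merge(df2, on='company', how='left').merge(df3, on='division', how='left')
# keep only the needed columns and rename them in one step (avoids inplace on a slice)
df1 = df1[['id_x', 'name', 'id_y', 'id']].rename(columns={'id_x': 'id', 'id_y': 'company', 'id': 'division'})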
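
A variant of the same merge idea, as a sketch only: if the id columns of the lookup tables are renamed before merging, no _x/_y suffixes appear and the final column selection becomes simpler. The names company_ids, division_ids, company_id and division_id are hypothetical, and the frames are rebuilt by hand from the question.

```python
import pandas as pd

# Frames rebuilt by hand from the question.
pd_staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance",
                 "Sales", "Human Resource"],
})
pd_company = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})
pd_division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance",
             "Sales", "Human Resource"],
})

# Give the lookup columns names that cannot clash with pd_staff's columns.
company_ids = pd_company.rename(columns={"id": "company_id", "name": "company"})
division_ids = pd_division.rename(columns={"id": "division_id", "name": "division"})

result = (
    pd_staff
    .merge(company_ids, on="company", how="left")    # adds company_id
    .merge(division_ids, on="division", how="left")  # adds division_id
    .drop(columns=["company", "division"])           # drop the text columns
    .rename(columns={"company_id": "company", "division_id": "division"})
)
print(result)
```

With how="left", a staff row whose company or division has no match is kept and gets NaN instead of being dropped, which an inner merge would do silently.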

CodePudding user response：

Use apply with a function that replaces the values. The value to look up and its replacement would come from the second Excel file. Here I am replacing Sunrise with 1 because that is its ID in the second Excel file.

import pandas as pd

df = pd.read_excel('teste.xlsx')
df2 = pd.read_excel('ids.xlsx')

def altera(df33, field='Sunrise', new_field='1'): # defaults are for demonstration only; in practice pass both values in from the second Excel file
    return df33.replace(field, new_field)


df.loc[:, 'company'] = df['company'].apply(altera)
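
To cover every name rather than the single hard-coded Sunrise, the lookup can be turned into a dictionary and applied per column. This is a sketch under assumptions: replace_with_ids is a hypothetical helper, the lookup tables are assumed to have id and name columns as in the question, and the frames are built inline instead of being read from Excel files.

```python
import pandas as pd

def replace_with_ids(df, column, lookup):
    """Return a copy of df where `column` holds ids taken from `lookup`.

    `lookup` is assumed to have the columns 'id' and 'name'.
    Names missing from `lookup` become NaN.
    """
    mapping = dict(zip(lookup["name"], lookup["id"]))
    return df.assign(**{column: df[column].map(mapping)})

# Frames rebuilt by hand from the question.
pd_staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance",
                 "Sales", "Human Resource"],
})
pd_company = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Sunrise", "Falcon Digital", "Ashford"],
})
pd_division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance",
             "Sales", "Human Resource"],
})

result = replace_with_ids(pd_staff, "company", pd_company)
result = replace_with_ids(result, "division", pd_division)
print(result)
```

Unlike the string replace above, the mapping matches whole cell values and keeps the IDs as integers.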