I have the following 3 Panda Dataframe. I want to replace company and division columns the ID from their respective company and division dataframe.
pd_staff:
id name company division
P001 John Sunrise Headquarter
P002 Jane Falcon Digital Research & Development
P003 Joe Ashford Finance
P004 Adam Falcon Digital Sales
P004 Barbara Sunrise Human Resource
pd_company:
id name
1 Sunrise
2 Falcon Digital
3 Ashford
pd_division:
id name
1 Headquarter
2 Research & Development
3 Finance
4 Sales
5 Human Resource
This is the end result that I am trying to produce
id name company division
P001 John 1 1
P002 Jane 2 2
P003 Joe 3 3
P004 Adam 2 4
P004 Barbara 1 5
I have tried to combine Staff and Company using this code
pd_staff.loc[pd_staff['company'].isin(pd_company['name']), 'company'] = pd_company.loc[pd_company['name'].isin(pd_staff['company']), 'id']
which produces
id name company
P001 John 1.0
P002 Jane NaN
P003 Joe NaN
P004 Adam NaN
P004 Barbara NaN
CodePudding user response:
You can do:
pd_staff['company'] = pd_staff['company'].map(pd_company.set_index('name')['id'])
pd_staff['division'] = pd_staff['division'].map(pd_division.set_index('name')['id'])
print(pd_staff):
id name company division
0 P001 John 1 1
1 P002 Jane 2 2
2 P003 Joe 3 3
3 P004 Adam 2 4
4 P004 Barbara 1 5
CodePudding user response:
This will achieve the desired results
df_merge = df.merge(df2, how = 'inner', right_on = 'name', left_on = 'company', suffixes=('', '_y'))
df_merge = df_merge.merge(df3, how = 'inner', left_on = 'division', right_on = 'name', suffixes=('', '_z'))
df_merge = df_merge[['id', 'name', 'id_y', 'id_z']]
df_merge.columns = ['id', 'name', 'company', 'division']
df_merge.sort_values('id')
CodePudding user response:
first, lets modify df company and df division a little bit
df2.rename(columns={'name':'company'},inplace=True)
df3.rename(columns={'name':'division'},inplace=True)
Then
df1=df1.merge(df2,on='company',how='left').merge(df3,on='division',how='left')
df1=df1[['id_x','name','id_y','id']]
df1.rename(columns={'id_x':'id','id_y':'company','id':'division'},inplace=True)
CodePudding user response:
Use apply, you can have a function thar will replace the values. from the second excel you will pass the field to look up to and what's to replace in this. Here I am replacing Sunrise by 1 because it is in the second excel.
import pandas as pd
df = pd.read_excel('teste.xlsx')
df2 = pd.read_excel('ids.xlsx')
def altera(df33, field='Sunrise', new_field='1'): # for showing pourposes I left default values but they are to pass from the second excel
return df33.replace(field, new_field)
df.loc[:, 'company'] = df['company'].apply(altera)