Replace column values in one pandas DataFrame with a column from another pandas DataFrame, with conditions

Time:06-29

I have the following 3 pandas DataFrames. I want to replace the values in the company and division columns of pd_staff with the matching IDs from the company and division DataFrames.

pd_staff:

id    name      company         division
P001  John      Sunrise         Headquarter
P002  Jane      Falcon Digital  Research & Development
P003  Joe       Ashford         Finance
P004  Adam      Falcon Digital  Sales
P004  Barbara   Sunrise         Human Resource


pd_company:

id  name
1   Sunrise
2   Falcon Digital
3   Ashford


pd_division:

id  name
1   Headquarter
2   Research & Development
3   Finance
4   Sales
5   Human Resource

This is the end result that I am trying to produce

id    name      company   division
P001  John      1         1
P002  Jane      2         2
P003  Joe       3         3
P004  Adam      2         4
P004  Barbara   1         5

I have tried to combine Staff and Company using this code

pd_staff.loc[pd_staff['company'].isin(pd_company['name']), 'company'] = pd_company.loc[pd_company['name'].isin(pd_staff['company']), 'id']

which produces

id    name      company   
P001  John      1.0        
P002  Jane      NaN         
P003  Joe       NaN         
P004  Adam      NaN       
P004  Barbara   NaN     

CodePudding user response:

You can do:

pd_staff['company'] = pd_staff['company'].map(pd_company.set_index('name')['id'])
pd_staff['division'] = pd_staff['division'].map(pd_division.set_index('name')['id'])

print(pd_staff) then gives:

     id     name  company  division
0  P001     John        1         1
1  P002     Jane        2         2
2  P003      Joe        3         3
3  P004     Adam        2         4
4  P004  Barbara        1         5
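
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same map approach, built from the sample data in the question. It assumes each name occurs only once in the lookup tables. The names company_ids, division_ids, result and unmatched are only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
pd_staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance", "Sales", "Human Resource"],
})
pd_company = pd.DataFrame({"id": [1, 2, 3], "name": ["Sunrise", "Falcon Digital", "Ashford"]})
pd_division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance", "Sales", "Human Resource"],
})

# Build name -> id lookup Series (illustrative names).
company_ids = pd_company.set_index("name")["id"]
division_ids = pd_division.set_index("name")["id"]

# assign returns a new DataFrame, so pd_staff itself is left untouched.
result = pd_staff.assign(
    company=pd_staff["company"].map(company_ids),
    division=pd_staff["division"].map(division_ids),
)

# map leaves NaN where a name is missing from the lookup, so it is worth checking.
unmatched = result[result[["company", "division"]].isna().any(axis=1)]

print(result)
print("rows without a match:", len(unmatched))
```

If a staff row had a company or division that is not in the lookup table, it would show up in unmatched instead of being silently dropped.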

CodePudding user response:

This will achieve the desired result (here df, df2 and df3 stand for pd_staff, pd_company and pd_division):

df_merge = df.merge(df2, how = 'inner', right_on = 'name', left_on = 'company', suffixes=('', '_y'))
df_merge = df_merge.merge(df3, how = 'inner', left_on = 'division', right_on = 'name', suffixes=('', '_z'))
df_merge = df_merge[['id', 'name', 'id_y', 'id_z']]
df_merge.columns = ['id', 'name', 'company', 'division']
df_merge = df_merge.sort_values('id')  # sort_values returns a new DataFrame, so assign the result

CodePudding user response:

First, let's rename the name column in the company and division DataFrames so that it matches the corresponding column in the staff DataFrame:

df2.rename(columns={'name':'company'},inplace=True)
df3.rename(columns={'name':'division'},inplace=True)

Then

df1=df1.merge(df2,on='company',how='left').merge(df3,on='division',how='left')
df1=df1[['id_x','name','id_y','id']]
df1.rename(columns={'id_x':'id','id_y':'company','id':'division'},inplace=True)
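
The merge-based answers can also be written without modifying the lookup tables in place and without juggling the _x/_y suffixes. The following is only a sketch under that assumption: staff, company, division and the *_lookup, company_id and division_id names are made up for illustration, and the data is the sample from the question.

```python
import pandas as pd

# Sample data copied from the question (variable names are illustrative).
staff = pd.DataFrame({
    "id": ["P001", "P002", "P003", "P004", "P004"],
    "name": ["John", "Jane", "Joe", "Adam", "Barbara"],
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
    "division": ["Headquarter", "Research & Development", "Finance", "Sales", "Human Resource"],
})
company = pd.DataFrame({"id": [1, 2, 3], "name": ["Sunrise", "Falcon Digital", "Ashford"]})
division = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "name": ["Headquarter", "Research & Development", "Finance", "Sales", "Human Resource"],
})

# rename without inplace=True returns copies, so the originals stay unchanged.
# Giving the id columns distinct names avoids the automatic _x/_y suffixes.
company_lookup = company.rename(columns={"id": "company_id", "name": "company"})
division_lookup = division.rename(columns={"id": "division_id", "name": "division"})

# how="left" keeps every staff row, even if a name has no match.
merged = (
    staff.merge(company_lookup, on="company", how="left")
         .merge(division_lookup, on="division", how="left")
)

# Keep only the ID columns and give them the original column names.
result = (
    merged[["id", "name", "company_id", "division_id"]]
    .rename(columns={"company_id": "company", "division_id": "division"})
)

print(result)
```

A left merge keeps the staff rows in their original order, so no sort is needed afterwards.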

CodePudding user response:

Use apply with a function that replaces the values. From the second Excel file you pass the value to look up and the value to replace it with. Here I am replacing Sunrise with 1 because that is its ID in the second Excel file.

import pandas as pd

df = pd.read_excel('teste.xlsx')
df2 = pd.read_excel('ids.xlsx')

def altera(value, field='Sunrise', new_field='1'):  # defaults are for demonstration only; in practice pass the values from the second Excel file
    # apply calls this once per cell, so value is a single string, not a DataFrame
    return value.replace(field, new_field)


df.loc[:, 'company'] = df['company'].apply(altera)
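
To cover every company rather than just Sunrise, one possible extension is to build a dictionary from the second table and pass it to the function through apply. This is only a sketch: it uses in-memory DataFrames in place of the Excel files, and the mapping parameter and lookup name are assumptions that do not appear in the answer above.

```python
import pandas as pd

# Stand-ins for the two Excel files (teste.xlsx and ids.xlsx in the answer).
df = pd.DataFrame({
    "company": ["Sunrise", "Falcon Digital", "Ashford", "Falcon Digital", "Sunrise"],
})
df2 = pd.DataFrame({"id": [1, 2, 3], "name": ["Sunrise", "Falcon Digital", "Ashford"]})

# Build a plain name -> id dictionary from the lookup table.
lookup = dict(zip(df2["name"], df2["id"]))

def altera(value, mapping):
    # Return the ID for a known name; leave unknown names unchanged.
    return mapping.get(value, value)

# Extra keyword arguments given to Series.apply are forwarded to the function.
df["company"] = df["company"].apply(altera, mapping=lookup)

print(df)
```

Unlike map, this keeps the original text for names that have no ID, which may or may not be what you want.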