I have two dataframes
df1=
name | Code |
---|---|
1234567 | H74 |
df2=
name | Code |
---|---|
1234567 | J11 |
1234321 | J12 |
I want to change 'Code' of df2 to the value that is in 'Code' of df1 if 'name' in df2 matches 'name' in df1.
I have tried creating a dictionary of df1;
dict = df1.set_index('name').to_dict()['Code']
df2.replace(dict)
However that results in changing 'name' to 'Code' like:
name | Code |
---|---|
H74 | J11 |
1234321 | J12 |
Any help in achieving the following would be greatly appreciated:
name | Code |
---|---|
1234567 | H74 |
1234321 | J12 |
Thanks in advance!
CodePudding user response:
Use pd.concat
and drop_duplicates
:
>>> pd.concat([df1, df2]).drop_duplicates('name')
name Code
0 1234567 H74
1 1234321 J12
CodePudding user response:
Try these -
Using pd.DataFrame.where
~df2['name'].isin(df1['name'])
checks for the names that exist in df2 but not in df1- The
pd.where
keeps the currentcode
values as is, if above is true, else for the false, it picks updf1['Code']
- This can now be use to replace the df1['Code'] column.
Read documentation.
idx = ~df2['name'].isin(df1['name'])
df2['Code'] = df2['Code'].where(idx,df1['Code'],axis=0)
name Code
0 1234567 H74
1 1234321 J12
Using pd.DataFrame.join
- You can set the name column as index for both df1 and df2, and join (left) on df2.
- Next, you can use
ffill(axis=1)
to fill the df2 code values to the nans (where df1 doesn't exist) - Finally you can fetch just the
Code
column, which is actually the df1's code column filled with df2 code values where it was nan. - Reset the index to get back what you need.
- Remember to save it in a variable!
Read documentation.
df2.set_index('name').join(df1.set_index('name'), lsuffix='_')\
.ffill(1)['Code']\
.reset_index()
name Code
0 1234567 H74
1 1234321 J12
Using pd.DataFrame.update
This is simple, but an in-place operation. So be careful with it!
Read documentation.
df2.update(df1)
df2
name Code
0 1234567.0 H74
1 1234321.0 J12
CodePudding user response:
This can be done by setting the 'name' column as the index and using the update
function.
import pandas as pd
df1 = pd.DataFrame({'Name':'Ashutosh', 'Code':'AP', 'Number':'1234567', 'ID':'H74'}, index=[1]).set_index('Name')
df2 = pd.DataFrame({'Name':'Ashutosh', 'Code':'JN', 'Number':'1234567', 'ID':'J11', 'Alternate':'1234321', 'SID':'J12'}, index=[1]).set_index('Name')
So my dataframes would look like this:
>>> df1
Code Number ID
Name
Ashutosh AP 1234567 H74
>>> df2
Code Number ID Alternate SID
Name
Ashutosh JN 1234567 J11 1234321 J12
Using the update
function:
>>> df2['Code'].update(df1['Code'])
>>> df2
Code Number ID Alternate SID
Name
Ashutosh AP 1234567 J11 1234321 J12