Home > Blockchain >  Replacing single cell value between two Dataframes
Replacing single cell value between two Dataframes

Time:10-14

I have two dataframes

df1=

name Code
1234567 H74

df2=

name Code
1234567 J11
1234321 J12

I want to change 'Code' of df2 to the value that is in 'Code' of df1 if 'name' in df2 matches 'name' in df1.

I have tried creating a dictionary of df1;

dict = df1.set_index('name').to_dict()['Code']
df2.replace(dict)

However that results in changing 'name' to 'Code' like:

name Code
H74 J11
1234321 J12

Any help in achieving the following would be greatly appreciated:

name Code
1234567 H74
1234321 J12

Thanks in advance!

CodePudding user response:

Use pd.concat and drop_duplicates:

>>> pd.concat([df1, df2]).drop_duplicates('name')

      name Code
0  1234567  H74
1  1234321  J12

CodePudding user response:

Try these -

Using pd.DataFrame.where

  1. ~df2['name'].isin(df1['name']) checks for the names that exist in df2 but not in df1
  2. The pd.where keeps the current code values as is, if above is true, else for the false, it picks up df1['Code']
  3. This can now be use to replace the df1['Code'] column.

Read documentation.

idx = ~df2['name'].isin(df1['name'])
df2['Code'] = df2['Code'].where(idx,df1['Code'],axis=0)
      name Code
0  1234567  H74
1  1234321  J12

Using pd.DataFrame.join

  1. You can set the name column as index for both df1 and df2, and join (left) on df2.
  2. Next, you can use ffill(axis=1) to fill the df2 code values to the nans (where df1 doesn't exist)
  3. Finally you can fetch just the Code column, which is actually the df1's code column filled with df2 code values where it was nan.
  4. Reset the index to get back what you need.
  5. Remember to save it in a variable!

Read documentation.

df2.set_index('name').join(df1.set_index('name'), lsuffix='_')\
                     .ffill(1)['Code']\
                     .reset_index()
      name Code
0  1234567  H74
1  1234321  J12

Using pd.DataFrame.update

This is simple, but an in-place operation. So be careful with it!

Read documentation.

df2.update(df1)
df2
        name Code
0  1234567.0  H74
1  1234321.0  J12

CodePudding user response:

This can be done by setting the 'name' column as the index and using the update function.

import pandas as pd

df1 = pd.DataFrame({'Name':'Ashutosh', 'Code':'AP', 'Number':'1234567', 'ID':'H74'}, index=[1]).set_index('Name')

df2 = pd.DataFrame({'Name':'Ashutosh', 'Code':'JN', 'Number':'1234567', 'ID':'J11', 'Alternate':'1234321', 'SID':'J12'}, index=[1]).set_index('Name')

So my dataframes would look like this:

>>> df1

          Code  Number  ID
Name            
Ashutosh    AP  1234567 H74

>>> df2


          Code  Number  ID  Alternate SID
Name                    
Ashutosh    JN  1234567 J11 1234321   J12

Using the update function:

>>> df2['Code'].update(df1['Code'])
>>> df2


          Code  Number  ID  Alternate   SID
Name                    
Ashutosh    AP  1234567 J11 1234321     J12
  • Related