How to replace one pandas dataframe column values based on some other dataframe?-CodePudding

I have two dataframes. df1 and df2. This is the content of df1

  col1  col2  col3
0    1    12   100
1    2    34   200
2    3    56   300
3    4    78   400

This is the content of df2

  col1  col2  col3
0    2  1984   500
1    3  4891   600

I want to have this final data frame:

  col1  col2  col3
0    1    12   100
1    2  1984   200
2    3  4891   300
3    4    78   400

Note that col1 is the primary key in df1 and df2. I tried to do it via mapping values, but I could not make it work.

Here is an MCVE for checking those data frames easily:

import pandas as pd
d = {'col1': ['1', '2','3','4'], 'col2': [12, 34,56,78],'col3':[100,200,300,400]}
df1 = pd.DataFrame(data=d)
d = {'col1': ['2','3'], 'col2': [1984,4891],'col3':[500,600]}
df2 = pd.DataFrame(data=d)
print(df1)
print(df2)
d = {'col1': ['1', '2','3','4'], 'col2': [12, 1984,4891,78],'col3':[100,200,300,400]}
df_final = pd.DataFrame(data=d)
print(df_final)

CodePudding user response：

You can map and fillna:

df1['col2'] = (df1['col1']
               .map(df2.set_index('col1')['col2'])
               .fillna(df1['col2'], downcast='infer')
              )

output:

  col1  col2  col3
0    1    12   100
1    2  1984   200
2    3  4891   300
3    4    78   400

CodePudding user response：

If col1 is unique, combine_first is an option, too:

>>> (df2.drop("col3", axis=1)
        .set_index("col1")
        .combine_first(df1.set_index("col1"))
        .reset_index()
    )
  col1  col2  col3
0    1    12   100
1    2  1984   200
2    3  4891   300
3    4    78   400