Home > OS >  merge dataframes and replace existing column
merge dataframes and replace existing column

Time:08-25

df1 =  pd.DataFrame({'A':[3,5,2,5], 'B':['w','x','y','z'], 'C':['0','0','0','0']})
df2 =  pd.DataFrame({'B':['w','x','y','z'],'C':['1','2','3','4'], 'D':[10,20,30,40]})

I'm trying to merge df1 and df2 on B and keep all A B C and D columns:

     A  B  C     D
0  3.0  w  1   10.0
1  5.0  x  2   20.0
2  2.0  y  3   30.0
3  5.0  z  4   40.0

I've tried df1.merge(df2, how='outer', on='B')

   A  B C_x C_y   D
0  3  w   0   1  10
1  5  x   0   2  20
2  2  y   0   3  30
3  5  z   0   4  40

which is almost what I want, but need C in df2 to replace C in df1. How can I achieve that?

CodePudding user response:

If you don't want C from the lefthand side at all you could simply drop it before the merge:

df1 =  pd.DataFrame({'A':[3,5,2,5], 'B':['w','x','y','z'], 'C':['0','0','0','0']})
df2 =  pd.DataFrame({'B':['w','x','y','z'],'C':['1','2','3','4'], 'D':[10,20,30,40]})

result = pd.merge(
  df1.drop('C', axis=1),
  df2,
  how='outer',
  on='B')

    A   B   C   D
0   3   w   1   10
1   5   x   2   20
2   2   y   3   30
3   5   z   4   40

Edit: However, if you're wanting to combine in cases where you don't have C from df2 you could utilize combine_first():

df1 = pd.DataFrame({'A':[3,5,2,5,6], 'B':['w','x','y','z','q'], 'C':['0','0','0','0','88']})
df2 = pd.DataFrame({'B':['w','x','y','z'],'C':['1','2','3','4'], 'D':[10,20,30,40]})

result_2 = pd.merge(df1, df2, how='outer', on='B')
result_2['C'] = result_2['C_y'].combine_first(result_2['C_x'])
result_2.drop(['C_x', 'C_y'], axis=1, inplace=True)

    A   B   D       C
0   3   w   10.0    1
1   5   x   20.0    2
2   2   y   30.0    3
3   5   z   40.0    4
4   6   q           88

CodePudding user response:

here is one way to do it, by choosing the column you need to include in the merge

df1[['A','B']].merge(df2,
         on='B',
         how='outer')
    A   B   C   D
0   3   w   1   10
1   5   x   2   20
2   2   y   3   30
3   5   z   4   40
  • Related