I have two dataframes. df1
and df2
. This is the content of df1
col1 col2 col3
0 1 12 100
1 2 34 200
2 3 56 300
3 4 78 400
This is the content of df2
col1 col2 col3
0 2 1984 500
1 3 4891 600
I want to have this final data frame:
col1 col2 col3
0 1 12 100
1 2 1984 200
2 3 4891 300
3 4 78 400
Note that col1
is the primary key in df1
and df2
. I tried to do it via mapping values, but I could not make it work.
Here is an MCVE for checking those data frames easily:
import pandas as pd
d = {'col1': ['1', '2','3','4'], 'col2': [12, 34,56,78],'col3':[100,200,300,400]}
df1 = pd.DataFrame(data=d)
d = {'col1': ['2','3'], 'col2': [1984,4891],'col3':[500,600]}
df2 = pd.DataFrame(data=d)
print(df1)
print(df2)
d = {'col1': ['1', '2','3','4'], 'col2': [12, 1984,4891,78],'col3':[100,200,300,400]}
df_final = pd.DataFrame(data=d)
print(df_final)
CodePudding user response:
You can map
and fillna
:
df1['col2'] = (df1['col1']
.map(df2.set_index('col1')['col2'])
.fillna(df1['col2'], downcast='infer')
)
output:
col1 col2 col3
0 1 12 100
1 2 1984 200
2 3 4891 300
3 4 78 400
CodePudding user response:
If col1
is unique, combine_first
is an option, too:
>>> (df2.drop("col3", axis=1)
.set_index("col1")
.combine_first(df1.set_index("col1"))
.reset_index()
)
col1 col2 col3
0 1 12 100
1 2 1984 200
2 3 4891 300
3 4 78 400