I have two dataframes df1 and df2.
import numpy as np
import pandas as pd
np.random.seed(0)
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], '2021': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], '2022': np.random.randn(4)})
df1
key 2021
0 A 1.764052
1 B 0.400157
2 C 0.978738
3 D 2.240893
df2
key 2022
0 B 1.867558
1 D -0.977278
2 E 0.950088
3 F -0.151357
I want the output dataframe to read:
  key      2021      2022
0   A  1.764052
1   B  0.400157  1.867558
2   C  0.978738
3   D  2.240893 -0.977278
4   E            0.950088
5   F           -0.151357
I want the keys to be unique: if a key already exists, update that row; otherwise insert a new row. I am not sure whether I should use merge, concat or join. Can anyone give some insight on this, please?
Note: I have tried a full outer join, but it returns duplicate columns.
Thanks!
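The "update if the key exists, else insert" rule in the question is essentially an upsert. One way to express it, sketched here under the assumption that values from df2 should win whenever both frames have a value for the same key and column, is DataFrame.combine_first after indexing both frames by key. This is a swapped-in technique, not one of the answers below:

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
np.random.seed(0)
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], '2021': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], '2022': np.random.randn(4)})

# combine_first aligns both frames on the index (key) and on the columns.
# Where both have a non-null value, df2 (the caller) wins ("update");
# keys present in only one frame are kept as new rows ("insert").
result = (
    df2.set_index('key')
       .combine_first(df1.set_index('key'))
       .reset_index()
)
print(result)
```

With the question's data the two frames share no value columns, so this gives the same table as an outer join; the difference only shows up when both frames carry the same column (for example, two versions of '2021').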
Answer:
I think you need to set key as the index of both DataFrames and then join them with concat along axis=1, which aligns rows on the index (an outer join by default):
df = pd.concat([df1.set_index('key'), df2.set_index('key')], axis=1).reset_index()
print (df)
key 2021 2022
0 A 1.764052 NaN
1 B 0.400157 1.867558
2 C 0.978738 NaN
3 D 2.240893 -0.977278
4 E NaN 0.950088
5 F NaN -0.151357
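The same concat pattern should extend to any number of yearly frames, since every frame is indexed by key before the side-by-side concatenation. A minimal sketch, where df3 and its '2023' values are hypothetical and only there to show the pattern scaling:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], '2021': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], '2022': np.random.randn(4)})
# Hypothetical third year, not from the question
df3 = pd.DataFrame({'key': ['A', 'F', 'G'], '2023': [1.0, 2.0, 3.0]})

frames = [df1, df2, df3]
# Index every frame by key; concat(axis=1) outer-joins the indexes,
# so each key appears once and missing combinations become NaN.
wide = pd.concat([f.set_index('key') for f in frames], axis=1).reset_index()
print(wide)
```

One caveat with this design: the alignment relies on key being unique within each frame, so if a frame can contain repeated keys it is worth deduplicating (or aggregating) it first.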
Answer:
You can do it with the merge function, using an outer join on key:
df = df1.merge(df2, on='key', how='outer')
df
key 2021 2022
0 A 1.764052 NaN
1 B 0.400157 1.867558
2 C 0.978738 NaN
3 D 2.240893 -0.977278
4 E NaN 0.950088
5 F NaN -0.151357
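If you also want to see which keys were "updated" and which were "inserted", merge accepts indicator=True, which adds a categorical '_merge' column holding 'both', 'left_only' or 'right_only' for each row. A sketch on the question's data:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], '2021': np.random.randn(4)})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], '2022': np.random.randn(4)})

# Outer join on key; indicator=True records where each row came from
df = df1.merge(df2, on='key', how='outer', indicator=True)
print(df)

# Keys found in both frames versus keys that exist only in df2
updated = df.loc[df['_merge'] == 'both', 'key'].tolist()
inserted = df.loc[df['_merge'] == 'right_only', 'key'].tolist()
print(updated, inserted)
```

Passing on='key' is what keeps a single key column. If the two frames also shared a non-key column name, merge would keep both copies with the default '_x'/'_y' suffixes, which is one possible source of the duplicate columns mentioned in the question.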