I want to merge two DataFrames on their index. If an index value is missing from one of the DataFrames, I want the merged DataFrame to contain null (NaN) values in that DataFrame's columns for that row.
import pandas as pd
dict1 = {
'Short Name': ['SOO','BS', 'SOC'],
'File': ['r1','r2','r3'],
'acc1': ['321','321','321']
}
dict2 = {
'Short Name': ['S00','SOC'],
'File': ['r1','r2'],
'acc2': ['123','123']
}
df1 = pd.DataFrame(dict1)
df1.set_index('Short Name', inplace=True)
df1
df2 = pd.DataFrame(dict2)
df2.set_index('Short Name', inplace=True)
df2
new_df = pd.merge(df1,df2, on='Short Name')
The output that I'm trying to achieve looks like this:
           File_x acc1 File_y acc2
Short Name
SOO            r1  321     r1  123
BS             r2  321    NaN  NaN
SOC            r3  321     r2  123
[DataFrame of dict1][1] [DataFrame of dict2][2]

[1]: https://i.stack.imgur.com/u5g0y.png
[2]: https://i.stack.imgur.com/AwenX.png
CodePudding user response:
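Try join with how='left'. It keeps every index value from df1 and fills the df2 columns with NaN wherever df2 has no matching index value:
out = df1.join(df2, lsuffix='_x', rsuffix='_y', how='left')
out
Out[934]: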
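           File_x acc1 File_y acc2
Short Name
SOO            r1  321    NaN  NaN
BS             r2  321    NaN  NaN
SOC            r3  321     r2  123
The SOO row is NaN here because dict2 spells the key 'S00' (with zeros) while dict1 spells it 'SOO' (with the letter O), so the two index values do not match. With the spelling corrected in dict2, that row gets r1 and 123.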
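
If you would rather stay with pd.merge, the same index-based join can be written with left_index=True and right_index=True. The following is only a sketch using the data from the question as posted (including the 'S00' key); the variable names left and outer are mine. It also shows how='outer', which may be closer to the goal if rows missing from either DataFrame should be kept.

```python
import pandas as pd

# Data from the question, unchanged (note 'S00' with zeros in the second frame).
df1 = pd.DataFrame({
    'Short Name': ['SOO', 'BS', 'SOC'],
    'File': ['r1', 'r2', 'r3'],
    'acc1': ['321', '321', '321'],
}).set_index('Short Name')

df2 = pd.DataFrame({
    'Short Name': ['S00', 'SOC'],
    'File': ['r1', 'r2'],
    'acc2': ['123', '123'],
}).set_index('Short Name')

# Left join on the index: every row of df1 is kept, and df2's columns
# are NaN where df2 has no matching index value.
# The overlapping 'File' column gets the default suffixes '_x' and '_y'.
left = pd.merge(df1, df2, left_index=True, right_index=True, how='left')

# Outer join on the index: rows from both frames are kept, with NaN
# on whichever side is missing that index value.
outer = pd.merge(df1, df2, left_index=True, right_index=True, how='outer')

print(left)
print(outer)
```

The plain pd.merge(df1, df2, on='Short Name') from the question uses the default how='inner', which is why rows without a match in both DataFrames are dropped.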