df1:
Reporting Country | v2x_regime_rep | Year | v2x_regime_partner | FDI_inward |
---|---|---|---|---|
Albania | 0.0 | 1995 | 0.0 | NaN |
Albania | 0.0 | 1995 | 1.0 | NaN |
Albania | 0.0 | 1996 | 0.0 | NaN |
Albania | 0.0 | 1996 | 1.0 | NaN |
df2:
Reporting Country | v2x_regime_rep | Year | v2x_regime_partner | FDI_outward |
---|---|---|---|---|
Australia | 1.0 | 1995 | 0.0 | 4.694731 |
Australia | 1.0 | 1995 | 1.0 | 203.394006 |
Australia | 1.0 | 1996 | 0.0 | 49.822881 |
Australia | 1.0 | 1996 | 1.0 | 49.822881 |
I would like to merge two datasets, using the keys that are Reporting Country, v2x_regime_rep, Year, v2x_regime_partner. However, in the end I end up with 8 columns whereas I would like to have following shape:
Reporting Country | v2x_regime_rep | Year | v2x_regime_partner | FDI_outward | FDI_inward |
---|---|---|---|---|---|
Australia | 1.0 | 1995 | 0.0 | 4.694731 | 5.0000 |
Australia | 1.0 | 1995 | 1.0 | 203.394006 | 9.822881 |
Australia | 1.0 | 1996 | 0.0 | 49.822881 | 20.822881 |
Australia | 1.0 | 1996 | 1.0 | 49.822881 | 45.822881 |
It is important that even if FDI_inward or FDI_outward is NaN, I would like to keep them because while one of them being NaN other one might have a value.
CodePudding user response:
The sample you gave for df1
is incomplete.
Try one of these solutions :
With pandas.merge
:
new_df = (
df2.merge(df1, on=list(df2.columns[:-1]), how='outer')
.dropna(subset=['FDI_inward', 'FDI_outward'], how='any')
.reset_index(drop=True)
)
With pandas.concat
:
new_df_ = (
pd.concat([df1, df2], axis=0)
.groupby(list(df1.columns[:-1]), as_index=False).first()
.dropna(subset=['FDI_inward', 'FDI_outward'], how='any')
.reset_index(drop=True)
)
# Output :
print(new_df)
Reporting Country v2x_regime_rep Year v2x_regime_partner FDI_outward FDI_inward
0 Australia 1.0 1995 0.0 4.694731 5.000000
1 Australia 1.0 1995 1.0 203.394006 9.822881
2 Australia 1.0 1996 0.0 49.822881 20.822881
3 Australia 1.0 1996 1.0 49.822881 45.822881