Home > front end >  Why do some keys become seperated columns in the merged dataframe using outer method?
Why do some keys become seperated columns in the merged dataframe using outer method?

Time:10-02

df1:

Reporting Country v2x_regime_rep Year v2x_regime_partner FDI_inward
Albania 0.0 1995 0.0 NaN
Albania 0.0 1995 1.0 NaN
Albania 0.0 1996 0.0 NaN
Albania 0.0 1996 1.0 NaN

df2:

Reporting Country v2x_regime_rep Year v2x_regime_partner FDI_outward
Australia 1.0 1995 0.0 4.694731
Australia 1.0 1995 1.0 203.394006
Australia 1.0 1996 0.0 49.822881
Australia 1.0 1996 1.0 49.822881

I would like to merge two datasets, using the keys that are Reporting Country, v2x_regime_rep, Year, v2x_regime_partner. However, in the end I end up with 8 columns whereas I would like to have following shape:

Reporting Country v2x_regime_rep Year v2x_regime_partner FDI_outward FDI_inward
Australia 1.0 1995 0.0 4.694731 5.0000
Australia 1.0 1995 1.0 203.394006 9.822881
Australia 1.0 1996 0.0 49.822881 20.822881
Australia 1.0 1996 1.0 49.822881 45.822881

It is important that even if FDI_inward or FDI_outward is NaN, I would like to keep them because while one of them being NaN other one might have a value.

CodePudding user response:

The sample you gave for df1 is incomplete.

Try one of these solutions :

With pandas.merge :

new_df = (
            df2.merge(df1, on=list(df2.columns[:-1]), how='outer')
               .dropna(subset=['FDI_inward', 'FDI_outward'], how='any')
               .reset_index(drop=True)
         )

With pandas.concat :

new_df_ = (
            pd.concat([df1, df2], axis=0)
              .groupby(list(df1.columns[:-1]), as_index=False).first()
              .dropna(subset=['FDI_inward', 'FDI_outward'], how='any')
              .reset_index(drop=True)
          )

# Output :

print(new_df)

  Reporting Country  v2x_regime_rep  Year  v2x_regime_partner  FDI_outward  FDI_inward
0         Australia             1.0  1995                 0.0     4.694731    5.000000
1         Australia             1.0  1995                 1.0   203.394006    9.822881
2         Australia             1.0  1996                 0.0    49.822881   20.822881
3         Australia             1.0  1996                 1.0    49.822881   45.822881
  • Related