Duplication issues when outer-merging with pandas

Time:11-17

I have a duplication issue with pandas. I have two dataframes that I must outer-join. For example, df1 is given by

id type value1
1 a 100
1 b 200

where id == 1 has two types with different values. I want to join this with another dataframe, df2:

id value2 value3
1 50 300

I am merging the two using

df_merged = df1.merge(df2,how='outer',on='id')

The result is

id type value1 value2 value3
1 a 100 50 300
1 b 200 50 300

Here value2 and value3 are clearly duplicated, which may cause problems if I want to, e.g., sum value2 or value3. Is there any way to merge and instead get, e.g.,

id type value1 value2 value3
1 a 100 50 300
1 b 200 NaN NaN

or is there some other approach?

Thanks!

CodePudding user response:

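You could merge as you described, and then set the repeated values to NaN, where dupe_cols is a list of the duplicated column names (here ['value2', 'value3']):

df_merged.loc[df_merged.duplicated(subset=dupe_cols), dupe_cols] = np.nan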
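
For reference, here is a minimal runnable sketch of that idea using the example data from the question. It makes two assumptions that are not in the original answer: repeats are detected on the join key id rather than on the value columns (so that two different ids that happen to share the same value2/value3 are not blanked), and the value columns are cast to float first so that NaN can be stored without a dtype warning or error. The name value_cols is only illustrative.

```python
import numpy as np
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({"id": [1, 1], "type": ["a", "b"], "value1": [100, 200]})
df2 = pd.DataFrame({"id": [1], "value2": [50], "value3": [300]})

df_merged = df1.merge(df2, how="outer", on="id")

# Columns that came from df2 and get repeated for every matching row of df1
value_cols = ["value2", "value3"]

# Integer columns cannot hold NaN, so cast to float before blanking
df_merged[value_cols] = df_merged[value_cols].astype("float64")

# True for every row of an id except the first one
repeated = df_merged.duplicated(subset=["id"])

# Keep df2's values only on the first row of each id
df_merged.loc[repeated, value_cols] = np.nan

print(df_merged)
print(df_merged[value_cols].sum())  # NaN is skipped, so each value counts once
```

Since sum() skips NaN by default, summing value2 or value3 on the result counts each df2 row once per id.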