I have an issue with duplicated values when merging in pandas. I have two dataframes that I must outer-join. For example, df1 is given by
id | type | value1 |
---|---|---|
1 | a | 100 |
1 | b | 200 |
where id==1 contains two types with different values and I want to join this with another df,
id | value2 | value3 |
---|---|---|
1 | 50 | 300 |
I am merging the two using
df_merged = df1.merge(df2,how='outer',on='id')
The result is
id | type | value1 | value2 | value3 |
---|---|---|---|---|
1 | a | 100 | 50 | 300 |
1 | b | 200 | 50 | 300 |
where value2 and value3 are clearly duplicated across both rows, which creates problems if I want to, e.g., sum value2 or value3. Is there any way to merge and instead get something like
id | type | value1 | value2 | value3 |
---|---|---|---|---|
1 | a | 100 | 50 | 300 |
1 | b | 200 | NaN | NaN |
or some type of other approach?
Thanks!
Answer:
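You could merge as you described, and then set the repeated values to NaN. Here dupe_cols is the list of columns that came from df2, and id is included in the duplicate check so that equal values under different ids are not blanked (this needs import numpy as np):
dupe_cols = ['value2', 'value3']
df_merged.loc[df_merged.duplicated(subset=['id'] + dupe_cols), dupe_cols] = np.nan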
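For reference, here is a minimal end-to-end sketch of that approach using the example data from the question. The names `dupe_cols` and `is_repeat` are just illustrative, and the cast to float is my own addition: the merged value2/value3 columns are integers, and depending on the pandas version, writing NaN into an integer column may warn or fail, so treat this as one possible way to do it rather than the definitive one.

```python
import numpy as np
import pandas as pd

# Example data from the question
df1 = pd.DataFrame({"id": [1, 1], "type": ["a", "b"], "value1": [100, 200]})
df2 = pd.DataFrame({"id": [1], "value2": [50], "value3": [300]})

df_merged = df1.merge(df2, how="outer", on="id")

# Columns that came from df2 and get repeated for every matching df1 row
dupe_cols = ["value2", "value3"]

# Assumption: cast to float up front so the columns can hold NaN
df_merged[dupe_cols] = df_merged[dupe_cols].astype(float)

# Mark every row after the first that repeats the same id and df2 values
is_repeat = df_merged.duplicated(subset=["id"] + dupe_cols)
df_merged.loc[is_repeat, dupe_cols] = np.nan

print(df_merged)
print(df_merged["value2"].sum())  # NaN is skipped by sum(), so 50.0
```

Because `sum()` skips NaN by default, the totals for value2 and value3 now count each df2 row once. Note that this assumes df2 has a single row per id; if df2 can contain several rows with the same id and identical values, those would be blanked as well.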