Perform merge for specific duplicate rows in pandas DataFrame

Time:11-04

Consider the following two DataFrames in Python:

df:

code_1 other
19001 white
19009 blue
19008 red

df_1:

code_1 code_2
19001 00001
19001 00002
19009 00003
19008 00004

I want to merge df with df_1:

    df_merge = pd.merge(df, df_1, how="left", on=['code_1'])

df_merge:

code_1 other code_2
19001 white 00001
19001 white 00002
19009 blue 00003
19008 red 00004

I want the merge to drop duplicate matches on code_1 and keep only the first matching row. I could call drop_duplicates on [other, code_1] after the merge, but I would like to know whether merge has a parameter that does this directly.

Expected result:

code_1 other code_2
19001 white 00001
19009 blue 00003
19008 red 00004

CodePudding user response:

As far as I know, pandas.merge() has no parameter that does this, but you can get the same result by dropping the duplicates from df_1 before merging, assuming the duplicates occur only in df_1:

    df_merge = df.merge(df_1.drop_duplicates('code_1'), how="left", on=['code_1'])
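For completeness, here is a minimal runnable sketch of that approach using the sample data from the question. It assumes the codes are stored as strings (so the leading zeros in code_2 survive) and that 19008 maps to 00004, as the merged output in the question shows. The name first_matches and the validate="many_to_one" check are my additions, not something the question requires; validate only makes merge raise an error if code_1 is still duplicated on the right-hand side.

```python
import pandas as pd

# Sample data from the question; codes are strings so leading zeros are preserved.
df = pd.DataFrame({
    "code_1": ["19001", "19009", "19008"],
    "other": ["white", "blue", "red"],
})
df_1 = pd.DataFrame({
    "code_1": ["19001", "19001", "19009", "19008"],
    "code_2": ["00001", "00002", "00003", "00004"],
})

# Keep only the first df_1 row for each code_1.
# keep="first" is the default; it is spelled out here for clarity.
first_matches = df_1.drop_duplicates(subset="code_1", keep="first")

# Left merge against the de-duplicated table.
# validate="many_to_one" raises MergeError if code_1 is not unique on the right side.
df_merge = df.merge(first_matches, how="left", on="code_1", validate="many_to_one")

print(df_merge)
```

Which row counts as "first" depends on the current row order of df_1, so sort df_1 beforehand if a particular code_2 should win.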