Consider the following two DataFrames in Python:
df:
code_1 | other |
---|---|
19001 | white |
19009 | blue |
19008 | red |
df_1:
code_1 | code_2 |
---|---|
19001 | 00001 |
19001 | 00002 |
19009 | 00003 |
19008 | 00004 |
I want to merge df with df_1:
df_merge = pd.merge(df, df_1, how="left", on=['code_1'])
df_merge:
code_1 | other | code_2 |
---|---|---|
19001 | white | 00001 |
19001 | white | 00002 |
19009 | blue | 00003 |
19008 | red | 00004 |
I want the merge to keep only the first matching row for each code_1 value instead of producing duplicate rows. I could call drop_duplicates on [other, code_1] after merging, but I would like to know whether the merge function has a parameter that does this directly.
Expected result:
code_1 | other | code_2 |
---|---|---|
19001 | white | 00001 |
19009 | blue | 00003 |
19008 | red | 00004 |
Answer:
As far as I know, pandas.merge() has no parameter that does this, but you can get the same result by dropping the duplicates from df_1 before merging, assuming the duplicates are only in df_1:
df_merge = df.merge(df_1.drop_duplicates('code_1'), how="left", on=['code_1'])
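For reference, here is a minimal self-contained sketch of that approach using the data from the question. Two things are assumptions on my part: the codes are stored as strings so the leading zeros survive, and code_2 for 19008 is taken as 00004, as in the merged output shown in the question. The validate="m:1" argument is optional; it does not remove duplicates, it only makes merge raise an error if code_1 is still not unique in the right-hand frame.

```python
import pandas as pd

# Sample data from the question; codes kept as strings (assumption)
# so that leading zeros such as "00001" are preserved.
df = pd.DataFrame({
    "code_1": ["19001", "19009", "19008"],
    "other": ["white", "blue", "red"],
})
df_1 = pd.DataFrame({
    "code_1": ["19001", "19001", "19009", "19008"],
    "code_2": ["00001", "00002", "00003", "00004"],
})

# Keep only the first row for each code_1 in the lookup table.
df_1_first = df_1.drop_duplicates(subset="code_1", keep="first")

# Left merge; validate="m:1" checks that code_1 is unique on the right side.
df_merge = df.merge(df_1_first, how="left", on="code_1", validate="m:1")

print(df_merge)
```

Which row counts as "first" depends on the current row order of df_1, so sort it beforehand if a particular code_2 should win.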