I have two dataframes, and I am able to merge them with pd.merge(df1, df2, on='column_name'). But I only want to merge on the first occurrence in df1.
Any pointers or solutions? It's a many-to-one relationship, and I only want the first occurrence merged. Thanks in advance!
CodePudding user response:
Since you want to merge two dataframes of different lengths, the merged dataframe will need NaN values in the cells that have no corresponding row in df2. So let's try this: do a left merge. This will duplicate the df2 values for every row in df1 that has a repeated column_name. Then use a mask to pick out those repeated rows and assign NaN to them in the columns that come from df2.
import numpy as np
new_df = df1.merge(df2, how='left', on='column_name')
# Flag repeats on the merged frame so the mask shares new_df's index
mask = new_df['column_name'].duplicated()
new_df.loc[mask, df2.columns[df2.columns != 'column_name']] = np.nan
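
To see the effect on concrete data, here is a minimal self-contained sketch of the same idea. The helper name merge_first_occurrence, the sample frames, and the qty and price columns are made up for illustration. It assumes the keys in df2 are unique and that df2 shares no column names with df1 other than the key. It blanks the repeats with Series.where rather than a .loc assignment, which is a variation on the snippet above and not part of the original answer.

```python
import pandas as pd


def merge_first_occurrence(left, right, key):
    """Left-merge `right` into `left`, keeping right-hand values only on the
    first row of each `key` value (hypothetical helper, not a pandas API)."""
    merged = left.merge(right, how="left", on=key)
    # True for every repeat of a key after its first appearance.
    repeats = merged[key].duplicated()
    for col in right.columns.drop(key):
        # Series.where keeps values where the condition holds, NaN elsewhere.
        merged[col] = merged[col].where(~repeats)
    return merged


# Made-up sample data: "a" and "b" each appear twice in df1.
df1 = pd.DataFrame({"column_name": ["a", "a", "b", "c", "b"],
                    "qty": [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({"column_name": ["a", "b"], "price": [10.0, 20.0]})

result = merge_first_occurrence(df1, df2, "column_name")
print(result)
```

Here price is kept only on the first "a" row and the first "b" row. The repeated rows get NaN, and so does "c", which has no match in df2.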