Pandas: merge two dataframes only on the first occurrence

Time:12-18

I have two dataframes and can merge them with pd.merge(df1, df2, on='column_name'). However, it's a many-to-one relationship, and I only want the values from df2 merged onto the first occurrence of each key in df1. Any pointers or solutions? Thanks in advance!

CodePudding user response:

The merged dataframe keeps every row of df1, so the rows that should not receive values from df2 need NaN in the columns that come from df2. So let's try this: do a left merge, which copies the df2 values onto every df1 row with a matching column_name, including the repeats. Then use a mask of the duplicated column_name rows in df1 to set the columns from df2 to NaN on those rows. This relies on column_name being unique in df2 (the many-to-one case), so that the merged rows line up one-to-one with the rows of df1.

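import numpy as np
# True for every repeated key in df1; the first occurrence stays False
mask = df1['column_name'].duplicated().to_numpy()
new_df = df1.merge(df2, how='left', on='column_name')
# Blank out the df2 columns on the repeated rows (mask is positional)
new_df.loc[mask, df2.columns[df2.columns != 'column_name']] = np.nan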
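For comparison, here is a sketch of a different route to the same result, not the method from the answer above: number each occurrence of the key in df1 with groupby(...).cumcount(), tag every df2 row as occurrence 0, and left-merge on both columns so that only first occurrences find a match. The function name merge_first_occurrence, the helper column _occ, and the sample data are all made up for illustration, and the sketch assumes column_name is unique in df2 and that neither frame already has a column called _occ.

```python
import pandas as pd


def merge_first_occurrence(left, right, key):
    """Left-merge `right` onto only the first occurrence of each key in `left`.

    Hypothetical helper; assumes `key` is unique in `right` and that
    neither frame already contains a column named '_occ'.
    """
    # 0 for the first time a key appears in `left`, 1 for the second, ...
    left_tagged = left.assign(_occ=left.groupby(key).cumcount())
    # Every row of `right` may only match occurrence number 0
    right_tagged = right.assign(_occ=0)
    merged = left_tagged.merge(right_tagged, how='left', on=[key, '_occ'])
    return merged.drop(columns='_occ')


# Made-up sample data: 'a' and 'b' repeat in df1, 'c' has no match in df2
df1 = pd.DataFrame({'column_name': ['a', 'b', 'a', 'c', 'b'],
                    'left_val': [1, 2, 3, 4, 5]})
df2 = pd.DataFrame({'column_name': ['a', 'b'],
                    'right_val': [10.0, 20.0]})

result = merge_first_occurrence(df1, df2, 'column_name')
print(result)
```

With this sample data, right_val should be filled only on the first 'a' and first 'b' rows and NaN everywhere else. Because the NaN values come out of the merge itself, there is no separate assignment step that depends on row order or on the index of df1.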