Compare two dataframes row by row to extract a feature from one into the other (efficiency improvement)

I have two dataframes, df1 and df2. Both have the same columns, but df2 also has an id column. I want to copy that id into df1['id_frame'], but only where the rows of the two dataframes are equal.

# columns that must all be equal for two rows to count as a match
key_cols = ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims']

ind = merged.columns.get_loc('id_frame')
tmp = pd.DataFrame()
for i_row in range(len(df1)):
    for j_row in range(len(df2)):
        # compare the two rows on the key columns only
        if df1[key_cols].iloc[i_row].equals(df2[key_cols].iloc[j_row]):
            df2.iloc[j_row, ind] = df1['id'].iloc[i_row]
    tmp = pd.concat([tmp, df2[df2['id'].notna()]])
    df2 = df2[df2['id'].isna()]
df2 = tmp

The code above works fine, but it's not efficient at all. How would you improve it?

df2 has a lot of duplicates. Removing them would do the trick, but I need to keep the indices so that each id is assigned to the right object, so I'm not sure how to do it with this approach.

CodePudding user response:

Try pd.merge and use all of ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims'] as the merging keys.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
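
A rough sketch of how that merge could look, following the direction described in the question text (the id from df2 ends up in df1['id_frame']). The helper name add_id_frame and the toy data are made up for illustration. It assumes that duplicate rows in df2 carry the same id, so keeping the first one per key combination loses nothing. If the copy should go the other way, as the loop in the question seems to do, swap the roles of the two frames.

```python
import pandas as pd

# Key columns taken from the question.
key_cols = ['material', 'type', 'size', 'height',
            'size_in', 'size_cm', 'weight', 'dims']


def add_id_frame(df1, df2, keys=key_cols):
    """Hypothetical helper: copy df2['id'] into df1['id_frame'] for matching rows."""
    # Keep one row per key combination so the left merge cannot multiply df1 rows.
    # Assumption: duplicated key rows in df2 share the same id.
    lookup = df2.drop_duplicates(subset=keys)[keys + ['id']]

    # A left merge keeps every df1 row in its original order;
    # rows without a match get NaN in 'id'.
    merged = df1[keys].merge(lookup, on=keys, how='left')

    out = df1.copy()
    # merge() returns a fresh 0..n-1 index, so assign by position
    # to leave df1's own index untouched.
    out['id_frame'] = merged['id'].to_numpy()
    return out


# Toy data (made up) with a non-default index on df1 and a duplicate row in df2.
df1 = pd.DataFrame(
    [['wood', 'a', 1, 10, 0.4, 1.0, 5, '1x1'],
     ['steel', 'b', 2, 20, 0.8, 2.0, 9, '2x2'],
     ['glass', 'c', 3, 30, 1.2, 3.0, 2, '3x3']],
    columns=key_cols, index=[10, 11, 12])

df2 = pd.DataFrame(
    [['steel', 'b', 2, 20, 0.8, 2.0, 9, '2x2'],
     ['wood', 'a', 1, 10, 0.4, 1.0, 5, '1x1'],
     ['steel', 'b', 2, 20, 0.8, 2.0, 9, '2x2']],  # duplicate of the first row
    columns=key_cols)
df2['id'] = [7, 3, 7]

result = add_id_frame(df1, df2)
print(result[['material', 'id_frame']])
```

This replaces the nested loops with a single join on the key columns, and df1 keeps its original index. The 'glass' row has no counterpart in df2, so its id_frame stays NaN.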