compare two dataframes row by row to extract a feature from one into the other (efficiency improveme

Time:03-07

I have two dataframes, df1 and df2. Both have the same columns, but df2 also has a so-called id column, which I want to extract and put into df1['id_frame'] only where the rows of the two dataframes are equal.

import pandas as pd

# Columns that must all match for two rows to count as equal.
cols = ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims']

# Position of the column that receives the id.
ind = df2.columns.get_loc('id_frame')
tmp = pd.DataFrame()
for i_row in range(len(df1)):
    for j_row in range(len(df2)):
        if df1[cols].iloc[i_row].equals(df2[cols].iloc[j_row]):
            df2.iloc[j_row, ind] = df1['id'].iloc[i_row]
    # Set aside the rows that already have an id so later passes skip them.
    tmp = pd.concat([tmp, df2[df2['id_frame'].notna()]])
    df2 = df2[df2['id_frame'].isna()]
df2 = tmp

The code above works fine, but it's not efficient at all. How would you improve it?

df2 has a lot of duplicates. Removing them would do the trick, but I need to keep the indices so that each id is assigned to the right row, so I'm not sure how to do it with this approach.

CodePudding user response:

Try pd.merge and use all of ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims'] as the merging keys.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
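A minimal sketch of how that merge could look, since the question is also concerned with duplicates and with keeping the original indices. The helper name copy_id and the small demo frames are made up for illustration. It is also an assumption that the frame holding the ids has a column named id and that the result should land in id_frame; pass whichever frame holds the ids as source. The demo uses only two of the key columns to keep it short.

```python
import pandas as pd

# Key columns from the question; rows count as equal when all of these match.
KEYS = ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims']


def copy_id(source, target, keys=KEYS):
    """Hypothetical helper: return a copy of `target` with an `id_frame`
    column looked up from `source['id']` on the key columns."""
    keys = list(keys)
    # One id per distinct key combination; if source repeats a combination,
    # the first id wins (an assumption, adjust if that is not what you want).
    lookup = (
        source[keys + ['id']]
        .drop_duplicates(subset=keys)
        .rename(columns={'id': 'id_frame'})
    )
    # A left merge keeps every target row, duplicates included, in order.
    # validate raises if the lookup keys are unexpectedly not unique.
    merged = target.drop(columns=['id_frame'], errors='ignore').merge(
        lookup, on=keys, how='left', validate='many_to_one'
    )
    # merge returns a fresh 0..n-1 index, so put the original labels back.
    merged.index = target.index
    return merged


# Made-up demo data.
df1 = pd.DataFrame({'material': ['oak', 'pine'], 'size': [1, 2], 'id': [10, 20]})
df2 = pd.DataFrame(
    {'material': ['pine', 'oak', 'pine', 'elm'], 'size': [2, 1, 2, 3]},
    index=[7, 3, 9, 4],
)
result = copy_id(df1, df2, keys=['material', 'size'])
print(result)
```

Rows without a match end up with NaN in id_frame. The loop in the question drops such rows at the end, so if that is intended, something like result.dropna(subset=['id_frame']) should give the same set of rows. Note that merge treats NaN keys as equal to each other, which is similar to how Series.equals compares NaNs in the same position.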
