I have two dataframes, df1 and df2. Both of them have the same columns, but df2 has a so-called id column which I want to extract and put into df1['id_frame'], only where the rows of the two dataframes are equal.
ind = merged.columns.get_loc('id_frame')
tmp = pd.DataFrame()
for i_row in range(len(df1)):
    for j_row in range(len(df2)):
        if df1[['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims']].iloc[i_row]\
                .equals(df2[['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims']].iloc[j_row]):
            df2.iloc[j_row, ind] = df1['id'].iloc[i_row]
    tmp = pd.concat([tmp, df2[df2['id'].notna()]])
    df2 = df2[df2['id'].isna()]
df2 = tmp
The code above works fine, but it's not efficient at all. How would you improve it?
df2 has a lot of duplicates. Removing them would do the trick, but I need to keep the indices so that each id is assigned to the right row, so I'm not sure how to do it with this approach.
CodePudding user response:
Try pd.merge and use all of ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims'] as the merge keys.
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html
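
As a rough sketch of that suggestion, something like the following might work. It follows the description in the question (ids come from df2['id'] and land in df1['id_frame']); swap the roles of the two frames if your data goes the other way, as the loop in the question seems to. The helper name attach_ids and the tiny two-column demo frames are made up for illustration and are not from the question. Duplicates in df2 are dropped on the key columns first, so the left merge returns exactly one row per row of df1, which makes it safe to put df1's original index back afterwards.

```python
import pandas as pd

KEYS = ['material', 'type', 'size', 'height', 'size_in', 'size_cm', 'weight', 'dims']


def attach_ids(df1, df2, keys=KEYS):
    """Return a copy of df1 with an 'id_frame' column taken from df2['id']
    wherever all key columns match (hypothetical helper, for illustration)."""
    # Keep one id per distinct key combination (the first one seen),
    # so the merge cannot multiply rows of df1.
    lookup = (
        df2.drop_duplicates(subset=keys)[keys + ['id']]
        .rename(columns={'id': 'id_frame'})
    )
    # Drop any existing id_frame column to avoid _x/_y suffixes after the merge.
    left = df1.drop(columns='id_frame', errors='ignore')
    merged = left.merge(lookup, on=keys, how='left')
    # merge() resets the index; a left merge against unique keys keeps
    # df1's row order and row count, so the original index can be restored.
    merged.index = df1.index
    return merged


# Small made-up example using only two key columns.
df1 = pd.DataFrame(
    {'material': ['wood', 'steel', 'wood'], 'type': ['a', 'b', 'c']},
    index=[10, 20, 30],
)
df2 = pd.DataFrame(
    {'material': ['steel', 'wood', 'wood'],
     'type': ['b', 'a', 'a'],
     'id': ['x1', 'x2', 'x3']}
)
out = attach_ids(df1, df2, keys=['material', 'type'])
print(out)
```

Rows of df1 with no match in df2 end up with a missing value in id_frame. If df2 can hold different ids for the same key combination, drop_duplicates keeps only the first one, so decide beforehand which id should win.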