I got two dataframe: df1 has around 20000 rows and df2 has 7226 rows as shown in picture:
df2:
I want to copy the 'Final Coding' value of df2 to 'Final Coding' value of df1 if both dataframes have same Review_Text values. I used following command:
df1.loc[df1['Review_Text'].isin(df2['Review_Text'],'Final Coding'] = df2['Final Coding']
The problem with above statement is that it does not replace all values of df1. It only replaces values until the index number of 7226 of df1 and after that it does not replace any value. I tried another command as follow:
mask = df1['Review_Text'].isin(df2['Review_Text'])==True
df1['Final Coding'] = df1['Final Coding'].where(~mask,df2['Final Coding'],axis=0)
But ended up with same problem. I searched the exisiting threads of stackoverflow and found another way. So I changed the command as follow:
df1.loc[df1['Review_Text'].isin(df2.set_index('Review_Text').index),'Final Coding'] = df2.set_index('Review_Text')['Final Coding']
However, above command threw following error:
ValueError: cannot reindex from a duplicate axis
There are some text in 'Review_Text' column which is duplicated that's why it threw this error. I am not sure how to overcome this issue. Could anyone help me in fixing the issue?
CodePudding user response:
You can do
Update before below :
df1.reset_index(inplace=True, drop=True)
df2.reset_index(inplace=True, drop=True)
#==================================
m = df1['Review_Text'].isin(df2['Review_Text'])
v = df1['Review_Text'].map(df2.set_index('Review_Text')['Final Coding'])
df1.loc[m,'Final Coding'] = v
Or we can do
df1.update(df1['Review_Text'].map(df2.set_index('Review_Text')['Final Coding']).to_frame('Review_Text'))