Home > Software design >  Python pandas: Replacing value based on Text matching
Python pandas: Replacing value based on Text matching

Time:12-23

I got two dataframe: df1 has around 20000 rows and df2 has 7226 rows as shown in picture:

df1: enter image description here

df2:

enter image description here

I want to copy the 'Final Coding' value of df2 to 'Final Coding' value of df1 if both dataframes have same Review_Text values. I used following command:

df1.loc[df1['Review_Text'].isin(df2['Review_Text'],'Final Coding'] = df2['Final Coding']

The problem with above statement is that it does not replace all values of df1. It only replaces values until the index number of 7226 of df1 and after that it does not replace any value. I tried another command as follow:

mask = df1['Review_Text'].isin(df2['Review_Text'])==True
df1['Final Coding'] = df1['Final Coding'].where(~mask,df2['Final Coding'],axis=0)

But ended up with same problem. I searched the exisiting threads of stackoverflow and found another way. So I changed the command as follow:

df1.loc[df1['Review_Text'].isin(df2.set_index('Review_Text').index),'Final Coding'] = df2.set_index('Review_Text')['Final Coding']

However, above command threw following error:

ValueError: cannot reindex from a duplicate axis

There are some text in 'Review_Text' column which is duplicated that's why it threw this error. I am not sure how to overcome this issue. Could anyone help me in fixing the issue?

CodePudding user response:

You can do

Update before below :

df1.reset_index(inplace=True, drop=True)
df2.reset_index(inplace=True, drop=True)

#==================================
m = df1['Review_Text'].isin(df2['Review_Text'])
v = df1['Review_Text'].map(df2.set_index('Review_Text')['Final Coding'])
df1.loc[m,'Final Coding'] = v

Or we can do

df1.update(df1['Review_Text'].map(df2.set_index('Review_Text')['Final Coding']).to_frame('Review_Text'))
  • Related