how to update a data frame based on the condition in another data frame in pandas-CodePudding

I have two data frame and I want to update one column of df_source based on the condition in both data frames:

df_source = pd.Dataframe({'Sentiment':['neg', 'neg','pos'], 'text': ['hello ', '12where', 'here [null]'], 'pred': ['neu', 'neg', 'pos')})

df2 = pd.Dataframe({'Sentiment':['pos', 'neg','pos', 'neu'], 'text': ['hello ', '12 where I', 'hello g* ', 'here [null]'], 'pred': ['neu', 'neg', 'neu', 'neu')})

I want to update the column of Sentiment in df_source based on this condition: if the text in both data frame were exactly the same and the pred column was the same then replace the sentiment in df_source with the sentiment in df2

So the output would be like this (as only one sample meets both condition "hello "):

Sentiment.    text.        pred
pos          hello         neu
neg          12where       neg
pos          here [null]   pos

What I have done:

df_source['Sentiment'] = df.where(((df['text']== df_source['text']) & (df['pred'] == dfs['pred'])) , df['Sentiment'])

It should work but this raises error (ValueError: Can only compare identically-labeled Series objects).

CodePudding user response：

First merge on the two columns and suffix.

df_source = df_source.merge(df2, how ='left', on =['text', 'pred'], suffixes=('_x', ''))

Replace the NaNs where there was no match using combine_first and then drop the extra merge column

df_source =df_source.assign(Sentiment= df_source['Sentiment'].combine_first(df_source.Sentiment_x) ).drop('Sentiment_x',1)

 

          text pred Sentiment
0       hello   neu       pos
1      12where  neg       neg
2  here [null]  pos       pos