I'm having trouble comparing two dataframes. The first one contains tokenised sentences.
df_1:
id  sentence               some more info
1   [I, am, happy]         bla
2   [I, am, happier]       bla
3   [I, am, the, saddest]  bla
and
df_2:
id  word   more     most
1   happy  happier  happiest
2   sad    sadder   saddest
I want to compare the two dataframes so that, if a word in df_1 matches a value anywhere in df_2, it is replaced with the df_2['word'] value from that same row. So my output would look something like this:
df_1:
id  sentence               some more info  new_sentence
1   [I, am, happy]         bla             [I, am, happy]
2   [I, am, happier]       bla             [I, am, happy]
3   [I, am, the, saddest]  bla             [I, am, the, sad]
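For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data. It assumes the sentence column holds real Python lists of strings and that "some more info" is a single column name; both are guesses from the tables above.

```python
import pandas as pd

# Reconstruction of the sample data shown above (assumed structure:
# 'sentence' holds Python lists, 'some more info' is one column name).
df_1 = pd.DataFrame({
    "id": [1, 2, 3],
    "sentence": [["I", "am", "happy"],
                 ["I", "am", "happier"],
                 ["I", "am", "the", "saddest"]],
    "some more info": ["bla", "bla", "bla"],
})

# Each row of df_2 lists a base word and its inflected forms.
df_2 = pd.DataFrame({
    "id": [1, 2],
    "word": ["happy", "sad"],
    "more": ["happier", "sadder"],
    "most": ["happiest", "saddest"],
})

print(df_1)
print(df_2)
```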
I have tried DataFrame.compare() and writing my own function, but nothing has worked so far.
Thanks for your help in advance!
Answer:
Create a dictionary from the second DataFrame: drop the id column, reshape it with DataFrame.melt, and index it by the inflected form with DataFrame.set_index:
d = df_2.drop('id', axis=1).melt('word').set_index('value')['word'].to_dict()
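To see what that one-liner produces, here is a sketch that prints the intermediate melted frame and the final dictionary, using the df_2 sample from the question. Note that the base words themselves (happy, sad) are not keys; they are handled by the fallback in the next step.

```python
import pandas as pd

df_2 = pd.DataFrame({
    "id": [1, 2],
    "word": ["happy", "sad"],
    "more": ["happier", "sadder"],
    "most": ["happiest", "saddest"],
})

# melt('word') keeps 'word' as the identifier column and stacks the
# remaining columns ('more', 'most') into 'variable'/'value' pairs.
long = df_2.drop("id", axis=1).melt("word")
print(long)

# Index by the inflected form so each one maps back to its base word.
d = long.set_index("value")["word"].to_dict()
print(d)
# {'happier': 'happy', 'sadder': 'sad', 'happiest': 'happy', 'saddest': 'sad'}
```

If the same inflected form appeared in more than one row, the later row would overwrite the earlier one in the dictionary.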
Then map each token with dict.get, which returns the token itself if there is no match:
df_1['new_sentence'] = df_1['sentence'].apply(lambda x: [d.get(y, y) for y in x])
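A self-contained end-to-end run of the apply version, using the sample data from the question, might look like this; the assertion-style check against the expected output from the question is the only addition.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "id": [1, 2, 3],
    "sentence": [["I", "am", "happy"],
                 ["I", "am", "happier"],
                 ["I", "am", "the", "saddest"]],
    "some more info": ["bla", "bla", "bla"],
})
df_2 = pd.DataFrame({
    "id": [1, 2],
    "word": ["happy", "sad"],
    "more": ["happier", "sadder"],
    "most": ["happiest", "saddest"],
})

# Inflected form -> base word
d = df_2.drop("id", axis=1).melt("word").set_index("value")["word"].to_dict()

# d.get(y, y) falls back to the token itself when it is not an inflected form
df_1["new_sentence"] = df_1["sentence"].apply(lambda x: [d.get(y, y) for y in x])
print(df_1[["sentence", "new_sentence"]])
```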
Or, without apply, using a nested list comprehension:
d = df_2.drop('id', axis=1).melt('word').set_index('value')['word'].to_dict()
df_1['new_sentence'] = [[d.get(y, y) for y in x] for x in df_1['sentence']]
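A third option, not part of the answer above, is to stay inside pandas: explode the lists into one token per row, replace inflected forms using the same dictionary, and regroup by the original index. This is a sketch on the sample data only; it assumes no sentence is an empty list (explode turns an empty list into NaN, which would come back as [nan]).

```python
import pandas as pd

df_1 = pd.DataFrame({
    "id": [1, 2, 3],
    "sentence": [["I", "am", "happy"],
                 ["I", "am", "happier"],
                 ["I", "am", "the", "saddest"]],
})
df_2 = pd.DataFrame({
    "id": [1, 2],
    "word": ["happy", "sad"],
    "more": ["happier", "sadder"],
    "most": ["happiest", "saddest"],
})

d = df_2.drop("id", axis=1).melt("word").set_index("value")["word"].to_dict()

# One token per row; the original row index is repeated for each token.
tokens = df_1["sentence"].explode()

# Series.replace with a dict swaps matching values and leaves others unchanged.
replaced = tokens.replace(d)

# Rebuild one list per original row (groupby preserves token order within a row).
df_1["new_sentence"] = replaced.groupby(level=0).agg(list)
print(df_1)
```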